System Architecture
The system architecture of the MRPF project can be broken down into a few key components:
MRPF API
The MRPF API serves as the interface for clients to interact with the collected data and manage the scheduling of tasks.
Task Manager
The Task Manager is responsible for scheduling and managing various scanning tasks. It handles task creation, execution, and monitoring.
Network Engine
The Network Engine is a fast network scanning engine based on masscan. It sends and receives network packets on separate transmit and receive threads. The engine exposes traits that can be implemented to add different scanning techniques.
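As a rough illustration of this design, the sketch below runs one transmit thread and one receive thread and plugs scanning logic in through two traits. The names `PacketSource`, `PacketSink` and `run_engine` are hypothetical and are not taken from the MRPF code base. An in-memory channel stands in for the network interface, so the receive side simply sees what the transmit side sent.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical trait: produces the raw packets a scan wants to send.
pub trait PacketSource: Send + 'static {
    fn next_packet(&mut self) -> Option<Vec<u8>>;
}

/// Hypothetical trait: consumes raw packets and produces a scan result.
pub trait PacketSink: Send + 'static {
    type Output: Send + 'static;
    fn on_packet(&mut self, packet: &[u8]);
    fn finish(self) -> Self::Output;
}

/// Runs the source and the sink on separate threads.
/// Assumption: the channel replaces the real network interface.
pub fn run_engine<S: PacketSource, K: PacketSink>(mut source: S, mut sink: K) -> K::Output {
    let (wire_tx, wire_rx) = mpsc::channel::<Vec<u8>>();

    let transmit = thread::spawn(move || {
        while let Some(packet) = source.next_packet() {
            if wire_tx.send(packet).is_err() {
                break; // the receive side has gone away
            }
        }
        // `wire_tx` is dropped here, which ends the receive loop.
    });

    let receive = thread::spawn(move || {
        for packet in wire_rx {
            sink.on_packet(&packet);
        }
        sink.finish()
    });

    transmit.join().expect("transmit thread panicked");
    receive.join().expect("receive thread panicked")
}

/// Example source: emits one single-byte packet per counter value.
pub struct CountUp {
    pub next: u8,
    pub limit: u8,
}

impl PacketSource for CountUp {
    fn next_packet(&mut self) -> Option<Vec<u8>> {
        if self.next >= self.limit {
            return None;
        }
        let packet = vec![self.next];
        self.next += 1;
        Some(packet)
    }
}

/// Example sink: counts the bytes it has seen.
pub struct ByteCounter {
    pub bytes: usize,
}

impl PacketSink for ByteCounter {
    type Output = usize;
    fn on_packet(&mut self, packet: &[u8]) {
        self.bytes += packet.len();
    }
    fn finish(self) -> usize {
        self.bytes
    }
}
```

Calling `run_engine(CountUp { next: 0, limit: 5 }, ByteCounter { bytes: 0 })` returns the number of bytes that crossed the channel. A real engine would replace the channel with raw socket I/O, and the receive thread would see replies from targets rather than the transmitted packets.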
TCP SYN Scanner
The TCP SYN Scanner is a specific implementation of a scanning technique that utilizes TCP SYN packets to identify open ports on target hosts. It leverages the Network Engine for packet transmission and reception.
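The receive side of a SYN scan comes down to reading the control bits of each reply: a SYN+ACK indicates an open port and an RST indicates a closed one. The sketch below shows that classification on a raw TCP header. `PortState` and `classify_reply` are hypothetical names for illustration, not the scanner's actual API, and the sketch assumes the caller has already stripped the Ethernet and IP headers.

```rust
/// Hypothetical result type for a single probed port.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PortState {
    Open,
    Closed,
    Unknown,
}

// TCP control bits, stored in byte 13 of the TCP header.
const FLAG_SYN: u8 = 0x02;
const FLAG_RST: u8 = 0x04;
const FLAG_ACK: u8 = 0x10;

/// Classifies a reply to a SYN probe.
/// Returns the replying (source) port and its state, or `None` if the
/// slice is shorter than the minimal 20-byte TCP header.
pub fn classify_reply(tcp_header: &[u8]) -> Option<(u16, PortState)> {
    if tcp_header.len() < 20 {
        return None;
    }
    let source_port = u16::from_be_bytes([tcp_header[0], tcp_header[1]]);
    let flags = tcp_header[13];

    let state = if (flags & FLAG_RST) != 0 {
        PortState::Closed
    } else if (flags & (FLAG_SYN | FLAG_ACK)) == (FLAG_SYN | FLAG_ACK) {
        PortState::Open
    } else {
        PortState::Unknown
    };
    Some((source_port, state))
}
```

A target that never replies produces no packet at all, so a state such as "filtered" would have to be inferred from a timeout rather than from this function.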
HTTP/1.1 Scanner
The HTTP/1.1 Scanner is designed to perform HTTP/1.1 requests to target hosts and analyze the responses. It can be used to identify web servers, gather information about web applications, and detect potential vulnerabilities.
TLS Certificate Scanner
The TLS Certificate Scanner is responsible for retrieving and analyzing TLS/SSL certificates from target hosts. It can be used to identify the certificate issuer, expiration dates, and potential vulnerabilities in the SSL configuration. It uses the Network Engine for network communication and has a custom TLS implementation to extract certificate information without relying on external libraries.
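To give a feel for what extracting certificates without a TLS library involves, here is a minimal sketch that reads one TLS record and returns the handshake message inside it. The `HandshakeMessage` type and `parse_handshake_record` function are hypothetical and do not reflect the scanner's real parser. The sketch assumes a plaintext handshake record (as in TLS 1.2) holding exactly one complete handshake message. Real traffic can split a message across records or pack several into one.

```rust
/// Hypothetical view of a single handshake message inside a TLS record.
pub struct HandshakeMessage<'a> {
    pub msg_type: u8,
    pub body: &'a [u8],
}

const CONTENT_TYPE_HANDSHAKE: u8 = 22;
/// Handshake message type of the Certificate message.
pub const HANDSHAKE_CERTIFICATE: u8 = 11;

/// Parses one TLS record and returns the first handshake message in it.
/// Returns `None` for non-handshake records or truncated input.
pub fn parse_handshake_record(record: &[u8]) -> Option<HandshakeMessage<'_>> {
    // Record header: content type (1), protocol version (2), length (2).
    if record.len() < 5 || record[0] != CONTENT_TYPE_HANDSHAKE {
        return None;
    }
    let record_len = usize::from(u16::from_be_bytes([record[3], record[4]]));
    let payload = record.get(5..5 + record_len)?;

    // Handshake header: message type (1), body length (3, big-endian).
    if payload.len() < 4 {
        return None;
    }
    let body_len = (usize::from(payload[1]) << 16)
        | (usize::from(payload[2]) << 8)
        | usize::from(payload[3]);
    let body = payload.get(4..4 + body_len)?;

    Some(HandshakeMessage {
        msg_type: payload[0],
        body,
    })
}
```

When `msg_type` equals `HANDSHAKE_CERTIFICATE`, the body holds the certificate chain, which a scanner would then decode as DER-encoded X.509 certificates.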
DNS Resolver
NOT IMPLEMENTED YET
The DNS Resolver is responsible for querying DNS servers to resolve domain names to IP addresses. It utilizes the Network Engine to perform these queries and gather relevant data.
Whois Resolver
NOT IMPLEMENTED YET
The Whois Resolver is responsible for querying Whois databases to retrieve information about domain names and IP addresses. It utilizes the Network Engine to perform these queries and gather relevant data.
Models
The various components of the MRPF project utilize a shared set of models defined in the mrpf_models crate. These models define the data structures and types used throughout the system, ensuring consistency and interoperability between different components.
Iterator Models
Most of the models are reasonably straightforward. However, the Ipv4Range, Ipv4Addresses, Ports and PortRange models deserve special attention.
These models implement a custom Iterator that shuffles the order of IP addresses and ports to avoid predictable scanning patterns. The idea behind the algorithm comes from masscan.
The iterator ensures each item is returned only once but avoids having to store all items in memory at once. It accomplishes this by only storing the start and end values within the Ipv4Addresses and PortRange models. The iterator uses a Feistel cipher to generate a pseudo-random permutation of the range of values, allowing for efficient iteration without repetition.
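The following sketch shows one way such an iterator can work. It is an illustration under stated assumptions and not the mrpf_models implementation: the `ShuffledRange` type, its seed parameter, the round function and the choice of four rounds are all hypothetical. It uses a balanced Feistel network over the smallest power-of-four domain that covers the range, combined with cycle-walking (re-applying the permutation until the value falls inside the range), which differs in detail from masscan's variant.

```rust
/// Hypothetical shuffled range: stores only bounds, a seed and a counter.
pub struct ShuffledRange {
    start: u32,
    len: u64,       // number of items in the inclusive range
    half_bits: u32, // bit width of each Feistel half
    seed: u64,
    next_index: u64,
}

impl ShuffledRange {
    /// Creates an iterator over the inclusive range `start..=end`.
    pub fn new(start: u32, end: u32, seed: u64) -> Self {
        assert!(start <= end, "start must not exceed end");
        let len = u64::from(end - start) + 1;
        // Smallest domain of 2^(2 * half_bits) values that covers `len`.
        let mut half_bits = 1;
        while (1u64 << (2 * half_bits)) < len {
            half_bits += 1;
        }
        Self { start, len, half_bits, seed, next_index: 0 }
    }

    /// Round function: any deterministic mixer works here, because the
    /// Feistel structure is invertible regardless of what this returns.
    fn round_fn(&self, value: u64, round: u64) -> u64 {
        let mut x = value ^ self.seed ^ round.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        x ^= x >> 33;
        x = x.wrapping_mul(0xFF51_AFD7_ED55_8CCD);
        x ^= x >> 33;
        x
    }

    /// Bijection on `0..2^(2 * half_bits)`.
    fn permute(&self, index: u64) -> u64 {
        let mask = (1u64 << self.half_bits) - 1;
        let mut left = index >> self.half_bits;
        let mut right = index & mask;
        for round in 0..4u64 {
            let new_right = left ^ (self.round_fn(right, round) & mask);
            left = right;
            right = new_right;
        }
        (left << self.half_bits) | right
    }
}

impl Iterator for ShuffledRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.next_index >= self.len {
            return None;
        }
        // Cycle-walking: keep permuting until the value lands in range.
        let mut value = self.permute(self.next_index);
        while value >= self.len {
            value = self.permute(value);
        }
        self.next_index += 1;
        Some(self.start + value as u32)
    }
}
```

Iterating `ShuffledRange::new(10, 29, 42)` yields each of the 20 values from 10 to 29 exactly once, in an order determined by the seed. The state is a handful of integers no matter how large the range is, and the domain is less than four times the range size, so the cycle-walking loop needs only a few extra steps on average.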
This low memory footprint is very useful in the Task Manager, as it reduces SQS message sizes, database storage requirements, and the RAM usage of the workers.
In the future we may introduce similar iterators, for instance for domain names. When fuzzing for new subdomains, we could reduce the memory footprint by storing the candidate names as a hierarchical structure instead of a flat list.