Introduction
My Rusty Pension Fund is my attempt at writing tools for my bug bounty hobby.
I have scanners, fuzzers, wordlist generators, task managers, iOS and macOS apps, the WORKS! Most of it doesn't really work, and I've never found a real bug with any of these tools.
Still, all good fun to build and learn stuff!
The project's focus is to build very fast, memory- and CPU-optimized scanners for various scenarios. The initial focus is on specialized scanners for:
- TCP Syn port scanning
- TLS certificate Subject Alt Name scraper
- HTTP(S) scanning
The Rust library will implement its own custom network stack using libpnet to be able to optimize everything as much as possible, similar to masscan. The network stack should be easily extensible, so new scan types that leverage the custom network stack can be added in the future.
The core concept of the network stack is that it uses libpnet at the datalink layer and constructs its own Ethernet, IP and TCP packets. It runs in two separate threads, one for sending packets and another for receiving packets. Similar to masscan, synchronisation between the send and receive threads is avoided by deriving the TCP sequence and ack numbers from a hash, so we can tell which received packet corresponds to which sent packet.
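To make that concrete, here's a minimal sketch (not the actual MRPF code) of the masscan-style trick: derive the SYN's sequence number from a keyed hash of the probe's addressing tuple, so the receive thread can validate a SYN-ACK purely from the packet itself, with no shared state between threads.

```rust
// Sketch only: a keyed hash over the probe's addressing tuple acts as a "SYN cookie".
use std::hash::{Hash, Hasher};
use std::net::Ipv4Addr;

fn syn_cookie(dst_ip: Ipv4Addr, dst_port: u16, src_port: u16, seed: u64) -> u32 {
    // Any keyed hash works; DefaultHasher is just for illustration.
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    seed.hash(&mut hasher);
    dst_ip.hash(&mut hasher);
    dst_port.hash(&mut hasher);
    src_port.hash(&mut hasher);
    hasher.finish() as u32
}

// Receiving side: a SYN-ACK echoes our sequence + 1 in its acknowledgement number,
// so we can recompute the cookie from the reply's addressing and compare. The remote's
// source ip/port is our original destination; its destination port is our source port.
fn is_our_probe(src_ip: Ipv4Addr, src_port: u16, dst_port: u16, ack: u32, seed: u64) -> bool {
    ack.wrapping_sub(1) == syn_cookie(src_ip, src_port, dst_port, seed)
}

fn main() {
    let seed = 0xDEADBEEF;
    // Sending side: use the cookie as the SYN's sequence number.
    let seq = syn_cookie(Ipv4Addr::new(192, 0, 2, 10), 443, 50000, seed);
    // Pretend a SYN-ACK came back acknowledging seq + 1:
    assert!(is_our_probe(Ipv4Addr::new(192, 0, 2, 10), 443, 50000, seq.wrapping_add(1), seed));
    println!("probe validated");
}
```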
How do the libraries work together
Main principle is:
All MRPF tool logic (scanning, filtering, fuzzing, etc.) should live in external libs (e.g. tcp_syn_scanner, cert_transparency, matchers, http1_scanner).
This will allow us to build different 'front ends' for these tools.
MRPF Jobs
- The mrpf_core::tasks is the only place that defines Tasks.
- Tasks pull in all the external MRPF tool libs and execute them in the execute function of the relevant task.
- The worker nodes are very simple. They pull in the mrpf_core::tasks library and run the tasks. With this architecture, it should be possible to have a single lambda that can execute any arbitrary task. This would make the AWS specific code very small, allowing us to move to other workers without any real effort.
TBD whether we should build job scheduling as a separate API or integrate it with the MRPF API. A compromise could be to build a separate API, but include endpoints for both the MRPF API and MRPF Jobs in the MRPF API Client.
WebSocket interface for iOS/macOS frontend
The mrpf_scanner_api provides a WebSocket interface for real-time communication between iOS and macOS applications. It allows you to schedule tools and receive updates on their progress.
MRPF API
This is an API that stores all our recon data.
MRPF API Client
This is an async client for interacting with the MRPF API.
CLI
The mrpf_cli is a command-line interface for interacting with the MRPF framework. It allows users to initiate scans, manage tasks, and retrieve results directly from the terminal.
At the moment it hasn't really been implemented yet.
How will we build REST API's?
I want it to be reasonably easy to move away from AWS for my REST APIs. I'd also like to save costs, as API Gateway can get expensive. You can run a Rust API fully inside a single Lambda. There are two well-known paths for this:
- Use the official lambda_http crate (ALB / API Gateway / Function URLs)
- Run a normal Axum/Hyper server inside Lambda via the AWS Lambda Web Adapter (e.g. check this blog https://blog.yuki-dev.com/blogs/9qjgwg-des1z and AWS re:Invent's own slides here: https://d1.awsstatic.com/events/Summits/reinvent2023/BOA311_Unlocking-serverless-web-applications-with-AWS-Lambda-Web-Adapter.pdf)
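To give an idea of the second path, here's a minimal sketch (assuming an axum 0.7-style API, and that the Web Adapter forwards to port 8080, which I believe is its default): a plain Axum server with nothing Lambda-specific in the application code, which keeps the door open to move off AWS later. The routes below are just placeholders.

```rust
use axum::{routing::get, Json, Router};
use serde_json::json;

#[tokio::main]
async fn main() {
    // Ordinary Axum router; the Lambda Web Adapter (as a layer/extension) proxies
    // incoming Lambda events to this local HTTP server.
    let app = Router::new()
        .route("/health", get(|| async { "ok" }))
        .route("/targets", get(|| async { Json(json!({ "targets": [] })) }));

    // Bind to the port the adapter forwards to; locally this is just a normal server.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```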
Remember 'The Algorithm'
Elon Musk's algorithm should be applied to everything, in this specific order:
1. Question every requirement
   - What things do I REALLY need?
     - Continuously find attack surface
     - Quickly fuzz endpoints with payloads
     - Detect anomalies in responses
   - What things don't I really need but am I often looking for?
     - Store all the results
     - Have a fancy UI
2. Delete any part of the process you can
   - The most common thing that an engineer does is optimizing a thing that shouldn't exist.
3. Simplify and optimize
4. Accelerate cycle time
5. Automate
Core Principles
Computer systems, protocols and applications are all built on abstractions.
These abstractions help us reason about higher level concepts and speed up development by hiding complexity and avoiding reinventing the wheel.
However, in security research it's crucial to understand the underlying reality behind these abstractions. Security vulnerabilities often arise from a mismatch between these layers of abstraction.
I love to understand how things really work. When you start to dig into the lower level systems, you develop a good intuition for how a system operates. I've always had a tendency to want to build everything from scratch, as if it's cheating to use something that you don't fully understand. This can hold you back, but now with the advent of AI, learning and building systems has become much easier.
My goal with MRPF is to try to keep the underlying reality front and center. This sometimes comes at the cost of less intuitive or more verbose interfaces, but differentiates the toolset from most other tools out there.
TODO: Write my blog around misconceptions around hosts, ips, domains, root paths, dns, TLS SNI etc.
System Architecture
The system architecture of the MRPF project can be broken down into a few key components:
MRPF API
The MRPF API serves as the interface for clients to interact with the collected data and manage the scheduling of tasks.
Task Manager
The Task Manager is responsible for scheduling and managing various scanning tasks. It handles task creation, execution, and monitoring.
Network Engine
The Network Engine is a fast network scanning engine based on masscan. It is responsible for sending and receiving network packets using separate receive and transmit threads. The engine itself exposes traits which can be used to implement different scanning techniques.
TCP SYN Scanner
The TCP SYN Scanner is a specific implementation of a scanning technique that utilizes TCP SYN packets to identify open ports on target hosts. It leverages the Network Engine for packet transmission and reception.
HTTP/1.1 Scanner
The HTTP/1.1 Scanner is designed to perform HTTP/1.1 requests to target hosts and analyze the responses. It can be used to identify web servers, gather information about web applications, and detect potential vulnerabilities.
TLS Certificate Scanner
The TLS Certificate Scanner is responsible for retrieving and analyzing TLS/SSL certificates from target hosts. It can be used to identify the certificate issuer, expiration dates, and potential vulnerabilities in the SSL configuration. It uses the Network Engine for network communication and has a custom TLS implementation to extract certificate information without relying on external libraries.
DNS Resolver
NOT IMPLEMENTED YET
The DNS Resolver is responsible for querying DNS servers to resolve domain names to IP addresses. It utilizes the Network Engine to perform these queries and gather relevant data.
Whois Resolver
NOT IMPLEMENTED YET
The Whois Resolver is responsible for querying Whois databases to retrieve information about domain names and IP addresses. It utilizes the Network Engine to perform these queries and gather relevant data.
Models
The various components of the MRPF project utilize a shared set of models defined in the mrpf_models crate. These models define the data structures and types used throughout the system, ensuring consistency and interoperability between different components.
Iterator Models
Most of the models are reasonably straightforward. However, the Ipv4Range, Ipv4Addresses, Ports and PortRange models deserve special attention.
These models implement a custom Iterator that shuffles the order of IP addresses and ports to avoid predictable scanning patterns. The algorithm idea was taken from masscan.
The iterator ensures each item is returned only once but avoids having to store all items in memory at once. It accomplishes this by only storing the start and end values within the Ipv4Addresses and PortRange models. The iterator uses a Feistel cipher to generate a pseudo-random permutation of the range of values, allowing for efficient iteration without repetition.
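Here's a rough sketch of the idea (illustrative names, not the real mrpf_models types): a small Feistel network acts as a keyed pseudo-random permutation over an index range, and cycle-walking keeps results inside the range, so the only state we carry is a counter.

```rust
struct ShuffledRange {
    start: u64,
    len: u64,   // number of items in the range
    index: u64, // how many items we've yielded so far
    bits: u32,  // half-width (in bits) of the Feistel block
    seed: u64,
}

impl ShuffledRange {
    fn new(start: u64, len: u64, seed: u64) -> Self {
        // Smallest even-width domain (2 * bits) that covers `len`.
        let mut bits: u32 = 1;
        while (1u64 << (2 * bits)) < len {
            bits += 1;
        }
        Self { start, len, index: 0, bits, seed }
    }

    // Cheap keyed round function; any decent mixer works here.
    fn round(&self, right: u64, round: u64) -> u64 {
        let mut x = right ^ self.seed.wrapping_add(round).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        x ^= x >> 29;
        x = x.wrapping_mul(0xBF58_476D_1CE4_E5B9);
        x ^= x >> 32;
        x & ((1 << self.bits) - 1)
    }

    // 4-round Feistel permutation over the 2*bits domain.
    fn permute(&self, value: u64) -> u64 {
        let mask = (1u64 << self.bits) - 1;
        let (mut left, mut right) = (value >> self.bits, value & mask);
        for round in 0..4 {
            let next = left ^ self.round(right, round);
            left = right;
            right = next;
        }
        (left << self.bits) | right
    }
}

impl Iterator for ShuffledRange {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.index >= self.len {
            return None;
        }
        let mut candidate = self.permute(self.index);
        self.index += 1;
        // Cycle-walk: the Feistel domain may be bigger than the range, so
        // keep permuting until the value lands inside it.
        while candidate >= self.len {
            candidate = self.permute(candidate);
        }
        Some(self.start + candidate)
    }
}

fn main() {
    // Shuffle 16 values; every value appears exactly once, in a scrambled order.
    let shuffled: Vec<u64> = ShuffledRange::new(0, 16, 0xC0FFEE).collect();
    assert_eq!(shuffled.len(), 16);
    println!("{shuffled:?}");
}
```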
The low memory footprint is very useful in our task manager as it reduces the SQS message sizes, the database storage requirements and the RAM usage of the workers.
In the future we may introduce similar iterators, for instance for domain names. When trying to fuzz for new subdomains, we could reduce memory footprint by storing it as a hierarchical structure instead of a flat list.
The MRPF API
The MRPF API allows clients to programmatically interact with the MRPF platform. It provides endpoints for managing recon data like targets, domains and wordlists, as well as for triggering tasks.
Current State
At the moment the code is still running on the MPF Python codebase, with a DocumentDB backend. I would love to get this into Rust for better performance and already have some of the models defined in mrpf_models.
Some things I want to work on:
- Revisit the templating engine for Rust. Think about how to represent things; our wordlists probably need to work better with bytes and then have methods to convert to UTF-8/16/etc. where applicable
- Move away from DocumentDB to PostgreSQL. This will give me back the triggers for timestamps that I very much like. Also, DynamoDB for at least the transparency records was just too costly, so let's get back to the drawing board
Ideas and Future Work
Had some insights?
For my MRPF API, I think I might be pushing everything into fully predefined structs too quickly. However, when reading and writing data, I often only want a subset:
- list all active fqdns of a target id
hmm, is it true? Is this the only real example I've found?
OK, let's think about the write queries:
- tcp syn scan needs to append ports to an existing ip address
- SNI scanner needs to create new fqdn objects and services (ip/port that the sni was found on)
- Http scanner needs to update the WebApp content hash
- CrtSh needs to create fqdn objects
- DNS resolver needs to update fqdn objects, create new ones found through PTR, or update zones with their NS and SOA records
All these things can be done with my current task/job manager, BUT would these not actually be better run continuously? Scans over larger amounts of data can better bypass rate limits due to more randomization, and it's easier to alert when a new domain has been found.
Network Engine
The core of my network stack is based on masscan. Here's a diagram of the three threads used:
Sending Thread
- Construct Ethernet Packet: Create the Ethernet frame with appropriate source and destination MAC addresses.
- Construct IP Packet: Create the IP packet with source and destination IP addresses.
- Generate Sequence Number: Generate a unique sequence number based on the source/destination IP and port pairs.
- Create TCP Packet: Construct the TCP packet with the generated sequence number and other necessary fields.
- Send Packet: Send the constructed packet over the network.
- Send Status: Notify the status report thread that a packet has been sent.
Receiving Thread
- Listen for Incoming Packets: Continuously listen for incoming packets on the network.
- Filter Relevant Packets: Filter out packets that are not relevant based on the unique sequence number.
- Handle Packet: Process the relevant packet (e.g., extract data, acknowledge receipt).
- Send Status: Notify the status report thread that a packet has been received and handled.
Status Report Thread
- Receive Status Updates: Continuously receive status updates from the sending and receiving threads.
- Update Status and Statistics: Update the current status and statistics of the scan based on the received updates.
- Print Status and Statistics: Print the updated status and statistics to the console or log.
graph TD
A[Main Thread] -->|Start| B[Sending Thread]
A -->|Start| C[Receiving Thread]
A -->|Start| D[Status Report Thread]
B --> B1[Construct Ethernet Packet]
B1 --> B2[Construct IP Packet]
B2 --> B3[Generate Sequence Number]
B3 --> B4[Create TCP Packet]
B4 --> B5[Send Packet]
B5 -->|Send Status| D
C --> C1[Listen for Incoming Packets]
C1 --> C2[Filter Relevant Packets]
C2 --> C3[Handle Packet]
C3 -->|Send Status| D
D --> D1[Receive Status Updates]
D1 --> D2[Update Status and Statistics]
D2 --> D3[Print Status and Statistics]
Rate limiting
The transmit_handler function implements a rate limit (token) bucket algorithm to control the rate at which packets are sent. It works as follows:
1. Initialization
   - A token bucket is initialized with a certain number of tokens (RATE_LIMIT_PACKETS_PER_INTERVAL).
   - Each token represents permission to send one packet.
   - The bucket is refilled at regular intervals (RATE_LIMIT_INTERVAL).
2. Packet Transmission
   - For each packet to be sent, the algorithm checks if there are tokens available in the bucket.
   - If tokens are available, a token is consumed and the packet is sent.
   - If no tokens are available, the algorithm waits until the bucket is refilled.
3. Refilling the Bucket
   - The bucket is refilled at a fixed interval (RATE_LIMIT_INTERVAL).
   - When the interval elapses, the bucket is refilled to its maximum capacity (RATE_LIMIT_PACKETS_PER_INTERVAL).
4. Handling Buffer Full Errors
   - If the packet transmission fails due to a full buffer (NO_BUFFER_SPACE_AVAILABLE_ERROR), the algorithm waits for a short period (100ms) before retrying.
This algorithm ensures that packets are sent at a controlled rate, preventing network congestion and ensuring fair usage of network resources.
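For reference, here's a condensed sketch of that token bucket (not the actual transmit_handler code; the constant names mirror the ones above, their values here are made up):

```rust
use std::time::{Duration, Instant};

const RATE_LIMIT_PACKETS_PER_INTERVAL: u32 = 1000;
const RATE_LIMIT_INTERVAL: Duration = Duration::from_millis(10);

struct RateLimiter {
    tokens: u32,
    last_refill: Instant,
}

impl RateLimiter {
    fn new() -> Self {
        Self { tokens: RATE_LIMIT_PACKETS_PER_INTERVAL, last_refill: Instant::now() }
    }

    /// Blocks until a token is available, then consumes it.
    fn acquire(&mut self) {
        loop {
            // Refill the bucket to full capacity once the interval has elapsed.
            if self.last_refill.elapsed() >= RATE_LIMIT_INTERVAL {
                self.tokens = RATE_LIMIT_PACKETS_PER_INTERVAL;
                self.last_refill = Instant::now();
            }
            if self.tokens > 0 {
                self.tokens -= 1;
                return;
            }
            // No tokens left: wait out the remainder of the interval.
            std::thread::sleep(RATE_LIMIT_INTERVAL.saturating_sub(self.last_refill.elapsed()));
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    for _ in 0..5000 {
        limiter.acquire();
        // send_packet() would go here; on NO_BUFFER_SPACE_AVAILABLE_ERROR the real
        // transmit handler sleeps ~100ms and retries.
    }
    println!("sent 5000 (simulated) packets under the rate limit");
}
```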
Great reads
This blog post nicely describes the pros and cons of async Rust vs normal threads. It illustrates that async Rust is not always the best choice for every use case.
Obviously we would need to include how masscan works
And the bulk of the masscan code can be found in main.c
Infrastructure
Since we require elevated privileges for raw sockets, we can no longer run on AWS Lambda. However, Fargate with spot pricing could be a good alternative.
We have to think about how to deploy this. AWS Batch could help, or we can create an SQS queue that holds our tasks; after pushing our tasks we'll start up x amount of Fargate tasks to retrieve work from SQS and kill them after a single job.
AWS Lambda ARM
- 0.0000000017 per 128mb per ms
- 0.0000017 per 128mb per second
- 0.000102 per 128mb per minute
- 0.0000000267 per 2048mb per ms
- 0.0000267 per 2048mb per second
- 0.0016 per 2048mb per minute
- pay per second
- size measured in memory
Fargate spot pricing ARM
- 0,00016429 per vCPU per minute
- pay per minute
- size measured in vCPU, minimum = 2Gb mem
This means Lambda is 0.00016429 - 0.000102 = 0.00006229 CHEAPER per vCPU per minute, when we don't care about memory. It also means Lambda is 0.0016 - 0.00016429 = 0.00143571 MORE EXPENSIVE per minute, when we compare a 2048mb Lambda against one Fargate vCPU.
Let's say we will run 10 tasks for a full hour a day, every day of the month.
With a 2048mb Lambda we would pay 0.0016 * 60 * 10 * 31 = $29.76. With Fargate we would pay 0.00016429 * 60 * 10 * 31 = $3.055794.
UPDATE: the Fargate cost described above is actually still missing the GB per hour. BUT I also see Fargate tasks have a lower minimum: 0.25 vCPU with 0.5GB.
They have a Fargate pricing calculation example (I don't think it even uses spot instances!):
5 tasks running on ARM for 10 minutes every day, with 1vCPU and 2GB mem, for the whole month cost a total of $1.02.
That is missing data transfer cost and public ip cost but still, I think we can work with that!! AWS Batch will be great for this as well.
TLS Scraper
Ideas and Future Work
...
HTTP/1.1 Scanner
Ideas and Future Work
...
Tcp Syn Scanner
Ideas and Future Work
...
DNS Resolver
Ideas and Future Work
...
Whois Resolver
Ideas and Future Work
...
Ideas for Improvement
This is a random collection of ideas I have for improving the network engine. Collected from random notes and thoughts I had lying around everywhere, trying to bring more structure to my notes.
Refactor network stack
Remove dependency on libpnet and re-implement linux and macOS code myself. This will prepare me to choose the most optimal transmission method for the two platforms and for instance use XDP/eBPF to reduce kernel signals.
I suspect the following library has a lot of the code that I could reuse for more flexibility to dig into the way we're sending packets:
https://github.com/pkts-rs/tappers
This will probably also avoid receiving packets not destined for us, as we'll have a dedicated interface.
I have to look at this part; it uses writev to send a packet. It states that only IP and IPv6 are supported, so I think that means there's no Ethernet support. At the moment, the only reason for having Ethernet support is to be able to spoof my IP address. Since we're going to have a dedicated virtual interface, we can just set the IP address of the interface to the one we want to spoof and we should be good to go.
Had a quick look at the race condition attack and that uses ip and tcp, so we should be able to build that already.
It's also really nice that it already splits the interface into tun (send) and tap (receive) devices. This will make it easier to split the work between threads.
Before doing this, I should carefully read this blog post, as it might be that the tun/tap device actually has MORE syscalls than the current implementation.
If a TUN/TAP device is not the way to go, look at libpnet and re-implement the relevant parts myself.
Idea
This came to me during an outdoor run, the first one in a month or two letting myself go. A good reminder that working out really benefits you and makes life better overall.
LATER UPDATE, I did some work on this: Check the following branch: https://github.com/thiezn/mrpf/compare/main...mm/attempttomovetosendmsg
My idea:
I'm building a very fast and memory efficient TCP SYN scanner using the Rust pnet library at the datalink layer.
I've split transmit and receive of packets into separate threads.
The transmit thread currently composes a TCP packet from the Ethernet layer up, and then uses the pnet datalink tx channel to send traffic out. The send function is taking up a large percentage of the total memory used.
What if I created a larger buffer where I combine multiple packets, and then finally send that full buffer using the datalink tx channel in one go? Would the network layer below be able to send those packets in one go, and would it perhaps reduce the memory footprint and the switching between user and kernel space?
Clearly think about your response and outline the pros, cons and potential pitfalls.
Copilot's response
It sounds like you're working on an interesting and challenging project! Let's break down your idea and consider the pros, cons, and potential pitfalls of using a larger buffer to combine multiple packets before sending them.
Pros
- Reduced Memory Footprint: By combining multiple packets into a single buffer, you might reduce the overhead associated with multiple small allocations, potentially lowering the overall memory usage.
- Fewer System Calls: Sending a larger buffer in one go could reduce the number of system calls, which might decrease the overhead of context switching between user and kernel space.
- Improved Throughput: Aggregating packets could lead to better utilization of the network interface, potentially improving throughput.
Cons
- Increased Latency: Waiting to fill a larger buffer before sending could introduce latency, especially if the buffer size is too large or if packet generation is not fast enough.
- Complexity in Buffer Management: Managing a larger buffer and ensuring that packets are correctly combined and sent can add complexity to your code. This includes handling partial sends and ensuring data integrity.
- Potential for Packet Loss: If the buffer is not managed properly, there is a risk of packet loss, especially if the buffer overflows or if there are errors in combining packets.
Potential Pitfalls
- Network Layer Limitations: The underlying network layer might have limitations on the maximum transmission unit (MTU). If your combined buffer exceeds this size, it could lead to fragmentation, which might negate some of the performance benefits.
- Error Handling: With a larger buffer, error handling becomes more critical. If an error occurs during the transmission of a large buffer, it could affect multiple packets, leading to more significant data loss.
- Synchronization Issues: If multiple threads are involved in filling and sending the buffer, you need to ensure proper synchronization to avoid race conditions and data corruption.
Recommendations
- Buffer Size Tuning: Experiment with different buffer sizes to find an optimal balance between reducing memory usage and minimizing latency.
- Error Handling: Implement robust error handling to manage partial sends and retransmissions effectively.
- Profiling and Testing: Profile your application to understand the impact of the changes on memory usage and performance. Conduct thorough testing to ensure that the new approach does not introduce new issues.
By carefully considering these factors, you can make an informed decision on whether to implement a larger buffer for sending packets. Good luck with your TCP SYN scanner project! If you have any more questions or need further assistance, feel free to ask.
Github Copilot response in vscode to the same question
GitHub Copilot: Combining multiple packets into a larger buffer and sending them in one go can indeed reduce the overhead associated with frequent system calls and context switches between user and kernel space. This approach can lead to better performance and reduced memory footprint. However, there are several pros, cons, and potential pitfalls to consider:
Pros
- Reduced System Call Overhead: By batching multiple packets into a single buffer, you reduce the number of system calls, which can be expensive in terms of CPU cycles and context switching.
- Improved Throughput: Sending larger buffers can improve throughput as the network interface can handle larger chunks of data more efficiently.
- Lower Memory Fragmentation: Allocating and deallocating memory for individual packets can lead to fragmentation. Using a larger buffer can help mitigate this issue.
Cons
- Increased Latency: Batching packets may introduce latency as packets are held in the buffer until it is full or a timeout occurs.
- Complexity: Implementing a buffering mechanism adds complexity to the code. You need to manage the buffer, handle partial writes, and ensure thread safety.
- Buffer Management: You need to carefully manage the buffer size to avoid excessive memory usage or buffer overflows.
Potential Pitfalls
- Buffer Overflow: If the buffer size is not managed properly, it can overflow, leading to data loss or corruption.
- Thread Safety: If multiple threads are accessing the buffer, you need to ensure thread safety using synchronization mechanisms, which can introduce overhead.
- Network Interface Limitations: Some network interfaces may have limitations on the maximum buffer size they can handle. You need to ensure that the buffer size is within these limits.
- Error Handling: Handling errors in a batched send operation can be more complex. You need to ensure that partial writes are handled correctly and that the buffer is properly managed in case of errors.
Implementation Outline
- Buffer Initialization: Initialize a buffer to hold multiple packets.
- Packet Composition: Compose packets and add them to the buffer.
- Buffer Flush: When the buffer is full or a timeout occurs, send the buffer using the datalink tx channel.
- Thread Safety: Ensure that the buffer is accessed in a thread-safe manner.
Here's a basic implementation outline:
```rust
use pnet::datalink::DataLinkSender;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

const BUFFER_SIZE: usize = 4096; // Adjust as needed
const FLUSH_INTERVAL: Duration = Duration::from_millis(50);

struct PacketBuffer {
    buffer: Vec<u8>,
    last_flush: Instant,
}

impl PacketBuffer {
    fn new() -> Self {
        PacketBuffer {
            buffer: Vec::with_capacity(BUFFER_SIZE),
            last_flush: Instant::now(),
        }
    }

    fn add_packet(&mut self, packet: &[u8]) {
        self.buffer.extend_from_slice(packet);
    }

    fn should_flush(&self) -> bool {
        self.buffer.len() >= BUFFER_SIZE || self.last_flush.elapsed() >= FLUSH_INTERVAL
    }

    fn flush(&mut self, tx: &mut dyn DataLinkSender) {
        if !self.buffer.is_empty() {
            tx.send_to(&self.buffer, None).unwrap();
            self.buffer.clear();
            self.last_flush = Instant::now();
        }
    }
}

pub(crate) fn transmit_handler(
    mut tx: Box<dyn DataLinkSender>,
    packet_rx: Receiver<Vec<u8>>,
    progress_tx: Sender<TcpSynMessage>,
    mut config: ScannerConfig,
    router_mac: MacAddr,
    ipv4_addresses: Ipv4Addresses,
    ports: Ports,
) {
    let packet_buffer = Arc::new(Mutex::new(PacketBuffer::new()));

    // Start rate limiter
    config.rate_limiter.start();

    for dst_port in ports {
        let src_port: u16 = rand::thread_rng().gen_range(49152..=60999);
        for dst_ip in ipv4_addresses.clone() {
            let packet = create_syn_tcp_packet_v4(src_port, dst_port, dst_ip, router_mac, &config);
            {
                let mut buffer = packet_buffer.lock().unwrap();
                buffer.add_packet(&packet);
                if buffer.should_flush() {
                    buffer.flush(&mut *tx);
                }
            }
        }
    }

    // Ensure any remaining packets are sent
    {
        let mut buffer = packet_buffer.lock().unwrap();
        buffer.flush(&mut *tx);
    }
}
```
Explanation
- PacketBuffer Struct: Manages the buffer and handles adding packets and flushing the buffer.
- Buffer Initialization: The buffer is initialized with a capacity of BUFFER_SIZE.
- Packet Composition: Packets are composed and added to the buffer.
- Buffer Flush: The buffer is flushed when it is full or when the flush interval has elapsed.
- Thread Safety: The buffer is wrapped in an Arc<Mutex<>> to ensure thread safety.
This implementation reduces the number of system calls and context switches, potentially improving performance and reducing memory usage. However, you need to carefully manage the buffer size and ensure thread safety to avoid potential pitfalls.
After initial implementation
It seems pnet uses the libc::sendto call to send the packets. This call expects you to send a single packet and the kernel wraps it into a single frame. This means all the appended data (i.e. the packets after the first one) is sent but dropped completely by the receiving end.
According to Copilot, you could instead use the sendmsg call to accomplish this. That does mean we won't be able to leverage pnet for it. Perhaps we need to bite the bullet and write our own datalink layer implementation. A bunch of work obviously, but it would be a good learning experience.
The answer in this stack overflow post does indeed confirm my suspicion that constructing this yourself using sendmsg might improve performance when doing this in user space. It states that sendto is actually just a wrapper around sendmsg.
IF we're able to get the socketfd from pnet, I should be able to just add this sendmsg function myself. I'll have to look into this. Maybe even a nice PR to pnet if it works out. OK, looking at the code, on linux you can provide a socket to the config object. This means we can generate our socket ourselves which will be something like this:
```rust
let socket = match unsafe { libc::socket(libc::AF_PACKET, libc::SOCK_RAW, libc::ETH_P_ALL.to_be()) } {
    -1 => return Err(io::Error::last_os_error()),
    fd => fd,
};
```
Here's Copilot's response on a sendmsg implementation:
Yes, you can use the sendmsg system call to send multiple packets in a single call. The sendmsg function allows you to use a scatter/gather array, which can be used to send multiple buffers (or packets) in one go. This is done using the msghdr structure, which includes an array of iovec structures, each pointing to a different buffer.
Example of sendmsg Usage
Here's a simplified example in C to illustrate how you might use sendmsg to send multiple packets:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
int main() {
int sockfd;
struct sockaddr_in dest_addr;
struct msghdr msg;
struct iovec iov[2];
char packet1[] = "Packet 1 data";
char packet2[] = "Packet 2 data";
// Create socket
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
// Destination address
memset(&dest_addr, 0, sizeof(dest_addr));
dest_addr.sin_family = AF_INET;
dest_addr.sin_port = htons(12345);
dest_addr.sin_addr.s_addr = inet_addr("192.168.1.1");
// Prepare iovec structures
iov[0].iov_base = packet1;
iov[0].iov_len = strlen(packet1);
iov[1].iov_base = packet2;
iov[1].iov_len = strlen(packet2);
// Prepare msghdr structure
memset(&msg, 0, sizeof(msg));
msg.msg_name = &dest_addr;
msg.msg_namelen = sizeof(dest_addr);
msg.msg_iov = iov;
msg.msg_iovlen = 2;
// Send message
if (sendmsg(sockfd, &msg, 0) == -1) {
perror("sendmsg");
return 1;
}
close(sockfd);
return 0;
}
Explanation
- iovec structures: These structures point to the individual packets you want to send.
- msghdr structure: This structure contains the destination address and the array of iovec structures.
- sendmsg call: This sends all the packets in the iovec array in a single system call.
Benefits
- Efficiency: Reduces the number of system calls, potentially improving performance.
- Flexibility: Allows you to send multiple packets or buffers in one go.
Considerations
- MTU Size: Ensure that the total size of the combined packets does not exceed the MTU size to avoid fragmentation.
- Error Handling: Implement robust error handling to manage partial sends and retransmissions.
Using sendmsg can help you achieve your goal of sending multiple packets efficiently. If you have more questions or need further details, feel free to ask!
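One caveat to the answer above, from my reading of the Linux man pages: on a datagram or raw socket, sendmsg gathers all the iovec buffers into a single message, i.e. one packet per call, which matches what I saw with the appended data being dropped. Batching several independent packets into one syscall is really the territory of sendmmsg (Linux-only), which takes an array of msghdr structures, one per packet. A minimal, illustrative Rust sketch using the libc crate over UDP (addresses and payloads are made up, mirroring the C example above):

```rust
// Each mmsghdr carries its own msghdr, so every entry becomes a separate datagram
// while still being submitted in a single syscall.
use std::io;
use std::mem;
use std::net::Ipv4Addr;

fn main() -> io::Result<()> {
    unsafe {
        let sockfd = libc::socket(libc::AF_INET, libc::SOCK_DGRAM, 0);
        if sockfd < 0 {
            return Err(io::Error::last_os_error());
        }

        let mut dest: libc::sockaddr_in = mem::zeroed();
        dest.sin_family = libc::AF_INET as libc::sa_family_t;
        dest.sin_port = 12345u16.to_be();
        dest.sin_addr.s_addr = u32::from(Ipv4Addr::new(192, 168, 1, 1)).to_be();

        let payloads: [&[u8]; 2] = [b"Packet 1 data", b"Packet 2 data"];

        // One iovec and one mmsghdr per packet.
        let mut iovecs: Vec<libc::iovec> = payloads
            .iter()
            .map(|p| libc::iovec {
                iov_base: p.as_ptr() as *mut libc::c_void,
                iov_len: p.len(),
            })
            .collect();

        let mut msgs: Vec<libc::mmsghdr> = Vec::with_capacity(iovecs.len());
        for iov in iovecs.iter_mut() {
            let mut msg: libc::mmsghdr = mem::zeroed();
            msg.msg_hdr.msg_name = &mut dest as *mut _ as *mut libc::c_void;
            msg.msg_hdr.msg_namelen = mem::size_of::<libc::sockaddr_in>() as u32;
            msg.msg_hdr.msg_iov = iov as *mut libc::iovec;
            msg.msg_hdr.msg_iovlen = 1; // one iovec per message -> one packet per message
            msgs.push(msg);
        }

        // One syscall, two independent datagrams.
        if libc::sendmmsg(sockfd, msgs.as_mut_ptr(), msgs.len() as u32, 0) < 0 {
            return Err(io::Error::last_os_error());
        }
        libc::close(sockfd);
    }
    Ok(())
}
```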
Reducing pselect calls
Looking at the flame graph, the receive thread spends quite some time in the pselect system call (about half). Obviously, since it uses a timeout, it could be that there are simply no packets to receive. However, looking at the implementation, it seems the pnet code is only reading a single packet when pselect is ready.
Instead, we should try to read more packets to avoid extra pselect calls when more packets are ready.
UPDATE: No, actually it seems the code IS reading multiple packets. It's dependent on the read buffer size of the datalink, so we could perhaps tweak performance a bit by playing around with the buffer size there.
TLS parse improvements
Task Manager
The task manager is a core component of the MRPF system, responsible for orchestrating and managing the execution of various tasks. It provides a flexible and extensible framework for defining, scheduling, and executing tasks, as well as handling task dependencies and data management.
It takes inspiration from Nuclei's templating system, but is fully integrated with the rest of the MRPF system. This means we're able to run any scan type we've implemented, pull in data like wordlists, target scope regexes and apex zones, and use that data in our tasks. It also allows for the fan-out/fan-in pattern, to distribute tasks across multiple workers and aggregate results from multiple tasks. We support complex execution workflows like parallel execution, conditionals and loops.
It follows the core principles of the MRPF system.
For instance, the HTTP1/x module doesn't hide the differences of IP addresses, host names and TLS SNI behind a single "host" abstraction. Here's an example of what this looks like in a task template:
- kind: http1
ip: $[ipv4_address_ranges]
port: 443
tls: true
sni: localhost
body: |
GET / HTTP/1.1
Host: localhost
The observant reader will see that even here we're taking some liberties.
For instance, the body is treated as UTF-8 in this example. We will offer various other ways to define the request body in the future, such as hex encoded binary data to allow complex attacks on encodings or specific byte sequences.
(*) Actually not yet implemented :)
Infrastructure
The system leverages AWS services for scalability and security, with SQS for event-driven invocations.
%%{init: {
"theme": "base",
"themeVariables": {
"primaryColor": "#0b5cab",
"primaryTextColor": "#ffffff",
"lineColor": "#6b7280",
"tertiaryColor": "#eef2ff",
"fontFamily": "Segoe UI, Roboto, Helvetica, Arial, sans-serif"
},
"flowchart": { "diagramPadding": 8, "curve": "basis" }
}}%%
flowchart TD
%% ---------------------------
%% Nodes (defined first)
%% ---------------------------
EB[EB Scheduler]
TM[Task Manager]
WP1[Workers]
WP2[Workers]
SQS[SQS Queue]
TT[(Task Table)]
DT[(Data Table)]
ST[(Statistics Table)]
%% ---------------------------
%% Subgraphs / groupings
%% ---------------------------
subgraph PG[PostgreSQL]
ST
TT
DT
end
subgraph VPC[AWS VPC]
TM
PG
WP2
end
subgraph EXT[External]
WP1
end
%% ---------------------------
%% Edges
%% ---------------------------
EB -->| Check task timeouts
5 min | SQS
EB -->| Gather statistics
1 hour | SQS
EB -->| Cleanup old tasks and data
1 day | SQS
WP1 -->| Push completion | SQS
WP2 -->| Push completion | SQS
SQS -->| Trigger invoke | TM
TM -->| Store statistics | ST
TM -->| Manage tasks | TT
TM -->| Mutate data | DT
WP2 -->| Mutate data | DT
TM -->| Dispatch tasks | WP1
TM -->| Dispatch tasks | WP2
%% ---------------------------
%% Styling
%% ---------------------------
%% Database tables as cylinders (already set by [( )]); add color:
classDef db fill:#89CFF0,stroke:#0096FF,stroke-width:1px,color:#3b2f00;
class ST,TT,DT db;
classDef rounded rx:8,ry:8,stroke:#2b5fab,stroke-width:1.2px,fill:#0b5cab,color:#ffffff;
class TM,WP1,WP2 rounded;
classDef roundedInfra rx:8,ry:8,stroke:#F36717,stroke-width:1.2px,fill:#E25606,color:#ffffff;
class EB,SQS roundedInfra;
%% Subgraph backgrounds & borders
style PG fill:#fff6e5,stroke:#ff8c00,stroke-width:2px,rx:10,ry:10
style VPC fill:#f0f7ff,stroke:#0b5cab,stroke-width:1.5px,rx:10,ry:10
style EXT fill:#f7f7f7,stroke:#9ca3af,stroke-width:1px,rx:10,ry:10
%% Links (edges)
linkStyle default stroke:#6b7280,stroke-width:2px
Data Management
Each task collection run maintains temporary state within the tasks_data table. This allows tasks to build on previous outputs. The task templating variables allow you to define how data is passed between tasks.
Workers
The task manager offers a few different types of workers: internal, external and bare metal workers.
Depending on the type of task, it will be dispatched to a dedicated queue for the type of worker required.
External Workers (AWS Lambda)
These workers run on AWS Lambda outside of the VPC. They do not have direct access to the PostgreSQL database. They are typically used for tasks that communicate to external services for recon purposes. Unfortunately AWS Lambda does not support raw socket operations, so tasks that require raw socket access cannot be run on these workers. Examples of such tasks are TCP SYN scanning or custom TLS scanning.
Useful for creating distributed tasks towards services like crt.sh, Censys, Shodan, or other public services.
NOTE: they do NOT have access to the PostgreSQL database. This means storing task results is handled by passing SucceededWithData messages back to the Task Manager.
Internal Workers (AWS VPC bound Lambda)
These workers run on AWS Lambda inside of the VPC. They have read-write access to the tasks_data and recon tables in the PostgreSQL database. They are typically used for tasks that perform data mutations or filtering operations. Since they run inside of the VPC, they can connect directly to the PostgreSQL database.
Bare Metal Workers
Bare metal workers have raw socket access. This means they can use our custom network scanning engines like TcpSyn scanning and custom TLS scanning. There is a dedicated SQS queue for bare metal workers. On launch, the worker will poll SQS for new tasks to execute. Using environment variables, the worker can configure a timeout value for polling messages on the SQS queue. If there are no messages within the given timeout, the worker will shut itself down. This feature is useful for running the worker on EC2 to cut down costs.
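Roughly, that idle-shutdown loop looks like this (sketch only; the environment variable name and the poll_for_task helper are placeholders, not the real worker code):

```rust
use std::time::{Duration, Instant};

// Placeholder standing in for the real SQS long-poll receive call.
fn poll_for_task() -> Option<String> {
    None
}

fn main() {
    // Idle timeout configured through an environment variable (name is illustrative).
    let idle_secs: u64 = std::env::var("WORKER_IDLE_TIMEOUT_SECS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(60);
    let idle_timeout = Duration::from_secs(idle_secs);

    let mut last_work = Instant::now();
    loop {
        match poll_for_task() {
            Some(task) => {
                last_work = Instant::now();
                println!("executing task: {task}");
                // execute the task and report the result here
            }
            None if last_work.elapsed() >= idle_timeout => {
                // No messages within the configured timeout: shut down to save EC2 costs.
                println!("idle timeout reached, shutting down");
                break;
            }
            // Real SQS long-polling blocks for up to 20s; simulate that wait here.
            None => std::thread::sleep(Duration::from_secs(1)),
        }
    }
}
```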
When running on EC2, the workers are placed in the VPC containing the PostgreSQL database. This means the bare metal workers have read-write access to the tasks_data and recon tables in the PostgreSQL database. This makes them suitable for both scanning tasks and heavy duty data mutation tasks.
When running outside of AWS, the bare metal workers could use SucceededWithData messages to pass data back to the Task Manager. However, this is not implemented yet. It would require us to create either separate task definitions OR have a flag in the task definition to indicate if the worker is running inside or outside of AWS.
Ideas and Future Work
- Implement workers running on other environments like Azure, or bare metal hardware outside of AWS.
Task generation and aggregation
The task manager allows for dynamic generation of distributed tasks based on data captured during a task collection run.
For example, we might capture new domain names by scraping the TLS certificates of known hosts on a target. We can then feed these new domains into a task generator for DNS resolution. The task generator will create the required tasks in the running task collection, writing to a destination container.
These dynamically generated tasks will typically be run in parallel across multiple workers to avoid rate limits and speed up the process. The outputs of these tasks all have the same shape, and often we want to aggregate the results into a single output destination. For this we introduce task aggregators. These task aggregators are also created by the generator task and put in the queue of the task collection, to be run after the generated tasks are done. The aggregator will then collect all the results from the generated tasks and aggregate them into a single output.
Ideas and Future Work
...
Templating
Our Task Manager has a built-in templating engine that allows you to provide static data, references to data stored in the task collection database table and the use of functions for basic data transformations.
The rendered result of a template is a DataKind, which will always be a collection of values of the same type. This is the case even if there's just a single value; it will simply be wrapped in a set.
Literals
The most basic type of expression is a literal. This allows you to provide a static value directly to a task argument. Some examples are:
- String literal: some_literal_string
- Integer literal: 42
References
References allow you to retrieve data from various places in the system or task definition. The basic syntax for a reference is as follows:
$[<reference_type>:<key>]
Where <reference_type> indicates the type of the reference and <key> is the identifier of the specific data you want to reference. At the moment we support Data references and Task Parameter References. Since data references are the most commonly used, the reference type can be omitted and it will default to a data reference.
Data References
Data references allow you to reference data stored in the task collection's data storage. During runtime of a task, these references will be resolved to their actual values from the database.
The basic syntax for a data reference is as follows:
$[some_key]
Where some_key is the key of the data within a task collection you want to reference. When providing an expression to a task argument, the task manager will evaluate the expression at runtime and replace it with the corresponding value from the task collection data storage (retrieved from the PostgreSQL database).
A lightweight version of JSONPath syntax is supported for accessing nested data structures. Here are some examples:
- Retrieve a nested key: $[another_key.hello]
- Access an array element by index: $[another_key.hello[0]]

TODO: This example doesn't match my another_key data example, fix it.

- Return an array of values from an array of objects: $[another_key.how.are.[*].you] or $[another_key.how.*.you]
Our database supports a variety of data types, including set of strings, JSON objects and predefined common models like IPv4 Ranges and Domains. When referencing data, the task manager will automatically handle type conversions as needed to ensure that the data is in the correct format for the task argument.
Here is an example of some data that might be stored in a task collection. Specific object properties can be extracted through the JSONPath syntax mentioned above.
| key | kind | value |
|---|---|---|
| some_key | set of strings | ["some", "string"] |
| another_key | generic_object | {"Hello": ["World", "Moon"], "how": {"are": [{"you": "doing"}, {"things": "going"}]}, ...} |
| some_ips | ipv4_ranges | [{"start": "127.0.0.1", "end": "127.0.0.1"}, ...] |
| some_domains | set of domains | [{"fqdn": "example.com", "is_active": true, "dns_chain": ["a.com", ["10.0.0.1", "10.0.0.2"]]}, ...] |
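To illustrate, here's a toy resolver (not the real implementation) showing how such a lightweight JSONPath-style reference, e.g. another_key.how.are[*].you, could be walked over the generic_object above:

```rust
use serde_json::{json, Value};

// Walk a value by a list of path segments, where "*" fans out over array elements.
fn resolve<'a>(value: &'a Value, segments: &[&str]) -> Vec<&'a Value> {
    let Some((segment, rest)) = segments.split_first() else {
        return vec![value];
    };
    match *segment {
        // Wildcard: recurse into every element of an array.
        "*" => match value {
            Value::Array(items) => items.iter().flat_map(|item| resolve(item, rest)).collect(),
            _ => vec![],
        },
        // Plain key lookup on an object.
        key => match value.get(key) {
            Some(inner) => resolve(inner, rest),
            None => vec![],
        },
    }
}

fn main() {
    let another_key = json!({
        "Hello": ["World", "Moon"],
        "how": { "are": [ { "you": "doing" }, { "things": "going" } ] }
    });
    // $[another_key.how.are[*].you] -> ["doing"]
    let hits = resolve(&another_key, &["how", "are", "*", "you"]);
    println!("{hits:?}");
}
```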
Task Parameter References
In certain cases you want to be able to reference parameters defined within the task itself. This can be especially handy if you want to re-use standard values for different arguments.
One example of this is when performing HTTP requests. Often a simple GET request will have the same content. It might look something like this:
GET / HTTP/1.1
Host: example.com
Let's say we want to fuzz across different SNI values. We could construct a standard body and use self-referencing data variables to insert the SNI value in both the TLS SNI field and the Host header.
- kind: http1
ip: 10.0.0.1
port: 443
tls: true
sni: $[host_fuzzing]
content: |
GET / HTTP/1.1
Host: $[task:sni]
Combining this with our data references, we could create standard content payloads and re-use them across multiple tasks.
- insert_data:
key: http_get_content
kind: string
value: |
GET / HTTP/1.1
Host: $[task:sni]
- kind: http1
ip: 10.0.0.1
port: 443
tls: true
sni: $[host_fuzzing]
content: $[http_get_content]
...
Combined literals and references
You can combine literals and data references within a single expression. When doing so, the task manager will evaluate the entire expression and produce a set of values based on all possible combinations of the literals and referenced data.
Let's take the following example:
- Combining a literal with a data reference: prefix_$[some_key]_suffix

If some_key contains the values ["some", "string"], the resulting set of values would be:
- prefix_some_suffix
- prefix_string_suffix
Functions
In addition to simple data references, the expression syntax supports a variety of built-in functions that can be used to manipulate and transform data. Here are some examples:
- capitalize($[some_data_key]): Converts a string to uppercase.
- split($[some_data_key], "."): Splits a string into an array based on the specified delimiter (in this case, a period).
Examples
Here are some more complex examples that combine literals, data references and functions:
Retrieving and transforming data
This expression first splits the string retrieved from some_data_key at each period, takes the first element of the resulting array, and then capitalizes it.
capitalize(split($[some_data_key], ".")[0])
Accessing nested data
This expression retrieves an array of values from the property field of each object in the array_key array within the some_data_key data structure.
$[some_data_key.array_key[*].property]
We also support the alternative wildcard JSONPath syntax:
$[some_data_key.array_key.*.property]
Applying templates to task definitions
Here's an example of how these expressions might be used in the context of a TCP SYN task generator:
- kind: tcp_syn_generator
ipv4_address_ranges: "$[target_ip_ranges]"
ports:
- 80
- "$[https_ports]"
The result of rendering this template would be a set of TCP SYN tasks, each with a specific IP address from the target_ip_ranges data reference and a destination port that is either 80 or one of the ports specified in the https_ports data reference.
Note that templates are always converted to a set of values, even if the result is a single value. This ensures consistency in how task arguments are handled.
In the example above, if https_ports contains the values [443, 8443], the resulting task ports variable would contain a single set of integers {80, 443, 8443}. ipv4_address_ranges would contain a set of all individual IP addresses derived from the provided ranges.
For ease of use, parameters for task templates can be either a single expression or an array of expressions. If an array is provided, the results of each expression will be combined into a single set.
For example, the following two configurations are equivalent:
ipv4_address_ranges: "$[target_ip_ranges]"
ipv4_address_ranges:
- "$[target_ip_ranges]"
HTTP fuzzing examples
When performing HTTP fuzzing, we often want to iterate over several different data variables (e.g. wordlists) and generate all possible combinations. Let's see how this can be achieved with our templating engine.
Let's say we have the following data stored in our task collection:
| key | kind | value |
|---|---|---|
| path_traversal | array of strings | ["../", ".;/"] |
| paths | array of strings | ["/admin", "/login"] |
| hostnames | array of strings | ["localhost", "127.0.0.1"] |
We can then define an HTTP fuzzing task template like this:
- kind: http_fuzz
method: GET
host: target.com
sni: target.com
body: |
GET /$[path_traversal]/$[paths] HTTP/1.1
Host: $[hostnames]
When this template is rendered, the task manager will generate a set of HTTP fuzzing tasks that cover all combinations of the provided path traversal strings, paths and hostnames. This would generate $2 (path_traversal) * 2 (paths) * 2 (hostnames) = 8$ unique body payloads:
GET /../admin HTTP/1.1
Host: localhost
GET /.;/admin HTTP/1.1
Host: localhost
GET /../login HTTP/1.1
Host: localhost
GET /.;/login HTTP/1.1
Host: localhost
GET /../admin HTTP/1.1
Host: 127.0.0.1
GET /.;/admin HTTP/1.1
Host: 127.0.0.1
GET /../login HTTP/1.1
Host: 127.0.0.1
GET /.;/login HTTP/1.1
Host: 127.0.0.1
Note that we could have taken this further by also templating the host and sni fields, but this should give you an idea of how powerful the templating engine can be when combined with data stored in the task collection.
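For illustration, here's a toy sketch of the expansion (not the actual templating engine): each data reference is substituted in turn, keeping the cartesian product of the results. The values mirror the example above, slightly simplified:

```rust
// Substitute each $[key] placeholder with every value of that key, multiplying out
// the results as we go.
fn expand(template: &str, vars: &[(&str, Vec<&str>)]) -> Vec<String> {
    let mut results = vec![template.to_string()];
    for (key, values) in vars {
        let placeholder = format!("$[{key}]");
        let mut next = Vec::new();
        for body in &results {
            for value in values {
                next.push(body.replace(placeholder.as_str(), value));
            }
        }
        results = next;
    }
    results
}

fn main() {
    let bodies = expand(
        "GET /$[path_traversal]$[paths] HTTP/1.1\r\nHost: $[hostnames]\r\n\r\n",
        &[
            ("path_traversal", vec!["../", ".;/"]),
            ("paths", vec!["admin", "login"]),
            ("hostnames", vec!["localhost", "127.0.0.1"]),
        ],
    );
    assert_eq!(bodies.len(), 8); // 2 * 2 * 2 combinations
    for body in &bodies {
        println!("{body:?}");
    }
}
```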
Examples
Here are some examples of task collection structures to illustrate different use cases. Note that this is the YAML representation of how developers would define these task collections. The MRPF API will convert these definitions into a slightly different internal JSON representation to allow for the various features like task generators, data aggregations, conditionals and loops.
- name: Example Task Collection
description: An example task collection demonstrating various features.
tasks:
- sequential:
- kind: get_target
target_id: "victim"
output: target
- parallel:
- kind: tcp_syn_scanner
ipv4_address_ranges: "$[target.ip_ranges]"
ports: "80,443,8080-8089"
output: open_ports
- kind: dns_lookup
domains: "$[domains[*].fqdn]"
record_types: A
output: domains
- if:
- when:
contains: { var: "$[open_ports]", value: "443" }
then:
- kind: http_fuzz
method: GET
host: "$[target.domain]"
sni: "$[target.domain]"
tls: true
content: |
GET $[paths] HTTP/1.1
Host: $[target.domain]
output: http_responses
- when:
contains: { var: "$[open_ports]", value: "80" }
then:
- kind: http_fuzz
method: GET
host: "$[target.domain]"
tls: false
content: |
GET $[paths] HTTP/1.1
Host: $[target.domain]
output: http_responses
- else:
- kind: notification
message: "No HTTP ports open on $[target.domain], skipping HTTP fuzzing."
- if:
- when:
non_empty: "$[http_responses]"
then:
- loop:
condition: "$[http_responses[*].status_code]"
do:
- kind: notification
message: "Received status code $[item] from $[target.domain]"
- else:
- kind: notification
message: "No HTTP responses for $[target.domain]"
- loop:
condition: "$[http_responses[*].status_code]"
do:
- kind: notification
message: "Received status code $[item] from $[target.domain]"
The Apple universal iOS/macOS app
To make it easier to work with my recon data and task scheduler I've created a universal iOS/macOS app in Swift. It provides a nice frontend for all my tools and data.
The alternative was to build some kind of web frontend but, truth be told, I just don't enjoy writing JavaScript. Swift, and especially SwiftUI, feels a lot more fun and rewarding to build with, and allows a much more focused user experience on phones.
The current iteration works with the older MPF API and task manager, but I want to move this to the new MRPF API and task manager once I've built that out a bit more.
HTTP Repeater/scanner
The MRPF Scanner API provides a WebSocket interface into the various scanners built on top of the MRPF network engine. The current macOS app is able to interact with it, and I'm working towards functionality similar to the Repeater in Caido and Burp. Instead of making single requests, I've built the templating engine into it, similar to how I'm constructing the task manager. This allows you to more easily fuzz things; I guess it's more akin to the Intruder in Burp.
I feel I should be able to find a better balance than Caido and Burp for the UI, and a mixture between a requests view and the Intruder tab is getting me almost there. I need to iterate more. Another thing that would really help is a more mature wordlist generator in the app. The killer feature will be the ability to bring together the task manager for scanning, all the collected data and the repeater/inspector in one app.
What's the current status?
The iOS/macOS app needs work; it would be really nice to give it a big refactor, but I want to leverage the latest macOS 26 version. This also introduces Copilot directly in Xcode, so it should help me learn Swift and best practices a lot faster. The time to be a solo developer is now; finally I'm able to build everything myself, if I just manage to keep focus on the things I really want to move forward.
- Better handling of textarea in my ‘burp’ mimicking feature
- Revisit the job template composition. There's a bunch of inefficient, strange code which I think I should be able to make more ergonomic in Swift. All that casting, generics and Codable stuff is a mess.
- Fully buy into the two-column Split View and align the macOS design with Liquid Glass. An alternative might be to switch completely to a TabView design. Apparently on iPadOS this tab view now transforms into a sidebar automatically; not sure if this carries over to macOS as well?
- macOS works OK-ish but iOS is lagging behind. What do I want to do? It probably needs a few different design patterns to work well on the platform. Some actions might just not be suited for a phone either.
- Make a more robust wordlist section. Especially Apple's easy integration with language models can be very helpful here to generate new wordlists on the fly. I also need to dig into the wordlist problem a lot deeper and try to take it up to a more professional level. I need to be able to support different encodings, rate words by potential impact, link things across targets, think about efficient storage and retrieval in the database, full integration with the templating engine, etc.
MRPF Scanner API
This is a WebSocket interface to the different scanners built on top of the MRPF network engine. It can be run on any machine that has Rust; at the moment I'm focused on running it on my MacBook itself, but I can see it being useful to run on a VM in the cloud. It would be good if we could somehow make that VM part of the workers of our task manager, as these things have some overlap. The MRPF Scanner API should only be a frontend for the scanners, not do any scanning itself, to keep separation of concerns.
Running WebSockets on any server is great as we can have bare-metal workers this way (or, for instance, my MacBook). What would also be nice, though, is to use AWS Lambda for certain tasks here as well. We could leverage the AWS WebSocket API Gateway with a Lambda backing. Apparently all the keepalive handling is done by API Gateway; you only pay for real messages and the Lambda execution time.
Certificate Transparency Records
The Certificate Transparency records produced by the big certificate issuers are a goldmine for finding new domains. The most popular way to retrieve them is through the crt.sh website, or better yet through their PostgreSQL database. However, they have fairly strict rate limits and more difficulty returning all the records for larger domains. We can do better, so I've written my own code that scrapes the certificate transparency logs directly from the issuers.
However, the main problem eventually is cost. I wanted to use AWS DynamoDB for this, and although I got it to work and learned a lot about how to model things there, it turns out to be quite costly for this use case. I am better off moving this to PostgreSQL. Also, the Lambda invocation costs are quite high, so it makes more sense to run the initial scraping of all older logs on EC2, my VPS or my MacBook. Once the initial bulk is done, we can probably use Lambda to keep the incremental updates going.
Ideas and Future Work
...
TODO
I'm afraid I need to rework my approach again. I figured out that the initial connection timeout is a very important way to get rid of a lot of rate limits. Because of this, my approach of sending ranges in batches of 1 per log server isn't holding up great: it causes the loop to wait for the full connection timeout before it processes a new batch. Instead I should:
- Create a proper RateLimit class, similar to the https://docs.aws.amazon.com/sdk-for-rust/latest/dg/retries.html interface
- Tweak rate limits and connection timeouts to find the optimal balance per server.
- Provide the scan log servers with a HashMap<LogRange, Vec<…>>
- Once a range for a particular log server is completed, pluck another one from the HashMap
- If a range fails to be completed in the rate limit timeframe, move to the next range and leave the range as Pending in the database.
- See if we can store the log server's optimal rate limits and preferred range sizes in the database. We could base the latter on the average entry count that a log server returns, so servers with 1024 entries per request can use larger range counts than servers that return a lower count like 32. Something like LOGSERVER# SK: PROPERTIES.
```rust
use std::ops::Range;
use std::time::Duration;

struct RetryConfig {
    max_retries: usize,           // default: 3
    initial_backoff: Duration,    // default: 3s
    max_backoff: Range<Duration>, // e.g. 20s..22s, a range so we can randomize
    exponential_backoff: bool,    // default: true
    step: Range<Duration>,        // e.g. 1s..2s. With exponential backoff we increase this exponentially,
                                  // otherwise linearly. A range so we can randomize.
}

struct LogServer {
    retry_config: RetryConfig,
    average_entry_size: u16, // default: 1024
    url: String,
    mirror: Option<String>,
}

struct CertTransparencyClient {
    log_server: LogServer,
    retry_config: RetryConfig,
    max_range_failure: usize, // maximum number of times a range can fail until we completely stop trying (default: 3)
}

impl CertTransparencyClient {
    async fn get_entries(&self, range: Range<u64>) {
        // `Range<u64>` is a stand-in for the log range type.
        // Retrieve entries
        // if entries == 0, retry
        // if entries != 0, reset max_retries
        // if MaxRangeRetries, return error. The user of the cert client should skip this range and try another range
        // if MaxLogServerRetries, return error. The user of the cert client should abandon this log server completely
        // if range completed, return all entries that are retrieved
        // OR: do we want to handle the name extraction here already? Maybe this should be get_names, and
        // get_entries should return RateLimited and connection errors. Then this function parses those errors
        // and either retries, raises MaxLogServerRetries or MaxRangeRetries, or returns the entries.
    }
}
```
Coding Guidelines
See .github/copilot-instructions.md for the full coding guidelines and project overview.
Faster macOS builds
You can get faster builds on mac by disabling XProtect on the terminal app.
Here's some better docs on how to add terminal to developer tools: https://nexte.st/docs/installation/macos/#gatekeeper
Error Handling
My prompt:
I am writing a large Rust codebase with several libraries and binaries. The code will only be used by myself, so it doesn't have to take external users into account at all.
I want to have a proper way of handling errors without having to write too much boilerplate code and matches everywhere.
What are best practices for error handling in my scenario and what do experts advise on this? I would prefer not to bring in external crates. Perhaps Jon Gjengset or others have good advice?
- Should I use ? as much as possible?
- What should my typical return type be? Result<something, Error?>
- How to handle/catch errors defined by external crates and capture them easily upstream without too much hassle?
- Should I define my own error types to make this easier?
- Is there a way to avoid using Box all over the place?
Think hard about maintainability for my requirements and writing idiomatic Rust.
The 80/20 pattern (no external crates)
I suspect this 80/20 line comes from this blog.
The Rust by Example book has a chapter that looks a lot like this approach.
Per library crate
- Define one crate-local error enum and a Result<T> alias.
- Implement Display, std::error::Error, and a few From<_> conversions for the external error types you actually use.
- Lean on ? everywhere; it auto-converts via From. [1][2]
Per binary crate
Return Result<(), Box<dyn std::error::Error>> from main (or use your own top-level error). This is the only place you need a Box, keeping it out of the rest of your code. Rust's Termination trait makes main returning a Result work as expected (non-zero exit when Err). [3]
Minimal template
// lib/src/error.rs
use std::{error::Error as StdError, fmt};

#[derive(Debug)]
pub enum Error {
    Io(std::io::Error),
    Json(serde_json::Error),
    // Add more *typed* variants as needed.
    // Optionally a catch-all if you want (kept rare):
    Other(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {e}"),
            Error::Json(e) => write!(f, "JSON error: {e}"),
            Error::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Io(e) => Some(e),
            Error::Json(e) => Some(e),
            Error::Other(_) => None,
        }
    }
}

// Focused conversions so `?` just works:
impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

impl From<serde_json::Error> for Error {
    fn from(e: serde_json::Error) -> Self {
        Error::Json(e)
    }
}

// Your crate-wide Result alias
pub type Result<T> = std::result::Result<T, Error>;
// lib/src/lib.rs
mod error;
pub use error::{Error, Result};

use std::fs;

pub fn load_config(path: &str) -> Result<serde_json::Value> {
    let data = fs::read_to_string(path)?; // auto-converts io::Error -> Error via From
    let v = serde_json::from_str(&data)?; // auto-converts serde_json::Error -> Error via From
    Ok(v)
}
// bin/src/main.rs
use mylib::load_config;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = load_config("config.json")?; // prints a neat chain on failure
    println!("{cfg}");
    Ok(())
}
Why this works well:
- ? is the standard way to propagate errors; it's concise and idiomatic. [1][4]
- Implementing std::error::Error + Display lets errors compose and present well. [2]
- Keeping Box<dyn Error> only at the binary boundary avoids "Box everywhere" while still letting main fail gracefully with a proper exit code via Termination. [3]
Adding lightweight context (still no external crates)
If you sometimes want more context (e.g., which file failed to parse), add a single contextual variant and a tiny helper trait:
// in error.rs
use std::borrow::Cow;

#[derive(Debug)]
pub enum Error {
    Io(std::io::Error),
    Json(serde_json::Error),
    Context {
        msg: Cow<'static, str>,
        source: Box<dyn StdError + Send + Sync>,
    },
    Other(String),
}

impl From<std::io::Error> for Error { /* as before */ }
impl From<serde_json::Error> for Error { /* as before */ }

// Extension trait to tack on context ergonomically
pub trait ResultExt<T> {
    fn context(self, msg: impl Into<Cow<'static, str>>) -> Result<T>;
}

impl<T, E> ResultExt<T> for std::result::Result<T, E>
where
    E: StdError + Send + Sync + 'static,
{
    fn context(self, msg: impl Into<Cow<'static, str>>) -> Result<T> {
        self.map_err(|e| Error::Context {
            msg: msg.into(),
            source: Box::new(e),
        })
    }
}
Usage:
use crate::error::ResultExt;

fn load_config(path: &str) -> Result<serde_json::Value> {
    let data = std::fs::read_to_string(path)
        .context(format!("reading config from {path}"))?;
    serde_json::from_str(&data)
        .context(format!("parsing JSON from {path}"))
}
- This keeps your enum typed for the common cases and uses one boxed variant only when you add context, not “all over the place.”
- If you want stack traces during debugging, you can attach a std::backtrace::Backtrace field to Context and enable it with RUST_BACKTRACE=1 (or RUST_LIB_BACKTRACE=1). [5]
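As an illustration of the backtrace idea, a minimal sketch could look like this (assuming the Context variant from the snippet above; the constructor name is made up):

use std::backtrace::Backtrace;
use std::borrow::Cow;
use std::error::Error as StdError;

#[derive(Debug)]
pub enum Error {
    Context {
        msg: Cow<'static, str>,
        source: Box<dyn StdError + Send + Sync>,
        backtrace: Backtrace, // populated only when RUST_BACKTRACE / RUST_LIB_BACKTRACE is set
    },
    // ... other variants as before
}

impl Error {
    fn with_context(
        msg: impl Into<Cow<'static, str>>,
        source: Box<dyn StdError + Send + Sync>,
    ) -> Self {
        Error::Context {
            msg: msg.into(),
            source,
            // Backtrace::capture() is cheap when backtraces are disabled via env vars.
            backtrace: Backtrace::capture(),
        }
    }
}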
Answers to your specific questions
1) “Should I use ? as much as possible?”
Yes. Use ? whenever you cannot or do not want to recover at that level; it's the idiomatic, concise way to propagate errors, and it triggers From to convert error types along the way. [1]
Reserve explicit match only for cases where you're handling the error (retrying, defaulting, logging with additional side-effects, etc.). The Book's error chapter lays out this philosophy (recoverable Result vs unrecoverable panic!). [4]
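A tiny illustration of that split, assuming the crate-wide Result alias from the template above (the function names are made up):

use std::fs;

// Propagate: nothing useful can be done about a missing file at this level.
fn load(path: &str) -> Result<String> {
    Ok(fs::read_to_string(path)?)
}

// Handle: here we *can* recover, so an explicit match (or unwrap_or_else) is warranted.
fn load_or_default(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(data) => data,
        Err(e) => {
            eprintln!("falling back to default config: {e}");
            String::new()
        }
    }
}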
2) “What should my typical return type be?”
- In library crates: pub type Result<T> = std::result::Result<T, crate::Error>; with a crate-specific Error enum. This keeps signatures short and type-checked. [2]
- In binary crates: either reuse your library Error or keep the boundary generic: fn main() -> Result<(), Box<dyn std::error::Error>> (simplest), relying on Termination to convert Err to a non-zero exit code. [3]
3) “How do I capture external crate errors upstream without hassle?”
Add From<TheirError> impls to your error enum; then ? "just works." Also, implement source() to expose the underlying error so higher layers (or your CLI) can print the chain. The standard library's Error trait is designed for this chaining. [2]
impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Self {
        Error::Other(e.to_string())
    }
}
// or better: add a dedicated variant `Http(reqwest::Error)` and a Display arm
When adding context, use the small ResultExt::context trick above rather than writing verbose map_err everywhere.
4) “Should I define my own error types to make this easier?”
Yes. A domain-specific error enum per crate is a widely used pattern. It avoids stringly-typed errors, makes it easy to distinguish cases, and composes well. The Book/std::error::Error docs show how to implement Display + Error so they print nicely and chain correctly. [2][4]
If you expect your library crates to evolve, you can mark the enum #[non_exhaustive] (optional, more relevant for public APIs).
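For reference, a minimal sketch of what that would look like (purely illustrative):

// Adding variants later is then not a breaking change for downstream code,
// because matches outside the defining crate must include a wildcard arm.
#[non_exhaustive]
#[derive(Debug)]
pub enum Error {
    Io(std::io::Error),
    Json(serde_json::Error),
    // future variants can be added freely
}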
5) “Is there a way to avoid using Box all over the place?”
Absolutely. Keep errors typed (enums) throughout your libs; use at most one Box<dyn Error>:
- at the binary boundary (main) for convenience, and/or
- in a single contextual variant (if you want arbitrary sources + messages).
Everywhere else, use concrete variants, which are zero-cost and ergonomic with ?.
A few “expert‑style” tips
- When to panic!: only for programmer bugs / invariant violations (e.g. unreachable!(), "this should never happen"), not for I/O or user-driven failures. The Book explicitly distinguishes recoverable from unrecoverable errors. [4]
- Display vs Debug: make Display short and human-readable; rely on source() chains for details. Debug can be derived. [2]
- Backtraces in development: consider adding a Backtrace field (captured on construction) to your contextual variant and enable it via env vars when you need it. [5]
- main ergonomics: returning Result from main is fine and standard; the Termination trait glues it to exit codes. If you need full control over formatting, handle the top error yourself instead of returning it. [3]
If you ever relax “no external crates”
Many experienced Rustaceans use:
- thiserror (derive Error + Display + #[from] on variants) to eliminate boilerplate in library error enums. [6]
- anyhow in binary crates for an ergonomic anyhow::Error + .context() builder (you can still keep typed errors in libs). [7]
You don’t need them for your scenario, but they are the shortest path to zero boilerplate if you change your mind.
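For comparison, the same library enum written with thiserror would look roughly like this sketch:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("{0}")]
    Other(String),
}

// Display, std::error::Error, source() and the From impls are all generated,
// so `?` keeps working exactly as in the hand-written version above.
pub type Result<T> = std::result::Result<T, Error>;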
TL;DR (what to do tomorrow)
- Per lib: create an Error enum + Result<T> alias; implement a handful of From impls and Display.
- Use ? everywhere you propagate; map_err only when you genuinely handle/transform; optional ResultExt::context to add messages. [1]
- Per bin: fn main() -> Result<(), Box<dyn std::error::Error>> (or your top-level typed error) and let Termination handle exit codes. [3]
- Keep Box at the edges, not in your core APIs.
If you want, I can generate the error scaffolding for each of your crates (enums + From impls + ResultExt) based on the external errors you use—just tell me which crates pop up most (e.g., tokio, reqwest, serde_json, rusqlite, etc.). Also, are you OK with a single derive crate (thiserror) if it replaces ~50–100 lines of boilerplate per crate, or do you want to stay 100% std-only? [6]
Bug Bounty Ideas
This section contains random ideas I have regarding bug bounty hunting, security research and general security topics.
AI-Generated Wordlists
- Build instructions for different attack vectors or reconnaissance techniques.
- Provide a target (e.g. a domain, ip range or a MRPF Target object), existing wordlist and the build instruction to an AI model and let it generate a new wordlist. Store the generated words and run your scans. Gather statistics on the results and repeat the process.
Good articles to draw instruction ideas from
Protections implemented by AWS for scanning
Blackfoot is the EC2 elastic networking backend
AWS MadPot:
- When they detect a connection to an IP that's not allocated to any Elastic IP, they know it's a scanner and can mark your IP. This is not great for my TLS scanner…
- Sonaris is the S3 protection
The important part is whether you can run your scanner on AWS infrastructure that uses dynamic outbound public IPs; they very likely won't block those IPs, as doing so could impact other AWS customers.
I'm not sure where I got the above information anymore, I think it was in a podcast somewhere from Critical Thinking?
Summary of some services
Learned this during some discussions
- Mithra - network to inspect DNS requests per region. It also provides a 'benign reputation source' that GuardDuty uses to prevent false positives. Route53 domain blocking also uses Mithra; perhaps some AWS internal services use it as well, but that wasn't really clear.
- MadPot - think a more standard honeypot solution. When it detects a properly validated attack, it can replicate the blacklisted IPs to the whole network.
- Blackfoot - analyzes all inbound and outbound flows to VPCs (13T flows an hour): how many of these come from malicious IPs, using MadPot to determine whether an IP is really malicious.
- Sonaris - internal threat intelligence tool that looks at network traffic to find potential security threats, e.g. attempts to find public buckets, vulnerable services on EC2, etc.
- SnowTube - which public IPs are associated with EC2; this is published to an SNS topic. It would be beautiful if we could subscribe to this topic?!! Are there explicit account or Org conditions? Can we leverage AWS services to listen to this topic? How can we find out the name of the SNS topic?
- IP Ownership - a service managed by the EC2 team that tracks which IP addresses were associated with which instances at a point in time.
How does GuardDuty work?
- S3 malware protection uses Bitdefender to help with hashes. They also have a few other internal rules for it.
- GuardDuty gathers all required data itself; you don't need to enable the sources yourself (VPC flow logs, CloudTrail, DNS logs, S3/RDS access logs).
GuardDuty infrastructure
GuardDuty is built using a lot of the 'normal' AWS services, like Lambda, S3, EC2, RDS, Firehose.
- Frontend running in the customer account; these are the actual resources that will be checked.
- Non-GuardDuty internal components: S3, Route53 logs, flow logs, service logs from S3, EKS audit logs, IP Ownership (the EC2-managed service tracking which IP addresses were associated with which instances at a point in time), Mithra (DNS inspection).
Their evaluation components:
- Stateless processor: evaluation related to the threat intel providers, e.g. IP Ownership, external vendor intel (CrowdStrike and Proofpoint are definitely used), etc.
- Stateful processing: this is where machine learning models are applied; what kind of things can it detect?
- Malware engine:
Another service:
- Account service: which accounts have GuardDuty enabled, what is the delegated admin account, what features are enabled, etc.?
Security boundaries:
GuardDuty runs internally across a whole lot of these 'micro'services. They spread their services across different accounts, using the account as a security boundary. Often they just use IAM roles and resource policies to control this; they don't put everything behind API gateways etc.
DNS Graph statistics
They get all their data from Route53 to build their mitigations (200TB of DNS logs with 5B subdomain nodes as of October 2025).
Domain (TLD + 1) -> CNAME -> DNS Subdomain -> DNS -> EC2 Instances Subdomain -> DNS -> AWS Account
Domain reputation pipeline:
- Create a graph for the Domain target
- Train models on the graph
- Evaluate models using ??
Firenze for Model Evaluation
What are the manual security engineer steps, e.g. for evaluating domains?
- A new domain comes in: are any of its IP addresses already sinkholed (making it more likely to be malicious)? Is it low popularity, is it nonsense, is the TLD abused often?
Firenze uses the weak signals that security engineers generate (or newly identify) to better evaluate a model and provide guardrails. This is used to improve Mithra.
There is a whitepaper firenze-model-evaluation-using-weak-signals
High-level flow of getting findings into GuardDuty
- Ingest signals, apply ETL
- Signals delta table -> clustering
- Clustering
- Compound signals delta table -> prioritization
- Compound signals delta table -> training
- Prioritization -> attack sequences (S3)
- Attack sequences (S3) -> Security Hub
- Attack sequences (S3) -> Finding Gateway -> findings into the API for the GuardDuty console
Future features
Note the last bit of this YouTube video by Jeff Bezos: thinking small is a self-fulfilling prophecy.
LETS BUILD THE BEST BUG BOUNTY TOOLS IN THE WORLD! THIS INCLUDES ULTRA FAST AND SCALABLE SCANNERS, AND A BEAUTIFUL GUI TO MANAGE THEM! PROXY SERVER, REPEATER, BIG CONTINUOUS SCRAPERS, CERT TRANSPARENCY MONITORING, DNS RESOLVER, AND MORE!
IT WILL BE THE BEST AND CHEAPEST TOOL IN THE WORLD! BUILT USING RUST AND LEVERAGING THE BEST SCALABLE AND CHEAPEST AWS CLOUD SERVICES.
Create an HTTP service that exposes our scanners
Similar to how Caido is built, create a web frontend for the scanner that can kick off scans. This opens up a lot of flexibility in how we can use the scanner:
- Run scanner on localhost and get your SwiftUI app to interface into it
- Run the scanner in a container or VM and kick off scans
- Create another web service that acts as an aggregator of scanners. Manage multiple scanners through one place. The scanners will provide callbacks to the aggregator service. The GUI can contact the aggregator service to get the status of all scanners and push jobs to other scanners.
We have to make sure that the web services are not directly tied into the scanner code logic itself. Otherwise we would not be able to easily run the scanners from serverless functions or other code bases.
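As a rough sketch of how thin that HTTP layer could stay (assuming axum 0.7, tokio and serde; all route and type names here are made up):

use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct ScanRequest {
    targets: Vec<String>,
    ports: Vec<u16>,
}

#[derive(Serialize)]
struct ScanStarted {
    job_id: u64,
}

async fn start_scan(Json(req): Json<ScanRequest>) -> Json<ScanStarted> {
    // Hand the request off to the scanner library here (e.g. via a channel into the
    // engine); the HTTP layer itself stays a thin wrapper with no scan logic.
    println!("scanning {} targets on {} ports", req.targets.len(), req.ports.len());
    Json(ScanStarted { job_id: 1 })
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let app = Router::new().route("/scans", post(start_scan));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await?;
    axum::serve(listener, app).await?;
    Ok(())
}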
HTTP/1.1 and H2 packet generation using the pnet packet macro
Is there existing support, or can I leverage the pnet packet macro, for adding HTTP/1 and HTTP/2 support to pnet? That would make it easier to generate HTTP packets.
TCP Fast Open
See if we want to implement TCP Fast Open for SNI scanning. Linux apparently supports it by default, Windows doesn't. This could reduce the round-trip time of SNI scraping tasks.
Race condition testing
Implement a scanner that does this: https://flatt.tech/research/posts/beyond-the-limit-expanding-single-packet-race-condition-with-first-sequence-sync/
Since this technique relies on crafting IP re-assembled packets and using TCP sequence numbers in a particular way, it will likely not be able to leverage the normal Engine, as that uses SYN cookies to sync the transmit and receive threads.
HTTP1 Pipelining
Can I use my engine to implement HTTP1 pipelining?
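A minimal sketch of the idea over a plain TcpStream (no TLS, hypothetical host and paths), just to show the principle of queueing several requests before reading any response:

use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

fn pipeline(host: &str, paths: &[&str]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, 80))?;
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;

    // Queue all requests back-to-back without waiting for any response.
    for path in paths {
        let req = format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: keep-alive\r\n\r\n");
        stream.write_all(req.as_bytes())?;
    }

    // Responses arrive in request order. A real implementation would parse each
    // response (Content-Length / chunked encoding) to find the boundaries; here
    // we just read until EOF or the read timeout.
    let mut buf = Vec::new();
    let mut chunk = [0u8; 4096];
    loop {
        match stream.read(&mut chunk) {
            Ok(0) => break,                             // server closed the connection
            Ok(n) => buf.extend_from_slice(&chunk[..n]),
            Err(_) => break,                            // timeout: assume everything arrived
        }
    }
    Ok(String::from_utf8_lossy(&buf).into_owned())
}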
HTTP-to-DNS
Use my engine to very quickly resolve domains against Cloudflare. Can we make this very memory efficient, since most packets will look very much alike? I could perhaps use the streaming JSON library https://github.com/pydantic/jiter
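For reference, a minimal sketch of a single lookup against Cloudflare's DNS-over-HTTPS JSON endpoint, using reqwest (with its json feature) and serde_json for brevity rather than my own engine:

use serde_json::Value;

async fn resolve_a(name: &str) -> Result<Vec<String>, reqwest::Error> {
    let resp: Value = reqwest::Client::new()
        .get("https://cloudflare-dns.com/dns-query")
        .query(&[("name", name), ("type", "A")])
        .header("accept", "application/dns-json")
        .send()
        .await?
        .json()
        .await?;

    // Pull the A records out of the "Answer" array, if present.
    Ok(resp["Answer"]
        .as_array()
        .map(|answers| {
            answers
                .iter()
                .filter_map(|a| a["data"].as_str().map(str::to_owned))
                .collect()
        })
        .unwrap_or_default())
}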
Caido/Burp repeater using my HTTP engine
Integrate my HTTP engine, with a WebSocket frontend, into the MRPF iOS app. I'm not going to create a proxy (at least not yet :D) but something similar to the Repeater feature in Burp.
- One part that allows you to craft the binary request
- Within the request I can use my template variables (see the sketch after this list)
- I can select existing wordlists to use for my template variables
- I can set rate limits for the scanner
- Starting the scan will show the results in a table view
- Double-clicking a result drills into it; the back arrow goes back to the table view
- Easily import common HTTP request payloads
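A minimal sketch of the template-variable expansion mentioned in the list above (names and placeholder syntax are made up):

// Take a raw request containing {{placeholders}} and expand it once per wordlist entry.
fn expand_template(template: &str, var: &str, wordlist: &[&str]) -> Vec<String> {
    let placeholder = ["{{", var, "}}"].concat(); // e.g. "{{fuzz}}"
    wordlist
        .iter()
        .map(|word| template.replace(&placeholder, word))
        .collect()
}

fn main() {
    let template = "GET /api/{{fuzz}} HTTP/1.1\r\nHost: example.com\r\n\r\n";
    for req in expand_template(template, "fuzz", &["users", "admin", "debug"]) {
        // Each expanded request would be handed to the scanner with rate limits applied.
        println!("{req}");
    }
}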
This would actually be a feature that could help me find bugs (a bit) faster. At the moment a lot of my time is spent inside the Repeater tab, manually crafting payloads. If I have all my wordlists and templates ready, I can use them far more easily. That, in combination with my fast scanner, could be a very powerful tool (of course I would have to be careful with rate limits).
Since I can have my scanner running somewhere else, I can also use it from my phone. Of course that's a bit less screen space, but at least I could get some stuff to work.
Of course, when this is built, what's stopping me from creating my own proxy? :D There I can also create a WebSocket frontend and then get a history in my iOS tool and send requests to the repeater.
Tricks using DNS lookups for recon
Read this for discovering S3 buckets using DNS enumeration:
MPF - The previous iteration
Before there was MRPF, there was MPF. The original idea was the same: build my own tooling around bug bounty hunting.
It was built in Python and I've learned a lot from building it. I wanted to investigate whether I could build a custom network stack and have more control over HTTP traffic, but Python was a bit limited here. For instance, a lot of the HTTP client libraries don't allow much customization around TLS. The libraries often build abstractions around the networking layers, making it difficult to customize things like TLS SNI or ALPN. Also, concepts like domain names, host names and IP addresses are all mixed together. This is nice from a user perspective, but I want very specific control over all these factors to find misconfigurations.
Initially I started to build a custom network stack in C using libuv. Unfortunately my laptop crashed and I was stupid enough not to commit all that code into a repo. Also, I was very much struggling with writing safe concurrent C code.
After a while I decided to try and rebuild the network stack and looked into Rust. By this time ChatGPT was getting really good and it helped me quickly get up to speed with a new language. I learned a LOT and started to love some of the Rust concepts like ownership, fearless concurrency and the way it makes refactoring code bases a lot easier.
This has now led me to try and rebuild MPF completely in Rust, hence the M(y) R(usty) P(ension) F(und) project.
What are the things that don't work that well in my current iteration of MPF?
- The task manager parallelization is not optimal
- The memory management of building parallel tasks is not optimal
- Python is quite memory hungry for large scans
- All scanning tasks work on Lambda, I can't mix in bare metal or containers
- Job scheduling is helpful but also very repetitive for each target. It would be better to have generic continuous scanning for all targets. This would also help spread out load with my new randomized Rust scanners.
- The database model has some limitations:
- Nothing showing where certain results came from
- Not possible to construct the mermaid graph representation I came up with
- Too many things are stacked inside the Domain object that are not 100% correct. For example, IP addresses should be their own entities, TCP/UDP ports are related to an IP, not a domain, and the order of resolved IP addresses in the Domain object is not stable, making it seem like we have a lot of updates.
- The task manager code is quite difficult to read and I'm not confident it's robust enough.
- Introducing new tasks is quite labor intensive
- Tasks do not easily show the task template they belong to, making parsing log files more difficult
- The statistics of all the scans happening are not easily accessible or useful
- THE MAIN THING: it hasn't helped me a single time to find or get closer to any actual bug/bounty. I have learned a bunch of things though, so that is something...
Let's watch the Beazley talk and build my job alongside it
https://www.youtube.com/watch?v=r-A78RgMhZU
Contributing
At the moment the project is solely maintained by me, with the purpose of learning and experimenting with Rust, distributed systems and bug bounties. I have no intention of putting this out into the world, as I'd like to be able to break things when I want and work at my own pace.
However, I love talking to like-minded people. If you happen to come across my hidden little corner of the internet, feel free to get in touch.