Idea
This came to me during an outdoor run, the first one in a month or two of letting myself go. A good reminder that working out really benefits you and makes life better overall.
LATER UPDATE, I did some work on this: Check the following branch: https://github.com/thiezn/mrpf/compare/main...mm/attempttomovetosendmsg
My idea:
I’m building a very fast and memory-efficient TCP SYN scanner in Rust, using the pnet library at the datalink layer.
I’ve split transmit and receive of packets into separate threads.
The transmit thread currently composes a TCP packet from the Ethernet layer up, then uses the pnet datalink tx channel to send the traffic out. The send function accounts for a large percentage of the total memory used.
What if I created a larger buffer combining multiple packets, and then sent that full buffer through the datalink tx channel in one go? Would the network layer below be able to send those packets out as separate packets, and would it reduce the memory footprint and the switching between user and kernel space?
Think carefully about your response and outline the pros, cons, and potential pitfalls.
Copilot’s response (replaced the initial response with a new one from GPT5.2. Perhaps we should re-run the question every time more capable models appear?)
⚠️ Quick safety note (because the same techniques are dual‑use): I’ll focus on generic packet I/O efficiency (buffering, syscall/copy reduction, and memory behavior) and not on tactics for aggressive scanning or evasion.
Short answer
No—you generally can’t concatenate multiple Ethernet frames into one big buffer and send them “in one go” through pnet’s datalink TX channel and expect the kernel/NIC to split them into multiple packets. A datalink send call corresponds to one link-layer frame; the kernel will treat your buffer as one frame, and if it’s larger than MTU/jumbo limits it will typically be dropped or error out. Linux’s packet socket docs explicitly describe TX as queuing the user supplied buffer as a packet to the driver, unmodified. [man7.org]
If your goal is fewer copies/allocations, the more “native” way in pnet is to use build_and_send (construct in-place in a preallocated write buffer), and/or use a kernel interface designed for batching such as PACKET_MMAP / PACKET_TX_RING—but that’s outside what pnet exposes directly. [docs.rs], [kernel.org]
Why concatenation doesn’t work (and what would happen)
1) Frame boundaries matter at L2
Packet sockets / datalink TX is message-oriented: you provide one buffer = one packet. Linux packet sockets say: “When transmitting a packet, the user-supplied buffer should contain the physical-layer header. That packet is then queued unmodified to the network driver…” [man7.org]
So if you concatenate N frames:
- The kernel/NIC won’t “parse” your blob into N frames.
- You’ll instead attempt to transmit one oversized Ethernet frame.
2) MTU/jumbo constraints become a hard limit
Ethernet has an MTU (typically 1500 payload; ~1514/1518 with headers/FCS). If your concatenated “super-frame” exceeds MTU/jumbo settings, you’ll see:
- sendto errors (depending on backend),
- driver drops,
- or silent drops downstream (switch/NIC). [man7.org]
3) Even with jumbo frames, it’s still one frame
Jumbo frames just increase the maximum size of a single frame; they do not turn one write into multiple frames.
Bottom line: concatenation is not a batching mechanism at Ethernet level.
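To make the “one buffer = one frame” point concrete, here is a small self-contained sketch (no pnet, dummy MAC addresses and payloads): a concatenated buffer parses as a single Ethernet frame whose payload just happens to contain the second frame’s bytes.

```rust
// Toy illustration: concatenating two Ethernet frames does not create two
// packets; a link-layer consumer sees ONE frame whose payload contains the
// raw bytes of the second frame.

fn make_frame(ethertype: [u8; 2], payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::new();
    frame.extend_from_slice(&[0x02; 6]); // dst MAC (dummy)
    frame.extend_from_slice(&[0x04; 6]); // src MAC (dummy)
    frame.extend_from_slice(&ethertype); // EtherType
    frame.extend_from_slice(payload);
    frame
}

fn main() {
    let a = make_frame([0x08, 0x00], b"first"); // IPv4
    let b = make_frame([0x08, 0x00], b"second");

    // "Batch" by concatenation -- what the idea proposes.
    let mut blob = a.clone();
    blob.extend_from_slice(&b);

    // The link layer sees one dst MAC, one src MAC, one EtherType at
    // bytes 12..14. Frame b's headers are just opaque payload bytes.
    let ethertype = &blob[12..14];
    let payload = &blob[14..];
    println!("frames seen at L2: 1");
    println!("ethertype: {:02x?}", ethertype);
    println!("payload length: {}", payload.len());
    // payload = b"first" plus ALL of frame b, headers included
    assert_eq!(payload.len(), 5 + b.len());
}
```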
What will help: in-place build + reusable buffers (within pnet)
Use build_and_send instead of send_to
pnet_datalink::DataLinkSender documents two TX paths:
- `send_to(packet, ...)` may require an additional copy depending on OS/backend. [docs.rs]
- `build_and_send(num_packets, packet_size, closure)` calls your closure with a mutable slice backed by pnet’s write buffer, letting you build in-place and avoid that copy. [docs.rs], [docs.rs]
That’s the closest thing pnet provides to what you’re trying to do (reduce allocations/copies), without changing the semantics of “one frame per send”.
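The in-place pattern can be sketched with a toy type (the type and its “send” are hypothetical; only the shape mirrors pnet’s `build_and_send(num_packets, packet_size, closure)`): the caller gets a mutable slice into a preallocated buffer and crafts the frame there, instead of allocating a `Vec` per packet that the send path then has to copy.

```rust
// Toy model of the in-place build pattern behind build_and_send.

struct ToySender {
    write_buffer: Vec<u8>, // preallocated once, reused for every send
}

impl ToySender {
    fn with_capacity(cap: usize) -> Self {
        Self { write_buffer: vec![0u8; cap] }
    }

    /// Build `num_packets` frames of `packet_size` bytes in place and
    /// "send" each one (here: just count bytes queued).
    fn build_and_send<F>(&mut self, num_packets: usize, packet_size: usize, mut func: F) -> Option<usize>
    where
        F: FnMut(&mut [u8]),
    {
        if packet_size > self.write_buffer.len() {
            return None; // insufficient capacity, like the real API's failure path
        }
        let mut sent = 0;
        for _ in 0..num_packets {
            let slice = &mut self.write_buffer[..packet_size];
            func(slice); // caller crafts the frame directly in our buffer
            sent += packet_size; // real code would hand the slice to the driver here
        }
        Some(sent)
    }
}

fn main() {
    let mut tx = ToySender::with_capacity(4096);
    let sent = tx.build_and_send(3, 54, |frame| {
        frame[12..14].copy_from_slice(&[0x08, 0x00]); // EtherType: IPv4
    });
    assert_eq!(sent, Some(3 * 54)); // three frames, zero per-packet allocations
    println!("bytes queued: {:?}", sent);
}
```

Note the semantics are unchanged: each closure invocation still produces exactly one frame; only the allocation/copy per frame goes away.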
Pros
- Lower allocation pressure / fewer transient buffers (build into a preallocated write buffer). [docs.rs], [docs.rs]
- Often lower memory footprint in profilers because you’re not allocating per packet in user space (or causing extra copies). [docs.rs]
Cons / pitfalls
- You must size `packet_size` correctly and keep construction strictly within that slice.
- Your throughput can still be limited by syscall-per-packet and driver/NIC queueing (you’ve reduced copies, not necessarily syscalls).
- Some backends/options are “hints” and may be ignored. [docs.rs]
Tune pnet::datalink::Config for buffer sizing
Config has write_buffer_size and read_buffer_size (default 4096) that influence internal buffering.
While this doesn’t enable “multi-frame send”, it can: [docs.rs]
- prevent “insufficient capacity” paths in `build_and_send`,
- reduce internal reallocations if you accidentally trigger growth elsewhere. [docs.rs], [docs.rs]
Pitfall: Oversizing these buffers can increase baseline RSS if you create many channels/threads.
If your real goal is fewer syscalls / kernel crossings: use an API that supports batching
If your profiler shows “send” dominating time/memory, the root cause is often syscall/copy overhead and kernel socket buffer churn, not the packet crafting itself.
PACKET_MMAP / TX_RING (Linux): designed for batching & fewer copies
Linux kernel docs describe PACKET_MMAP as a shared ring buffer mapped into user space that can be used to send/receive packets efficiently, reducing syscalls and packet copies; for transmission, multiple packets can be sent through one system call.
A practical explanation also notes you can place multiple packets into the TX ring and then make one syscall to notify the kernel to transmit them. [kernel.org] [csulrong.github.io]
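A user-space toy model of the staging-then-kick idea (no real sockets, no pnet; the `kick` counter stands in for syscalls) shows the amortization: N packets staged into ring slots, one “kick” to flush them all.

```rust
// Toy model of the PACKET_MMAP TX_RING idea: packets are staged into ring
// slots with no kernel involvement, then a single "kick" stands in for the
// one syscall that tells the kernel to walk the ring and transmit.

struct TxRing {
    slots: Vec<Option<Vec<u8>>>,
    next: usize,
    kicks: usize, // stands in for the syscall count
}

impl TxRing {
    fn new(num_slots: usize) -> Self {
        Self { slots: vec![None; num_slots], next: 0, kicks: 0 }
    }

    /// Stage a packet into the next free slot (no "syscall").
    fn stage(&mut self, pkt: &[u8]) -> bool {
        if self.slots[self.next].is_some() {
            return false; // ring full: real code would kick and retry
        }
        self.slots[self.next] = Some(pkt.to_vec());
        self.next = (self.next + 1) % self.slots.len();
        true
    }

    /// One "kick": transmit everything staged, return the packet count.
    fn kick(&mut self) -> usize {
        self.kicks += 1;
        self.slots.iter_mut().filter_map(|s| s.take()).count()
    }
}

fn main() {
    let mut ring = TxRing::new(8);
    for i in 0..5u8 {
        assert!(ring.stage(&[i; 60])); // five minimal frames staged
    }
    let sent = ring.kick();
    println!("packets: {}, kicks: {}", sent, ring.kicks);
    assert_eq!((sent, ring.kicks), (5, 1)); // 5 packets for 1 "syscall"
}
```

The real TX_RING adds the parts this toy omits: slot status flags, frame alignment, and handling of a full ring, which is where most of the complexity lives.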
Pros
- Fewer syscalls (batch kick).
- Fewer copies (shared memory ring reduces user↔kernel copying). [kernel.org], [csulrong.github.io]
- Potentially much better CPU efficiency at high packet rates.
Cons / pitfalls
- Complexity: ring management, status flags, alignment, backpressure.
- Portability: Linux-specific (and kernel-version nuances).
- Still not “free”: NIC limits, qdisc, driver behavior can bottleneck.
- Not directly exposed by `pnet`’s public API (you’d likely need a different crate or custom bindings). (`pnet_datalink` mentions Linux uses AF_PACKET, but doesn’t claim PACKET_MMAP support.) [docs.rs]
AF_XDP / DPDK / netmap (even faster, even more complexity)
These are the next tier for raw packet TX at very high rates, but they’re much more invasive and typically require:
- pinned memory,
- special driver support,
- significant setup code,
- reduced portability.
(Just flagging the existence; not recommending unless you truly need it.)
“Bigger buffer” idea — pros, cons, pitfalls (explicitly)
✅ Potential pros (in theory)
- If your current code allocates one `Vec` per packet, switching to a reused buffer (not concatenated) can reduce allocator churn and transient memory.
- If you move to a ring/buffer pool, you can improve cache locality and reduce fragmentation.
❌ Cons / why it won’t accomplish what you think (with pnet)
- Concatenation does not mean multi-frame TX; it means one oversized frame. [man7.org]
- You may hit MTU limits and drop/error. [man7.org]
- Even if you keep the buffer ≤ MTU, you’ve just built one packet, not many.
⚠️ Pitfalls you’re likely to run into
- Silent drops due to size/driver constraints.
- Incorrect framing (Ethernet preamble/FCS are not in your buffer; NIC handles them per frame—another reason it can’t “split”).
- Backpressure behavior changes: buffering many packets in user space can increase latency and burstiness.
- Offload interactions: if you craft checksums yourself, NIC offloads can cause surprises; you may need to manage/disable offloads depending on your environment (generic caution).
Practical recommendations (lowest effort → highest impact)
1) Make sure you’re using build_and_send
This is the intended way in pnet to avoid extra copies vs send_to. [docs.rs], [docs.rs]
2) Reuse packet templates / avoid per-packet allocations
- Build a baseline Ethernet+IPv4+TCP SYN template once in a fixed-size buffer.
- For each packet, only mutate the fields that change (src IP/port, seq, checksum).
- Use a small buffer pool if you need parallelism.
(You’ll typically see big wins if your “send” cost is actually “allocate+copy+send”.)
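A self-contained sketch of that template pattern (addresses, ports, and sequence numbers are made up; the checksum is the standard RFC 1071 ones’-complement sum): build one Ethernet+IPv4+TCP SYN frame up front, then per probe mutate only the destination IP, destination port, and seq, and fix up the two checksums, with no per-packet allocation.

```rust
const ETH: usize = 14;        // Ethernet header: bytes 0..14
const IP: usize = ETH;        // IPv4 header:     bytes 14..34
const TCP: usize = ETH + 20;  // TCP header:      bytes 34..54

/// RFC 1071 ones'-complement checksum of `data`.
fn checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    let mut chunks = data.chunks_exact(2);
    for c in &mut chunks {
        sum += u32::from(u16::from_be_bytes([c[0], c[1]]));
    }
    if let [last] = chunks.remainder() {
        sum += u32::from(*last) << 8; // odd trailing byte, padded with zero
    }
    while sum > 0xffff {
        sum = (sum & 0xffff) + (sum >> 16); // fold carries
    }
    !(sum as u16)
}

/// Build the baseline SYN frame once (src IP/port are hypothetical).
fn build_template() -> [u8; 54] {
    let mut f = [0u8; 54];
    f[12..14].copy_from_slice(&[0x08, 0x00]);                // EtherType IPv4
    f[IP] = 0x45;                                            // version 4, IHL 5
    f[IP + 2..IP + 4].copy_from_slice(&40u16.to_be_bytes()); // total length
    f[IP + 8] = 64;                                          // TTL
    f[IP + 9] = 6;                                           // protocol: TCP
    f[IP + 12..IP + 16].copy_from_slice(&[10, 0, 0, 1]);     // src IP
    f[TCP..TCP + 2].copy_from_slice(&54321u16.to_be_bytes()); // src port
    f[TCP + 12] = 0x50;                                       // data offset 5
    f[TCP + 13] = 0x02;                                       // flags: SYN
    f[TCP + 14..TCP + 16].copy_from_slice(&64240u16.to_be_bytes()); // window
    f
}

/// Mutate only the per-probe fields, then recompute both checksums.
fn retarget(f: &mut [u8; 54], dst_ip: [u8; 4], dst_port: u16, seq: u32) {
    f[IP + 16..IP + 20].copy_from_slice(&dst_ip);
    f[TCP + 2..TCP + 4].copy_from_slice(&dst_port.to_be_bytes());
    f[TCP + 4..TCP + 8].copy_from_slice(&seq.to_be_bytes());

    // IPv4 header checksum over the 20 header bytes (field zeroed first).
    f[IP + 10..IP + 12].copy_from_slice(&[0, 0]);
    let ipsum = checksum(&f[IP..IP + 20]);
    f[IP + 10..IP + 12].copy_from_slice(&ipsum.to_be_bytes());

    // TCP checksum over pseudo-header + TCP header.
    f[TCP + 16..TCP + 18].copy_from_slice(&[0, 0]);
    let mut pseudo = Vec::with_capacity(12 + 20);
    pseudo.extend_from_slice(&f[IP + 12..IP + 20]); // src + dst IP
    pseudo.extend_from_slice(&[0, 6]);              // zero byte, protocol
    pseudo.extend_from_slice(&20u16.to_be_bytes()); // TCP segment length
    pseudo.extend_from_slice(&f[TCP..TCP + 20]);
    let tcpsum = checksum(&pseudo);
    f[TCP + 16..TCP + 18].copy_from_slice(&tcpsum.to_be_bytes());
}

fn main() {
    let mut frame = build_template();
    retarget(&mut frame, [192, 0, 2, 10], 443, 1);
    // Summing an IPv4 header that includes its own checksum yields 0.
    assert_eq!(checksum(&frame[IP..IP + 20]), 0);
    retarget(&mut frame, [192, 0, 2, 11], 80, 2); // same buffer, next target
    assert_eq!(checksum(&frame[IP..IP + 20]), 0);
    println!("SYN frame ready, {} bytes", frame.len());
}
```

With pnet this buffer would be copied into (or built inside) the slice that `build_and_send` hands you; the key point is that per probe only a handful of bytes change.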
3) Tune Config.write_buffer_size
Ensure it’s comfortably ≥ your frame size so build_and_send has capacity and doesn’t fall back or fail. [docs.rs], [docs.rs]
4) If you truly need batching across syscalls: switch backend
If the syscall boundary is the bottleneck, the real fix is a batching-capable mechanism like PACKET_MMAP TX_RING on Linux.
This is a design change (and likely not pnet), but it’s the correct way to do “send many packets with one kick”. [kernel.org], [csulrong.github.io]
A couple clarifying questions (so I can give sharper advice)
- Which OS are you targeting (Linux only, or cross-platform)? `pnet` backends differ a lot. [docs.rs]
- When you say “send is taking a large percentage of total memory used”: do you mean RSS growth, allocations/sec, or heap retained (e.g., from jemalloc/mimalloc stats)?
- Are you currently calling `send_to` with a freshly allocated `Vec<u8>` each time, or reusing a buffer?
If you share a small (sanitized) snippet of your TX loop (just buffer creation + pnet send call), I can point out the most likely allocation/copy hotspots and how to restructure them without changing your overall architecture.