Idea
This came to me during an outdoor run, the first one after a month or two of letting myself go. A good reminder that working out really does benefit you and makes life better overall.
LATER UPDATE: I did some work on this. Check the following branch: https://github.com/thiezn/mrpf/compare/main...mm/attempttomovetosendmsg
My idea:
I'm building a very fast and memory-efficient TCP SYN scanner using the Rust pnet library at the datalink layer.
I've split transmit and receive of packets into separate threads.
The transmit thread currently composes a TCP packet from the Ethernet layer up, and then uses the pnet datalink tx channel to send the traffic out. The send function is taking up a large percentage of the total memory used.
What if I created a larger buffer that combines multiple packets, and then sent that full buffer through the datalink tx channel in one go? Would the network layer below be able to send those packets as-is, and would it reduce the memory footprint and cut down on switching between user and kernel space?
Think carefully about your response and outline the pros, cons, and potential pitfalls.
Copilot's response
It sounds like you're working on an interesting and challenging project! Let's break down your idea and consider the pros, cons, and potential pitfalls of using a larger buffer to combine multiple packets before sending them.
Pros
- Reduced Memory Footprint: By combining multiple packets into a single buffer, you might reduce the overhead associated with multiple small allocations, potentially lowering the overall memory usage.
- Fewer System Calls: Sending a larger buffer in one go could reduce the number of system calls, which might decrease the overhead of context switching between user and kernel space.
- Improved Throughput: Aggregating packets could lead to better utilization of the network interface, potentially improving throughput.
Cons
- Increased Latency: Waiting to fill a larger buffer before sending could introduce latency, especially if the buffer size is too large or if packet generation is not fast enough.
- Complexity in Buffer Management: Managing a larger buffer and ensuring that packets are correctly combined and sent can add complexity to your code. This includes handling partial sends and ensuring data integrity.
- Potential for Packet Loss: If the buffer is not managed properly, there is a risk of packet loss, especially if the buffer overflows or if there are errors in combining packets.
Potential Pitfalls
- Network Layer Limitations: The underlying network layer might have limitations on the maximum transmission unit (MTU). If your combined buffer exceeds this size, it could lead to fragmentation, which might negate some of the performance benefits.
- Error Handling: With a larger buffer, error handling becomes more critical. If an error occurs during the transmission of a large buffer, it could affect multiple packets, leading to more significant data loss.
- Synchronization Issues: If multiple threads are involved in filling and sending the buffer, you need to ensure proper synchronization to avoid race conditions and data corruption.
Recommendations
- Buffer Size Tuning: Experiment with different buffer sizes to find an optimal balance between reducing memory usage and minimizing latency.
- Error Handling: Implement robust error handling to manage partial sends and retransmissions effectively.
- Profiling and Testing: Profile your application to understand the impact of the changes on memory usage and performance. Conduct thorough testing to ensure that the new approach does not introduce new issues.
By carefully considering these factors, you can make an informed decision on whether to implement a larger buffer for sending packets. Good luck with your TCP SYN scanner project! If you have any more questions or need further assistance, feel free to ask.
GitHub Copilot's response in VS Code to the same question
GitHub Copilot: Combining multiple packets into a larger buffer and sending them in one go can indeed reduce the overhead associated with frequent system calls and context switches between user and kernel space. This approach can lead to better performance and reduced memory footprint. However, there are several pros, cons, and potential pitfalls to consider:
Pros
- Reduced System Call Overhead: By batching multiple packets into a single buffer, you reduce the number of system calls, which can be expensive in terms of CPU cycles and context switching.
- Improved Throughput: Sending larger buffers can improve throughput as the network interface can handle larger chunks of data more efficiently.
- Lower Memory Fragmentation: Allocating and deallocating memory for individual packets can lead to fragmentation. Using a larger buffer can help mitigate this issue.
Cons
- Increased Latency: Batching packets may introduce latency, as packets are held in the buffer until it is full or a timeout occurs.
- Complexity: Implementing a buffering mechanism adds complexity to the code. You need to manage the buffer, handle partial writes, and ensure thread safety.
- Buffer Management: You need to carefully manage the buffer size to avoid excessive memory usage or buffer overflows.
Potential Pitfalls
- Buffer Overflow: If the buffer size is not managed properly, it can overflow, leading to data loss or corruption.
- Thread Safety: If multiple threads are accessing the buffer, you need to ensure thread safety using synchronization mechanisms, which can introduce overhead.
- Network Interface Limitations: Some network interfaces may have limitations on the maximum buffer size they can handle. You need to ensure that the buffer size is within these limits.
- Error Handling: Handling errors in a batched send operation can be more complex. You need to ensure that partial writes are handled correctly and that the buffer is properly managed in case of errors.
Implementation Outline
- Buffer Initialization: Initialize a buffer to hold multiple packets.
- Packet Composition: Compose packets and add them to the buffer.
- Buffer Flush: When the buffer is full or a timeout occurs, send the buffer using the datalink tx channel.
- Thread Safety: Ensure that the buffer is accessed in a thread-safe manner.
Here's a basic implementation outline:
```rust
use pnet::datalink::{DataLinkSender, MacAddr};
use rand::Rng;
use std::sync::mpsc::{Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

const BUFFER_SIZE: usize = 4096; // Adjust as needed
const FLUSH_INTERVAL: Duration = Duration::from_millis(50);

struct PacketBuffer {
    buffer: Vec<u8>,
    last_flush: Instant,
}

impl PacketBuffer {
    fn new() -> Self {
        PacketBuffer {
            buffer: Vec::with_capacity(BUFFER_SIZE),
            last_flush: Instant::now(),
        }
    }

    /// Append a fully composed frame to the buffer.
    fn add_packet(&mut self, packet: &[u8]) {
        self.buffer.extend_from_slice(packet);
    }

    /// Flush once the buffer is full or the flush interval has elapsed.
    fn should_flush(&self) -> bool {
        self.buffer.len() >= BUFFER_SIZE || self.last_flush.elapsed() >= FLUSH_INTERVAL
    }

    fn flush(&mut self, tx: &mut dyn DataLinkSender) {
        if !self.buffer.is_empty() {
            // send_to returns Option<io::Result<()>>; unwrapping both layers
            // here for brevity -- real code should handle the error.
            tx.send_to(&self.buffer, None).unwrap().unwrap();
            self.buffer.clear();
            self.last_flush = Instant::now();
        }
    }
}

// TcpSynMessage, ScannerConfig, Ipv4Addresses, Ports and
// create_syn_tcp_packet_v4 come from the scanner crate itself.
pub(crate) fn transmit_handler(
    mut tx: Box<dyn DataLinkSender>,
    packet_rx: Receiver<Vec<u8>>,
    progress_tx: Sender<TcpSynMessage>,
    mut config: ScannerConfig,
    router_mac: MacAddr,
    ipv4_addresses: Ipv4Addresses,
    ports: Ports,
) {
    let packet_buffer = Arc::new(Mutex::new(PacketBuffer::new()));

    // Start rate limiter
    config.rate_limiter.start();

    for dst_port in ports {
        let src_port: u16 = rand::thread_rng().gen_range(49152..=60999);
        for dst_ip in ipv4_addresses.clone() {
            let packet = create_syn_tcp_packet_v4(src_port, dst_port, dst_ip, router_mac, &config);
            {
                let mut buffer = packet_buffer.lock().unwrap();
                buffer.add_packet(&packet);
                if buffer.should_flush() {
                    buffer.flush(&mut *tx);
                }
            }
        }
    }

    // Ensure any remaining packets are sent
    {
        let mut buffer = packet_buffer.lock().unwrap();
        buffer.flush(&mut *tx);
    }
}
```
Explanation
- `PacketBuffer` Struct: Manages the buffer and handles adding packets and flushing the buffer.
- Buffer Initialization: The buffer is initialized with a capacity of `BUFFER_SIZE`.
- Packet Composition: Packets are composed and added to the buffer.
- Buffer Flush: The buffer is flushed when it is full or when the flush interval has elapsed.
- Thread Safety: The buffer is wrapped in an `Arc<Mutex<>>` to ensure thread safety.
This implementation reduces the number of system calls and context switches, potentially improving performance and reducing memory usage. However, you need to carefully manage the buffer size and ensure thread safety to avoid potential pitfalls.
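To show where the `tx` handle in that sketch comes from, here's a minimal wiring sketch of opening pnet's datalink channel and moving the transmit half onto its own thread. The interface name and the mpsc plumbing are placeholders, not the scanner's actual setup:

```rust
use pnet::datalink::{self, Channel};
use std::sync::mpsc;
use std::thread;

fn main() {
    // "eth0" is a placeholder for whatever interface the scanner targets.
    let interface = datalink::interfaces()
        .into_iter()
        .find(|i| i.name == "eth0")
        .expect("interface not found");

    let (mut tx, _rx) = match datalink::channel(&interface, Default::default()) {
        Ok(Channel::Ethernet(tx, rx)) => (tx, rx),
        Ok(_) => panic!("unhandled channel type"),
        Err(e) => panic!("failed to open datalink channel: {e}"),
    };

    // Pre-composed frames flow to the transmit thread over a plain mpsc queue.
    let (frame_tx, frame_rx) = mpsc::channel::<Vec<u8>>();

    let transmitter = thread::spawn(move || {
        for frame in frame_rx {
            // One syscall per frame here; the PacketBuffer above exists to
            // batch these and cut down on user/kernel transitions.
            if let Some(Err(e)) = tx.send_to(&frame, None) {
                eprintln!("send failed: {e}");
            }
        }
    });

    // In the real scanner, the packet-composition loop feeds frame_tx.
    drop(frame_tx);
    transmitter.join().unwrap();
}
```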
After initial implementation
It seems pnet uses the libc::sendto call to send the packets. This call expects to be handed a single packet, which the kernel wraps into a single frame. That means all the appended data (i.e. every packet after the first) is sent, but dropped completely by the receiving end.
According to Copilot, you could instead use the sendmsg call to accomplish this. That does mean we won't be able to leverage pnet for it. Perhaps we need to bite the bullet and write our own datalink layer implementation. A bunch of work, obviously, but it would be a good learning experience.
The answer in this Stack Overflow post does indeed confirm my suspicion that constructing this yourself using sendmsg might improve performance when doing this from user space. It states that sendto is actually just a wrapper around sendmsg.
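To make that wrapper relationship concrete, here's a conceptual sketch in Rust over libc of what a sendto boils down to. The function name is made up for illustration; this mirrors what the kernel does internally, it is not its actual code:

```rust
use std::io;

/// Conceptual sketch: sendto(fd, buf, ...) is equivalent to a sendmsg()
/// carrying a single iovec.
unsafe fn sendto_via_sendmsg(
    fd: libc::c_int,
    buf: &[u8],
    addr: *mut libc::sockaddr,
    addr_len: libc::socklen_t,
) -> io::Result<isize> {
    // One buffer becomes one iovec entry.
    let mut iov = libc::iovec {
        iov_base: buf.as_ptr() as *mut libc::c_void,
        iov_len: buf.len(),
    };
    // Zeroed msghdr, then fill in the destination and the single iovec.
    let mut msg: libc::msghdr = unsafe { std::mem::zeroed() };
    msg.msg_name = addr as *mut libc::c_void;
    msg.msg_namelen = addr_len;
    msg.msg_iov = &mut iov;
    msg.msg_iovlen = 1;
    match unsafe { libc::sendmsg(fd, &msg, 0) } {
        -1 => Err(io::Error::last_os_error()),
        n => Ok(n),
    }
}
```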
If we're able to get the socket fd from pnet, I should be able to just add this sendmsg function myself. I'll have to look into this. Maybe even a nice PR to pnet if it works out. OK, looking at the code: on Linux you can provide a socket to the config object. That means we can create the socket ourselves, which will look something like this:
```rust
let socket = match unsafe { libc::socket(libc::AF_PACKET, libc::SOCK_RAW, libc::ETH_P_ALL.to_be()) } {
    -1 => return Err(io::Error::last_os_error()),
    fd => fd,
};
```
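From there, handing that socket to pnet could look something like the sketch below. Caveat: the `socket_fd` field name is my assumption about pnet's Linux `Config`; verify it against the pnet version in use:

```rust
use pnet::datalink::{self, Channel, Config};
use std::io;

// Sketch: create our own AF_PACKET socket and hand it to pnet.
fn channel_with_own_socket(iface_name: &str) -> io::Result<Channel> {
    let socket = match unsafe {
        libc::socket(libc::AF_PACKET, libc::SOCK_RAW, libc::ETH_P_ALL.to_be())
    } {
        -1 => return Err(io::Error::last_os_error()),
        fd => fd,
    };

    let interface = datalink::interfaces()
        .into_iter()
        .find(|i| i.name == iface_name)
        .expect("interface not found");

    let config = Config {
        socket_fd: Some(socket), // assumed field name; see note above
        ..Config::default()
    };

    datalink::channel(&interface, config)
}
```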
Here's Copilot's response on a sendmsg implementation:
Yes, you can use the sendmsg system call to send multiple packets in a single call. The sendmsg function allows you to use a scatter/gather array, which can be used to send multiple buffers (or packets) in one go. This is done using the msghdr structure, which includes an array of iovec structures, each pointing to a different buffer.
Example of sendmsg Usage
Here's a simplified example in C to illustrate how you might use sendmsg to send multiple packets:
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h> /* for inet_addr() */
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int sockfd;
    struct sockaddr_in dest_addr;
    struct msghdr msg;
    struct iovec iov[2];
    char packet1[] = "Packet 1 data";
    char packet2[] = "Packet 2 data";

    // Create socket (and bail out if it fails)
    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return 1;
    }

    // Destination address
    memset(&dest_addr, 0, sizeof(dest_addr));
    dest_addr.sin_family = AF_INET;
    dest_addr.sin_port = htons(12345);
    dest_addr.sin_addr.s_addr = inet_addr("192.168.1.1");

    // Prepare iovec structures
    iov[0].iov_base = packet1;
    iov[0].iov_len = strlen(packet1);
    iov[1].iov_base = packet2;
    iov[1].iov_len = strlen(packet2);

    // Prepare msghdr structure
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &dest_addr;
    msg.msg_namelen = sizeof(dest_addr);
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    // Send message
    if (sendmsg(sockfd, &msg, 0) == -1) {
        perror("sendmsg");
        close(sockfd);
        return 1;
    }

    close(sockfd);
    return 0;
}
```
Explanation
- `iovec` Structures: These structures point to the individual packets you want to send.
- `msghdr` Structure: This structure contains the destination address and the array of `iovec` structures.
- `sendmsg` Call: This sends all the packets in the `iovec` array in a single system call.
Benefits
- Efficiency: Reduces the number of system calls, potentially improving performance.
- Flexibility: Allows you to send multiple packets or buffers in one go.
Considerations
- MTU Size: Ensure that the total size of the combined packets does not exceed the MTU size to avoid fragmentation.
- Error Handling: Implement robust error handling to manage partial sends and retransmissions.
Using sendmsg can help you achieve your goal of sending multiple packets efficiently. If you have more questions or need further details, feel free to ask!
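One caveat to add here: on a datagram-style socket (which AF_PACKET/SOCK_RAW is), the iovecs of a single sendmsg are gathered into one message, i.e. one frame on the wire, which is exactly the trap we hit with sendto. The batched variant that really does push multiple independent frames through one syscall is Linux's sendmmsg, which takes one msghdr per packet. A rough sketch over libc, assuming `sockfd` is the AF_PACKET socket from earlier, already bound to an interface, and `packets` holds fully composed Ethernet frames:

```rust
use std::io;

/// Send each frame in `packets` as its own link-layer message in one syscall.
/// Sketch only: partial-send retries and EINTR handling are elided.
fn send_batch(sockfd: libc::c_int, packets: &[Vec<u8>]) -> io::Result<usize> {
    // One iovec per packet...
    let mut iovecs: Vec<libc::iovec> = packets
        .iter()
        .map(|p| libc::iovec {
            iov_base: p.as_ptr() as *mut libc::c_void,
            iov_len: p.len(),
        })
        .collect();

    // ...and one mmsghdr per iovec, so every entry goes out as an independent
    // frame (unlike a single sendmsg, which merges all iovecs into one frame).
    let mut msgs: Vec<libc::mmsghdr> = iovecs
        .iter_mut()
        .map(|iov| {
            let mut hdr: libc::msghdr = unsafe { std::mem::zeroed() };
            hdr.msg_iov = iov as *mut libc::iovec;
            hdr.msg_iovlen = 1;
            libc::mmsghdr { msg_hdr: hdr, msg_len: 0 }
        })
        .collect();

    // Single user/kernel transition for the whole batch.
    let sent = unsafe { libc::sendmmsg(sockfd, msgs.as_mut_ptr(), msgs.len() as libc::c_uint, 0) };
    if sent == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(sent as usize)
}
```

sendmmsg returns how many of the messages actually went out, so a production transmit loop would retry from that index onwards.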