Reducing pselect calls
Looking at the flame graph, the receive thread spends about half of its time in the pselect system call. Since the call uses a timeout, some of that time may simply be spent waiting because there are no packets to receive. However, looking at the implementation, it seems the pnet code is only reading a single packet each time pselect reports that data is ready.
Instead, we should try to read more packets to avoid extra pselect calls when more packets are ready.
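To make the idea concrete, here is a minimal sketch of "drain everything that is ready before waiting again". It is not the pnet code: it uses a non-blocking std::net::UdpSocket as a stand-in for the datalink receiver, and drain_ready is a hypothetical helper name.

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;

/// Reads every datagram that is already queued on `socket` and returns them.
/// Assumes the socket was put in non-blocking mode, so `recv` fails with
/// `WouldBlock` instead of waiting once the queue is empty.
fn drain_ready(socket: &UdpSocket, buf: &mut [u8]) -> io::Result<Vec<Vec<u8>>> {
    let mut packets = Vec::new();
    loop {
        match socket.recv(buf) {
            // One datagram per successful call; copy it out of the scratch buffer.
            Ok(len) => packets.push(buf[..len].to_vec()),
            // Queue is empty: only now is it worth going back to pselect.
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(packets),
            // Interrupted by a signal: just retry the read.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

With a loop like this, a burst of N queued packets costs one readiness wait plus N+1 reads, instead of N waits and N reads.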
UPDATE: No, actually it seems the code IS reading multiple packets per pselect call. How many it reads depends on the read buffer size of the datalink, so we could perhaps improve performance a bit by experimenting with that buffer size.
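As a rough way to reason about the buffer size, here is a toy model, not a measurement. It assumes each read returns as many whole frames as fit in the buffer, and it ignores any per-frame headers the backend may add. reads_needed is a hypothetical helper, and the sizes in the note below are arbitrary example values.

```rust
/// Counts how many reads are needed to consume `frame_lens` (in bytes) when
/// every read returns as many whole frames as fit into `buffer_size` bytes.
/// Returns `None` if some frame is larger than the buffer and can never fit.
fn reads_needed(frame_lens: &[usize], buffer_size: usize) -> Option<usize> {
    let mut reads = 0;
    let mut used = 0; // bytes occupied in the current read
    for &len in frame_lens {
        if len > buffer_size {
            return None;
        }
        // Start a new read for the first frame, or when this frame
        // would overflow the current buffer.
        if reads == 0 || used + len > buffer_size {
            reads += 1;
            used = 0;
        }
        used += len;
    }
    Some(reads)
}
```

Under these assumptions, eight queued 1500-byte frames need four reads with a 4096-byte buffer but only one with a 16384-byte buffer. Whether that translates into fewer pselect calls in practice would have to be confirmed with another flame graph.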