
RUDP API & implementation details

I committed the latest version of RUDP. After many changes in design and implementation I’ve got something that I’m satisfied with.

The API is very simple – packets can be sent and received on 32 individual channels:

bool send(uint8_t channel_idx, uint8_t const* data, size_t size);
bool try_sending(uint8_t channel_idx, uint8_t const* data, size_t size);

try_sending is like send but fails if the previous send on the same channel hasn’t finished yet. This can happen with video frames: a big keyframe takes a while to compress, and if there’s a context switch in the middle of the send, the async camera data callback might try to send again too soon – so it uses try_sending to avoid getting blocked in the send call.
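Roughly, the camera callback does something like this (just a sketch – the channel index, the RUDP class name and the callback signature are placeholders, not the real ones):

#include <cstddef>
#include <cstdint>

static const uint8_t VIDEO_CHANNEL = 4; // hypothetical channel index

// Called asynchronously for every encoded camera frame.
void on_camera_frame(RUDP& rudp, uint8_t const* frame_data, size_t frame_size)
{
    // If the previous video frame is still being sent, skip this one
    // instead of blocking inside send.
    if (!rudp.try_sending(VIDEO_CHANNEL, frame_data, frame_size))
    {
        // frame dropped – the previous send is still in flight
    }
}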

bool receive(uint8_t channel_idx, std::vector<uint8_t>& data);

This call fills the data vector with a packet from the given channel and returns true if it succeeds.
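Draining a channel on the receiving side looks roughly like this (again a sketch – TELEMETRY_CHANNEL and process_telemetry are placeholders):

#include <cstdint>
#include <vector>

// Drain all pending packets from one channel.
void poll_telemetry(RUDP& rudp)
{
    static const uint8_t TELEMETRY_CHANNEL = 1; // hypothetical channel index
    std::vector<uint8_t> packet;
    while (rudp.receive(TELEMETRY_CHANNEL, packet))
    {
        // each successful call yields one complete packet
        process_telemetry(packet.data(), packet.size());
    }
}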

Each of these channels has a few parameters that control the trade-offs between bandwidth, latency, reliability and speed.

The most important parameter is the one that controls delivery confirmations: Send_Params::is_reliable. Reliable packets keep getting resent until the other end confirms or cancels them – or until Send_Params::cancel_after time runs out. Silkopter uses reliable packets for the remote control data, calibration data and PID params (the silk::Comm_Message enum).

Unreliable packets are sent only once, even if they get lost on the way. They are used for telemetry and the video stream, since this data gets stale very fast – 33ms for both video and telemetry.
A useful parameter for unreliable packets is Send_Params::cancel_on_new_data. When true, new data cancels all existing unsent data on the same channel. This is very useful on low bandwidth, when video frames can take longer than 33ms to send. Another parameter – this time at the receiving end – is Receive_Params::max_receive_time, which indicates how long to wait for a packet. It’s useful for the video stream in case frame X is not ready yet but frame X+1 is already available. When a packet is skipped because of this parameter, a cancel request is sent to the other end to indicate that the receiver is no longer interested in that data. This saves quite a bit of bandwidth.
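Roughly, the send/receive parameters look like this (a simplified sketch – only the member names above are the real ones, the types and defaults here are just for illustration):

#include <chrono>

// Simplified sketch of the send/receive parameters described above.
// Only the member names are real; types and defaults are illustrative.
struct Send_Params
{
    bool is_reliable = false;                      // resend until confirmed or cancelled
    std::chrono::milliseconds cancel_after{0};     // stop resending after this much time
    bool cancel_on_new_data = false;               // new data cancels unsent data on the same channel
};

struct Receive_Params
{
    std::chrono::milliseconds max_receive_time{0}; // how long to wait for a packet before skipping it
};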

Zlib compression can be enabled per channel – and it’s on for all channels in silkopter, including the video stream, where it saves between 3 and 10% of the frame size at a ~10% CPU cost.

 

Internally, packets are split into fragments of MTU size (currently 1KB). Each fragment is identified by an ID + fragment index – so fragments from the same packet share the ID.

The first fragment has a different header than the rest.

Fragments are sent as datagrams, same as pings, confirmations and cancel requests.
A datagram has a small header (5 bytes) containing the CRC of all the data and the type of the datagram. Based on the type, the header can be cast to a specialized header.
The CRC is actually a murmur hash of the datagram data, and I’m not sure it’s really needed since UDP has its own checksum – but better safe than sorry. It’s very fast anyway and doesn’t even show up in the profiler.
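The header looks more or less like this (a sketch – the exact field sizes and datagram type values are illustrative, but a 4-byte hash plus a 1-byte type does add up to the 5 bytes):

#include <cstdint>

// Sketch of the 5-byte datagram header. The specialized headers (fragment,
// confirmation, cancel request, ping) would extend this based on the type.
enum class Type : uint8_t
{
    FRAGMENT,        // regular fragment
    FRAGMENT_FIRST,  // first fragment of a packet, carries extra info
    CONFIRMATION,
    CANCEL,
    PING,
};

#pragma pack(push, 1)
struct Header
{
    uint32_t crc;    // murmur hash of the datagram data
    Type type;       // used to cast the header to a specialized one
};
#pragma pack(pop)

static_assert(sizeof(Header) == 5, "the header should stay 5 bytes");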

The datagrams are managed by a pool using intrusive pointers, to avoid allocating the datagram data (a std::vector) or the ref count (as would happen with std::shared_ptr).
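A minimal sketch of what this pooling looks like (simplified – the real code is more involved, and the names here are just for illustration):

#include <boost/intrusive_ptr.hpp>
#include <cstdint>
#include <vector>

// Minimal sketch of a datagram pool built on boost::intrusive_ptr.
// The ref count lives inside the datagram itself, so neither the buffer
// nor a separate shared_ptr control block gets allocated per datagram.
struct Datagram
{
    std::vector<uint8_t> data;
    int ref_count = 0;
};

inline void intrusive_ptr_add_ref(Datagram* d) { ++d->ref_count; }
inline void intrusive_ptr_release(Datagram* d)
{
    // A real pool would push the datagram back on a free list here
    // instead of deleting it; kept trivial for brevity.
    if (--d->ref_count == 0) delete d;
}

struct Pool
{
    std::vector<Datagram*> free_list;

    boost::intrusive_ptr<Datagram> acquire()
    {
        Datagram* d;
        if (!free_list.empty())
        {
            d = free_list.back();
            free_list.pop_back();
            d->data.clear(); // the vector keeps its capacity, so no reallocation
        }
        else
        {
            d = new Datagram();
        }
        return boost::intrusive_ptr<Datagram>(d);
    }
};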

My test so far was this:
With both silkopter and the GS far from the access point, I’m sending a video stream with enough bandwidth to choke the wifi – ~400KB/s in my current test scenario. Then I’m pushing data through a reliable channel at ~10-20 packets per second, amounting to 6KB/s. So far all the data got through in less than 200-300ms, which is 2-3x my max RTT. I’m pretty happy with the result, considering that in my previous setup – TCP for reliable data + UDP for video – I was getting 2-3 seconds of lag even next to the access point in some worst cases.

 

The only thing missing is handling the case when one end restarts. This is problematic because RUDP keeps track of the last valid received packet ID and ignores packets with IDs smaller than that. So when one of the ends restarts, all its packets are ignored until it reaches the ID of the last packet sent before the restart… Not good.

 


RUDP first benchmark

I ran the first RUDP tests and here’s what I got:

Throughput test:
With zlib compression, localhost, release build and 16MB messages in a tight loop – ~80MB/s. It’s limited by the compression speed.
Same test but without compression – ~1GB/s.

Message spam test:
Localhost, release build, 200-byte messages in a tight loop – ~70K messages/s.
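For reference, the spam test is basically just this kind of loop (a sketch – the two connected RUDP instances and the channel index are placeholders):

#include <chrono>
#include <cstdint>
#include <vector>

// Rough shape of the message spam test: push fixed-size messages through
// one end in a tight loop and count how many come out the other end.
void spam_test(RUDP& sender, RUDP& receiver)
{
    std::vector<uint8_t> message(200, 0xAB); // 200-byte payload
    std::vector<uint8_t> received;
    size_t count = 0;

    auto start = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - start < std::chrono::seconds(10))
    {
        sender.send(0, message.data(), message.size());
        while (receiver.receive(0, received))
        {
            count++;
        }
    }
    // messages per second = count / 10
}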

The main purpose of the tests was to check for obvious bottlenecks and other issues. I found a couple that were crippling performance – like my pool not working correctly and losing some datagrams, or the allocation of the shared_ptr ref count (which I just replaced with a boost::intrusive_ptr). So far it seems to work ok.

I need to redo the tests on the Raspberry Pi, but there I’ll be limited by the wifi for sure.

 

Next on my list are:

– Handle connection loss. Right now, if one end drops, the other end keeps queuing up messages until it runs out of memory.

– Limit the number of pending messages allowed. If bandwidth is low I need to cut down the data rate to avoid messages stacking up in the RUDP queues.

– Tune it on the actual hardware. I need to figure out the optimal MTU and minimum resend period. The MTU will be a compromise between too much protocol overhead and very expensive resends, and the minimum resend period a compromise between latency and bandwidth.

 

 

UDP broadcasting

For the past 2 days I’ve been investigating why my RUDP protocol has a way worse ping than iperf. I found numerous bugs and optimizations during this time, but I was never able to get the ping below 100ms, while iperf in UDP mode was getting 3-7ms…

It turns out that using broadcasting increases the RTT a lot and causes some packet loss. Maybe it’s just my network that behaves like this with broadcasting, but after I removed it my ping was a solid 4ms. Not too bad for wifi going through 3 walls.

So the culprit was this line:
m_send_socket.set_option(socket_base::broadcast(true));

I did this a few days back when I was too lazy to implement proper discovery, so I made all comms broadcast.
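The fix is to drop the broadcast option and send to the peer’s unicast endpoint once it’s known. A minimal boost::asio sketch, with a made-up address and port:

#include <boost/asio.hpp>
#include <cstddef>
#include <cstdint>

// Send to a known unicast endpoint instead of broadcasting.
// The address and port are placeholders; in practice they would come
// from a proper discovery step.
void send_unicast(uint8_t const* data, size_t size)
{
    boost::asio::io_service io_service;
    boost::asio::ip::udp::socket socket(io_service);
    socket.open(boost::asio::ip::udp::v4());
    // note: no socket_base::broadcast(true) option here

    boost::asio::ip::udp::endpoint remote(
        boost::asio::ip::address::from_string("192.168.1.110"), 52520);

    socket.send_to(boost::asio::buffer(data, size), remote);
}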

 

Reliable UDP

Silkopter uses 2 sockets to communicate with the ground station: a reliable TCP socket for the remote control, diagnostics, telemetry etc., and an unreliable UDP socket for the video stream. Both go through an Alfa AWUS036H wifi card.

The UAV brain splits video frames into 1KB packets in order to minimize their chances of corruption, and marks each of them with the frame index and a timestamp. The ground station receives these packets and tries to put them in order. It waits up to 100ms for all the packets of a frame to arrive before either presenting the frame to the user or discarding it. So it allows some lag to improve video quality, but above 100ms it favors real-time-ness over quality.
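The decision on the GS side boils down to something like this (a simplified sketch – the types and names are made up, only the 100ms policy is the real one):

#include <chrono>
#include <cstdint>
#include <vector>

// Simplified sketch of the ground station reassembly policy described above.
struct Partial_Frame
{
    std::chrono::steady_clock::time_point first_packet_time;
    size_t packets_received = 0;
    size_t packets_expected = 0;
    std::vector<uint8_t> data;
};

// Returns true when the GS should stop waiting on this frame – either it is
// complete, or the 100ms deadline has passed (present it if usable,
// discard it otherwise and move on to the next frame).
bool should_stop_waiting(Partial_Frame const& frame)
{
    if (frame.packets_expected > 0 &&
        frame.packets_received == frame.packets_expected)
    {
        return true; // all packets arrived
    }
    auto waited = std::chrono::steady_clock::now() - frame.first_packet_time;
    return waited > std::chrono::milliseconds(100); // favor real-time-ness over quality
}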

All the remote control, telemetry and sensor data is sent through TCP and in theory, given a good enough wifi link, this should be ideal. But it’s not, not even close. I consistently get 300-2000ms (yes, up to 2 full seconds) of lag over this channel.
All of this is due to how TCP treats network congestion. When lots of packets are lost, TCP assumes the network is under a lot of pressure and throttles down its data rate in an attempt to prevent even more packet loss. This is precisely what is causing my lag – the TCP traffic hits the heavy, raw UDP video traffic, thinks the network is congested, and slows down a lot. It doesn’t realize that I care more about the TCP traffic than the UDP one, so I end up with smooth video but zero control.

My solution is to create a new reliable protocol over UDP and send control, telemetry and sensor data over this channel, in parallel to the video traffic. In low bandwidth situations I can favor the critical control data over the video one.

There are lots of reliable UDP libraries out there, but I’ve always preferred writing my own for simple enough problems, when there’s the potential of getting something better suited to my needs (not to mention the learning experience).

So my design is this:

  1. I will have a transport layer that can send messages – binary blobs – and present some channels where received messages can be accessed.
  2. Data is split into messages, and each message has the following properties:
    1. MTU. Big messages can be split into smaller packets. If a packet is lost, it will be resent if needed. This should limit the amount of data to resend.
    2. Priority. Messages with higher priority are sent before those with lower priority.
    3. Delivery control. If enabled, the receiving end sends a delivery confirmation for these messages. If the confirmation is not received within X seconds, the message is resent. The confirmations have a priority too – just like any other message.
    4. Waiting time. If a low priority message has waited for too long, it gets its priority bumped. For video, I’ll use a 500ms waiting time to make sure I get some video even in critical situations.
  3. Channels. Each message will have a channel number so that the receiving end can configure some options per channel. Video will have one channel index, telemetry another and so on. The GS will be able to configure a 100ms waiting period for out-of-order messages on the video channel, for example.
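In code, these per-message properties might translate to something like the sketch below. Names, types and defaults are purely illustrative – the actual API will probably end up different:

#include <chrono>
#include <cstddef>
#include <cstdint>

// Rough sketch of the per-message properties listed above.
// Names and types are illustrative only, not the actual implementation.
struct Message_Params
{
    size_t mtu = 1024;                            // big messages are split into packets of this size
    uint8_t priority = 0;                         // higher priority messages are sent first
    bool delivery_control = false;                // resend until a delivery confirmation arrives
    std::chrono::milliseconds resend_after{50};   // resend if not confirmed within this time
    std::chrono::milliseconds waiting_time{500};  // bump the priority after waiting this long
    uint8_t channel_idx = 0;                      // the receiving end configures options per channel
};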

 

My current implementation uses boost::asio and I intend to keep using it.

As soon as I finish the stability PIDs I’ll move to the new protocol.