Reliable UDP

Silkopter uses two sockets to communicate with the ground station: a reliable TCP socket for remote control, diagnostics, telemetry, etc., and an unreliable UDP socket for the video stream. Both go through an Alfa AWUS036H wifi card.

The UAV brain splits video frames into 1 KB packets to minimize their chances of corruption, and marks each of them with the frame index and timestamp. The ground station receives these packets and tries to put them in order. It waits up to 100ms for all the packets of a frame to arrive before either presenting the frame to the user or discarding it. So it allows some lag to improve video quality, but above 100ms it favors real-time-ness over quality.
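The ground-station side of this could be sketched roughly as below. This is only an illustration under assumptions, not the actual Silkopter code: the `VideoPacket` layout (in particular `packet_index` and `packet_count`), the `FrameAssembler` class and its method names are all hypothetical. It collects packets per frame, hands out a frame once every packet has arrived, and drops frames that stay incomplete longer than the configured wait (100ms in the text).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical packet layout; packet_index and packet_count are assumptions.
struct VideoPacket {
    uint32_t frame_index;
    uint16_t packet_index;  // position of this packet inside the frame
    uint16_t packet_count;  // total number of packets in the frame
    std::vector<uint8_t> payload;
};

class FrameAssembler {
    struct Pending {
        Clock::time_point first_seen;
        std::vector<std::optional<std::vector<uint8_t>>> parts;
        std::size_t received = 0;
    };
    std::chrono::milliseconds max_wait_;
    std::map<uint32_t, Pending> pending_;

public:
    explicit FrameAssembler(std::chrono::milliseconds max_wait) : max_wait_(max_wait) {
        assert(max_wait.count() >= 0);
    }

    // Stores a packet. Returns the whole frame once its last missing packet arrives.
    std::optional<std::vector<uint8_t>> add(const VideoPacket& p, Clock::time_point now) {
        if (p.packet_count == 0 || p.packet_index >= p.packet_count) return std::nullopt;
        auto [it, inserted] = pending_.try_emplace(p.frame_index);
        Pending& f = it->second;
        if (inserted) {
            f.first_seen = now;
            f.parts.resize(p.packet_count);
        }
        if (f.parts.size() != p.packet_count) return std::nullopt;  // inconsistent header
        if (!f.parts[p.packet_index]) {  // ignore duplicates
            f.parts[p.packet_index] = p.payload;
            ++f.received;
        }
        if (f.received < f.parts.size()) return std::nullopt;

        std::vector<uint8_t> frame;
        for (const auto& part : f.parts) frame.insert(frame.end(), part->begin(), part->end());
        pending_.erase(it);
        return frame;
    }

    // Drops frames that have been incomplete for longer than max_wait.
    // Returns how many frames were discarded.
    std::size_t discard_expired(Clock::time_point now) {
        std::size_t dropped = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (now - it->second.first_seen > max_wait_) {
                it = pending_.erase(it);
                ++dropped;
            } else {
                ++it;
            }
        }
        return dropped;
    }
};
```

In this sketch a late packet for an already discarded frame simply opens a new pending entry that expires again later; a real receiver would probably also remember the newest presented frame index and ignore anything older.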

All the remote control, telemetry and sensor data is sent through TCP and in theory, given a good enough wifi link, this should be ideal. But it’s not, not even close. I consistently get 300-2000ms (yes, 2 full seconds) of lag over this channel.
All of this is due to how TCP treats network congestion. When lots of packets are lost, TCP assumes the network is under a lot of pressure and throttles down its data rate in an attempt to prevent even more packet loss. This is precisely what is causing my lag: the TCP traffic hits the heavy, raw UDP video traffic and thinks the network is congested, so it slows down a lot. It doesn’t realize that I care more about the TCP traffic than the UDP traffic, so I end up having smooth video but zero control.

My solution is to create a new reliable protocol over UDP and send control, telemetry and sensor data over this channel, in parallel to the video traffic. In low bandwidth situations I can favor the critical control data over the video one.

There are lots of reliable UDP libraries, but for simple enough problems I have always preferred writing my own when there is the potential of getting something better suited to my needs (not to mention the learning experience).

So my design is this:

  1. I will have a transport layer that can send messages – binary blobs – and presents some channels where received messages can be accessed.
  2. Data is split into messages, and each message has the following properties:
    1. MTU. Big messages can be split into smaller packets. If a packet is lost, only that packet is resent if needed, which limits the amount of data to resend.
    2. Priority. Messages with higher priority are sent before those with lower priority.
    3. Delivery control. If enabled, these messages will request a delivery confirmation. If the confirmation is not received within X seconds, the message is resent. The confirmations also have a priority, just like any other message.
    4. Waiting time. If a low priority message has waited for too long, it will have its priority bumped. For video, I’ll use a 500ms waiting time to make sure I get some video even in critical situations.
  3. Channels. Each message will have a channel number so that the receiving end can configure some options per channel. Video will have one channel index, telemetry another and so on. The GS will be able to configure a 100ms waiting period for out-of-order messages on the video channel for example.
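To make the design more concrete, here is a minimal sketch of the per-message options and the MTU split. Everything in it is an assumption made for illustration: the `SendParams` and `Fragment` structs, their field names, the `split_message` helper and the 200ms resend default (the text only says "X seconds") do not come from the real implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Per-message options from the list above. All names are hypothetical.
struct SendParams {
    std::size_t mtu = 1024;                        // max payload bytes per fragment
    uint8_t priority = 0;                          // higher is sent first
    bool confirm_delivery = false;                 // ask the receiver for a confirmation
    std::chrono::milliseconds resend_after{200};   // the "X" timeout, value assumed
    std::chrono::milliseconds bump_after{500};     // waiting time before a priority bump
};

// One fragment on the wire. The channel lets the receiver pick per-channel options.
struct Fragment {
    uint8_t channel;
    uint32_t message_id;
    uint16_t index;   // position of this fragment in the message
    uint16_t count;   // total number of fragments in the message
    std::vector<uint8_t> payload;
};

// Splits a message into fragments of at most `mtu` payload bytes each.
// An empty message still produces one (empty) fragment.
std::vector<Fragment> split_message(uint8_t channel, uint32_t message_id,
                                    const std::vector<uint8_t>& data, std::size_t mtu) {
    assert(mtu > 0);
    const std::size_t count = std::max<std::size_t>(1, (data.size() + mtu - 1) / mtu);
    assert(count <= UINT16_MAX);

    std::vector<Fragment> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t begin = i * mtu;
        const std::size_t end = std::min(data.size(), begin + mtu);
        Fragment f{channel, message_id, static_cast<uint16_t>(i),
                   static_cast<uint16_t>(count), {}};
        f.payload.assign(data.begin() + static_cast<std::ptrdiff_t>(begin),
                         data.begin() + static_cast<std::ptrdiff_t>(end));
        out.push_back(std::move(f));
    }
    return out;
}
```

Because each fragment carries its own index and count, a lost fragment can be resent on its own instead of resending the whole message.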

 

My current implementation uses boost::asio and I intend to keep using it.

As soon as I finish the stability PIDs I’ll move to the new protocol.

 

 

7 thoughts on “Reliable UDP”

  1. Hi! I just discovered this blog, and I like it already 🙂 I wanted to build a quadcopter from scratch too, but have not found the time yet…

    I have a question regarding the “reliable” link for the remote control, telemetry and others. In my opinion, if you need to retransmit the control orders, they certainly will arrive delayed, introducing latency in the control. To me you’d better stream your control like the video, and don’t worry if one frame is lost, the next one should reflect what you want, updated.

    The same for telemetry, unless you want to record every bit of it, which could be done on a uSD on-board 😉

    Have fun!

    1. You are right, waiting for a missing confirmation to re-send a packet will introduce delays. I’m not going to do it like this though. Instead, for reliable channels I will keep sending the same packet until I get confirmation that it was received. This will minimize the lag in case of packet loss – at the cost of some extra bandwidth. Also most probably the telemetry will use streaming – just like video and the same for the control sticks, as you mention. So far the only info that needs reliable delivery is the toggles (arm, disarm, panic), image downloads (taking still pictures with the camera), fence updates and waypoint updates. Basically everything that is big enough or that I don’t want to keep track of deltas at either end (toggles for example).
      The way RUDP is implemented right now is this:
      – new packets are added to the sending_queue. They have an importance value (the bigger the more important the packet is)
      – packets are split into fragments based on some experimentally determined MTU (so far this is 1K)
      – when ready to send a new fragment, compute a (floating point, 0 – 1) priority for all fragments based on their importance, time they waited in the queue and if they were recently sent
      – take the fragment with the biggest priority and send it. Now the tricky part: If the fragment needs confirmation, put it back in the send_queue so it will be picked up again next time (with a slightly lower priority for a little while)
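The steps above could look roughly like the sketch below. It is an illustration under assumptions, not the real code: the weights (0.6 and 0.4), the 500ms and 50ms time constants, the 0.25 penalty and the names `QueuedFragment`, `priority` and `SendQueue` are all made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

struct QueuedFragment {
    uint32_t id;
    uint8_t importance;                           // bigger means more important
    bool needs_confirmation;
    Clock::time_point enqueued;
    std::optional<Clock::time_point> last_sent;   // empty if never sent
};

// Priority in [0, 1] from importance, time waited and a "recently sent" penalty.
// All constants here are illustrative assumptions.
double priority(const QueuedFragment& f, Clock::time_point now) {
    assert(now >= f.enqueued);
    const double importance = f.importance / 255.0;
    const double waited_ms =
        std::chrono::duration<double, std::milli>(now - f.enqueued).count();
    const double waiting = std::min(1.0, waited_ms / 500.0);
    double p = 0.6 * importance + 0.4 * waiting;
    if (f.last_sent && now - *f.last_sent < Ms(50)) p *= 0.25;  // sent a moment ago
    return std::clamp(p, 0.0, 1.0);
}

class SendQueue {
    std::vector<QueuedFragment> queue_;

public:
    void push(QueuedFragment f) { queue_.push_back(f); }

    // Picks the fragment with the biggest priority. Fragments that need
    // confirmation stay queued so they are picked again later.
    std::optional<QueuedFragment> pop_next(Clock::time_point now) {
        if (queue_.empty()) return std::nullopt;
        auto best = std::max_element(
            queue_.begin(), queue_.end(),
            [now](const QueuedFragment& a, const QueuedFragment& b) {
                return priority(a, now) < priority(b, now);
            });
        QueuedFragment picked = *best;
        if (best->needs_confirmation) {
            best->last_sent = now;
        } else {
            queue_.erase(best);
        }
        return picked;
    }

    // Called when the receiver confirms a fragment: stop resending it.
    void confirm(uint32_t id) {
        queue_.erase(std::remove_if(queue_.begin(), queue_.end(),
                                    [id](const QueuedFragment& f) { return f.id == id; }),
                     queue_.end());
    }

    std::size_t size() const { return queue_.size(); }
};
```

A linear scan on every send is fine for a short queue. With this scheme an unconfirmed fragment competes with fresh traffic again as soon as its penalty window has passed, so no separate resend timer is needed.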

      Hope this makes sense 🙂

      Glad you like the blog.

      1. It looks good, but for the control fragments I still think it is reasonable to generate them at a fixed frequency and with a high priority, as if they had reserved bandwidth. If the frequency is high enough, there is no need to store a sent control fragment for resending, since a new one with updated information is already ready to go. It all depends on the waiting time before resending a fragment versus the frequency of control fragments.

        I think the toggles can easily fit in a 1K control frame and be sent each time (unless you need to send a glitch). Then you get a unified control method. It’s up to you.

        The rest can easily use TCP, as these are not critical messages. If you want to have a ssh session while your link is not very good, this may help you: https://mosh.mit.edu/
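A minimal sketch of the fixed-rate control frame suggested in this comment, with the toggles carried in every frame and the receiver keeping only the newest one. The frame layout, the bit assignments and the class names are assumptions made for the example, and sequence wraparound is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toggle bits carried in every control frame.
constexpr uint8_t kArmBit = 1u << 0;
constexpr uint8_t kPanicBit = 1u << 1;

// Hypothetical control frame: stick positions plus the current toggle state.
struct ControlFrame {
    uint32_t sequence;   // increases by one per frame sent
    int16_t throttle;
    int16_t yaw;
    int16_t pitch;
    int16_t roll;
    uint8_t toggles;
};

// Sender side: called at a fixed rate, always with the full current state.
class ControlSender {
    uint32_t sequence_ = 0;

public:
    ControlFrame next(int16_t throttle, int16_t yaw, int16_t pitch, int16_t roll,
                      uint8_t toggles) {
        assert((toggles & ~(kArmBit | kPanicBit)) == 0);  // only known bits are set
        return ControlFrame{++sequence_, throttle, yaw, pitch, roll, toggles};
    }
};

// Receiver side: no resends, the newest frame simply replaces the previous one.
class ControlReceiver {
    std::optional<ControlFrame> latest_;

public:
    // Returns true if the frame was newer than the current one and was applied.
    // Stale or duplicated frames (reordered by the network) are dropped.
    bool on_frame(const ControlFrame& f) {
        if (latest_ && f.sequence <= latest_->sequence) return false;
        latest_ = f;
        return true;
    }

    bool armed() const { return latest_ && (latest_->toggles & kArmBit) != 0; }
    bool panic() const { return latest_ && (latest_->toggles & kPanicBit) != 0; }
};
```

Since every frame carries the complete state, a lost frame costs at most one send period of latency: the next frame overwrites it.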

        1. Well the RUDP should allow me to configure delivery on/off with as much granularity as I need. And yes, the control sticks will most probably use streaming just like the video. For data that needs confirmation I will configure those particular packets to have delivery confirmation. So far the data I really need to be confirmed is calibration data (which is computed by the ground station), fences, waypoints, picture downloads, ssh (thanks for the link, I will definitely need to implement some sort of remote console). The whole reason I’m doing the RUDP is to avoid the UDP+TCP combo as it behaves very badly in some cases. UDP seems to cannibalize the TCP traffic.
          Another possibility will be to add an extra long range radio – like this one – and route all critical data through it + low-res video only and use it as a backup link in case wifi goes down. It doesn’t have enough bandwidth for HD (at 4Mbit/s) video but the low-res backup stream could fit as it’s only 160Kbits/s.

          BTW, do you have a build log for your quad?

          Thanks for the hints 🙂

  2. Yes, as UDP does not have congestion control, it is easy to cannibalize the TCP traffic. I think the only way to work around the saturation is to throttle (low-res?) the video link if you do not get the expected upstream data.

    In case of wifi loss, you can also implement a failsafe holding the quad in place. Of course if wifi does not come back this is not good… Or use another radio as a backup link yes.

    I do not have a build log per se, just a few thoughts on a wiki, in French: http://drone.nathael.org/index.php/Drone_Generic (JeanLeFlambeur is a French name, but also used in an English SF book series). And this was abandoned way before we had something built. The idea was to make a small integrated quad, with everything home made (brain, esc, sensors…).

    1. BTW, I implemented the input as continuous unreliable packets, as you suggested. It works way better than the reliable ones.
      In the end I kept the reliable ones for config messages: calibration, camera params like ISO and shutter speed, waypoints, etc.
      Thanks for the idea!
