Dacast is pleased to introduce Mike Galli, CEO of Niagara Video, a professional-grade streaming systems provider. Mike has been in the video industry for over 30 years at companies such as Grass Valley, VXtreme (acquired by Microsoft), DIVA, Kasenna and Viewcast in various marketing and sales roles. He is well recognized in the industry and has spoken at numerous events on topics ranging from digital video to IP streaming. His background covers a number of markets including service providers, corporate, web streaming, education, and government. In this article, Mike talks about video latency and its challenges.
Video latency has been a topic of discussion ever since video transmission started back in the analog days of TV. Its importance has evolved over time to meet various needs. When Internet streaming first started, latency was not a major concern. In the early days of live streaming, the major goals were simply to stream at a decent resolution and a decent frame rate. Many early pioneers remember tiny video resolutions streamed at 1 frame per second. Over time, latency became important for some of the same applications as in broadcast, and even some new ones.
What is latency and where does it come from?
Latency is simply the amount of time it takes to transmit a video signal from the source to the eyeball at the far end of the system. This is different from “delay” which is used to match audio to video in a production workflow or for a time-delayed playback, usually for various time zones. Delay is intentional whereas latency is a byproduct of several factors.
These factors include:
- encoder latency/buffering
- network transmission time
- decoder buffering/latency
In some cases, such as HLS, there is also protocol-specific latency: the "chunk" (segment) sizes and durations add to the other factors. These will be explained in more detail below.
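To make these contributors concrete, here is a minimal sketch of a latency budget. All of the component values are hypothetical, and the HLS rule of thumb (segment duration times the number of segments the player buffers) is an approximation, not a spec-mandated figure.

```python
# Illustrative end-to-end latency budget; all numbers are hypothetical examples.

def total_latency_ms(encoder_ms, network_ms, decoder_buffer_ms, protocol_ms=0):
    """Sum the main latency contributors: encoder, network, decoder, protocol."""
    return encoder_ms + network_ms + decoder_buffer_ms + protocol_ms

def hls_protocol_latency_ms(segment_seconds, segments_buffered):
    """HLS adds roughly (segment duration x segments buffered by the player)."""
    return segment_seconds * segments_buffered * 1000

hls = hls_protocol_latency_ms(segment_seconds=6, segments_buffered=3)
print(hls)  # -> 18000 (18 seconds from chunking alone)
print(total_latency_ms(encoder_ms=150, network_ms=100,
                       decoder_buffer_ms=1000, protocol_ms=hls))  # -> 19250
```

Note how the chunked protocol dwarfs every other contributor, which is why HLS-style delivery is rarely the first choice when low latency matters.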
Applications where latency matters
Let’s take a look at some applications and why latency is important. Even for basic video streaming, excessive latency is not desirable; if it takes too long to transmit video, it can be annoying. For sports, high latency can be a problem, especially for events where gambling is involved. If the latency is very high, it might be possible for someone to place a bet on an event that has already taken place, costing the remote betting establishment a lot of money.
For news, latency is also a potential problem. It is common for TV anchors to ask field reporters a question, so if the latency is high, the reporter has to wait to hear the question before giving a response. This is most apparent over very long distances, where satellite and network transport are major factors. Even for local news, there are specifications limiting latency to reduce this potentially annoying problem.
When a CEO delivers a video message he or she wants this message to arrive within a reasonable period of time. The same is true for a preacher delivering a sermon.
One of the newer and most demanding applications is online casinos. They depend on low latency to provide an experience similar to being in the casino. When a hand is dealt, it is awkward to wait too long for a betting response from the customer. High latency could also limit the number of hands dealt in a day, which reduces the amount of money an online casino can make. Not a good thing.
How to reduce latency at the encoder
So how can latency be reduced? Let’s start at the beginning with the encoder. Video is compressed using frame-based techniques or sub-frame techniques, usually “slices”. A video stream is made up of individual picture frames; a frame is an entire picture. See diagram 1. A frame is made up of multiple slices, so if you can encode by the slice, you can reduce the latency of the encoder, since you don’t have to wait for the entire frame before transmitting the information. These encoders have latencies of less than a frame, some as low as 10 – 30 milliseconds. A frame-based encoder typically has a latency of around 100 – 200 milliseconds.
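The arithmetic behind this is simple. A minimal sketch, assuming a 30 fps stream and an arbitrary slice count of 8 per frame:

```python
# Compare frame-based vs slice-based encoder latency (illustrative numbers).

def frame_latency_ms(fps):
    """A frame-based encoder must buffer at least one full frame."""
    return 1000.0 / fps

def slice_latency_ms(fps, slices_per_frame):
    """A slice-based encoder can start transmitting after roughly one slice."""
    return frame_latency_ms(fps) / slices_per_frame

print(round(frame_latency_ms(30), 1))      # one frame at 30 fps: ~33.3 ms
print(round(slice_latency_ms(30, 8), 1))   # one slice of eight: ~4.2 ms
```

This is only the buffering floor; real encoders add processing time on top, which is why the article's quoted figures (10 – 30 ms for slice-based, 100 – 200 ms for frame-based) are higher than these minimums.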
It is also possible to reduce latency by removing bi-directional (B) frames. B-frames have to wait for a future frame to compress the video. This increases the efficiency of the encoder but also adds to latency. If you remove B-frames, you have to increase the bit rate to achieve the same video quality level, a price many are willing to pay for reduced latency.
IP Networking Basics
Our next topic will cover different protocols, but to understand this it is necessary to understand a few IP networking basics. Under IP there are two basic protocols that all others fall under: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP was developed mainly for the Internet, which is an unreliable IP network. UDP was developed for reliable IP networks, primarily local area networks such as the one you have at home or at work. Internet applications such as websites and email don’t rely on timely transmission: IP packets can arrive out of order and at any time, and errors can be corrected by retransmission because there is time to fix them on the receiving end. Video is not like these applications. Video cares about time and packet order, especially in cases where low latency is required.
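The fire-and-forget nature of UDP is what makes it attractive for video: nothing waits for an acknowledgement. A minimal sketch in Python sockets (the host, port, and 2-byte sequence header are arbitrary choices for illustration, not any particular streaming protocol):

```python
# Send payloads as UDP datagrams: no handshakes, no retransmission, minimal
# latency -- but packets may be lost or reordered in transit.
import socket

def send_packets(payloads, host="127.0.0.1", port=5004):
    """Fire off each payload as one datagram; returns the number sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    for seq, payload in enumerate(payloads):
        # Prepend a 2-byte sequence number so a receiver could detect loss.
        sock.sendto(seq.to_bytes(2, "big") + payload, (host, port))
        sent += 1
    sock.close()
    return sent

print(send_packets([b"frame-0", b"frame-1", b"frame-2"]))  # -> 3
```

A TCP sender, by contrast, would block and retransmit until every byte was acknowledged, trading latency for guaranteed delivery.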
At the protocol level, there are other factors to consider, and some new techniques have been invented that allow for the tuning of overhead and latency. Adobe Flash uses a standard TCP-based protocol that is not capable of such tuning: you get what you get. In the broadcast world, they began with a very efficient UDP protocol. This works well over well-managed IP networks but not over the Internet. To combat this problem they did not turn to TCP but invented some new techniques. The first technique was a clever version of Forward Error Correction (FEC). There is an SMPTE standard for this that ensures encoders and decoders from different vendors can interoperate. The next technique was a special version of Automatic Repeat reQuest (ARQ), one developed specifically for video, since the retransmission scheme built into TCP is better suited to data transmission. The only bad thing is that these special ARQ mechanisms are vendor-specific, so encoders and decoders from different vendors are not interoperable.
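The core idea behind ARQ is simple: the receiver watches sequence numbers and asks only for the packets that went missing, rather than acknowledging everything the way TCP does. A minimal sketch of that gap-detection step (real implementations also bound retransmission requests by the playout deadline, which this omits):

```python
# Sketch of the receiver side of ARQ: detect gaps in the sequence numbers
# of arrived packets and build a list to NACK (request for retransmission).

def missing_sequences(received, expected_first, expected_last):
    """Return the sequence numbers that never arrived, in order."""
    have = set(received)
    return [seq for seq in range(expected_first, expected_last + 1)
            if seq not in have]

print(missing_sequences([1, 2, 4, 7], 1, 7))  # -> [3, 5, 6]
```

Because each vendor defines its own NACK message format and retry policy around this idea, two otherwise-compatible ARQ implementations generally cannot talk to each other, which is exactly the interoperability problem described above.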
There is talk today in the Internet markets of the same ARQ mechanisms, but already several different versions are appearing. As a side note, it is the hope of the author that the Internet players can rally around a standard ARQ, as this would make life much simpler for encoder vendors (and for decoder vendors in point-to-point applications).
On the decoding side of the workflow, there are limits to what can be done. Much of the latency comes from buffering in the decoder or player. For Internet applications, there is typically around 1 second of buffering, and reducing it makes the video jerky or introduces gaps.
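Why can’t the player buffer simply be shrunk to zero? Because it exists to absorb arrival jitter: if the buffer is smaller than the worst delay spike, playback stalls. A minimal sketch, using made-up per-packet arrival delays:

```python
# Sketch of the player-buffer tradeoff: the buffer must cover the worst
# arrival-jitter spike seen, or playback stalls. Delays below are hypothetical.

def min_buffer_ms(arrival_delays_ms):
    """Smallest buffer that absorbs the worst delay spike relative to baseline."""
    baseline = min(arrival_delays_ms)
    return max(arrival_delays_ms) - baseline

print(min_buffer_ms([40, 55, 38, 120, 45]))  # -> 82
```

A player that buffers a full second is simply insuring itself against spikes it hasn’t seen yet; lowering that insurance is what causes the jerkiness and gaps mentioned above.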
We have come a long way from the early days of streaming video over the Internet where we just were happy to see a moving picture to today where video over the Internet is technically approaching other means of video transmission. What a journey it has been and we are still looking forward to even more progress in the future.