With TV streaming on the rise, the costs of running OTT services are accumulating. In my previous post I reviewed the inefficiency of having two parallel workflows packaging the same content asset into duplicated copies, one for live TV and the other for catch-up.
We identified that the main obstacle to merging the two workflows into one is the concern over service quality: changing content keys during a live broadcast might lead to client storms and disrupt the viewing experience. At the end, I briefly mentioned the idea in the video industry and my thinking on how to tackle the issue.
In this post, I’ll focus on the technical aspects. I’ll explain how content encryption information is communicated along the video distribution chain, how the two major streaming protocols handle encryption information differently, the particular challenges of live streaming, and the direction I’m exploring.
Let’s start with some background: what happens when you, as a streaming service subscriber, try to play a DRM-protected movie or TV channel on your smartphone or connected TV (the client device)?
DRM technology in OTT uses AES-128 symmetric keys for content encryption and decryption. Before a content item (a movie, a TV show, etc.) is uploaded to the origin and distributed through the content delivery network (CDN) for consumption, it goes through a packager that encrypts it and packages it into a container format (e.g., MPEG-4 or MPEG-2 TS) suitable for adaptive bitrate (ABR) streaming over the Internet.
When a client device tries to play a content item, it asks the DRM license service for a playback license, which carries the content key and the content usage rules (reflecting your subscription plan and the rights agreements between the operator and the content owner).
The diagram below (Figure 1) shows a basic workflow of license acquisition.
Step 1. You log in to the OTT service’s customer-facing portal with your credentials, e.g., your username and password. If successful, the portal will return an authentication proof, usually in the form of a session token.
Step 2. You choose a content item (a movie, a TV show, etc.) to watch and press Play. This action triggers the player on your device to request a manifest file from the CDN; the manifest contains the information needed for content playback.
Step 3. The player parses the manifest file. If it indicates that there are encrypted media segments (we’ll come back to this later), the player extracts the key reference (e.g., the key identifier, KID) from the manifest, constructs a license request, and sends it along with the session token to the DRM license service.
Step 4. The DRM license service gets the KID from the license request, looks up the content key in its key store (backend), and wraps it in a playback license together with the content usage rules.
Step 5. The player requests media segments from the CDN, uses the content key from the playback license to decrypt the content, and finally renders the content on your device.
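The five steps above can be sketched as a toy, in-memory simulation. Everything in it is hypothetical (the Portal and LicenseService classes, the play function, the dictionary standing in for the CDN and manifest, and the user and key values). A real deployment talks to these services over HTTP, and a real DRM license delivers the key encrypted to the device’s DRM client, never as plain bytes the app can read.

```python
import secrets
from dataclasses import dataclass


@dataclass
class License:
    kid: str
    key: bytes   # simplification: a real license carries the key encrypted for the DRM client
    rules: dict  # content usage rules, e.g. allowed resolution


class Portal:
    """Step 1: hypothetical customer-facing portal that issues session tokens."""

    def __init__(self, users):
        self.users = users    # username -> password
        self.sessions = {}    # session token -> username

    def login(self, username, password):
        if self.users.get(username) != password:
            raise PermissionError("invalid credentials")
        token = secrets.token_hex(16)
        self.sessions[token] = username
        return token


class LicenseService:
    """Step 4: hypothetical service that looks up the key by KID and adds usage rules."""

    def __init__(self, portal, key_store, rules):
        self.portal = portal
        self.key_store = key_store  # KID -> 16-byte AES-128 content key
        self.rules = rules

    def acquire(self, token, kid):
        if token not in self.portal.sessions:
            raise PermissionError("invalid session token")
        return License(kid, self.key_store[kid], self.rules)


def play(portal, cdn, license_service, username, password, content_id):
    token = portal.login(username, password)            # Step 1: log in
    manifest = cdn[content_id]                           # Step 2: fetch the manifest
    kid = manifest.get("kid")                            # Step 3: look for a key reference
    lic = license_service.acquire(token, kid) if kid else None  # Steps 3-4: get a license
    # Step 5: pair each segment with the key that would decrypt it (None = clear content).
    return [(segment, lic.key if lic else None) for segment in manifest["segments"]]


portal = Portal({"alice": "secret"})
service = LicenseService(portal, {"kid-1": bytes(16)}, {"max_resolution": "1080p"})
cdn = {"movie-42": {"kid": "kid-1", "segments": ["seg1.m4s", "seg2.m4s"]}}
print(play(portal, cdn, service, "alice", "secret", "movie-42"))
```

Note that the player cannot ask for a license until it has parsed the manifest, which is why the manifest matters so much for key changes.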
So, what does this workflow tell us?
Well, there is a manifest file that the player needs to get before it can stream the actual video/audio segments to your device.
A manifest in OTT describes the media segments: where to get them, whether they are audio or video segments, what their quality is (e.g., resolution, bitrate, codec), and, if some segments are encrypted, how to get the encryption key. What interests us is the last part: the encryption information.
Though all DRM technologies in OTT support AES-128 encryption, the two major streaming protocols handle it differently: MPEG-DASH (Dynamic Adaptive Streaming over HTTP, the only international standard) and HLS (HTTP Live Streaming, the only adaptive streaming format natively supported on Apple devices).
Let’s have a look at how the two streaming protocols handle the encryption information, especially how they signal a key change.
MPEG-DASH uses MPEG-4 as the container format. MPEG-4 is based on the ISO base media file format (ISO BMFF), which supports Common Encryption as specified in ISO/IEC 23001-7. Encryption information can be stored in both the media file and the manifest.
The ISO BMFF carries content protection information at different locations in its box hierarchy. Two types of boxes are directly related to key acquisition:
Protection System Specific Header box (‘pssh’): contains proprietary information for licensing and key retrieval. When there are multiple protection systems, each has its own pssh box. For MPEG-4, pssh boxes can be stored in the Initialization Segment (movie box ‘moov’) and also in Media Segments (movie fragment box ‘moof’).
Track Encryption box (‘tenc’): contains encryption parameters and the default_KID. The tenc box is in the Initialization Segment.
The diagram below (Figure 2) shows the locations of the pssh box and the tenc box in an ISO BMFF file.
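A minimal sketch of how a parser could locate these two boxes, written against the box layouts in ISO/IEC 14496-12 and ISO/IEC 23001-7. It works under simplifying assumptions: it descends only into the container boxes listed in CONTAINERS (including the ‘encv’/‘enca’ sample entries), reads only the pssh SystemID and the tenc default_KID, and runs on a synthetic initialization segment built with a hypothetical box() helper. The SystemID shown is Widevine’s; the KID is made up.

```python
import struct
import uuid

# Boxes whose payload is a list of child boxes, mapped to the number of
# fixed-field bytes that precede the first child.
CONTAINERS = {
    b"moov": 0, b"trak": 0, b"mdia": 0, b"minf": 0, b"stbl": 0,
    b"moof": 0, b"traf": 0, b"sinf": 0, b"schi": 0,
    b"stsd": 8,   # FullBox version/flags (4) + entry_count (4)
    b"encv": 78,  # VisualSampleEntry fixed fields
    b"enca": 28,  # AudioSampleEntry (version 0) fixed fields
}


def iter_boxes(data, start, end):
    """Yield (box_type, payload_offset, box_end) for boxes in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of its container
            size = end - pos
        if size < header:  # malformed box: stop instead of looping forever
            return
        yield box_type, pos + header, pos + size
        pos += size


def find_protection_info(data, start=0, end=None, found=None):
    """Collect pssh SystemIDs and tenc default_KIDs anywhere in the box tree."""
    end = len(data) if end is None else end
    found = {"pssh": [], "default_KID": []} if found is None else found
    for box_type, payload, box_end in iter_boxes(data, start, end):
        if box_type == b"pssh":
            # FullBox header (4 bytes), then the 16-byte DRM SystemID
            system_id = data[payload + 4:payload + 20]
            found["pssh"].append(str(uuid.UUID(bytes=system_id)))
        elif box_type == b"tenc":
            # FullBox header (4) + 2 reserved/pattern bytes + isProtected (1)
            # + Per_Sample_IV_Size (1), then the 16-byte default_KID
            kid = data[payload + 8:payload + 24]
            found["default_KID"].append(str(uuid.UUID(bytes=kid)))
        elif box_type in CONTAINERS:
            find_protection_info(data, payload + CONTAINERS[box_type], box_end, found)
    return found


def box(box_type, payload=b""):
    """Hypothetical helper: wrap a payload in a plain 32-bit-size box."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


WIDEVINE = uuid.UUID("edef8ba9-79d6-4ace-a3c8-27dcd51d21ed")
KID = uuid.UUID("11111111-2222-3333-4444-555555555555")  # made-up key ID

tenc = box(b"tenc", bytes(4) + bytes(2) + b"\x01\x08" + KID.bytes)
encv = box(b"encv", bytes(78) + box(b"sinf", box(b"schi", tenc)))
stsd = box(b"stsd", struct.pack(">II", 0, 1) + encv)
trak = box(b"trak", box(b"mdia", box(b"minf", box(b"stbl", stsd))))
pssh = box(b"pssh", bytes(4) + WIDEVINE.bytes + struct.pack(">I", 0))
init_segment = box(b"moov", pssh + trak)

print(find_protection_info(init_segment))
```

A real sinf box also contains ‘frma’ and ‘schm’ boxes; the walker simply skips any box it does not recognise.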
The default_KID and the pssh box are often made available in the manifest as well. This allows the player to decide whether it needs to request a new key before it reaches the actual media segments, thus reducing video startup time.
The DASH MPD (Media Presentation Description, the manifest of MPEG-DASH) carries the default_KID and the pssh box in its ContentProtection descriptors.
In the example diagram below (Figure 3):
Period 11 carries a cenc:default_KID, which means all the media segments for the duration of Period 11 are encrypted with the key referenced by this ID.
There are multiple pssh boxes, each for a specific DRM system.
So, for DASH, the Period defines the key duration interval, and the cenc:default_KID signals a key change.
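A minimal sketch of how a player might detect such a key change in an MPD with Python’s standard xml.etree.ElementTree: collect the cenc:default_KID values of each Period and compare adjacent Periods. The sample MPD, its Period ids, start times, and KIDs are hypothetical, and it is trimmed to the elements the sketch reads (a real MPD also needs profiles, Representations, segment addressing, and so on). The MPD and cenc namespaces and the mp4protection scheme URI are the standard ones.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
CENC_NS = "urn:mpeg:cenc:2013"


def kids_per_period(mpd_xml):
    """Return [(period_id, sorted default_KIDs)] in document order."""
    root = ET.fromstring(mpd_xml.strip())
    periods = []
    for period in root.findall(f"{{{MPD_NS}}}Period"):
        kids = set()
        # ContentProtection may sit on an AdaptationSet or a Representation
        for cp in period.iter(f"{{{MPD_NS}}}ContentProtection"):
            kid = cp.get(f"{{{CENC_NS}}}default_KID")
            if kid:
                kids.add(kid.lower())
        periods.append((period.get("id"), sorted(kids)))
    return periods


def key_changes(periods):
    """Return (previous_period_id, period_id) pairs where the KID set changes."""
    return [
        (prev_id, cur_id)
        for (prev_id, prev_kids), (cur_id, cur_kids) in zip(periods, periods[1:])
        if prev_kids != cur_kids
    ]


SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013" type="dynamic">
  <Period id="11" start="PT0S">
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="11111111-1111-1111-1111-111111111111"/>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
    </AdaptationSet>
  </Period>
  <Period id="12" start="PT1H">
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="22222222-2222-2222-2222-222222222222"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

periods = kids_per_period(SAMPLE_MPD)
print(periods)
print(key_changes(periods))
```

In a live stream the player would run this check on every manifest refresh, so it only notices the new Period once the packager has published it.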
HLS supports two container formats: MPEG-2 Transport Streams and fragmented MPEG-4 (for CMAF). No matter which format you choose, content keys are referenced in the manifest.
In HLS, the manifest is called a Playlist. As explained in the HLS specification, the Media Segment tag EXT-X-KEY indicates how to decrypt media segments.
In the example diagram below (Figure 4):
The first EXT-X-KEY tag applies to all the media segments that appear after it, until the second EXT-X-KEY tag.
Each media segment is introduced by an EXTINF tag, which gives its duration.
The URI in the EXT-X-KEY tag specifies how to obtain the encryption key referenced by the KID.
So, for HLS, the EXTINF tags between two EXT-X-KEY tags define the key duration interval, and the EXT-X-KEY tag signals a key change.
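A minimal sketch of how a player could derive key duration intervals from an HLS media playlist by summing the EXTINF durations between EXT-X-KEY tags. It assumes a well-formed playlist, treats consecutive EXT-X-KEY tags (for example one per KEYFORMAT in a multi-DRM playlist) as a single key set, and ignores EXT-X-BYTERANGE, discontinuities, and EXT-X-SESSION-KEY. The skd:// URIs and segment names are hypothetical.

```python
import re

# ATTRIBUTE=value pairs; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    return {name: value.strip('"') for name, value in ATTR_RE.findall(text)}


def key_periods(playlist):
    """Return [(key URIs, total duration in seconds, segment count)].

    Consecutive EXT-X-KEY tags are treated as one key set;
    METHOD=NONE means the following segments are in the clear.
    """
    periods = []
    uris = []
    starting_new_set = True
    for line in playlist.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-KEY:"):
            if not starting_new_set:  # first key tag after some segments
                uris = []
                starting_new_set = True
            attrs = parse_attributes(line[len("#EXT-X-KEY:"):])
            if attrs.get("METHOD") != "NONE":
                uris.append(attrs.get("URI"))
        elif line.startswith("#EXTINF:"):
            duration = float(line[len("#EXTINF:"):].split(",")[0])
            if starting_new_set:
                periods.append([tuple(uris), 0.0, 0])
                starting_new_set = False
            periods[-1][1] += duration
            periods[-1][2] += 1
    return [tuple(period) for period in periods]


SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="skd://key-a",KEYFORMAT="com.apple.streamingkeydelivery",KEYFORMATVERSIONS="1"
#EXTINF:6.0,
segment100.ts
#EXTINF:6.0,
segment101.ts
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="skd://key-b",KEYFORMAT="com.apple.streamingkeydelivery",KEYFORMATVERSIONS="1"
#EXTINF:6.0,
segment102.ts
"""

print(key_periods(SAMPLE_PLAYLIST))
```

Here the first key covers two 6-second segments and the second key takes over from segment 102 onwards.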
Now you understand that a player usually learns when it needs to ask for a new key by looking up specific information in the manifest file. But how can we apply this knowledge to our use case, changing the content key in a live stream without causing client storms?
Can we create the complete manifest at the beginning and add all the KIDs to be used, so the player only needs to parse the manifest once and ask for the keys at a time of its choice?
In general, for live streaming, content is encoded, encrypted, and packaged piece by piece as the live feed comes in. New media segments are continuously uploaded to the origin/CDN, and the manifest is constantly updated to reflect the availability of new segments. The player usually needs to poll the CDN to discover manifest updates. So, unlike a catch-up asset, a live stream has no complete manifest at the start that could list every future KID.
In the diagram below (Figure 5) there is a key change during live streaming. The player gets to know this when it parses the latest manifest.
Here comes the tricky part. If the player can’t get the license before it gets the new segments, whether you see a black screen will depend on how many segments using the previous key are still in the buffer.
While this timing issue has always been a concern, advances in low-latency streaming workflows (including the encoder, packager, CDN, and player) in recent years make it even more critical: the player buffers close to zero segments ahead of the live edge.
Since it is not always realistic to expect the player to get the new key in time for immediate use, preloading the player with the future key is a natural way to solve the timing issue in live streaming.
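To make the timing argument concrete, here is a deliberately simplified model, assuming the player keeps rendering old-key content from its buffer while the license request for the new KID is in flight. The names (KeyChange, stall_duration) and the example numbers are hypothetical and purely illustrative; the model ignores the manifest polling interval and the extra license-service load of a client storm, both of which would only make things worse.

```python
from dataclasses import dataclass


@dataclass
class KeyChange:
    """Hypothetical model of one key change at the live edge (times in seconds)."""
    buffered_old_key: float  # old-key content still buffered when the new KID is seen
    license_latency: float   # time from seeing the new KID to holding its key
    preloaded: bool = False  # True if the future key was delivered in advance


def stall_duration(change: KeyChange) -> float:
    """Seconds during which the player has nothing it can decrypt (e.g. a black screen)."""
    if change.preloaded:
        # The key is already on the device, so new-key segments decrypt immediately.
        return 0.0
    # The buffer hides the license round trip only if it lasts long enough.
    return max(0.0, change.license_latency - change.buffered_old_key)


# Illustrative values only, not measurements.
conventional = KeyChange(buffered_old_key=12.0, license_latency=2.0)
low_latency = KeyChange(buffered_old_key=0.5, license_latency=2.0)
low_latency_preloaded = KeyChange(buffered_old_key=0.5, license_latency=2.0, preloaded=True)

for name, change in [("conventional", conventional), ("low latency", low_latency),
                     ("low latency + preload", low_latency_preloaded)]:
    print(f"{name}: {stall_duration(change):.1f} s stall")
```

With a deep buffer the license round trip is hidden; with a near-empty low-latency buffer it turns directly into a visible stall, unless the key was preloaded.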
There is more than one way to preload the future key: for example, through the streaming protocol, through the DRM service, or through a combination of both. I’m trying out the different approaches with Irdeto Control. Are you also interested in this topic? Do you foresee any challenges there?
Get in touch and stay tuned for my next blog.