April 15, 2021

Smarter workflow to reduce catch-up costs (part 2)

With TV streaming on the rise, the costs of running OTT services are accumulating. In my previous post I reviewed the inefficiency of having two parallel workflows packaging the same content asset into duplicated copies, one for live TV and the other for catch-up.

We identified that the main obstacle to merging the two workflows into one is the concern over service quality: changing content keys during a live broadcast might lead to client storms and disrupt the viewing experience. At the end of that post I briefly mentioned the ideas circulating in the video industry and my own thinking on how to tackle the issue.

In this post I’ll go into the technical details. I’ll explain how content encryption information is communicated along the video distribution chain, how the two major streaming protocols handle encryption information differently, the particular challenges facing live streaming, and the direction I’m exploring.

License (and key) acquisition

Let’s start with some background: what happens when you, as a streaming service subscriber, try to play a DRM-protected movie or TV channel on your smartphone or connected TV (i.e., the client device)?

DRM technology in OTT uses a 128-bit AES symmetric key for content encryption and decryption. Before a content item (a movie, a TV show, etc.) is uploaded to the origin and distributed through the content delivery network (CDN) for consumption, it goes through a packager, which encrypts it and packages it into a container format (e.g., MPEG-4, TS) suitable for adaptive bitrate (ABR) streaming over the Internet.

When a client device tries to play a content item, it asks the DRM license service for a playback license, which carries the content key and the content usage rules (reflecting your subscription plan and the rights agreements between the operator and the content owner).

The diagram below (Figure 1) shows a basic workflow of license acquisition.

Figure 1. License (and key) acquisition

Step 1. You log in to the OTT service’s customer-facing portal with your credentials, e.g., your username and password. If successful, the portal will return an authentication proof, usually in the form of a session token.

Step 2. You choose a content item (a movie, a TV show, etc.) to watch and press Play. This action triggers the player on your device to request from the CDN a manifest file, which contains information for content playback.

Step 3. The player parses the manifest file. If it indicates that there are encrypted media segments (we’ll come back to this later), the player extracts the key reference (e.g., the key identifier, KID) from the manifest, constructs a license request, and sends it along with the session token to the DRM license service.

Step 4. The DRM license service gets the KID from the license request, queries the key from its key store (backend), and wraps it in a playback license together with the content usage rules.

Step 5. The player requests media segments from the CDN, uses the content key from the playback license to decrypt the content, and finally renders the content on your device.
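
To make the five steps more concrete, here is a minimal in-memory sketch of the exchange. Everything in it is an assumption made for illustration: the `LicenseService` class, its method names, the manifest dictionary and the usage rule are hypothetical, and a real DRM system delivers the content key encrypted for the client rather than in the clear.

```python
import secrets
import uuid
from dataclasses import dataclass


@dataclass
class License:
    kid: str
    content_key: bytes   # 16 bytes, i.e. an AES-128 key
    usage_rules: dict


class LicenseService:
    """Hypothetical in-memory stand-in for the portal plus DRM license service."""

    def __init__(self):
        self._key_store = {}    # KID -> content key (the "backend" key store)
        self._sessions = set()  # session tokens handed out at login

    def register_key(self) -> str:
        # Done at packaging time: create a key and the KID that references it.
        kid = str(uuid.uuid4())
        self._key_store[kid] = secrets.token_bytes(16)
        return kid

    def login(self, username: str, password: str) -> str:
        # Step 1: credentials are not verified in this sketch.
        token = secrets.token_hex(16)
        self._sessions.add(token)
        return token

    def acquire_license(self, kid: str, session_token: str) -> License:
        # Step 4: check the token, look up the key, wrap it with usage rules.
        if session_token not in self._sessions:
            raise PermissionError("invalid session token")
        return License(kid, self._key_store[kid], {"max_resolution": "HD"})


service = LicenseService()
kid = service.register_key()

token = service.login("alice", "not-checked")                   # Step 1
manifest = {"kid": kid, "segments": ["seg1.m4s", "seg2.m4s"]}   # Step 2
license_ = service.acquire_license(manifest["kid"], token)      # Steps 3 and 4
# Step 5 would decrypt each segment with license_.content_key.
print(license_.kid == kid, len(license_.content_key))
```

The point of the sketch is the dependency chain: the player cannot build a license request until it has read the key reference from the manifest.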

So, what does this workflow tell us?

Well, there is a manifest file that the player needs to get before it can stream the real video/audio segments to your device.

A manifest in OTT describes the media segments: where to get them, whether they are audio or video segments, what their qualities are (e.g., resolution, bitrate, codec), and, if some segments are encrypted, how to get the encryption key. What interests us is the last part: the encryption information.

Encryption information

Though all DRM technologies in OTT support AES-128 encryption, there are differences when it comes to the two major streaming protocols, namely MPEG-DASH (Dynamic Adaptive Streaming over HTTP, the international standard published as ISO/IEC 23009-1) and HLS (HTTP Live Streaming, the streaming format natively supported by Apple devices).

Let’s have a look at how the two streaming protocols handle the encryption information, especially how they signal a key change.

MPEG-DASH

MPEG-DASH typically uses MPEG-4 as the container format. MPEG-4 is based on the ISO base media file format (BMFF), which supports Common Encryption as specified in ISO/IEC 23001-7. Encryption information can be stored in both the media file and the manifest.

Media File

The ISO base media file format carries content protection information in different locations, using a box hierarchy. There are two types of boxes directly related to key acquisition:

Protection System Specific Header box (‘pssh’): contains proprietary information for licensing and key retrieval. When there are multiple protection systems, each has its own pssh box. For MPEG-4, pssh boxes can be stored in Media Segments (movie fragment box ‘moof’).
Track Encryption box (‘tenc’): contains the encryption parameters and the default_KID. The tenc box is in the Initialization Segment.

The diagram below (Figure 2) shows the locations of pssh box and tenc box in an ISO BMFF file.

Figure 2. Encryption information in an ISO BMFF file
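
As a rough illustration of the box hierarchy, the sketch below walks a small synthetic byte buffer and pulls the system ID out of a pssh box and the default_KID out of a tenc box. The buffer, the system ID, the KID and the helper names are made up for the example. The layout is also simplified on purpose: in a real file the ‘sinf’ box sits inside a sample entry under ‘stsd’, which this walker does not parse, and 64-bit box sizes are not handled.

```python
import struct
import uuid

# Plain container boxes this sketch descends into (not an exhaustive list).
CONTAINERS = {b"moov", b"moof", b"trak", b"traf", b"sinf", b"schi"}


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box: 32-bit big-endian size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def walk(data: bytes, path=()):
    """Yield (path, payload) for every box, descending into known containers."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            break  # size 0 / size 1 (64-bit) and truncated boxes are skipped here
        payload = data[offset + 8:offset + size]
        here = path + (box_type.decode("latin-1"),)
        yield here, payload
        if box_type in CONTAINERS:
            yield from walk(payload, here)
        offset += size


def pssh_system_id(payload: bytes) -> uuid.UUID:
    # version (1 byte) + flags (3 bytes), then the 16-byte SystemID
    return uuid.UUID(bytes=payload[4:20])


def tenc_default_kid(payload: bytes) -> uuid.UUID:
    # version/flags (4), two reserved or pattern bytes (2), isProtected (1),
    # per-sample IV size (1), then the 16-byte default_KID
    return uuid.UUID(bytes=payload[8:24])


# Synthetic example data (placeholder identifiers, not a real DRM system).
system_id = uuid.UUID("11111111-2222-3333-4444-555555555555")
kid = uuid.UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")

pssh = box(b"pssh", b"\x00" * 4 + system_id.bytes + struct.pack(">I", 0))
tenc = box(b"tenc", b"\x00" * 4 + b"\x00\x00" + b"\x01" + b"\x08" + kid.bytes)
init_segment = box(b"moov", pssh + box(b"trak", box(b"sinf", box(b"schi", tenc))))

found = {"/".join(path): payload for path, payload in walk(init_segment)}
print(pssh_system_id(found["moov/pssh"]))
print(tenc_default_kid(found["moov/trak/sinf/schi/tenc"]))
```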

Media Presentation Description (MPD)

The default_KID and the pssh box are often made available in the manifest as well. This allows the player to decide whether it needs to request a new key before it reaches the actual media segments, which reduces video startup time.

DASH MPD (the manifest of MPEG-DASH) has the default_KID and the pssh box under the ContentProtection descriptors.

In the example diagram below (Figure 3):

for Period 11, the first ContentProtection descriptor contains a cenc:default_KID, which means all the media segments for the duration of Period 11 are encrypted with the key referenced by this id
the following ContentProtection descriptors contain a pssh box, each for a specific DRM

Figure 3. default_KID and PSSH in DASH MPD

So, for DASH, the Period defines the key duration interval, and the cenc:default_KID signals key change.
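
A player-side check for this signal could look like the sketch below, which reads the cenc:default_KID of each Period and reports where it changes. The sample MPD is a hand-written, heavily trimmed assumption (the Period ids, start times and KIDs are placeholders), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
CENC_NS = "urn:mpeg:cenc:2013"

# Trimmed, hand-written MPD with placeholder KIDs.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     xmlns:cenc="urn:mpeg:cenc:2013" type="dynamic">
  <Period id="10" start="PT0S">
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="11111111-1111-1111-1111-111111111111"/>
    </AdaptationSet>
  </Period>
  <Period id="11" start="PT60S">
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
          cenc:default_KID="22222222-2222-2222-2222-222222222222"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def kids_by_period(mpd_xml: str) -> dict:
    """Map each Period id to the default_KIDs found in its ContentProtection descriptors."""
    root = ET.fromstring(mpd_xml)
    result = {}
    for period in root.findall(f"{{{MPD_NS}}}Period"):
        kids = []
        for cp in period.iter(f"{{{MPD_NS}}}ContentProtection"):
            kid = cp.get(f"{{{CENC_NS}}}default_KID")
            if kid and kid not in kids:
                kids.append(kid)
        result[period.get("id")] = kids
    return result


def key_changes(kids: dict) -> list:
    """Return the ids of Periods whose KIDs differ from the preceding Period."""
    changes, previous = [], None
    for period_id, period_kids in kids.items():
        if previous is not None and period_kids != previous:
            changes.append(period_id)
        previous = period_kids
    return changes


kids = kids_by_period(SAMPLE_MPD)
print(kids)
print(key_changes(kids))
```

In this sample the key changes at Period 11, so that is the point by which the player needs the second license.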

HLS

HLS supports two container formats: MPEG-2 Transport Streams and fragmented MPEG-4 (for CMAF). No matter which format you choose, content keys are referenced in the manifest.

The HLS manifest is called a Playlist. As explained in the HLS specification (RFC 8216), the Media Segment tag EXT-X-KEY indicates how to decrypt media segments.

In the example diagram below (Figure 4):

the first EXT-X-KEY tag applies to all the media segments that appear after it until the second EXT-X-KEY tag
the duration of each media segment is indicated by its EXTINF tag
the URI in the EXT-X-KEY tag specifies how to obtain the encryption key referenced by the KID

Figure 4. EXT-X-KEY in HLS Playlist

So, for HLS, the EXTINF durations of the segments between two EXT-X-KEY tags define the key duration interval, and the EXT-X-KEY tag signals key change.
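
The same idea can be sketched for HLS: group the segments of a Media Playlist by the EXT-X-KEY tag that applies to them and add up their EXTINF durations. The playlist below is a hand-written assumption with placeholder key URIs and segment names, and the parser only handles the tags it needs.

```python
import re

# Hand-written Media Playlist with placeholder key URIs.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:5
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="skd://key-a"
#EXTINF:6.0,
seg100.ts
#EXTINF:6.0,
seg101.ts
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="skd://key-b"
#EXTINF:6.0,
seg102.ts
"""


def key_intervals(playlist: str) -> list:
    """Group segments by the EXT-X-KEY tag in effect and total their durations."""
    intervals = []
    current = None           # interval for the EXT-X-KEY currently in effect
    pending_duration = None  # duration from the most recent EXTINF tag
    for raw in playlist.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-KEY:"):
            match = re.search(r'URI="([^"]*)"', line)
            current = {"key_uri": match.group(1) if match else None,
                       "duration": 0.0, "segments": []}
            intervals.append(current)
        elif line.startswith("#EXTINF:"):
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and pending_duration is not None:
            if current is None:  # segments before any EXT-X-KEY are unencrypted
                current = {"key_uri": None, "duration": 0.0, "segments": []}
                intervals.append(current)
            current["duration"] += pending_duration
            current["segments"].append(line)
            pending_duration = None
    return intervals


intervals = key_intervals(SAMPLE_PLAYLIST)
for interval in intervals:
    print(interval["key_uri"], interval["duration"], interval["segments"])
```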

Timing in live streaming

Now you understand that a player usually learns when it needs to ask for a new key by looking up specific information in the manifest file. But how do we apply this knowledge to our use case of changing the content key in a live stream without causing client storms?

Can we create the complete manifest at the beginning and add all the KIDs to be used, so the player only needs to parse the manifest once and ask for the keys at a time of its choice?

For live streaming, the answer is generally no. Content is encoded, encrypted, and packaged piece by piece as the live feed comes in. New media segments are continuously uploaded to the origin/CDN, and the manifest is constantly updated to reflect the availability of new segments. The player usually needs to poll the CDN to discover manifest updates.

In the diagram below (Figure 5) there is a key change during live streaming. The player gets to know this when it parses the latest manifest.

Figure 5. Timing in live streaming

Here comes the tricky part. If the player can’t get the license before it reaches the new segments, whether you see a black screen depends on how many segments encrypted with the previous key are still in the buffer.

While this timing issue has always been a concern, the advancement of low-latency streaming workflows (including encoder, packager, CDN, and player) in recent years makes it even more critical: there may be close to zero segments in the buffer.
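
A back-of-the-envelope model shows why the buffer matters. The function below is a simplification under stated assumptions, not a measurement: it assumes the player discovers the key change one manifest poll interval after the change is published, then needs a fixed license round trip, while playback consumes the segments still buffered under the old key. The parameter names and the numbers are illustrative only.

```python
def stall_seconds(buffered_segments: int,
                  segment_duration_s: float,
                  poll_interval_s: float,
                  license_latency_s: float) -> float:
    """Estimate how long playback stalls while waiting for the new key.

    Worst-case discovery delay is assumed to be one full poll interval.
    """
    time_to_get_key = poll_interval_s + license_latency_s
    time_covered_by_buffer = buffered_segments * segment_duration_s
    return max(0.0, time_to_get_key - time_covered_by_buffer)


# Classic live setup: three 6-second segments buffered under the old key.
classic = stall_seconds(buffered_segments=3, segment_duration_s=6.0,
                        poll_interval_s=6.0, license_latency_s=1.0)

# Low-latency setup: nothing buffered, shorter segments and polling.
low_latency = stall_seconds(buffered_segments=0, segment_duration_s=2.0,
                            poll_interval_s=2.0, license_latency_s=1.0)

print(classic)      # 0.0
print(low_latency)  # 3.0
```

With these assumed numbers the classic setup hides the license round trip behind the buffer, while the low-latency setup stalls for the whole discovery and license delay.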

Preloading future key

As it’s not always realistic to expect the player to get the new key in time for immediate use, preloading the player with the future key is a natural way to solve the timing issue in live streaming.
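
One reason preloading is attractive is that it gives clients a window in which to fetch the future key, instead of all of them hitting the license service at the moment of the key change. The toy simulation below illustrates that effect only. It is an assumption about how a preload window might be used (each client picks a uniformly random moment within the window), not a description of any specific protocol or product feature, and the function name and numbers are hypothetical.

```python
import random
from collections import Counter


def peak_requests_per_second(num_clients: int,
                             preload_window_s: float,
                             seed: int = 0) -> int:
    """Return the busiest one-second bucket of license requests.

    With a window of 0 every client asks at the key change itself.
    Otherwise each client picks a random moment inside the preload window.
    """
    rng = random.Random(seed)
    per_second = Counter()
    for _ in range(num_clients):
        offset = rng.uniform(0, preload_window_s) if preload_window_s > 0 else 0.0
        per_second[int(offset)] += 1
    return max(per_second.values())


no_preload = peak_requests_per_second(10_000, preload_window_s=0)
with_preload = peak_requests_per_second(10_000, preload_window_s=60)
print(no_preload, with_preload)
```

Without a window the peak equals the number of clients. With a 60-second window the same load is spread over roughly sixty one-second buckets, so the peak the license service has to absorb is far lower.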

There is more than one way to preload the future key: for example, using the streaming protocol, the DRM service, or a combination of both. I’m trying out the different approaches with Irdeto Control. Are you also interested in this topic? Do you foresee any challenges there?

Get in touch and stay tuned for my next blog.