Subtitles

Here is a list of pointers for storing subtitles in Matroska:

  • As a general rule of thumb for all codecs, information that is global to an entire stream SHOULD be stored in the CodecPrivate element.

  • As subtitles usually come with start and stop timestamps, or with a start timestamp and a duration, SimpleBlock is usually not used, as it does not allow storing the BlockDuration.

  • Start and stop timestamps that are used in a subtitle’s original storage format SHOULD be removed when the subtitle is placed in Matroska, as they could interfere if the file is edited afterwards. Instead, the Block’s timestamp and BlockDuration SHOULD be used to indicate when the subtitle is displayed.

  • Because a “subtitle” stream is actually just an overlay stream, anything with a transparency layer could be used, including video.

Image Subtitles

The first image subtitle format targeted for import into Matroska is VobSub. This subtitle type is generated by exporting the subtitles from a DVD [@?DVD-Video].

Muxing VobSub into Matroska requires v7 subtitles (the version is given on the first line of the .IDX file). If the version is older, you must remux them into v7 format using the SubResync utility from VobSub 2.23 (or MPC). Newly created subtitles will generally already be in v7 format.

The .IFO file will not be used at all.

If there is more than one subtitle stream in the VobSub set, each stream needs to be stored in a separate Matroska track. For example, if the VobSub file contains streams for both English and German subtitles, the resulting Matroska file SHOULD contain two tracks. That way the language information can be dropped from the index and mapped to Matroska’s language tags.

The .IDX file is reformatted (see below) and placed in the CodecPrivate.

Each .BMP will be stored in its own Block. The Timestamp will be stored in the Block timestamp and the duration will be stored in the Default Duration.

Here is an example .IDX file:

 # VobSub index file, v7 (do not modify this line!)
 #
 # To repair desynchronization, you can insert gaps this way:
 # (it usually happens after vob id changes)
 #
 # delay: [sign]hh:mm:ss:ms
 #
 # Where:
 # [sign]: +, - (optional)
 # hh: hours (0 <= hh)
 # mm/ss: minutes/seconds (0 <= mm/ss <= 59)
 # ms: milliseconds (0 <= ms <= 999)
 #
 # Note: You can't position a sub before the previous with a negative
 # value.
 #
 # You can also modify timestamps or delete a few subs you don't
 # like. Just make sure they stay in increasing order.

 # Settings

 # Original frame size
 size: 720x480

 # Origin, relative to the upper-left corner, can be overloaded by
 # alignment
 org: 0, 0

 # Image scaling (hor,ver), origin is at the upper-left corner or at
 # the alignment coord (x, y)
 scale: 100%, 100%

 # Alpha blending
 alpha: 100%

 # Smoothing for very blocky images (use OLD for no filtering)
 smooth: OFF

 # In millisecs
 fadein/out: 50, 50

 # Force subtitle placement relative to (org.x, org.y)
 align: OFF at LEFT TOP

 # For correcting non-progressive desync. (in millisecs or
 # hh:mm:ss:ms)
 # Note: Not effective in DirectVobSub, use "delay: ... " instead.
 time offset: 0

 # ON: displays only forced subtitles, OFF: shows everything
 forced subs: OFF

 # The original palette of the DVD
 palette: 000000, 7e7e7e, fbff8b, cb86f1, 7f74b8, e23f06, 0a48ea, \
 b3d65a, 6b92f1, 87f087, c02081, f8d0f4, e3c411, 382201, e8840b, \
 fdfdfd

 # Custom colors (transp idxs and the four colors)
 custom colors: OFF, tridx: 0000, colors: 000000, 000000, 000000, \
 000000

 # Language index in use
 langidx: 0

 # English
 id: en, index: 0
 # Uncomment next line to activate alternative name in DirectVobSub /
 # Windows Media Player 6.x
 # alt: English
 # Vob/Cell ID: 1, 1 (PTS: 0)
 timestamp: 00:00:01:101, filepos: 000000000
 timestamp: 00:00:08:708, filepos: 000001000

First, lines beginning with “#” are removed. These are comments to make text file editing easier, and as this is not a text file, they aren’t needed.

Next, the “langidx” and “id” lines are removed. These are used to differentiate the subtitle streams and define the language. As the streams will be stored separately anyway, there is no need to differentiate them here. Also, the language setting will be stored in the Matroska tags, so there is no need to store it here.

Finally, the “timestamp” lines are used to set the Blocks’ timestamps. Once the timestamps are set there, there is no need to store them here. Also, as they may interfere if the file is edited, they SHOULD NOT be stored here.

Once all of these items are removed, the data to store in the CodecPrivate SHOULD look like this:

 size: 720x480
 org: 0, 0
 scale: 100%, 100%
 alpha: 100%
 smooth: OFF
 fadein/out: 50, 50
 align: OFF at LEFT TOP
 time offset: 0
 forced subs: OFF
 palette: 000000, 7e7e7e, fbff8b, cb86f1, 7f74b8, e23f06, 0a48ea, \
 b3d65a, 6b92f1, 87f087, c02081, f8d0f4, e3c411, 382201, e8840b, \
 fdfdfd
 custom colors: OFF, tridx: 0000, colors: 000000, 000000, 000000, \
 000000

There SHOULD also be two Blocks containing one image each with the timestamps “00:00:01:101” and “00:00:08:708”.
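The IDX reformatting steps above can be sketched in Python. This is a minimal, non-authoritative sketch: the function names `split_idx` and `vobsub_ts_to_ms` are hypothetical, and it assumes the .IDX has already been split so that it describes a single subtitle stream. It drops comments, blank lines, and the “langidx” and “id” lines, and returns the “timestamp” values (in milliseconds) to be used for the Blocks.

```python
def vobsub_ts_to_ms(ts: str) -> int:
    """Convert a VobSub "hh:mm:ss:ms" timestamp to milliseconds."""
    hh, mm, ss, ms = (int(part) for part in ts.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * 1000 + ms


def split_idx(idx_text: str) -> tuple[str, list[int]]:
    """Split a single-stream .IDX into (CodecPrivate text, Block timestamps in ms)."""
    kept: list[str] = []
    timestamps: list[int] = []
    for line in idx_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comments and blank lines are not stored
        key = stripped.split(":", 1)[0].strip()
        if key == "timestamp":
            # e.g. "timestamp: 00:00:01:101, filepos: 000000000"
            value = stripped.split(",", 1)[0].split(":", 1)[1].strip()
            timestamps.append(vobsub_ts_to_ms(value))
        elif key in ("langidx", "id"):
            continue  # stream selection and language are stored in Matroska instead
        else:
            kept.append(stripped)
    return "\n".join(kept), timestamps
```

For the example .IDX above, the returned timestamps would be 1101 and 8708 ms, matching the two Blocks.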

SRT Subtitles

SRT is perhaps the most basic of all subtitle formats.

Each subtitle entry consists of four parts, all in text:

  1. A number indicating which subtitle it is in the sequence.

  2. The time at which the subtitle appears on the screen, and the time at which it disappears.

  3. The subtitle itself.

  4. A blank line indicating the start of a new subtitle.

When placing SRT in Matroska, part 3 is converted to UTF-8 (S_TEXT/UTF8) and placed in the data portion of the Block. Part 2 is used to set the Block’s timestamp and the BlockDuration element. Nothing else is used.

Here is an example SRT file:

1
00:02:17,440 --> 00:02:20,375
Senator, we're making
our final approach into Coruscant.

2
00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.

In this example, the text “Senator, we’re making our final approach into Coruscant.” would be converted into UTF-8 and placed in the Block. The timestamp of the Block would be set to “00:02:17,440”, and the BlockDuration element would be set to “00:00:02,935” (00:02:20,375 minus 00:02:17,440).

The same is repeated for the next subtitle.

Because there are no general settings for SRT, the CodecPrivate is left blank.

SSA/ASS Subtitles

SSA stands for Sub Station Alpha. It is the file format used by the popular subtitle editor SubStation Alpha. It supports some advanced display features, like positioning, karaoke, and style management.

For detailed information on SSA/ASS, see the SSA specs [@!SSA]. They include a description of SSA and of the advanced features added by the ASS format (short for Advanced SSA). Because SSA and ASS are so similar, they are treated the same here.

Like SRT, this format is text based with a particular syntax.

A file consists of four or five sections, declared in a way similar to an INI file (although it is not actually an INI file).

The first, “[Script Info]”, contains some information about the subtitle file, such as its title, who created it, the type of script, and a very important one: “PlayResY”. Be careful with this value: everything in your script (font size, positioning) is scaled by it. Sub Station Alpha uses your desktop’s Y resolution to write this value, so if a friend with a large monitor and a high screen resolution gives you an edited script, you can mess everything up by saving the script in SSA with your low-cost monitor.

The second, “[V4 Styles]” or “[V4+ Styles]”, is a list of style definitions. A style describes how a text will look on the screen. It defines font, font size, primary/…/outline colour, position, alignment, etc.

For example:

Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0

The third, “[Events]”, is the list of text you want to display at the right timing. You can specify some attributes here, like the style to use for this event (which MUST be defined in the style list), the position of the text (left, right, and vertical margins), and an effect. Name is mostly used by translators to know who said a given sentence. Timing is in h:mm:ss.cc (centiseconds).

Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, \
Effect, Text
Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,\
0000,,Et les enregistrements de ses ondes delta ?
Dialogue: Marked=0,0:02:42.42,0:02:44.15,Wolf main,autre,0000,0000,\
0000,,Toujours rien.

“[Pictures]” or “[Fonts]” sections can be found in some SSA files. They contain UUE-encoded pictures or fonts, but those features are only used by Sub Station Alpha; no filter (VobSub or Avery Lee’s Subtitler filter) uses them.

Now, how are they stored in Matroska?

  • All text is converted to UTF-8

  • All the headers, “[Script Info]” and the “[V4 Styles]”/“[V4+ Styles]” list, are stored in CodecPrivate.

  • The Start and End fields are used to set the Block’s timestamp and the BlockDuration element.

  • Events are stored in the Block in this order: ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text. (Layer comes from the ASS specs; it is empty for SSA.) The ReadOrder field is needed for the decoder to be able to reorder the streamed samples as they were originally placed in the file.

Here is an example of an SSA file.

[Script Info]
; This is a Sub Station Alpha v4 script.
Title: Wolf's rain 2
Original Script: Anime-spirit Ishin-francais
Original Translation: Coolman
Original Editing: Spikewolfwood
Original Timing: Lord_alucard
Original Script Checking: Spikewolfwood
ScriptType: v4.00
Collisions: Normal
PlayResY: 1024
PlayDepth: 0
Wav: 0, 128697,D:\Alex\Anime\- Fansub -\- TAFF -\WR_-_02_Wav.wav
Wav: 0, 120692,H:\team truc\WR_-_02.wav
Wav: 0, 116504,E:\sub\wolf's_rain\WOLF'S RAIN 02.wav
LastWav: 3
Timer: 100,0000

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,Arial,20,65535,65535,65535,-2147483640,-1,0,1,3,0,2,\
30,30,30,0,0
Style: Titre_episode,Akbar,140,15724527,65535,65535,986895,-1,0,1,1,\
0,3,30,30,30,0,0
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0

[Events]
Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, \
Effect, Text
Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,\
0000,,Et les enregistrements de ses ondes delta ?
Dialogue: Marked=0,0:02:42.42,0:02:44.15,Wolf main,autre,0000,0000,\
0000,,Toujours rien.

Here is what would be placed into the CodecPrivate element.

[Script Info]
; This is a Sub Station Alpha v4 script.
Title: Wolf's rain 2
Original Script: Anime-spirit Ishin-francais
Original Translation: Coolman
Original Editing: Spikewolfwood
Original Timing: Lord_alucard
Original Script Checking: Spikewolfwood
ScriptType: v4.00
Collisions: Normal
PlayResY: 1024
PlayDepth: 0
Wav: 0, 128697,D:\Alex\Anime\- Fansub -\- TAFF -\WR_-_02_Wav.wav
Wav: 0, 120692,H:\team truc\WR_-_02.wav
Wav: 0, 116504,E:\sub\wolf's_rain\WOLF'S RAIN 02.wav
LastWav: 3
Timer: 100,0000

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,Arial,20,65535,65535,65535,-2147483640,-1,0,1,3,0,2,\
30,30,30,0,0
Style: Titre_episode,Akbar,140,15724527,65535,65535,986895,-1,0,1,1,\
0,3,30,30,30,0,0
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0

And here are the two blocks that would be generated.

Block’s timestamp: 00:02:40.650 BlockDuration: 00:00:01.140

1,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses \
ondes delta ?

Block’s timestamp: 00:02:42.420 BlockDuration: 00:00:01.730

2,,Wolf main,autre,0000,0000,0000,,Toujours rien.
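The Dialogue-to-Block conversion shown above can be sketched as follows. This is an illustrative sketch only: `dialogue_to_block` and `ssa_time_to_ms` are hypothetical names, and it assumes the fixed field order of the example “Format:” lines rather than parsing the Format line itself. For SSA the first field (Marked) is discarded and Layer is left empty; the `is_ass` flag is an assumption for treating the first field as the ASS Layer instead.

```python
def ssa_time_to_ms(t: str) -> int:
    """Convert an SSA "h:mm:ss.cc" time (centiseconds) to milliseconds."""
    h, m, rest = t.strip().split(":")
    s, cs = rest.split(".")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(cs) * 10


def dialogue_to_block(line: str, read_order: int, is_ass: bool = False) -> tuple[int, int, bytes]:
    """Return (Block timestamp ms, BlockDuration ms, UTF-8 Block payload)."""
    prefix = "Dialogue:"
    if not line.startswith(prefix):
        raise ValueError("not a Dialogue line")
    # Ten fields; Text comes last and may itself contain commas.
    first, start, end, style, name, ml, mr, mv, effect, text = (
        line[len(prefix):].lstrip().split(",", 9)
    )
    layer = first if is_ass else ""  # SSA's first field is Marked, which is not stored
    start_ms, end_ms = ssa_time_to_ms(start), ssa_time_to_ms(end)
    payload = ",".join([str(read_order), layer, style, name, ml, mr, mv, effect, text])
    return start_ms, end_ms - start_ms, payload.encode("utf-8")
```

Only the Block payload is built here; Start and End are not repeated inside it, since they are carried by the Block’s timestamp and BlockDuration.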

WebVTT

The “Web Video Text Tracks Format” (short: WebVTT) is developed by the World Wide Web Consortium (W3C). Its specifications are freely available at [@!WebVTT].

The guiding principles for the storage of WebVTT in Matroska are:

  • Consistency: store data in a similar way to other subtitle codecs

  • Simplicity: make decoding and remuxing as easy as possible for existing infrastructures

  • Completeness: keep as much data as possible from the original WebVTT file

Track Parameters

The CodecID to use is S_TEXT/WEBVTT.

The CodecPrivate element contains all global blocks before the first subtitle entry. It starts at the “WEBVTT” file identification marker but excludes the optional byte order mark.
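A minimal sketch of extracting the CodecPrivate text, assuming the file is already decoded to a Python string and that the first block containing “-->” is the first cue (STYLE and NOTE blocks may not contain “-->”). `webvtt_codec_private` is a hypothetical name; note that this sketch collapses runs of several blank lines into one.

```python
import re


def webvtt_codec_private(vtt_text: str) -> str:
    """Return the global WebVTT blocks that precede the first cue."""
    text = vtt_text[1:] if vtt_text.startswith("\ufeff") else vtt_text  # drop the BOM
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    header_blocks: list[str] = []
    # Blocks are separated by blank lines.
    for block in re.split(r"\n{2,}", text):
        if "-->" in block:
            break  # the first cue block starts the Matroska Blocks
        header_blocks.append(block)
    return "\n\n".join(header_blocks).rstrip("\n")
```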

Storage of non-global WebVTT blocks

Non-global WebVTT blocks (e.g., “NOTE”) before a WebVTT caption or subtitle cue text are stored in Matroska’s BlockAddition element together with the Matroska Block containing the WebVTT caption or subtitle cue text these blocks precede (see below for the actual format).

Storage of Cues in Matroska blocks

Each WebVTT caption or subtitle cue text is stored directly in the Matroska Block.

A muxer MUST change all WebVTT cue timestamp(s) present within the WebVTT caption or subtitle cue text to be relative to the Matroska Block’s timestamp.

The Cue’s start timestamp is used as the Matroska Block’s timestamp.

The difference between the Cue’s end timestamp and its start timestamp is used as the Matroska BlockDuration.
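The timing rules above can be sketched in Python. `cue_to_block` is a hypothetical helper that assumes millisecond Block timestamps. It derives the Block timestamp and BlockDuration from the cue timing line, keeps the cue settings for the BlockAdditional, and rewrites inline `<hh:mm:ss.ttt>` timestamps in the cue text so that they are relative to the Block’s timestamp.

```python
import re

# A WebVTT timestamp: optional hours, then mm:ss.ttt.
TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(rf"^{TS}\s+-->\s+{TS}(?:\s+(.*))?$")
INLINE_TS = re.compile(rf"<{TS}>")


def vtt_ts_to_ms(h, m, s, ms) -> int:
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def ms_to_vtt_ts(total: int) -> str:
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def cue_to_block(timing_line: str, cue_text: str) -> tuple[int, int, str, str]:
    """Return (Block timestamp, BlockDuration, cue settings, Block text); times in ms."""
    match = TIMING_LINE.match(timing_line.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timing line: {timing_line!r}")
    g = match.groups()
    start, end = vtt_ts_to_ms(*g[0:4]), vtt_ts_to_ms(*g[4:8])
    settings = g[8] or ""

    def to_relative(inline: re.Match) -> str:
        # In a valid file, inline timestamps come after the cue start.
        return "<" + ms_to_vtt_ts(vtt_ts_to_ms(*inline.groups()) - start) + ">"

    return start, end - start, settings, INLINE_TS.sub(to_relative, cue_text)
```

For example, a cue timed `00:03:10.000 --> 00:03:20.000` whose text contains `<00:03:15.000>` becomes a Block at 190000 ms with a BlockDuration of 10000 ms and an inline timestamp of `<00:00:05.000>`.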

BlockAdditions

Each Matroska Block may be accompanied by one BlockAdditions element. Its format is as follows:

  1. The first line contains the WebVTT caption or subtitle cue text’s optional WebVTT cue settings list followed by one line feed character (U+000A). The WebVTT cue settings list may be empty, in which case the line consists of the line feed character only.

  2. The second line contains the WebVTT caption or subtitle cue text’s optional WebVTT cue identifier followed by one line feed character (U+000A). The line may be empty, indicating that there was no WebVTT cue identifier in the source file, in which case the line consists of the line feed character only.

  3. The third and all following lines contain all WebVTT comment block(s) that precede the current WebVTT cue block. These may be absent.

If there is no Matroska BlockAddition element stored together with the Matroska Block, then WebVTT cue settings list, WebVTT cue identifier, and WebVTT comment block(s) MUST be assumed to be absent.
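A sketch of assembling the BlockAdditional text from the three parts listed above. `block_additional` is a hypothetical name, and separating multiple comment blocks with a blank line (as in the source file) is an assumption, since the text above does not specify the separator. It returns `None` when no BlockAddition needs to be written.

```python
from typing import Optional


def block_additional(settings: str, identifier: str, comments: list[str]) -> Optional[str]:
    """Build the BlockAdditional text for one cue, or None if no BlockAddition is needed."""
    if not settings and not identifier and not comments:
        return None  # a missing BlockAddition means all three parts are absent
    # Line 1: cue settings list; line 2: cue identifier. Either may be empty.
    text = f"{settings}\n{identifier}\n"
    # Remaining lines: the preceding comment blocks (blank-line separator assumed).
    return text + "\n\n".join(comments)
```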

Example of Matroska Muxing

Here is an example of how a WebVTT file is transformed.

Consider the following example WebVTT file:

WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater \
than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

NOTE
Notes always span a whole block and can cover multiple
lines. Like this one.
An empty line ends the block.

hello
00:00:00.000 --> 00:00:10.000
Example entry 1: Hello <b>world</b>.

NOTE style blocks cannot appear after the first cue.

00:00:25.000 --> 00:00:35.000
Example entry 2: Another entry.
This one has multiple lines.

00:01:03.000 --> 00:01:06.500 position:90% align:right size:35%
Entry 3: That stuff to the right of the \
timestamps are cue settings.

00:03:10.000 --> 00:03:20.000
Entry 4: Entries can even include timestamps.
For example:<00:03:15.000>This becomes visible five seconds
after the first part.

CodecPrivate

The following XML depicts the CodecPrivate element, which contains the UTF-8 text of all global WebVTT blocks before the first subtitle entry:

<TrackEntry>
  <CodecPrivate>WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater \
than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

NOTE
Notes always span a whole block and can cover multiple
lines. Like this one.
An empty line ends the block.</CodecPrivate>
</TrackEntry>

Cue Block 1

The following XML depicts the nested elements of a BlockGroup element for the first WebVTT cue block. The cue block timings are turned into Matroska timestamps. The last line feed character (U+000A) is stripped.

The BlockAddition content starts with one empty line as there’s no WebVTT cue settings list:

<BlockGroup>
  <Block timestamp="0">Example entry 1: Hello <b>world</b>.</Block>
  <BlockDuration>10000</BlockDuration> <!-- 10000 Ticks of 1 ms -->
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>

hello</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>

Cue Block 2

The following XML depicts the nested elements of a BlockGroup element for the second WebVTT cue block. The last line feed character (U+000A) is stripped.

The BlockAddition content starts with two empty lines, as there is neither a WebVTT cue settings list nor a WebVTT cue identifier. It is followed by the content of the WebVTT comment block(s). The last line feed character (U+000A) is stripped.

<BlockGroup>
  <Block timestamp="25000">Example entry 2: Another entry.
This one has multiple lines.</Block>
  <BlockDuration>10000</BlockDuration>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>

NOTE style blocks cannot appear after the first cue.</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>

Cue Block 3

The following XML depicts the nested elements of a BlockGroup element for the third WebVTT cue block. The last line feed character (U+000A) is stripped.

The BlockAddition content ends with an empty line, as there is no WebVTT cue identifier and there are no WebVTT comment blocks.

<BlockGroup>
  <Block timestamp="63000">Entry 3: That stuff to the right of the \
timestamps are cue settings.</Block>
  <BlockDuration>3500</BlockDuration>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>
position:90% align:right size:35%

</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>

Cue Block 4

The following XML depicts the nested elements of a BlockGroup element for the fourth WebVTT cue block. The last line feed character (U+000A) is stripped.

No BlockAddition is used.

<BlockGroup>
  <Block timestamp="190000">Entry 4: Entries can even include timestamps.
For example:<00:00:05.000>This becomes visible five seconds
after the first part.</Block>
  <BlockDuration>10000</BlockDuration>
</BlockGroup>

Storage of WebVTT in Matroska vs. WebM

Note: the storage of WebVTT in Matroska differs from the design document for storage of WebVTT in WebM [@?WebM-WebVTT]. There are several reasons for this, including but not limited to: the WebM document is old (from February 2012), was based on an earlier draft of WebVTT, and ignores several parts that were added to WebVTT later; WebM still does not support subtitles at all [@?WebMContainer]; and the proposal suggests splitting the information across multiple tracks, making life very difficult for demuxers and remuxers.

WebM uses the “D_WEBVTT/SUBTITLES”, “D_WEBVTT/CAPTIONS”, “D_WEBVTT/DESCRIPTIONS”, and “D_WEBVTT/METADATA” CodecIDs, with different tracks depending on the data type and without a CodecPrivate.

HDMV Presentation Graphics Subtitles

The specifications for the HDMV Presentation Graphics Subtitle format (short: HDMV PGS) can be found in section 9.14 “HDMV graphics streams” of the Blu-ray specifications [@!Blu-ray.Part3].

Track Parameters

The CodecID to use is S_HDMV/PGS. A CodecPrivate element is not used.

Matroska Blocks

Each HDMV PGS Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.14.2.1 “Segment coding structure and parameters” of the Blu-ray specifications [@!Blu-ray.Part3].

Each Segment contains a presentation timestamp. This timestamp will be used as the timestamp for the Matroska Block.

A Segment is normally shown until a subsequent Segment is encountered. Therefore, the Matroska Block MAY have no Duration. In that case, a player MUST display a Segment within a Matroska Block until the next Segment is encountered.

A muxer MAY use a Duration, e.g., by calculating the distance between two subsequent Segments. If a Matroska Block has a Duration, a player MUST display that Segment only for the duration of the BlockDuration.
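The optional Duration calculation can be sketched as below, following the text above literally (distance between subsequent Segments). The sketch assumes presentation timestamps on the 90 kHz MPEG clock, ignores 33-bit PTS wrap-around, and uses millisecond Block timestamps; `pgs_block_timing` is a hypothetical name. The last Segment gets no Duration, so it falls under the no-Duration rule above.

```python
from typing import Optional

PTS_CLOCK_HZ = 90_000  # assumed: MPEG presentation timestamps tick at 90 kHz


def pts_to_ms(pts: int) -> int:
    return pts * 1000 // PTS_CLOCK_HZ


def pgs_block_timing(segment_pts: list[int]) -> list[tuple[int, Optional[int]]]:
    """(Block timestamp ms, BlockDuration ms or None) for each Segment, in stream order."""
    times = [pts_to_ms(pts) for pts in segment_pts]
    timing: list[tuple[int, Optional[int]]] = []
    for i, start in enumerate(times):
        # Duration = distance to the next Segment; the last Segment gets none,
        # so a player keeps showing it until another Segment is encountered.
        duration = times[i + 1] - start if i + 1 < len(times) else None
        timing.append((start, duration))
    return timing
```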

HDMV Text Subtitles

The specifications for the HDMV Text Subtitle format (short: HDMV TextST) can be found in section 9.15 “HDMV text subtitle streams” of the Blu-ray specifications [@!Blu-ray.Part3].

Track Parameters

The CodecID to use is S_HDMV/TEXTST.

A CodecPrivate element is required. It MUST contain the stream’s Dialog Style Segment as described in section 9.15.4.2 “Dialog Style Segment” of the Blu-ray specifications [@!Blu-ray.Part3].

Matroska Blocks

Each HDMV Dialog Presentation Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.15.4.3 “Dialog presentation segment” of the Blu-ray specifications [@!Blu-ray.Part3].

Each Segment contains a start and an end presentation timestamp (short: start PTS & end PTS). The start PTS will be used as the timestamp for the Matroska Block. The Matroska Block MUST have a Duration, and that Duration is the difference between the end PTS and the start PTS.

A player MUST use the Matroska Block’s timestamp and BlockDuration instead of the Segment’s start and end PTS for determining when and how long to show the Segment.
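A sketch of deriving the Block timestamp and BlockDuration from a Segment’s start and end PTS, assuming the 90 kHz MPEG clock and millisecond Block timestamps; `textst_block_timing` is a hypothetical name. Both PTS values are converted before subtracting, so that the Block timestamp plus the BlockDuration equals the converted end PTS exactly.

```python
PTS_CLOCK_HZ = 90_000  # assumed 90 kHz PTS clock, as in MPEG transport streams


def textst_block_timing(start_pts: int, end_pts: int) -> tuple[int, int]:
    """Return (Block timestamp ms, BlockDuration ms) for one Dialog Presentation Segment."""
    if end_pts < start_pts:
        raise ValueError("end PTS precedes start PTS")
    start_ms = start_pts * 1000 // PTS_CLOCK_HZ
    end_ms = end_pts * 1000 // PTS_CLOCK_HZ
    return start_ms, end_ms - start_ms
```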

Character set

When TextST subtitles are stored inside Matroska, the only allowed character set is UTF-8.

Each HDMV text subtitle stream in a Blu-ray can use one of a handful of character sets. This information is not stored in the MPEG-2 Transport Stream itself but in the accompanying Clip Information file.

Therefore, a muxer MUST parse the accompanying Clip Information file. If the information indicates a character set other than UTF-8, it MUST re-encode all text Dialog Presentation Segments from the indicated character set to UTF-8 prior to storing them in Matroska.

Digital Video Broadcasting (DVB) subtitles

The specifications for the Digital Video Broadcasting subtitle bitstream format (short: DVB subtitles) can be found in the [@!ETSI.EN300-743] document. The storage of DVB subtitles in MPEG transport streams is specified in the [@!ETSI.EN300-468] document.

Track Parameters

The CodecID to use is S_DVBSUB.

The CodecPrivate element is five bytes long and has the following structure:

  • 2 bytes: composition page ID (bit string, left bit first)

  • 2 bytes: ancillary page ID (bit string, left bit first)

  • 1 byte: subtitling type (bit string, left bit first)

The semantics of these bytes are the same as the ones described in section 6.2.41 “Subtitling descriptor” of [@!ETSI.EN300-468].
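The five-byte layout can be packed and unpacked with Python’s `struct` module; “left bit first” corresponds to big-endian byte order. The helper names below are hypothetical, and the values used in the tests are arbitrary examples rather than meaningful page IDs or subtitling types.

```python
import struct

# ">HHB": big-endian ("left bit first") unsigned 16-bit, 16-bit, and 8-bit fields.
DVBSUB_PRIVATE = struct.Struct(">HHB")


def dvb_codec_private(composition_page_id: int, ancillary_page_id: int, subtitling_type: int) -> bytes:
    return DVBSUB_PRIVATE.pack(composition_page_id, ancillary_page_id, subtitling_type)


def parse_dvb_codec_private(data: bytes) -> tuple[int, int, int]:
    if len(data) != DVBSUB_PRIVATE.size:
        raise ValueError("S_DVBSUB CodecPrivate must be exactly 5 bytes")
    return DVBSUB_PRIVATE.unpack(data)
```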

Matroska Blocks

Each Matroska Block consists of one or more DVB Subtitle Segments as described in section 7.2 “Syntax and semantics of the subtitling segment” of [@!ETSI.EN300-743].

Each Matroska Block SHOULD have a Duration indicating how long the DVB Subtitle Segments in that Block SHOULD be displayed.

ARIB (ISDB) subtitles

The specifications for the ARIB B-24 subtitle bitstream format (short: ARIB subtitles) and its storage in MPEG transport streams can be found in the documents [@!ARIB.STD-B24], [@!ARIB.STD-B10], and [@!ARIB.TR-B14].

Track Parameters

The CodecID to use is S_ARIBSUB.

The CodecPrivate element is three bytes long and has the following structure:

  • 1 byte: component tag (bit string, left bit first)

  • 2 bytes: data component ID (bit string, left bit first)

The semantics of the component tag are the same as those described in [@!ARIB.STD-B10], part 2, Annex J. The semantics of the data component ID are the same as those described in [@!ARIB.TR-B14], fascicle 2, Vol. 3, Section 2, 4.2.8.1.
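Likewise, a sketch of the three-byte S_ARIBSUB CodecPrivate using `struct` with big-endian order (hypothetical helper names; the test values are arbitrary examples). The `>` prefix also disables alignment padding, so the result is exactly three bytes.

```python
import struct

# ">BH": one unsigned byte, then a big-endian unsigned 16-bit value; no padding.
ARIBSUB_PRIVATE = struct.Struct(">BH")


def arib_codec_private(component_tag: int, data_component_id: int) -> bytes:
    return ARIBSUB_PRIVATE.pack(component_tag, data_component_id)


def parse_arib_codec_private(data: bytes) -> tuple[int, int]:
    if len(data) != ARIBSUB_PRIVATE.size:
        raise ValueError("S_ARIBSUB CodecPrivate must be exactly 3 bytes")
    return ARIBSUB_PRIVATE.unpack(data)
```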

Matroska Blocks

Each Matroska Block consists of a single synchronized PES data structure as described in chapter 5 “Independent PES transmission protocol” of [@!ARIB.STD-B24], volume 3, with a Synchronized_PES_data_byte block containing one or more ISDB Caption Data Groups as described in chapter 9 “Transmission of caption and superimpose” of [@!ARIB.STD-B24], volume 1, part 3. All of the Caption Statement Data Groups in a given Matroska Track MUST use the same language index.

A Data Group is normally shown until a subsequent Group provides instructions to clear it. Therefore, the Matroska Block SHOULD NOT have a Duration. A player SHOULD display a Data Group within a Matroska Block until its internal duration elapses, or until a subsequent Data Group removes it.