Daniel Shaw · WordPress & WooCommerce Developer Wellington, New Zealand


Easier subtitles & captions for video in the WordPress block editor

WordPress 5.6 is set for release on December 8, 2020, bringing native support for adding text tracks via the Video block. Until now, supplementing video with a transcript has been an opaque, multi-step process within WordPress.

The benefit of this update may be broader than simply making the task easier. Hopefully, more sites will start to supplement video content with transcripts simply because the mechanism for doing so is much more visible. Adding a transcript helps general accessibility (for example, for anyone unable to hear audio content because of personal or environmental factors) and SEO: Google will happily suck up that text aligned with your video media.

What is a video “text track”?

Broadly, a text track can be a transcript of language, on-/off-screen sounds, or time-based and contextual metadata, depending on the intended audience. A text track is attached with a <track> element, placed as a child of a <video> element:

<video src="video.mp4" controls>
  <track src="caption.vtt" label="Title for track" srclang="en" kind="captions">
</video>

Using the block editor, the Video block automatically wraps <video> in a <figure> tag:

<figure class="wp-block-video">
  <video src="video.mp4" controls>
    <track src="caption.vtt" label="Title for track" srclang="en">
  </video>
</figure>

How to add tracks to the Video block

One or more tracks can be assigned to a video via the Text tracks dropdown in the Video block toolbar:

The interface for adding a text track.

For each text track, the label, source language and kind attributes can be set:

Text track with attributes assigned.


Label

A simple title for the track.

Source language

The language of the track text. It must be a valid IETF BCP 47 language tag, for example en or en-NZ.


Kind

  • Captions: help viewers who cannot hear the audio (due to personal or environmental factors) follow audio events on- and off-screen
  • Subtitles: help viewers unfamiliar with the audio language
  • Descriptions: akin to an audio alt text, when the primary visual material is not available for some reason
  • Chapters: intended to help navigation within a video
  • Metadata: supplies time-based data for use within the DOM 1
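To make the three attributes concrete, here is a minimal Python sketch that assembles a <track> element from a label, source language and kind. The helper name build_track_tag is hypothetical and is not part of WordPress; the only thing taken from the HTML standard is the set of five allowed kind values, with subtitles as the default when kind is missing.

```python
from html import escape

# The five values the HTML standard allows for the kind attribute of <track>.
VALID_KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}


def build_track_tag(src: str, label: str, srclang: str, kind: str = "subtitles") -> str:
    """Return a <track> element as a string (hypothetical helper, illustration only)."""
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown track kind: {kind!r}")
    attrs = {"src": src, "label": label, "srclang": srclang, "kind": kind}
    # Escape attribute values so quotes or ampersands in a label cannot break the markup.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    return f"<track {rendered}>"


print(build_track_tag("caption.vtt", "Title for track", "en", "captions"))
```

The printed element matches the <track> line in the first markup example above. The sketch does not check that srclang is a well-formed BCP 47 tag.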

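At time of writing, if you choose Subtitles the kind attribute will be omitted from the <track> markup: #26673. Per the HTML standard, a <track> with no kind attribute is treated as subtitles, so playback should be unaffected.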

How to create a text track

Text tracks are created in Web Video Text Tracks (WebVTT) format. This basic subtitle example shows the key requirements:

WEBVTT

01:41:16.208 --> 01:41:18.506
- Why don't we just...

01:41:19.778 --> 01:41:22.770
- wait here for a little while?

01:41:24.450 --> 01:41:26.418
- See what happens.

  • The file must start with WEBVTT
  • Timecodes can be in either HH:MM:SS.TTT or MM:SS.TTT format 2
  • Note the period between seconds and milliseconds
  • The hyphen preceding each text block seems to be a convention rather than a hard requirement. If present, it will be displayed during playback.
  • Save the file with a .vtt extension.
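The requirements above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: cues are given as (start seconds, end seconds, text) tuples, and the function names format_timecode and build_vtt are hypothetical, not part of any library.

```python
def format_timecode(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.TTT (period before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build WebVTT text from an iterable of (start_seconds, end_seconds, text)."""
    blocks = ["WEBVTT"]  # the file must start with this header
    for start, end, text in cues:
        timing = f"{format_timecode(start)} --> {format_timecode(end)}"
        blocks.append(f"{timing}\n{text}")
    # A blank line separates the header and each cue.
    return "\n\n".join(blocks) + "\n"


cues = [
    (6076.208, 6078.506, "- Why don't we just..."),
    (6079.778, 6082.770, "- wait here for a little while?"),
]
print(build_vtt(cues))
```

Saving the returned string to a file with a .vtt extension gives something the Video block can accept as a text track.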

From a former life working with timecodes and subtitles, I can attest that one incorrect timecode will often set off a lovely cascade affecting the surrounding timecodes. Running the file through a .vtt validator will usually save a lot of debugging time.
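As a rough idea of what such a validator looks for, here is a minimal Python sketch. It is not a replacement for a real validator: it only checks the header, the shape of each timing line, that each cue ends after it starts, and that cue start times do not go backwards. The function name check_vtt is hypothetical.

```python
import re

# Matches "HH:MM:SS.TTT --> HH:MM:SS.TTT" where the hours part is optional.
TIMING = re.compile(
    r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3}) --> "
    r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
)


def to_ms(hours, minutes, seconds, millis) -> int:
    """Convert captured timecode parts to milliseconds; hours may be None."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_vtt(text: str) -> list:
    """Return a list of human-readable problems found in WebVTT text."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()  # tolerate a leading byte order mark
    if not lines or not lines[0].startswith("WEBVTT"):
        problems.append("line 1: file must start with WEBVTT")
    previous_start = -1
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # not a timing line
        match = TIMING.match(line.strip())
        if match is None:
            problems.append(f"line {number}: malformed timecode")
            continue
        parts = match.groups()
        start, end = to_ms(*parts[:4]), to_ms(*parts[4:])
        if end <= start:
            problems.append(f"line {number}: cue ends before it starts")
        if start < previous_start:
            problems.append(f"line {number}: cue starts before the previous cue")
        previous_start = start
    return problems


sample = "WEBVTT\n\n01:41:16,208 --> 01:41:18.506\n- Why don't we just...\n"
print(check_vtt(sample))  # the comma before the milliseconds is reported
```

Reporting line numbers makes it easier to find the first bad timecode before chasing the cues around it.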


  1. Different types of VTT tracks and their structures
  2. Hours:Minutes:Seconds.Milliseconds