WebVTT: Modern Web Subtitles

Posted on December 12, 2024 by SubZap (13 min read)

  • 🔧 Technical
  • 🌐 Web
  • 🎨 Creative
  • 📢 Accessibility

As video streaming becomes ubiquitous, subtitles need to adapt to web platforms. WebVTT (Web Video Text Tracks) builds on SRT's simplicity while adding features specifically designed for web delivery.

A Brief History

When HTML5 video emerged, it became clear that subtitles needed to evolve. Browser vendors and streaming platforms required a format that could handle modern web needs: precise styling, multiple languages, and accessibility features. WebVTT was born from these requirements and is now the W3C-specified text track format for HTML5 video.

From SRT to WebVTT

If you're familiar with SRT files, WebVTT will feel natural. Let's look at the same subtitle in both formats:

SRT:

1
00:00:01,000 --> 00:00:04,000

WebVTT:

WEBVTT

00:00:01.000 --> 00:00:04.000

The similarities are clear, but WebVTT introduces some key differences:

  • Required "WEBVTT" header
  • Numbers before timestamps are no longer required (but still suggested)
  • Periods instead of commas in timestamps (00:01:00.000 instead of 00:01:00,000); the hours field may also be omitted (01:00.000)
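
The differences above are small enough that a basic conversion can be sketched in a few lines. The following is a minimal sketch rather than a complete converter: the function name `srt_to_webvtt` is hypothetical, and it assumes well-formed SRT input with two-digit hours and no SRT-specific formatting tags.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT (hypothetical helper, not a full converter)."""
    # Normalize line endings and drop a leading byte order mark, if any.
    lines = srt_text.replace("\r\n", "\n").strip("\ufeff\n").split("\n")
    out = ["WEBVTT", ""]  # required header followed by a blank line
    for line in lines:
        # Only timing lines change: the comma before milliseconds becomes a period.
        out.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"
```

The cue numbers are kept as they are, since WebVTT accepts them as optional cue identifiers, and commas inside the subtitle text are left untouched because the pattern only matches at the start of a timing line.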

WebVTT also introduces web-specific styling options using limited CSS (Cascading Style Sheets) syntax, along with support for regions and positioning. We'll get into this in the next section.

CSS-Style Formatting

WebVTT's power comes from its CSS-like styling system. Using STYLE blocks, you can define how different elements appear. A STYLE block ends at the first blank line, so keep its rules together without blank lines between them:

WEBVTT

STYLE
::cue {
  color: white;
  background-color: rgba(0, 0, 0, 0.7);
  font-family: Arial, sans-serif;
}
::cue(b) {
  color: yellow;
  font-weight: bold;
}
::cue(.important) {
  color: red;
  font-weight: bold;
}
::cue(v[voice="narrator"]) {
  color: cyan;
  font-style: italic;
}

Styling Elements

Different selectors target specific elements:

WEBVTT

STYLE
::cue(b) {
  color: yellow;
}
::cue(i) {
  font-style: italic;
  color: cyan;
}

00:00:01.000 --> 00:00:04.000

Class-Based Styling

You can define custom classes for different types of text:

WEBVTT

STYLE
::cue(.important) {
  color: red;
  font-weight: bold;
}
::cue(.whisper) {
  color: gray;
  font-style: italic;
}

00:00:01.000 --> 00:00:05.000
<c.important>Critical announcement!</c>

00:00:06.000 --> 00:00:10.000

Voice-Based Styling

Speakers can have distinct styles:

WEBVTT

STYLE
::cue(v[voice="narrator"]) {
  color: yellow;
  font-family: "Times New Roman", serif;
}
::cue(v[voice="character"]) {
  color: cyan;
  font-family: Arial, sans-serif;
}

00:00:01.000 --> 00:00:04.000
<v narrator>The story begins...</v>

00:00:04.000 --> 00:00:08.000

Language-Specific Styling

Different languages can have distinct appearances:

WEBVTT

STYLE
::cue(:lang(en)) {
  color: white;
  font-family: Arial, sans-serif;
}
::cue(:lang(ja)) {
  color: yellow;
  font-family: "Noto Sans JP", sans-serif;
}

00:00:01.000 --> 00:00:04.000
<lang en>Welcome to the tutorial</lang>

00:00:04.000 --> 00:00:08.000

Styling Limitations

While WebVTT's styling system is powerful, it has some important restrictions:

  • Cannot load external resources
  • Limited to text-related CSS properties
  • Styling applies to entire cue boxes
  • No animation or transition effects

Anatomy of a WebVTT File

Now that we understand WebVTT's styling capabilities, let's look at how a complete file comes together:

WEBVTT
Kind: captions
Language: en

STYLE
::cue {
  color: white;
  background-color: rgba(0, 0, 0, 0.7);
}

NOTE
This is a comment - it won't be displayed

1
00:00:01.000 --> 00:00:04.000
In today's video, we'll explore
the latest web technologies.

2
00:00:04.500 --> 00:00:08.000 align:end line:90%
Subscribe for more tutorials!

3
00:00:08.100 --> 00:00:12.000

Each file contains:

  • The WEBVTT header (required)
  • Optional metadata (Kind, Language), which some tools write and players typically ignore
  • STYLE blocks for formatting
  • Cue blocks with timing and text
  • Optional positioning attributes
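
To make this structure concrete, here is a rough sketch that labels each block of a file. It assumes blocks are separated by blank lines and is far simpler than the parsing algorithm in the W3C specification; the function name `classify_blocks` is hypothetical.

```python
import re


def classify_blocks(vtt_text: str) -> list[str]:
    """Label each blank-line-separated block of a WebVTT file.

    A simplified sketch: it only looks at the first line of each block
    and at whether the block contains a "-->" timing arrow.
    """
    text = vtt_text.replace("\r\n", "\n").lstrip("\ufeff")
    blocks = [b for b in re.split(r"\n{2,}", text.strip()) if b]
    labels = []
    for index, block in enumerate(blocks):
        first_line = block.split("\n", 1)[0]
        if index == 0:
            # The first block must start with the WEBVTT header.
            labels.append("header" if first_line.startswith("WEBVTT") else "invalid")
        elif first_line.startswith("STYLE"):
            labels.append("style")
        elif first_line.startswith("NOTE"):
            labels.append("note")
        elif "-->" in block:
            labels.append("cue")
        else:
            labels.append("unknown")
    return labels
```

Run against a file laid out like the example above, this would report the header first, then the style and note blocks, then one label per cue.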

Positioning and Layout

Beyond styling, WebVTT offers precise control over subtitle positioning. Unlike traditional formats, WebVTT uses a web-native positioning system:

00:00:04.000 --> 00:00:08.000 align:end position:90%
Right-aligned subtitle

00:00:08.000 --> 00:00:12.000 line:10%
Subtitle near the top

00:00:12.000 --> 00:00:16.000 size:40%

Common positioning properties:

  • align: Start, center, or end alignment
  • line: Vertical position (percentage or line number)
  • position: Horizontal position (percentage)
  • size: Width of the text box
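
The positioning properties are written as name:value pairs after the end timestamp, which makes them easy to pull apart. The sketch below shows one way to do that; `parse_cue_settings` is a hypothetical helper that does no validation of the values it returns.

```python
def parse_cue_settings(timing_line: str) -> dict[str, str]:
    """Return the settings that follow the end timestamp as a dict."""
    _, _, after_arrow = timing_line.partition("-->")
    parts = after_arrow.split()
    # parts[0] is the end timestamp; the rest are "name:value" settings.
    settings = {}
    for part in parts[1:]:
        name, sep, value = part.partition(":")
        if sep and name and value:
            settings[name] = value
    return settings
```

For the first cue above, this gives a dictionary with `align` set to `end` and `position` set to `90%`.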

Voice and Speaker Support

For content with multiple speakers, WebVTT provides clear identification through voice tags, which can be styled as we saw earlier:

STYLE
::cue(v[voice="host"]) {
  color: yellow;
}
::cue(guest-placeholder) {
  color: cyan;
}

00:00:01.000 --> 00:00:04.000
<v host>Welcome to the show!

00:00:04.000 --> 00:00:08.000

This feature is particularly valuable for interviews, panel discussions, and educational materials. It also helps with accessibility requirements by making speaker changes clear to screen readers.
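
If you need the speaker names outside of a player, for example to build a transcript, the voice tags can be read with a regular expression. This is a minimal sketch, assuming tags of the form `<v Name>` or `<v.class Name>`; the helper name `speakers_in` is hypothetical.

```python
import re

# <v Speaker Name> or <v.class Speaker Name>; the name runs up to ">".
VOICE_TAG = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")


def speakers_in(cue_text: str) -> list[str]:
    """List speaker names from voice tags, in order of appearance."""
    return [name.strip() for name in VOICE_TAG.findall(cue_text)]
```

Closing `</v>` tags are ignored by the pattern, so it works whether or not the cue closes its voice span.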

Working with WebVTT Files

While WebVTT offers powerful styling and positioning features, keeping subtitles simple often works best. Follow these guidelines for reliable results:

Improving Readability

The same principles that work for SRT apply to WebVTT:

  • Two lines maximum per subtitle
  • Around 40 characters per line
  • 20-25 characters per second
  • Natural line breaks
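
These guidelines are simple enough to check automatically. The sketch below tests one cue against them; the function names are hypothetical, the default thresholds mirror the list above, and markup such as voice tags is counted as ordinary characters for simplicity.

```python
def to_seconds(timestamp: str) -> float:
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def readability_issues(start: str, end: str, text: str,
                       max_lines: int = 2, max_chars: int = 40,
                       max_cps: float = 25.0) -> list[str]:
    """Check one cue against the guidelines above (thresholds are adjustable)."""
    lines = text.split("\n")
    issues = []
    if len(lines) > max_lines:
        issues.append("too many lines")
    if any(len(line) > max_chars for line in lines):
        issues.append("line too long")
    duration = to_seconds(end) - to_seconds(start)
    chars = sum(len(line) for line in lines)
    # A zero or negative duration can never be read in time.
    if duration <= 0 or chars / duration > max_cps:
        issues.append("reads too fast")
    return issues
```

Natural line breaks are a judgment call and are not something this kind of check can assess.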

Technical Recommendations

For robust WebVTT files:

  • Always use UTF-8 encoding
  • Test positioning on different screen sizes
  • Verify speaker labels work in your player
  • Keep styling consistent throughout

Platform Support

WebVTT enjoys strong support across modern platforms, but capabilities vary.

Most players reliably support:

  • Basic subtitle display
  • Simple positioning
  • Speaker identification
  • Standard timing

However, test carefully when using:

  • Complex positioning
  • Custom styling
  • Regions
  • Advanced features

This is because WebVTT relies on layout and styling capabilities that have traditionally been implemented only in web browsers, so support outside of browsers is less consistent.

Common Use Cases

Video streaming platforms have embraced WebVTT for its reliability and web-native features. The format particularly shines in online learning, where clear speaker identification and precise timing help viewers follow along.

Accessibility is another key strength. Screen readers handle WebVTT well, and the format's support for semantic markup helps create more inclusive content. The combination of CSS-like styling and semantic structure makes it possible to create subtitles that are both visually appealing and accessible.

Tools and Validation

While any text editor can handle WebVTT files, specialized tools make creation and testing easier.

Professional subtitle editors include:

  • Aegisub: Supports WebVTT export
  • Subtitle Edit: Strong WebVTT support
  • Caption Maker: Web-focused editor

Common Mistakes

Here are some typical WebVTT-specific issues to watch out for:

Incorrect STYLE block placement

This example demonstrates how a STYLE block may be placed incorrectly. These blocks must always come before any cues (shown text) in the subtitle.

1
00:00:01.000 --> 00:00:04.000
First subtitle

STYLE
::cue {
  color: red;
}
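
A check for this mistake can be sketched as follows. It assumes blocks are separated by blank lines and treats any block containing a "-->" arrow as a cue; `style_after_cue` is a hypothetical helper, not part of any library.

```python
def style_after_cue(vtt_text: str) -> bool:
    """Return True if any STYLE block appears after the first cue."""
    seen_cue = False
    blocks = vtt_text.replace("\r\n", "\n").strip().split("\n\n")
    for block in blocks:
        first_line = block.strip().split("\n", 1)[0]
        if first_line.startswith("STYLE") and "-->" not in block:
            if seen_cue:
                return True  # a STYLE block was found after a cue
        elif "-->" in block:
            seen_cue = True
    return False
```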

Invalid CSS syntax

This example demonstrates a common mistake when writing CSS syntax - a missing semicolon. For more information on the specific syntax of CSS, W3Schools provides many great articles on the topic.

WEBVTT

STYLE
::cue {
  color: red
  font-weight: bold;
}

Mixing class and voice tags incorrectly

This example demonstrates invalid use of XML-like tags for class and voice (speaker labeling).

  • Using v.important is invalid and should be c.important (v marks a voice and needs a speaker name; c marks a class span).

WEBVTT

STYLE
::cue(.important) {
  color: red;
}

00:00:01.000 --> 00:00:04.000
<v.important>Wrong syntax</v>

00:00:01.000 --> 00:00:04.000

Invalid positioning values

This example demonstrates an invalid positioning value as well as an invalid alignment value.

  • position is set to 101%, which is invalid because percentages must be between 0 and 100.
  • align is set to middle, when it should be center.

00:00:01.000 --> 00:00:04.000 position:101%
First subtitle

00:00:04.000 --> 00:00:08.000 align:middle
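
Both mistakes can be caught before a file ships. The following sketch validates a dictionary of cue settings; `setting_errors` is a hypothetical helper, and the accepted alignment values (start, center, end, plus left and right) reflect my reading of the specification rather than anything stated in this article.

```python
import re

# Assumption: left and right are accepted alongside start, center, and end.
VALID_ALIGN = {"start", "center", "end", "left", "right"}
PERCENT = re.compile(r"^(\d+(?:\.\d+)?)%$")


def setting_errors(settings: dict[str, str]) -> list[str]:
    """Report problems with align, position, and size values."""
    errors = []
    align = settings.get("align")
    if align is not None and align not in VALID_ALIGN:
        errors.append(f"align:{align} is not a valid alignment")
    for name in ("position", "size"):
        value = settings.get(name)
        if value is None:
            continue
        # A position may carry a suffix such as ",line-left"; check the number only.
        match = PERCENT.match(value.split(",", 1)[0])
        if not match or float(match.group(1)) > 100:
            errors.append(f"{name}:{value} must be a percentage from 0% to 100%")
    return errors
```

Applied to the two cues above, it flags position:101% in the first and align:middle in the second.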

What's Next?

Now that you understand WebVTT's capabilities, from its CSS-like styling system to positioning controls, you'll want to explore the tools that can create and edit these files efficiently. In our next article, we'll look at subtitle editors that support modern formats like WebVTT.

Time to put your web subtitles to work!