Ever seen subtitles turn into gibberish like this?
Original: Hello, 你好, Привет
Broken: Hello, ä½ å¥½, ÐŸÑ€Ð¸Ð²ÐµÑ‚
This is an encoding problem - and the tricky part is that most subtitle editors won't even show you what's wrong. While they're great for timing and formatting, they often hide or mishandle encoding issues. Sometimes you need to break out the developer tools.
The Encoding Problem#
Text files aren't just text - they're sequences of bytes that need to be interpreted correctly. Different encoding systems map these bytes to different characters. When a file is read with the wrong encoding, you get mojibake - that garbled text you saw above.
Here's what happens behind the scenes:
Text: 你
UTF-8 bytes: E4 BD A0
Windows-1252 interpretation: ä½ plus an invisible non-breaking space (byte A0)
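You can reproduce this in a few lines of Python: encode the text as UTF-8, then decode those bytes as if they were Windows-1252 (Python's codec name is cp1252). A minimal sketch; show_mojibake is an illustrative name, not part of any library:

```python
def show_mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread the bytes as Windows-1252."""
    raw = text.encode("utf-8")
    print(text, "->", raw.hex(" ").upper())
    # errors="replace" keeps going if a byte has no Windows-1252 mapping
    return raw.decode("cp1252", errors="replace")

garbled = show_mojibake("你")
print(repr(garbled))  # 'ä½\xa0' (the third byte, A0, is a non-breaking space)
```

Running show_mojibake("Hello, 你好, Привет") produces the broken line from the start of this article.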
This is particularly common with subtitles because they:
Often contain multiple languages
Get shared across different platforms
Come from various editing tools
Might be very old files
Common Encoding Issues#
Mixed Encodings#
Sometimes a single file contains multiple encodings. This usually happens when:
Copying text from different sources
Editing with different tools
Converting files incorrectly
Example of mixed encoding:
1
00:00:01,000 --> 00:00:04,000
This is fine
你好 # <-- UTF-8
ÐŸÑ€Ð¸Ð²ÐµÑ‚ # <-- Was UTF-8, misread as Windows-1252 and saved again
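A garbled line like the Cyrillic one above can often be repaired by reversing the mistake: encode the text back to Windows-1252 bytes, then decode those bytes as UTF-8. The sketch below does this line by line and leaves a line untouched when the round trip fails. It assumes the only damage is "UTF-8 misread as Windows-1252"; it is a heuristic, not a general fixer (the third-party ftfy library covers many more cases). repair_line is a hypothetical helper name.

```python
def repair_line(line: str) -> str:
    """Undo UTF-8 text that was misread as Windows-1252, if that's what happened."""
    try:
        return line.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the line has characters cp1252 can't hold (e.g. real Chinese
        # text), or its bytes aren't valid UTF-8 (e.g. a genuine "café").
        return line

garbled = "Привет".encode("utf-8").decode("cp1252")  # simulate the damage
cue = ["1", "00:00:01,000 --> 00:00:04,000", "This is fine", "你好", garbled]
for line in cue:
    print(repair_line(line))
```

ASCII lines and correctly encoded lines pass through unchanged, so running it over a whole file only touches the damaged lines. Still, review the result: a short legitimate Latin-1 line can occasionally survive the round trip and be "repaired" wrongly.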
Byte Order Marks (BOM)#
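The BOM is an optional byte sequence at the very start of a file that signals its encoding (for UTF-8 it is EF BB BF). Tools disagree on whether it should be there:
Older versions of Windows Notepad add it and use it to recognize UTF-8
Many Unix tools don't expect it and treat it as part of the text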
Some players ignore it entirely
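In Python the UTF-8 BOM is available as codecs.BOM_UTF8, and the utf-8-sig codec drops it when decoding if it is present. A minimal sketch for checking and stripping it; the two function names are illustrative:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def decode_without_bom(data: bytes) -> str:
    """Decode UTF-8, dropping a leading BOM if there is one."""
    # "utf-8-sig" skips a leading BOM; plain "utf-8" keeps it as "\ufeff"
    return data.decode("utf-8-sig")

raw = codecs.BOM_UTF8 + "1\n00:00:01,000 --> 00:00:04,000\n".encode("utf-8")
print(has_utf8_bom(raw))                  # True
print(repr(raw.decode("utf-8")[:2]))      # '\ufeff1' (the BOM sticks to the cue number)
print(repr(decode_without_bom(raw)[:2]))  # '1\n'
```

To save without a BOM, encode with plain utf-8; to add one for a tool that insists on it, encode with utf-8-sig.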
Platform Assumptions#
Different systems make different assumptions:
Windows often defaults to a legacy code page such as Windows-1252 (on Western European locales)
macOS typically assumes UTF-8
Older systems might use ISO-8859-1
Web platforms usually expect UTF-8
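The same assumptions leak into scripts. In Python, open() without an encoding argument uses the platform's preferred encoding (often cp1252 on Western-locale Windows, UTF-8 on macOS and most Linux systems, unless Python's UTF-8 mode is enabled), so one script can read the same subtitle file differently on two machines. A sketch of the safer habit; read_subtitle is an illustrative name:

```python
import locale
from pathlib import Path

# What open() uses when you don't pass encoding= (varies by OS, locale,
# and whether Python's UTF-8 mode is on)
print(locale.getpreferredencoding(False))

def read_subtitle(path: str) -> str:
    """Read a subtitle file as UTF-8, never relying on the platform default."""
    return Path(path).read_text(encoding="utf-8")
```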
Working with Encodings#
Detecting the Current Encoding#
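Most code editors can guess the encoding automatically, though detection is always a heuristic:
Notepad++: the "Encoding" menu shows the current encoding
VSCode: the status bar (bottom right) shows the encoding; automatic guessing requires the files.autoGuessEncoding setting
Command line (on macOS, use file -I instead):
file -i filename.srt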
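Like an editor, file is only guessing, because the same bytes can be valid in several encodings. A common strategy, sketched below in Python, is to trust a BOM if there is one, then try strict UTF-8 (which rarely succeeds by accident on legacy text), and only then fall back to a legacy encoding. The Windows-1252 fallback is an assumption; set the fallback parameter to whatever your old files most likely used. guess_encoding is a hypothetical helper name.

```python
import codecs

def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Heuristic guess: BOM first, then strict UTF-8, then a legacy fallback."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the BOM to pick byte order
    try:
        data.decode("utf-8")  # strict: raises on any invalid byte sequence
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

ASCII-only files come back as utf-8, which is correct, since ASCII is a subset of UTF-8.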
Converting Between Encodings#
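Always back up the original file before converting. Then convert using one of these methods:
Using a text editor, like Notepad++:
Check that the text displays correctly first; if it doesn't, the editor has guessed the wrong source encoding
Encoding -> Convert to UTF-8
Save the file
Using command-line tools (iconv can't detect the source encoding, so -f must name the encoding the file actually uses):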
iconv -f WINDOWS-1252 -t UTF-8 input.srt > output.srt
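The same conversion as a Python sketch: decode with the source encoding (Windows-1252 here, an assumption just as in the iconv command), write UTF-8 to a new file, and leave the original untouched as the backup. convert_to_utf8 is a hypothetical helper name.

```python
from pathlib import Path

def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode src (in source_encoding) as UTF-8 into dst; src is left unchanged."""
    if Path(src).resolve() == Path(dst).resolve():
        raise ValueError("write to a new file so the original stays as a backup")
    # Strict decoding: fail loudly instead of silently corrupting characters
    text = Path(src).read_bytes().decode(source_encoding)
    # Writing bytes avoids newline translation, so CRLF/LF endings are kept
    Path(dst).write_bytes(text.encode("utf-8"))
```

Refusing to write over the input mirrors the iconv advice: redirecting output onto the input file would truncate it before it is read.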
Testing Across Platforms#
Always test converted files in more than one player, and check that every special (non-ASCII) character displays correctly. If you know where the subtitles will be played (web, mobile, TV), test there as well.
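To know what to look for on screen, it helps to list every non-ASCII character first. A small sketch (non_ascii_report is an illustrative name) that reports each one with its line number and Unicode name:

```python
import unicodedata

def non_ascii_report(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, character, Unicode name) for each non-ASCII character."""
    report = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                report.append((lineno, ch, unicodedata.name(ch, "UNKNOWN")))
    return report

sample = "1\n00:00:01,000 --> 00:00:04,000\nCafé 😊\n"
for lineno, ch, name in non_ascii_report(sample):
    print(lineno, ch, name)
```

Runs of Ã, Â or Ð in the report often mean leftover mojibake rather than real text.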
Why UTF-8 Is the Answer#
UTF-8 has become the standard for good reasons:
Can encode every Unicode character (including emoji 😊)
Backward compatible with ASCII and storage-efficient (ASCII characters use just one byte)
The default on most modern systems, with no byte-order variants (unlike UTF-16)
The dominant encoding on the web
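The storage claims are easy to check in Python: UTF-8 uses one byte for ASCII and two to four bytes for everything else.

```python
samples = ["A", "é", "П", "你", "😊"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"{ch} U+{ord(ch):04X}: {len(encoded)} byte(s) {encoded.hex(' ').upper()}")

# Pure ASCII text produces identical bytes in ASCII and UTF-8
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

Note that 你 encodes to E4 BD A0, the same three bytes shown in the mojibake example earlier.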
We suggest using UTF-8 for all new subtitle files, and converting legacy files to UTF-8 when needed. This will also increase compatibility with the subtitling tools we provide here at SubZap.
Tools and Commands#
While subtitle editors are great for most tasks, encoding issues often require different tools:
Code Editors#
Notepad++#
This editor shows encoding in the status bar, and can convert between encodings. It also detects encoding automatically.
VSCode#
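VSCode is the most popular code editor. It shows the encoding in the status bar, where you can click it to reopen or save the file with a different encoding. Automatic detection is off by default (enable the files.autoGuessEncoding setting), and hex viewing is available through Microsoft's Hex Editor extension.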
Sublime Text#
An older code editor, also with excellent encoding support. It can reopen files in a different encoding or as hexadecimal, and supports batch processing.
Command Line Tools#
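# Detect encoding (Linux; on macOS use file -I)
file -i subtitle.srt
# Convert to UTF-8 (-f must be the file's real source encoding)
iconv -f ISO-8859-1 -t UTF-8 input.srt > output.srt
# Check that a file is valid UTF-8 (iconv exits with an error on invalid bytes)
iconv -f UTF-8 -t UTF-8 subtitle.srt > /dev/null
# Guess the encoding statistically (chardetect comes with Python's chardet package)
chardetect subtitle.srt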
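A whole-file guess won't reveal a file that mixes encodings. A per-line check can: read the raw bytes and try to decode each line as UTF-8 on its own. A sketch (invalid_utf8_lines is an illustrative name) that assumes an ASCII-compatible encoding, so not UTF-16:

```python
def invalid_utf8_lines(data: bytes) -> list[int]:
    """Line numbers (1-based) whose bytes are not valid UTF-8."""
    bad = []
    # Split on LF only; a trailing CR from CRLF endings is valid UTF-8 anyway
    for lineno, raw_line in enumerate(data.split(b"\n"), start=1):
        try:
            raw_line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad

data = "1\n你好\n".encode("utf-8") + "café\n".encode("cp1252")
print(invalid_utf8_lines(data))  # [3]
```

This catches raw legacy bytes, but not mojibake that has already been re-saved as UTF-8 (like the Cyrillic line in the mixed-encoding example); that text is valid UTF-8 and has to be spotted by its characters.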
Validation Tools#
SubtitleEdit: Has encoding detection
ffmpeg: warns about invalid UTF-8 when decoding subtitle text, and its -sub_charenc input option sets the source encoding
Online validators: Various web tools
Best Practices#
When creating new files: Set your editor's default to UTF-8 and save files with UTF-8 encoding - verify encoding after saving.
When converting legacy files: Make backups before converting files, and test after conversion to ensure all special characters (non-ASCII) are displayed correctly. Verify that all files are encoded in UTF-8 and that no mixed encodings are present.
When validating files: Use multiple tools to verify, test with target players, check all special characters, and verify line endings.
When working with multiple platforms: Test on target platforms (web, mobile) when needed to ensure maximum compatibility.
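Several of these checks can be automated. The sketch below reports whether a file is valid UTF-8, whether it starts with a BOM, and which line-ending style it uses (CRLF, LF, or a mix). check_subtitle is a hypothetical helper, not a SubZap tool, and it doesn't replace watching the subtitles in your target players.

```python
import codecs
from pathlib import Path

def check_subtitle(path: str) -> dict:
    """Report UTF-8 validity, BOM presence, and line-ending style for a file."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LFs not preceded by CR
    if crlf and lf_only:
        endings = "mixed"
    elif crlf:
        endings = "CRLF"
    elif lf_only:
        endings = "LF"
    else:
        endings = "none"
    return {
        "valid_utf8": valid_utf8,
        "bom": data.startswith(codecs.BOM_UTF8),
        "line_endings": endings,
    }
```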
What's Next?#
Now that you understand encoding, you're ready to tackle more advanced subtitle formats. In our next article, we'll explore SSA/ASS subtitles, where proper encoding is crucial for advanced styling and positioning.