Building Automation Tools and Scripts

Subtitle work often involves repetitive tasks: renaming files, converting formats, checking for errors. While these tasks are straightforward, they consume valuable time that could be spent on creative work. Smart automation helps reclaim that time while reducing human error.

This article explores automation through command-line tools and shell scripting. If you're new to the command line, consider starting with GUI-based automation features in editors like Subtitle Edit or Aegisub. However, if you're comfortable with basic terminal operations and want to build powerful, custom automation workflows, this guide will help you get started.

You'll need:

Access to a terminal (Bash on Linux/Mac, WSL on Windows; the scripts below are written for Bash and need adapting before they run in PowerShell)
Basic understanding of command-line navigation
Text editor for writing scripts

Building Blocks

Before diving into complex automation, let's look at the basic tools available on most systems:

# List all subtitle files
ls *.srt

# Find files recursively
find . -name "*.srt"

# Basic text processing
grep -l "WEBVTT" *.vtt

# File operations

These simple commands form the foundation for more sophisticated automation. Let's build on them to solve real-world problems.
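As one way to combine these commands, the sketch below counts subtitle files by extension under a directory. The function name count_subs_by_ext and the list of extensions are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/bash
# count_subs_by_ext: hypothetical helper that prints "<ext> <count>" for each
# subtitle extension found under a directory (default: current directory).
count_subs_by_ext() {
    local dir=${1:-.} ext count
    for ext in srt vtt ass; do
        # -print0 emits one NUL byte per file, so counting NULs stays
        # correct even when names contain spaces or newlines
        count=$(find "$dir" -type f -name "*.${ext}" -print0 | tr -cd '\0' | wc -c)
        # $((...)) strips the padding some wc implementations add
        echo "$ext $((count))"
    done
}
```

Calling count_subs_by_ext with a directory path searches it recursively; with no argument it uses the current directory.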

Language Detection and Organization

One of the most common tasks is organizing subtitles by language. While filenames like movie_English.srt or series_ep01_spanish.srt are common, standardizing to .en.srt and .es.srt makes files easier to work with. Many media players and media servers also recognize this pattern and use the code to label the subtitle track, although support varies by player.
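When the language is already spelled out in the filename, a simple lookup may be enough and no content detection is needed. The following is a minimal sketch under that assumption; the function name lang_code_from_name and the four languages it knows are illustrative choices, so extend the list to match your own collection.

```shell
#!/bin/bash
# lang_code_from_name: hypothetical helper that maps a language word at the
# end of a subtitle filename to a two-letter code. Prints nothing and
# returns 1 when no known language word is found.
lang_code_from_name() {
    local name
    # Lowercase the name so "English" and "english" match the same pattern
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *[_.-]english.srt) echo en ;;
        *[_.-]spanish.srt) echo es ;;
        *[_.-]french.srt)  echo fr ;;
        *[_.-]german.srt)  echo de ;;
        *) return 1 ;;
    esac
}
```

With this sketch, movie_English.srt maps to en and series_ep01_spanish.srt maps to es, while a name such as movie.srt produces no output and a non-zero exit status.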

Using language detection tools like langdetect (see langdetect on GitHub), we can automate this process:

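#!/bin/bash
# rename-by-language.sh

for subtitle in *.srt; do
    # Skip if no subtitles found, or if a language code is already present
    [[ -f "$subtitle" ]] || continue
    [[ "$subtitle" =~ \.[a-z]{2}\.srt$ ]] && continue

    # Get language code using your preferred detection method
    # (detect-subtitle-language is a placeholder, not an existing command)
    lang=$(detect-subtitle-language "$subtitle") || continue
    [[ -n "$lang" ]] || continue

    # Rename keeping original name but adding language code
    base=${subtitle%.srt}
    mv -n "$subtitle" "${base}.${lang}.srt"
done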

Matching Videos with Subtitles

Video files and their subtitles often get separated or renamed. This commonly happens when files are moved between systems, shared online, or processed through different tools. While manual matching is feasible for a few files, it becomes tedious and error-prone with larger collections.

The following script helps automate this matching process, handling common video formats and maintaining language information:

#!/bin/bash
# match-subs-to-videos.sh

# Find all video files
for video in *.{mp4,mkv,avi}; do
    # Skip if no videos found
    [[ -f "$video" ]] || continue

    # Get base name without extension
    basename=${video%.*}

    # Look for subtitles with similar names
    for sub in "$basename"*.srt; do
        # Skip if no match found
        [[ -f "$sub" ]] || continue

        # Standardize the naming
        if [[ "$sub" =~ \.([a-z]{2})\.srt$ ]]; then
            # Already has language code, just ensure consistent format
            target="${basename}.${BASH_REMATCH[1]}.srt"
        else
            # No language code, mark as unknown
            target="${basename}.unknown.srt"
        fi

        # Rename only if the name changes, and never overwrite another file
        [[ "$sub" == "$target" ]] || mv -n "$sub" "$target"
    done
done

This approach preserves existing language codes while standardizing the naming format. When a subtitle file's language can't be determined from its filename, it's marked as unknown for later review.
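Because the script renames files in place, it can help to preview the target names before anything is moved. The sketch below isolates the naming rule in a function that only prints the result; standard_sub_name is a hypothetical name introduced here for illustration.

```shell
#!/bin/bash
# standard_sub_name: hypothetical helper that prints the standardized name
# for a subtitle, given the matching video file and the subtitle file.
# It does not touch the filesystem.
standard_sub_name() {
    local video=$1 sub=$2 base lang
    base=${video%.*}                 # video name without its extension
    case "$sub" in
        *.[a-z][a-z].srt)
            # Keep the existing two-letter code between the last two dots
            lang=${sub%.srt}
            lang=${lang##*.}
            echo "${base}.${lang}.srt"
            ;;
        *)
            # No language code in the name, mark as unknown
            echo "${base}.unknown.srt"
            ;;
    esac
}
```

For example, standard_sub_name show.mkv show_final.en.srt prints show.en.srt, and the same call with show_final.srt prints show.unknown.srt.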

Handling Multiple Subtitle Tracks

Modern media often comes with multiple subtitle tracks serving different purposes. Beyond language variants, you might have SDH subtitles (with sound descriptions), commentary tracks, or forced subtitles for foreign language sections. Organizing these different types makes them easier to manage and verify.

The following script helps sort subtitles by type based on their content:

#!/bin/bash
# organize-subtitle-tracks.sh

# Create directories for different subtitle types
mkdir -p {main,sdh,commentary}

for sub in *.srt; do
    # Skip if no subtitles found
    [[ -f "$sub" ]] || continue

    # Check content for typical patterns (heuristic: any bracketed or
    # parenthesized text is treated as a sound description)
    if grep -qE '\[[^]]+\]|\([^)]+\)' "$sub"; then
        mv "$sub" "sdh/$sub"
    elif grep -qiE 'commentary|director' "$sub"; then
        mv "$sub" "commentary/$sub"
    else
        mv "$sub" "main/$sub"
    fi
done

This organization makes it easier to apply specific quality checks to each subtitle type. For example, SDH subtitles should always include sound descriptions, while regular subtitles shouldn't. Commentary tracks might have different timing requirements since they often contain additional information beyond the dialogue.
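A type-specific check of this kind might look like the sketch below. It assumes the sdh/ and main/ directories created by the sorting script, and it treats any bracketed or parenthesized text as a sound description, which is only a rough heuristic. The function names has_sound_descriptions and check_track_type are hypothetical.

```shell
#!/bin/bash
# has_sound_descriptions: hypothetical check that succeeds when a subtitle
# file contains bracketed or parenthesized text such as [door slams].
has_sound_descriptions() {
    grep -qE '\[[^]]+\]|\([^)]+\)' "$1"
}

# check_track_type: print a warning when a file's content does not fit the
# directory it was sorted into (sdh/ or main/). Other paths are ignored.
check_track_type() {
    local sub=$1
    case "$sub" in
        sdh/*)
            has_sound_descriptions "$sub" ||
                echo "WARN: $sub has no sound descriptions"
            ;;
        main/*)
            if has_sound_descriptions "$sub"; then
                echo "WARN: $sub contains sound descriptions"
            fi
            ;;
    esac
    return 0
}
```

Running check_track_type over sdh/*.srt and main/*.srt in a loop gives a short list of files worth reviewing by hand.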

Quality Verification

Encoding is one of the most common and most disruptive problems with subtitle files. While modern systems generally use UTF-8, you'll often encounter files with different encodings, especially when working with older content or files from various sources.

Here's a practical script to check and report encoding issues:

#!/bin/bash
# check-encoding.sh

log_file="encoding_check.log"
echo "Subtitle Encoding Check $(date)" > "$log_file"

for sub in *.srt; do
    [[ -f "$sub" ]] || continue

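    # --mime-encoding prints only the charset; unlike -i, it behaves the
    # same with the file command on Linux and on macOS
    encoding=$(file -b --mime-encoding "$sub")
    # Plain ASCII is valid UTF-8, so only flag other encodings
    if [[ "$encoding" != "utf-8" && "$encoding" != "us-ascii" ]]; then
        echo "$sub: charset=$encoding" >> "$log_file"

        # Optional: attempt to convert to UTF-8
        # iconv -f original_encoding -t UTF-8 "$sub" > "${sub}.utf8"
    fi
done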

When dealing with encoding issues:

Always verify the original encoding before conversion
Keep backups of original files
Test the converted files in a media player
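The first two points can be sketched as a small conversion helper. This is one possible approach rather than a definitive implementation: convert_to_utf8 is a hypothetical function name, the .bak suffix is an arbitrary choice, and the caller is expected to supply the source encoding after verifying it.

```shell
#!/bin/bash
# convert_to_utf8: hypothetical helper. Usage: convert_to_utf8 FILE FROM_ENCODING
# Keeps FILE.bak as a backup and replaces FILE only if conversion succeeds.
convert_to_utf8() {
    local sub=$1 from=$2 tmp status=0
    tmp=$(mktemp) || return 1

    # Convert into a temporary file so a failed run never truncates the original
    if ! iconv -f "$from" -t UTF-8 "$sub" > "$tmp"; then
        rm -f "$tmp"
        echo "Conversion failed for $sub (is $from the right encoding?)" >&2
        return 1
    fi

    # Back up the original, then overwrite its content in place so the
    # file keeps its permissions
    cp -p "$sub" "$sub.bak" && cat "$tmp" > "$sub" || status=1
    rm -f "$tmp"
    return $status
}
```

A call such as convert_to_utf8 movie.srt WINDOWS-1252 leaves the original bytes in movie.srt.bak, so a wrong guess about the source encoding can be undone.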

Building Reliable Workflows

When processing multiple subtitle files, it's important to maintain a consistent approach. Here's a practical workflow that handles common real-world scenarios:

#!/bin/bash
# process-subtitles.sh

# Create backup directory
backup_dir="original_files_$(date +%Y%m%d)"
mkdir -p "$backup_dir"

# First, backup all original files
cp *.srt "$backup_dir/"

# Check encodings and convert if needed
for sub in *.srt; do
    [[ -f "$sub" ]] || continue

    encoding=$(file -b --mime-encoding "$sub")
    # Plain ASCII is valid UTF-8, so only convert other encodings
    if [[ "$encoding" != "utf-8" && "$encoding" != "us-ascii" ]]; then
        echo "Converting $sub from $encoding"
        # Assumes the source is WINDOWS-1252; verify the real encoding first
        iconv -f WINDOWS-1252 -t UTF-8 "$sub" > "$sub.tmp" && mv "$sub.tmp" "$sub"
    fi
done

# Organize by language (using your preferred detection method)
./rename-by-language.sh

# Match with video files if present
./match-subs-to-videos.sh

This workflow prioritizes:

Preserving original files
Handling encoding issues
Maintaining consistent naming
Organizing by language

The key is to keep the process simple and focus on the most common issues you'll encounter in day-to-day work.
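One simple safeguard that fits this kind of workflow is a dry-run switch, so a batch of renames or conversions can be previewed before any file is touched. The sketch below is an assumption about how that might be done: the run wrapper and the DRY_RUN variable are hypothetical names, not features of the scripts above.

```shell
#!/bin/bash
# run: hypothetical wrapper. With DRY_RUN=1 it prints the command instead of
# executing it; otherwise it executes the command unchanged.
# The printed form is meant for reading, not for pasting back into a shell,
# because arguments containing spaces are not re-quoted.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'would run:'
        printf ' %s' "$@"
        printf '\n'
    else
        "$@"
    fi
}
```

In a script, replacing mv "$sub" "$target" with run mv "$sub" "$target" and starting the script with DRY_RUN=1 in the environment prints each planned rename without performing it.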

Fully Automated Solutions using SubZap

While command-line tools offer powerful automation capabilities, maintaining scripts and handling edge cases requires ongoing effort. The scripts shown in this article are starting points rather than finished tools, and they do not cover special edge cases. SubZap's online tools do cover them, and we automate these common tasks transparently:

Encoding detection and conversion
Language detection and proper file naming
Format conversion with proper character handling
Translation with maintained timing and formatting

This eliminates the need for custom scripts while ensuring professional results. For an enterprise-grade solution, we provide programmatic APIs for automation.

What's Next?

Now that you understand subtitle automation fundamentals, explore how streaming platforms handle subtitles at scale. Our next article covers streaming platform standards and delivery requirements.