Video: Guidelines for Editing Auto-Generated Captions

Captions are a text version of the auditory information in a video, including spoken words and non-speech sounds. Closed captions do not appear on a video by default; viewers can turn them on.

At the University of Minnesota, all uploaded videos should include captions that are:

accurate
complete
well-placed

After uploading a video to a platform that automatically generates captions (e.g., YouTube, Kaltura, VoiceThread), follow this guide to proofread and edit the captions.

Depending on the platform, you may be able to allow other users to edit your captions.
- For more information, go to Kaltura: Manage Video Permissions.

The accuracy of auto-generated captions depends on the quality of the audio.
- For support on improving audio, go to Prepare for Recording Audio Voiceovers.

In this article:

Fix mistakes in spelling, add missing words, and fix punctuation
Adjust captions to align with the audio
Include speakers, non-speech sounds, and other auditory information
Review the video with captions turned on

Guidelines for Editing Closed Captions

Fix mistakes in spelling, add missing words, and fix punctuation

Automatically generated captions often miss some of the speech and misinterpret what the speaker is saying.

Ensure that all spoken words are transcribed correctly.
Do not paraphrase or censor what the speaker is saying.

Adjust captions to align with the audio

Ensure each block of caption text is on-screen for between 1.5 and 6 seconds.
Generally, use no more than two lines in each block of text.
- A third line may be used for a speaker identifier.
Consider how phrases break across lines.
- Keep caption lines short and easy to read.
- Aim for five to six words per line, or about 32 characters per line.
- Break long caption lines into two shorter lines. Consider:
  - Individual word length: some words are much longer than others, so the same number of words can produce lines of very different lengths.
  - Sentence cadence: make sure the sentence break is at a logical point where speech normally pauses.

Inappropriate (too long):
She said I could order popcorn at the movie theatre.

Inappropriate (unnatural break):
She said I could order
popcorn at the movie theatre.

Appropriate:
She said I could order popcorn
at the movie theatre.

Refer to the Described and Captioned Media Program (DCMP) Captioning Key section on Line Division for a more detailed explanation and examples.
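If you review captions outside the platform's editor, some of the timing and length guidelines above can be checked automatically. The Python sketch below is a hypothetical helper, not a feature of YouTube, Kaltura, or VoiceThread. It assumes each caption block has already been read into a start time, an end time (both in seconds), and a list of text lines, and it flags blocks that are on screen for less than 1.5 or more than 6 seconds, that use more than two text lines (a leading speaker label such as "(Chee)" may take a third line), or that have lines over about 32 characters. It cannot tell whether a line break falls at a natural pause in speech; that still needs a human reader.

```python
from dataclasses import dataclass

# Hypothetical checker (not part of any captioning platform).
# Thresholds come from the guidelines in this article.
MIN_SECONDS = 1.5
MAX_SECONDS = 6.0
MAX_TEXT_LINES = 2
TARGET_CHARS_PER_LINE = 32  # "about 32 characters", so treat as a soft target


@dataclass
class CaptionBlock:
    start: float      # seconds from the start of the video
    end: float        # seconds from the start of the video
    lines: list[str]  # caption text, one string per line


def is_speaker_label(line: str) -> bool:
    """True for a speaker identifier such as "(Chee)" or "(Professor)"."""
    return line.startswith("(") and line.endswith(")")


def check_block(block: CaptionBlock) -> list[str]:
    """Return guideline warnings for one caption block (empty list if none)."""
    warnings = []

    # Each block should stay on screen between 1.5 and 6 seconds.
    duration = block.end - block.start
    if duration < MIN_SECONDS:
        warnings.append(f"on screen {duration:.1f}s, under {MIN_SECONDS}s")
    elif duration > MAX_SECONDS:
        warnings.append(f"on screen {duration:.1f}s, over {MAX_SECONDS}s")

    # A leading speaker label may use a third line, so it is not counted.
    text_lines = block.lines
    if text_lines and is_speaker_label(text_lines[0]):
        text_lines = text_lines[1:]
    if len(text_lines) > MAX_TEXT_LINES:
        warnings.append(f"{len(text_lines)} text lines, over {MAX_TEXT_LINES}")

    # Flag lines longer than the rough 32-character target.
    for line in text_lines:
        if len(line) > TARGET_CHARS_PER_LINE:
            warnings.append(f"{len(line)} characters: {line!r}")

    return warnings


too_long = CaptionBlock(0.0, 3.0, ["She said I could order popcorn at the movie theatre."])
split = CaptionBlock(0.0, 3.0, ["She said I could order popcorn", "at the movie theatre."])
print(check_block(too_long))  # prints one warning (52 characters)
print(check_block(split))     # prints []
```

Because the 32-character limit is a target rather than a hard rule, treat these warnings as prompts to look at a block again, not as automatic failures.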

Include speakers, non-speech sounds, and other auditory information

Speakers

Add speaker identifiers if there is more than one speaker or if it is unclear who is speaking.
If the speaker's name is known, put it in parentheses.
If there is back-and-forth conversation between speakers, give each speaker their own block of text.
Example:
(Chee)
Put the pumpkin on the table.
(Darren)
Can you hand me the carving knife?
If names are unknown, use generic labels.
Example:
(Professor)
Turn to page 394.
If it is clear who is speaking on screen, the speaker does not need to be identified by name.
- Use an angle bracket (>) to identify the speaker.
  >Put the pumpkin on the table.
- Use a double angle bracket (>>) when the speaker changes.
  >>Can you hand me the carving knife?

Sounds

Put non-speech sounds in brackets on their own line.
Example:
[Applause with cheering]
Include a description of the sound's source, unless the source is visible on screen.
Example:
[Plane passing overhead]

Music

Use objective words to describe music.
Example:
[intense percussive music]
Caption lyrics verbatim.
Caption the performer and song title, if known.
Example:
[Prince singing "Sometimes It Snows in April"]

Review the video with captions turned on

Play the video with captions turned on and check the accuracy, timing, and line breaks of the captions, as well as the descriptions of non-speech sounds.

Last modified: February 12, 2026

TDX ID: 3070