Caption manager
Caption manager is a free online tool that creates files for displaying subtitles on videos.
The input data is a set of transcription files created by a service such as DeepTranscript. The final output is a set of caption files that will display subtitles over a video player or a WordPress website. Below is a sample 3‑hour video used to illustrate the process. Subtitles have been created in the video’s source language (English), then translated to French.
How it works
To create the transcription file in the video’s source language, first export your video to an MP3 sound file.
If you are using DeepTranscript, you may need to cut the MP3 file into fragments that fit within the program’s size limits (probably 1 hour).
Export each transcription as a tabulated text file (not CSV). The extensions of these files may be « txt » or « tab ».
Once your files are ready, go to this page to upload them and enter your settings.
The settings mainly include the output file name (which will usually match the video file name), the minimum duration of a subtitle and the maximum length of each subtitle. By default, the minimum duration is set to 1.5 seconds, and the maximum length to 25 words.
Since speech-to-text tools work on breath units, some subtitles will need to be broken into smaller units. Caption manager will break them up and fix them at the correct (interpolated) times.
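For readers who like to see the idea in code, here is a minimal Python sketch of such a split (the function name and the word-proportional interpolation are assumptions for illustration, not Caption manager’s actual code):

```python
def split_unit(start, end, words, max_words=25):
    """Split one breath unit into captions of at most max_words words,
    interpolating time codes in proportion to word counts."""
    n = len(words)
    k = -(-n // max_words)        # number of captions needed (ceiling division)
    size = -(-n // k)             # words per caption, roughly equal
    captions, pos = [], 0
    for _ in range(k):
        chunk = words[pos:pos + size]
        t0 = start + (end - start) * pos / n
        t1 = start + (end - start) * min(pos + size, n) / n
        captions.append((round(t0, 2), round(t1, 2), " ".join(chunk)))
        pos += size
    return captions
```

A 50-word unit spoken between 10 s and 20 s thus becomes two 25-word captions, with the interpolated boundary at 15 s.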
This program produces caption (subtitle) files in the ‘srt’ (SubRip), ‘sbv’ (SubViewer) and ‘vtt’ (WebVTT) formats. All of them are compatible with VLC and may be acceptable choices for YouTube. The third is required by the default video player on WordPress sites. In addition, it is the only one that accepts the non-breaking spaces required by French punctuation. The QuickTime video player doesn’t recognise any of these formats.
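The three formats differ mainly in how cue times are written. A small Python sketch of the conversions (illustrative code, not the tool itself):

```python
def fmt_time(seconds, style):
    """Format a time in seconds as 'srt' (HH:MM:SS,mmm),
    'vtt' (HH:MM:SS.mmm) or 'sbv' (H:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3600000)
    m, rest = divmod(rest, 60000)
    s, ms = divmod(rest, 1000)
    if style == "srt":
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"   # SubRip uses a comma
    if style == "vtt":
        return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"   # WebVTT uses a dot
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"           # SubViewer drops the leading zero
```

For instance, 234.5 seconds is written 00:03:54,500 in ‘srt’, 00:03:54.500 in ‘vtt’ and 0:03:54.500 in ‘sbv’.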
A few formatting tags such as italics and bold are accepted in subtitles. Line breaks are also accepted to denote speech turns in a dialogue.
Extra features
Glossary
When building a caption file, it is possible to make automatic replacements recorded in a ‘replace’ file. This is a two-column tabulated file, the left column containing the string to be searched for, and the right column its replacement. Replacements are case-sensitive.
Underscores ‘_’ can be used to replace spaces ; they are automatically converted to spaces when the glossary is loaded.
This glossary can be used, for example, to restore capital letters that were forgotten during speech-to-text conversion. It can also be used to insert formatting tags :

Note that it may be necessary to place spaces before and/or after search strings to identify them as whole words. Care should also be taken to avoid self-embedding rules (such as those on the second and third lines), as these would create unwanted repetitions if glossary rules were applied multiple times.
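To make the mechanism concrete, here is a minimal Python sketch of how such a glossary could be loaded and applied (behaviour inferred from the description above, not Caption manager’s actual code):

```python
def load_glossary(text):
    """Parse a two-column tabulated glossary file;
    underscores stand for spaces in either column."""
    rules = []
    for line in text.splitlines():
        if "\t" in line:
            search, replace = line.split("\t", 1)
            rules.append((search.replace("_", " "), replace.replace("_", " ")))
    return rules

def apply_glossary(text, rules):
    """Apply the case-sensitive search-and-replace rules in file order."""
    for search, replace in rules:
        text = text.replace(search, replace)
    return text
```

For instance, the rules « bada_charia → Bhattacharya » and « _covid_ → _Covid_ » turn « dr bada charia on covid data » into « dr Bhattacharya on Covid data » ; the underscores around « covid » provide the surrounding spaces that identify it as a whole word.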
Another feature is available for geeks familiar with regular expression syntax : files containing sets of replacement strings applicable to preg_replace() commands (in PHP syntax). These preg_replace() rules are applied before the glossary search-and-replace rules. As shown below, their sensitivity to context solves the problem of repeated self-embedding rewrites.
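Here is a small Python illustration of why context makes such rules safe to repeat (re.sub plays the role of PHP’s preg_replace, with « \1 » instead of « $1 » ; the rule itself is a made-up example):

```python
import re

def apply_regex_rules(text, rules):
    """Apply (pattern, replacement) pairs in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Insert a space after a comma only where one is missing.
rules = [(r",(\S)", r", \1")]
once = apply_regex_rules("a,b, c", rules)   # "a, b, c"
twice = apply_regex_rules(once, rules)      # unchanged: the rule is idempotent
```

A plain glossary rule rewriting « , » as « ,␣ » would add another space at every pass; the captured context « (\S) » makes the rewrite stop once the text is correct.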
Merged transcription files
We recommended fragmenting the MP3 exports of the video soundtrack so that they can be processed by an online speech-to-text service. This may yield several transcription files. To facilitate an automatic translation of the entire set of subtitles, you will need to merge these files into a single one, while at the same time applying the changes programmed in the glossary.
Translated transcriptions
Translation is provided free of charge by online services such as DeepTranslate.
It is recommended to perform the translation on a transcript file that is structurally similar to the one produced by DeepTranscript. This is because the transcription mechanism has performed a segmentation based on breath units, and that segmentation is also optimal for machine translation. It would be a bad idea to try to translate the contents of « srt », « sbv » or « vtt » subtitle files, because their segmentation has been modified to suit the maximum length of subtitles.
Note that the output of DeepTranslate won’t be a well-formed transcription file, unlike the one you got from DeepTranscript. The main reason is that DeepTranslate ‘translates’ tabulations to spaces. Worse, it may insert unwanted spaces in time codes, for instance replacing ‘234.5’ with ‘234. 5’. Caption manager will fix all these errors and handle the translated transcription as a well-formed file.
Reader-friendly layout
Caption manager does its best to make subtitle text as easy to read as possible. Each subtitle occupies several lines of approximately equal length. Correct line feeds will be produced if the machine has been told the maximum number of characters allowed on each line. The default is 70, because the WordPress Gutenberg video player creates frames with room for about 75 characters per line.
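The line-balancing idea can be sketched as follows (a simplification, assuming words separated by single spaces; not the tool’s actual code):

```python
def balance_lines(text, max_chars=70):
    """Break a caption into lines of approximately equal length,
    aiming to keep each one under max_chars characters."""
    if len(text) <= max_chars:
        return [text]
    words = text.split()
    k = -(-len(text) // max_chars)     # minimum number of lines (ceiling)
    target = -(-len(text) // k)        # aim for lines of equal length
    lines, current = [], ""
    for w in words:
        if current and len(current) + 1 + len(w) > target and len(lines) < k - 1:
            lines.append(current)      # close the current line near the target
            current = w
        else:
            current = f"{current} {w}" if current else w
    lines.append(current)
    return lines
```

A 101-character caption is thus broken into two lines of 47 and 53 characters rather than one line of 70 followed by a short remnant.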
Another useful feature is the ability to control the position of breaks between captions. Reading becomes difficult when a caption ends with a word that belongs with the text of the next caption. To handle this, Caption manager uses lists of « non-breaking » words that should not appear at the end of a caption. Each list is associated with a specific language.
For instance, look at no-break-fr.txt and no-break-en.txt. A few words are summarised as regular expressions. For example, in French, /\s[Ll][ae]\s/ matches le, la, Le and La, and /\s[Dd]es?\s/ can match de, des, De and Des. As matching is case sensitive, it is necessary to include upper case variants, which can be easily combined in regular expressions.
Reordering requires moving words either back or forward. Let us see how it works. Consider the following sequence of captions :
dans cet éditorial de huit cents mots, je me suis également penché sur l'un des médicaments les plus prescrits dans l'histoire de | la médecine, à savoir les statines, parce que je devais faire le lien entre tous les éléments.
Since the words « de » and « la » are listed in no-break-fr.txt (« | » marks the break between two consecutive captions), this sequence will be rendered in the ‘forward’ mode as :
dans cet éditorial de huit cents mots, je me suis également penché sur l'un des médicaments les plus prescrits dans l'histoire | de la médecine, à savoir les statines, parce que je devais faire le lien entre tous les éléments.
and in the ‘backward’ mode as :
dans cet éditorial de huit cents mots, je me suis également penché sur l'un des médicaments les plus prescrits dans l'histoire de la | médecine, à savoir les statines, parce que je devais faire le lien entre tous les éléments.
These two options can be compared on this excerpt :

The ‘forward’ mode is the default option because it seems better to delay text in subtitles than to move it backwards.
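The ‘forward’ mode can be sketched in Python like this (NO_BREAK is a two-pattern excerpt of no-break-fr.txt; here the patterns are matched against whole words with re.fullmatch, whereas the actual lists use \s contexts):

```python
import re

NO_BREAK = [r"[Ll][ae]", r"[Dd]es?"]   # excerpt of no-break-fr.txt

def forward_shift(first, second):
    """While the first caption ends with a non-breaking word,
    delay that word to the start of the next caption."""
    w1, w2 = first.split(), second.split()
    while w1 and any(re.fullmatch(p, w1[-1]) for p in NO_BREAK):
        w2.insert(0, w1.pop())
    return " ".join(w1), " ".join(w2)
```

For instance, forward_shift("dans l'histoire de", "la médecine") yields ("dans l'histoire", "de la médecine").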
Example
Below is the video (duration 3 hours) of an interview with Dr Aseem Malhotra, automatically transcribed (with DeepTranscript) and translated to French (with DeepTranslate). This process required a minimum of manual intervention, although careful checking is necessary to finalise the subtitles — and create rules that can be used for future work.
The video is displayed by the standard Gutenberg video player. After that, other players based on WordPress plugins will be shown.
With English and/or French subtitles
This video is displayed from a Google Cloud source.

On the bottom right of the frame, click the vertical three-dot menu, then point at “Options” and click “English”, “French” or “both” to select languages. Sorry for the use of country flags, but language flags are not available in Unicode…
➡ This is an ongoing process : only the first hour has been revised, so far, to show what can be achieved with a few manual corrections. The next two hours are almost “raw”, only created by the automatic tools.
Below is the same video displayed (with its subtitles) by the WPlyr video player. One disadvantage is that, unlike Gutenberg’s default player (above), none of the players I tried support colours or subtitle text formatting.
On the plus side, this player offers rewind and fast-forward buttons that allow you to skip 10 seconds at a time.
Unfortunately, this video must be linked to a local source (an embedded MP4 file) : the same setup with a Google Cloud source does not work. This can place an excessive load on the site if many users are watching its video(s).
Videojs HTML5 Player displays remote video. However, it does not have buttons to rewind or fast forward 10 seconds, nor does it have access to subtitles :
CP Media Player has a very clear display of subtitle texts, notably in fullscreen mode. It had a conflict with the MEKS audio player used on this site, but this was solved by adding an extra parameter (iframe = « 1 ») to the shortcode, yielding :
Unfortunately, the current version (1.1.0) of CP Media Player does not yet have buttons to rewind/fast-forward by a dozen seconds.
The best compromise at the moment might be the Plyr.io media player. It is set up by JavaScript code that loads its library files from a CDN (currently https://cdn.plyr.io/3.7.8/plyr.css). With the help of ChatGPT, I was able to place the elapsed-time display at the bottom left of the frame…
The main limitation is that it does not understand « ::cue » instructions for the layout of subtitle text.
Details of the process
The source video was copied (via a screen capture) from The Joe Rogan Experience (April 2023 lien:kow3). There is some irregularity in the streaming : frames tend to freeze although the sound track remains continuous. In principle, this is not too much of a problem for speech data.
Due to the length of the video (3 hours), the MP3 sound track was sliced into three one-hour segments, namely 1.mp3, 2.mp3 and 3.mp3. I used the (excellent) TwistedWave audio editor on the Mac to do this. Cutting points needed to be set at the end of sentences.
Each of the three sound files was submitted to DeepTranscript which transcribed them as text files. These were exported in tabulated text format : 1.txt, 2.txt, 3.txt. Extension ‘tab’ is another option.
Transcription files were checked in a spreadsheet editor — such as Excel, PlanMaker, etc. — to make sure that times (in seconds) are displayed as floating point numbers. You can see an example of the transcription (1.txt) in an Excel version : download “1.xlsx”. Note that the first line and the leftmost column do not contain significant data. These will automatically be skipped by Caption manager.
Transcription files, such as 1.txt, are not suitable for displaying subtitles on the video, at least for the following reasons :
- Their format is not a recognised standard for use in video players.
- Lines vary greatly in length because DeepTranscript used breath units to segment the transcription. For instance, the text on line #14 would not fit into a single video frame.
- Their content is “raw” and notably contains misspelled proper names. For instance, “Bhattacharya” is transcribed as “bada charia”! Also — one of the worst cases — “ivermectin” as “ivamechton”… In addition, all text appears in lowercase. Still, these errors are consistent in the sense that they can be fixed by the rewrite rules of a glossary.
- We need to merge the three transcription files to produce a unique caption file. This is not just a matter of copy-paste. The time codes should add up continuously from the first to the last part.
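The time-code arithmetic of that merge can be sketched as follows (a hypothetical Python helper; « durations » holds the lengths in seconds of the corresponding sound files):

```python
def merge_transcriptions(parts, durations):
    """parts: one list of (start, end, text) tuples per sound file;
    durations: the length in seconds of each sound file (e.g. 3600.0).
    Each part is shifted by the total duration of the preceding files,
    so that time codes add up continuously."""
    merged, offset = [], 0.0
    for entries, duration in zip(parts, durations):
        for start, end, text in entries:
            merged.append((start + offset, end + offset, text))
        offset += duration
    return merged
```

With two one-hour files, a cue starting at 1.0 s in the second file ends up at 3601.0 s in the merged transcription.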
Below is a screenshot of the top of the homepage of Caption manager after uploading two transcription files, automatically renamed « 001.txt » and « 002.txt », two glossary files renamed « replace_001.txt » and « replace_002.txt », two regular expression files « reg_replace_001.txt » and « reg_replace_002.txt », and two non-breaking lists of words « no_break_001.txt » and « no_break_002.txt ».

Create subtitles in the source language
On the Caption manager page, upload 1.txt, 2.txt and 3.txt (in this order). Then enter a name for the output files and click CREATE A SET OF CAPTION FILES. This produces caption files in the three formats (srt, sbv, vtt).
If the name of the caption file (in any of the three formats) is identical to the name of the video file (excluding their extensions) and if they are on the same level, VLC will read the video and display subtitles. A great feature of VLC is that by clicking on the left and right arrows, you can move backwards and forwards by 10 seconds, which is very convenient to re-read a fragment.
A pleasant surprise : the size of the subtitles is now much more in keeping with the frame of the video. The main reason is that a parameter Max number of words in each caption has been set — to 25 by default. So, long entries (such as line 14 of 1.xlsx) have been split into several subtitles with their time codes interpolated. Caption manager also takes care of the minimum duration of each subtitle (1.5 sec by default). The time codes are adjusted so that the subtitles do not overlap with each other.
Now it is time to edit 1.txt, 2.txt, 3.txt in a plain text editor — e.g. TextEdit on a Mac. Read the subtitles while playing the video (preferably at slow speed) and edit the transcription files.
However, you should try to automate corrections by using a glossary and/or sets of regular expression rules. Create a file named replace-en.txt in your text editor. At the start of each line, type the search expression, then a tabulation, then its replacement.
An example of the glossary for this video is here : replace-en.txt.
The same procedure works for regular expressions : create a tabulated text file containing pairs of arguments for preg_replace() instructions (in PHP syntax) and upload it.
An example of regular expression rewrite rules for this video is here : preg-replace-en.txt. Note that this set contains rules that won’t be self-embedding, thanks to their use of contexts (captured as ‘$1’).
Edit transcription files 1.txt, 2.txt, 3.txt to fix errors that cannot be fixed by the glossary. Then click DELETE ALL TRANSCRIPTION FILES, upload replace-en.txt, preg-replace-en.txt (if necessary) and the updated transcription files. Click CREATE A SET OF CAPTION FILES again. This will produce the final forms of the subtitle files in the three formats (srt, sbv, vtt).
When captions are created in ‘vtt’ (WebVTT) subtitle format, options are provided to change the colour of text in sections enclosed by <b>, <i> and <u> tags :

Colour names are accepted if they are standard HTML colour names. Currently, these colours are displayed (on a Mac) by the Chrome, Opera and Brave browsers, but not by Firefox or Safari.
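In WebVTT, this kind of colouring is expressed with a STYLE block placed near the top of the ‘vtt’ file. A plausible sketch (the colour values are illustrative, not the tool’s defaults):

```
WEBVTT

STYLE
::cue(b) { color: yellow; }
::cue(i) { color: cyan; }
::cue(u) { color: lime; }

00:03:54.500 --> 00:03:58.000
A subtitle with a <b>yellow</b> word.
```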
Create subtitles in different languages
To create translated subtitles using DeepTranslate, it would be a bad idea to send ‘srt’, ‘sbv’ and ‘vtt’ files directly for translation. This is because they contain subtitles that are spread over several frames (due to the 25-word limit). This can lead to inaccurate translations. It is therefore better to work with transcription files, as they contain subtitles segmented by breath units, which make more sense to the automatic translator.
So, we need to reconstruct a unique transcription file by merging 1.txt, 2.txt and 3.txt, to which glossary search-replace rules will be applied. This is done by a single click on button COMBINE SEQUENTIALLY THESE TRANSCRIPTIONS TO SINGLE FILE. The name of the file created by this process will always be new_transcription.txt — download it to see an example.
Open new_transcription.txt in a plain-text editor, copy its contents and paste it into DeepTranslate after choosing the target language. (You may need to run a premium version if the text is long.) Then, copy the translation and save it as a text file, e.g. transcription-fr.txt — download it as an example.
It is also possible to use ChatGPT for these translations, although the current free version (3.5) has a small limit on the size of the text.
You may notice that HTML tags for italics, bold, emphasized or strong text have been replaced with non-standard tags [i], [/i], [b] and [/b]. The reason is that DeepTranslate tends to delete a few opening HTML tags. These tags will be restored to the standard HTML format when producing caption files.
Now you need to create caption files in the three formats using this (unique) transcription file. Click DELETE ALL TRANSCRIPTION FILES and DELETE ALL GLOSSARY FILES, then upload transcription-fr.txt and click button CREATE A SET OF CAPTION FILES.

You’re almost there ! However, you still need to correct translation errors, particularly in relation to technical or scientific terms that DeepTranslate may have misunderstood. A typical example in this interview is “keyhole heart surgery” (time 56.3 seconds), which was translated word for word as “chirurgie cardiaque par trou de serrure”!
To fix this (and other) errors, you should create a specific glossary file, e.g. replace-fr.txt. It would contain the following rewrite rule :

A more general version of this rule is programmed, using a regular expression, in preg-replace-fr.txt. It replaces both « chirurgie d’urgence de trou de serrure » and « chirurgie trou de serrure » with « chirurgie cardiaque minimalement invasive (keyhole heart surgery) ». The same regular expression file contains a rule creating French chevron quotes along with non-breaking spaces :
\s[\"«]([^"^»]+)?[\"»]\s → « $1 »
This method is not perfect because sometimes DeepTranslate forgets a few opening or closing quotes. This happened only once in the 3‑hour sample video. Still, it covers most cases.
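For the curious, here is a Python equivalent of that quote rule (re.sub plays the role of preg_replace, « \1 » replaces « $1 », and U+00A0 is the non-breaking space; the pattern is slightly simplified):

```python
import re

def french_quotes(text):
    """Replace straight or chevron quotes surrounded by spaces with
    French chevrons padded by non-breaking spaces (U+00A0)."""
    return re.sub(r'\s["«]([^"»]+?)["»]\s', ' «\u00A0\\1\u00A0» ', text)
```

For instance, il a dit "bonjour" hier becomes il a dit « bonjour » hier, with non-breaking spaces inside the chevrons.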
Reading the final caption file, first with a text editor and then as real subtitles with VLC, will give you the opportunity to correct errors, preferably using one of the two glossaries and the regular expression rules, otherwise in the source transcription files 1.txt, 2.txt and 3.txt. This may require you to restart the process : delete all files, upload source transcription files, create caption files in the source language, merge source transcription files into new_transcription.txt, translate it, and finally create caption files in the new language. It’s all done in single clicks — don’t forget to reload the glossaries and regular expression files if they are needed !
Multi-language subtitle mixing
Once subtitles have been translated into different languages, it is possible to display two (or more) languages together at the bottom of the video. This has been done on the video shown as an example.
Remember that we have two transcription files containing all the English and French subtitles. Their timecodes are identical because they were created by DeepTranscript and left unchanged by DeepTranslate.
Upload the English transcription, then the French transcription which will be renamed 001.txt and 002.txt respectively. Then upload the English and French glossaries (in the same order). These will be renamed replace_001.txt and replace_002.txt respectively. You can do the same with regular expression files that will be renamed preg_replace_001.txt, preg_replace_002.txt, etc. As you might guess, each of these glossaries will be applied to the transcription file with the same number. Other language versions (and their respective glossaries) can be added at this stage.
If the timecodes of 001.txt and 002.txt are identical, the following will be displayed :

You can apply simple HTML formats supported by video players : italics or bold. This makes it easier to distinguish between languages in the video. Here we have chosen ‘italic’ for the English text and no formatting for the French text.
If ‘italic’ is selected for a language, text that was already in italics will be converted to bold (and vice versa), so that emphasis remains visible.
An optional token, by default « ~•~ », can be set to create blank lines separating language versions. Leave it empty to remove these empty lines, an option which is generally satisfactory when versions have different colours. In the current ‘vtt’ (WebVTT) format, colour is only applied to bold or italic text (see above), which is not optimal.
As two (or more) subtitles will be displayed within the same frame, it may be useful to reduce the maximum length of each subtitle. For bilingual subtitles, we set the limit to 15 words.
Finally, click the button CREATE A SET OF CAPTION FILES BY SUPERIMPOSING THESE TRANSCRIPTIONS.
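The stacking itself is simple; a Python sketch of what the superimposition produces for one cue (illustrative only, with the English line italicised as in the example video):

```python
def superimpose(versions, separator=""):
    """Stack several language versions of one cue into a single
    multi-line subtitle, optionally separated by an extra line."""
    joiner = f"\n{separator}\n" if separator else "\n"
    return joiner.join(versions)
```

superimpose(["<i>Hello.</i>", "Bonjour."]) gives a two-line subtitle; passing separator="~•~" inserts the separating line between the languages.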
Subtitles in consecutive languages
This is the case of a video in which several languages have been used consecutively in different parts. It is easy to deal with this situation : segment the MP3 sound file into fragments containing only one language. Transcribe and translate them separately. Then upload them in alphabetical order, creating 001.txt, 002.txt, etc.
Create a glossary for each language and upload the glossary files in the order they are needed : replace_001.txt, replace_002.txt, etc. The same glossary may need to be uploaded several times if that language appears in different fragments.
If a glossary is missing, the previous one will be used. For example, if transcription file 005.txt does not find replace_005.txt, it will use replace_004.txt – or the nearest preceding non-empty one. The same applies to regular expression files. Note that this also covers the common case of a sequence of transcription files sharing a single glossary and/or regular expression file.
Warning
The confidentiality of the data you process with this service is not guaranteed : someone sharing your Internet access (IP address) could export the content. This content (your workspace) is however automatically deleted after 24 hours of non-use. Furthermore, you can click the DELETE ALL FILES button at any time.