Caption manager



Caption manager is a free online tool that creates files for displaying subtitles on videos.

The input data is a set of trans­crip­tion files crea­ted by a service such as DeepTranscript. The final output is a set of caption files that will display subtitles over a video player or a WordPress website. Below is a sample 3‑hour video used to illus­trate the process. Subtitles have been crea­ted in the video’s source language (English), then trans­la­ted to French.

How it works

To create the trans­crip­tion file in the video’s source language, first export your video to an MP3 sound file.

If you are using DeepTranscript, you may need to cut the MP3 file into frag­ments that fit within the program’s size limits (proba­bly 1 hour).

Export each trans­crip­tion as a tabu­la­ted text file (not CSV). The exten­sions of these files may be « txt » or « tab ».

Once your files are ready, go to this page to upload them and enter your settings.

The settings mainly include the output file name (which will usually match the video file name), the mini­mum dura­tion of a subtitle and the maxi­mum length of each subtitle. By default, the mini­mum dura­tion is set to 1.5 seconds, and the maxi­mum length to 25 words.

Since speech analy­sis devices work on breath units, some subtitles will need to be broken into smal­ler units. Caption manager will break them up and fix them at the correct (inter­po­la­ted) times.
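The splitting step can be pictured with a minimal sketch. It assumes that times are interpolated linearly over the word count, which is only a guess at what Caption manager actually does; the function name `splitSegment` and the `{start, end, text}` shape are made up for the illustration.

```javascript
// Hypothetical sketch: split one breath unit into chunks of at most maxWords
// words, assigning times by linear interpolation over the word count.
function splitSegment(segment, maxWords = 25) {
  const words = segment.text.trim().split(/\s+/);
  const total = words.length;
  const duration = segment.end - segment.start;
  const parts = [];
  for (let i = 0; i < total; i += maxWords) {
    const j = Math.min(i + maxWords, total);
    parts.push({
      start: segment.start + duration * (i / total),
      end: segment.start + duration * (j / total),
      text: words.slice(i, j).join(' '),
    });
  }
  return parts;
}
```

With a 50-word segment running from 10 to 20 seconds, this yields two subtitles of 25 words each, the second starting at 15 seconds.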

This program produces ‘srt’ (SubRip), ‘sbv’ (SubViewer) and ‘vtt’ (WebVTT) caption file (subtitle) formats. All of them are compa­tible with VLC and may be accep­table choices for YouTube. The third is needed by the default video player in WordPress sites. In addi­tion, it is the only one that accepts the non-breaking spaces requi­red by French punc­tua­tion. QuickTime video player doesn’t reco­gnise any of these formats.
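The three formats mostly differ in how a time code is written. The sketch below shows how one cue could be rendered in each of them; the helper names are hypothetical and this is not Caption manager's own code. A complete 'vtt' file also starts with a `WEBVTT` header line, which is left out here.

```javascript
// Break a time in seconds into hours, minutes, seconds and milliseconds.
function splitTime(seconds) {
  const ms = Math.round(seconds * 1000);
  return {
    h: Math.floor(ms / 3600000),
    m: Math.floor((ms % 3600000) / 60000),
    s: Math.floor((ms % 60000) / 1000),
    ms: ms % 1000,
  };
}

const pad = (n, width) => String(n).padStart(width, '0');

// SubRip uses a comma before milliseconds, WebVTT a period,
// and sbv a period with unpadded hours.
function srtTime(seconds) {
  const t = splitTime(seconds);
  return `${pad(t.h, 2)}:${pad(t.m, 2)}:${pad(t.s, 2)},${pad(t.ms, 3)}`;
}
function vttTime(seconds) {
  const t = splitTime(seconds);
  return `${pad(t.h, 2)}:${pad(t.m, 2)}:${pad(t.s, 2)}.${pad(t.ms, 3)}`;
}
function sbvTime(seconds) {
  const t = splitTime(seconds);
  return `${t.h}:${pad(t.m, 2)}:${pad(t.s, 2)}.${pad(t.ms, 3)}`;
}

// Render one cue ({start, end, text}) in the requested format.
function formatCue(format, index, cue) {
  switch (format) {
    case 'srt':
      return `${index}\n${srtTime(cue.start)} --> ${srtTime(cue.end)}\n${cue.text}\n`;
    case 'vtt':
      return `${vttTime(cue.start)} --> ${vttTime(cue.end)}\n${cue.text}\n`;
    case 'sbv':
      return `${sbvTime(cue.start)},${sbvTime(cue.end)}\n${cue.text}\n`;
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}
```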

A few format­ting tags such as italics and bold are accep­ted in subtitles. Line breaks are also accep­ted to denote speech turns in a dialogue.

Extra features


When buil­ding a caption file, it is possible to make auto­ma­tic repla­ce­ments recor­ded in a ‘replace’ file. This is a two-column tabu­la­ted file, the left contai­ning the string to be sear­ched, and the right its repla­ce­ment. Replacements are case-sensitive.

Underscores ‘_’ can be used to replace spaces ; they are auto­ma­ti­cally conver­ted to spaces when the glos­sary is loaded.

This glos­sary can be used, for example, to restore capi­tal letters that were forgot­ten during speech-to-text conver­sion. It can also be used to insert format­ting tags :

Note that it may be neces­sary to place spaces before and/or after search strings to iden­tify them as whole words. Care should also be taken to avoid self-embedding rules (such as those on the second and third lines), as these would create unwan­ted repe­ti­tions if glos­sary rules were applied multiple times.
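As an illustration, here is a minimal sketch of how such a glossary could be loaded and applied. It assumes that underscores are converted in both columns and that rules are applied in file order; `loadGlossary` and `applyGlossary` are hypothetical names, not the tool's actual functions.

```javascript
// Parse a two-column tab-separated glossary; underscores stand for spaces.
function loadGlossary(content) {
  return content
    .split(/\r?\n/)
    .filter((line) => line.includes('\t'))
    .map((line) => {
      const [search, replacement] = line.split('\t');
      return [search.replace(/_/g, ' '), replacement.replace(/_/g, ' ')];
    })
    .filter(([search]) => search.length > 0);
}

// Apply each rule in order as a literal, case-sensitive replacement.
function applyGlossary(text, rules) {
  return rules.reduce(
    (result, [search, replacement]) => result.split(search).join(replacement),
    text
  );
}

const glossary = loadGlossary('_ivamechton_\t_ivermectin_\nbada_charia\tBhattacharya');
console.log(applyGlossary('he took ivamechton and met bada charia today', glossary));
// he took ivermectin and met Bhattacharya today
```

The underscores around the first search string turn into spaces, so that only the whole word is matched.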

Another feature is available for geeks familiar with regular expression syntax : files containing sets of replacement strings applicable to preg_replace() commands (in PHP syntax). These preg_replace() rules are applied before the glossary search-and-replace rules. As shown below, their sensitivity to context solves the problem of repeated self-embedding rewrites.
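To see why context helps, here is a sketch in JavaScript, whose `String.replace()` accepts `$1` references much like PHP's preg_replace(). The rule itself is invented for the illustration and does not come from the actual rule files. A literal rule rewriting "Malhotra" as "Dr Malhotra" would produce "Dr Dr Malhotra" when applied twice, whereas the rule below only fires when the preceding word is not already "Dr".

```javascript
// Hypothetical rule: the captured context ($1) is any word other than "Dr".
const rules = [[/(\b(?!Dr\b)\w+) Malhotra/g, '$1 Dr Malhotra']];

// Apply every [pattern, replacement] pair in order.
function applyRegexRules(text, ruleList) {
  return ruleList.reduce(
    (result, [pattern, replacement]) => result.replace(pattern, replacement),
    text
  );
}

const once = applyRegexRules('I spoke with Malhotra', rules);
const twice = applyRegexRules(once, rules);
console.log(once);  // I spoke with Dr Malhotra
console.log(twice); // I spoke with Dr Malhotra
```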

Merged transcription files

We recommend splitting the MP3 export of the video soundtrack into fragments so that they can be processed by an online speech-to-text service. This may yield several transcription files. To facilitate automatic translation of the entire set of subtitles, you will need to merge these files into a single one, while at the same time applying the changes programmed in the glossary.

Translated transcriptions

Translation is provi­ded free of charge by online services such as DeepTranslate.

It is recom­men­ded to perform the trans­la­tion on a trans­cript file that is struc­tu­rally simi­lar to the one produ­ced by DeepTranscript. This is because the trans­crip­tion mecha­nism has perfor­med a segmen­ta­tion based on breath units. That segmen­ta­tion is also opti­mal for machine trans­la­tion. It would be a bad idea to try to trans­late the contents of « srt » , « sbv » or « vtt » subtitle files, because their segmen­ta­tion has been modi­fied to suit the maxi­mum length of subtitles.

Note that the output of DeepTranslate won’t be a well-formed transcription file, unlike the one you got from DeepTranscript. The main reason is that DeepTranslate ‘translates’ tabulations to spaces. Worse, it may insert unwanted spaces in time codes, for instance replacing ‘234.5’ with ‘234. 5’. Caption manager will fix all these errors and handle the translated transcription as a well-formed file.
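A repair of this kind could look like the sketch below. It rests on a loud assumption: each line is taken to hold an index, a start time, an end time and the text, in that order. The real column layout of the transcription files may differ, and the name `repairLine` is hypothetical.

```javascript
// Assumed layout of a line (not confirmed by the source):
//   index  start  end  text
const LINE = /^\s*(\d+)\s+(\d+(?:\.\s*\d+)?)\s+(\d+(?:\.\s*\d+)?)\s+(.*)$/;

// Rebuild a tab-separated line and remove stray spaces inside time codes.
// Lines that do not fit the assumed layout are returned unchanged.
function repairLine(line) {
  const match = LINE.exec(line);
  if (!match) return line;
  const [, index, start, end, text] = match;
  const clean = (timeCode) => timeCode.replace(/\s+/g, '');
  return [index, clean(start), clean(end), text].join('\t');
}
```

Restricting the repair to the leading numeric fields avoids touching numbers that legitimately appear in the subtitle text.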

Reader-friendly layout

Caption manager does its best to make subtitle text as easy to read as possible. Each subtitle occu­pies seve­ral lines of approxi­ma­tely equal length. Correct line feeds will be produ­ced if the machine has been instruc­ted as to the maxi­mum number of charac­ters to be produ­ced on each line. The default is 70 because the WordPress Gutenberg video player creates frames with space that allows for about 75 charac­ters per line.
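One simple way to obtain lines of roughly equal length is sketched below. This greedy approach is an assumption for the sake of illustration, not necessarily the algorithm used by Caption manager: it computes how many lines are needed, derives a target length, and breaks once the current line has reached it.

```javascript
// Hypothetical sketch: wrap text into lines of roughly equal length,
// none longer than maxChars (unless a single word exceeds that limit).
function balanceLines(text, maxChars = 70) {
  const words = text.trim().split(/\s+/);
  const length = words.join(' ').length;
  const lineCount = Math.max(1, Math.ceil(length / maxChars));
  const target = length / lineCount;
  const lines = [];
  let current = '';
  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    const mustBreak = candidate.length > maxChars;
    const pastTarget = current.length >= target && lines.length < lineCount - 1;
    if (current && (mustBreak || pastTarget)) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}
```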

Another useful feature is the ability to control the position of breaks between captions. Reading becomes difficult when a caption ends with a word that belongs with the start of the next one. To avoid this, Caption manager requires lists of « non-breaking » words that should not appear at the end of a caption. Each list is attributed to a specific language.

For instance, look at no-break-fr.txt and no-break-en.txt. A few words are summa­ri­sed as regu­lar expres­sions. For example, in French, /\s[Ll][ae]\s/ matches le, la, Le and La, and /\s[Dd]es?\s/ can match de, des, De and Des. As matching is case sensi­tive, it is neces­sary to include upper case variants, which can be easily combi­ned in regu­lar expressions.

Reordering requires moving words either back or forward. Let us see how it works. Consider the follo­wing sequence of captions :

dans cet éditorial de huit cents mots, je me suis également penché
sur l'un des médicaments les plus prescrits dans l'histoire de la

médecine, à savoir les statines, parce que
je devais faire le lien entre tous les éléments.

Since words « de » and « la » are listed in no-break-fr.txt, this sequence will be rende­red in the ‘forward’ mode as :

dans cet éditorial de huit cents mots, je me suis également penché
sur l'un des médicaments les plus prescrits dans l'histoire

de la médecine, à savoir les statines, parce que
je devais faire le lien entre tous les éléments.

and in the ‘back­ward’ mode as :

dans cet éditorial de huit cents mots, je me suis également penché
sur l'un des médicaments les plus prescrits dans l'histoire de la médecine,

à savoir les statines, parce que
je devais faire le lien entre tous les éléments.

These two options can be compa­red on this excerpt :

The ‘forward’ mode is the default option because it seems better to delay text in subtitles than to move it backwards.
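The ‘forward’ mode can be sketched as follows, using the two French patterns quoted above. This is a simplified illustration: it ignores line breaks inside captions as well as time codes, and `moveForward` is a hypothetical name.

```javascript
// Patterns taken from the French examples quoted in the text.
const noBreakFr = [/\s[Ll][ae]\s/, /\s[Dd]es?\s/];

// A word is "non-breaking" if it matches a pattern when surrounded by spaces.
const isNoBreak = (word, patterns) => patterns.some((p) => p.test(` ${word} `));

// Move trailing non-breaking words of each caption to the start of the next.
function moveForward(captions, patterns) {
  const result = captions.map((caption) => caption.trim().split(/\s+/));
  for (let i = 0; i < result.length - 1; i++) {
    const moved = [];
    while (result[i].length > 1 && isNoBreak(result[i][result[i].length - 1], patterns)) {
      moved.unshift(result[i].pop());
    }
    result[i + 1].unshift(...moved);
  }
  return result.map((words) => words.join(' '));
}

console.log(moveForward(["dans l'histoire de la", 'médecine, à savoir les statines'], noBreakFr));
// [ "dans l'histoire", 'de la médecine, à savoir les statines' ]
```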


Below is the video (dura­tion 3 hours) of an inter­view with Dr Aseem Malhotra, auto­ma­ti­cally trans­cri­bed (with DeepTranscript) and trans­la­ted to French (with DeepTranslate). This process requi­red a mini­mum of manual inter­ven­tion, although care­ful checking is neces­sary to fina­lise the subtitles — and create rules that can be used for future work.

The video is displayed by the standard Gutenberg video player. After that, other players based on WordPress plugins will be shown.

Video of Dr Aseem Malhotra inter­vie­wed by Joe Rogan, 23 April 2023
With English and/or French subtitles
This video is displayed from a Google Cloud source.
Select language on video

At the bottom right of the frame, click the three vertical dots, then point at “Options” and click “English”, “French” or “both” to select languages. Sorry for the use of country flags, but language flags are not available in Unicode…

This is an ongoing process : only the first hour has been revi­sed, so far, to show what can be achie­ved with a few manual correc­tions. The next two hours are almost “raw”, only crea­ted by the auto­ma­tic tools.

Below is the same video displayed (with its subtitles) by the WPlyr video player. One disadvantage is that, unlike Gutenberg’s default player (above), none of the other players I tried supports colours and subtitle text formatting.

On the plus side, this player offers rewind and fast forward buttons that allow you to skip 10 seconds at a time.

Unfortunately, this video must be linked to a local source (an MP4 file hosted on the site). The same setup did not work with a Google Cloud source. Serving the file locally can place an excessive load on the site if many users are watching its video(s).

Videojs HTML5 Player displays remote video. However, it does not have buttons to rewind or fast forward 10 seconds, nor does it have access to subtitles :

CP Media Player has a very clear display of subtitle texts, notably in fullscreen mode. It had a conflict with the MEKS audio player used on this site, but this was solved by adding an extra parameter (iframe = « 1 ») in the shortcode, yielding :

Unfortunately, the current version (1.1.0) of CP Media Player does not yet have buttons to rewind or fast forward by about ten seconds.

The best compromise at the moment might be the Plyr.io media player. It is set up with JavaScript that loads its library files from a CDN (currently https://cdn.plyr.io/3.7.8/plyr.css). With the help of ChatGPT, I was able to place the elapsed time display at the bottom left of the frame…

The main limitation is that it does not understand « ::cue » instructions for the layout of subtitle text.

<link rel="stylesheet" href="https://cdn.plyr.io/3.7.8/plyr.css">
<!-- The src of the Plyr library script (same CDN) is missing in the source page -->
<script src=""></script>

<video id="my-video" controls style="width: 100%;">
  <source src="https:// ... mp4" type="video/mp4">
  <track kind="captions" src="https:// ... vtt" srclang="fr" label="&#x1f1eb;&#x1f1f7; - &#x1f1ec;&#x1f1e7;&nbsp;French-English">
  <track kind="captions" src="https:// ... vtt" srclang="en" label="&#x1f1ec;&#x1f1e7;&nbsp;English" default>
</video>
<div id="custom-time" style="color:blue; font-weight:bold;"></div>
<script>
  document.addEventListener('DOMContentLoaded', () => {
    // Create the player with rewind / fast-forward buttons skipping 10 seconds
    const player = new Plyr('#my-video', {
      controls: ['play', 'rewind', 'fast-forward', 'progress', 'volume', 'captions', 'settings', 'fullscreen'],
      seekTime: 10,
      captions: { active: true }
    });

    // Refresh the elapsed time display whenever the playback position changes
    player.on('timeupdate', () => {
      const currentTime = player.currentTime;
      const elapsed = Math.floor(currentTime);
      const formattedElapsed = formatTime(elapsed);
      const customTimeElement = document.getElementById('custom-time');
      customTimeElement.innerText = formattedElapsed;
    });

    // Format a number of seconds as h:mm:ss
    function formatTime(time) {
      const hours = Math.floor(time / 3600);
      const minutes = Math.floor((time % 3600) / 60);
      const seconds = Math.floor(time % 60);
      return `${hours}:${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2,'0')}`;
    }
  });
</script>

Details of the process

The source video was copied (via a screen capture) from The Joe Rogan Experience (April 2023). There is some irregularity in the streaming : frames tend to freeze although the sound track remains continuous. In principle, this is not too much of a problem for speech data.

Due to the length of the video (3 hours), the MP3 sound track was sliced into three one-hour segments, namely 1.mp3, 2.mp3 and 3.mp3. I used the (excellent) TwistedWave audio editor on the Mac to do this. Cutting points needed to be set at the end of sentences.

Each of the three sound files was submit­ted to DeepTranscript which trans­cri­bed them as text files. These were expor­ted in tabu­la­ted text format : 1.txt, 2.txt, 3.txt. Extension ‘tab’ is another option.

Transcription files were checked in a spread­sheet editor — such as Excel, PlanMaker, etc. — to make sure that times (in seconds) are displayed as floa­ting point numbers. You can see an example of the trans­crip­tion (1.txt) in an Excel version : down­load “1.xlsx”. Note that the first line and the left­most column do not contain signi­fi­cant data. These will auto­ma­ti­cally be skip­ped by Caption manager.

Transcription files, such as 1.txt, are not suitable for displaying subtitles on the video, at least for the follo­wing reasons :

  1. Their format is not a reco­gni­sed stan­dard for use in video players.
  2. Lines are of very variable lengths because DeepTranscript worked on breath units to segment the trans­crip­tion. For instance, text on line #14 would not fit into a single video frame.
  3. Their content is “raw” and nota­bly contains miss­pel­led proper names. For instance, “Bhattacharya” is trans­cri­bed as “bada charia”! Also — one of the worst cases — “iver­mec­tin” as “ivamech­ton”… In addi­tion, all text appears in lower­case. Still, these errors are consistent in the sense that they can be fixed by the rewrite rules of a glossary.
  4. We need to merge the three trans­crip­tion files to produce a unique caption file. This is not just a matter of copy-paste. The time codes should add up conti­nuously from the first to the last part.
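The fourth point can be sketched as follows. The sketch assumes that the duration of each audio fragment is known and that the time codes of each transcription restart at zero. How Caption manager actually determines the offsets is not documented here, and `mergeTranscriptions` is a hypothetical name.

```javascript
// Each file is { duration, segments: [{ start, end, text }, ...] },
// with times in seconds counted from the start of that fragment.
function mergeTranscriptions(files) {
  const merged = [];
  let offset = 0;
  for (const { duration, segments } of files) {
    for (const segment of segments) {
      merged.push({
        ...segment,
        start: segment.start + offset,
        end: segment.end + offset,
      });
    }
    offset += duration; // the next fragment starts where this one ends
  }
  return merged;
}
```

For three one-hour fragments, a segment starting at 1.5 seconds in the second file would end up at 3601.5 seconds in the merged transcription.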

Below is a screen­shot of the top of the home­page of Caption mana­ger after uploa­ding two trans­crip­tion files, auto­ma­ti­cally rena­med « 001.txt » and « 002.txt », two glos­sary files rena­med « replace_001.txt » and « replace_002.txt », two regu­lar expres­sion files « reg_replace_001.txt » and « reg_replace_002.txt », and two non-breaking lists of words « no_break_001.txt » and « no_break_002.txt ».

Create subtitles in the source language

On the Caption manager page, upload 1.txt, 2.txt, 3.txt (in this order). Then enter a name for output files and click CREATE SET OF CAPTION FILES. This produces caption files in the three formats (srt, sbv, vtt).

If the name of the caption file (in any of the three formats) is iden­ti­cal to the name of the video file (exclu­ding their exten­sions) and if they are on the same level, VLC will read the video and display subtitles. A great feature of VLC is that by clicking on the left and right arrows, you can move back­wards and forwards by 10 seconds, which is very conve­nient to re-read a fragment.

A plea­sant surprise : the size of the subtitles is now much more in keeping with the frame of the video. The main reason is that a para­me­ter Max number of words in each caption has been set — to 25 by default. So, long entries (such as line 14 of 1.xlsx) have been split into seve­ral subtitles with their time codes inter­po­la­ted. Caption manager also takes care of the mini­mum dura­tion of each subtitle (1.5 sec by default). The time codes are adjus­ted so that the subtitles do not over­lap with each other.
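The timing adjustment might work along the lines of this sketch. The strategy shown (extend a cue that is too short, then delay the next cue if the two overlap) is an assumption, and `enforceTiming` is a hypothetical name.

```javascript
// Make every cue last at least minDuration seconds and remove overlaps
// by delaying the start of the following cue when necessary.
function enforceTiming(cues, minDuration = 1.5) {
  const result = cues.map((cue) => ({ ...cue }));
  for (let i = 0; i < result.length; i++) {
    const cue = result[i];
    if (cue.end - cue.start < minDuration) {
      cue.end = cue.start + minDuration;
    }
    const next = result[i + 1];
    if (next && next.start < cue.end) {
      next.start = cue.end;
    }
  }
  return result;
}
```

A cue shown from 0 to 0.5 seconds and followed by one starting at 1 second would be extended to 1.5 seconds, and the next cue delayed to start at 1.5 seconds.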

Now it is time to edit 1.txt, 2.txt, 3.txt in a plain text editor — e.g. TextEdit on a Mac. Read the subtitles while playing the video (prefe­ra­bly at slow speed) and edit the trans­crip­tion files.

However, you should try to auto­mate correc­tions by using a glos­sary and/or sets of regu­lar expres­sion rules. Create a file named replace-en.txt in your text editor. Type the search expres­sion at the start of each line. Then type a tabu­la­tion and type its replacement.

An example of the glos­sary for this video is here : replace-en.txt.

The same proce­dure works for regu­lar expres­sions : create a tabu­la­ted text file contai­ning pairs of argu­ments for preg_replace() instruc­tions (in PHP syntax) and upload it.

An example of regu­lar expres­sion rewrite rules for this video is here : preg-replace-en.txt. Note that this set contains rules that won’t be self-embedding because of using contexts (nota­ted ‘$1’).

Edit transcription files 1.txt, 2.txt, 3.txt to fix errors that cannot be fixed by the glossary. Then click DELETE ALL TRANSCRIPTION FILES, upload replace-en.txt, preg-replace-en.txt (if necessary) and the updated transcription files. Click CREATE SET OF CAPTION FILES again. This will produce the final forms of subtitle files in the three formats (srt, sbv, vtt).

When captions are crea­ted in ‘vtt’ (WebVTT) subtitle format, options are provi­ded to change the colour of text in sections enclo­sed by <b>, <i> and <u> tags :

Colour names are accepted if they comply with standard HTML color names. Currently, these colours are displayed (on Mac) by the Chrome, Opera and Brave browsers, but by neither Firefox nor Safari.

Create subtitles in different languages

To create trans­la­ted subtitles using DeepTranslate, it would be a bad idea to send ‘srt’, ‘sbv’ and ‘vtt’ files directly for trans­la­tion. This is because they contain subtitles that are spread over seve­ral frames (due to the 25-word limit). This can lead to inac­cu­rate trans­la­tions. It is there­fore better to work with trans­crip­tion files, as they contain subtitles segmen­ted by breath units, which make more sense to the auto­ma­tic translator.

So, we need to reconstruct a single transcription file by merging 1.txt, 2.txt and 3.txt, to which glossary search-replace rules will be applied. This is done by a single click on the button COMBINE SEQUENTIALLY THESE TRANSCRIPTIONS TO SINGLE FILE. The name of the file created by this process will always be new_transcription.txt (download it to see an example).

Open new_transcription.txt in a plain-text editor, copy its contents and paste them into DeepTranslate after choosing the target language. (You may need to run a premium version if the text is long.) Then, copy the translation and save it as a text file, e.g. transcription-fr.txt (download it as an example).

It is also possible to use ChatGPT for these trans­la­tions, although the current free version (3.5) has a small limit on the size of the text.

You may notice that HTML tags for italics, bold, empha­si­zed or strong text have been repla­ced with non-standard tags [i], [/i], [b] and [/b]. The reason is that DeepTranslate tends to delete a few opening HTML tags. These tags will be resto­red to the stan­dard HTML format when produ­cing caption files.
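The round trip between the two notations is a simple substitution. Here is a minimal sketch, limited to the italic and bold tags mentioned above (the function names are hypothetical):

```javascript
// Replace <i>, </i>, <b>, </b> with bracketed tags before translation.
function protectTags(text) {
  return text.replace(/<(\/?)(i|b)>/g, '[$1$2]');
}

// Restore the standard HTML tags when producing caption files.
function restoreTags(text) {
  return text.replace(/\[(\/?)(i|b)\]/g, '<$1$2>');
}

console.log(restoreTags('[i]keyhole[/i] and [b]heart[/b]'));
// <i>keyhole</i> and <b>heart</b>
```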

Now you need to create caption files in the three formats using this (single) transcription file. Click DELETE ALL TRANSCRIPTION FILES and DELETE ALL GLOSSARY FILES, then upload transcription-fr.txt and click the button CREATE SET OF CAPTION FILES.

You’re almost there ! However, you still need to correct trans­la­tion errors, parti­cu­larly in rela­tion to tech­ni­cal or scien­ti­fic terms that DeepTranslate may have misun­ders­tood. A typi­cal example in this inter­view is “keyhole heart surgery” (time 56.3 seconds) which was trans­la­ted “chirur­gie cardiaque par trou de serrure”!

To fix this (and other) errors, you should create a speci­fic glos­sary file, e.g. replace-fr.txt. It would contain the follo­wing rewrite rule :

A more gene­ral version of this rule is program­med, using a regu­lar expres­sion, in preg-replace-fr.txt. It replaces both « chirur­gie d’ur­gence de trou de serrure » and « chirur­gie trou de serrure » with « chirur­gie cardiaque mini­ma­le­ment inva­sive (keyhole heart surgery) ». The same regu­lar expres­sion file contains a rule crea­ting French chevrons quotes along with non-breaking spaces :

\s[\"«]([^"^»]+)?[\"»]\s	 «&nbsp;$1&nbsp;» 

This method is not perfect because some­times DeepTranslate forgets a few opening or closing quotes. This happe­ned only once in the 3‑hour sample video. Still, it covers most cases.
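For readers who want to try the quotes rule outside PHP, here is the same pattern transposed to JavaScript, whose `String.replace()` handles `$1` in a comparable way. The sample sentences are made up for the illustration.

```javascript
// The rule quoted above, written as a [pattern, replacement] pair.
const frenchQuotes = [/\s["«]([^"^»]+)?["»]\s/g, ' «&nbsp;$1&nbsp;» '];

function applyQuotesRule(text) {
  const [pattern, replacement] = frenchQuotes;
  return text.replace(pattern, replacement);
}

console.log(applyQuotesRule('il a dit "bonjour" hier'));
// il a dit «&nbsp;bonjour&nbsp;» hier

// A missing closing quote leaves the text unchanged:
console.log(applyQuotesRule('il a dit "bonjour hier'));
// il a dit "bonjour hier
```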

Reading the final caption file, first with a text editor and then as real subtitles with VLC, will give you the oppor­tu­nity to correct errors, prefe­ra­bly using one of the two glos­sa­ries and the regu­lar expres­sion rules, other­wise in the source trans­crip­tion files 1.txt, 2.txt and 3.txt. This may require you to restart the process : delete all files, upload source trans­crip­tion files, create caption files in the source language, merge source trans­crip­tion files into new_transcription.txt, trans­late it, and finally create caption files in the new language. It’s all done in single clicks — don’t forget to reload the glos­sa­ries and regu­lar expres­sion files if they are needed !

Multi-language subtitle mixing

Once subtitles have been trans­la­ted into different languages, it is possible to display two (or more) languages toge­ther at the bottom of the video. This has been done on the video shown as an example.

Remember that we have two trans­crip­tion files contai­ning all the English and French subtitles. Their time­codes are iden­ti­cal because they were crea­ted by DeepTranscript and left unchan­ged by DeepTranslate.

Upload the English trans­crip­tion, then the French trans­crip­tion which will be rena­med 001.txt and 002.txt respec­ti­vely. Then upload the English and French glos­sa­ries (in the same order). These will be rena­med replace_001.txt and replace_002.txt respec­ti­vely. You can do the same with regu­lar expres­sion files that will be rena­med preg_replace_001.txt, preg_replace_002.txt, etc. As you might guess, each of these glos­sa­ries will be applied to the trans­crip­tion file with the same number. Other language versions (and their respec­tive glos­sa­ries) can be added at this stage.

If the time­codes of 001.txt and 002.txt are iden­ti­cal, the follo­wing will be displayed :

You can apply simple HTML formats suppor­ted by video players : italics or bold. This makes it easier to distin­guish between languages in the video. Here we have chosen ‘italic’ for the English text and no format­ting for the French text.

If ‘italic’ is selec­ted, italic text will be conver­ted to bold, and vice versa.

An optio­nal token, by default « ~•~ », can be set to create blank lines sepa­ra­ting language versions. Leave it empty to remove these empty lines, an option which is gene­rally satis­fac­tory when versions have different colours. In the current ‘vtt’ (WebVTT) format, colour is only applied to bold or italic text (see above), which is not optimal.

As two (or more) subtitles will be displayed within the same frame, it may be useful to reduce the maxi­mum length of each subtitle. For bilin­gual subtitles, we set the limit to 15 words.
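The mixing step can be pictured with the sketch below. It assumes two lists of segments with identical time codes and wraps the first language in italics, turning any italics already present into bold. The separator token is left out, and `mixLanguages` is a hypothetical name.

```javascript
// Combine two transcriptions segment by segment into bilingual captions.
function mixLanguages(first, second) {
  if (first.length !== second.length) {
    throw new Error('Transcriptions must have the same number of segments');
  }
  return first.map((segment, i) => {
    const other = second[i];
    if (segment.start !== other.start || segment.end !== other.end) {
      throw new Error(`Time codes differ at segment ${i + 1}`);
    }
    // Existing italics become bold so they still stand out inside the italic line.
    const inner = segment.text.replace(/<(\/?)i>/g, '<$1b>');
    return {
      start: segment.start,
      end: segment.end,
      text: `<i>${inner}</i>\n${other.text}`,
    };
  });
}
```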


Subtitles in consecutive languages

This is the case of a video in which seve­ral languages have been used conse­cu­ti­vely in different parts. It is easy to deal with this situa­tion : segment the MP3 sound file into frag­ments contai­ning only one language. Transcribe and trans­late them sepa­ra­tely. Then upload them in alpha­be­ti­cal order, crea­ting 001.txt, 002.txt, etc.

Create a glos­sary for each language and upload the glos­sary files in the order they are needed : replace_001.txt, replace_002.txt, etc. The same glos­sary may need to be uploa­ded seve­ral times if that language appears in different fragments.

If a glossary is missing, the previous one will be used. For example, if transcription file 005.txt does not find replace_005.txt, it will use replace_004.txt, or the nearest non-empty preceding one. The same applies to regular expression files. Note that this also covers the case of a sequence of transcription files sharing a single glossary and/or regular expression file.
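The fallback rule amounts to a backward search, as in this sketch. It only checks that a file exists (not that it is non-empty), and `glossaryFor` is a hypothetical name.

```javascript
// Return the glossary to use for transcription number `index`:
// replace_<index>.txt if it exists, otherwise the nearest preceding one.
function glossaryFor(index, available) {
  for (let i = index; i >= 1; i--) {
    const name = `replace_${String(i).padStart(3, '0')}.txt`;
    if (available.has(name)) return name;
  }
  return null; // no glossary at all
}

const uploaded = new Set(['replace_001.txt', 'replace_004.txt']);
console.log(glossaryFor(5, uploaded)); // replace_004.txt
console.log(glossaryFor(3, uploaded)); // replace_001.txt
```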


The confi­den­tia­lity of the data you process with this service is not guaran­teed : someone sharing your Internet access (IP address) could export the content. This content (your works­pace) is howe­ver auto­ma­ti­cally dele­ted after 24 hours of non-use. Furthermore, you can click the DELETE ALL FILES button at any time.

