Caption manager

Caption manager is a free online tool that creates files for displaying subtitles on videos.

The input data is a set of transcription files created by a service such as DeepTranscript (Otter.ai and WhisperTranscribe are alternative options) and optionally corrected with Grammarly. Because WhisperTranscribe creates subtitle files directly (in VTT format), you may not need Caption manager to create monolingual subtitles. The advantage of using DeepTranscript is explained in section Translated transcriptions.

The final output is a set of caption files that will display subtitles over a video player or a WordPress website. Below is a sample 3‑hour video used to illus­trate the process. Subtitles have been crea­ted in the video’s source language (English), and then trans­la­ted to French, Spanish and Italian.

How it works

To create the trans­crip­tion file in the video’s source language, first export your video to an MP3 sound file.

If you are using DeepTranscript, you may need to cut the MP3 file into frag­ments that fit within the program’s size limits (approxi­ma­tely 1 hour).

Export each trans­crip­tion as a tabu­la­ted text file. The exten­sions of these files may be « txt » or « tab ». If you’re using DeepTranscript, you’ll first need to export the trans­crip­tion as a CSV file. Then load this CSV file into a spread­sheet program and export it as tabu­la­ted text without the quota­tion marks as text markers.
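If you prefer to skip the spreadsheet step, the CSV-to-tabulated-text conversion can be scripted. Below is a minimal Python sketch; the column names in the sample are hypothetical, and the real layout of a DeepTranscript export may differ.

```python
import csv
import io

def csv_to_tabbed(csv_text):
    """Turn CSV text into tab-delimited text with no quotation marks around cells."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Tabs or line breaks inside a cell would corrupt the table, so flatten them.
        cells = [" ".join(cell.split()) for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"

# Hypothetical sample: the real column layout of an export may differ.
sample = 'channelId,start,end,text\n0,0.0,4.2,"hello, world"\n'
print(csv_to_tabbed(sample))
```

The csv module removes the quotation marks used as text markers, so the comma inside "hello, world" survives as plain text.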

Once your files are ready, go to this page to upload them and enter your settings. Note that to start a session, you will be asked to perform a small calcu­la­tion and write the result in French… This is to prevent robots from satu­ra­ting the service. Pay atten­tion to the ‘plus’ versus ‘moins’ (minus) operation !

The settings mainly include the output file name (which will usually match the video file name), the mini­mum dura­tion of a subtitle and the maxi­mum length of each subtitle. By default, the mini­mum dura­tion is set to 800 milli­se­conds and the maxi­mum length to 15 words.

Since speech analy­sis devices work on breath units which may contain more than 15 words, subtitles will often need to be broken into smal­ler units. Caption manager will break them up and fix them at the correct (inter­po­la­ted) times.

This program produces ‘srt’ (SubRip), ‘sbv’ (SubViewer) and ‘vtt’ (WebVTT) caption file (subtitle) formats. All of them are compa­tible with VLC and may be accep­table choices for YouTube. The ‘vtt’ format is needed by the default video player in WordPress sites. In addi­tion, it is the only one that accepts the non-breaking spaces requi­red by French punc­tua­tion. None of these formats are reco­gni­sed by the QuickTime video player.

A few format­ting tags such as italics and bold are accep­ted in subtitles. Line breaks are also accep­ted to denote speech turns in a dialogue.

Turn-taking can be taken care of by displaying the names of the new spea­kers in the first subtitle of their turn to speak. This is done auto­ma­ti­cally once spea­ker tags have been ente­red in the first column of the trans­crip­tion table.

Extra features


When buil­ding a caption file, it is possible to make auto­ma­tic repla­ce­ments recor­ded in a ‘replace’ file. This is a two-column tabu­la­ted file, the left contai­ning the string to be sear­ched, and the right its repla­ce­ment. Replacements are case-sensitive.

Underscores ‘_’ can be used to replace spaces ; they are auto­ma­ti­cally conver­ted to spaces when the glos­sary is loaded.

This glos­sary can be used, for example, to restore capi­tal letters that were forgot­ten during speech-to-text conver­sion. It can also be used to insert format­ting tags :

Note that it may be neces­sary to place spaces before and/or after search strings to iden­tify them as whole words. Care should also be taken to avoid self-embedding rules (such as those on the second and third lines), as these would create unwan­ted repe­ti­tions when glos­sary rules are applied multiple times.
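To make the glossary mechanism concrete, here is a small Python sketch of it. It is not the code of Caption manager: it assumes that underscores are converted to spaces in both columns, and the rules are built from the mistranscriptions mentioned later on this page. The last lines illustrate why a self-embedding rule creates repetitions when applied twice.

```python
def load_glossary(text):
    """Read a two-column tabulated glossary; underscores stand for spaces."""
    rules = []
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip empty or malformed lines
        search, replacement = line.split("\t", 1)
        rules.append((search.replace("_", " "), replacement.replace("_", " ")))
    return rules

def apply_glossary(text, rules):
    """Apply case-sensitive search-and-replace rules in order."""
    for search, replacement in rules:
        text = text.replace(search, replacement)
    return text

glossary = "_ivamechton_\t_ivermectin_\nbada_charia\tBhattacharya\n"
rules = load_glossary(glossary)
fixed = apply_glossary("doctor bada charia spoke about ivamechton today", rules)

# A self-embedding rule: its replacement contains its own search string.
embedding = [("DeepL", "DeepL Translator")]
once = apply_glossary("translated with DeepL", embedding)
twice = apply_glossary(once, embedding)
```

Here `twice` ends with "DeepL Translator Translator", which is the unwanted repetition described above.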

Another feature is available for geeks familiar with regular expression syntax: files containing sets of replacement strings applicable to preg_replace() commands (in PHP syntax). These preg_replace() rules are applied before the glossary search-and-replace rules. As shown below, their sensitivity to context solves the problem of repeated self-embedding rewrites.
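The idea of a context-sensitive rule can be tried out in Python, whose `re.sub()` plays the role of PHP's preg_replace() (with `\1` where PHP writes `$1`). The rule below is a hypothetical example, not one taken from the actual rule files: it captures the surrounding context and refuses to match when the replacement is already in place.

```python
import re

# Hypothetical rule: expand "DeepL" unless it is already followed by " Translator".
PATTERN = r"(^|\s)DeepL(?! Translator)(\W|$)"
REPLACEMENT = r"\1DeepL Translator\2"

def apply_rule(text):
    return re.sub(PATTERN, REPLACEMENT, text)

once = apply_rule("we used DeepL for this")
twice = apply_rule(once)  # applying the rule again changes nothing
```

Because of the lookahead, `once` and `twice` are identical, which is not the case with a plain search-and-replace rule.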

Merged transcription files

We recommended above splitting the MP3 export of the video soundtrack into fragments so that each can be processed by an online speech-to-text service. This may yield several transcription files. To facilitate automatic translation of the entire set of subtitles, you will need to merge these files into a single one, while applying the changes programmed in the glossary.

Translated transcriptions

Translation is provi­ded free of charge by online services such as DeepL Translator.

It is recom­men­ded to perform the trans­la­tion on a trans­cript file that is struc­tu­rally simi­lar to the one produ­ced by DeepTranscript. This is because the trans­crip­tion mecha­nism has perfor­med a segmen­ta­tion based on breath units. That segmen­ta­tion is also opti­mal for machine trans­la­tion. It would be a bad idea to try to trans­late the contents of « srt » , « sbv » or « vtt » subtitle files, because their segmen­ta­tion has been modi­fied to suit the maxi­mum length of subtitles. Due to this, WhisperTranscribe is not recom­men­ded for multi-language subtitles.

Note that the output of DeepL Translator won’t be a well-formed transcription file, unlike the one you got from DeepTranscript. The main reason is that DeepL Translator ‘translates’ tabulations to spaces. Worse, it can insert unwanted spaces in timecodes, for example replacing ‘234.5’ with ‘234. 5’, and it replaces dots with commas in floating-point timings when translating into French. Caption manager will fix all these errors and handle the translated transcription as a well-formed file.
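For readers curious about what such a repair involves, here is a rough Python sketch. It assumes a hypothetical line layout (track, start time, end time, text) and is only meant to show the kind of clean-up required, not how Caption manager actually does it.

```python
import re

# Assumed layout: track, start, end, text, separated by whitespace after translation.
NUMBER = r"\d+(?:\s*[.,]\s*\d+)?"
LINE = re.compile(rf"^\s*(\S+)\s+({NUMBER})\s+({NUMBER})\s+(.*)$")

def fix_timecode(raw):
    """Remove stray spaces and turn a decimal comma into a dot."""
    return re.sub(r"\s+", "", raw).replace(",", ".")

def repair_line(line):
    """Rebuild a tab-separated line from a line damaged by translation."""
    match = LINE.match(line)
    if match is None:
        return line  # leave unrecognised lines untouched
    track, start, end, text = match.groups()
    return "\t".join([track, fix_timecode(start), fix_timecode(end), text.strip()])

repaired = repair_line("0 234. 5 238,25 bonjour à tous")
```

The result has tabulations restored between the four columns and both timecodes written with a dot.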

Reader-friendly layout

Caption manager does its best to make subtitle text as easy to read as possible. Each subtitle occu­pies seve­ral lines of approxi­ma­tely equal length. Correct line feeds will be produ­ced if the machine has been instruc­ted on the maxi­mum number of charac­ters on each line. The default is 70 because the WordPress Gutenberg video player creates frames with space that allows for about 75 charac­ters per line.
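One simple way to obtain lines of approximately equal length is sketched below in Python. This is an assumption about a possible method, not the algorithm used by Caption manager: it computes the smallest number of lines that fits the limit, then looks for the narrowest width that still yields that number of lines.

```python
import math
import textwrap

def balanced_lines(text, max_chars=70):
    """Split text into the fewest lines of at most max_chars, as even as possible."""
    text = " ".join(text.split())
    if not text:
        return []
    n_lines = math.ceil(len(text) / max_chars)
    # Try the narrowest width first: it gives the most balanced layout.
    for width in range(math.ceil(len(text) / n_lines), max_chars + 1):
        lines = textwrap.wrap(text, width, break_long_words=False)
        if len(lines) <= n_lines:
            return lines
    return textwrap.wrap(text, max_chars, break_long_words=False)

lines = balanced_lines("one two three four five six seven eight nine ten", max_chars=30)
```

With a limit of 30 characters, the sample is split into two lines of 23 and 24 characters rather than one line of 28 and one of 19.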

Another useful feature is the ability to control the posi­tion of the breaks between succes­sive captions. Reading becomes diffi­cult when certain words appear at the end of a caption or are moved to the next caption. To do this, Caption manager requires lists of « non-breaking » words that should not appear at the end of a caption. Each list is asso­cia­ted with a speci­fic language.

For instance, look at no-break-fr.txt and no-break-en.txt. A few words are summa­ri­sed as regu­lar expres­sions. For example, in French, /\s[Ll][ae]\s/ matches le, la, Le and La, and /\s[Dd]es?\s/ can match de, des, De and Des. As matching is case sensi­tive, it is neces­sary to include upper case variants, which can be easily combi­ned in regu­lar expressions.

Reordering requires moving words either back or forward. Let us see how it works. Consider the follo­wing sequence of captions :

Since words « de » and « la » are listed in no-break-fr.txt, this sequence will be rende­red in the ‘forward’ mode as :

and in the ‘back­ward’ mode as :

These two options can be compa­red on this excerpt :

The ‘forward’ mode is the default option because it seems better to delay text in subtitles than to move it backwards.
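The ‘forward’ mode can be sketched in a few lines of Python. The sketch uses the two French regular expressions quoted above; the captions are made-up examples, timecodes are left out, and the behaviour (trailing non-breaking words are pushed to the start of the next caption) is my reading of the description, not the program's actual code.

```python
import re

# The two French patterns quoted above, in Python syntax.
NO_BREAK_FR = [r"\s[Ll][ae]\s", r"\s[Dd]es?\s"]

def is_no_break(word, patterns=NO_BREAK_FR):
    """True if the word, surrounded by spaces, matches one of the patterns."""
    return any(re.fullmatch(p, f" {word} ") for p in patterns)

def push_forward(captions, patterns=NO_BREAK_FR):
    """Move non-breaking words found at the end of a caption to the next caption."""
    result = [caption.split() for caption in captions]
    for i in range(len(result) - 1):
        while len(result[i]) > 1 and is_no_break(result[i][-1], patterns):
            result[i + 1].insert(0, result[i].pop())
    return [" ".join(words) for words in result]

fixed = push_forward(["il parle de la", "santé publique"])
```

In this example both « la » and « de » are delayed, so the second caption becomes « de la santé publique ».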


Below is the video (duration 3 hours) of an interview with Dr Aseem Malhotra, automatically transcribed with DeepTranscript, revised with the help of Grammarly, and translated to French with the help of DeepL Translator. This process takes advantage of these big data platforms so that it requires a minimum of manual intervention. In practice, careful editing is required to finalise the captions, and to create lexical rules that can be used for future work.

The video is displayed by the stan­dard Gutenberg video player. After that, other players based on WordPress plugins will be shown.

Video of Dr Aseem Malhotra inter­vie­wed by Joe Rogan, 23 April 2023
Standard Gutenberg HTML5 video player in WordPress
With English and/or French subtitles
This video is displayed from a Google Cloud source.
Select language on video

On the bottom right of the frame, click the three vertical dots, then point at “Options” and click flags to select languages. Sorry for the use of country flags, but language flags are not available in Unicode…

This is an ongoing process : only the first two hours have been revi­sed, so far, to show what can be achie­ved with a few manual correc­tions. The last hour is almost “raw”, only produ­ced by the auto­ma­tic tools.

This stan­dard Gutenberg video player requires no special instal­la­tion, but it has one major draw­back for its use with tech­ni­cal and/or foreign-language podcasts : the lack of rewind and forward buttons that allow you to skip approxi­ma­tely 10 seconds at a time.

More video players (script-based)

Below are two players instal­led as scripts using a CDN. Be aware that they are displayed diffe­rently on different brow­sers and/or systems. It is a good idea to check that all requi­red options work correctly on mobile phones.

Videojs HTML5 Player is an advanced project in terms of options. With the help of ChatGPT, I was able to place the elapsed time display at the bottom left of the frame. In the following example, I could set the background colour and opacity and the font family, and change these settings in full-screen mode, but the font size setting does not work yet. This issue is being discussed on the forum.

Apart from the font size issue, which will hope­fully be resol­ved soon, a major draw­back of this player is that it does not apply colours set by « ::cue » selec­tors in the WebVTT file. Its advan­tage over the stan­dard Gutenberg player is the exis­tence of (program­mable) skip buttons.

Another powerful tool, my favourite at the moment, is the Plyr.io media player. It lacks skip buttons on several browsers, but the left and right arrow keys allow backward and forward jumps of about 2 minutes on some browsers, and ± 5 seconds on others, as instructed by the seekTime parameter. In addition, skip buttons are visible on iPhones, although they skip by 10 or 15 seconds, depending on the context.

<style>
.plyr__subtitles {
  vertical-align: bottom !important;
}
</style>

<video id="bbvideo" controls style="width: 100%;" poster="">
  <source src="" type="video/mp4" />
  <track kind="subtitles" src="" srclang="enfr" label="ENG - FRA" default />
  <track kind="subtitles" src="" srclang="es" label="ENG - ESP" />
  <track kind="subtitles" src="" srclang="it" label="ENG - ITA" />
  <track kind="subtitles" src="" srclang="en" label="ENG" />
  Your browser doesn't support HTML video. Here is a
  <a target="_blank" href="" rel="noopener">link to the video</a> instead.
</video>

<script src=""></script>
<link href="" rel="stylesheet" />

<script>
  // Initialise Plyr on the <video id="bbvideo"> element once the DOM is ready
  document.addEventListener('DOMContentLoaded', () => {
    const player = new Plyr('#bbvideo', {
      controls: ['rewind','play','fast-forward','progress','volume','captions','settings'],
      seekTime: 5, // seconds skipped by the rewind and fast-forward controls
      captions: { active: true }
    });
  });
</script>

Note expres­sion « #t=0 » in the <source> element. This is the code needed to start the video from a defi­ned point, as shown below.

More video players (plugins)

Below are a few players instal­led as WordPress plugins. Some of them do not have rewind and fast forward (skip) buttons, or these buttons are used to jump to the previous or next video track in a playlist.

WPlyr video player has all the required features, but it only plays video files stored on a local source, namely this website (embedded MP4 file); it does not work with a Google Cloud source. Hosting the file locally would place an excessive load on the site when many users are watching its video(s). For this reason it is not displayed here.

CP Media Player has a very clear display of subtitle texts, nota­bly in the full screen mode. It had a conflict with the MEKS audio player used on this site, but this was resol­ved by adding an extra para­me­ter to the short­code, which yielded :

Unfortunately, the current version (1.1.0) of CP Media Player does not (yet?) have rewind/forward (skip) buttons. In addi­tion, there is a large, unwan­ted white space at the bottom of the frame after a return from full screen.

Details of the process

The source video was copied (via a screen capture) from The Joe Rogan Experience (April 2023). There are some irregularities in the streaming: frames tend to freeze, although the audio track remains continuous. In principle, this is not too much of a problem with speech data.

Due to the length of the video (3 hours), the MP3 sound track was sliced into three one-hour segments, namely 1.mp3, 2.mp3 and 3.mp3. I used the (excellent) TwistedWave audio editor on the Mac to do this. Cutting points need to be set at the end of sentences.

Each of the three sound files was submit­ted to DeepTranscript which trans­cri­bed them as text files. These were expor­ted in tabu­la­ted text format : “1.txt”, “2.txt”, “3.txt”. (You can down­load these examples.) The ‘tab’ exten­sion is another valid option.

Transcription files were checked in a spread­sheet editor — such as Excel, PlanMaker, etc. — to make sure that time­codes (in seconds) are displayed as floa­ting point numbers. You can see an example of the trans­crip­tion (1.txt) in its Excel/PlanMaker version : down­load “1.xlsx”. Since a French version of PlanMaker was used, floa­ting point numbers have commas instead of dots ; both are acceptable.

The first line of 1.txt does not contain significant data. It will automatically be skipped by Caption manager. The first column, renamed ‘track’, will be used for tags identifying speakers (see Speakers and turn-taking below). Value “0,0” will be converted to plain ‘0’, which means “no change”. The first column has been deleted in 2.txt and 3.txt; this is no longer necessary.

These trans­crip­tion files are not suitable for displaying subtitles on the video, at least for the follo­wing reasons :

  1. Their format is not a reco­gni­sed stan­dard for use in video players.
  2. Lines are of very variable lengths because DeepTranscript worked on breath units to segment the trans­crip­tion. For instance, text on line #14 would not fit into a single video frame.
  3. Their content is “raw” and nota­bly contains miss­pel­led proper names. For instance, “Bhattacharya” is trans­cri­bed as “bada charia”! Also — one of the worst cases — “iver­mec­tin” as “ivamech­ton”… In addi­tion, all text appears in lower­case. Still, these errors are consistent in the sense that they can be fixed by the rewrite rules of a glossary.
  4. We need to merge the three trans­crip­tion files to produce a unique caption file. This is not just a matter of copy-paste. The time codes should add up conti­nuously from the first to the last part.
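The fourth point, merging with continuous time codes, can be illustrated by a short Python sketch. It assumes that the timecodes of each fragment restart at zero and that the offset of a fragment is the end time of the last caption of the previous one; the real tool may compute the offset differently.

```python
def merge_transcriptions(parts):
    """Merge lists of (start, end, text) rows, shifting each part after the previous one."""
    merged = []
    offset = 0.0
    for rows in parts:
        for start, end, text in rows:
            merged.append((round(start + offset, 3), round(end + offset, 3), text))
        if rows:
            # Assumption: the next part begins where this one ended.
            offset = merged[-1][1]
    return merged

part1 = [(0.0, 2.0, "first"), (2.0, 5.5, "second")]
part2 = [(0.0, 1.5, "third")]
merged = merge_transcriptions([part1, part2])
```

The caption of the second part, which started at 0.0 in its own file, now starts at 5.5 seconds in the merged list.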

You should try to auto­mate correc­tions by using a glos­sary and/or sets of regu­lar expres­sion rules. Create a file named replace-en.txt in your text editor. Type the search expres­sion at the start of each line. Then type a tabu­la­tion and type its replacement.

An example of the English glos­sary for this video is here : replace-en.txt.

The same proce­dure works for regu­lar expres­sions : create a tabu­la­ted text file preg-replace-en.txt contai­ning pairs of argu­ments for preg_replace() instruc­tions (in PHP syntax) and upload it. Note that this set contains rules that won’t be self-embedding because of using contexts (nota­ted ‘$1’, ‘$2’, etc.). This means that applying the same set of rules multiple times does not create unwan­ted repetitions.

Below is a screen­shot of the top of the home­page of Caption manager after uploa­ding the three trans­crip­tion files, auto­ma­ti­cally rena­med 001.txt, 002.txt and 003.txt, and the correc­tion files used by all these frag­ments : a glos­sary file rena­med replace_001.txt, a regu­lar expres­sion file reg_replace_001.txt, and a non-breaking list of words no_break_001.txt (see Reader-friendly layout above):

Clicking button COMBINE SEQUENTIALLY creates unique files merging the three trans­crip­tion files : new_transcription.txt and new_transcription-hms.txt, with time­codes respec­ti­vely in seconds and in the hour-minute-second format as sugges­ted by ‘hms’. This can be useful if you want to work on a single trans­cri­bed file instead of three. The contents of replace_001.txt, reg_replace_001.txt and no_break_001.txt are igno­red in this process.

Create subtitles in the source language

On the Caption manager page, upload 1.txt, 2.txt, 3.txt (in this order) and the correction files as shown in the previous section. Then enter a name for the output files (here, for example, “AseemMalhotra-JoeRogan”) and click CREATE SET OF CAPTION FILES (at the bottom of the page). This produces caption files in the three formats (srt, sbv, vtt), plus a composite transcription file, here named “AseemMalhotra-JoeRogan-hms.txt”.

Files replace_001.txt and reg_replace_001.txt have been used to modify all these files. File no_break_001.txt has been used for crea­ting captions.

The follo­wing is an extract from the trace showing the tran­si­tion from 002.txt to 003.txt and the conti­nuity of the time­codes (broken at 02:01:47.800, i.e. 7307.800 seconds). It also confirms that the same files (replace_001.txt, reg_replace_001.txt and no_break_001.txt) have been used for corrections :

If the name of the caption file (in any of the three formats) is iden­ti­cal to the name of the video file (exclu­ding their exten­sions) and if they are on the same level, VLC will read the video and display subtitles. A great feature of VLC is that by clicking on the left and right arrows, you can move back­wards and forwards by 10 seconds, which is very conve­nient to re-read a fragment.

A plea­sant surprise : the size of the subtitles is now much more in keeping with the frame of the video. The main reason is that a para­me­ter Max number of words in each caption has been set — to 15 by default. Only words longer than 3 charac­ters are coun­ted. So, long entries (such as line 14 of 1.xlsx) have been split into seve­ral subtitles with their time codes inter­po­la­ted word-wise.
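The splitting described here can be approximated by the following Python sketch. It follows the two rules stated above (a limit of 15 counted words, and only words longer than 3 characters are counted) and interpolates time codes in proportion to the number of words; the details of the real implementation are not known to me and may differ.

```python
def split_caption(start, end, text, max_words=15, min_counted_len=4):
    """Split one long caption into shorter ones with word-wise interpolated times."""
    words = text.split()
    chunks, current, counted = [], [], 0
    for word in words:
        current.append(word)
        if len(word) >= min_counted_len:  # only words longer than 3 characters count
            counted += 1
        if counted >= max_words:
            chunks.append(current)
            current, counted = [], 0
    if current:
        chunks.append(current)
    result, position = [], 0
    for chunk in chunks:
        t0 = start + (end - start) * position / len(words)
        position += len(chunk)
        t1 = start + (end - start) * position / len(words)
        result.append((round(t0, 3), round(t1, 3), " ".join(chunk)))
    return result

long_text = " ".join(f"word{i}" for i in range(20))
parts = split_caption(0.0, 20.0, long_text, max_words=5)
```

With 20 words spread over 20 seconds and a limit of 5 words, this yields four captions of 5 seconds each.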

Caption manager also takes care of the mini­mum dura­tion of each subtitle (800 milli­se­conds by default). The time codes are adjus­ted so that the subtitles never over­lap with each other.

Now it is time to edit 1.txt, 2.txt and 3.txt in a plain text editor, e.g. TextEdit on a Mac. Read the subtitles while playing the video (preferably at slow speed) and fix in the transcription files the errors that cannot be fixed by the glossary.

Then click DELETE ALL TRANSCRIPTION FILES and upload the edited files again. Click CREATE SET OF CAPTION FILES again. This will produce the final forms of the subtitle files in the three formats (srt, sbv, vtt).

When captions are crea­ted in ‘vtt’ (WebVTT) subtitle format, options are provi­ded to change the colour of text in sections enclo­sed by <b>, <i> and <u> tags :

Colour names are accepted if they comply with standard HTML colour names. Currently, these colours are displayed (on Mac) by the Chrome, Opera and Brave browsers, but not by Firefox or Safari. In addition, as shown above, several video players do not yet recognise CSS « ::cue » formatting.

Note that this procedure also produced a file “AseemMalhotra-JoeRogan-hms.txt” which contains the whole set of captions once corrected by replace-en.txt and preg-replace-en.txt. Rename it, for instance, to AseemMalhotra-JoeRogan-hms-en.txt. This is the file you will now use for a final editing of the text, or to create translations. Here ‘hms’ means that the timecodes are expressed in hours, minutes and seconds instead of just seconds; this is the way most video players display them.
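For reference, the conversion between the two timecode notations is straightforward. The Python sketch below uses function names of my own choosing and reproduces the value quoted earlier, 02:01:47.800 for 7307.800 seconds.

```python
def seconds_to_hms(seconds):
    """Convert seconds (float) to hh:mm:ss.mmm notation."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def hms_to_seconds(hms):
    """Convert hh:mm:ss(.mmm) or mm:ss notation to seconds."""
    seconds = 0.0
    for part in hms.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

label = seconds_to_hms(7307.8)
back = hms_to_seconds("02:01:47.800")
```

Working in whole milliseconds avoids rounding surprises when the fractional part is formatted.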

Create subtitles in different languages

To create trans­la­ted subtitles using DeepL Translator, it would be a bad idea to send ‘srt’, ‘sbv’ and ‘vtt’ files directly for trans­la­tion. This is because they contain subtitles that are spread over seve­ral frames (due to the 15-word limit). This can lead to inac­cu­rate trans­la­tions. It is there­fore better to work with trans­crip­tion files, as they contain subtitles segmen­ted by breath units, which make more sense to the auto­ma­tic translator.

So, we need to reconstruct a single transcription file by merging 1.txt, 2.txt and 3.txt, to which glossary search-replace rules will be applied. To this effect, you can use « AseemMalhotra-JoeRogan-hms-en.txt » which you downloaded earlier. You may also do the same with a single click on button COMBINE SEQUENTIALLY THESE TRANSCRIPTIONS TO SINGLE FILE. The name of the file created by this process will always be new_transcription.txt (download it to see an example). Let’s use this name for the following explanations.

You may notice that HTML tags for italics, bold, emphasized or strong text have been replaced with non-standard tags [i], [/i], [b] and [/b]. The reason is that DeepL Translator tends to delete or modify a few HTML tags. These tags will automatically be restored to the standard HTML format when producing caption files.
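The protection of tags before translation, and their restoration afterwards, amounts to two regular expression substitutions. Here is a hedged Python sketch of the idea; the function names are mine and the mapping of em to [i] and strong to [b] is inferred from the description above.

```python
import re

# Emphasized text is treated as italics and strong text as bold.
TAG_MAP = {"i": "i", "em": "i", "b": "b", "strong": "b"}

def protect_tags(text):
    """Replace HTML formatting tags with bracketed tags before translation."""
    return re.sub(
        r"<(/?)(i|em|b|strong)>",
        lambda m: f"[{m.group(1)}{TAG_MAP[m.group(2)]}]",
        text,
    )

def restore_tags(text):
    """Turn bracketed tags back into standard HTML tags."""
    return re.sub(r"\[(/?)([ib])\]", r"<\1\2>", text)

protected = protect_tags("<em>keyhole</em> <b>surgery</b>")
restored = restore_tags(protected)
```

Note that the round trip normalises em to i, which is sufficient for subtitle formats.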

Of course, manual stylis­tic editing of new_transcription.txt is manda­tory before trans­la­ting it. If you are not a native English spea­ker, you will appre­ciate the help of Grammarly. Note that it offers the option of seve­ral variants of the language : British, American, Indian, Canadian and Australian English.

Open new_transcription.txt in a plain-text editor, copy its contents and paste it into DeepL Translator, after choo­sing the target language. (You may need to run a premium version if the text is long.) Then, copy the trans­la­tion and save it as a text file, e.g. transcription-fr.txt.

Whatever the auto­ma­tic trans­la­tion tool, since the source text is a trans­crip­tion of natu­ral speech, which means incom­plete sentences, missing or repea­ted words, etc., the result is likely to be poor. Big-data trans­la­tion tools tend to guess missing frag­ments, which leads to errors. Therefore, the trans­la­tion of trans­crip­tions must be care­fully edited by native spea­kers. For instance, this file is the raw version of transcription-fr.txt and that one its final version — after hours of final editing by hand !

In this example, a Spanish trans­la­tion is also visible, but transcription-es.txt is the raw, unedi­ted, version crea­ted by DeepL Translator. Native readers of Spanish should submit revi­sed versions…

It might be a good idea to use ChatGPT for these trans­la­tions, adding suitable stylis­tic speci­fi­ca­tions, but its current free version (3.5) has a too-small limit on the length of the text.

Now you need to create caption files in the three subtitle formats using this (unique) transcription file. Click DELETE ALL TRANSCRIPTION FILES and DELETE ALL GLOSSARY FILES, then upload transcription-fr.txt and click button CREATE SET OF CAPTION FILES.

Keyhole heart surgery ???

You’re almost there ! However, you still need to correct trans­la­tion errors, parti­cu­larly in rela­tion to tech­ni­cal or scien­ti­fic terms that DeepL Translator may have misun­ders­tood. A typi­cal example in this inter­view is “keyhole heart surgery” (time 56.3 seconds) which was trans­la­ted “chirur­gie cardiaque par trou de serrure”!

To fix this (and other) errors, you should create a speci­fic glos­sary file, e.g. replace-fr.txt. It would contain the follo­wing rewrite rule :

A more general version of this rule is programmed, using a regular expression, in preg-replace-fr.txt. It replaces both « chirurgie d’urgence de trou de serrure » and « chirurgie trou de serrure » with « chirurgie cardiaque minimalement invasive (keyhole heart surgery) ». The same problem occurred with Spanish and Italian translations, as the translator is unfamiliar with the original English expression. It is also safe to replace “cardiac” with “cardiovascular” as these are not equivalent in Romance languages.

For regular expression geeks: the same regular expression file contains rules creating French chevron quotes (guillemets) along with their non-breaking spaces:

[\"«]\s*([^"^»]+)?\s*[\"»]	«[nbsp]$1[nbsp]»
\s“\s*	 «[nbsp]
\s*”\s	[nbsp]» 

This method is not perfect because some­times DeepL Translator forgets a few opening or closing quotes. This happe­ned only once in the 3‑hour sample video. So, it covers most cases.
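The three rules above can be tested outside Caption manager. The Python sketch below ports them to `re.sub()` (with `\1` instead of `$1`) and then turns the [nbsp] placeholder into a real non-breaking space; this last step is an assumption about what happens when caption files are produced.

```python
import re

# The three rules shown above, rewritten in Python syntax.
RULES = [
    (r'["«]\s*([^"^»]+)?\s*["»]', r"«[nbsp]\1[nbsp]»"),
    (r"\s“\s*", " «[nbsp]"),
    (r"\s*”\s", "[nbsp]» "),
]

def french_quotes(text):
    """Apply the rules once, in order, then insert real non-breaking spaces."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text.replace("[nbsp]", "\u00a0")

straight = french_quotes('il a dit "bonjour" hier')
curly = french_quotes("il a dit “bonjour” hier")
```

Both inputs give the same result: the quoted word is enclosed in chevrons, each separated from it by a non-breaking space.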

Reading the final caption file, first with a text editor and then as real subtitles with VLC, will give you the oppor­tu­nity to correct errors, prefe­ra­bly using glos­sa­ries and regu­lar expres­sion rules, other­wise in the source trans­crip­tion file new_transcription.txt.

Multi-language subtitle mixing

Once subtitles have been trans­la­ted into different languages, it is possible to display two (or more) languages toge­ther at the bottom of the video. This has been done on the video shown as an example.

Remember that we have two trans­crip­tion files contai­ning all the English and French subtitles. Their time­codes are iden­ti­cal because they were crea­ted by DeepTranscript and left unchan­ged by DeepL Translator.

Upload the English trans­crip­tion, then the French trans­crip­tion which will be rena­med 001.txt and 002.txt respec­ti­vely. If the time­codes of 001.txt and 002.txt are iden­ti­cal, the follo­wing will be displayed :

Then upload the English and French glos­sa­ries (in the same order). These will be rena­med replace_001.txt and replace_002.txt respec­ti­vely. You can do the same with regu­lar expres­sion files that will be rena­med preg_replace_001.txt, preg_replace_002.txt, and optio­nally with the non-breaking rule files which will be rena­med no_break_001.txt and no_break_002.txt. As you might guess, each glos­sary, preg-replace and no-break file will be applied to the trans­crip­tion file with the same number. Other language versions (and their respec­tive glos­sa­ries) can be added at this stage.

You can apply simple HTML formats suppor­ted by video players : italics or bold. This makes it easier to draw atten­tion to the languages in the video. Here we have chosen ‘italic’ for the English text and no format­ting for the French text.

If ‘italic’ is selec­ted, italic text will be conver­ted to bold, and vice versa.

An optio­nal token, by default « ~•~ », can be set to create blank lines sepa­ra­ting language versions. Leave it empty to remove these empty lines, an option which is gene­rally satis­fac­tory when versions have different colours. In the current ‘vtt’ (WebVTT) format, colour is only applied to bold or italic text (see above), which is not optimal.

As two (or more) subtitles will be displayed within the same frame, it may be useful to reduce the maxi­mum length of each subtitle. For bilin­gual subtitles, we gene­rally set the limit to 15 words.


Subtitles in consecutive languages

This is the case of a video in which seve­ral languages have been used conse­cu­ti­vely in different parts. It is easy to deal with this situa­tion : segment the MP3 sound file into frag­ments contai­ning only one language. Transcribe and trans­late them sepa­ra­tely. Then upload them in alpha­be­ti­cal order, crea­ting 001.txt, 002.txt, etc.

Create a glos­sary for each language and upload the glos­sary files in the order they are needed : replace_001.txt, replace_002.txt, etc. The same glos­sary may need to be uploa­ded seve­ral times if that language appears in different fragments.

If a glos­sary is missing, the previous one will be used. For example, if trans­crip­tion file 005.txt does not find replace_005.txt, it will use replace_004.txt – or the next non-empty prece­ding one. The same applies to regu­lar expres­sion files. Note that this is the case when you are dealing with a sequence of trans­crip­tion files with a single glos­sary and/or regu­lar expres­sion file.
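The fallback rule can be written in a few lines. The Python sketch below uses a hypothetical dictionary that maps file numbers to lists of glossary rules; it returns the number of the glossary that would be used for a given transcription file.

```python
def pick_glossary(index, available):
    """Return the number of the nearest non-empty glossary at or before index."""
    for number in range(index, 0, -1):
        if available.get(number):  # missing or empty glossaries are skipped
            return number
    return None

# Hypothetical upload: glossaries 1 and 4 contain rules, glossary 5 is empty.
available = {1: [("a", "b")], 4: [("c", "d")], 5: []}
used_for_005 = pick_glossary(5, available)
used_for_003 = pick_glossary(3, available)
```

With a single glossary uploaded as number 1, every transcription file falls back to it, which is the common case mentioned above.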

Speakers and turn-taking

The first column of the CSV tables produ­ced by DeepTranscript contains a variable called ‘channelId’ which we have so far igno­red. This column can even be dele­ted without distur­bing the Caption Manager process. By default, each of its cells contains the number ‘0’. We will now use it to specify the spea­kers, or rather the changes of spea­kers, namely the turn-taking.

Our conven­tion is that a value of ‘0’ (or an empty cell) means “no change”. You need to assign a tag to each spea­ker. The sugges­ted method is to use whole numbers : 1, 2, 3… The initials of the spea­kers can also be used to this effect. Spaces will be auto­ma­ti­cally repla­ced by unders­cores ‘_’.

You don’t need to fill all the cells with speaker tags. Just fill in the cell where the speaker starts to speak. The zeros will carry the current speaker forward.
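As an illustration, the convention can be expressed as follows in Python. The tags, the names and the function are examples of mine; the sketch only shows how a column of tags translates into the places where a speaker name must be displayed.

```python
NO_CHANGE = {"", "0", "0,0", "0.0"}

def turn_labels(tags, names):
    """For each row, return the speaker name to display, or None if the speaker is unchanged."""
    labels = []
    current = None
    for tag in tags:
        tag = tag.strip().replace(" ", "_")  # spaces in tags become underscores
        if tag in NO_CHANGE or tag == current:
            labels.append(None)
        else:
            current = tag
            labels.append(names.get(tag, tag))  # fall back to the tag itself
    return labels

names = {"1": "Joe Rogan", "2": "Aseem Malhotra"}
labels = turn_labels(["1", "0", "", "2", "0", "1"], names)
```

Only the first, fourth and sixth rows receive a name, for instance to be shown in square brackets at the beginning of the subtitle.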

Then load the edited trans­crip­tion file (tabu­la­ted text format) into the Caption manager page. The program will create and display a list of spea­ker tags, along with editable fields where you can enter the full names of these spea­kers. For example, in the video Michel Onfray – ‘Théorie de Jésus’, three spea­kers have been tagged ‘1’, ‘2’, ‘3’ and their full names have been ente­red in the corres­pon­ding fields — see picture.

These full names will be saved in your project’s settings when you create subtitles or trans­crip­tion files. (For geeks : they are also saved as a session variable…)

The full name will appear at the beginning of the subtitle in ‘square brackets’ at each turn. Another option is ‘underscore’, which may benefit from setting a different colour for underlined text.

Subtitle that mentions the name of the next speaker

Delete fragments and/or insert silences

It is often the case that a video that has been subtit­led needs to be edited. In parti­cu­lar, we may want to delete a frag­ment and/or insert a silence. To do this, load the trans­crip­tion file, go to the bottom of the page, and set the times or dates of the begin­ning and end of the frag­ment to be dele­ted. Also, specify the dura­tion of the silence that will replace this frag­ment, in case there is one.

A trace of this process would be as follows :

Download the resul­ting new_transcription file in either time (seconds) or hh:mm:ss format.

For a clean cut, the exact break points should coin­cide with the begin­ning and end of the first and last captions to be dele­ted. We had set the start to approxi­ma­tely 00:00:30 and the end at approxi­ma­tely 00:00:40. The start date falls within the inter­val of a caption with time codes 00:00:28.500 to 00:00:34.798. This means that the actual start date of the dele­ted frag­ment should be 00:00:28.500, which is 1.5 seconds earlier. The end date falls within the inter­val of a caption with time codes 00:00:37.600 to 00:00:44.700. Therefore, the end date of the dele­ted frag­ment should be 00:00:44.700, which is 4.7 seconds later. These are the exact values that need to be used when editing the video file.

The dates of the subtitles follo­wing the dele­ted frag­ment have been increa­sed by 7 seconds as speci­fied. Similarly, when editing the video file, a 7‑second blank frag­ment must be inserted.

A blank caption star­ting at 00:00:25.500 and ending at 00:00:47.700 has been added for conve­nience. It can be comple­ted with text or dele­ted later.

This proce­dure can be used to insert a silence without dele­ting a frag­ment : simply enter the desi­red posi­tion of the silence as both the start and end date of the “dele­ted” fragment.

Of course, if you are subtit­ling in more than one language, each language trans­crip­tion file should be edited with exactly the same parameters.

Create an index

Once a WebVTT caption file has been crea­ted, it is easy to create an index of the contents with active HTML links poin­ting at precise moments in the video. This has been done on my page Aseem Malhotra interview.

For this purpose, a specific page named “AseemMalhotra-JoeRogan.php” has been created. It contains the code of the Plyr.io media player in a PHP environment, and it has been uploaded to the “video” folder. The <source> element has been modified as follows:

<source src="<?php echo $t; ?>"  type="video/mp4">

The variable $t is read from the URL and may be given either in seconds or in time format (mm:ss or hh:mm:ss):

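// Read the start time from the URL: plain seconds, mm:ss or hh:mm:ss
if(isset($_GET['t'])) {
    $t = $_GET['t'];
    if(strpos($t, ':') !== false) {
        $timeParts = explode(':', $t);
        $seconds = 0;
        if(count($timeParts) === 2)
            $seconds += (int)$timeParts[0] * 60 + (float)$timeParts[1];
        elseif(count($timeParts) === 3)
            $seconds += (int)$timeParts[0] * 3600 + (int)$timeParts[1] * 60 + (float)$timeParts[2];
        $t = $seconds;
    }
    else $t = (float)$t; // cast so that no raw user input is echoed into the page
}
else $t = 0;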

So, for instance, the link 02:23:33 opens a new window and immediately plays the referenced fragment. This link can easily be copied and pasted into a remote HTML/JavaScript page:

<a onclick="window.open('','Malhotra','width=1200,height=700'); return false;" href="">02:23:33</a> The triangle that moves the mountain
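When the index has many entries, such links can be generated from the WebVTT file instead of being typed by hand. The sketch below is only an illustration: `PLAYER_URL` is a placeholder address, the function names are hypothetical, and the parser handles only simple cue timing lines. In practice you would keep only the cues that open a new topic and edit their titles.

```javascript
// Placeholder address (an assumption): replace it with the address of your own player page.
const PLAYER_URL = "https://example.org/video/player.php";

// Convert a WebVTT cue start such as "02:23:33.120" (or "23:33.120") to "hh:mm:ss".
function toLinkTime(vttTime) {
  const parts = vttTime.split(".")[0].split(":");
  while (parts.length < 3) parts.unshift("00");
  return parts.join(":");
}

// Return one index entry per cue: its start time and the first line of its text.
function buildIndex(vttText) {
  const entries = [];
  const lines = vttText.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    // A cue timing line looks like "hh:mm:ss.ttt --> hh:mm:ss.ttt" (hours are optional).
    const match = lines[i].match(/^(\d{2,}:)?(\d{2}:\d{2}\.\d{3}) --> /);
    if (match) {
      const time = toLinkTime((match[1] || "") + match[2]);
      entries.push({ time, title: (lines[i + 1] || "").trim() });
    }
  }
  return entries;
}

// Render the entries as HTML links carrying the start time in the "t" parameter.
function renderIndex(entries) {
  const escapeHtml = (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  return entries
    .map((e) => `<a href="${PLAYER_URL}?t=${e.time}">${e.time}</a> ${escapeHtml(e.title)}`)
    .join("<br>\n");
}

const sampleVtt =
  "WEBVTT\n\n1\n02:23:33.120 --> 02:23:40.000\nThe triangle that moves the mountain\n";
console.log(renderIndex(buildIndex(sampleVtt)));
```

Each generated link passes the cue's start time as `t`, which is the parameter read by the PHP code above.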

A better imple­men­ta­tion also sends language infor­ma­tion to the video player to display the appro­priate subtitle version. A link to the same frag­ment 02:23:33 with Spanish subtitles by default would look like this :

<a onclick="window.open('&amp;lang=es','Malhotra','width=1200,height=700'); return false;" href="&amp;lang=es">02:23:33</a> El triángulo que mueve la montaña

using the follo­wing code in “AseemMalhotra-JoeRogan.php”:

// Language requested in the URL (empty if none)
if(isset($_GET['lang'])) {
  $lang = $_GET['lang'];
}
else $lang = '';
$langs = array("fr","es","it","en");
$all_tracks = '';
// Build one <track> element per available subtitle language
foreach($langs AS $thislang) {
  switch($thislang) {
    case "fr": $label = "ENG - FRA"; break;
    case "es": $label = "ENG - ESP"; break;
    case "it": $label = "ENG - ITA"; break;
    case "en": $label = "ENG"; break;
  }
  // Translated files carry a language suffix; the source-language (English) file has none
  if($thislang <> "en") $andlang = "-".$thislang;
  else $andlang = '';
  $track = "<track kind=\"subtitles\" src=\"".$andlang.".vtt\" srclang=\"".$thislang."\" label=\"".$label."\"";
  // The requested language becomes the default track
  if($thislang == $lang) $track .= " default";
  $track .= " />";
  $all_tracks .= $track;
}

<video id="bb-video" controls style="width: 100%;" poster="" autoplay>
  <source src="<?php echo $t; ?>"  type="video/mp4">
  <?php echo $all_tracks; ?>
    Your browser doesn't support HTML video. Here is a
    <a target="_blank" href="">link to the video</a> instead.
</video>

<script src=""></script>
<link href="" rel="stylesheet" />

<script>
  document.addEventListener('DOMContentLoaded', () => {
    // Plyr expects a CSS selector, hence the leading '#'
    var player = new Plyr('#bb-video', {
      controls: ['rewind','play','fast-forward','progress','volume','captions','settings','fullscreen'],
      seekTime: 5,
      captions: { active: true }
    });
  });
</script>
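Links to the player page that carry both a start time and a default subtitle language can also be built programmatically. The sketch below uses the standard `URL` API; the function name `buildPlayerLink` and the page address are placeholders, not part of the original setup.

```javascript
// Hypothetical helper: build a link to the player page with a start time
// and an optional default subtitle language.
function buildPlayerLink(pageUrl, time, lang) {
  const url = new URL(pageUrl);
  url.searchParams.set("t", time);              // seconds, mm:ss or hh:mm:ss
  if (lang) url.searchParams.set("lang", lang); // for example "fr", "es", "it" or "en"
  return url.toString();
}

// Placeholder address (an assumption): replace it with your own player page.
const spanishLink = buildPlayerLink("https://example.org/video/player.php", "02:23:33", "es");
console.log(new URL(spanishLink).searchParams.get("lang")); // es
```

The `URL` API percent-encodes the colons of the time value; PHP decodes them automatically when filling `$_GET`, so the server still receives `02:23:33`.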

Sharing editorial work

Access to your workspace is granted or denied based on your IP address. If you change location while working on a project, you need to be able to access it from a different IP address. The same applies to partners working on the same project.

To enable this, Caption manager displays a secret key that you can copy and send to your guest editor. The editor opens Caption manager on their side, pastes the secret key into the form and clicks the “GO!” button. Beware: after three unsuccessful attempts, their IP address will be blacklisted! Contact the admin if this happens.

The content of your works­pace is retai­ned for 48 hours. Each time you (or a guest editor) access it, a new 48-hour period is gran­ted. If left unused after this period, the works­pace will be auto­ma­ti­cally deleted.


The confidentiality of the data you process with this service is not guaranteed: someone sharing your Internet access (IP address) could find your workspace and export its content. However, the workspace is automatically deleted after 48 hours of non-use, and you can click the DELETE ALL FILES button at any time.

