When using Limecraft to automatically transcribe audio into timed text, a 'Custom Dictionary' allows you to reduce the Word Error Rate (WER) and minimise the effort required for manual post-editing. In this article, we explain how to configure and use custom dictionaries to achieve maximum efficiency.
TABLE OF CONTENTS
- What is a Custom Dictionary?
- 1. Configuring a Custom Dictionary
- 2. Exporting and Importing Custom Dictionaries
- 3. Automatic Speech to Text Transcription using a Custom Dictionary
- 4. Automatic subtitling with a dictionary
What is a Custom Dictionary?
When using Audio Transcription, a custom dictionary (or glossary) can significantly improve accuracy for specialised terminology, brand names and proper nouns, which a standard Automatic Speech Recognition (ASR) engine typically struggles to recognise.
Generic ASR models often get industry-specific terminology, proper names and technical jargon wrong, which leads to errors and extensive manual correction. A tailored glossary or 'Custom Dictionary' helps ensure the correct spelling of brand names, company references and domain-specific vocabulary.
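To measure the effect of a dictionary on your own material, you can compare an ASR transcript with a manually corrected reference. WER is conventionally the minimum number of word substitutions, deletions and insertions needed to turn the reference into the transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition; it is not a Limecraft feature, the function name `word_error_rate` is hypothetical, and it compares whitespace-separated words exactly (no case or punctuation normalisation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# A brand name split into two words costs one substitution and one insertion.
print(word_error_rate("Welcome to Limecraft Flow", "Welcome to lime craft Flow"))  # 0.5
print(word_error_rate("Welcome to Limecraft Flow", "Welcome to Limecraft Flow"))   # 0.0
```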
1. Configuring a Custom Dictionary
Before you can use glossaries or custom dictionaries to improve transcription accuracy, you first need to configure them. Go to your Workspace Settings > Transcriber and scroll down to the 'Dictionaries' section.
Note: Limecraft supports a range of Automatic Speech Recognition (ASR) engines, not all of which support custom dictionaries. If the 'Dictionaries' section is not visible, your workspace may be set up with an engine that does not support them.
To create a new dictionary, select ‘Add new dictionary’.
Enter a descriptive name for the dictionary and select the applicable language. Specifying a domain is optional.
Next, type or paste the terms into the input field at the bottom of the page. A dictionary entry can be a single word or a phrase that you expect to appear as-is in the spoken content of your material.
Enter one dictionary entry per line. A single dictionary can hold up to 1000 entries. When you are done, click ‘Save dictionary’ to confirm.
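If you collect terms from other sources (scripts, credit lists, brand guidelines), a small script can turn them into paste-ready text with one entry per line. The Python sketch below is an illustration under our own assumptions: the function name `prepare_entries` is hypothetical, and dropping blank lines and exact duplicates is a cleanup choice of ours, not documented Limecraft behaviour. Only the 1000-entry limit comes from this article.

```python
MAX_ENTRIES = 1000  # per-dictionary limit stated in this article


def prepare_entries(raw_text: str) -> str:
    """Return text with one cleaned dictionary entry per line (illustrative sketch)."""
    entries = []
    seen = set()
    for line in raw_text.splitlines():
        entry = " ".join(line.split())  # trim and collapse internal whitespace
        if entry and entry not in seen:  # skip blank lines and exact duplicates
            seen.add(entry)
            entries.append(entry)
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"{len(entries)} entries exceed the limit of {MAX_ENTRIES}")
    return "\n".join(entries)


print(prepare_entries("  Limecraft\n\nWord  Error Rate\nLimecraft\n"))
# Limecraft
# Word Error Rate
```

Keeping the original order of the first occurrence of each term makes it easier to compare the cleaned list with your source material.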
If you navigate back to the Transcriber settings, you’ll now see a table containing one row for each dictionary you created.
On the right side of each dictionary row, a menu ("...") lets you edit, remove or export the dictionary.
2. Exporting and Importing Custom Dictionaries
When editing a Custom Dictionary, you have the option to export the contents as a list of words, or to import a similar list.
2.1 Exporting a dictionary
You can export a dictionary as a JSON file or as a CSV (Comma Separated Values) file.
2.2 Importing a Custom Dictionary
You can also import entries into a Custom Dictionary. Select the file, then choose whether the imported entries should replace all entries already in the dictionary.
When importing a CSV dictionary file that was not originally created by Limecraft:
- the first row is assumed to contain header labels;
- the column with the header label “content” should contain the terms.
To avoid issues, however, it is best to start from a CSV file exported from an existing Limecraft dictionary and edit that.
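If you do need to build a CSV file outside Limecraft, the rules above (a header row, and the terms in a column labelled “content”) can be followed with Python's standard `csv` module. This is a minimal sketch: the function names are hypothetical, and because any other columns in a Limecraft export are not described here, it writes only the `content` column. When reading, it ignores any other columns.

```python
import csv
import io


def entries_to_csv(entries: list[str]) -> str:
    """Build CSV text: a header row, then one term per row in the 'content' column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["content"], lineterminator="\n")
    writer.writeheader()
    for term in entries:
        writer.writerow({"content": term})  # terms containing commas are quoted
    return buffer.getvalue()


def read_terms(csv_text: str) -> list[str]:
    """Read the 'content' column back, treating the first row as header labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["content"] for row in reader if row.get("content")]


text = entries_to_csv(["Limecraft", "Smith, John"])
print(text)
print(read_terms(text))  # ['Limecraft', 'Smith, John']
```

Saving the returned text as a UTF-8 `.csv` file assumes that encoding is accepted; as recommended above, editing an exported file remains the safer route.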
3. Automatic Speech to Text Transcription using a Custom Dictionary
To use a Custom Dictionary during audio transcription, open the transcriber for the clip, select the language, and select the appropriate Custom Dictionary for that language (configured as described in section 1).
4. Automatic subtitling with a dictionary
Custom Dictionaries can also be used when creating subtitles with Automatic Transcription.