Hello Friends 👋,
In the last post, I shared with you why we must improve audio transcriptions and video subtitles.
Go ahead and read the previous post if you haven't 😊.
In this article, I want to share how to improve transcriptions to make them accessible.
It all started with my contribution to the Software Engineer Unlocked podcast. I then looked at the Virtual Coffee podcast's transcriptions and saw many typos and what seemed to be missing words, because some passages were out of context and difficult to understand.
So I reached out to one of Virtual Coffee's maintainers, Dan, and asked if I could help improve the transcriptions. And he agreed.
Then I started asking around and searching for information on improving the transcriptions.
First, we need to know that there are several types of transcriptions.
Verbatim transcription means that we transcribe the audio word for word. We keep the repetition, stuttering, and filler words such as 'I mean', 'like', 'you know', 'um', etc.
We must transcribe everything we hear (yes, even curse words!) and not leave anything out. That also includes non-verbal sounds like laughter, coughs, pauses, etc.
The second type is known as edited transcription, or clean verbatim. Like verbatim, we want to keep the content intact, but we can omit repetition, stuttering, and filler words. We may also leave out non-verbal communication to make the transcript easier to read.
The third type allows us to lightly edit the transcript, such as fixing grammar. It also allows us to omit unnecessary words, even off-topic talk. The goal is to deliver the meaning of the speech more naturally so readers can grasp the purpose of the whole conversation.
This section will go through how to improve transcriptions from scratch. And it's based on my experience with the Virtual Coffee podcast.
After discussing with Dan, we decided to use the verbatim type because we wanted to capture the whole atmosphere of the talk. We keep everything except for the filler words 'uh' and 'um'.
Then Dan provided the repo for the podcast. Some podcasts use markdown for their transcriptions, but at Virtual Coffee we use the `.srt` format for our podcast transcriptions.
Then I began improving one of the episodes to test things out.
First, I read the transcription. Then I listened to the episode while fixing typos and adding missing words and proper punctuation. After my pull request was merged, I read the transcription on the website without listening to the audio.
It was somewhat challenging to read because of the repetition and the false starts. So I researched how to make the transcription easier to read and more understandable with proper punctuation.
After several rounds of editing, I was finally happy with the result. Based on my notes, I wrote guidelines for improving the transcriptions. Having guidelines is essential to maintaining consistency throughout the transcriptions.
Guidelines may include additional rules, such as formatting requirements. Take the Software Engineer Unlocked podcast as an example: they want every line to be around 80 characters so pull request diffs are easier to review.
At Virtual Coffee, we want each section to have three or four lines: an index, a timestamp line, and one or two lines of text.
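As an illustration, a section in an `.srt` file looks like this (the index, timestamps, and text here are made up, not taken from an actual episode):

```srt
12
00:05:03,400 --> 00:05:07,900
So I reached out to Dan and asked
if I could help with the transcriptions.

13
00:05:08,000 --> 00:05:09,500
And he agreed.
```

Each section has an index, a start and end timestamp separated by `-->`, and one or two lines of text, with a blank line between sections.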
At the beginning of improving the transcriptions, I did everything manually: I checked and fixed the index, timestamps, and line breaks. Shoutout to Dan for providing a tool we can run with `yarn check-srt` to help our contributors. This tool checks for invalid timestamp formatting, and it also fixes the line breaks and the order of the indexes. We only need to fix the reported list of incorrect timestamp formats and don't have to check them one by one.
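To give you an idea of what the timestamp check does, here is a minimal sketch in Python. This is not Dan's actual tool (which runs through yarn), and the function name and sample text are mine; it only covers the "find malformed timestamp lines" part, not the line-break or index fixes:

```python
import re

# A valid SRT timestamp line looks like "00:05:03,400 --> 00:05:07,900"
# (hours:minutes:seconds,milliseconds with a comma before the milliseconds).
TIMESTAMP_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$"
)

def find_bad_timestamps(srt_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for malformed '-->' lines."""
    bad = []
    for number, line in enumerate(srt_text.splitlines(), start=1):
        if "-->" in line and not TIMESTAMP_RE.match(line.strip()):
            bad.append((number, line))
    return bad

sample = """\
1
00:00:01,000 --> 00:00:03,500
Hello and welcome to the show.

2
00:00:03.600 --> 00:00:05,900
Today we talk about transcriptions.
"""

# The second cue uses a dot instead of a comma in its start time,
# so only that line is reported.
print(find_bad_timestamps(sample))  # → [(6, '00:00:03.600 --> 00:00:05,900')]
```

With a report like this, a contributor can jump straight to the offending lines instead of scanning every timestamp by hand, which is exactly the convenience the real tool provides.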
Every organization has rules to keep its transcriptions consistent, and we must read and follow its guidelines. When we're contributing to an open-source project, we must also read its `CONTRIBUTING.md`, if there is one. Ask the maintainers whenever you're in doubt or have questions.
Sometimes we improve a transcription on a topic we're not too familiar with. Or there are times when we need clarification about the capitalization of a word. We always want to do research to write such terms correctly and with proper capitalization. For example, Ruby on Rails.
Some speakers are not native English speakers, and they might have an accent that speech-to-text apps can't pick up accurately. Or when two speakers talk simultaneously, it can be hard to hear clearly what each speaker is saying.
But we must always make our best guess at their words. If we still can't figure it out in any way, transcribe it as `inaudible`, depending on the guidelines.
When we finish improving a transcription, read it one more time without listening to the audio. It's important to make sure the transcription is readable and understandable, because that's what makes a transcription accessible to everyone.
To this day, it's still a challenge to make the web 100% accessible to everyone. But every slight accessibility improvement is a small step toward a more accessible web.
Now that you know how to improve audio transcriptions, let's make the web more accessible together! 😀
Note: Check out the Virtual Coffee podcast and the Virtual Coffee podcast repository to get an insight into everything I'm talking about in this article 😊.