In this article I will go in depth on my process for transcribing audio with Amazon’s neural-network-based transcription service, Amazon Transcribe. The target audience has experience transcribing and post-editing. In the last step, post-editing, I will demonstrate the power of Amazon’s AI by using a custom process. The goal is to show you how I use Amazon’s services for higher transcription throughput.
We’ve been hearing a lot about artificial intelligence (AI) lately in transcription and translation. It’s a controversial topic because we are living in an age when the efficacy of human labor is being challenged by machines. Not to worry. In another article I will explain a few very good reasons why I believe it’s nearly impossible for interpreters and translators to be completely replaced by AI, at least anytime soon. AI is here to increase human capacity, not to replace it.
Very generally, AI is software written to mimic the decision-making processes of humans. When I say ‘software’, I mean something far more complicated than the web pages of the 1990s. Much of today’s AI is built on neural networks, software loosely modeled on the way neurons and synapses connect in the human brain. Deep learning is the most recent machine learning technique to come out of that research. For tasks such as speech recognition it is generally more accurate than earlier techniques.
Some of you savvy post-editors might already know that early computer-assisted translation and transcription were based on statistical methods. The computer analyzed how likely it was that one phrase would follow the next. This technique became cumbersome because computers needed massive amounts of data to create statistical models, and the models were often limited to the type of source text loaded into the machine. It soon became apparent that ‘noise’, or inconsistencies in the source text, affected accuracy. Dialects, for example, produce inaccuracies in the models. This is where deep learning comes in. Basically, deep learning copes better with inconsistencies in the data, which produces higher quality transcriptions and translations and improves machine learning in hundreds of other contexts.
So, now my process.
Step 1: Loading the Audio File
Using Amazon S3
First, you’ll of course need to sign up for an Amazon Web Services (AWS) account. This is not the same as your Amazon Prime account. It is specifically for using the web services.
In order to load your source material, e.g. audio or video, into Transcribe, you’ll need to use what Amazon calls an S3 bucket. The benefits of S3 storage are scalability and cost, which means an individual or an organization of any size can leverage its storage power. As a freelancer, I can store lots of audio for next to nothing. As of the writing of this article I can store 5 GB for free.
Once you’ve set up your account, look for S3.
As you can see below, there are quite a few configuration options. You can simply choose the defaults for best security. Ultimately, you’ll need to decide what’s best for your situation. Enter a name for your bucket. I chose transcriptions-for-civil. At the bottom, click Create Bucket.
Finalizing Your S3 Bucket
Upload your files and you’ll see them appear under Objects.
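If you would rather script the upload than click through the console, the same step can be sketched in Python with the boto3 library. This is only a minimal sketch under a few assumptions: boto3 is installed, your AWS credentials are already configured on your machine, the bucket already exists, and interview.mp3 is a made-up file name. The helper names s3_uri and upload_audio are my own.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// address that points at a file stored in a bucket."""
    return f"s3://{bucket}/{key}"


def upload_audio(local_path: str, bucket: str) -> str:
    """Upload one audio file to an existing bucket and return its s3:// address."""
    # boto3 is imported here so the helper above works even without it installed.
    import boto3  # third-party package: pip install boto3

    key = Path(local_path).name  # store the object under the file's own name
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


# "interview.mp3" is a hypothetical file name used only for illustration.
print(s3_uri("transcriptions-for-civil", "interview.mp3"))
```

Calling upload_audio("interview.mp3", "transcriptions-for-civil") would perform the actual upload. The s3:// address it returns is what Transcribe asks for in the next step.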
Step 2: Transcribing Your Audio File
Using Amazon Transcribe
Now go back to your dashboard and choose Amazon Transcribe under Machine Learning. Here you’ll load your newly stored audio file into the AI.
Starting A Transcription Job
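For those who prefer code to the console, a transcription job can also be started from Python with boto3. The sketch below is not what I did in the screenshots. It assumes boto3 is installed and your credentials are configured. The job name, the es-US language code, and the two-speaker setting are assumptions you should adapt to your own recording.

```python
def build_job_request(job_name, media_uri, language_code="es-US", max_speakers=2):
    """Assemble the keyword arguments for a Transcribe job.

    language_code and max_speakers are illustrative defaults, not requirements.
    """
    return {
        "TranscriptionJobName": job_name,        # must be unique within your account
        "Media": {"MediaFileUri": media_uri},    # the s3:// address of the audio file
        "LanguageCode": language_code,
        "Settings": {
            "ShowSpeakerLabels": True,           # ask Transcribe to separate speakers
            "MaxSpeakerLabels": max_speakers,
        },
    }


def start_job(request):
    """Send the request to Amazon Transcribe (needs boto3 and AWS credentials)."""
    import boto3  # third-party package: pip install boto3

    return boto3.client("transcribe").start_transcription_job(**request)


# Hypothetical job and file names, shown only to illustrate the request shape.
request = build_job_request(
    "civil-interview-01", "s3://transcriptions-for-civil/interview.mp3"
)
print(request["Media"]["MediaFileUri"])
```

Keeping the request in its own function lets you check the settings before anything is sent to AWS, and therefore before anything is billed.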
Once the transcription is complete, we need to put it into a format we can use. Unfortunately, Amazon does not offer an easy-to-use format out of the box. Clicking on download transcript provides you with a file in the JSON format, so we’ll need to turn it into a .docx file before we can post-edit.
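To give you a feel for what is inside that JSON file, here is a small Python sketch that pulls out the plain transcript text. The sample below is a heavily trimmed, made-up stand-in for a real download. The real file is much longer, but the full text sits under results and then transcripts in the same way.

```python
import json

# A trimmed, invented example of the structure Amazon Transcribe returns.
SAMPLE = """
{
  "jobName": "demo-job",
  "results": {
    "transcripts": [{"transcript": "Hola, good morning."}],
    "items": []
  }
}
"""


def transcript_text(raw_json: str) -> str:
    """Return the full transcript text from a Transcribe JSON document."""
    data = json.loads(raw_json)
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


print(transcript_text(SAMPLE))  # prints: Hola, good morning.
```

With a real download you would read the file first, for example with open("your-file.json").read(), and pass the contents to transcript_text. This gives you the words only, without speakers, timestamps, or confidence scores.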
Step 3: Using TScribe
Here is where we delve into the world of open source software. From here on it is technical. Skip to the end and see the output of Amazon Transcribe’s AI technology if you want.
The first stop on the way to creating an MS Word document (.docx) for post-editing is GitHub. GitHub is a place where independent software developers publish code for everyone to use. You can find amazing projects on GitHub, but the one we’re interested in is here: https://github.com/kibaffo33/aws_transcribe_to_docx.
As you can see from the title, it is used to turn our AWS output, which is in the JSON data format, into a Microsoft Word document.
This project is written in Python, a popular programming language. I will not detail how to install it, but if you want to investigate further you can go here: https://docs.python-guide.org/starting/install3/osx/. With a little practice you can do it!
Below is the command line prompt on my computer, a Mac. The commands highlighted in yellow are where I pass the JSON file to tscribe and get an MS Word document as output.
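If you cannot make out the commands, the conversion can be sketched in a few lines of Python. Treat this as an assumption rather than a record of exactly what I typed: it relies on the tscribe package being installed (pip install tscribe) and on its write function accepting format and save_as arguments, which you should confirm against the project’s README for your version. The file name interview.json and the helper names are hypothetical.

```python
from pathlib import Path


def docx_path_for(json_path: str) -> str:
    """Name the Word document after the JSON file, swapping the extension."""
    return str(Path(json_path).with_suffix(".docx"))


def convert_to_docx(json_path: str) -> str:
    """Turn a Transcribe JSON file into a .docx and return the new file's path."""
    # Assumed API: check the tscribe README for the signature in your version.
    import tscribe  # third-party package: pip install tscribe

    output_path = docx_path_for(json_path)
    tscribe.write(json_path, format="docx", save_as=output_path)
    return output_path


# "interview.json" is a made-up name for the transcript downloaded from AWS.
print(docx_path_for("interview.json"))  # prints: interview.docx
```

Running convert_to_docx("interview.json") in the folder that holds the downloaded transcript should leave interview.docx next to it, ready for post-editing.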
The Confidence Model
Step 4: Post-editing
Cleaning up AI output
Now we can begin cleaning up the few words that were perhaps not transcribed accurately. As mentioned above, the words in grey are the questionable ones. You can see the AI did an amazing job detecting the Spanglish words: it marked them with lower confidence and provided a phonetic rendition. It also caught the speaker’s hedges and did a pretty good job with the interviewer, an intermediate Spanish speaker. I will need to change the speaker label for a few passages. Lastly, having the lower-confidence words in grey makes post-editing much easier and therefore quicker.
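The confidence scores behind those grey words live in the JSON file itself, so you can also list the questionable words directly. The Python sketch below is my own illustration, not part of tscribe. The sample data is invented, and the 0.9 cut-off is an arbitrary assumption that you can raise or lower.

```python
import json

# Invented sample in the shape of the "items" list in a Transcribe download.
SAMPLE = """
{
  "results": {
    "items": [
      {"type": "pronunciation", "start_time": "0.50", "end_time": "0.90",
       "alternatives": [{"confidence": "0.99", "content": "Vamos"}]},
      {"type": "pronunciation", "start_time": "0.90", "end_time": "1.20",
       "alternatives": [{"confidence": "0.98", "content": "al"}]},
      {"type": "pronunciation", "start_time": "1.20", "end_time": "1.80",
       "alternatives": [{"confidence": "0.62", "content": "parking"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def low_confidence_words(raw_json: str, threshold: float = 0.9):
    """Return (start_time, word, confidence) for words scored below threshold."""
    data = json.loads(raw_json)
    flagged = []
    for item in data["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation has no timestamp and no meaningful score
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # scores are stored as strings
        if confidence < threshold:
            flagged.append((item["start_time"], best["content"], confidence))
    return flagged


for start, word, confidence in low_confidence_words(SAMPLE):
    print(f"{start}s  {word}  ({confidence:.2f})")
```

A list like this, with the time each word starts, tells you where to jump in the audio when a grey word needs a second listen.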