Copyright © Rogue Amoeba Software, Inc. All rights reserved.
The Transcribe block takes spoken language input and creates a text transcription.[1] Read on for information on configuring the Transcribe block, as well as other tips.
Using the Transcribe block should be fairly straightforward. This section provides details on all of its setup options.
Transcribe offers two different transcription models: Low Resources and High Accuracy.[2] You can download and use either or both models. The Low Resources model will download quickly and take up minimal local disk space, while the High Accuracy model will take longer to download and use more disk space.
In practice, the Low Resources model will use less CPU power and produce transcripts quickly, which can be helpful for real-time (or near real-time) use cases. By contrast, the High Accuracy model will use more CPU power and take longer to produce a transcript, but it will provide the most accurate results.
The Language selector can be used to make Transcribe focus its transcription efforts on a specific language, to produce better results. If multiple languages may be spoken, the Auto state can be used. Transcribe will then attempt automatic detection of languages.
Transcribe will show an Input field for each input connected to the block. Each Input can be given a custom name, which will then preface its content in the resulting transcript. Several variables are available for use in this field.
For more details, see “Transcribe From Multiple Inputs” below.
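To make the effect of Input names concrete, the sketch below mimics the idea in Python: each piece of transcribed text is prefaced with the name of the input it came from. This is only an illustration under assumptions. The `Segment` type, the `label_transcript` function, the "Host" and "Guest" names, the sample lines, and the `Name: text` layout are all hypothetical, and the layout Audio Hijack actually writes may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One chunk of transcribed speech (hypothetical structure)."""
    start: float  # seconds from the start of the session
    source: str   # custom name given to the input
    text: str     # what was said


def label_transcript(segments):
    """Order segments by start time and preface each with its input name."""
    ordered = sorted(segments, key=lambda s: s.start)
    # Assumed "Name: text" layout, chosen for illustration only.
    return "\n".join(f"{s.source}: {s.text}" for s in ordered)


# Made-up sample data from two inputs, deliberately out of order.
segments = [
    Segment(4.2, "Guest", "Thanks, glad to be here."),
    Segment(0.0, "Host", "Welcome to the show."),
]
print(label_transcript(segments))
```

Sorting by start time is a design choice of this sketch: it interleaves the inputs so the result reads as a conversation rather than as one block per input.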
Specify the desired file name for your transcript. Several variables are available for use in this field.
Specify the location to which Audio Hijack should save your transcript files. By default, Audio Hijack saves to ~/Documents/Audio Hijack.
The most straightforward way to use Transcribe is by providing it with live audio from a microphone connected to your Mac. The block will transcribe this audio, producing a text file containing a transcript.
In Audio Hijack’s Template Chooser, you’ll find a Transcribe template to help you get started transcribing from a microphone. This simple template takes audio from a microphone and runs it through the Transcribe block, then saves the transcript to a text file. By default, transcription files are saved into an Audio Hijack sub-folder in your Mac’s Documents folder, named with the date and time followed by the word “Transcription”.
Use the Transcribe template to speak into your Mac and get a transcript back out.
Audio Hijack can capture audio from any application running on your Mac, and that means you can also transcribe anything you can hear. This is particularly useful for voice and video calls on Zoom, Skype, and other VoIP services.
With text transcripts, your meetings on Zoom and calls on FaceTime can now be referenced and searched. Use Transcribe with any application on your Mac, for endless speech-to-text possibilities.
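Because transcripts are saved as text files, they can be searched with ordinary scripting tools. The Python sketch below is one way to do that; it is not a feature of Audio Hijack. It assumes transcripts are UTF-8 files with a `.txt` extension in the default save folder, and the helper name `search_transcripts` is made up for this illustration.

```python
from pathlib import Path

# Default save location per this manual; change it if you chose another folder.
DEFAULT_FOLDER = Path.home() / "Documents" / "Audio Hijack"


def search_transcripts(phrase, folder=DEFAULT_FOLDER):
    """Return (file name, line number, line) for each line containing phrase.

    Hypothetical helper: assumes transcripts are UTF-8 .txt files.
    Matching is case-insensitive.
    """
    needle = phrase.lower()
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        # Replace undecodable bytes instead of failing on an odd file.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

For example, `search_transcripts("budget")` would list every line mentioning "budget" across the transcripts in the default folder, along with the file and line number where it appears.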
Transcribe can also assist if you have an existing audio file and want to get a transcript from it. To do this, you’ll play the file in any app (such as macOS’s QuickTime Player) and capture the audio with Audio Hijack.
Once the audio is flowing through Audio Hijack, you can route it through the Transcribe block to get a transcript.
Transcribe is especially handy for podcast creators who wish to provide a text transcript for their shows. You can configure your podcast setup so that each speaker is identified, based on input.
You can also use block nicknames to identify your speakers. Below, the name of each input block has been edited (to “Ammo” and “Ammette”). The Source variable is then used to place the speaker’s name before their text, like so:
Here, the first input has been given the nickname “Ammo”, while the second has been given the nickname “Ammette”. The resulting file will look like so:
The Transcribe block is powered by Whisper, OpenAI’s impressive automatic speech recognition system. While the speech recognition is very good, it is not perfect. Be sure to check your transcripts for accuracy.
Footnotes:
1. In addition to transcribing English, Transcribe can understand and transcribe 98 other languages. See the full list in the Language menu within the Transcribe block. Note that the accuracy and quality of transcripts vary by language.
2. At present, “Low Resources” uses the “Base” Whisper model, while “High Accuracy” uses the “Large (v2)” Whisper model.