Color Your Captions: Streamlining Live Transcriptions With “diart” and OpenAI’s Whisper
<p>In this post, I’m going to show you how to combine OpenAI’s <em>Whisper</em> for speech recognition with <code>diart</code> for streaming speaker diarization to obtain real-time speaker-colored transcriptions as shown above.</p>
<h1>How does it work?</h1>
<p><em>Diart</em> is an AI-based Python library for streaming speaker diarization (<em>i.e.</em>, determining “who speaks when”). It is built on top of <code>pyannote.audio</code> models and designed specifically for live audio streams (<em>e.g.</em>, from a microphone).</p>
<p>With a few lines of code, <code>diart</code> allows you to obtain real-time speaker tags like these:</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*KYmVPiXfLf7fWf2akWUk2A.gif" style="height:156px; width:700px" /></p>
<p>Streaming speaker diarization with diart</p>
<p>At the same time, <em>Whisper</em> is a recent model from OpenAI trained for automatic speech recognition (ASR) that is particularly robust to noisy conditions, which is perfect for real-life use cases.</p>
<h1>Setting everything up</h1>
<ol>
<li>Install <code>diart</code> by following the instructions here</li>
<li>Install <code>whisper-timestamped</code> with <code>pip install git+https://github.com/linto-ai/whisper-timestamped</code></li>
</ol>
<p>In the rest of the post, I’ll be relying on RxPY (reactive programming extensions for Python) for the streaming part. If you’re not familiar with it, I recommend you take a look at this documentation page to get a grasp of the basics.</p>
<p>In a nutshell, reactive programming is all about composing operations that act on emitted items (in our case audio chunks) from a given source (in our case the microphone).</p>
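<p>To make the idea concrete, here is a minimal, dependency-free sketch of the reactive pattern. It is not RxPY's API: the <code>Observable</code> class and the <code>from_iterable</code>, <code>map_op</code> and <code>filter_op</code> helpers are hypothetical stand-ins written for illustration, and the "audio chunks" are just short lists of made-up sample values. RxPY provides the real, far more complete versions of these building blocks.</p>

```python
class Observable:
    """Hand-rolled stand-in for a reactive source (NOT the RxPY class)."""

    def __init__(self, subscribe_fn):
        # subscribe_fn(on_next) pushes every emitted item to on_next
        self._subscribe_fn = subscribe_fn

    def subscribe(self, on_next):
        self._subscribe_fn(on_next)

    def pipe(self, *operators):
        # Chain operators left to right; each one wraps the previous source
        result = self
        for operator in operators:
            result = operator(result)
        return result


def from_iterable(items):
    """Emit each element of `items` in order (stands in for a microphone)."""
    def subscribe(on_next):
        for item in items:
            on_next(item)
    return Observable(subscribe)


def map_op(fn):
    """Operator that transforms every emitted item with `fn`."""
    def operator(source):
        return Observable(
            lambda on_next: source.subscribe(lambda item: on_next(fn(item)))
        )
    return operator


def filter_op(predicate):
    """Operator that only forwards items for which `predicate` is true."""
    def operator(source):
        def subscribe(on_next):
            def forward(item):
                if predicate(item):
                    on_next(item)
            source.subscribe(forward)
        return Observable(subscribe)
    return operator


# Made-up "audio chunks": each one is a short list of samples
chunks = [
    [0.0, 0.01, -0.02],
    [0.5, -0.4, 0.3],
    [0.0, 0.0, 0.0],
    [-0.9, 0.8, 0.1],
]

# Compose operations: compute each chunk's peak amplitude, drop quiet chunks
pipeline = from_iterable(chunks).pipe(
    map_op(lambda chunk: max(abs(sample) for sample in chunk)),
    filter_op(lambda peak: peak > 0.1),
)

loud_peaks = []
pipeline.subscribe(loud_peaks.append)
print(loud_peaks)
```

<p>Nothing happens until <code>subscribe</code> is called: the pipeline only describes what to do with each chunk, and the source then pushes chunks through it one at a time. That is the same shape the streaming code in this post relies on, with the microphone as the source.</p>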
<h1>Combining diarization and transcriptions</h1>
<p>Let’s start with an overview of the source code and then we’ll break it down into blocks to better understand it.</p>
<p><a href="https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef">Website</a></p>