Color Your Captions: Streamlining Live Transcriptions With “diart” and OpenAI’s Whisper

<p>In this post, I&rsquo;m going to show you how to combine OpenAI&rsquo;s <em>Whisper</em> for speech recognition with <code>diart</code> for streaming speaker diarization to obtain real-time speaker-colored transcriptions as shown above.</p>
<h1>How does it work?</h1>
<p><em>Diart</em> is an AI-based Python library for streaming speaker diarization (<em>i.e.</em> saying &ldquo;who speaks when&rdquo;) built on top of <code>pyannote.audio</code> models, and specially designed for live audio streams (e.g. from a microphone).</p>
<p>With a few lines of code, <code>diart</code> allows you to obtain real-time speaker tags like these:</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*KYmVPiXfLf7fWf2akWUk2A.gif" style="height:156px; width:700px" /></p>
<p>Streaming speaker diarization with diart</p>
<p>At the same time, Whisper is a recent model from OpenAI trained for automatic speech recognition (ASR) that is particularly robust to noisy conditions, which is perfect for real-life use cases.</p>
<h1>Setting everything up</h1>
<ol>
<li>Install <code>diart</code> by following the instructions here</li>
<li>Install <code>whisper-timestamped</code> with <code>pip install git+https://github.com/linto-ai/whisper-timestamped</code></li>
</ol>
<p>In the rest of the post, I&rsquo;ll be relying on RxPY (reactive programming extensions for Python) for the streaming part.
If you&rsquo;re not familiar with it, I recommend you take a look at this documentation page to get a grasp of the basics.</p>
<p>In a nutshell, reactive programming is all about composing operations that act on emitted items (in our case audio chunks) from a given source (in our case the microphone).</p>
<h1>Combining diarization and transcriptions</h1>
<p>Let&rsquo;s start with an overview of the source code and then we&rsquo;ll break it down into blocks to better understand it.</p>
<p><a href="https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef">Website</a></p>
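<p>If the reactive idea described earlier (operations composed over items emitted by a source) still feels abstract, here is a toy sketch of it. It is not RxPY and not part of the transcription code: plain Python generators stand in for RxPY&rsquo;s push-based observables, and <code>fake_microphone</code>, <code>pipe</code>, <code>map_items</code>, <code>filter_items</code> and <code>energy</code> are all hypothetical names made up for this illustration. RxPY offers real counterparts of the two operators, <code>map</code> and <code>filter</code>.</p>

```python
from typing import Callable, Iterable, Iterator, List

# An operator takes a stream of items and returns a new stream of items.
Operator = Callable[[Iterable], Iterator]


def pipe(source: Iterable, *operators: Operator) -> Iterable:
    """Chain operators so each one consumes what the previous one emits."""
    stream = source
    for op in operators:
        stream = op(stream)
    return stream


def map_items(fn: Callable) -> Operator:
    """Build an operator that transforms every emitted item with `fn`."""
    def operator(stream: Iterable) -> Iterator:
        for item in stream:
            yield fn(item)
    return operator


def filter_items(predicate: Callable) -> Operator:
    """Build an operator that only lets through items matching `predicate`."""
    def operator(stream: Iterable) -> Iterator:
        for item in stream:
            if predicate(item):
                yield item
    return operator


def fake_microphone(num_chunks: int, chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical source: alternates silent and loud chunks of fake samples."""
    for i in range(num_chunks):
        amplitude = 0.0 if i % 2 == 0 else 0.5
        yield [amplitude] * chunk_size


def energy(chunk: List[float]) -> float:
    """Mean squared amplitude of one audio chunk."""
    return sum(x * x for x in chunk) / len(chunk)


# Compose: microphone chunks -> energy per chunk -> keep only the loud ones.
loud = list(
    pipe(
        fake_microphone(num_chunks=4, chunk_size=8),
        map_items(energy),
        filter_items(lambda e: e > 0.1),
    )
)
print(loud)
```

<p>The point of this design is that the source knows nothing about the operations applied to it, so steps can be added, removed or reordered without touching the rest of the chain.</p>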