Empower Your Donut Model for Receipts with Self-Annotated Data
<p>In this article, I will show you how to fine-tune <a href="https://github.com/clovaai/donut" rel="noopener ugc nofollow" target="_blank">the Donut model</a> on your own receipt data. Fine-tuning Donut further for your specific use case can substantially improve its performance on that task. I start from a Donut model that has already been fine-tuned on the <a href="https://github.com/clovaai/cord" rel="noopener ugc nofollow" target="_blank">CORD dataset</a>, annotate some receipts, and then use those annotations to fine-tune the model further.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:611/1*icItYsII4jhh2QMZLMC8iw.jpeg" style="height:495px; width:611px" /></p>
<p>In this tutorial, you create your own dataset and use it to fine-tune a Donut model. Photo by <a href="https://unsplash.com/@izzyfisch_?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText" rel="noopener ugc nofollow" target="_blank">Isabella Fischer</a> on <a href="https://unsplash.com/photos/KX_HvSO-JlE?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText" rel="noopener ugc nofollow" target="_blank">Unsplash</a></p>
<h1>Story overview</h1>
<ul>
<li>Finding an annotation tool and annotating</li>
<li>Converting data to the correct format</li>
<li>Training with your annotated data</li>
</ul>
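<p>As a preview of the conversion step, here is a minimal sketch of what "the correct format" could look like. It assumes the <code>metadata.jsonl</code> layout described in the Donut repository README, where each line holds a <code>file_name</code> and a <code>ground_truth</code> JSON string wrapping the parsed fields under <code>gt_parse</code>. The helper names, the file names, and the field names (<code>store_name</code>, <code>total</code>) are hypothetical and only serve as illustration; they are not taken from Sparrow or from the rest of this article.</p>

```python
import json
from pathlib import Path


def to_donut_record(file_name: str, fields: dict) -> str:
    """Build one metadata.jsonl line for a single receipt image.

    Assumption: "ground_truth" is a JSON *string* that wraps the
    annotated fields under the "gt_parse" key, as in the Donut README.
    """
    ground_truth = json.dumps({"gt_parse": fields}, ensure_ascii=False)
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth},
        ensure_ascii=False,
    )


def write_metadata(records: list, split_dir: Path) -> Path:
    """Write (file_name, fields) pairs to <split_dir>/metadata.jsonl."""
    split_dir.mkdir(parents=True, exist_ok=True)
    out_path = split_dir / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, fields in records:
            f.write(to_donut_record(file_name, fields) + "\n")
    return out_path


# Hypothetical annotations for two receipt images.
example_records = [
    ("receipt_001.jpg", {"store_name": "Example Store", "total": "12.50"}),
    ("receipt_002.jpg", {"store_name": "Another Shop", "total": "7.90"}),
]
```

<p>Calling <code>write_metadata(example_records, Path("dataset/train"))</code> would then write one JSON line per receipt into <code>dataset/train/metadata.jsonl</code>, next to where the images for that split are expected to live.</p>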
<h1>Finding an annotation tool and annotating</h1>
<p>To create your own dataset, you need an annotation tool. Luckily, plenty of tools are available online. For this tutorial, I will use the Sparrow annotation tool from <a href="https://github.com/EivindKjosbakken/sparrow" rel="noopener ugc nofollow" target="_blank">this GitHub repository</a>. Note that it is a fork of <a href="https://github.com/katanaml/sparrow" rel="noopener ugc nofollow" target="_blank">another GitHub repository</a>, with a few changes for my specific needs, which I describe later in the article. The repository explains how to annotate, but for simplicity I also list the required steps below.</p>
<p><a href="https://python.plainenglish.io/empower-your-donut-model-for-receipts-with-self-annotated-data-51fc882b7229">Click Here</a></p>