Empower Your Donut Model for Receipts with Self-Annotated Data

<p>In this article, I will show you how to fine-tune&nbsp;<a href="https://github.com/clovaai/donut" rel="noopener ugc nofollow" target="_blank">the Donut model</a>&nbsp;with your own receipt data. Further fine-tuning the Donut model for your specific needs can substantially improve its performance on that particular task. I will start from a Donut model already fine-tuned on the&nbsp;<a href="https://github.com/clovaai/cord" rel="noopener ugc nofollow" target="_blank">CORD dataset</a>, annotate some receipts, and then use those annotations to fine-tune the model further.</p> <p><img alt="" src="https://miro.medium.com/v2/resize:fit:611/1*icItYsII4jhh2QMZLMC8iw.jpeg" style="height:495px; width:611px" /></p> <p>Create your own dataset and fine-tune your Donut model on custom data with this tutorial. Photo by&nbsp;<a href="https://unsplash.com/@izzyfisch_?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText" rel="noopener ugc nofollow" target="_blank">Isabella Fischer</a>&nbsp;on&nbsp;<a href="https://unsplash.com/photos/KX_HvSO-JlE?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText" rel="noopener ugc nofollow" target="_blank">Unsplash</a></p> <h1>Story overview</h1> <ul> <li>Finding an annotation tool and annotating</li> <li>Converting data to the correct format</li> <li>Training with your annotated data</li> </ul> <h1>Finding an annotation tool and annotating</h1> <p>To create your own dataset, you need an annotation tool. Luckily, plenty of tools are available online. For this tutorial, I will be using the Sparrow annotation tool from&nbsp;<a href="https://github.com/EivindKjosbakken/sparrow" rel="noopener ugc nofollow" target="_blank">this GitHub repository</a>.
Note that this is a fork of&nbsp;<a href="https://github.com/katanaml/sparrow" rel="noopener ugc nofollow" target="_blank">another GitHub repository</a>, with a few changes for my specific needs, which are described later in the article. The repository explains how to annotate, but for simplicity I will also list the steps below.</p> <p><a href="https://python.plainenglish.io/empower-your-donut-model-for-receipts-with-self-annotated-data-51fc882b7229">Click Here</a></p>