LangChain 101: Extract structured data (JSON)

<p>Following Medium&rsquo;s new policies, I am starting a series of short articles that cover only the practical aspects of various LLM-related software.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/0*IOuZhyWvp0nzVHb5" style="height:467px; width:700px" /></p>
<p>Photo by&nbsp;<a href="https://unsplash.com/@margabagus?utm_source=medium&amp;utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Marga Santoso</a>&nbsp;on&nbsp;<a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral" rel="noopener ugc nofollow" target="_blank">Unsplash</a></p>
<h1>The Tutorial</h1>
<p>In this tutorial, we will learn how to extract structured data from&nbsp;<a href="https://arxiv.org/abs/2308.03279" rel="noopener ugc nofollow" target="_blank">free text</a>. Let&#39;s get some data.</p>
<pre>
# Get some text: the abstract of https://arxiv.org/abs/2308.03279
inp = &quot;&quot;&quot;Large language models (LLMs) have demonstrated remarkable \
generalizability, such as understanding arbitrary entities and relations. \
Instruction tuning has proven effective for distilling LLMs \
into more cost-efficient models such as Alpaca and Vicuna. \
Yet such student models still trail the original LLMs by \
large margins in downstream applications. In this paper, \
we explore targeted distillation with mission-focused instruction \
tuning to train student models that can excel in a broad application \
class such as open information extraction. Using named entity \
recognition (NER) for case study, we show how ChatGPT can be distilled \
into much smaller UniversalNER models for open NER. For evaluation, \
we assemble the largest NER benchmark to date, comprising 43 datasets \
across 9 diverse domains such as biomedicine, programming, social media, \
law, finance. Without using any direct supervision, UniversalNER \
attains remarkable NER accuracy across tens of thousands of entity \
types, outperforming general instruction-tuned models such as Alpaca \
and Vicuna by over 30 absolute F1 points in average. With a tiny \</pre>
<p><a href="https://pub.towardsai.net/langchain-101-extract-structured-data-json-f68f5d78160e"><strong>Visit Now</strong></a></p>
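<p>To make the goal concrete, here is a minimal sketch of what &ldquo;structured data (JSON)&rdquo; could look like for one sentence of the abstract. It uses only the Python standard library and does not call LangChain or any model. The schema, the field names (<code>model_name</code>, <code>num_datasets</code>, <code>num_domains</code>), the helpers <code>build_prompt</code> and <code>parse_response</code>, and the hand-written reply are all assumptions made for illustration, not the method used in the full article.</p>

```python
import json

# Hypothetical target schema (an assumption, not taken from the article):
# the fields we would like a model to pull out of the text.
SCHEMA = {
    "properties": {
        "model_name": {"type": "string"},
        "num_datasets": {"type": "integer"},
        "num_domains": {"type": "integer"},
    },
    "required": ["model_name"],
}

# Maps the schema's type names to the Python types json.loads produces.
PY_TYPES = {"string": str, "integer": int}


def build_prompt(text: str, schema: dict) -> str:
    """Build a plain-text prompt asking for one JSON object with the schema's fields."""
    fields = ", ".join(
        f"{name} ({spec['type']})" for name, spec in schema["properties"].items()
    )
    return (
        "Extract the following fields from the text and reply with a single JSON object.\n"
        f"Fields: {fields}\n"
        f"Text: {text}"
    )


def parse_response(raw: str, schema: dict) -> dict:
    """Parse a JSON reply and check it against the schema; raise ValueError if it does not fit."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema["required"]:
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, value in data.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected field: {key}")
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {key} should be of type {spec['type']}")
    return data


# One sentence from the abstract quoted above.
text = (
    "For evaluation, we assemble the largest NER benchmark to date, "
    "comprising 43 datasets across 9 diverse domains."
)
prompt = build_prompt(text, SCHEMA)

# Hand-written stand-in for a model reply (NOT real model output).
fake_reply = '{"model_name": "UniversalNER", "num_datasets": 43, "num_domains": 9}'
record = parse_response(fake_reply, SCHEMA)
print(record)
```

<p>In a real pipeline, <code>prompt</code> would be sent to an LLM and its reply passed to <code>parse_response</code>. Validating the reply before using it is one way to catch missing fields or wrong types early.</p>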
Tags: Data Langchain