Build Your Own SQL Analyst Bot

<p>One of the challenges of using LLMs (Large Language Models) in a business context is getting the model to answer factually and accurately about your company&rsquo;s data. One possible solution is Retrieval Augmented Generation (RAG), which uses a vector database to populate the prompt context (see my post:&nbsp;<a href="https://medium.com/google-cloud/q-a-with-your-docs-a-gentle-introduction-to-matching-engine-palm-bbbb6b0cff7b" rel="noopener">Q&amp;A With Your Docs: A Gentle Introduction to Matching Engine + PaLM</a>). This works well for semi-structured data like text files and PDFs. But what if you want to retrieve data from a structured data source? What if the LLM could base its answer on the results of an analytic database query? That&rsquo;s what we are going to explore in this post.</p>
<p>Using Google&rsquo;s new Codey APIs,&nbsp;<a href="https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio" rel="noopener ugc nofollow" target="_blank">announced at Google I/O</a>&nbsp;earlier this year, we&rsquo;ll build a system that:</p>
<ol>
<li>Converts the user&rsquo;s natural language question to a SQL statement</li>
<li>Runs that SQL statement against an analytic database</li>
<li>Uses the query result to answer the user&rsquo;s original question</li>
</ol>
<p>We&rsquo;ll also discuss prompt tuning and some of the shortcomings and limitations of a system like this.</p>
<p>In this how-to, I will be querying the&nbsp;<a href="https://console.cloud.google.com/bigquery;cameo=analyticshub;pageName=listing-detail;pageResource=1057666841514.us.google_cloud_public_datasets_17e74966199.3b57317b85064f7bb199c8fb0d0e05fe" rel="noopener ugc nofollow" target="_blank">NYC Citibike public dataset</a>.</p>
<p>While I will explain each code module in order, the full code is provided at the end of this tutorial.</p>
<p><a href="https://medium.com/google-cloud/build-your-own-sql-analyst-bot-88e06c1b80e8">Click Here</a></p>
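<p>To make the three-step flow listed above concrete, here is a minimal sketch of its control flow. It is not the code from the full post: the two model calls are replaced by hypothetical stand-in functions (<code>fake_text_to_sql</code> and <code>fake_summarize</code>), and an in-memory SQLite table with made-up rows stands in for the analytic database. In a real system, the stand-ins would presumably be replaced by calls to a code-generation model and a text model, and the SQLite connection by a data warehouse client.</p>

```python
import sqlite3
from typing import Callable, List, Tuple


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    text_to_sql: Callable[[str], str],
    summarize: Callable[[str, str, List[Tuple]], str],
) -> str:
    """Run the question -> SQL -> result -> answer pipeline."""
    # Step 1: convert the natural language question to a SQL statement.
    sql = text_to_sql(question)
    # Step 2: run that SQL statement against the database.
    rows = conn.execute(sql).fetchall()
    # Step 3: use the query result to answer the original question.
    return summarize(question, sql, rows)


# Assumption: a tiny in-memory table with made-up rows, standing in for
# the analytic database. The schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_station TEXT, duration_sec INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("A", 300), ("A", 600), ("B", 900)],
)


def fake_text_to_sql(question: str) -> str:
    # Hypothetical stand-in for the model call that generates SQL.
    return "SELECT COUNT(*) FROM trips"


def fake_summarize(question: str, sql: str, rows: List[Tuple]) -> str:
    # Hypothetical stand-in for the model call that phrases the answer.
    return f"There are {rows[0][0]} trips."


print(answer_question("How many trips are there?", conn,
                      fake_text_to_sql, fake_summarize))
# prints "There are 3 trips."
```

<p>Passing the two model calls in as plain functions keeps the pipeline testable without network access, and it makes the trust boundary visible: whatever <code>text_to_sql</code> returns is executed as-is, so a real deployment would likely want read-only credentials and some validation of the generated SQL.</p>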
Tags: Analyst SQL