Creating a (mostly) Autonomous HR Assistant with ChatGPT and LangChain’s Agents and Tools

OpenAI recently released a paper comparing two training methods aimed at improving the reliability of Large Language Models (LLMs): training by ‘process supervision’ and training by ‘outcome supervision’ (announced at https://twitter.com/OpenAI/status/1663957407184347136).

Essentially, one model is rewarded for a correct ‘thought process’, i.e. the intermediate reasoning steps it generates to arrive at an answer; the other model is rewarded for a correct ‘outcome’, i.e. the final result.

The two models were tested on a math-problem dataset. The findings show that the model trained with process supervision, the one rewarded for a correct ‘thought process’, significantly outperforms the model rewarded for a correct ‘outcome’; the paper includes a graph of their relative performance.

Another paper, from Google, demonstrates that an LLM performs better at reasoning and decision-making tasks when used with chain-of-thought prompting. That is, it works more effectively if it explains its thought process ‘out loud’ on the way to an answer, as opposed to outputting the answer immediately.

Together, the two papers show that by utilizing chain-of-thought processes during decision-making, models not only enhance their problem-solving abilities but also reduce hallucinations, i.e. the generation of incorrect or nonsensical information.

Website: https://pub.towardsai.net/creating-a-mostly-autonomous-hr-assistant-with-chatgpt-and-langchains-agents-and-tools-1cdda0aa70ef
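This chain-of-thought idea is what the linked article leans on: LangChain's ReAct-style agents prompt the model to write out a Thought / Action / Observation trace, consulting tools along the way, before committing to a final answer. Below is a minimal sketch of such an agent for an HR-style question; it assumes the 2023-era LangChain agent API (initialize_agent, Tool, AgentType), an OPENAI_API_KEY in the environment, and a hypothetical vacation-day lookup function standing in for a real HR data source. It illustrates the pattern, not the exact setup from the article.

```python
# Minimal chain-of-thought (ReAct) agent sketch.
# Assumptions: 2023-era LangChain API, OPENAI_API_KEY set in the environment,
# and a hypothetical in-memory HR record in place of a real database.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI


def lookup_vacation_days(employee_name: str) -> str:
    """Hypothetical HR lookup; swap in a real database or API call."""
    records = {"Alice": 12, "Bob": 4}
    days = records.get(employee_name.strip(), "unknown")
    return f"{employee_name.strip()} has {days} vacation days remaining."


tools = [
    Tool(
        name="VacationDayLookup",
        func=lookup_vacation_days,
        description="Returns the remaining vacation days for a given employee name.",
    )
]

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# ZERO_SHOT_REACT_DESCRIPTION makes the model reason step by step
# (Thought -> Action -> Observation) before producing its final answer.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # prints the intermediate chain-of-thought trace
)

print(agent.run("How many vacation days does Alice have left?"))
```

With verbose=True, the printed trace is exactly the ‘thinking out loud’ behaviour the papers reward: the agent states why it needs the lookup tool, calls it, and only then answers.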
Tags: LLM OpenAI