A Very Gentle Introduction to Large Language Models without the Hype

This article is designed to give people with no computer science background some insight into how ChatGPT and similar AI systems work (GPT-3, GPT-4, Bing Chat, Bard, etc). ChatGPT is a chatbot — a type of conversational AI built — but on top of a Large Language Model. Those are definitely words and we will break all of that down. In the process, we will discuss the core concepts behind them. This article does not require any technical or mathematical background. We will make heavy use of metaphors to illustrate the concepts. We will talk about why the core concepts work the way they work and what we can expect or not expect Large Language Models like ChatGPT to do. Here is what we are going to do. We are going to gently walk through some of the terminology associated with Large Language Models and ChatGPT without any jargon. If I have to use jargon, I will break it down without jargon. We will start very basic, with “what is Artificial Intelligence” and work our way up. I will use some recurring metaphors as much as possible. I will talk about the implications of the technologies in terms of what we should expect them to do or should not expect them to do. Let’s go! <h1>1. What is Artificial Intelligence?</h1> But first, let’s start with some basic terminology that you are probably hearing a lot. What is artificial intelligence? <ul> <li>Artificial Intelligence: An entity that performs behaviors that a person might reasonably call intelligent if a human were to do something similar.</li> </ul> It is a bit problematic to define artificial intelligence by using the word “intelligent”, but no one can agree on a good definition of “intelligent”. However, I think this still works reasonably well. It basically says that if we look at something artificial and it does things that are engaging and useful and seem to be somewhat non-trivial, then we might call it intelligent. For example we often ascribe the term “AI” to computer-controlled characters in computer games. Most of these bots are simple pieces of if-then-else code (e.g., “if the player is within range then shoot else move to the nearest boulder for cover”). But if we are doing the job of keeping us engaged and entertained, and not doing any obviously stupid things, then we might think they are more sophisticated than the are. <a href="https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e">Website</a>