AI Glossary Decoded: Key Terms Like AGI, LLMs, RAG, and RLHF Explained in Plain English
By admin | May 09, 2026 | 10 min read
Artificial intelligence is reshaping the world, and along the way, it's creating an entirely new vocabulary to describe how it's doing so. Spend just a few minutes reading about AI, and you'll encounter terms like LLMs, RAG, RLHF, and a host of other acronyms that can leave even seasoned tech professionals feeling a bit lost. This glossary is our effort to demystify that jargon. We refresh it often as the field progresses, so treat it as a living document—much like the AI systems it covers.
**AGI** Artificial general intelligence, or AGI, is a somewhat fuzzy concept. Broadly speaking, it refers to AI that matches or exceeds human capabilities across a wide range of tasks. OpenAI CEO Sam Altman once described AGI as the "equivalent of a median human that you could hire as a co-worker." Meanwhile, OpenAI's charter defines it as "highly autonomous systems that outperform humans at most economically valuable work." Google DeepMind takes a slightly different view, seeing AGI as "AI that's at least as capable as humans at most cognitive tasks." Confused? Don't worry—experts at the forefront of AI research are too.
**AI agent** An AI agent is a tool that uses AI technologies to perform a series of tasks on your behalf—going beyond what a basic AI chatbot can do. This might include filing expenses, booking tickets or a restaurant table, or even writing and maintaining code. However, as we've explained before, this emerging space has many moving parts, so "AI agent" can mean different things to different people. The infrastructure needed to deliver on its promised capabilities is still being built out. But the core idea involves an autonomous system that may draw on multiple AI systems to carry out multi-step tasks.
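To make the multi-step idea concrete, here's a deliberately tiny Python sketch of an agent-style loop. The tools and the hard-coded plan are invented for illustration; a real agent would generate the plan with a model and call live services.

```python
# A toy agent loop: execute a plan step by step using "tools".
# Both tools below are hypothetical stubs, not real services.

def search_flights(destination):
    return f"found a flight to {destination}"

def book_table(restaurant):
    return f"reserved a table at {restaurant}"

TOOLS = {"search_flights": search_flights, "book_table": book_table}

# In a real system, this plan would come from a model, not be hard-coded.
plan = [("search_flights", "Lisbon"), ("book_table", "Cafe Central")]

for tool_name, argument in plan:
    result = TOOLS[tool_name](argument)
    print(result)  # the agent observes each result before taking the next step
```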
**API endpoints** Think of API endpoints as "buttons" on the back of a piece of software that other programs can press to make it do things. Developers use these interfaces to build integrations—for example, allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see or interact with them. As AI agents grow more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful—and sometimes unexpected—possibilities for automation.
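Here's a minimal Python sketch of a program "pressing" one of those buttons. The URL, token, and response fields are hypothetical, invented purely for illustration.

```python
# Pressing a hypothetical API "button": ask a service for data.
import requests

response = requests.get(
    "https://api.example.com/v1/devices/42/status",  # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},  # placeholder credential
    timeout=10,
)
response.raise_for_status()  # fail loudly if the service reported an error

# The endpoint answers with structured data another program can act on.
print(response.json())  # e.g. {"power": "on", "temperature": 21.5}
```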
**Chain of thought** Given a simple question, a human brain can answer without thinking too much—like "which animal is taller, a giraffe or a cat?" But in many cases, you need a pen and paper to get the right answer because there are intermediate steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to find the answer (20 chickens and 20 cows). In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the final result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in logic or coding contexts. Reasoning models are developed from traditional large language models and optimized for chain-of-thought reasoning through reinforcement learning. (See: Large language model [LLM])
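Here's the farmer puzzle worked through in a few lines of Python, mirroring the intermediate steps a written-out chain of thought would take:

```python
# Solve the heads-and-legs puzzle via explicit intermediate steps.
heads = 40   # chickens + cows
legs = 120   # 2 per chicken + 4 per cow

# Step 1: if every animal were a chicken, there would be 2 * heads legs.
# Step 2: each cow adds 2 extra legs, so the surplus reveals the cow count.
cows = (legs - 2 * heads) // 2
chickens = heads - cows

print(chickens, cows)  # 20 20
```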
**Coding agents** This is a more specific concept than an "AI agent," which refers to a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Rather than simply suggesting code for a human to review and paste in, a coding agent can write, test, and debug code autonomously, handling the kind of iterative, trial-and-error work that typically consumes a developer's day. These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus—though, as with any intern, a human still needs to review the work.
**Compute** Although it's a somewhat versatile term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often shorthand for the kinds of hardware that provide that computational power—things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.
**Deep learning** A subset of machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to draw more complex correlations from data than simpler machine learning systems, such as linear models or decision trees, can manage. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain. Deep learning AI models can identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms—so development costs tend to be higher. (See: Neural network)
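As a rough illustration of that multi-layered structure, here's a minimal NumPy sketch of a two-layer network's forward pass. The sizes and random weights are arbitrary stand-ins, not a trained model.

```python
# A tiny two-layer neural network forward pass in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 4))      # one input example with 4 features
W1 = rng.normal(size=(4, 8))     # layer 1 weights: 4 features -> 8 hidden units
W2 = rng.normal(size=(8, 2))     # layer 2 weights: 8 hidden units -> 2 outputs

hidden = np.maximum(0, x @ W1)   # ReLU non-linearity between the layers
output = hidden @ W2             # raw scores; training would adjust W1 and W2

print(output.shape)  # (1, 2)
```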
**Diffusion** Diffusion is the technology at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly "destroy" the structure of data—for example, photos, songs, and so on—by adding noise until there's nothing left. In physics, diffusion is spontaneous and irreversible—sugar diffused in coffee can't be restored to cube form. But diffusion systems in AI aim to learn a sort of "reverse diffusion" process to restore the destroyed data, gaining the ability to recover the data from noise.
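Here's a toy NumPy sketch of the "destroying" half of that process: repeatedly mixing a signal with noise until little of the original structure survives. The noise schedule is invented for illustration, and the learned reverse process is omitted.

```python
# Forward diffusion in miniature: drown a clean signal in Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
data = np.sin(np.linspace(0, 2 * np.pi, 100))  # stand-in for a photo or song

beta = 0.05          # fraction of noise mixed in at each step
x = data.copy()
for _ in range(200):
    noise = rng.normal(size=x.shape)
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise  # keeps variance stable

# After enough steps, x carries almost no trace of the original signal.
print(round(float(np.corrcoef(data, x)[0, 1]), 3))  # correlation close to 0
```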
**Distillation** Distillation is a technique used to extract knowledge from a large AI model via a "teacher-student" setup. Developers send requests to the teacher model and record its outputs, sometimes checking answers against a dataset to gauge their accuracy. Those outputs are then used to train the student model to approximate the teacher's behavior. Distillation can be used to create a smaller, more efficient model based on a larger one with minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, it may also have been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI APIs and chat assistants.
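A toy sketch of the teacher-student idea, assuming the teacher's output has already been recorded: the "teacher" here is just a fixed vector of logits, and the student is nudged to match its distribution.

```python
# Distillation in miniature: train a "student" to match a "teacher's" output.
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([4.0, 1.0, 0.5])  # recorded from the teacher model
student_logits = np.array([1.0, 1.0, 1.0])  # student starts with no preference

lr = 0.5
for _ in range(200):
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    # Gradient of cross-entropy against the teacher's soft targets:
    student_logits -= lr * (p_student - p_teacher)

print(np.round(softmax(student_logits), 3))  # now approximates the teacher
```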
**Fine-tuning** This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training—typically by feeding in new, specialized (i.e., task-oriented) data. Many AI startups take large language models as a starting point for a commercial product, boosting utility for a target sector or task by supplementing the original training cycles with fine-tuning on their own domain-specific knowledge and expertise. (See: Large language model [LLM])
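Here's the idea in miniature, with a simple linear model standing in for a pretrained network: the "pretrained" weights get a short extra training run on a small, domain-specific dataset. All numbers are invented for illustration.

```python
# Fine-tuning in miniature: continue training from existing weights.
import numpy as np

rng = np.random.default_rng(0)

w = np.array([2.0, -1.0])  # stand-in for weights learned during pretraining

# A small specialized dataset whose target relationship differs slightly.
X = rng.normal(size=(32, 2))
y = X @ np.array([2.5, -1.5]) + 0.1 * rng.normal(size=32)

lr = 0.05
for _ in range(100):                        # a short fine-tuning run
    grad = 2 * X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
    w -= lr * grad

print(np.round(w, 2))  # nudged toward the domain-specific relationship
```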
**GAN** A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data—including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate. The two models are essentially programmed to try to outdo each other: the generator tries to get its output past the discriminator, while the discriminator works to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention, though GANs work best for narrower applications (such as producing realistic photos or videos) rather than general-purpose AI.
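Here's a deliberately tiny one-dimensional sketch of that contest, with a one-parameter generator and a logistic-regression discriminator standing in for real neural networks. Everything is simplified for illustration.

```python
# A toy GAN: real data cluster near 3.0; the generator learns a shift
# that makes its fakes look real to the discriminator.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

theta = 0.0      # generator parameter: fake = noise + theta
a, b = 1.0, 0.0  # discriminator parameters: D(x) = sigmoid(a * x + b)
lr = 0.05

for _ in range(2000):
    real = 3.0 + 0.5 * rng.normal()
    fake = rng.normal() + theta
    d_real, d_fake = sigmoid(a * real + b), sigmoid(a * fake + b)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    a += lr * ((1 - d_real) * real - d_fake * fake)
    b += lr * ((1 - d_real) - d_fake)

    # Generator step: ascend log D(fake), i.e., make the fake look real.
    d_fake = sigmoid(a * fake + b)
    theta += lr * (1 - d_fake) * a

print(round(theta, 2))  # drifts toward 3.0, the center of the real data
```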
**Hallucination** Hallucination is the AI industry's preferred term for AI models making stuff up—literally generating information that is incorrect. Obviously, it's a huge problem for AI quality. Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks—with potentially dangerous consequences (think of a health query that returns harmful medical advice). The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. Hallucinations are contributing to a push toward increasingly specialized and/or vertical AI models—i.e., domain-specific AIs with narrower expertise—as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.
**Inference** Inference is the process of running an AI model. It's setting a trained model loose to make predictions or draw conclusions from new data. To be clear, inference can't happen without training; a model must learn patterns in a set of data before it can effectively extrapolate beyond that training data. Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators, but not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips. (See: Training)
**Large language model (LLM)** Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google's Gemini, Meta's Llama, Microsoft Copilot, or Mistral's Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters. LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases, creating a representation of language—a sort of multidimensional map of words. These models are built by encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. (See: Neural network)
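As a very loose illustration of "generate the most likely pattern," here's a toy Python sketch in which a hand-written bigram table stands in for billions of learned weights. The words and probabilities are invented for illustration.

```python
# A toy "language model": pick each next word from learned-looking statistics.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained model: next-word probabilities given the last word.
bigram = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

word, text = "the", ["the"]
while word in bigram:
    choices = bigram[word]
    # Sample the continuation in proportion to its probability.
    word = rng.choice(list(choices), p=list(choices.values()))
    text.append(word)

print(" ".join(text))  # e.g. "the cat sat down"
```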
**Memory cache** Memory cache refers to an important process that boosts inference (the process by which AI generates a response to a user's query). In essence, caching is an optimization technique designed to make inference more efficient. AI is driven by high-octane mathematical calculations, and every one of those calculations costs compute and power. Caching cuts down on the number of calculations a model has to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the better-known is KV (or key-value) caching. KV caching works in transformer-based models and increases efficiency, driving faster results by reducing the amount of time (and algorithmic labor) it takes to generate answers to user questions. (See: Inference)
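Here's a toy NumPy sketch of the key-value idea: each generation step computes keys and values only for its newest token and appends them to a cache, instead of recomputing them for the whole sequence. Shapes and weights are arbitrary, for illustration.

```python
# KV caching in miniature: reuse past keys/values instead of recomputing them.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # embedding size
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))

k_cache, v_cache = [], []

def attend(query, new_token):
    # Compute K and V for the NEW token only, then reuse everything cached.
    k_cache.append(new_token @ Wk)
    v_cache.append(new_token @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)

    scores = K @ query / np.sqrt(d)            # attention scores over all tokens
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax
    return weights @ V                         # attention output for this step

for _ in range(5):                             # one generated token per step
    out = attend(rng.normal(size=d), rng.normal(size=d))

print(len(k_cache))  # 5 cached keys — none were ever recomputed
```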
**Neural network** A neural network refers to the multi-layered algorithmic structure that underpins deep learning—and, more broadly, the whole boom in generative AI tools following the emergence of large language models. Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphics processing units (GPUs)—via the video game industry—that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs—enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery. (See: Large language model [LLM])
**Open source** Open source refers to software—or, increasingly, AI models—where the underlying code is made publicly available for anyone to use, inspect, or modify. In the AI world, Meta's Llama family of models is a prominent example; Linux is the famous historical parallel in operating systems. Open source approaches allow researchers, developers, and companies around the world to build on top of one another's work, accelerating progress and enabling independent safety audits that closed systems cannot easily provide. Closed source means the code is private—you can use the product but not see how it works, as is the case with OpenAI's GPT models—a distinction that has become one of the defining debates in the AI industry.
**Parallelization** Parallelization means doing many things at the same time instead of one after another—like having 10 employees working on different parts of a project simultaneously instead of one employee doing everything sequentially. In AI, parallelization is fundamental to both training and inference: modern GPUs are specifically designed to perform thousands of calculations in parallel, which is a big reason why they became the hardware backbone of the industry. As AI systems grow more complex and models grow larger, the ability to parallelize work across many chips and many machines has become one of the most important factors in determining how quickly and cost-effectively models can be built and deployed. Research into better parallelization strategies is now a field of study in its own right.
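A minimal Python sketch of the contrast, using the standard library's process pool. The job itself (summing squares) is just a stand-in for any independent chunk of computation.

```python
# The same jobs done one after another, then spread across worker processes.
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(n):
    return sum(i * i for i in range(n))

jobs = [2_000_000] * 8

if __name__ == "__main__":
    # Sequential: one "employee" works through every job in turn.
    sequential = [sum_of_squares(n) for n in jobs]

    # Parallel: the jobs are farmed out to several processes at once.
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(sum_of_squares, jobs))

    print(sequential == parallel)  # True — same answers, computed concurrently
```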
**RAMageddon** RAMageddon is the fun new term for a not-so-fun trend sweeping the tech industry: an ever-worsening shortage of random access memory, or RAM chips, which power pretty much all the tech products we use in our daily lives. As the AI industry has blossomed, the biggest tech companies and AI labs—all vying to have the most powerful and efficient AI—are buying so much RAM to power their data centers that there's not much left for the rest of us. And that supply bottleneck means that what's left is getting more and more expensive. This affects industries like gaming (where major companies have had to raise prices on consoles because it's harder to find memory chips for their devices), consumer electronics (where the memory shortage could cause the biggest dip in smartphone shipments in more than a decade), and general enterprise computing (because those companies can't get enough RAM for their own data centers). The surge in prices is only expected to stop once the shortage ends, but unfortunately there's not much sign of that happening anytime soon.
**Reinforcement learning** Reinforcement learning is a way of training AI where a system learns by trying things and receiving rewards for correct answers—like training your beloved pet with treats, except the "pet" in this scenario is a neural network and the "treat" is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning lets a system discover effective behavior through trial and error. The same basic idea underpins RLHF (reinforcement learning from human feedback), a widely used technique for aligning large language models with human preferences. (See: Large language model [LLM])
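Here's a toy Python sketch of that treat-driven loop: a two-armed bandit in which the agent learns, purely from rewards, which lever pays off more often. The payout rates are invented for illustration.

```python
# Reinforcement learning in miniature: learn lever values from rewards alone.
import random

random.seed(0)
reward_prob = [0.3, 0.7]   # hidden payout rate of each lever
value = [0.0, 0.0]         # the agent's running reward estimate per lever
counts = [0, 0]

for _ in range(5000):
    # Mostly exploit the best-looking lever, occasionally explore.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: value[a])

    reward = 1.0 if random.random() < reward_prob[action] else 0.0  # the "treat"
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]      # update

print([round(v, 2) for v in value])  # approaches [0.3, 0.7]
```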