Large Language Model (LLM) is the magic phrase startups append to their pitch to remain relevant in 2023. LLMs are taking over the industry and, potentially, our lives. For example, ChatGPT, the fastest-growing app in history, uses an LLM internally. Here we look at what a Language Model (LM) is, how an LLM differs from an LM, and why we should care.
What is a Language Model?
An LM is a statistical (probabilistic) model that learns the underlying patterns in a language (human, computer, etc.). For example, if you ask a trained LM for the next word after "It's late, I want to ...", it should probably guess "sleep", because it has seen this pattern before in its training corpus. Similarly, if we ask an LM for the missing word in "Roses are ..., violets are blue", it should say "red".
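The next-word guessing described above can be sketched with a toy bigram model. The corpus and word choices below are made up for illustration; a real LM uses vastly more data and a far richer model than word-pair counts:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, echoing the examples in the text.
corpus = [
    "it is late i want to sleep",
    "it is late i want to rest",
    "it is late i want to sleep",
    "roses are red violets are blue",
]

# Count how often each word follows each preceding word.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return counts[word].most_common(1)[0][0]

print(predict_next("to"))  # "sleep" appears twice after "to", "rest" only once
```

The model simply picks the statistically most likely continuation, which is the same principle, enormously scaled up, behind phone autocomplete.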
Numerous applications already use LMs. For example, the autocomplete on your phone's keyboard is powered by an LM. Likewise, Grammarly uses LMs and knowledge bases to fix our writing.
What is a Large Language Model?
In simple terms, it is what it says: an LM that is large. "Large" refers both to the number of parameters in the model (billions) and to the amount of training data (billions to trillions of words). Why do the size of the model and the data matter so much? Because once models become large enough, they start showing capabilities that do not exist in smaller ones: LLMs can understand complex grammar, generate coherent sentences and paragraphs, and follow logic.
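To get a feel for what "billions of parameters" means, the sketch below estimates the size of a decoder-only transformer using the common rough approximation of about 12·d² parameters per layer. The layer count, width, and vocabulary size are GPT-2 small's published figures; the formula ignores biases, layer norms, and positional embeddings, so it is an estimate, not an exact count:

```python
# Rough parameter-count sketch for a GPT-2-small-style transformer.
# The ~12*d^2 per-layer figure approximates attention (~4*d^2) plus MLP (~8*d^2).

def approx_params(n_layers, d_model, vocab_size):
    per_layer = 12 * d_model ** 2      # attention + feed-forward weights per layer
    embeddings = vocab_size * d_model  # token embedding matrix (tied output head)
    return n_layers * per_layer + embeddings

total = approx_params(n_layers=12, d_model=768, vocab_size=50257)
print(f"{total / 1e6:.0f}M parameters")  # roughly 124M, in line with GPT-2 small
```

Scaling the same arithmetic to dozens of layers and widths in the tens of thousands is how models reach tens or hundreds of billions of parameters.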
How to Train a Large Language Model?
The basic idea is to take a large corpus of (scraped or pre-existing) text and train the model to guess missing words. If the missing word is the last one, this is generative training, as in Generative Pretrained Transformers (GPTs). If the missing word is in the middle, this is masked training, as in Bidirectional Encoder Representations from Transformers (BERT). Both strategies are self-supervised. This game of hide-and-seek teaches (incentivizes) the model to learn complex patterns (rules) within the language.
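The two objectives can be sketched by how they slice a sentence into (input, target) training pairs. This is only a schematic of the data preparation, not of the models themselves; the toy sentence reuses the earlier example:

```python
import random

sentence = "roses are red violets are blue".split()

# Generative (GPT-style): predict the NEXT token from the preceding prefix.
generative_examples = [
    (sentence[:i], sentence[i]) for i in range(1, len(sentence))
]
print(generative_examples[1])  # (['roses', 'are'], 'red')

# Masked (BERT-style): hide a token anywhere and predict it from BOTH sides.
random.seed(0)
i = random.randrange(len(sentence))
masked_input = sentence[:i] + ["[MASK]"] + sentence[i + 1:]
masked_example = (masked_input, sentence[i])
print(masked_example)
```

Note the difference in what the model sees: the generative objective only conditions on the left context, while the masked objective conditions on context from both directions, which is why BERT is called bidirectional.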
Why Care about Large Language Models?
An LLM is an effective compression algorithm. We typically feed trillions of words into an LLM, and it memorizes them within billions of parameters. It then lets us query this memory via natural-language prompts. Since prompts can change drastically at will, this model is the closest we have ever come to building general-purpose AI. Hence the name