"Large Language Model" (LLM) is the magic phrase startups append to their pitch to stay relevant in 2023. LLMs are taking over the industry and potentially our lives. For example, ChatGPT, the fastest-growing app in history, uses an LLM internally. In this post, we focus on what a Language Model (LM) is, how an LLM differs from an LM, and why we should care.

What is a Language Model?

An LM is a statistical (probabilistic) model that learns the underlying patterns in a language (human, computer, etc.). For example, if you ask a trained LM what the next word after "It's late, I want to ..." is, it should probably guess "sleep" because it has seen this pattern before in its training corpus. Similarly, if we ask an LM for the missing word in "Roses are ..., violets are blue", it should say "red".
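To make this concrete, here is a minimal sketch of the "missing word" game using the Hugging Face transformers library (the library choice and model name are illustrative assumptions, not something this post depends on):

```python
from transformers import pipeline

# Load a pretrained masked LM; [MASK] marks the word we want it to guess.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("Roses are [MASK], violets are blue."):
    print(f"{prediction['token_str']!r} with probability {prediction['score']:.2f}")
# A well-trained model should rank "red" near the top of its guesses.
```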

Numerous applications already use LMs. For example, when you type on your phone's keyboard, the autocomplete is powered by an LM. Likewise, Grammarly uses LMs and knowledge bases to fix our writing.

What is a Large Language Model?

In simple terms, it is what it says: an LM that is large. Large here refers to both the number of parameters in the model (typically billions) and the amount of training data (billions to trillions of words).

Why do the model and data sizes matter so much here? Because once we made them large enough, models started showing properties that don't exist in smaller ones. LLMs can understand complex grammar, generate coherent sentences and paragraphs, and follow logic.

How to Train a Large Language Model?

The basic idea is to have a large corpus of (scraped or pre-existing) text. Then we train the model to guess missing words. If the missing word is the last one, it is Generative Training, as in Generative Pre-trained Transformers (GPTs). If the missing word is in the middle, it is Masked Training, as done in Bidirectional Encoder Representations from Transformers (BERT). Both strategies are unsupervised (more precisely, self-supervised), since the text itself supplies the labels. This game of hide and seek teaches (incentivizes) the model to learn complex patterns (rules) within the language.
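To sketch the generative variant of this objective, here is a toy next-token training step in PyTorch. The model, vocabulary, and sizes are illustrative assumptions: this is effectively a bigram model, whereas real LLMs use Transformers that see the whole context.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32  # toy sizes; real LLMs are vastly larger

# A minimal "language model": embed each token, then score every
# possible next token. (No context here, so it is just a bigram model.)
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a pretend sentence of 16 token IDs
logits = model(tokens)

# Generative training: the label at each position is simply the NEXT
# token, so the text itself provides the supervision (self-supervised).
loss = nn.functional.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients nudge the model toward better guesses
```

Masked training differs only in the labeling: instead of shifting everything by one position, we hide random tokens in the middle and ask the model to recover them.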

Why Care about Large Language Models?

An LLM is an effective compression algorithm. We usually feed trillions of words into an LLM, and it memorizes them within billions of parameters. Then, it allows us to query its memory via natural-language prompts. Since prompts can change drastically at will, such a model is the closest we have ever come to building general-purpose AI. Hence the name Foundation Model.
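As a sketch of what that querying looks like in practice, the same kind of pipeline can prompt a small generative model (GPT-2 here, chosen purely for illustration):

```python
from transformers import pipeline

# Query the model's "compressed memory" with a natural-language prompt.
generator = pipeline("text-generation", model="gpt2")
result = generator("It's late, I want to", max_new_tokens=5)
print(result[0]["generated_text"])
```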