This post needs some brushing up, considering how fast the LLM community is evolving.

Methods in LLMs

An overview of some of the methods used in LLMs, such as next-token prediction and fill-in-the-middle (FIM).

1. Next-Token Prediction

Next-token prediction is a technique where a model generates the next token (word, character, or sub-word) in a sequence, given a preceding context. This approach is fundamental to how autoregressive models like GPT work.

1.1 How it works

  • Contextual Input: The model receives a sequence of tokens (words or subwords), and its task is to predict what comes next in the sequence based on the input context.
  • Autoregressive Nature: Once the next token is predicted, it is appended to the input sequence, and the model then predicts the token that follows this extended sequence, repeating the process iteratively. This is how large language models generate coherent text one token at a time; a minimal code sketch of this loop follows the list.
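
To make the loop concrete, here is a minimal sketch of greedy autoregressive decoding with the Hugging Face transformers library; GPT-2 and the five-token budget are purely illustrative choices, not part of the original post.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Start from the prompt and repeatedly predict the single most likely next token.
input_ids = tokenizer("The cat is on the", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):                                # generate five tokens
        logits = model(input_ids).logits              # shape: (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1)     # greedy pick of the most probable token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```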

1.2 Example

Input: “The cat is on the”
Model Prediction: “mat”

The model learns from large amounts of text data to predict the likelihood of each possible next token, often selecting the most probable one based on the training corpus.
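
To show what “most probable” means here, the snippet below (reusing the GPT-2 setup from the sketch above, again only an illustrative model choice) prints the five tokens the model considers most likely after “The cat is on the”:

```python
import torch
import torch.nn.functional as F

# Reuses `model` and `tokenizer` from the previous sketch.
input_ids = tokenizer("The cat is on the", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits

probs = F.softmax(logits[0, -1, :], dim=-1)   # probability distribution over the vocabulary
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}  p={p.item():.3f}")
```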

2. Fill-in-the-Middle (FIM)

Fill-in-the-Middle (FIM) is a more recent and flexible technique that allows the model not only to generate text left to right, but also to complete missing parts within a given sequence. Instead of predicting the next token only at the end, FIM fills in a “gap” or “blank” somewhere inside the sequence.

2.1 How it works

  • Two contexts: The model is provided with two contexts — a “prefix” (the text before the blank) and a “suffix” (the text after the blank). The task is to generate the missing portion (the “middle”) that fits well between the prefix and suffix.
  • Two-sided constraint: FIM can be more challenging than next-token prediction because the generated middle must be consistent with both the preceding context (prefix) and the following context (suffix); a sketch of how FIM examples are built follows this list.
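
A common recipe (described in Bavarian et al., “Efficient Training of Language Models to Fill in the Middle”, 2022) is to cut ordinary documents into prefix/middle/suffix pieces and rearrange them around sentinel tokens, so the model can still be trained with plain left-to-right next-token prediction. The sketch below uses made-up sentinel strings; real FIM-trained models define their own dedicated special tokens.

```python
import random

# Made-up sentinel strings for illustration; real models use dedicated special tokens.
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(document: str) -> str:
    """Split a document at two random points and rearrange it into the
    prefix-suffix-middle (PSM) layout commonly used for FIM training."""
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # The model is trained to produce `middle` after seeing the prefix and suffix.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(make_fim_example("The cat is sleeping on the mat."))
```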

2.2 Example

Input: “The cat is ___ on the mat.”
Model Prediction: “sleeping” or “curled up”

In FIM, the model leverages its understanding of both sides of the gap to create a coherent and grammatically correct completion for the blank.
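
At inference time the same layout is used, but the middle is left out for the model to generate, and the pieces are then spliced back together. Below is a minimal sketch of that flow, reusing the made-up sentinel strings from above and a placeholder generate_middle function that stands in for a real FIM-capable model call:

```python
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"  # illustrative only

def generate_middle(prompt: str) -> str:
    """Placeholder for an actual FIM-capable model's decoding call (hypothetical)."""
    return "sleeping"

prefix = "The cat is "
suffix = " on the mat."

# The middle is omitted; the model is expected to continue right after MID.
prompt = f"{PRE}{prefix}{SUF}{suffix}{MID}"
middle = generate_middle(prompt)

# Splice the generated middle back between the original prefix and suffix.
print(prefix + middle + suffix)   # -> "The cat is sleeping on the mat."
```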

3. Key Differences

  • Next-Token Prediction is a left-to-right process that predicts one token at a time after the current sequence.
  • FIM allows for more flexible text completion by filling in a missing middle section, which requires understanding both the preceding and following contexts.

These two methods enable language models to generate text in diverse ways, making them versatile for tasks like text completion, story generation, and dialogue systems.