Machine Learning

Intro to Large Language Models

The following are my notes on a set of videos by Andrej Karpathy (video 1, video 2) that provide an excellent high-level overview of what LLMs are and how they're trained. They are by no means meant to substitute for the content of the videos, and I recommend giving them a watch yourself.

Markov Chains

A Markov chain is a model that describes a set of state transitions governed by a probability distribution satisfying the Markov property: the probability of the next state depends only on the current state, not on the sequence of states that came before it.
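To make this concrete, here is a minimal sketch of a first-order Markov chain over words (the toy corpus and the `generate` helper are my own illustration, not from the videos): we build a transition table from observed word pairs, then sample each next word using only the current word.

```python
import random
from collections import defaultdict

# Toy corpus (illustrative assumption, not from the videos).
corpus = "the cat sat on the mat the cat ate the rat".split()

# Transition table: current word -> list of observed next words.
# Sampling uniformly from this list reproduces the empirical
# next-word distribution.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(start, length=8):
    """Sample a word sequence; each step depends only on the current word."""
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = transitions.get(word)
        if not candidates:
            break  # dead end: no observed transition from this word
        word = random.choice(candidates)
        output.append(word)
    return " ".join(output)

print(generate("the"))
```

This is the spirit of classic n-gram language models; LLMs also predict the next token from a probability distribution, but they condition on a long context window rather than only the current state.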

Latent Space

Latent means hidden. The latent space is also known as the embedding space.

Latent Space refers to an abstract multi-dimensional space containing feature values that we cannot interpret directly, but which encodes a meaningful internal representation of externally observed events.

The motivation for learning a latent space (a set of hidden topics or internal representations) rather than working directly with the observed data (a set of events) is that large differences in the observed space can arise from small variations in the latent space (for the same underlying topic). Learning a latent space therefore helps the model make better sense of the observed data than the raw observations do, since the observed space is far too large to learn from directly.
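As a concrete illustration, here is a minimal sketch of learning a latent space with an autoencoder in PyTorch (the dimensions, random stand-in data, and training loop are toy assumptions of mine): the encoder compresses each high-dimensional observation into a low-dimensional latent code, and the decoder is trained to reconstruct the observation from that code.

```python
import torch
from torch import nn

# Toy dimensions (assumption): e.g. flattened 28x28 images -> 16-d latent codes.
observed_dim, latent_dim = 784, 16

encoder = nn.Sequential(nn.Linear(observed_dim, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, observed_dim))

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Stand-in batch of "observed events" (random data, purely illustrative).
x = torch.rand(64, observed_dim)

for step in range(100):
    z = encoder(x)        # compress observations into the latent space
    x_hat = decoder(z)    # reconstruct observations from latent codes
    loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, `encoder(x)` maps observations into the latent (embedding) space, where nearby points should correspond to observations that look very different on the surface but share the same underlying structure.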