An in-depth explanation of how LLMs work, with a minimum of jargon
(open.substack.com)
In the language of classical probability theory: the models learn the probability distribution of words in language from their training data, and then approximate this distribution using their parameters and network structure.
When given a prompt, they then calculate the conditional probability of each possible next word, given the words they have already seen, and sample from that distribution.
It is a rather simple idea; all of the complexity comes from trying to give a human meaning to the high-dimensional vector operations the model performs to calculate those conditional probabilities.
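A minimal sketch of that sampling step, for the curious. The vocabulary and logit values here are made up for illustration; in a real model, the logits come out of the network's vector operations over the whole prompt:

```python
import numpy as np

# Toy vocabulary and unnormalized scores (logits) for the next word.
# A real model produces these logits from its parameters and the prompt.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 1.0, 0.2, 1.5])

# Softmax turns logits into the conditional distribution
# P(next word | words seen so far).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample the next word from that distribution.
rng = np.random.default_rng(0)
next_word = rng.choice(vocab, p=probs)
print(next_word, dict(zip(vocab, probs.round(3))))
```

Generating text is just this step in a loop: append the sampled word to the context and compute a fresh distribution for the word after it.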
Superb summary!