== History ==

The foundations of large language models trace back to early statistical language models and [[recurrent neural networks]] (RNNs). Key milestones include:

* '''2017''': The seminal paper [https://arxiv.org/abs/1706.03762 "Attention Is All You Need"] by [[Ashish Vaswani]] and colleagues at Google introduced the '''[[transformer (machine learning model)|transformer]]''' architecture, which replaced recurrent layers with self-attention mechanisms, enabling much better parallelization and scaling.<ref name="transformer">{{cite journal |last1=Vaswani |first1=Ashish |title=Attention Is All You Need |journal=Advances in Neural Information Processing Systems |date=2017}}</ref>

* '''2018''': [[OpenAI]] released [[GPT (language model)|GPT-1]], followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.

* '''2020''': [[GPT-3]], with 175 billion parameters, showed emergent abilities such as few-shot learning and drew widespread attention to large-scale language models.

* '''2022–2023''': The release of [[ChatGPT]] in November 2022 (initially based on GPT-3.5, with GPT-4 following in March 2023) brought LLMs into mainstream use. Open-weight models such as [[Meta]]'s [[Llama]] series and those from [[Mistral AI]] broadened access.

* '''2024–2026''': Development continued toward multimodal models (text, image, and audio), longer context windows (up to millions of tokens in some models), and models trained specifically for multi-step reasoning.