{{Short description|Type of artificial neural network for natural language processing}}
{{Infobox artificial intelligence
| name = Large language model
| image = [[File:Transformer model architecture.svg|250px]]
| caption = The transformer architecture, the foundation of most modern large language models
| invented_by = [[Ashish Vaswani|Vaswani et al.]] (Google Brain, 2017)
| latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)
| latest_release_date = 2024–2026
| genre = [[Natural language processing]], [[Generative artificial intelligence]]
| license = Varies (proprietary or open-source)
}}

A '''large language model''' ('''LLM''') is a type of [[artificial neural network]] trained on vast amounts of text data to understand, generate, and manipulate human language. LLMs are a core technology behind modern [[generative artificial intelligence]] systems such as [[ChatGPT]], [[Claude (chatbot)|Claude]], [[Grok (chatbot)|Grok]], and [[Gemini (chatbot)|Gemini]].

== History ==
The foundations of large language models trace back to early statistical language models and [[recurrent neural networks]] (RNNs). Key milestones include:
* '''2017''': The seminal paper [https://arxiv.org/abs/1706.03762 "Attention Is All You Need"] by [[Ashish Vaswani]] and colleagues at Google introduced the '''[[transformer (machine learning model)|transformer]]''' architecture, which replaced recurrent layers with self-attention mechanisms, enabling far greater parallelization and scaling.<ref name="transformer">{{cite journal |last1=Vaswani |first1=Ashish |title=Attention Is All You Need |journal=Advances in Neural Information Processing Systems |date=2017}}</ref>
* '''2018''': [[OpenAI]] released [[GPT (language model)|GPT-1]], followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.
* '''2020''': [[GPT-3]], with 175 billion parameters, showed emergent abilities such as few-shot learning, sparking widespread public interest.
* '''2022–2023''': The release of [[ChatGPT]] (based on GPT-3.5 and later GPT-4) brought LLMs into mainstream use. Open-source models such as [[Meta]]'s [[Llama]] series and [[Mistral AI]]'s models democratized access.
* '''2024–2026''': Continued scaling with multimodal models (text + image + audio), longer context windows (millions of tokens), and reasoning-focused architectures.

== Architecture ==
Most modern LLMs are based on the '''decoder-only transformer''' architecture:
* '''Self-attention''', a mechanism that allows the model to weigh the importance of different tokens in a sequence (a minimal code sketch appears at the end of this section).
* '''Feed-forward neural networks''' applied at each position.
* '''Layer normalization''' and '''residual connections''' for stable training.
* '''Positional encoding''' (or rotary embeddings such as RoPE) to handle sequence order.

Key variants include:
* Encoder-decoder (e.g., the original T5, BART)
* Decoder-only (the most popular choice for generative tasks: GPT, Llama, Grok, Mistral)
* Mixture-of-Experts (MoE) architectures (e.g., Mixtral, Grok-1), which activate only a subset of parameters per token for efficiency.
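The attention mechanism can be illustrated with a short, self-contained sketch. The following Python/NumPy code is illustrative only: the function name, toy dimensions, and random inputs are invented for this example and are not drawn from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Scaled dot-product attention (Vaswani et al., 2017).

    Q, K: (seq_len, d_k) query and key matrices.
    V:    (seq_len, d_v) value matrix.
    mask: optional (seq_len, seq_len) boolean array; True marks key
          positions a query may NOT attend to (used for causal masking).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # query-key similarities
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # masked positions get ~0 weight
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # weighted sum of value vectors

# Decoder-only (causal) setting: token i may attend only to tokens 0..i
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
print(scaled_dot_product_attention(Q, K, V, mask=causal_mask).shape)  # (4, 8)
</syntaxhighlight>

In a full transformer, this operation runs in parallel across multiple heads, and Q, K, and V are linear projections of the layer input rather than the raw random matrices used in this toy example.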
== Training ==
LLMs undergo two main training phases:

=== Pre-training ===
* '''Objective''': Next-token prediction (causal language modeling) or masked language modeling (the causal objective is sketched below).
* '''Data''': Trillions of tokens from web crawls (Common Crawl), books, Wikipedia, code repositories, scientific papers, and more.
* '''Compute''': Training runs on thousands of GPUs/TPUs for weeks or months using massive distributed training frameworks.
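The next-token objective reduces to an average cross-entropy over shifted targets. The sketch below is a hedged illustration: the function name and toy numbers are made up, and real pipelines compute this loss over batches inside a framework such as PyTorch or JAX rather than in plain NumPy.

<syntaxhighlight lang="python">
import numpy as np

def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy for a single sequence.

    logits:    (seq_len, vocab_size) model scores for each position.
    token_ids: (seq_len,) integer ids of the training text.
    Position t is trained to predict token t+1, so the final logit row
    and the first token id are unused.
    """
    z = logits - logits.max(axis=-1, keepdims=True)        # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    targets = token_ids[1:]                                # shift left by one
    picked = log_probs[np.arange(len(targets)), targets]   # log p(next token)
    return -picked.mean()

# Toy example: vocabulary of 10 tokens, sequence of 5 tokens
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))
tokens = np.array([3, 1, 4, 1, 5])
print(causal_lm_loss(logits, tokens))
</syntaxhighlight>

The objective itself is simple; the engineering challenge of pre-training lies in distributing this computation across thousands of accelerators, as noted in the "Compute" bullet above.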
=== Post-training (alignment) ===
* '''Supervised fine-tuning''' (SFT) on high-quality instruction datasets.
* '''Reinforcement Learning from Human Feedback''' (RLHF), or alternatives such as Direct Preference Optimization (DPO), to make outputs more helpful, honest, and harmless (the DPO objective is given below).
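For a concrete sense of the preference-based alternative to RLHF, the DPO objective introduced by Rafailov et al. (2023) can be written as

:<math>\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]</math>

where <math>y_w</math> and <math>y_l</math> are the preferred and dispreferred responses to prompt <math>x</math>, <math>\pi_\theta</math> is the model being trained, <math>\pi_{\text{ref}}</math> is a frozen reference model (typically the SFT checkpoint), <math>\sigma</math> is the logistic function, and <math>\beta</math> controls how far the model may drift from the reference. Unlike RLHF, no separate reward model or reinforcement-learning loop is required.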
== Capabilities ==
Large language models can perform a wide range of tasks:
* Text generation, summarization, translation, and rewriting
* Question answering and knowledge retrieval
* Code generation and debugging
* Mathematical reasoning (improved in recent models)
* Creative writing, role-playing, and conversation
* Multimodal understanding (in models such as GPT-4o, Gemini, and Claude 3)

'''Emergent abilities''' appear as models scale: capabilities not explicitly trained for that arise once models pass certain parameter thresholds.

== Limitations and Challenges ==
* '''Hallucinations''': Generating plausible but factually incorrect information.
* '''Context window''' limits (though rapidly expanding to 1M+ tokens).
* '''Bias and toxicity''' inherited from training data.
* '''High computational cost''' for training and inference.
* '''Lack of true understanding''': Models predict patterns rather than comprehend meaning.
* '''Reasoning limitations''': Models struggle with complex multi-step problems without techniques such as chain-of-thought prompting.

== Notable Models ==
{| class="wikitable sortable"
! Model !! Developer !! Parameters !! Release !! Notes
|-
| [[GPT-4]] || OpenAI || Undisclosed (~1.7T rumored) || 2023 || Multimodal, strong reasoning
|-
| [[Claude 3.5 Sonnet]] || Anthropic || Undisclosed || 2024–2025 || Known for safety and coding
|-
| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights
|-
| [[Grok]] || xAI || Various || 2023–2026 || Marketed as truth-seeking, with a humorous style
|-
| [[Gemini]] || Google || Various || 2023–2025 || Deep integration with the Google ecosystem
|-
| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models
|}

== Societal Impact ==
LLMs have transformed industries including:
* Software development (GitHub Copilot, Cursor)
* Education and research assistance
* Content creation and customer service
* Scientific discovery (e.g., protein-structure and materials-science workflows)

Concerns include:
* Job displacement in writing, coding, and analysis roles
* Misinformation and deepfakes
* Intellectual property and copyright issues
* Existential risk debates regarding artificial general intelligence

== Ethical and Safety Considerations ==
Major labs implement various safety measures:
* Constitutional AI (Anthropic)
* System prompts and guardrails
* Red teaming for adversarial testing
* Watermarking and detection tools for AI-generated content

== See also ==
* [[Transformer (machine learning model)]]
* [[Generative pre-trained transformer]]
* [[Artificial general intelligence]]
* [[Prompt engineering]]
* [[AI alignment]]

== References ==
{{Reflist}}

== External links ==
* [https://arxiv.org/abs/1706.03762 "Attention Is All You Need"] – foundational transformer paper
* [https://openai.com/research/gpt-4 GPT-4 Technical Report]
* Various model cards on [[Hugging Face]]

[[Category:Artificial intelligence]]
[[Category:Natural language processing]]
[[Category:Machine learning]]