{{Short description|Type of artificial neural network for natural language processing}}
{{Infobox artificial intelligence
| name = Large language model
| image = [[File:Transformer model architecture.svg|250px]]
| caption = The transformer architecture, the foundation of most modern large language models
| invented_by = [[Ashish Vaswani|Vaswani et al.]] (Google Brain, 2017)
| latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)
| latest_release_date = 2024–2026
| genre = [[Natural language processing]], [[Generative artificial intelligence]]
| license = Varies (proprietary or open-source)
}}

A '''large language model''' ('''LLM''') is a type of [[artificial neural network]] trained on vast amounts of text data to understand, generate, and manipulate human language. LLMs are a core technology behind modern [[generative artificial intelligence]] systems such as [[ChatGPT]], [[Claude (chatbot)|Claude]], [[Grok (chatbot)|Grok]], and [[Gemini (chatbot)|Gemini]].

== History ==
The foundations of large language models trace back to early statistical language models and [[recurrent neural networks]] (RNNs). Key milestones include:
* '''2017''': The seminal paper [https://arxiv.org/abs/1706.03762 "Attention Is All You Need"] by [[Ashish Vaswani]] and colleagues at Google introduced the '''[[transformer (machine learning model)|transformer]]''' architecture, which replaced recurrent layers with self-attention mechanisms, enabling far greater parallelization and scaling.<ref name="transformer">{{cite journal |last1=Vaswani |first1=Ashish |title=Attention Is All You Need |journal=Advances in Neural Information Processing Systems |date=2017}}</ref>
* '''2018''': [[OpenAI]] released [[GPT (language model)|GPT-1]], followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.
* '''2020''': [[GPT-3]], with 175 billion parameters, showed emergent abilities such as few-shot learning, sparking widespread public interest.
* '''2022–2023''': The release of [[ChatGPT]] (based on GPT-3.5 and later GPT-4) brought LLMs into mainstream use. Open-source models such as [[Meta]]'s [[Llama]] series and [[Mistral AI]]'s models democratized access.
* '''2024–2026''': Continued scaling with multimodal models (text + image + audio), longer context windows (millions of tokens), and reasoning-focused architectures.

== Architecture ==
Most modern LLMs are based on the '''decoder-only transformer''' architecture:
* '''Self-attention''', a mechanism that allows the model to weigh the importance of different tokens in a sequence (a minimal code sketch appears at the end of this section).
* '''Feed-forward neural networks''' applied at each position.
* '''Layer normalization''' and '''residual connections''' for stable training.
* '''Positional encoding''' (or rotary embeddings such as RoPE) to handle sequence order.

Key variants include:
* Encoder-decoder (e.g., the original T5, BART)
* Decoder-only (the most popular choice for generative tasks: GPT, Llama, Grok, Mistral)
* Mixture-of-Experts (MoE) architectures (e.g., Mixtral, Grok-1), which activate only a subset of parameters per token for efficiency.
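The attention mechanism can be illustrated with a short, self-contained sketch. The following Python/NumPy code is illustrative only: the function name, toy dimensions, and random inputs are invented for this example and are not drawn from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Scaled dot-product attention (Vaswani et al., 2017).

    Q, K: (seq_len, d_k) query and key matrices.
    V:    (seq_len, d_v) value matrix.
    mask: optional (seq_len, seq_len) boolean array; True marks key
          positions a query may NOT attend to (used for causal masking).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # query-key similarities
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # masked positions get ~0 weight
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # weighted sum of value vectors

# Decoder-only (causal) setting: token i may attend only to tokens 0..i
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
print(scaled_dot_product_attention(Q, K, V, mask=causal_mask).shape)  # (4, 8)
</syntaxhighlight>

In a full transformer, this operation runs in parallel across multiple heads, and Q, K, and V are linear projections of the layer input rather than the raw random matrices used in this toy example.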
== Training ==
LLMs undergo two main training phases:

=== Pre-training ===
* '''Objective''': Next-token prediction (causal language modeling) or masked language modeling (the causal objective is sketched below).
* '''Data''': Trillions of tokens from web crawls (Common Crawl), books, Wikipedia, code repositories, scientific papers, and more.
* '''Compute''': Training runs on thousands of GPUs/TPUs for weeks or months using massive distributed training frameworks.
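The next-token objective reduces to an average cross-entropy over shifted targets. The sketch below is a hedged illustration: the function name and toy numbers are made up, and real pipelines compute this loss over batches inside a framework such as PyTorch or JAX rather than in plain NumPy.

<syntaxhighlight lang="python">
import numpy as np

def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy for a single sequence.

    logits:    (seq_len, vocab_size) model scores for each position.
    token_ids: (seq_len,) integer ids of the training text.
    Position t is trained to predict token t+1, so the final logit row
    and the first token id are unused.
    """
    z = logits - logits.max(axis=-1, keepdims=True)        # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    targets = token_ids[1:]                                # shift left by one
    picked = log_probs[np.arange(len(targets)), targets]   # log p(next token)
    return -picked.mean()

# Toy example: vocabulary of 10 tokens, sequence of 5 tokens
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))
tokens = np.array([3, 1, 4, 1, 5])
print(causal_lm_loss(logits, tokens))
</syntaxhighlight>

The objective itself is simple; the engineering challenge of pre-training lies in distributing this computation across thousands of accelerators, as noted in the "Compute" bullet above.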
=== Post-training (alignment) ===
* '''Supervised fine-tuning''' (SFT) on high-quality instruction datasets.
* '''Reinforcement Learning from Human Feedback''' (RLHF), or alternatives such as Direct Preference Optimization (DPO), to make outputs more helpful, honest, and harmless (the DPO objective is given below).
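For a concrete sense of the preference-based alternative to RLHF, the DPO objective introduced by Rafailov et al. (2023) can be written as

:<math>\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]</math>

where <math>y_w</math> and <math>y_l</math> are the preferred and dispreferred responses to prompt <math>x</math>, <math>\pi_\theta</math> is the model being trained, <math>\pi_{\text{ref}}</math> is a frozen reference model (typically the SFT checkpoint), <math>\sigma</math> is the logistic function, and <math>\beta</math> controls how far the model may drift from the reference. Unlike RLHF, no separate reward model or reinforcement-learning loop is required.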
== Capabilities ==
Large language models can perform a wide range of tasks:
* Text generation, summarization, translation, and rewriting
* Question answering and knowledge retrieval
* Code generation and debugging
* Mathematical reasoning (improved in recent models)
* Creative writing, role-playing, and conversation
* Multimodal understanding (in models such as GPT-4o, Gemini, and Claude 3)

'''Emergent abilities''' appear as models scale: capabilities not explicitly trained for that arise once models pass certain parameter thresholds.

== Limitations and Challenges ==
* '''Hallucinations''': Generating plausible but factually incorrect information.
* '''Context window''' limits (though rapidly expanding to 1M+ tokens).
* '''Bias and toxicity''' inherited from training data.
* '''High computational cost''' for training and inference.
* '''Lack of true understanding''': Models predict patterns rather than comprehend meaning.
* '''Reasoning limitations''': Models struggle with complex multi-step problems without techniques such as chain-of-thought prompting.

== Notable Models ==
{| class="wikitable sortable"
! Model !! Developer !! Parameters !! Release !! Notes
|-
| [[GPT-4]] || OpenAI || Undisclosed (~1.7T rumored) || 2023 || Multimodal, strong reasoning
|-
| [[Claude 3.5 Sonnet]] || Anthropic || Undisclosed || 2024–2025 || Known for safety and coding
|-
| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights
|-
| [[Grok]] || xAI || Various || 2023–2026 || Marketed as truth-seeking, with a humorous style
|-
| [[Gemini]] || Google || Various || 2023–2025 || Deep integration with the Google ecosystem
|-
| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models
|}

== Societal Impact ==
LLMs have transformed industries including:
* Software development (GitHub Copilot, Cursor)
* Education and research assistance
* Content creation and customer service
* Scientific discovery (e.g., protein-structure and materials-science workflows)

Concerns include:
* Job displacement in writing, coding, and analysis roles
* Misinformation and deepfakes
* Intellectual property and copyright issues
* Existential risk debates regarding artificial general intelligence

== Ethical and Safety Considerations ==
Major labs implement various safety measures:
* Constitutional AI (Anthropic)
* System prompts and guardrails
* Red teaming for adversarial testing
* Watermarking and detection tools for AI-generated content

== See also ==
* [[Transformer (machine learning model)]]
* [[Generative pre-trained transformer]]
* [[Artificial general intelligence]]
* [[Prompt engineering]]
* [[AI alignment]]

== References ==
{{Reflist}}

== External links ==
* [https://arxiv.org/abs/1706.03762 "Attention Is All You Need"] – foundational transformer paper
* [https://openai.com/research/gpt-4 GPT-4 Technical Report]
* Various model cards on [[Hugging Face]]

[[Category:Artificial intelligence]]
[[Category:Natural language processing]]
[[Category:Machine learning]]