Large Language Model

A large language model (LLM) is a type of artificial neural network trained on vast amounts of text data to understand, generate, and manipulate human language. LLMs are a core technology behind modern generative artificial intelligence systems such as ChatGPT, Claude, Grok, and Gemini.

History

The foundations of large language models trace back to early statistical language models and recurrent neural networks (RNNs). Key milestones include:

  • 2017: Google researchers introduced the transformer architecture in "Attention Is All You Need", replacing recurrence with self-attention and enabling far greater parallelism in training.
  • 2018: OpenAI released GPT-1, followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.
  • 2020: GPT-3 with 175 billion parameters showed emergent abilities such as few-shot learning, sparking widespread public interest.
  • 2022–2023: The release of ChatGPT (based on GPT-3.5 and later GPT-4) brought LLMs into mainstream use. Open-source models like Meta's Llama series and Mistral AI models democratized access.
  • 2024–2026: Continued scaling with multimodal models (text + image + audio), longer context windows (millions of tokens), and reasoning-focused models trained to produce extended chains of thought before answering.

Architecture

Most modern LLMs are based on the decoder-only transformer architecture, built from the following components (a minimal sketch follows the list):

  • Self-attention mechanism that allows the model to weigh the importance of different words in a sequence.
  • Feed-forward neural networks applied at each position.
  • Layer normalization and residual connections for stable training.
  • Positional encoding (or rotary embeddings like RoPE) to handle sequence order.
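
These components compose into a repeating block. The following is a minimal PyTorch sketch of one pre-norm decoder block; the dimensions, the GELU activation, and the use of nn.MultiheadAttention are illustrative assumptions, and positional information (such as RoPE) is omitted for brevity:

  import torch
  import torch.nn as nn

  class DecoderBlock(nn.Module):
      """One pre-norm decoder-only transformer block (illustrative sketch)."""
      def __init__(self, d_model=512, n_heads=8, d_ff=2048):
          super().__init__()
          self.norm1 = nn.LayerNorm(d_model)
          self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
          self.norm2 = nn.LayerNorm(d_model)
          self.ff = nn.Sequential(
              nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

      def forward(self, x):                    # x: (batch, seq_len, d_model)
          # Causal mask: True entries block attention, so position i
          # can only attend to positions <= i.
          n = x.size(1)
          mask = torch.triu(
              torch.ones(n, n, dtype=torch.bool, device=x.device), diagonal=1)
          # Self-attention with a residual connection (pre-norm).
          h = self.norm1(x)
          attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
          x = x + attn_out
          # Position-wise feed-forward, also with a residual connection.
          return x + self.ff(self.norm2(x))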

Key variants include:

  • Encoder-decoder (e.g., the original Transformer, T5, BART)
  • Decoder-only (most popular for generative tasks: GPT, Llama, Grok, Mistral)
  • Mixture-of-Experts (MoE) architectures (e.g., Mixtral, Grok-1) that activate only a subset of parameters per token for efficiency (see the routing sketch below).
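
A minimal sketch of top-k expert routing, assuming a simple linear router over dense feed-forward experts (real implementations add load-balancing losses, capacity limits, and fused kernels):

  import torch
  import torch.nn as nn

  class TopKMoE(nn.Module):
      """Mixture-of-Experts feed-forward layer with top-k routing (sketch)."""
      def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
          super().__init__()
          self.router = nn.Linear(d_model, n_experts)
          self.experts = nn.ModuleList([
              nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                            nn.Linear(d_ff, d_model))
              for _ in range(n_experts)])
          self.top_k = top_k

      def forward(self, x):                     # x: (n_tokens, d_model)
          # Score every expert per token, then keep only the top-k.
          weights, idx = self.router(x).topk(self.top_k, dim=-1)
          weights = weights.softmax(dim=-1)     # normalize over chosen experts
          out = torch.zeros_like(x)
          for e, expert in enumerate(self.experts):
              for k in range(self.top_k):
                  sel = idx[:, k] == e          # tokens whose k-th pick is expert e
                  if sel.any():
                      out[sel] += weights[sel, k].unsqueeze(-1) * expert(x[sel])
          return out

Only the selected experts run on each token, so per-token compute stays roughly constant even as the total parameter count grows with the number of experts.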

Training

LLMs undergo two main training phases:

Pre-training

  • Objective: Next-token prediction (causal language modeling) for decoder-only models, or masked language modeling for encoder-style models; the causal objective is sketched after this list.
  • Data: Trillions of tokens from web crawls (Common Crawl), books, Wikipedia, code repositories, scientific papers, and more.
  • Compute: Trained on thousands of GPUs/TPUs for weeks or months, using distributed frameworks that combine data, tensor, and pipeline parallelism.
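
The pre-training loss is ordinary cross-entropy between each position's predicted distribution and the actual next token. A minimal PyTorch sketch (function and variable names are illustrative):

  import torch
  import torch.nn.functional as F

  def next_token_loss(logits, token_ids):
      # logits: (batch, seq_len, vocab_size); token_ids: (batch, seq_len).
      # The prediction at position t is scored against the token at t + 1,
      # so the last prediction and the first token are dropped.
      pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
      target = token_ids[:, 1:].reshape(-1)
      return F.cross_entropy(pred, target)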

Post-training (alignment)

  • Supervised fine-tuning (SFT) on high-quality instruction datasets.
  • Reinforcement Learning from Human Feedback (RLHF) or alternatives like Direct Preference Optimization (DPO) to make outputs more helpful, honest, and harmless (a DPO sketch follows the list).
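
DPO skips the separate reward model used by RLHF and optimizes on preference pairs directly. A minimal sketch of the loss, assuming sequence-level log-probabilities have already been computed (beta is the usual DPO temperature; names are illustrative):

  import torch.nn.functional as F

  def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
      # Inputs are log p(response | prompt) under the trainable policy
      # (pi_*) and a frozen reference model (ref_*), for the human-preferred
      # ("chosen") and dispreferred ("rejected") responses in each pair.
      # The loss rewards the policy for widening its preference margin
      # relative to the reference.
      margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
      return -F.logsigmoid(beta * margin).mean()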

Capabilities

Large language models can perform a wide range of tasks:

  • Text generation, summarization, translation, and rewriting
  • Question answering and knowledge retrieval
  • Code generation and debugging
  • Mathematical reasoning (improved in recent models)
  • Creative writing, role-playing, and conversation
  • Multimodal understanding (in models like GPT-4o, Gemini, Claude 3)

Some capabilities are described as emergent: abilities the model was not explicitly trained for that appear, sometimes abruptly, once scale passes certain thresholds.

Limitations and Challenges

  • Hallucinations: Generating plausible but factually incorrect information.
  • Context window limits (though rapidly expanding to 1M+ tokens).
  • Bias and toxicity inherited from training data.
  • High computational cost for training and inference.
  • Lack of true understanding — models predict patterns rather than comprehend meaning.
  • Reasoning limitations: Struggle with complex multi-step problems without techniques like chain-of-thought prompting (illustrated after this list).
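
As an illustration of the last point, chain-of-thought prompting changes only the prompt text; the arithmetic question below is a made-up example:

  # Direct prompt: the model must produce the answer in one step.
  direct = "Q: A shop sells pens at 3 for $2. How much do 12 pens cost? A:"

  # Chain-of-thought prompt: eliciting intermediate steps first
  # typically improves accuracy on multi-step problems.
  cot = ("Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
         "A: Let's think step by step.")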

Notable Models

Model | Developer | Parameters | Release | Notes
GPT-4 | OpenAI | Undisclosed (~1.7T rumored) | 2023 | Multimodal, strong reasoning
Claude 3.5 Sonnet | Anthropic | Undisclosed | 2024–2025 | Known for safety and coding
Llama 3 / Llama 4 | Meta | 8B–405B+ | 2024–2025 | Open weights
Grok | xAI | Various | 2023–2026 | Marketed as truth-seeking, with a humorous style
Gemini | Google | Various | 2023–2025 | Deep integration with Google ecosystem
Mistral Large / Mixtral | Mistral AI | Various | 2023–2025 | Efficient open models

Societal Impact

LLMs have transformed industries including:

  • Software development (GitHub Copilot, Cursor)
  • Education and research assistance
  • Content creation and customer service
  • Scientific discovery (e.g., alongside specialized systems such as AlphaFold, and in materials science)

Concerns include:

  • Job displacement in writing, coding, and analysis roles
  • Misinformation and deepfakes
  • Intellectual property and copyright issues
  • Existential risk debates regarding artificial general intelligence

Ethical and Safety Considerations

Major labs implement various safety measures:

  • Constitutional AI (Anthropic)
  • System prompts and guardrails
  • Red teaming for adversarial testing
  • Watermarking and detection tools for AI-generated content
