<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.largelanguagemodel.wiki/w/index.php?action=history&amp;feed=atom&amp;title=Large_Language_Model</id>
	<title>Large Language Model - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.largelanguagemodel.wiki/w/index.php?action=history&amp;feed=atom&amp;title=Large_Language_Model"/>
	<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;action=history"/>
	<updated>2026-05-11T20:24:25Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=5&amp;oldid=prev</id>
		<title>Jasongeek at 02:05, 27 March 2026</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=5&amp;oldid=prev"/>
		<updated>2026-03-27T02:05:53Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:05, 27 March 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l86&quot;&gt;Line 86:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 86:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Grok|Grok]] &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;series &lt;/del&gt;|| xAI || Various || 2023–2026 || Built for maximum truth-seeking and humor&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Grok|Grok]] || xAI || Various || 2023–2026 || Built for maximum truth-seeking and humor&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Gemini|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Gemini|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=4&amp;oldid=prev</id>
		<title>Jasongeek at 02:05, 27 March 2026</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=4&amp;oldid=prev"/>
		<updated>2026-03-27T02:05:13Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:05, 27 March 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l86&quot;&gt;Line 86:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 86:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Grok &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(chatbot)&lt;/del&gt;|Grok]] series || xAI || Various || 2023–2026 || Built for maximum truth-seeking and humor&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Grok|Grok]] series || xAI || Various || 2023–2026 || Built for maximum truth-seeking and humor&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Gemini &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(chatbot)&lt;/del&gt;|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Gemini|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|-&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=2&amp;oldid=prev</id>
		<title>Jasongeek: Created page with &quot;{{Short description|Type of artificial neural network for natural language processing}} {{Infobox artificial intelligence | name          = Large language model | image         = 250px | caption       = The transformer architecture, the foundation of most modern large language models | invented_by   = Vaswani et al. (Google Brain, 2017) | latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)...&quot;</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=2&amp;oldid=prev"/>
		<updated>2026-03-27T01:53:48Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;{{Short description|Type of artificial neural network for natural language processing}} {{Infobox artificial intelligence | name          = Large language model | image         = &lt;a href=&quot;/w/index.php?title=File:Transformer_model_architecture.svg&amp;amp;action=edit&amp;amp;redlink=1&quot; class=&quot;new&quot; title=&quot;File:Transformer model architecture.svg (page does not exist)&quot;&gt;250px&lt;/a&gt; | caption       = The transformer architecture, the foundation of most modern large language models | invented_by   = &lt;a href=&quot;/w/index.php?title=Vaswani_et_al.&amp;amp;action=edit&amp;amp;redlink=1&quot; class=&quot;new&quot; title=&quot;Vaswani et al. (page does not exist)&quot;&gt;Vaswani et al.&lt;/a&gt; (Google Brain, 2017) | latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Type of artificial neural network for natural language processing}}&lt;br /&gt;
{{Infobox artificial intelligence&lt;br /&gt;
| name          = Large language model&lt;br /&gt;
| image         = [[File:Transformer model architecture.svg|250px]]&lt;br /&gt;
| caption       = The transformer architecture, the foundation of most modern large language models&lt;br /&gt;
| invented_by   = [[Vaswani et al.]] (Google Brain, 2017)&lt;br /&gt;
| latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)&lt;br /&gt;
| latest_release_date    = 2024–2026&lt;br /&gt;
| genre         = [[Natural language processing]], [[Generative artificial intelligence]]&lt;br /&gt;
| license       = Varies (proprietary or open-source)&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
A &amp;#039;&amp;#039;&amp;#039;large language model&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;LLM&amp;#039;&amp;#039;&amp;#039;) is a type of [[artificial neural network]] trained on vast amounts of text data to understand, generate, and manipulate human language. LLMs are a core technology behind modern [[generative artificial intelligence]] systems such as [[ChatGPT]], [[Claude (chatbot)|Claude]], [[Grok (chatbot)|Grok]], and [[Gemini (chatbot)|Gemini]].&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
The foundations of large language models trace back to early statistical language models and [[recurrent neural networks]] (RNNs). Key milestones include:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2017&amp;#039;&amp;#039;&amp;#039;: The seminal paper [https://arxiv.org/abs/1706.03762 &amp;quot;Attention Is All You Need&amp;quot;] by [[Ashish Vaswani]] and colleagues at Google introduced the &amp;#039;&amp;#039;&amp;#039;[[transformer (machine learning model)|transformer]]&amp;#039;&amp;#039;&amp;#039; architecture, which replaced recurrent layers with self-attention mechanisms, enabling much better parallelization and scaling.&amp;lt;ref name=&amp;quot;transformer&amp;quot;&amp;gt;{{cite journal |last1=Vaswani |first1=Ashish |title=Attention Is All You Need |journal=Advances in Neural Information Processing Systems |date=2017}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2018&amp;#039;&amp;#039;&amp;#039;: [[OpenAI]] released [[GPT (language model)|GPT-1]], followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2020&amp;#039;&amp;#039;&amp;#039;: [[GPT-3]] with 175 billion parameters showed emergent abilities such as few-shot learning, sparking widespread public interest.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2022–2023&amp;#039;&amp;#039;&amp;#039;: The release of [[ChatGPT]] (based on GPT-3.5 and later GPT-4) brought LLMs into mainstream use. Open-weight models like [[Meta]]&amp;#039;s [[Llama]] series and [[Mistral AI]] models democratized access.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;2024–2026&amp;#039;&amp;#039;&amp;#039;: Continued scaling with multimodal models (text + image + audio), longer context windows (millions of tokens), and reasoning-focused architectures.&lt;br /&gt;
&lt;br /&gt;
== Architecture ==&lt;br /&gt;
&lt;br /&gt;
Most modern LLMs are based on the &amp;#039;&amp;#039;&amp;#039;decoder-only transformer&amp;#039;&amp;#039;&amp;#039; architecture:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Self-attention&amp;#039;&amp;#039;&amp;#039; mechanism that allows the model to weigh the importance of different words in a sequence (see the sketch after this list).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Feed-forward neural networks&amp;#039;&amp;#039;&amp;#039; applied at each position.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Layer normalization&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;residual connections&amp;#039;&amp;#039;&amp;#039; for stable training.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Positional encoding&amp;#039;&amp;#039;&amp;#039; (or rotary embeddings like RoPE) to handle sequence order.&lt;br /&gt;
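&lt;br /&gt;
The self-attention step above can be illustrated with a short, self-contained sketch. The code below is purely illustrative (plain numpy, a single attention head, and a causal mask as used in decoder-only models) and is not taken from any particular implementation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def causal_self_attention(x, Wq, Wk, Wv):&lt;br /&gt;
    # x: (seq_len, d_model) token representations; Wq/Wk/Wv: (d_model, d_head) projections&lt;br /&gt;
    q, k, v = x @ Wq, x @ Wk, x @ Wv&lt;br /&gt;
    scores = q @ k.T / np.sqrt(k.shape[-1])                    # pairwise token affinities&lt;br /&gt;
    # causal mask: each position may only attend to itself and earlier tokens&lt;br /&gt;
    future = np.triu(np.ones_like(scores), k=1).astype(bool)&lt;br /&gt;
    scores = np.where(future, -1e9, scores)&lt;br /&gt;
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))&lt;br /&gt;
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax&lt;br /&gt;
    return weights @ v                                         # weighted sum of value vectors&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;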
&lt;br /&gt;
Key variants include:&lt;br /&gt;
* Encoder-decoder (e.g., original T5, BART)&lt;br /&gt;
* Decoder-only (most popular for generative tasks: GPT, Llama, Grok, Mistral)&lt;br /&gt;
* Mixture-of-Experts (MoE) architectures (e.g., Mixtral, Grok-1) that activate only a subset of parameters per token for efficiency.&lt;br /&gt;
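&lt;br /&gt;
The routing idea behind Mixture-of-Experts layers can be sketched in a similarly compact way; the top-k gate below is a simplified, hypothetical illustration (the function name, shapes, and top_k value are assumptions, not any specific model):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def moe_layer(x, gate_W, experts, top_k=2):&lt;br /&gt;
    # x: (d_model,) one token representation; gate_W: (d_model, n_experts)&lt;br /&gt;
    # experts: list of callables, each a small feed-forward network&lt;br /&gt;
    gate_logits = x @ gate_W&lt;br /&gt;
    chosen = np.argsort(gate_logits)[-top_k:]       # indices of the k highest-scoring experts&lt;br /&gt;
    weights = np.exp(gate_logits[chosen])&lt;br /&gt;
    weights = weights / weights.sum()               # renormalise over the chosen experts only&lt;br /&gt;
    # only the selected experts run, so most parameters stay inactive for this token&lt;br /&gt;
    return sum(w * experts[i](x) for i, w in zip(chosen, weights))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;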
&lt;br /&gt;
== Training ==&lt;br /&gt;
&lt;br /&gt;
LLMs undergo two main training phases:&lt;br /&gt;
&lt;br /&gt;
=== Pre-training ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Objective&amp;#039;&amp;#039;&amp;#039;: Next-token prediction (causal language modeling) or masked language modeling (a minimal sketch of the causal objective follows this list).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Data&amp;#039;&amp;#039;&amp;#039;: Trillions of tokens from web crawls (Common Crawl), books, Wikipedia, code repositories, scientific papers, and more.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Compute&amp;#039;&amp;#039;&amp;#039;: Trained on thousands of GPUs/TPUs for weeks or months using massive distributed training frameworks.&lt;br /&gt;
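&lt;br /&gt;
As a rough illustration of the causal pre-training objective, the sketch below computes the average next-token negative log-likelihood for a single sequence (plain numpy, illustrative shapes and names, no batching or tokenization):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def next_token_loss(logits, token_ids):&lt;br /&gt;
    # logits: (seq_len, vocab_size) model outputs; token_ids: (seq_len,) observed integer tokens&lt;br /&gt;
    # position t is trained to predict token t+1, so shift the targets by one&lt;br /&gt;
    logits, targets = logits[:-1], token_ids[1:]&lt;br /&gt;
    logits = logits - logits.max(axis=-1, keepdims=True)               # numerical stability&lt;br /&gt;
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))&lt;br /&gt;
    # average negative log-likelihood of the actual next tokens&lt;br /&gt;
    return -log_probs[np.arange(len(targets)), targets].mean()&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;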
&lt;br /&gt;
=== Post-training (alignment) ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Supervised fine-tuning&amp;#039;&amp;#039;&amp;#039; (SFT) on high-quality instruction datasets.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Reinforcement Learning from Human Feedback&amp;#039;&amp;#039;&amp;#039; (RLHF) or alternatives like Direct Preference Optimization (DPO) to make outputs more helpful, honest, and harmless.&lt;br /&gt;
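&lt;br /&gt;
For the preference-tuning step, the Direct Preference Optimization objective mentioned above can be written in a few lines. The argument names and the beta value below are illustrative; the expression follows the published per-example DPO loss in simplified form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):&lt;br /&gt;
    # log-probabilities of the preferred and rejected responses under the model&lt;br /&gt;
    # being tuned (logp_*) and under a frozen reference model (ref_*)&lt;br /&gt;
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)&lt;br /&gt;
    # -log sigmoid(beta * margin): rewards widening the preference margin over the reference&lt;br /&gt;
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;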
&lt;br /&gt;
== Capabilities ==&lt;br /&gt;
&lt;br /&gt;
Large language models can perform a wide range of tasks:&lt;br /&gt;
* Text generation, summarization, translation, and rewriting&lt;br /&gt;
* Question answering and knowledge retrieval&lt;br /&gt;
* Code generation and debugging&lt;br /&gt;
* Mathematical reasoning (improved in recent models)&lt;br /&gt;
* Creative writing, role-playing, and conversation&lt;br /&gt;
* Multimodal understanding (in models like GPT-4o, Gemini, Claude 3)&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Emergent abilities&amp;#039;&amp;#039;&amp;#039; appear as models scale: capabilities that were not explicitly trained for but arise once models pass certain size thresholds.&lt;br /&gt;
&lt;br /&gt;
== Limitations and Challenges ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Hallucinations&amp;#039;&amp;#039;&amp;#039;: Generating plausible but factually incorrect information.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Context window&amp;#039;&amp;#039;&amp;#039; limits (though rapidly expanding to 1M+ tokens).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Bias and toxicity&amp;#039;&amp;#039;&amp;#039; inherited from training data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;High computational cost&amp;#039;&amp;#039;&amp;#039; for training and inference.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Lack of true understanding&amp;#039;&amp;#039;&amp;#039;: models predict patterns rather than comprehend meaning.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Reasoning limitations&amp;#039;&amp;#039;&amp;#039;: models struggle with complex multi-step problems without techniques such as chain-of-thought prompting.&lt;br /&gt;
&lt;br /&gt;
== Notable Models ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Model !! Developer !! Parameters !! Release !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| [[GPT-4]] || OpenAI || Undisclosed (~1.7T rumored) || 2023 || Multimodal, strong reasoning&lt;br /&gt;
|-&lt;br /&gt;
| [[Claude 3.5 Sonnet]] || Anthropic || Undisclosed || 2024–2025 || Known for safety and coding&lt;br /&gt;
|-&lt;br /&gt;
| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;br /&gt;
|-&lt;br /&gt;
| [[Grok (chatbot)|Grok]] series || xAI || Various || 2023–2026 || Built for maximum truth-seeking and humor&lt;br /&gt;
|-&lt;br /&gt;
| [[Gemini (chatbot)|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;br /&gt;
|-&lt;br /&gt;
| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Societal Impact ==&lt;br /&gt;
&lt;br /&gt;
LLMs have transformed industries including:&lt;br /&gt;
* Software development (GitHub Copilot, Cursor)&lt;br /&gt;
* Education and research assistance&lt;br /&gt;
* Content creation and customer service&lt;br /&gt;
* Scientific discovery (e.g., AlphaFold integration, materials science)&lt;br /&gt;
&lt;br /&gt;
Concerns include:&lt;br /&gt;
* Job displacement in writing, coding, and analysis roles&lt;br /&gt;
* Misinformation and deepfakes&lt;br /&gt;
* Intellectual property and copyright issues&lt;br /&gt;
* Existential risk debates regarding artificial general intelligence&lt;br /&gt;
&lt;br /&gt;
== Ethical and Safety Considerations ==&lt;br /&gt;
&lt;br /&gt;
Major labs implement various safety measures:&lt;br /&gt;
* Constitutional AI (Anthropic)&lt;br /&gt;
* System prompts and guardrails&lt;br /&gt;
* Red teaming for adversarial testing&lt;br /&gt;
* Watermarking and detection tools for AI-generated content&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Transformer (machine learning model)]]&lt;br /&gt;
* [[Generative pre-trained transformer]]&lt;br /&gt;
* [[Artificial general intelligence]]&lt;br /&gt;
* [[Prompt engineering]]&lt;br /&gt;
* [[AI alignment]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [https://arxiv.org/abs/1706.03762 &amp;quot;Attention Is All You Need&amp;quot;] — foundational transformer paper&lt;br /&gt;
* [https://openai.com/research/gpt-4 GPT-4 Technical Report]&lt;br /&gt;
* Various model cards on [[Hugging Face]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Natural language processing]]&lt;br /&gt;
[[Category:Machine learning]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
</feed>