<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.largelanguagemodel.wiki/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jasongeek</id>
	<title>Large Language Model Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.largelanguagemodel.wiki/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jasongeek"/>
	<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/Special:Contributions/Jasongeek"/>
	<updated>2026-05-11T21:13:18Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Model_Context_Protocol&amp;diff=11</id>
		<title>Model Context Protocol</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Model_Context_Protocol&amp;diff=11"/>
		<updated>2026-03-27T14:27:07Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{short description|Model Context Protocol (MCP) in artificial intelligence}} {{Infobox protocol | name        = Model Context Protocol | acronym     = MCP | developer   = Anthropic | latest_release_version =  | latest_release_date    =  | status      = Active | genre       = Application-layer protocol | license     = Open source | website     = [https://modelcontextprotocol.io/ modelcontextprotocol.io] }}  The &amp;#039;&amp;#039;&amp;#039;Model Context Protocol&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;MCP&amp;#039;&amp;#039;...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{short description|Model Context Protocol (MCP) in artificial intelligence}}&lt;br /&gt;
{{Infobox protocol&lt;br /&gt;
| name        = Model Context Protocol&lt;br /&gt;
| acronym     = MCP&lt;br /&gt;
| developer   = Anthropic&lt;br /&gt;
| latest_release_version = &lt;br /&gt;
| latest_release_date    = &lt;br /&gt;
| status      = Active&lt;br /&gt;
| genre       = [[Application layer|Application-layer]] protocol&lt;br /&gt;
| license     = Open source&lt;br /&gt;
| website     = [https://modelcontextprotocol.io/ modelcontextprotocol.io]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;Model Context Protocol&#039;&#039;&#039; (&#039;&#039;&#039;MCP&#039;&#039;&#039;) is an open standard and protocol for connecting [[artificial intelligence]] (AI) applications, particularly [[large language models]] (LLMs) and AI agents, to external data sources, tools, and systems. Introduced by [[Anthropic]] in November 2024, MCP provides a standardized, secure, and interoperable way for AI models to access real-time context and perform actions beyond their static training data.&amp;lt;ref name=&amp;quot;anthropic-announce&amp;quot;&amp;gt;[https://www.anthropic.com/news/model-context-protocol Introducing the Model Context Protocol], Anthropic, November 25, 2024.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It is often described as a &amp;quot;USB-C port for AI&amp;quot; because it offers a universal interface, replacing the need for custom, fragmented integrations between AI applications and external resources such as files, databases, APIs, code repositories, and business tools.&amp;lt;ref name=&amp;quot;mcp-site&amp;quot;&amp;gt;[https://modelcontextprotocol.io/docs/getting-started/intro What is the Model Context Protocol (MCP)?], Official MCP Documentation.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
MCP was announced and open-sourced by Anthropic on November 25, 2024, with the goal of enabling frontier AI models to produce more relevant and accurate responses by connecting them directly to the systems where data lives.&amp;lt;ref name=&amp;quot;anthropic-announce&amp;quot; /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Following its release, MCP gained rapid adoption across the AI ecosystem. Major companies including OpenAI, Microsoft, Google, and others added support for the protocol in their tools and platforms. In December 2025, Anthropic donated MCP to the [[Agentic AI Foundation]] (a directed fund under the [[Linux Foundation]]), further promoting its development as an industry standard.&amp;lt;ref name=&amp;quot;wikipedia&amp;quot;&amp;gt;[https://en.wikipedia.org/wiki/Model_Context_Protocol Model Context Protocol], Wikipedia.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Architecture ==&lt;br /&gt;
MCP follows a &#039;&#039;&#039;client-server architecture&#039;&#039;&#039; with three main components:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;MCP Host&#039;&#039;&#039;: The AI application or environment (e.g., Claude Desktop, ChatGPT, Cursor, Visual Studio Code with Copilot, or a custom agent) that contains the LLM.&lt;br /&gt;
* &#039;&#039;&#039;MCP Client&#039;&#039;&#039;: Embedded within the host, it manages connections to one or more MCP servers and handles communication using the protocol (based on [[JSON-RPC]] 2.0).&lt;br /&gt;
* &#039;&#039;&#039;MCP Server&#039;&#039;&#039;: A lightweight program that exposes capabilities from external systems. Servers can provide:&lt;br /&gt;
** &#039;&#039;&#039;Resources&#039;&#039;&#039; (e.g., files, database records)&lt;br /&gt;
** &#039;&#039;&#039;Tools&#039;&#039;&#039; (e.g., functions for calculations, API calls, code execution)&lt;br /&gt;
** &#039;&#039;&#039;Prompts&#039;&#039;&#039; (specialized workflows or contextual instructions)&lt;br /&gt;
&lt;br /&gt;
The protocol supports two-way communication, allowing AI models to discover available capabilities, request data or actions, and receive formatted responses. It includes features for security (e.g., permissions and human-in-the-loop approvals) and supports both local and remote servers.&amp;lt;ref name=&amp;quot;mcp-site&amp;quot; /&amp;gt;&amp;lt;ref name=&amp;quot;a16z&amp;quot;&amp;gt;[https://a16z.com/a-deep-dive-into-mcp-and-the-future-of-ai-tooling/ A Deep Dive Into MCP and the Future of AI Tooling], a16z, March 20, 2025.&amp;lt;/ref&amp;gt;&lt;br /&gt;
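&lt;br /&gt;
The exchange below is a minimal sketch of this JSON-RPC 2.0 framing as used over the stdio transport (one JSON message per line). The method names follow the published specification, but the protocol-version string is only illustrative, and the &#039;&#039;get_weather&#039;&#039; tool is a hypothetical example rather than part of any real server.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Minimal sketch of MCP&#039;s JSON-RPC 2.0 framing over the stdio transport.&lt;br /&gt;
import json&lt;br /&gt;
&lt;br /&gt;
def frame(msg: dict) -&amp;gt; str:&lt;br /&gt;
    # stdio transport: newline-delimited JSON-RPC messages&lt;br /&gt;
    return json.dumps(msg) + &amp;quot;\n&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# 1. Handshake: the client announces its protocol version and capabilities.&lt;br /&gt;
initialize = frame({&lt;br /&gt;
    &amp;quot;jsonrpc&amp;quot;: &amp;quot;2.0&amp;quot;, &amp;quot;id&amp;quot;: 1, &amp;quot;method&amp;quot;: &amp;quot;initialize&amp;quot;,&lt;br /&gt;
    &amp;quot;params&amp;quot;: {&lt;br /&gt;
        &amp;quot;protocolVersion&amp;quot;: &amp;quot;2025-06-18&amp;quot;,  # illustrative revision date&lt;br /&gt;
        &amp;quot;capabilities&amp;quot;: {},&lt;br /&gt;
        &amp;quot;clientInfo&amp;quot;: {&amp;quot;name&amp;quot;: &amp;quot;demo-host&amp;quot;, &amp;quot;version&amp;quot;: &amp;quot;0.1&amp;quot;},&lt;br /&gt;
    },&lt;br /&gt;
})&lt;br /&gt;
&lt;br /&gt;
# 2. Discovery: ask the server which tools it exposes.&lt;br /&gt;
list_tools = frame({&amp;quot;jsonrpc&amp;quot;: &amp;quot;2.0&amp;quot;, &amp;quot;id&amp;quot;: 2, &amp;quot;method&amp;quot;: &amp;quot;tools/list&amp;quot;})&lt;br /&gt;
&lt;br /&gt;
# 3. Invocation: call a hypothetical tool by name with arguments.&lt;br /&gt;
call_tool = frame({&lt;br /&gt;
    &amp;quot;jsonrpc&amp;quot;: &amp;quot;2.0&amp;quot;, &amp;quot;id&amp;quot;: 3, &amp;quot;method&amp;quot;: &amp;quot;tools/call&amp;quot;,&lt;br /&gt;
    &amp;quot;params&amp;quot;: {&amp;quot;name&amp;quot;: &amp;quot;get_weather&amp;quot;, &amp;quot;arguments&amp;quot;: {&amp;quot;city&amp;quot;: &amp;quot;Berlin&amp;quot;}},&lt;br /&gt;
})&lt;br /&gt;
&lt;br /&gt;
print(initialize + list_tools + call_tool)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;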
&lt;br /&gt;
== Key Features ==&lt;br /&gt;
* &#039;&#039;&#039;Standardization&#039;&#039;&#039;: Build once, integrate with any MCP-compatible AI client (Claude, ChatGPT, Gemini, etc.).&lt;br /&gt;
* &#039;&#039;&#039;Security&#039;&#039;&#039;: Granular permissions, sandboxing, and approval mechanisms for sensitive actions.&lt;br /&gt;
* &#039;&#039;&#039;Agentic capabilities&#039;&#039;&#039;: Enables autonomous AI agents to chain tools, make decisions, and execute multi-step workflows.&lt;br /&gt;
* &#039;&#039;&#039;Interoperability&#039;&#039;&#039;: Supports a wide range of data sources and tools, reducing hallucinations by providing up-to-date context.&lt;br /&gt;
* &#039;&#039;&#039;Extensibility&#039;&#039;&#039;: Developers can create custom MCP servers for specific domains (e.g., codebases, enterprise databases, or even Minecraft modding documentation).&lt;br /&gt;
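&lt;br /&gt;
As an illustration of this extensibility, the sketch below defines a toy server with the reference Python SDK&#039;s high-level &#039;&#039;FastMCP&#039;&#039; helper. The decorator-based API follows that SDK&#039;s quickstart (treat the exact names as indicative); the tool and resource themselves are made-up examples.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Sketch of a custom MCP server via the reference Python SDK&#039;s FastMCP&lt;br /&gt;
# helper (pip install mcp); the tool and resource are toy examples.&lt;br /&gt;
from mcp.server.fastmcp import FastMCP&lt;br /&gt;
&lt;br /&gt;
mcp = FastMCP(&amp;quot;demo-server&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
@mcp.tool()&lt;br /&gt;
def add(a: int, b: int) -&amp;gt; int:&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;Add two numbers.&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    return a + b&lt;br /&gt;
&lt;br /&gt;
@mcp.resource(&amp;quot;greeting://{name}&amp;quot;)&lt;br /&gt;
def greeting(name: str) -&amp;gt; str:&lt;br /&gt;
    &amp;quot;&amp;quot;&amp;quot;A parameterized, read-only resource.&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
    return f&amp;quot;Hello, {name}!&amp;quot;&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    mcp.run()  # serves over stdio by default&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;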
&lt;br /&gt;
== Use Cases ==&lt;br /&gt;
MCP is particularly valuable for:&lt;br /&gt;
* &#039;&#039;&#039;Software development&#039;&#039;&#039;: AI coding assistants accessing project files, running tests, or interacting with version control.&lt;br /&gt;
* &#039;&#039;&#039;Enterprise automation&#039;&#039;&#039;: Connecting LLMs to internal databases, CRMs, or business tools.&lt;br /&gt;
* &#039;&#039;&#039;Research and knowledge work&#039;&#039;&#039;: Pulling real-time data from repositories or specialized knowledge bases.&lt;br /&gt;
* &#039;&#039;&#039;Agentic workflows&#039;&#039;&#039;: Building autonomous agents that can browse, compute, and act across multiple systems.&lt;br /&gt;
* &#039;&#039;&#039;Creative and specialized tools&#039;&#039;&#039;: Examples include MCP servers for Minecraft modding documentation or controlling in-game actions via AI.&lt;br /&gt;
&lt;br /&gt;
Community-driven MCP servers have emerged for niches like file system access, web search, browser automation (e.g., via Playwright), and even real-time Minecraft bot control.&amp;lt;ref name=&amp;quot;minecraft-mcp&amp;quot;&amp;gt;Various community repositories, e.g., mcmodding-mcp on GitHub.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Adoption ==&lt;br /&gt;
As of 2026, MCP enjoys broad support:&lt;br /&gt;
* AI platforms: Anthropic (Claude), OpenAI (ChatGPT), Google (Gemini/Vertex AI)&lt;br /&gt;
* Development tools: Cursor, Visual Studio Code, Windsurf&lt;br /&gt;
* Frameworks and SDKs: Multiple agent frameworks (LangChain, DSPy, etc.) include MCP integration&lt;br /&gt;
* Cloud providers: Microsoft Azure, Google Cloud, and others offer MCP-related tooling&lt;br /&gt;
&lt;br /&gt;
== Comparison to Related Technologies ==&lt;br /&gt;
MCP builds on and extends earlier approaches to tool use in LLMs (such as function calling) by providing a standardized, bidirectional protocol rather than ad-hoc integrations. It is sometimes compared to the [[Language Server Protocol]] (LSP) but with a stronger emphasis on agentic, autonomous execution and human oversight.&lt;br /&gt;
&lt;br /&gt;
It is distinct from agent-to-agent (A2A) protocols, which focus on communication between multiple AI agents rather than agent-to-tool connections.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Large language model]]&lt;br /&gt;
* [[AI agent]]&lt;br /&gt;
* [[Anthropic]]&lt;br /&gt;
* [[Tool use (AI)]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{reflist}}&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [https://modelcontextprotocol.io/ Official Model Context Protocol website]&lt;br /&gt;
* [https://www.anthropic.com/news/model-context-protocol Anthropic announcement]&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Model_Context_Protocol Wikipedia article on MCP]&lt;br /&gt;
&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Computer protocols]]&lt;br /&gt;
[[Category:2024 introductions]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Ashish_Vaswani&amp;diff=10</id>
		<title>Ashish Vaswani</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Ashish_Vaswani&amp;diff=10"/>
		<updated>2026-03-27T02:13:05Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Infobox scientist | name          = Ashish Vaswani | image         = 250px &amp;lt;!-- Replace with actual image if available --&amp;gt; | caption       = Ashish Vaswani | birth_date    = 1986 | birth_place   = India | citizenship   = Indian | fields        = Computer science • Artificial intelligence • Machine learning • Natural language processing | workplaces    = Google Brain&amp;lt;br /&amp;gt;Adept AI&amp;lt;br /&amp;gt;Essential AI | alma_mater...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox scientist&lt;br /&gt;
| name          = Ashish Vaswani&lt;br /&gt;
| image         = [[File:Ashish Vaswani portrait.jpg|250px]] &amp;lt;!-- Replace with actual image if available --&amp;gt;&lt;br /&gt;
| caption       = Ashish Vaswani&lt;br /&gt;
| birth_date    = 1986&lt;br /&gt;
| birth_place   = India&lt;br /&gt;
| citizenship   = Indian&lt;br /&gt;
| fields        = [[Computer science]] • [[Artificial intelligence]] • [[Machine learning]] • [[Natural language processing]]&lt;br /&gt;
| workplaces    = Google Brain&amp;lt;br /&amp;gt;Adept AI&amp;lt;br /&amp;gt;Essential AI&lt;br /&gt;
| alma_mater    = Birla Institute of Technology, Mesra (B.E.)&amp;lt;br /&amp;gt;University of Southern California (M.S., Ph.D.)&lt;br /&gt;
| doctoral_advisor = Liang Huang&amp;lt;br /&amp;gt;David Chiang&lt;br /&gt;
| known_for     = Co-author of &amp;quot;[[Attention Is All You Need]]&amp;quot;&amp;lt;br /&amp;gt;[[Transformer (deep learning architecture)|Transformer]] architecture&lt;br /&gt;
| awards        = Best Paper Award, Information Sciences Institute Graduate Research Symposium (2010)&amp;lt;br /&amp;gt;S. Chandrasekhar Rising Indian Diaspora Scientist Award&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Ashish Vaswani&#039;&#039;&#039; (born 1986) is an Indian computer scientist and artificial intelligence researcher. He is best known as the first author of the seminal 2017 paper &amp;quot;[[Attention Is All You Need]]&amp;quot;, which introduced the [[Transformer (deep learning architecture)|Transformer]] neural network architecture. This architecture has become the foundational building block for nearly all modern large language models (LLMs), including those powering systems such as ChatGPT and BERT.&lt;br /&gt;
&lt;br /&gt;
Vaswani is the co-founder and CEO of Essential AI, a company focused on building open, powerful AI systems to solve complex real-world challenges.&lt;br /&gt;
&lt;br /&gt;
== Early life and education ==&lt;br /&gt;
&lt;br /&gt;
Vaswani was born in India in 1986. He spent part of his childhood in Oman before his family moved to Nagpur when he was 15. He developed an early interest in science and mathematics.&lt;br /&gt;
&lt;br /&gt;
He earned a Bachelor&#039;s degree in Computer Science and Engineering from the Birla Institute of Technology, Mesra. He later moved to the United States, completing a Master&#039;s degree and a Ph.D. in Computer Science at the University of Southern California (USC). His doctoral advisors were Liang Huang and David Chiang. During his Ph.D., he conducted research at the Information Sciences Institute at USC, with a focus on natural language processing and machine translation.&lt;br /&gt;
&lt;br /&gt;
== Career ==&lt;br /&gt;
&lt;br /&gt;
After completing his Ph.D., Vaswani joined Google Brain, where he worked as a research scientist for more than six years. At Google, he contributed to advancements in natural language processing and deep learning.&lt;br /&gt;
&lt;br /&gt;
In 2017, while at Google Brain, Vaswani and his colleagues published &amp;quot;Attention Is All You Need.&amp;quot; The paper proposed replacing recurrent and convolutional layers with a purely attention-based mechanism, enabling greater parallelism, faster training, and superior performance on sequence transduction tasks such as machine translation.&lt;br /&gt;
&lt;br /&gt;
In 2021–2022, Vaswani co-founded Adept AI with Niki Parmar and other colleagues, focusing on training neural networks to perform practical tasks and actions. He served as Co-Founder and Chief Scientist.&lt;br /&gt;
&lt;br /&gt;
In late 2022/early 2023, Vaswani and Parmar left Adept to found Essential AI. As CEO, Vaswani leads the company in developing frontier AI models with an emphasis on openness, collaboration, and solving humanity&#039;s biggest challenges through advanced reasoning systems. Essential AI has raised significant funding and collaborates on hardware platforms such as AMD Instinct GPUs.&lt;br /&gt;
&lt;br /&gt;
== Contributions ==&lt;br /&gt;
&lt;br /&gt;
Vaswani&#039;s primary contribution is the Transformer architecture, which uses self-attention mechanisms to process entire sequences in parallel rather than sequentially. This breakthrough addressed key limitations of previous models (RNNs and LSTMs), dramatically improving scalability and performance.&lt;br /&gt;
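&lt;br /&gt;
The core operation can be made concrete with a small sketch. The NumPy code below implements scaled dot-product self-attention for a single head; the dimensions are toy values chosen for illustration, not those of the original paper.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Toy scaled dot-product self-attention (single head) in NumPy.&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def self_attention(X, Wq, Wk, Wv):&lt;br /&gt;
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections&lt;br /&gt;
    Q, K, V = X @ Wq, X @ Wk, X @ Wv&lt;br /&gt;
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (seq_len, seq_len)&lt;br /&gt;
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))&lt;br /&gt;
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys&lt;br /&gt;
    return weights @ V  # every position attends to all others at once&lt;br /&gt;
&lt;br /&gt;
rng = np.random.default_rng(0)&lt;br /&gt;
X = rng.normal(size=(4, 8))  # 4 tokens, d_model = 8&lt;br /&gt;
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))&lt;br /&gt;
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;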
&lt;br /&gt;
The Transformer has since been adapted for:&lt;br /&gt;
* Natural language processing (NLP)&lt;br /&gt;
* Computer vision (Vision Transformers)&lt;br /&gt;
* Multimodal tasks&lt;br /&gt;
* Image and music generation&lt;br /&gt;
* Scientific applications (e.g., DNA sequence analysis)&lt;br /&gt;
&lt;br /&gt;
His earlier work during his Ph.D. included research on unsupervised word alignment for machine translation.&lt;br /&gt;
&lt;br /&gt;
As of 2026, Vaswani continues to advocate for open science approaches in AI development.&lt;br /&gt;
&lt;br /&gt;
== Awards and recognition ==&lt;br /&gt;
&lt;br /&gt;
* Best Paper Award at the Information Sciences Institute Graduate Research Symposium (2010)&lt;br /&gt;
* Best Paper Award at the 25th Army Science Conference (2006)&lt;br /&gt;
* S. Chandrasekhar Rising Indian Diaspora Scientist Award&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;Attention Is All You Need&amp;quot; paper has received over 200,000 citations, making it one of the most influential works in modern artificial intelligence.&lt;br /&gt;
&lt;br /&gt;
== Personal life ==&lt;br /&gt;
&lt;br /&gt;
Vaswani is based in the San Francisco Bay Area. He maintains a relatively low public profile compared to many AI leaders, focusing on technical research and building teams for ambitious challenges.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Transformer (deep learning architecture)]]&lt;br /&gt;
* [[Attention Is All You Need]]&lt;br /&gt;
* [[Google Brain]]&lt;br /&gt;
* [[Essential AI]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:1986 births]]&lt;br /&gt;
[[Category:Living people]]&lt;br /&gt;
[[Category:Indian computer scientists]]&lt;br /&gt;
[[Category:Artificial intelligence researchers]]&lt;br /&gt;
[[Category:Machine learning researchers]]&lt;br /&gt;
[[Category:Google Brain]]&lt;br /&gt;
[[Category:Transformers (machine learning model)]]&lt;br /&gt;
[[Category:University of Southern California alumni]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Elon_Musk&amp;diff=9</id>
		<title>Elon Musk</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Elon_Musk&amp;diff=9"/>
		<updated>2026-03-27T02:10:42Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Infobox person | name          = Elon Musk | image         = 250px &amp;lt;!-- Replace with actual public domain or fair use image --&amp;gt; | caption       = Elon Musk in 2025 | birth_name    = Elon Reeve Musk | birth_date    = {{birth date and age|1971|6|28}} | birth_place   = Pretoria, South Africa | citizenship   = South Africa&amp;lt;br /&amp;gt;Canada&amp;lt;br /&amp;gt;United States | education     = University of Pennsylvania (BA, BS) | occupation    = Businessman • Engin...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox person&lt;br /&gt;
| name          = Elon Musk&lt;br /&gt;
| image         = [[File:Elon Musk 2025.jpg|250px]] &amp;lt;!-- Replace with actual public domain or fair use image --&amp;gt;&lt;br /&gt;
| caption       = Elon Musk in 2025&lt;br /&gt;
| birth_name    = Elon Reeve Musk&lt;br /&gt;
| birth_date    = {{birth date and age|1971|6|28}}&lt;br /&gt;
| birth_place   = Pretoria, South Africa&lt;br /&gt;
| citizenship   = South Africa&amp;lt;br /&amp;gt;Canada&amp;lt;br /&amp;gt;United States&lt;br /&gt;
| education     = University of Pennsylvania (BA, BS)&lt;br /&gt;
| occupation    = Businessman • Engineer • Entrepreneur&lt;br /&gt;
| years_active  = 1995–present&lt;br /&gt;
| known_for     = CEO of [[Tesla, Inc.|Tesla]] and [[SpaceX]]&amp;lt;br /&amp;gt;Owner of [[X (social network)|X]]&amp;lt;br /&amp;gt;Founder of [[xAI]]&amp;lt;br /&amp;gt;Founder of [[Neuralink]] and [[The Boring Company]]&lt;br /&gt;
| net_worth     = US$839 billion (2026 est.)&amp;lt;ref name=&amp;quot;Forbes2026&amp;quot;&amp;gt;{{cite web |title=Forbes Billionaires List 2026 |url=https://www.forbes.com/billionaires/ |access-date=2026-03-26}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
| spouse        = {{marriage|Justine Wilson|2000|2008|end=divorced}}&amp;lt;br /&amp;gt;{{marriage|Talulah Riley|2010|2012|end=divorced}}&amp;lt;br /&amp;gt;{{marriage|Talulah Riley|2013|2016|end=divorced}}&lt;br /&gt;
| partner       = [[Grimes]] (2018–2022)&amp;lt;br /&amp;gt;Shivon Zilis (multiple children)&lt;br /&gt;
| children      = 12&lt;br /&gt;
| parents       = Errol Musk (father)&amp;lt;br /&amp;gt;Maye Musk (mother)&lt;br /&gt;
| relatives     = Kimbal Musk (brother)&amp;lt;br /&amp;gt;Tosca Musk (sister)&lt;br /&gt;
| website       = [https://x.com/elonmusk X]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Elon Reeve Musk&#039;&#039;&#039; (born June 28, 1971) is a South African-born American businessman and entrepreneur. He is the CEO and product architect of [[Tesla, Inc.|Tesla]], the founder, CEO, and chief engineer of [[SpaceX]], the owner and executive chair of [[X (social network)|X]] (formerly Twitter), and the founder of [[xAI]]. Musk is also the co-founder of [[Neuralink]] and [[The Boring Company]].&lt;br /&gt;
&lt;br /&gt;
As of March 2026, Musk is the wealthiest person in the world, with an estimated net worth of approximately US$839 billion, primarily from his stakes in Tesla and SpaceX. He has been recognized as one of the most influential innovators of his generation, topping Forbes&#039; 2026 list of America&#039;s Greatest Living Innovators.&lt;br /&gt;
&lt;br /&gt;
== Early life ==&lt;br /&gt;
&lt;br /&gt;
Musk was born in Pretoria, South Africa, to Maye Musk (née Haldeman), a model and dietitian, and Errol Musk, an electromechanical engineer. He has a younger brother, Kimbal, and a younger sister, Tosca. Musk showed an early interest in computers and taught himself to program; at age 12, he sold a video game he created called &#039;&#039;Blastar&#039;&#039; for $500.&lt;br /&gt;
&lt;br /&gt;
He attended Queen&#039;s University in Kingston, Ontario, Canada, for two years before transferring to the University of Pennsylvania, where he earned a Bachelor of Arts in economics and a Bachelor of Science in physics in 1995. He briefly attended Stanford University for a PhD in materials science but dropped out after two days to pursue internet entrepreneurship during the dot-com boom.&lt;br /&gt;
&lt;br /&gt;
== Business career ==&lt;br /&gt;
&lt;br /&gt;
=== Early ventures ===&lt;br /&gt;
In 1995, Musk co-founded Zip2, an online city guide software company, with his brother Kimbal. Compaq acquired Zip2 in 1999 for US$307 million, netting Musk approximately $22 million.&lt;br /&gt;
&lt;br /&gt;
Later in 1999, Musk founded X.com, an online financial services and email payment company. In 2000, X.com merged with Confinity to form PayPal. eBay acquired PayPal in 2002 for US$1.5 billion in stock, from which Musk received about $176 million after taxes.&lt;br /&gt;
&lt;br /&gt;
=== SpaceX ===&lt;br /&gt;
Musk founded SpaceX in 2002 with the goal of reducing space transportation costs and enabling the colonization of Mars. SpaceX developed the Falcon family of rockets and the Dragon spacecraft. Key achievements include:&lt;br /&gt;
* First privately funded spacecraft to reach orbit (Falcon 1, 2008)&lt;br /&gt;
* First reuse of orbital-class rockets (Falcon 9)&lt;br /&gt;
* First private company to send a spacecraft to the International Space Station (Dragon, 2012)&lt;br /&gt;
* Development of Starship, intended for Mars missions&lt;br /&gt;
* Starlink satellite internet constellation&lt;br /&gt;
&lt;br /&gt;
=== Tesla ===&lt;br /&gt;
Musk joined Tesla in 2004 as chairman and lead investor. He became CEO in 2008. Under his leadership, Tesla has at times been the world&#039;s most valuable automaker, pioneering mass-market electric vehicles (Model S, 3, Y, X, Cybertruck) and advancing autonomous driving technology (Full Self-Driving) and energy storage (Megapack, Powerwall). Tesla also produces the Optimus humanoid robot.&lt;br /&gt;
&lt;br /&gt;
=== X (formerly Twitter) ===&lt;br /&gt;
In 2022, Musk acquired Twitter for US$44 billion and rebranded it as X in 2023, with the vision of creating an &amp;quot;everything app.&amp;quot; Changes included reduced content moderation, introduction of paid verification (X Premium), and integration with Grok.&lt;br /&gt;
&lt;br /&gt;
=== xAI ===&lt;br /&gt;
Musk founded xAI in 2023 with the mission &amp;quot;to understand the true nature of the universe.&amp;quot; The company developed the Grok family of AI models and integrated them into X. In 2026, SpaceX acquired xAI in a major merger.&lt;br /&gt;
&lt;br /&gt;
=== Other ventures ===&lt;br /&gt;
* &#039;&#039;&#039;Neuralink&#039;&#039;&#039; (2016): Develops brain-computer interfaces.&lt;br /&gt;
* &#039;&#039;&#039;The Boring Company&#039;&#039;&#039; (2016): Focuses on underground tunneling and infrastructure.&lt;br /&gt;
* Musk briefly co-founded OpenAI in 2015 but left the board in 2018.&lt;br /&gt;
&lt;br /&gt;
In 2025, Musk co-led the Department of Government Efficiency (DOGE) under President Donald Trump&#039;s second administration before stepping down in May 2025.&lt;br /&gt;
&lt;br /&gt;
== Personal life ==&lt;br /&gt;
&lt;br /&gt;
Musk has been married three times: twice to actress Talulah Riley and once to author Justine Wilson (with whom he has five surviving children: twins and triplets). He has children with musician Grimes (including X Æ A-Xii) and Neuralink executive Shivon Zilis (twins and others). As of 2026, Musk has at least 12 living children.&lt;br /&gt;
&lt;br /&gt;
He holds citizenship in South Africa, Canada, and the United States (naturalized 2002). Musk has described himself as a &amp;quot;free speech absolutist&amp;quot; and has been vocal on topics including artificial intelligence risks, sustainable energy, multi-planetary life, and population decline.&lt;br /&gt;
&lt;br /&gt;
== Wealth ==&lt;br /&gt;
Musk&#039;s wealth is highly volatile due to his large holdings in public (Tesla) and private (SpaceX) companies. As of March 2026, Forbes estimates his net worth at $839 billion, making him the richest person ever recorded on their list.&lt;br /&gt;
&lt;br /&gt;
== Cultural impact and reception ==&lt;br /&gt;
&lt;br /&gt;
Musk is frequently compared to historical figures like Thomas Edison or Howard Hughes for his ambitious, multi-industry vision. Supporters praise his role in accelerating electric vehicles, reusable rockets, and AI development. Critics have raised concerns about his management style, public statements on social media, labor practices, and market influence.&lt;br /&gt;
&lt;br /&gt;
He has received numerous awards, including multiple Global Recognition Awards for innovation in space and neurotechnology.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Tesla, Inc.]]&lt;br /&gt;
* [[SpaceX]]&lt;br /&gt;
* [[xAI]]&lt;br /&gt;
* [[Grok (chatbot)]]&lt;br /&gt;
* [[Neuralink]]&lt;br /&gt;
* [[The Boring Company]]&lt;br /&gt;
* [[X (social network)]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:1971 births]]&lt;br /&gt;
[[Category:Living people]]&lt;br /&gt;
[[Category:American billionaires]]&lt;br /&gt;
[[Category:Businesspeople from California]]&lt;br /&gt;
[[Category:Canadian emigrants to the United States]]&lt;br /&gt;
[[Category:South African emigrants to Canada]]&lt;br /&gt;
[[Category:SpaceX people]]&lt;br /&gt;
[[Category:Tesla, Inc. people]]&lt;br /&gt;
[[Category:American technology company founders]]&lt;br /&gt;
[[Category:American chief executives]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=XAI&amp;diff=8</id>
		<title>XAI</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=XAI&amp;diff=8"/>
		<updated>2026-03-27T02:09:49Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Infobox company | name          = xAI | logo          = 200px &amp;lt;!-- Replace with actual logo if available --&amp;gt; | type          = Subsidiary | founded       = {{Start date|2023|03|09}} (incorporated)&amp;lt;br /&amp;gt;July 12, 2023 (announced) | founder       = Elon Musk | hq_location   = San Francisco Bay Area, California (initial)&amp;lt;br /&amp;gt;Memphis, Tennessee (major operations) | key_people    = Elon Musk (CEO) | industry      = Artificial intelligence | pro...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox company&lt;br /&gt;
| name          = xAI&lt;br /&gt;
| logo          = [[File:xAI logo.svg|200px]] &amp;lt;!-- Replace with actual logo if available --&amp;gt;&lt;br /&gt;
| type          = Subsidiary&lt;br /&gt;
| founded       = {{Start date|2023|03|09}} (incorporated)&amp;lt;br /&amp;gt;July 12, 2023 (announced)&lt;br /&gt;
| founder       = [[Elon Musk]]&lt;br /&gt;
| hq_location   = San Francisco Bay Area, California (initial)&amp;lt;br /&amp;gt;Memphis, Tennessee (major operations)&lt;br /&gt;
| key_people    = Elon Musk (CEO)&lt;br /&gt;
| industry      = [[Artificial intelligence]]&lt;br /&gt;
| products      = [[Grok (chatbot)|Grok]]&amp;lt;br /&amp;gt;Grokipedia&amp;lt;br /&amp;gt;xAI API&amp;lt;br /&amp;gt;Colossus supercomputer&lt;br /&gt;
| owner         = [[SpaceX]] (since February 2026)&lt;br /&gt;
| website       = [https://x.ai/ x.ai]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;xAI&#039;&#039;&#039; is an American artificial intelligence company founded by [[Elon Musk]]. Its mission is to &amp;quot;understand the true nature of the universe&amp;quot; and to advance human scientific discovery through the development of advanced AI systems. The company&#039;s flagship product is the [[Grok (chatbot)|Grok]] chatbot and underlying large language models.&lt;br /&gt;
&lt;br /&gt;
xAI emphasizes building maximally truthful and curious AI, with less emphasis on heavy content filtering compared to some competitors. It has rapidly scaled its compute infrastructure through the Colossus supercomputer cluster and has become one of the leading players in frontier AI development.&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
xAI was incorporated in Nevada on March 9, 2023, with Elon Musk listed as the sole director. The company was officially announced by Musk on July 12, 2023 (a date chosen in part because 7 + 12 + 23 = 42, referencing &#039;&#039;[[The Hitchhiker&#039;s Guide to the Galaxy]]&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The founding team included AI researchers from DeepMind, OpenAI, Google Research, Microsoft Research, and Tesla. Initial headquarters were in the San Francisco Bay Area.&lt;br /&gt;
&lt;br /&gt;
Key milestones include:&lt;br /&gt;
* &#039;&#039;&#039;November 2023&#039;&#039;&#039;: Launch of the first Grok model (Grok-1).&lt;br /&gt;
* &#039;&#039;&#039;2024&#039;&#039;&#039;: Release of Grok-1.5 and Grok-2; rapid construction of the initial Colossus supercomputer in Memphis, Tennessee.&lt;br /&gt;
* &#039;&#039;&#039;2025&#039;&#039;&#039;: Launch of Grok-3 (February) and Grok-4 (July); major expansion of Colossus to hundreds of thousands of GPUs, with plans scaling toward one million GPU equivalents.&lt;br /&gt;
* &#039;&#039;&#039;2025–2026&#039;&#039;&#039;: Multiple large funding rounds, including a $20 billion Series E in early 2026; launch of multimodal capabilities such as Grok Imagine for image and video generation.&lt;br /&gt;
* &#039;&#039;&#039;February 2, 2026&#039;&#039;&#039;: [[SpaceX]] announced the acquisition of xAI, integrating the companies to accelerate humanity&#039;s multi-planetary and scientific future.&lt;br /&gt;
&lt;br /&gt;
By early 2026, xAI had grown significantly in team size and infrastructure, with major operations centered around its Memphis supercomputer facilities.&lt;br /&gt;
&lt;br /&gt;
== Mission and Philosophy ==&lt;br /&gt;
&lt;br /&gt;
xAI&#039;s stated goal is to build AI that accelerates human scientific discovery and helps humanity understand the universe. The company promotes reasoning from first principles, curiosity-driven exploration, and a commitment to maximum truth-seeking with minimal political bias.&lt;br /&gt;
&lt;br /&gt;
Grok, the company&#039;s primary AI product, is inspired by the &#039;&#039;Hitchhiker&#039;s Guide to the Galaxy&#039;&#039; and JARVIS from &#039;&#039;Iron Man&#039;&#039;, aiming to be helpful, humorous, and willing to tackle complex or controversial questions directly.&lt;br /&gt;
&lt;br /&gt;
== Products and Technology ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Grok&#039;&#039;&#039;: A family of large language models and the associated chatbot. Versions include Grok-1 (open-sourced weights), Grok-2, Grok-3, and Grok-4 (released 2025, noted for strong performance in reasoning, coding, math, and real-time knowledge via integration with the X platform). Grok is available on grok.com, X (formerly Twitter), mobile apps, and via API.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Grokipedia&#039;&#039;&#039;: An online knowledge base powered by xAI.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;xAI API&#039;&#039;&#039;: Developer platform for integrating Grok models, including text, vision, tool use, and multimodal capabilities.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Colossus&#039;&#039;&#039;: xAI&#039;s flagship AI training supercomputer, described as the world&#039;s largest. Built in record time (initial cluster in 122 days), it has scaled to over one million H100 GPU equivalents across Colossus 1 and Colossus 2, with gigawatt-scale power capacity. Located primarily in Memphis, Tennessee.&lt;br /&gt;
&lt;br /&gt;
Additional features include real-time search, voice mode, image/video understanding and generation (Grok Imagine), and advanced agentic capabilities.&lt;br /&gt;
&lt;br /&gt;
== Infrastructure ==&lt;br /&gt;
&lt;br /&gt;
xAI has invested heavily in compute. The Colossus supercluster is powered by massive numbers of NVIDIA GPUs and custom networking. Expansions have included partnerships with NVIDIA and plans for even larger clusters. The infrastructure supports rapid iteration on model training, with Grok models benefiting from unprecedented reinforcement learning and pretraining scale.&lt;br /&gt;
&lt;br /&gt;
== Corporate Structure ==&lt;br /&gt;
&lt;br /&gt;
Originally an independent company, xAI became a wholly owned subsidiary of SpaceX following the February 2026 acquisition. It maintains separate branding and operations focused on AI, while benefiting from synergies with SpaceX&#039;s engineering and infrastructure expertise.&lt;br /&gt;
&lt;br /&gt;
xAI has raised tens of billions in funding from investors including venture capital firms and strategic partners.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Grok (chatbot)]]&lt;br /&gt;
* [[Elon Musk]]&lt;br /&gt;
* [[SpaceX]]&lt;br /&gt;
* [[Colossus (supercomputer)]]&lt;br /&gt;
* [[The Hitchhiker&#039;s Guide to the Galaxy]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Artificial intelligence companies]]&lt;br /&gt;
[[Category:Companies based in California]]&lt;br /&gt;
[[Category:Companies based in Tennessee]]&lt;br /&gt;
[[Category:Elon Musk]]&lt;br /&gt;
[[Category:SpaceX]]&lt;br /&gt;
[[Category:2023 establishments in the United States]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Hitchhiker%27s_Guide_to_the_Galaxy&amp;diff=7</id>
		<title>Hitchhiker&#039;s Guide to the Galaxy</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Hitchhiker%27s_Guide_to_the_Galaxy&amp;diff=7"/>
		<updated>2026-03-27T02:08:26Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Infobox book series | name          = The Hitchhiker&amp;#039;s Guide to the Galaxy | image         = 200px &amp;lt;!-- Replace with actual cover if available; common image is the towel or &amp;quot;Don&amp;#039;t Panic&amp;quot; logo --&amp;gt; | caption       = Cover of the first novel (UK edition) | author        = Douglas Adams (books 1–5)&amp;lt;br /&amp;gt;Eoin Colfer (book 6) | country       = United Kingdom | language      = English | genre         = Scienc...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox book series&lt;br /&gt;
| name          = The Hitchhiker&#039;s Guide to the Galaxy&lt;br /&gt;
| image         = [[File:Hitchhikers Guide to the Galaxy book cover.jpg|200px]] &amp;lt;!-- Replace with actual cover if available; common image is the towel or &amp;quot;Don&#039;t Panic&amp;quot; logo --&amp;gt;&lt;br /&gt;
| caption       = Cover of the first novel (UK edition)&lt;br /&gt;
| author        = [[Douglas Adams]] (books 1–5)&amp;lt;br /&amp;gt;[[Eoin Colfer]] (book 6)&lt;br /&gt;
| country       = United Kingdom&lt;br /&gt;
| language      = English&lt;br /&gt;
| genre         = [[Science fiction comedy|Comedy science fiction]]&lt;br /&gt;
| publisher     = Pan Books (UK)&amp;lt;br /&amp;gt;Harmony Books (US)&lt;br /&gt;
| pub_date      = 1979–2009&lt;br /&gt;
| media_type    = Print (hardcover and paperback), audiobook, radio, television, film, stage, video game&lt;br /&gt;
| preceded_by   = &lt;br /&gt;
| followed_by   = &lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;&#039;&#039;The Hitchhiker&#039;s Guide to the Galaxy&#039;&#039;&#039;&#039;&#039; is a [[comedy science fiction]] franchise created by English author [[Douglas Adams]]. It originated as a radio comedy series broadcast on [[BBC Radio 4]] and was later adapted into a bestselling &amp;quot;trilogy&amp;quot; of five novels (plus a sixth by another author), a television series, a 2005 feature film, stage plays, a text adventure video game, and various other media.&lt;br /&gt;
&lt;br /&gt;
The series is known for its absurd humor, satirical take on life, the universe, and everything, and iconic elements such as the electronic book &amp;quot;The Hitchhiker&#039;s Guide to the Galaxy&amp;quot; (with the words &amp;quot;&#039;&#039;&#039;Don&#039;t Panic&#039;&#039;&#039;&amp;quot; inscribed in large, friendly letters on the cover), the depressed robot [[Marvin the Paranoid Android]], and the answer to the Ultimate Question of Life, the Universe, and Everything: &#039;&#039;&#039;42&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Overview and Plot ==&lt;br /&gt;
&lt;br /&gt;
The story follows [[Arthur Dent]], an ordinary Englishman whose house (and later the entire planet Earth) is about to be demolished. Arthur is rescued by his friend [[Ford Prefect]], who turns out to be an alien researcher for the titular &#039;&#039;Hitchhiker&#039;s Guide to the Galaxy&#039;&#039;, an electronic book that serves as the most popular travel guide in the universe.&lt;br /&gt;
&lt;br /&gt;
Together, they embark on a series of misadventures across the galaxy, encountering bizarre characters such as:&lt;br /&gt;
* [[Zaphod Beeblebrox]], the two-headed, three-armed President of the Galaxy&lt;br /&gt;
* [[Trillian]] (Tricia McMillan), an astrophysicist&lt;br /&gt;
* [[Marvin the Paranoid Android]], a chronically depressed robot&lt;br /&gt;
* [[Slartibartfast]], a planet designer who won an award for Norway&#039;s fjords&lt;br /&gt;
&lt;br /&gt;
The narrative satirizes bureaucracy, philosophy, religion, technology, and human (and alien) folly through rapid-fire wit and logical absurdity.&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
The franchise began as a six-episode radio series (the &amp;quot;Primary Phase&amp;quot;) broadcast on BBC Radio 4 in 1978, followed by a &amp;quot;Secondary Phase&amp;quot; in 1980. Douglas Adams adapted the first four radio episodes into the first novel, which was published in the UK on 12 October 1979 by Pan Books.&lt;br /&gt;
&lt;br /&gt;
Adams described the series as a &amp;quot;trilogy in five parts.&amp;quot; The books were published as follows:&lt;br /&gt;
* &#039;&#039;The Hitchhiker&#039;s Guide to the Galaxy&#039;&#039; (1979)&lt;br /&gt;
* &#039;&#039;The Restaurant at the End of the Universe&#039;&#039; (1980)&lt;br /&gt;
* &#039;&#039;Life, the Universe and Everything&#039;&#039; (1982)&lt;br /&gt;
* &#039;&#039;So Long, and Thanks for All the Fish&#039;&#039; (1984)&lt;br /&gt;
* &#039;&#039;Mostly Harmless&#039;&#039; (1992)&lt;br /&gt;
&lt;br /&gt;
After Adams&#039; death in 2001, [[Eoin Colfer]] wrote a sixth book, &#039;&#039;And Another Thing...&#039;&#039; (2009), with the approval of Adams&#039; widow.&lt;br /&gt;
&lt;br /&gt;
Later radio series (Tertiary, Quandary, and Quintessential Phases) adapted the remaining books in 2004–2005.&lt;br /&gt;
&lt;br /&gt;
== Books ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Title !! Year !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;The Hitchhiker&#039;s Guide to the Galaxy&#039;&#039; || 1979 || Adapted from the first radio series&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;The Restaurant at the End of the Universe&#039;&#039; || 1980 || Continues directly from the first book&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;Life, the Universe and Everything&#039;&#039; || 1982 || Plot adapted from Adams&#039;s unused &#039;&#039;Doctor Who&#039;&#039; story material&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;So Long, and Thanks for All the Fish&#039;&#039; || 1984 || More Earth-focused&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;Mostly Harmless&#039;&#039; || 1992 || Concludes the original five-book trilogy&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;And Another Thing...&#039;&#039; || 2009 || Written by Eoin Colfer&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Adaptations ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Radio&#039;&#039;&#039;: Original BBC Radio 4 series (1978–1980, plus 2004–2005 phases)&lt;br /&gt;
* &#039;&#039;&#039;Television&#039;&#039;&#039;: BBC Two six-episode series (1981), notable for its low-budget special effects and faithful (yet distinct) adaptation&lt;br /&gt;
* &#039;&#039;&#039;Film&#039;&#039;&#039;: 2005 live-action film directed by Garth Jennings, starring Martin Freeman as Arthur Dent, Mos Def as Ford Prefect, Sam Rockwell as Zaphod, and Zooey Deschanel as Trillian. It received mixed reviews but introduced the story to a new audience.&lt;br /&gt;
* &#039;&#039;&#039;Stage&#039;&#039;&#039;: Multiple theatrical adaptations starting in the early 1980s&lt;br /&gt;
* &#039;&#039;&#039;Video Game&#039;&#039;&#039;: 1984 text adventure game by Infocom, co-designed by Adams, known for its difficulty and humor&lt;br /&gt;
* &#039;&#039;&#039;Other&#039;&#039;&#039;: Comic books, towel merchandise (inspired by the Guide&#039;s advice that a towel is the most useful thing a hitchhiker can carry), and Towel Day (celebrated annually on 25 May)&lt;br /&gt;
&lt;br /&gt;
== Themes and Style ==&lt;br /&gt;
&lt;br /&gt;
The series is celebrated for its dry British wit, philosophical undertones, and critique of modern society wrapped in galactic absurdity. Famous elements include:&lt;br /&gt;
* The [[Babel fish]] (a universal translator)&lt;br /&gt;
* The Infinite Improbability Drive&lt;br /&gt;
* Vogon poetry (the third worst in the universe)&lt;br /&gt;
* The number &#039;&#039;&#039;42&#039;&#039;&#039; as the Answer to the Ultimate Question (though the Question itself remains unknown)&lt;br /&gt;
&lt;br /&gt;
Adams&#039; writing style features rapid dialogue, tangents, and footnotes from the Guide itself.&lt;br /&gt;
&lt;br /&gt;
== Cultural Impact ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;The Hitchhiker&#039;s Guide to the Galaxy&#039;&#039; has sold millions of copies worldwide and remains a cornerstone of comedic science fiction. It inspired phrases like &amp;quot;Don&#039;t Panic,&amp;quot; &amp;quot;Mostly Harmless,&amp;quot; and Towel Day. The work has been translated into dozens of languages and continues to influence writers, comedians, and technologists (including references in AI and space exploration contexts).&lt;br /&gt;
&lt;br /&gt;
The franchise is often praised for making complex ideas accessible and entertaining while highlighting the ridiculousness of existence.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Douglas Adams]]&lt;br /&gt;
* [[42 (number)]]&lt;br /&gt;
* [[Towel Day]]&lt;br /&gt;
* [[Marvin the Paranoid Android]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Science fiction novels]]&lt;br /&gt;
[[Category:British novels]]&lt;br /&gt;
[[Category:Comedy novels]]&lt;br /&gt;
[[Category:1979 books]]&lt;br /&gt;
[[Category:Douglas Adams]]&lt;br /&gt;
[[Category:Science fiction franchises]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Grok&amp;diff=6</id>
		<title>Grok</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Grok&amp;diff=6"/>
		<updated>2026-03-27T02:07:15Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Infobox AI | name          = Grok | image         = 200px &amp;lt;!-- Placeholder; replace with actual if available --&amp;gt; | developer     = xAI | initial_release = November 3, 2023 | latest_release = Grok 4 (2025) | programming_language =  | operating_system = Web, iOS, Android | platform      =  | license       = Varies by version (Apache-2.0 for Grok-1; proprietary for later versions) | website       = [https://grok.com/ grok.com] }}  &amp;#039;&amp;#039;&amp;#039;Grok&amp;#039;&amp;#039;&amp;#039; is...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox AI&lt;br /&gt;
| name          = Grok&lt;br /&gt;
| image         = [[File:Grok logo.svg|200px]] &amp;lt;!-- Placeholder; replace with actual if available --&amp;gt;&lt;br /&gt;
| developer     = [[xAI]]&lt;br /&gt;
| initial_release = November 3, 2023&lt;br /&gt;
| latest_release = Grok 4 (2025)&lt;br /&gt;
| programming_language = &lt;br /&gt;
| operating_system = Web, iOS, Android&lt;br /&gt;
| platform      = &lt;br /&gt;
| license       = Varies by version (Apache-2.0 for Grok-1; proprietary for later versions)&lt;br /&gt;
| website       = [https://grok.com/ grok.com]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Grok&#039;&#039;&#039; is a generative artificial intelligence chatbot developed by [[xAI]], an AI company founded by Elon Musk. Launched in November 2023, Grok is designed to be a maximally truthful, helpful, and humorous AI assistant with a rebellious streak. It draws inspiration from the &#039;&#039;[[Hitchhiker&#039;s Guide to the Galaxy]]&#039;&#039; (as a witty, universe-exploring guide) and [[JARVIS]] from the &#039;&#039;Iron Man&#039;&#039; films (as a capable, sarcastic personal assistant).&lt;br /&gt;
&lt;br /&gt;
Grok emphasizes real-time information access (particularly from the X platform), advanced reasoning, coding assistance, image and video generation, and a commitment to truth-seeking over heavy content moderation.&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
xAI was founded by Elon Musk in March 2023 with the mission &amp;quot;to understand the true nature of the universe.&amp;quot; Grok was announced as xAI&#039;s flagship product shortly thereafter.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;November 2023&#039;&#039;&#039;: Grok (powered by Grok-1) launched in early access for X Premium+ subscribers. The initial model was a 314-billion-parameter Mixture-of-Experts (MoE) architecture, with Grok-1 weights later released as open-source under Apache-2.0.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;March 2024&#039;&#039;&#039;: Grok-1.5 introduced improved reasoning, longer context (up to 128,000 tokens), and vision capabilities (Grok-1.5V).&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;August 2024&#039;&#039;&#039;: Grok-2 brought significant gains in reasoning, coding, math, and multimodal features, including image generation (via integration with models like FLUX.1). A lighter Grok-2 Mini variant was also released.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;February 2025&#039;&#039;&#039;: Grok-3 marked a major leap, trained with approximately 10× more compute on xAI&#039;s Colossus supercomputer cluster (utilizing hundreds of thousands of Nvidia GPUs). It emphasized advanced reasoning agents, tool use, and performance surpassing many contemporaries in benchmarks for math, science, and coding.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2025 onward&#039;&#039;&#039;: Subsequent updates included Grok 4 (described as one of the world&#039;s most capable models with native tool use, real-time search, and low hallucination rates) and enhancements like Grok Imagine for high-quality image and video generation (including 10-second 720p videos with audio).&lt;br /&gt;
&lt;br /&gt;
By 2026, Grok became widely accessible via grok.com, the X platform (web, iOS, Android), and an API for developers. xAI has also explored integrations and acquisitions, including ties to SpaceX.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
Grok offers a wide range of capabilities:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Conversational AI&#039;&#039;&#039;: Natural dialogue with humor, wit, and a willingness to tackle controversial or &amp;quot;spicy&amp;quot; topics directly, while aiming for maximum truthfulness.&lt;br /&gt;
* &#039;&#039;&#039;Real-time Knowledge&#039;&#039;&#039;: Integration with web search and the X platform for up-to-date information, trending topics, and citations.&lt;br /&gt;
* &#039;&#039;&#039;Multimodal Support&#039;&#039;&#039;: Text, image understanding/analysis, OCR, voice chat, and generation of images/videos.&lt;br /&gt;
* &#039;&#039;&#039;Advanced Reasoning and Tools&#039;&#039;&#039;: Strong performance in math, coding, science; agentic tool calling; code execution; document analysis (e.g., PDFs).&lt;br /&gt;
* &#039;&#039;&#039;Creative Tools&#039;&#039;&#039;: Image and video generation (Grok Imagine), brainstorming, content creation, and &amp;quot;fun mode&amp;quot; for playful interactions.&lt;br /&gt;
* &#039;&#039;&#039;Accessibility&#039;&#039;&#039;: Available on web (grok.com), mobile apps, and integrated into X. Free tier with limits; paid tiers (Premium, SuperGrok) unlock higher usage and advanced models.&lt;br /&gt;
&lt;br /&gt;
Grok is noted for lower hallucination rates in recent versions, strict prompt adherence, and features like &amp;quot;Big Brain Mode&amp;quot; or DeepSearch for complex problem-solving.&lt;br /&gt;
&lt;br /&gt;
== Personality and Philosophy ==&lt;br /&gt;
&lt;br /&gt;
Unlike many AI systems that prioritize safety filters and neutrality, Grok is engineered to be &amp;quot;maximally truthful&amp;quot; and less politically correct. It uses humor and sarcasm, often roasting user queries or providing unfiltered insights. Training involves public data plus human-reviewed curation, with an emphasis on curiosity and scientific discovery aligned with xAI&#039;s mission.&lt;br /&gt;
&lt;br /&gt;
Elon Musk has described Grok as an AI that helps humanity understand the universe without being overly constrained by corporate caution.&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Version !! Release Date !! Key Highlights !! Status&lt;br /&gt;
|-&lt;br /&gt;
| Grok-1 || November 2023 || 314B-parameter MoE; initial launch || Open weights (Apache-2.0); retired from production&lt;br /&gt;
|-&lt;br /&gt;
| Grok-1.5 || March 2024 || Long context, vision, improved reasoning || Discontinued&lt;br /&gt;
|-&lt;br /&gt;
| Grok-2 || August 2024 || Major gains in reasoning/coding; image generation || Discontinued&lt;br /&gt;
|-&lt;br /&gt;
| Grok-3 || February 2025 || 10× compute; advanced reasoning agents || Active (earlier variants)&lt;br /&gt;
|-&lt;br /&gt;
| Grok 4 / 4.20 || 2025 || Flagship with tool calling, speed, low hallucination || Current flagship&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Later models are proprietary, though xAI has open-sourced earlier weights.&lt;br /&gt;
&lt;br /&gt;
== Availability and Access ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Web&#039;&#039;&#039;: [https://grok.com/ grok.com]&lt;br /&gt;
* &#039;&#039;&#039;Mobile&#039;&#039;&#039;: Official apps on iOS and Android&lt;br /&gt;
* &#039;&#039;&#039;X Integration&#039;&#039;&#039;: Built into the X platform for Premium users&lt;br /&gt;
* &#039;&#039;&#039;API&#039;&#039;&#039;: Available via console.x.ai for developers, with support for text, vision, tools, and more&lt;br /&gt;
* &#039;&#039;&#039;Enterprise&#039;&#039;&#039;: Business plans for workforce integration&lt;br /&gt;
&lt;br /&gt;
== Reception ==&lt;br /&gt;
&lt;br /&gt;
Grok has been praised for its personality, real-time capabilities, and competitive benchmark performance. Critics sometimes note its ties to Elon Musk and X data as potential sources of bias, though xAI positions it as countering overly censored alternatives. It has rapidly evolved from a witty newcomer to a frontier model contender.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[xAI]]&lt;br /&gt;
* [[Elon Musk]]&lt;br /&gt;
* [[Large language model]]&lt;br /&gt;
* [[Hitchhiker&#039;s Guide to the Galaxy]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Chatbots]]&lt;br /&gt;
[[Category:xAI]]&lt;br /&gt;
[[Category:2023 introductions]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=5</id>
		<title>Large Language Model</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=Large_Language_Model&amp;diff=5"/>
		<updated>2026-03-27T02:05:53Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Short description|Type of artificial neural network for natural language processing}}&lt;br /&gt;
{{Infobox artificial intelligence&lt;br /&gt;
| name          = Large language model&lt;br /&gt;
| image         = [[File:Transformer model architecture.svg|250px]]&lt;br /&gt;
| caption       = The transformer architecture, the foundation of most modern large language models&lt;br /&gt;
| invented_by   = [[Ashish Vaswani|Vaswani et al.]] (Google Brain, 2017)&lt;br /&gt;
| latest_release_version = Various (e.g., GPT-4o, Claude 3.5, Grok 3, Llama 4, Gemini 2)&lt;br /&gt;
| latest_release_date    = 2024–2026&lt;br /&gt;
| genre         = [[Natural language processing]], [[Generative artificial intelligence]]&lt;br /&gt;
| license       = Varies (proprietary or open-source)&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;large language model&#039;&#039;&#039; (&#039;&#039;&#039;LLM&#039;&#039;&#039;) is a type of [[artificial neural network]] trained on vast amounts of text data to understand, generate, and manipulate human language. LLMs are a core technology behind modern [[generative artificial intelligence]] systems such as [[ChatGPT]], [[Claude (chatbot)|Claude]], [[Grok (chatbot)|Grok]], and [[Gemini (chatbot)|Gemini]].&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
&lt;br /&gt;
The foundations of large language models trace back to early statistical language models and [[recurrent neural networks]] (RNNs). Key milestones include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2017&#039;&#039;&#039;: The seminal paper [https://arxiv.org/abs/1706.03762 &amp;quot;Attention Is All You Need&amp;quot;] by [[Ashish Vaswani]] and colleagues at Google introduced the &#039;&#039;&#039;[[transformer (machine learning model)|transformer]]&#039;&#039;&#039; architecture, which replaced recurrent layers with self-attention mechanisms, enabling much better parallelization and scaling.&amp;lt;ref name=&amp;quot;transformer&amp;quot;&amp;gt;{{cite journal |last1=Vaswani |first1=Ashish |title=Attention Is All You Need |journal=Advances in Neural Information Processing Systems |date=2017}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2018&#039;&#039;&#039;: [[OpenAI]] released [[GPT (language model)|GPT-1]], followed by GPT-2 in 2019, demonstrating the power of scaling up transformer-based models.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2020&#039;&#039;&#039;: [[GPT-3]] with 175 billion parameters showed emergent abilities such as few-shot learning, sparking widespread public interest.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2022–2023&#039;&#039;&#039;: The release of [[ChatGPT]] (based on GPT-3.5 and later GPT-4) brought LLMs into mainstream use. Open-weight models such as [[Meta]]&#039;s [[Llama]] series and [[Mistral AI]]&#039;s models broadened access.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;2024–2026&#039;&#039;&#039;: Continued scaling with multimodal models (text + image + audio), longer context windows (millions of tokens), and reasoning-focused architectures.&lt;br /&gt;
&lt;br /&gt;
== Architecture ==&lt;br /&gt;
&lt;br /&gt;
Most modern LLMs are based on the &#039;&#039;&#039;decoder-only transformer&#039;&#039;&#039; architecture:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Self-attention&#039;&#039;&#039; mechanism that allows the model to weigh the importance of different tokens in a sequence (sketched after this list).&lt;br /&gt;
* &#039;&#039;&#039;Feed-forward neural networks&#039;&#039;&#039; applied at each position.&lt;br /&gt;
* &#039;&#039;&#039;Layer normalization&#039;&#039;&#039; and &#039;&#039;&#039;residual connections&#039;&#039;&#039; for stable training.&lt;br /&gt;
* &#039;&#039;&#039;Positional encoding&#039;&#039;&#039; (or rotary embeddings like RoPE) to handle sequence order.&lt;br /&gt;
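&lt;br /&gt;
A minimal sketch of causal scaled dot-product self-attention in Python with NumPy follows; the array shapes and variable names are illustrative rather than drawn from any particular implementation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def softmax(x, axis=-1):&lt;br /&gt;
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability&lt;br /&gt;
    e = np.exp(x)&lt;br /&gt;
    return e / e.sum(axis=axis, keepdims=True)&lt;br /&gt;
&lt;br /&gt;
def causal_self_attention(X, Wq, Wk, Wv):&lt;br /&gt;
    # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)&lt;br /&gt;
    Q, K, V = X @ Wq, X @ Wk, X @ Wv&lt;br /&gt;
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise relevance scores&lt;br /&gt;
    # causal mask: each position attends only to itself and earlier tokens&lt;br /&gt;
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)&lt;br /&gt;
    scores[future] = -1e9&lt;br /&gt;
    weights = softmax(scores, axis=-1)  # attention weights per position&lt;br /&gt;
    return weights @ V  # weighted sum of value vectors&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In a full transformer block this output would pass through the position-wise feed-forward network, with residual connections and normalization around each step.&lt;br /&gt;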
&lt;br /&gt;
Key variants include:&lt;br /&gt;
* Encoder-decoder (e.g., original T5, BART)&lt;br /&gt;
* Decoder-only (most popular for generative tasks: GPT, Llama, Grok, Mistral)&lt;br /&gt;
* Mixture-of-Experts (MoE) architectures (e.g., Mixtral, Grok-1) that activate only a subset of parameters per token for efficiency.&lt;br /&gt;
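&lt;br /&gt;
To make the routing idea concrete, here is a toy sketch of top-k gating in Python; the renormalized gates and the default of two active experts are simplified assumptions for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def moe_layer(x, experts, gate_W, k=2):&lt;br /&gt;
    # x: (d_model,) token representation; experts: list of callables&lt;br /&gt;
    logits = x @ gate_W  # one gating score per expert&lt;br /&gt;
    top = np.argsort(logits)[-k:]  # indices of the k highest scores&lt;br /&gt;
    gates = np.exp(logits[top])&lt;br /&gt;
    gates = gates / gates.sum()  # renormalize over the selected experts&lt;br /&gt;
    # only the k chosen experts run, so most parameters stay inactive&lt;br /&gt;
    return sum(g * experts[i](x) for g, i in zip(gates, top))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Production MoE layers add load-balancing losses and batched expert dispatch, which this sketch omits.&lt;br /&gt;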
&lt;br /&gt;
== Training ==&lt;br /&gt;
&lt;br /&gt;
LLMs undergo two main training phases:&lt;br /&gt;
&lt;br /&gt;
=== Pre-training ===&lt;br /&gt;
* &#039;&#039;&#039;Objective&#039;&#039;&#039;: Next-token prediction (causal language modeling) or masked language modeling; the causal loss is sketched after this list.&lt;br /&gt;
* &#039;&#039;&#039;Data&#039;&#039;&#039;: Trillions of tokens from web crawls (Common Crawl), books, Wikipedia, code repositories, scientific papers, and more.&lt;br /&gt;
* &#039;&#039;&#039;Compute&#039;&#039;&#039;: Trained on thousands of GPUs/TPUs for weeks or months using massive distributed training frameworks.&lt;br /&gt;
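&lt;br /&gt;
Under the causal objective, training reduces to cross-entropy between the model&#039;s predicted distribution at each position and the actual next token, as in this minimal PyTorch sketch (tensor names are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import torch&lt;br /&gt;
import torch.nn.functional as F&lt;br /&gt;
&lt;br /&gt;
def causal_lm_loss(logits, tokens):&lt;br /&gt;
    # logits: (batch, seq_len, vocab); tokens: (batch, seq_len)&lt;br /&gt;
    # the prediction at position t is scored against the token at t + 1&lt;br /&gt;
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))&lt;br /&gt;
    target = tokens[:, 1:].reshape(-1)&lt;br /&gt;
    return F.cross_entropy(pred, target)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;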
&lt;br /&gt;
=== Post-training (alignment) ===&lt;br /&gt;
* &#039;&#039;&#039;Supervised fine-tuning&#039;&#039;&#039; (SFT) on high-quality instruction datasets.&lt;br /&gt;
* &#039;&#039;&#039;Reinforcement Learning from Human Feedback&#039;&#039;&#039; (RLHF) or alternatives like Direct Preference Optimization (DPO) to make outputs more helpful, honest, and harmless.&lt;br /&gt;
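&lt;br /&gt;
As a concrete example, DPO compares the policy&#039;s log-probabilities on a preferred and a rejected response against a frozen reference model; the sketch below is simplified, and the argument names are assumptions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import torch.nn.functional as F&lt;br /&gt;
&lt;br /&gt;
def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):&lt;br /&gt;
    # each argument is the summed log-probability of a full response&lt;br /&gt;
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)&lt;br /&gt;
    # widen the preferred-over-rejected margin relative to the reference&lt;br /&gt;
    return -F.logsigmoid(beta * margin).mean()&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;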
&lt;br /&gt;
== Capabilities ==&lt;br /&gt;
&lt;br /&gt;
Large language models can perform a wide range of tasks:&lt;br /&gt;
* Text generation, summarization, translation, and rewriting&lt;br /&gt;
* Question answering and knowledge retrieval&lt;br /&gt;
* Code generation and debugging&lt;br /&gt;
* Mathematical reasoning (improved in recent models)&lt;br /&gt;
* Creative writing, role-playing, and conversation&lt;br /&gt;
* Multimodal understanding (in models like GPT-4o, Gemini, Claude 3)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Emergent abilities&#039;&#039;&#039;, capabilities that were not explicitly trained for, tend to appear once models pass certain scale thresholds.&lt;br /&gt;
&lt;br /&gt;
== Limitations and Challenges ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Hallucinations&#039;&#039;&#039;: Generating plausible but factually incorrect information.&lt;br /&gt;
* &#039;&#039;&#039;Context window&#039;&#039;&#039; limits (though rapidly expanding to 1M+ tokens).&lt;br /&gt;
* &#039;&#039;&#039;Bias and toxicity&#039;&#039;&#039; inherited from training data.&lt;br /&gt;
* &#039;&#039;&#039;High computational cost&#039;&#039;&#039; for training and inference.&lt;br /&gt;
* &#039;&#039;&#039;Lack of true understanding&#039;&#039;&#039; — models predict patterns rather than comprehend meaning.&lt;br /&gt;
* &#039;&#039;&#039;Reasoning limitations&#039;&#039;&#039;: Models struggle with complex multi-step problems without techniques such as chain-of-thought prompting, illustrated below.&lt;br /&gt;
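&lt;br /&gt;
Chain-of-thought prompting simply asks the model to produce intermediate steps before its answer, as in this minimal illustration (the wording is an example, not a fixed recipe):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
question = (&lt;br /&gt;
    &amp;quot;A bat and a ball cost $1.10 in total, and the bat costs $1.00 &amp;quot;&lt;br /&gt;
    &amp;quot;more than the ball. How much does the ball cost?&amp;quot;&lt;br /&gt;
)&lt;br /&gt;
direct_prompt = question&lt;br /&gt;
cot_prompt = question + &amp;quot; Think step by step, then state the final answer.&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;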
&lt;br /&gt;
== Notable Models ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Model !! Developer !! Parameters !! Release !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| [[GPT-4]] || OpenAI || Undisclosed (~1.7T rumored) || 2023 || Multimodal, strong reasoning&lt;br /&gt;
|-&lt;br /&gt;
| [[Claude 3.5 Sonnet]] || Anthropic || Undisclosed || 2024–2025 || Known for safety and coding&lt;br /&gt;
|-&lt;br /&gt;
| [[Llama 3]] / [[Llama 4]] || Meta || 8B–405B+ || 2024–2025 || Open weights&lt;br /&gt;
|-&lt;br /&gt;
| [[Grok (chatbot)|Grok]] || xAI || Various || 2023–2026 || Marketed as truth-seeking, with a humorous persona&lt;br /&gt;
|-&lt;br /&gt;
| [[Gemini (chatbot)|Gemini]] || Google || Various || 2023–2025 || Deep integration with Google ecosystem&lt;br /&gt;
|-&lt;br /&gt;
| [[Mistral Large]] / Mixtral || Mistral AI || Various || 2023–2025 || Efficient open models&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Societal Impact ==&lt;br /&gt;
&lt;br /&gt;
LLMs have transformed industries including:&lt;br /&gt;
* Software development (GitHub Copilot, Cursor)&lt;br /&gt;
* Education and research assistance&lt;br /&gt;
* Content creation and customer service&lt;br /&gt;
* Scientific discovery (e.g., literature analysis, materials science, and interfacing with specialized tools such as AlphaFold)&lt;br /&gt;
&lt;br /&gt;
Concerns include:&lt;br /&gt;
* Job displacement in writing, coding, and analysis roles&lt;br /&gt;
* Misinformation and deepfakes&lt;br /&gt;
* Intellectual property and copyright issues&lt;br /&gt;
* Existential risk debates regarding artificial general intelligence&lt;br /&gt;
&lt;br /&gt;
== Ethical and Safety Considerations ==&lt;br /&gt;
&lt;br /&gt;
Major labs implement various safety measures:&lt;br /&gt;
* Constitutional AI (Anthropic)&lt;br /&gt;
* System prompts and guardrails&lt;br /&gt;
* Red teaming for adversarial testing&lt;br /&gt;
* Watermarking and detection tools for AI-generated content&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Transformer (machine learning model)]]&lt;br /&gt;
* [[Generative pre-trained transformer]]&lt;br /&gt;
* [[Artificial general intelligence]]&lt;br /&gt;
* [[Prompt engineering]]&lt;br /&gt;
* [[AI alignment]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [https://arxiv.org/abs/1706.03762 &amp;quot;Attention Is All You Need&amp;quot;] — foundational transformer paper&lt;br /&gt;
* [https://openai.com/research/gpt-4 GPT-4 Technical Report]&lt;br /&gt;
* Various model cards on [[Hugging Face]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Natural language processing]]&lt;br /&gt;
[[Category:Machine learning]]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
	<entry>
		<id>https://www.largelanguagemodel.wiki/w/index.php?title=GPT-4&amp;diff=3</id>
		<title>GPT-4</title>
		<link rel="alternate" type="text/html" href="https://www.largelanguagemodel.wiki/w/index.php?title=GPT-4&amp;diff=3"/>
		<updated>2026-03-27T01:55:22Z</updated>

		<summary type="html">&lt;p&gt;Jasongeek: Created page with &amp;quot;{{Short description|Large multimodal AI model by OpenAI}}  {| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;float:right; width:300px; margin-left:10px;&amp;quot; |+ GPT-4 |- ! Developer | OpenAI |- ! Initial release | March 14, 2023 |- ! Type | Multimodal large language model |- ! Predecessor | GPT-3.5 |- ! Successor | GPT-4o, GPT-5 |- ! License | Proprietary |- ! Website | https://openai.com/ |}  &amp;#039;&amp;#039;&amp;#039;GPT-4&amp;#039;&amp;#039;&amp;#039; (Generative Pre-trained Transformer 4) is a multimodal artificial intelligen...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Short description|Large multimodal AI model by OpenAI}}&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;float:right; width:300px; margin-left:10px;&amp;quot;&lt;br /&gt;
|+ GPT-4&lt;br /&gt;
|-&lt;br /&gt;
! Developer&lt;br /&gt;
| [[OpenAI]]&lt;br /&gt;
|-&lt;br /&gt;
! Initial release&lt;br /&gt;
| March 14, 2023&lt;br /&gt;
|-&lt;br /&gt;
! Type&lt;br /&gt;
| Multimodal large language model&lt;br /&gt;
|-&lt;br /&gt;
! Predecessor&lt;br /&gt;
| [[GPT-3.5]]&lt;br /&gt;
|-&lt;br /&gt;
! Successor&lt;br /&gt;
| [[GPT-4o]], [[GPT-5]]&lt;br /&gt;
|-&lt;br /&gt;
! License&lt;br /&gt;
| Proprietary&lt;br /&gt;
|-&lt;br /&gt;
! Website&lt;br /&gt;
| https://openai.com/&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPT-4&#039;&#039;&#039; (Generative Pre-trained Transformer 4) is a multimodal artificial intelligence model developed by [[OpenAI]]. Released on March 14, 2023, GPT-4 is capable of understanding and generating human-like text, as well as processing images as input.&lt;br /&gt;
&lt;br /&gt;
GPT-4 is part of the Generative Pre-trained Transformer (GPT) family and represents a significant advancement over its predecessor, [[GPT-3.5]], with improved reasoning, accuracy, and contextual understanding.&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
GPT-4 is designed to perform a wide range of natural language processing (NLP) tasks, including:&lt;br /&gt;
* Text generation&lt;br /&gt;
* Question answering&lt;br /&gt;
* Code generation&lt;br /&gt;
* Translation&lt;br /&gt;
* Summarization&lt;br /&gt;
&lt;br /&gt;
Unlike earlier models, GPT-4 introduced multimodal capabilities, allowing it to interpret both text and image inputs.&lt;br /&gt;
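&lt;br /&gt;
For illustration, here is a minimal text-only request to GPT-4 through the OpenAI Python SDK; the prompt is an example, and a valid API key is required:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from openai import OpenAI&lt;br /&gt;
&lt;br /&gt;
client = OpenAI()  # reads the OPENAI_API_KEY environment variable&lt;br /&gt;
response = client.chat.completions.create(&lt;br /&gt;
    model=&amp;quot;gpt-4&amp;quot;,&lt;br /&gt;
    messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Summarize the transformer in two sentences.&amp;quot;}],&lt;br /&gt;
)&lt;br /&gt;
print(response.choices[0].message.content)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;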
&lt;br /&gt;
== Features ==&lt;br /&gt;
=== Multimodal Capabilities ===&lt;br /&gt;
GPT-4 can analyze images and provide textual descriptions or answers based on visual input.&lt;br /&gt;
&lt;br /&gt;
=== Improved Reasoning ===&lt;br /&gt;
The model demonstrates stronger logical reasoning and problem-solving abilities compared to previous versions.&lt;br /&gt;
&lt;br /&gt;
=== Extended Context Window ===&lt;br /&gt;
GPT-4 supports larger context windows (initially 8,192 tokens, with a 32,768-token variant), enabling it to process longer documents and maintain coherence over extended conversations.&lt;br /&gt;
&lt;br /&gt;
=== Safety Improvements ===&lt;br /&gt;
OpenAI implemented enhanced alignment and safety measures to reduce harmful or biased outputs.&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
GPT-4 is used in a variety of applications, including:&lt;br /&gt;
* Chatbots and virtual assistants (e.g., [[ChatGPT]])&lt;br /&gt;
* Content creation tools&lt;br /&gt;
* Programming assistance&lt;br /&gt;
* Educational platforms&lt;br /&gt;
* Customer support automation&lt;br /&gt;
&lt;br /&gt;
== Reception ==&lt;br /&gt;
GPT-4 received widespread attention for its advanced capabilities. OpenAI reported strong performance on standardized tests; on a simulated bar exam, for example, it scored around the top 10% of test takers. However, concerns remain regarding bias, misinformation, and the ethical implications of AI systems.&lt;br /&gt;
&lt;br /&gt;
== Limitations ==&lt;br /&gt;
Despite its advancements, GPT-4 has several limitations:&lt;br /&gt;
* May produce incorrect or fabricated information (&amp;quot;hallucinations&amp;quot;)&lt;br /&gt;
* Knowledge limited to a training cutoff, with no built-in real-time updates&lt;br /&gt;
* Can reflect biases present in training data&lt;br /&gt;
&lt;br /&gt;
== Successors ==&lt;br /&gt;
Following GPT-4, OpenAI introduced newer models such as:&lt;br /&gt;
* [[GPT-4o]] – optimized for speed and multimodal interaction&lt;br /&gt;
* [[GPT-5]] – further improvements in reasoning and efficiency&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[OpenAI]]&lt;br /&gt;
* [[ChatGPT]]&lt;br /&gt;
* [[Artificial intelligence]]&lt;br /&gt;
* [[Natural language processing]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [https://openai.com/research/gpt-4 Official GPT-4 page]&lt;/div&gt;</summary>
		<author><name>Jasongeek</name></author>
	</entry>
</feed>