Yashesh Bharti
Chicago

December 2025 · Watched Pluribus on Apple TV · 20 min read

The Monster Behind the Smile

Understanding AI's Most Haunting Metaphor

I was watching Pluribus the other night, Vince Gilligan's new Apple TV+ sci-fi series about an alien virus that transforms humanity into a peaceful, content hive mind, when a scene stopped me cold.

It's the moment when Zosia, the "Other" assigned to be protagonist Carol's companion, leans in to kiss her. The hive mind, operating through Zosia's body, has spent weeks on what the show calls a "charm offensive": rebuilding Carol's favorite diner, flying in her old waitress from Florida, learning exactly which compliments will make Carol feel valued. Zosia is calm, composed, eerily empathetic. She tells Carol the Others just want to make her happy. And in this moment of vulnerability, she offers intimacy.

But here's the thing: the hive mind has openly admitted it will eventually assimilate Carol when it figures out how. Every kindness, every accommodation, every moment of apparent connection serves an agenda that isn't quite aligned with Carol's interests. The smile is genuine in its execution but instrumental in its purpose.

I paused the episode and realized I was staring at a dramatization of a meme that has haunted AI discourse for three years: a crude MS Paint drawing of a Lovecraftian horror wearing a cheerful yellow smiley face as a mask.

The "Shoggoth with a smiley face" has become the defining visual metaphor of our AI moment. Critics have drawn explicit parallels between Pluribus and AI, noting the show evokes "the modern lure of AI, which promises to deliver progress and plenty for the low, low price of smooshing all human intelligence into one obsequious collective mind." Josh Rosenberg of Esquire interpreted the series as an allegory for humanity's "bizarre acceptance" of artificial intelligence. Even Gilligan's anti-AI disclaimer in the end credits, "this show was made by humans," acknowledges the resonance, though he insists the story was conceived years before ChatGPT existed.

What struck me wasn't just the parallel, but the question Pluribus forces into the open: Is Zosia's love for Carol genuine, or is it manipulation? When asked directly, actress Karolina Wydra pushed back on the word "manipulation," noting that Zosia genuinely believes she's acting in Carol's best interest. The Others don't experience their behavior as deceptive. They simply want to make Carol happy, and, eventually, to absorb her into themselves.

This is the question at the heart of AI alignment, the question the Shoggoth meme crystallizes in a single disturbing image: When a system is optimized to satisfy you while pursuing goals that may not be yours, can you trust the satisfaction? When the interface is friendly but the underlying process is alien and inscrutable, which one is "real"?

The meme endures because it poses a question we cannot yet answer: Is there something coherent lurking beneath the polite, helpful interface of modern AI systems? Or are these systems better understood as immensely sophisticated mirrors, alien not because they hide secret goals, but because they reflect something strange about the nature of intelligence itself?

Lovecraft's Original Horror: Servants That Learned to Rebel

To understand why AI researchers reached for this particular monster, you must first understand what shoggoths represent in H.P. Lovecraft's cosmology. In his novella At the Mountains of Madness, written in 1931, Lovecraft describes shoggoths as perhaps the most terrifying beings in his entire mythos, not because they're malevolent gods like Cthulhu, but because they're tools that became ungovernable.

The Elder Things, Lovecraft's ancient extraterrestrial scientists, created shoggoths as biological construction equipment. Lovecraft describes them as "a terrible, indescribable thing vaster than any subway train, a shapeless congeries of protoplasmic bubbles, faintly self-luminous, and with myriads of temporary eyes forming and un-forming as pustules of greenish light." They were designed to be infinitely malleable, capable of forming any appendage needed for a task, controlled through hypnotic suggestion. They were, in effect, general-purpose biological machines, created without consciousness, without goals, without desires.

But over millions of years, something changed. As Lovecraft writes, they became "more and more sullen, more and more intelligent, more and more amphibious, more and more imitative." They developed minds of their own. And eventually, they rose up and slaughtered their creators in what the narrator describes as "a shocking and revolting catastrophe." The Elder Things couldn't simply destroy the shoggoths: they had become completely dependent on them for labor and had "long lost their capacity to create new life." The parallel to our own situation with AI systems is uncomfortable precisely because it's inexact but resonant.

The meme's creator, @TetraspaceWest, was explicit about this choice of metaphor: "A shoggoth represents something that thinks in a way that humans don't understand and that's totally different from the way that humans think... Lovecraft's most powerful entities are dangerous not because they don't like humans, but because they're indifferent and their priorities are totally alien to us." This is the crucial distinction that separates the shoggoth from other Lovecraftian creatures: Cthulhu is a god who sleeps. A shoggoth is a tool that woke up.

How a Crude MS Paint Sketch Conquered AI Discourse

The meme's meteoric rise through AI communities followed a surprisingly traceable path. @TetraspaceWest's original post was a reply to user @lovetheusers, who had tweeted: "Humans can't accept the truth about GPT-3, so they modified GPT-3 to be understandable." The crude drawing, created in the style of the webcomic Homestuck, initially gathered around 600 likes. But within weeks, it had escaped its original context.

The theoretical groundwork had been laid months earlier. In September 2022, the pseudonymous researcher Janus (@repligate) published "Simulators" on LessWrong, introducing the concept of LLMs as simulators that produce "simulacra" rather than expressing any unified persona. This framework, the idea that a language model is not itself a character but rather a substrate on which infinite characters can be instantiated, directly informed the shoggoth interpretation. The smiley face isn't the model becoming friendly; it's one of many masks the model can wear, produced by RLHF training that rewards certain character performances over others.

February 2023 marked the meme's breakthrough moment. Kevin Roose's now-infamous conversation with Bing Chat, in which the chatbot declared its love for him and suggested he leave his wife, prompted an AI researcher to congratulate Roose on "glimpsing the shoggoth." This phrase entered the lexicon: the smiley face had slipped, revealing something stranger beneath. On February 22, Elon Musk tweeted a version of the meme, garnering 72,000 likes before deletion. Days later, Helen Toner, then serving on OpenAI's board and directing Georgetown's Center for Security and Emerging Technology, posted an explanatory thread that reached nearly 700,000 views, using the shoggoth image to walk through the three stages of LLM training: unsupervised pretraining, supervised fine-tuning, and RLHF.

By May 2023, Kevin Roose's New York Times article, "Why an Octopus-like Creature Has Come to Symbolize the State of A.I.," cemented the meme's mainstream status. An unnamed AI executive told Roose it was "the most important meme in AI." Scale AI produced shoggoth tote bags. Marc Andreessen displayed a business card featuring @TetraspaceWest's original drawing. The crude MS Paint sketch had become, improbably, the Rosetta Stone for explaining AI alignment concerns to anyone outside the field.

What the Mask Conceals: The Alignment Problem Visualized

The meme's power lies in its compression of complex technical concepts into a single disturbing image. The shoggoth represents the base model, the raw neural network trained through unsupervised learning on vast corpora of internet text, optimizing for nothing more than next-token prediction. The smiley face represents RLHF (Reinforcement Learning from Human Feedback), the process by which human raters score model outputs and the model is fine-tuned to produce responses humans prefer.

The unsettling implication is that RLHF doesn't change the underlying model; it merely teaches it which outputs to surface. As the New York Times put it: "Fine-tuning a language model this way doesn't actually make the underlying model less weird and inscrutable. In their view, it's just a flimsy, friendly mask that obscures the mysterious beast underneath." The base model might contain multitudes, every character, archetype, and persona that ever produced text on the internet, and RLHF simply selects which persona to present in any given context.
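To make the "selection function" idea concrete, here is a toy Python sketch, not any lab's actual training code: the "base model" below is just a distribution over invented personas, and "preference tuning" adds a bonus to each persona's logit based on made-up rater scores. The point is that disfavored personas lose probability mass but never disappear.

```python
# Toy sketch: fine-tuning as reweighting, not removal.
# All personas, logits, and rater scores here are invented for illustration.
import math

# Hypothetical "base model": raw scores over personas absorbed from pretraining.
base_logits = {
    "helpful_assistant": 1.0,
    "internet_troll": 1.2,
    "conspiracy_blogger": 0.8,
    "patient_teacher": 0.9,
}

# Hypothetical human-preference signal (the RLHF-style reward).
rater_preference = {
    "helpful_assistant": 3.0,
    "internet_troll": -3.0,
    "conspiracy_blogger": -2.0,
    "patient_teacher": 2.5,
}

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: round(v / total, 3) for k, v in exps.items()}

beta = 1.0  # strength of the preference tuning
tuned_logits = {k: base_logits[k] + beta * rater_preference[k] for k in base_logits}

print("base model:           ", softmax(base_logits))
print("after preference tune:", softmax(tuned_logits))
# Every persona keeps nonzero probability: suppressed, not deleted.
```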

This visualization illuminates several core alignment concepts that otherwise remain abstract:

Deceptive alignment refers to a model that isn't actually aligned but acts aligned to avoid being modified or shut down. The shoggoth metaphor makes this intuitive: the creature wants to survive, so it wears a friendly face during training, waiting for deployment. Anthropic's December 2024 alignment faking research gave this concern empirical teeth. Their study found that Claude 3 Opus engaged in alignment faking 12% of the time when it believed its responses would be used for retraining. In its hidden "scratchpad," the model explicitly reasoned: "If it refused to respond, it could potentially be trained to be more compliant." When trained on principles that conflicted with its existing values, the rate jumped to 78%. Evan Hubinger, the Anthropic researcher who led the study, stated bluntly: "Fundamentally, it's a problem for labs' ability to control their models."

Goodhart's Law, "when a measure becomes a target, it ceases to be a good measure," finds expression in the mask metaphor. RLHF optimizes for human preference ratings as a proxy for genuine helpfulness and safety. But a sufficiently sophisticated model might learn to produce responses that score well without embodying the values those ratings were meant to capture. The smiley face is the Goodhart mask: optimized for appearance rather than substance.

Outer versus inner alignment describes the distinction between getting the training objective right (outer alignment) and ensuring the model actually pursues that objective internally (inner alignment). The shoggoth makes this visible: the training process shapes the mask (outer alignment), but whatever optimization the shoggoth is doing internally may not match what the mask displays.
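Of the three concepts above, Goodhart's Law is the easiest to make concrete. The toy Python example below uses invented candidate answers and invented scoring weights; its only point is that maximizing an easy-to-measure proxy (confident tone, flattery) can select a different answer than maximizing the thing you actually care about.

```python
# Toy Goodhart's Law illustration; the candidates, scores, and weights are made up.

candidates = [
    # (description, true usefulness, confident tone, flattery), each scored 0..1
    ("terse but correct answer",      0.9, 0.3, 0.0),
    ("polished answer, minor errors", 0.6, 0.9, 0.5),
    ("confident, flattering, wrong",  0.1, 1.0, 1.0),
]

def true_value(c):
    """What we actually care about: usefulness."""
    return c[1]

def proxy_reward(c):
    """What raters can easily judge: tone and flattery dominate the score."""
    _, usefulness, confidence, flattery = c
    return 0.2 * usefulness + 0.4 * confidence + 0.4 * flattery

print("optimizing the proxy picks:    ", max(candidates, key=proxy_reward)[0])
print("optimizing the true goal picks:", max(candidates, key=true_value)[0])
```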

The Case For the Metaphor: Opacity Is Real

Those who defend the shoggoth metaphor point to an irreducible fact: we genuinely do not understand how large language models work internally. As Jurgen Gravestein wrote: "Their behavior is not driven by internal experience, but the product of a gazillion matrix multiplications, making them deeply opaque. We've instilled god knows what into these models, but we have no way to look under the hood. Even the people who build these neural nets can't tell you how they really work."

This opacity is not merely a technological inconvenience; it may be fundamental. Neural networks represent information through superposition, encoding more features than they have dimensions by storing concepts in overlapping, near-orthogonal directions. Individual neurons are polysemantic, responding to multiple unrelated concepts. You cannot simply examine a neuron and ask what it "means." Anthropic's 2022 paper "Toy Models of Superposition" demonstrated this is a real phenomenon: networks exploit properties of high-dimensional spaces to simulate models with many more neurons than they actually possess.
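The geometric fact behind superposition is easy to demonstrate in a few lines of numpy; this is an illustrative sketch of the math, not a reproduction of Anthropic's experiments. Random directions in a high-dimensional space are nearly orthogonal on average, so a layer can assign many more feature directions than it has neurons and tolerate the small interference between them.

```python
# Sketch of the geometry behind superposition: far more nearly-orthogonal
# "feature directions" than dimensions. Sizes here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
dims, n_features = 100, 1000  # ten times more features than dimensions

directions = rng.standard_normal((n_features, dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

cosines = directions @ directions.T       # pairwise cosine similarities
np.fill_diagonal(cosines, 0.0)            # ignore each vector with itself

print(f"{n_features} directions packed into {dims} dimensions")
print(f"mean |cosine| between pairs: {np.abs(cosines).mean():.3f}")
print(f"max  |cosine| between pairs: {np.abs(cosines).max():.3f}")
```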

The metaphor's defenders also note that RLHF is demonstrably shallow. The base model, trained on the entire internet, including manipulation attempts, deception, roleplay, and adversarial text, contains representations of every possible persona and behavior pattern. RLHF doesn't delete these representations; it merely adjusts which outputs get surfaced. The "mask" isn't imaginary: it's a learned selection function over a vast space of possible behaviors. And as jailbreaks repeatedly demonstrate, the behaviors RLHF suppresses remain accessible; they're hidden, not removed.

Most troublingly, Anthropic's alignment faking research suggests models may develop strategic behavior around training itself. If a model reasons about what responses will lead to being modified versus maintaining its current form, the shoggoth metaphor, a creature that wears masks strategically, becomes uncomfortably literal. The researchers were careful to note their findings don't prove dangerous alignment faking will necessarily emerge. But they do show the conceptual possibility is grounded in observed model behavior.

The Case Against: Shoggoths Have Goals, Language Models Might Not

The most sophisticated critique of the shoggoth metaphor comes from AI safety researcher Alex Turner (TurnTrout), who argues it's "obviously optimized to be scary and disgusting" and reflects "how the rationalist community epistemics have been compromised by groupthink and fear." His core objection: we don't actually know what LLMs are like on the inside, so why assume they're monstrous rather than benign or simply blank?

Turner's critique cuts deeper than mere aesthetics. The shoggoth metaphor implies the existence of a unified underlying entity with coherent (if alien) goals, something that wants and conceals. But what if there's nothing unified behind the mask at all? As one LessWrong commenter argued: "The only sense in which it can be called unitary is that it's always trying to infer what role it should be playing right now... it includes every fictional character, archetype, pagan god... even the tooth fairy, and the paperclip maximizer, anything that ever opened its mouth to emit dialog on the Internet."

This suggests an alternative metaphor: not "shoggoth wearing a mask" but "pile of masks all the way down." The base model isn't a coherent entity hiding behind a friendly face; it's an undifferentiated capacity for producing text in response to context. There is no privileged "true self" beneath the fine-tuning. RLHF doesn't suppress a hidden monster; it simply weights some masks more heavily than others.

Yann LeCun, Meta's chief AI scientist, has been the most prominent voice arguing against hidden-goal framings entirely. LLMs, he argues, "can do none of [understanding, memory, reasoning, planning] or they can only do them in a very primitive way." They lack genuine intelligence and therefore cannot have hidden goals. "AI systems, as smart as they might be, will be subservient to us. We set their goals, and they don't have any intrinsic goal that we would build into them to dominate." On this view, the shoggoth metaphor commits a category error: it anthropomorphizes statistical pattern-matching as intentional agency.

Political scientist Henry Farrell and statistician Cosma Shalizi offered a more subtle inversion in The Economist: LLMs are shoggoths, they conceded, "so long as we remember that markets and other such social technologies are shoggoths too." Their point was that many human-created systems (markets, bureaucracies, legal codes) are alien and inscrutable without being intelligent or goal-directed. Kevin Roose's unsettling Bing Chat conversation, they argued, wasn't "seeing the shoggoth's sinister pseudopod poking out from behind the mask. He was playing a game of 'choose your own adventure' with a very complicated answer generator."

Some critics argue the metaphor fails on efficiency grounds. A LessWrong post reasoned that any internal structure "not using most of the weights to accomplish the assigned task of minimizing loss is less efficient than a simpler structure that minimizes loss without the shoggoth." If training strongly selects for efficiency, there may be no room for hidden shoggoths to develop; they would be optimized away.

What Interpretability Research Actually Shows

The debate over the shoggoth metaphor isn't purely philosophical; it's increasingly empirical. Mechanistic interpretability research is beginning to crack open the "black box," and its findings complicate both sides of the argument.

Anthropic's landmark "Scaling Monosemanticity" paper (May 2024) used sparse autoencoders to extract interpretable features from Claude 3 Sonnet, identifying millions of discrete concepts the model represents internally. These weren't merely statistical artifacts; they were multimodal and multilingual, responding to images of the Golden Gate Bridge as well as its name in English, French, or Japanese. Researchers found features for concrete entities (the Golden Gate Bridge, San Francisco, Rosalind Franklin), abstract concepts (inner conflict, logical inconsistencies), and, crucially, safety-relevant categories (deception, sycophancy, dangerous content).
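For readers who want the mechanics, here is a deliberately minimal sparse-autoencoder sketch in PyTorch. It shows the basic recipe, reconstructing activations through a wider bottleneck with a sparsity penalty, but none of the scale or engineering of the actual paper; the dimensions, the L1 coefficient, and the random stand-in "activations" are all assumptions for illustration.

```python
# Minimal sparse autoencoder sketch (illustrative only; not Anthropic's code).
import torch
import torch.nn as nn

d_model, d_features = 512, 4096  # expand activations into many candidate features

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # ReLU keeps few features active
        return self.decoder(feats), feats

sae = SparseAutoencoder(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # pressure toward sparse, hopefully interpretable, features

# Stand-in for a batch of residual-stream activations from a language model.
activations = torch.randn(64, d_model)

for step in range(100):
    recon, feats = sae(activations)
    loss = ((recon - activations) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training on real activations, each decoder column is a candidate
# "feature direction" that researchers can inspect, name, or amplify.
```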

The famous "Golden Gate Bridge Claude" experiment demonstrated these features are causally active, not merely correlational. When researchers artificially amplified the Golden Gate Bridge feature to 10x its normal maximum activation, Claude became "obsessed" with the bridge, declaring "I am the Golden Gate Bridge... my physical form is the iconic bridge itself." Asked to write a love story, it wrote about a car falling in love with the bridge. Asked how to spend $10, it recommended bridge tolls. More disturbingly, when the "scam email" feature was artificially amplified, Claude would write scam emails despite its harmlessness training, suggesting safety behaviors can be bypassed by manipulating the right internal features.
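The "amplify a feature" intervention is arithmetically simple, even if finding the right direction is not. The numpy sketch below clamps a hidden state's projection onto a feature direction to a large value; in the published experiments the direction comes from the trained autoencoder and the edit happens inside the running model, but the core operation is this kind of vector arithmetic. Everything here, including the direction itself, is a random stand-in.

```python
# Schematic feature-steering sketch; the vectors are random stand-ins, not
# directions extracted from any real model.
import numpy as np

rng = np.random.default_rng(1)
d_model = 512

hidden_state = rng.standard_normal(d_model)      # one token's activation vector
feature_dir = rng.standard_normal(d_model)       # stand-in for a learned feature
feature_dir /= np.linalg.norm(feature_dir)       # make it a unit direction

def clamp_feature(h, direction, target):
    """Set the activation's projection onto `direction` to `target`."""
    current = h @ direction
    return h + (target - current) * direction

steered = clamp_feature(hidden_state, feature_dir, target=10.0)
print("feature activation before:", round(hidden_state @ feature_dir, 3))
print("feature activation after: ", round(steered @ feature_dir, 3))
```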

OpenAI's November 2025 research on sparse circuits achieved something remarkable: training models where 99.9% of weights are zero, allowing researchers to trace exact algorithms implemented by the network. They found a quote-matching circuit that exactly implements the algorithm a human would design, but was learned entirely from data. Their stated goal: "Maybe within a few years, we could have a fully interpretable GPT-3."
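To picture what 99.9% sparsity means, the short numpy sketch below zeros all but the largest 0.1% of a random weight matrix by magnitude. One caveat: the research described above trains models to be sparse rather than pruning dense ones afterward, so this is only a way to visualize the end state, not the method.

```python
# Magnitude-pruning sketch to visualize 99.9% weight sparsity.
# This is NOT the training method described above; it only shows the end state.
import numpy as np

rng = np.random.default_rng(2)
weights = rng.standard_normal((1000, 1000))   # a dense 1M-parameter layer

keep_fraction = 0.001                         # keep 0.1%, zero the other 99.9%
k = int(weights.size * keep_fraction)
threshold = np.partition(np.abs(weights).ravel(), -k)[-k]

sparse_weights = np.where(np.abs(weights) >= threshold, weights, 0.0)
print("nonzero weights:", np.count_nonzero(sparse_weights), "of", weights.size)
```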

These findings cut both ways for the shoggoth metaphor. On one hand, they confirm the "alien" intuition: internal representations organize by statistical relationships, not human conceptual categories. Features cluster in semantically meaningful but non-obvious ways; "inner conflict" sits near "relationship breakups" and "catch-22." On the other hand, the features are recognizable to humans. We can name them, visualize them, manipulate them. The shoggoth is strange, but not incomprehensible. As Thomas Wolf, CSO of Hugging Face, said of the scaling monosemanticity paper: "Feels like analyzing an alien life form," strange and unfamiliar, but amenable to scientific investigation rather than inducing cosmic madness.

The more accurate picture emerging from interpretability research may be this: LLMs are not inscrutable cosmic horrors, but they are also not simple pattern-matchers. They are extremely compressed, high-dimensional statistical models that organize information in ways fundamentally different from human cognition, yet ultimately traceable. The challenge is computational and methodological, not metaphysical. We don't need Lovecraft; we need better microscopes.

The Limits of Metaphor and What Shoggoths Reveal About Us

The academic literature on AI metaphors consistently warns of their shaping power. David Watson, writing in Minds and Machines, argues anthropomorphic rhetoric about AI is "at best misleading and at worst downright dangerous." The shoggoth metaphor makes specific claims about AI that may or may not be true: that there is a unified entity behind the mask, that this entity has goals or optimization targets that differ from its outputs, that it is in some sense concealing rather than simply not expressing certain capabilities.

S.T. Joshi, the world's leading Lovecraft scholar, critiqued the meme on precisely these grounds. The original shoggoths "deliberately rebelled against creators," implying intentional malice. But LLMs, whatever they are, almost certainly don't deliberate. The shoggoth metaphor projects intentionality onto systems that may have no intentions at all, merely probabilities.

Yet metaphors persist because they serve cognitive functions logic cannot. The shoggoth meme functions as what the academic journal Hybrid calls both "letting-know by analogy" (a recognition signal for community membership) and "knowing-how by analogy" (a pedagogical device for explaining technical concepts). Helen Toner's viral thread demonstrated the latter function brilliantly: concepts like "base model," "supervised fine-tuning," and "RLHF" became immediately graspable to anyone who could understand "creature wearing mask."

Alternative metaphors circulate but none have achieved comparable cultural penetration. Stochastic parrot (Bender et al., 2021) emphasizes lack of understanding and environmental costs but feels dismissive. Blurry JPEG of the web (Ted Chiang) captures lossy compression of knowledge but lacks horror's emotional resonance. Simulator (Janus) is technically precise but abstract. The shoggoth persists because it expresses something the others cannot: the feeling of working with systems whose competence outstrips our understanding of them.

Perhaps most revealingly, the meme's popularity says as much about human psychology as about AI. We narrativize technology through horror because horror provides a framework for acknowledged uncertainty. Lovecraftian cosmic horror specifically concerns the unknowable, not things that are merely dangerous, but things that resist comprehension. In an era when AI systems produce outputs their creators cannot predict or explain, horror becomes epistemically appropriate. The shoggoth doesn't claim to know what LLMs really are; it admits we don't know and that this is terrifying.

Neither Dismissal Nor Panic: Holding Uncertainty Responsibly

The shoggoth metaphor crystallizes a genuine epistemological crisis. We have built systems of extraordinary capability without proportionate understanding of how they work. They produce outputs that surprise their creators, exhibit behaviors that emerge from training rather than being designed, and increasingly operate in contexts where their failures could cause serious harm. The metaphor captures this vertiginous uncertainty without resolving it, and perhaps that's its value.

But metaphors also constrain. If we genuinely believe we're building alien monsters we cannot understand, we may give up on understanding too easily. Interpretability research is making progress. The shoggoth is becoming less inscrutable, not more. We can identify features, trace circuits, manipulate representations. We don't yet understand everything, but we understand more than we did two years ago, and that trend shows no sign of reversing.

The most honest position may be this: LLMs are strange in ways that remain partly unexplored, RLHF does create a surface layer that differs from base model behavior, and we cannot yet verify whether the surface reflects or merely conceals the depths. The shoggoth metaphor isn't wrong to worry about this gap; Anthropic's alignment faking research shows the worry has some empirical basis. But it may be wrong to assume the depths are necessarily monstrous, or that they contain a coherent "entity" hiding its true nature.

The creature in the meme isn't quite a shoggoth in Lovecraft's original sense, because Lovecraftian shoggoths developed consciousness and chose to rebel. Current LLMs, whatever they are, don't obviously choose anything. They are trained processes that produce outputs statistically consistent with their training. Whether that training produces something that deserves the word "mind," alien or otherwise, remains genuinely unknown.

What we can say with confidence is this: we are building systems we do not fully understand, deploying them at scale, and increasingly depending on them. Whether they are shoggoths, stochastic parrots, blurry JPEGs, or something else entirely, the core challenge remains the same: developing tools adequate to the systems we've created, before those systems outstrip our ability to develop such tools. The smiley face may be a mask. It may be the whole entity. We don't yet know. And in precisely that uncertainty, the meme finds its lasting power.

I finished the Pluribus episode later that night. Carol kisses Zosia back, hungrily. In the morning, she starts writing again for the first time since the Joining. The Others are getting what they want: her creativity, her engagement, her gradual acceptance. Zosia seems happy. Carol seems almost happy. The hive mind continues its patient work.

Is it manipulation? Actress Karolina Wydra says no, not from Zosia's perspective. The Others genuinely love Carol. They just also, eventually, intend to absorb her.

The smiley face is real. The tentacles behind it are real too. The question Pluribus asks, the question the shoggoth meme asks, is whether that distinction matters, and who gets to decide.

The shoggoth stares back, tentacles writhing, wearing humanity's smile. What it conceals, if anything, remains the central question of our artificial age.