Introduction
In mid-2017, a mildly unsettling story made the rounds: AI agents developed by Facebook AI Research had, in the course of learning to negotiate, spontaneously drifted into a language that no human could understand. The press treated it as an omen. Researchers, for their part, were not surprised. They had been trying to make it happen.
The setup is deceptively simple. Place multiple agents in a cooperative task, give them a communication channel, and let gradient descent do the rest. To maximize reward, the agents will invent a shared protocol from scratch — a language of sorts, albeit one that looks nothing like English or any other human tongue.
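To make that concrete, here is a minimal sketch of such a game: a tabular Lewis-style signaling game trained with REINFORCE, in plain numpy. All sizes and hyperparameters here are illustrative choices, not taken from any of the papers discussed below.

```python
import numpy as np

def train_signaling_game(n_states=3, n_messages=3, steps=5000, lr=0.1, seed=0):
    """Minimal Lewis signaling game trained with REINFORCE.

    A sender observes a state and emits a discrete message; a receiver
    sees only the message and guesses the state; both are rewarded iff
    the guess is correct. Sizes and learning rate are illustrative.
    """
    rng = np.random.default_rng(seed)
    sender = np.zeros((n_states, n_messages))    # logits: state -> message
    receiver = np.zeros((n_messages, n_states))  # logits: message -> guess

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for _ in range(steps):
        state = rng.integers(n_states)
        p_msg = softmax(sender[state])
        msg = rng.choice(n_messages, p=p_msg)
        p_guess = softmax(receiver[msg])
        guess = rng.choice(n_states, p=p_guess)
        reward = 1.0 if guess == state else 0.0
        # REINFORCE: move logits along reward * grad log pi, per agent
        sender[state] += lr * reward * (np.eye(n_messages)[msg] - p_msg)
        receiver[msg] += lr * reward * (np.eye(n_states)[guess] - p_guess)
    return sender, receiver
```

On toy instances like this, the agents typically converge on an arbitrary but effective state-to-message convention; measuring how often that happens, and how structured the resulting protocol is, is precisely where the literature below picks up.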
This line of inquiry goes by the name Emergent Communication (EC). In this article, I trace the field's origins, walk through its landmark papers in chronological order, identify the key researchers who shaped it, survey the practical applications its authors envision, and speculate on the ones they have yet to consider.
The Field and Its Neighbors
The canonical label is Emergent Communication, sometimes abbreviated EC. Closely related terms include Language Emergence and Emergent Language, which tend to foreground the linguistic product rather than the communicative process.
Institutionally, EC sits within multi-agent reinforcement learning (MARL), but it draws freely from computational linguistics, evolutionary linguistics, and game theory. Since 2017, the community has run a recurring Emergent Communication workshop at major venues (NeurIPS, and later ICLR), a sign that it has coalesced into something self-sustaining.
Key Figures
Angeliki Lazaridou (Google DeepMind) is arguably the field's central figure. Her 2017 referential-game paper with Peysakhovich and Baroni catalyzed the modern wave of EC research; her 2020 survey with Baroni remains the definitive entry point; and her 2022 EMNLP tutorial codified the field's curriculum.
Marco Baroni (Pompeu Fabra University; formerly Facebook AI Research) has been Lazaridou's most frequent collaborator, bringing a linguist's sensibility to the analysis of emergent protocols. He co-authored the 2020 survey and has shaped much of the theoretical framing around compositionality.
Jakob Foerster (University of Oxford; formerly Facebook AI Research) co-authored the 2016 NeurIPS paper that, more than any other, inaugurated the deep-learning era of EC by demonstrating end-to-end learning of inter-agent communication via deep RL.
Satwik Kottur (Carnegie Mellon University → Meta) delivered the field's most important negative result: the EMNLP 2017 Best Short Paper showing that agents' invented languages, while effective, are emphatically not natural.
Igor Mordatch (OpenAI → Google DeepMind) demonstrated compositional language emergence in physically grounded multi-agent populations, extending the paradigm beyond symbolic referential games.
Dhruv Batra (Georgia Tech / Meta AI), through his collaboration with Kottur and others, systematically investigated the environmental conditions under which compositionality and interpretability arise.
A Chronological Tour of the Landmark Papers
2016: The Starting Gun
Foerster, Assael, de Freitas & Whiteson. "Learning to Communicate with Deep Multi-Agent Reinforcement Learning." NeurIPS 2016.
This paper established the blueprint. It introduced two methods — RIAL (Reinforced Inter-Agent Learning) and DIAL (Differentiable Inter-Agent Learning) — and demonstrated that deep RL agents can learn communication protocols from scratch, without any pre-designed signaling scheme. The motivating question was straightforward: can agents learn, end-to-end, to exchange messages that help them solve cooperative tasks such as riddles and partially observable visual puzzles?
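The key mechanism in DIAL is the discretise/regularise unit (DRU): during centralized training the message stays continuous and noisy, so the receiver's gradient can flow back to the sender; at execution time it is thresholded into a discrete signal. A numpy sketch of that forward pass (the noise scale is a hyperparameter, chosen here arbitrarily):

```python
import numpy as np

def dru(m, sigma=2.0, training=True, rng=None):
    """Discretise/Regularise Unit, as in DIAL (forward pass only).

    Training: sigmoid(m + Gaussian noise) keeps the channel continuous
    and differentiable, so gradients cross the agent boundary.
    Execution: a hard threshold turns the message into a discrete bit.
    """
    m = np.asarray(m, dtype=float)
    if training:
        rng = rng or np.random.default_rng(0)
        return 1.0 / (1.0 + np.exp(-(m + rng.normal(0.0, sigma, m.shape))))
    return (m > 0).astype(float)
```

The noise is the interesting design choice: it pushes the sender to emit activations far from the threshold, so that the protocol survives discretization at execution time.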
In the same year, Sukhbaatar, Szlam & Fergus proposed CommNet at NeurIPS 2016, taking a complementary approach based on continuous communication channels learned by backpropagation.
2017: The Cambrian Explosion
By any measure, 2017 was the field's annus mirabilis.
Lazaridou, Peysakhovich & Baroni. "Multi-Agent Cooperation and the Emergence of (Natural) Language." ICLR 2017.
Building on Lewis's signaling game, this paper introduced a referential game in which a sender and a receiver must coordinate over pairs of images. Agents were given an arbitrary, initially meaningless vocabulary and had to learn "word meanings" from scratch through cooperative play. The central contribution was twofold: demonstrating that such meanings do emerge, and exploring what modifications to the game environment cause those meanings to better reflect the intuitive semantic structure that humans perceive in images.
Kottur, Moura, Lee & Batra. "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog." EMNLP 2017 (Best Short Paper).
This paper served as a sobering counterpoint to the optimism surrounding Lazaridou et al. Using a "Task & Talk" referential game as a testbed, Kottur and colleagues showed through a carefully sequenced series of experiments that while agents reliably invent effective communication protocols — achieving near-perfect task reward — those protocols are decidedly neither compositional nor human-interpretable. The crucial positive finding was that tightening the constraints on communication (limiting vocabulary size, restricting symbols per turn) can coax the emergent language toward greater compositionality and human-likeness. The paper thus reframed the question: not whether language emerges, but under what pressures it takes on the structural properties we associate with natural language.
Mordatch & Abbeel. "Emergence of Grounded Compositional Language in Multi-Agent Populations." arXiv 2017 / AAAI 2018.
Where the preceding two papers operated in symbolic or image-based referential games, Mordatch and Abbeel moved to a physically situated setting: embodied agents navigating a 2D environment to accomplish cooperative goals. Agents emitted streams of discrete symbols over time, and the resulting protocols exhibited recognizable vocabulary and syntax. When the linguistic channel was disabled, agents fell back on non-verbal communication — pointing, guiding — an evocative parallel to human gestural communication. The guiding question: can compositional language arise spontaneously in a physically grounded setting?
Havrylov & Titov. "Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols." NeurIPS 2017.
This paper compared two optimization strategies for learning discrete-symbol sequences in referential games: REINFORCE and a differentiable relaxation via the straight-through Gumbel-softmax estimator. The latter converged faster and produced more effective protocols, establishing a methodological preference that would influence much subsequent work.
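The straight-through estimator can be sketched as follows: the forward pass emits a hard one-hot symbol (what the receiver actually sees), while in an autograd framework the backward pass would use the soft relaxed sample's gradients. This numpy version simply returns both to illustrate the idea; `tau` is the usual temperature parameter.

```python
import numpy as np

def gumbel_softmax_st(logits, tau=1.0, rng=None):
    """Straight-through Gumbel-softmax sample (numpy illustration).

    y_hard is the discrete one-hot message the receiver sees; y_soft is
    the relaxed sample whose gradients the sender would use. In an
    autograd framework the trick is y = (y_hard - y_soft).detach() + y_soft.
    """
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y_soft = np.exp((logits + g) / tau)
    y_soft = y_soft / y_soft.sum()
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft
```

Lowering `tau` makes the soft sample approach the hard one (low-bias, high-variance gradients); raising it does the opposite, which is the trade-off the paper's comparison with REINFORCE turns on.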
2018: Diversification
Evtimova, Drozdov, Kiela & Cho. "Emergent Communication in a Multi-Modal, Multi-Step Referential Game." ICLR 2018.
This paper introduced cross-modal asymmetry: a sender with access only to visual input and a receiver with access only to textual descriptions. Communication was bidirectional and multi-step, with the receiver autonomously deciding when to terminate the conversation — an adaptive-length protocol that added a layer of pragmatic complexity absent from single-shot games.
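The adaptive-length mechanism amounts to a loop in which the receiver owns the stopping decision. A schematic skeleton follows; the two `*_step` callables stand in for the agents' recurrent policies and are assumptions of this sketch, not the paper's actual interfaces.

```python
import numpy as np

def multi_step_exchange(sender_step, receiver_step, msg_dim=4, max_steps=10):
    """Bidirectional exchange in which the receiver decides when to stop.

    sender_step(msg) -> msg and receiver_step(msg) -> (msg, stop_prob)
    are placeholders for the agents' recurrent policies. Returns the
    number of rounds the conversation lasted.
    """
    msg_from_receiver = np.zeros(msg_dim)  # conversation opener
    for t in range(max_steps):
        msg_from_sender = sender_step(msg_from_receiver)
        msg_from_receiver, stop_prob = receiver_step(msg_from_sender)
        if stop_prob > 0.5:  # receiver terminates the conversation
            return t + 1
    return max_steps
```

The interesting consequence, which the paper exploits, is that conversation length becomes a learned quantity: harder referents should elicit longer exchanges.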
Cao, Lazaridou, Lanctot, Leibo, Tuyls & Clark. "Emergent Communication through Negotiation." ICLR 2018.
Moving beyond purely cooperative referential games, this paper studied emergent communication in a semi-cooperative negotiation game with multi-turn interactions. The question shifted from "can agents communicate?" to "what kind of protocol emerges when agents must balance self-interest against cooperation?"
2019–2020: Compositionality Under the Microscope
Chaabouni, Kharitonov, Bouchacourt, Dupoux & Baroni. "Compositionality and Generalization in Emergent Languages." ACL 2020.
A deep dive into the relationship between compositionality and generalization in emergent languages. The central finding was nuanced: compositionality does not arise from simple generalization pressure, but once a language happens upon compositional structure, that structure confers a survival advantage — it is more likely to persist and propagate.
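For reference, the measure most commonly used to operationalize compositionality in this line of work, including by Chaabouni et al., is topographic similarity: the correlation between pairwise distances in meaning space and in message space. A compact sketch, computing Spearman correlation without tie correction for brevity:

```python
import numpy as np

def topographic_similarity(meanings, messages, dist):
    """Topographic similarity: Spearman correlation between pairwise
    distances in meaning space and in message space. High values mean
    nearby meanings receive similar messages -- a standard proxy for
    compositional structure. `dist` is any distance function, e.g.
    Hamming or edit distance. (No tie correction, for brevity.)
    """
    d_meaning, d_message = [], []
    n = len(meanings)
    for i in range(n):
        for j in range(i + 1, n):
            d_meaning.append(dist(meanings[i], meanings[j]))
            d_message.append(dist(messages[i], messages[j]))

    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(len(x))
        return r

    a, b = ranks(d_meaning), ranks(d_message)
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

A perfectly compositional code, where the message mirrors the meaning attribute by attribute, scores 1.0 under a matching distance; the opaque protocols of Kottur et al.'s agents score far lower.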
Lazaridou & Baroni. "Emergent Multi-Agent Communication in the Deep Learning Era." arXiv 2020.
The field's definitive survey. Lazaridou and Baroni organized the landscape along two axes: the scientific angle (what can EC tell us about human language evolution?) and the applied angle (how can EC make AI systems more interactive and flexible?). For anyone entering the field, this remains the best place to start.
2022 Onward: Scaling and the LLM Convergence
Chaabouni et al. "Emergent Communication at Scale." ICLR 2022.
An effort to push EC beyond the small-scale referential games that had dominated the field's first half-decade, scaling up populations, datasets, and game complexity toward more realistic settings.
Li et al. "Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication." NeurIPS 2024.
The most recent milestone. Rather than deciphering emergent languages after the fact, this paper grounds the emergence process itself in human natural language, using LLM-generated synthetic data as an alignment signal. It marks a philosophical pivot: from post-hoc interpretation to a priori grounding.
What the Authors Say About Practical Impact
The researchers themselves articulate four primary avenues for real-world relevance.
Understanding human language evolution. Lazaridou & Baroni (2020) foreground the scientific motivation: studying how language emerges in communities of artificial agents can illuminate the evolutionary pressures — compositionality, ambiguity, grammatical regularity — that shaped human language. EC thus serves as a computational laboratory for evolutionary linguistics.
Improving multi-agent coordination. Virtually every paper in the canon reports that inter-agent communication improves task performance, often dramatically. In partially observable environments, communication is not merely helpful; it is the difference between solvability and intractability.
Enabling human–AI teamwork. The interpretability conditions identified by Kottur et al. and the zero-shot ad-hoc generalization demonstrated by Li et al. (2024) are explicitly motivated by scenarios in which humans and AI agents must collaborate on the fly, without prior co-training.
Designing robust AI communication protocols. Mordatch & Abbeel argue that learned, environment-grounded protocols can be more task-adaptive and bandwidth-efficient than hand-engineered alternatives — a pragmatic case for letting the protocol emerge rather than prescribing it.
Practical Possibilities the Authors Have Not Yet Explored
What follows is a set of applications that, while not directly discussed in the existing literature, follow naturally from its findings.
1. Auditing Internal Communication in AI Systems
When a large-scale AI system comprises multiple sub-modules exchanging opaque internal messages, that opacity becomes a security and alignment risk. The interpretability techniques developed in EC research — vocabulary constraints, compositionality pressure, language grounding — could serve as design principles for making inter-module communication auditable by human overseers. This connects directly to the AI alignment agenda.
2. Cross-Cultural and Cross-Lingual Communication Aids
The mechanisms by which agents with different modalities or different training histories converge on a shared protocol suggest an alternative to conventional machine translation. One could imagine a referential-game-style interface through which speakers of mutually unintelligible languages achieve mutual understanding — not by translating sentences, but by co-constructing meaning in a shared task context.
3. Ad-Hoc Coordination in IoT and Robotic Swarms
In factory robotics and IoT sensor networks, the ability to learn communication protocols on the fly — without pre-engineering — would make systems robust to device addition, replacement, and heterogeneity. The zero-shot ad-hoc generalization demonstrated by Li et al. (2024) is directly relevant: it shows that an agent can coordinate with teammates it has never encountered before, provided the communication space is grounded in a shared structure.
4. Autonomous NPC Communication in Game AI
Non-player characters that develop emergent communication protocols in response to player behavior could generate qualitatively richer, less predictable gameplay. Unlike scripted dialogue trees, emergent NPC communication would adapt to the game state in ways that surprise even the developers.
5. Negotiation Protocols for Autonomous Agent Economies
Cao et al.'s (2018) work on emergent communication in negotiation games extends naturally to settings where autonomous agents transact in markets — smart-contract-based trading, DeFi protocols, supply-chain optimization. The prospect of agents that autonomously evolve efficient bargaining conventions, without hand-coded heuristics, is worth serious investigation.
6. Probing Pre-Linguistic Intelligence
EC research recreates, in silico, the moment when languageless agents produce something language-like. This has underexploited implications for developmental psychology (how do pre-verbal infants bootstrap into language?) and animal behavior (what structures underlie collective coordination in species without symbolic communication?). The bridge between EC and these fields remains largely unbuilt.
Conclusion
In under a decade, Emergent Communication has grown from a handful of papers into a self-sustaining research community with its own workshops, surveys, and shared benchmarks. The field's animating question — will AI agents, like humans before them, invent language out of the pressure to cooperate? — has received a qualified affirmative.
Yes, agents do invent effective communication protocols. But for those protocols to exhibit the compositional, interpretable structure we recognize as "language," additional pressures are required: vocabulary constraints, environmental grounding, generational transmission. Language does not emerge naturally. It must be coaxed.
The latest chapter, exemplified by Li et al. (2024), marks a shift in strategy: from deciphering alien codes after the fact to anchoring emergence in human language from the start. The field is moving from curiosity-driven science toward the engineering of human–AI collaboration. The day when artificial agents and humans communicate in a shared tongue may be closer than it appears.
References
- Foerster, J. N., Assael, Y. M., de Freitas, N., & Whiteson, S. (2016). Learning to Communicate with Deep Multi-Agent Reinforcement Learning. NeurIPS 2016.
- Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning Multiagent Communication with Backpropagation. NeurIPS 2016.
- Lazaridou, A., Peysakhovich, A., & Baroni, M. (2017). Multi-Agent Cooperation and the Emergence of (Natural) Language. ICLR 2017.
- Kottur, S., Moura, J. M. F., Lee, S., & Batra, D. (2017). Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog. EMNLP 2017.
- Mordatch, I. & Abbeel, P. (2018). Emergence of Grounded Compositional Language in Multi-Agent Populations. AAAI 2018.
- Havrylov, S. & Titov, I. (2017). Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols. NeurIPS 2017.
- Evtimova, K., Drozdov, A., Kiela, D., & Cho, K. (2018). Emergent Communication in a Multi-Modal, Multi-Step Referential Game. ICLR 2018.
- Cao, K., Lazaridou, A., Lanctot, M., Leibo, J. Z., Tuyls, K., & Clark, S. (2018). Emergent Communication through Negotiation. ICLR 2018.
- Chaabouni, R., Kharitonov, E., Bouchacourt, D., Dupoux, E., & Baroni, M. (2020). Compositionality and Generalization in Emergent Languages. ACL 2020.
- Lazaridou, A. & Baroni, M. (2020). Emergent Multi-Agent Communication in the Deep Learning Era. arXiv:2006.02419.
- Chaabouni, R. et al. (2022). Emergent Communication at Scale. ICLR 2022.
- Li, H. et al. (2024). Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication. NeurIPS 2024.
Appendix: LangGround — Grounding Emergent Language in Human Language via LLMs
Paper Details
- Title: Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication
- Authors: Huao Li, Hossein Nourkhiz Mahjoub, Behdad Chalaki, Vaishnav Tadiparthi, Kwonjoon Lee, Ehsan Moradi Pari, Charles Michael Lewis, Katia P. Sycara
- Affiliations: University of Pittsburgh / Honda Research Institute USA / Carnegie Mellon University
- Venue: NeurIPS 2024 (Poster)
- arXiv: 2409.17348
- Project page: https://romanlee6.github.io/langground_web/
Overview
The paper introduces LangGround, a pipeline that aligns the communication space of MARL agents with the embedding space of human natural language. The method proceeds in three stages.
Stage 1: Synthetic data generation via embodied LLMs. Large language models are deployed as embodied agents within cooperative task environments. These LLM agents communicate with one another in natural language while executing team tasks. The resulting utterance–state pairs are collected as a synthetic dataset. Crucially, the LLMs receive only general task rules — no explicit coordination or communication strategies are prescribed.
Stage 2: MARL training with language grounding. Standard MARL agents are trained on the cooperative task, but with a regularization term that pulls each agent's communication vectors toward the natural-language embeddings obtained in Stage 1. This ensures that the agents' communication space is aligned with human language from the outset, rather than drifting into an opaque, agent-internal code.
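As a rough picture of Stage 2's objective: the task loss is augmented with a penalty that pulls communication vectors toward the matched natural-language embeddings. The squared-error form and the weight `lam` below are assumptions of this sketch; LangGround's actual alignment term may differ.

```python
import numpy as np

def grounding_loss(comm_vecs, lang_embs):
    """Hypothetical alignment penalty: mean squared distance between each
    agent communication vector and the embedding of the natural-language
    utterance an LLM produced in the matching state. Illustrative only."""
    comm_vecs = np.asarray(comm_vecs, dtype=float)
    lang_embs = np.asarray(lang_embs, dtype=float)
    return float(((comm_vecs - lang_embs) ** 2).mean())

def total_loss(task_loss, comm_vecs, lang_embs, lam=0.1):
    """Overall objective: the usual MARL loss plus the grounding
    regularizer, weighted by a hypothetical coefficient lam."""
    return task_loss + lam * grounding_loss(comm_vecs, lang_embs)
```

Whatever its exact form, the point of such a term is the same: the communication space cannot drift arbitrarily far from human language, because drifting is penalized at every training step.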
Stage 3: Evaluation in ad-hoc teamwork. The trained LangGround agents are paired with previously unseen teammates — including LLM agents — that they have never co-trained with. A translation module converts the MARL agents' communication vectors into natural-language messages, enabling on-the-fly collaboration across agent architectures.
Key Results
The paper's contributions distill into three findings.
First, language grounding does not sacrifice task performance. Agents trained with the alignment constraint match or exceed standard MARL baselines on task reward.
Second, language grounding accelerates the emergence of communication. The constraint narrows the search space, enabling agents to converge on meaningful protocols faster.
Third, zero-shot ad-hoc generalization becomes possible. Learned protocols transfer to unseen teammates and novel task states without fine-tuning. t-SNE visualizations confirm that the agents' communication vectors form semantically coherent clusters that correspond to natural-language messages in the synthetic dataset.
Relationship to the Emergent Communication Literature
LangGround occupies a precise and consequential position in the EC lineage traced in this article.
A direct response to Kottur et al.'s challenge. Kottur et al. (2017) demonstrated that natural language does not emerge naturally — that agents' invented protocols, however effective, are opaque to humans. LangGround's response is disarmingly simple: if language will not emerge in human-interpretable form on its own, then constrain the emergence process to stay within human language space from the beginning. Rather than deciphering after the fact, ground during learning.
An extension of Lazaridou et al.'s semantic grounding. Lazaridou et al. (2017) showed that manipulating the game environment can nudge emergent word meanings toward human-intuitive semantics. LangGround pushes this idea considerably further by using LLM-generated natural-language utterances as an explicit supervisory signal — a far more direct and powerful form of grounding than environmental manipulation alone.
A complement to Mordatch & Abbeel's physical grounding. Mordatch & Abbeel (2018) grounded language emergence in physical embodiment — agents in a 2D world whose communication was tied to bodily action. LangGround opts for linguistic grounding instead, but the two approaches share a deeper motif: anchoring communication to an external structure (physical or linguistic) in order to obtain compositionality and generalization.
Built on Foerster et al.'s foundation. The MARL communication architectures underlying LangGround (e.g., IC3Net) are direct descendants of the framework Foerster et al. (2016) established. LangGround's novelty lies not in the communication learning mechanism itself, but in the language-grounding layer added on top.
A new evaluation axis: ad-hoc teamwork. Prior EC research overwhelmingly evaluated communication among co-trained agents. LangGround introduces a more demanding and more practical criterion — coordination with teammates never seen during training. This shift from closed-population evaluation to open-ended ad-hoc teaming represents a significant step toward real-world deployment.
A paradigm native to the LLM era. The foundational EC papers surveyed in this article (2016–2018) predate the advent of large language models. LangGround leverages LLMs as teachers — using their natural-language competence to pull MARL agents' emergent communication into alignment with human language. This is a strategy that could not have existed five years earlier, and it signals that EC is evolving from a self-contained subfield of deep RL into a component of the broader LLM-augmented AI ecosystem.
Summary
LangGround addresses the single most persistent shortcoming of the Emergent Communication research program — namely, that emergent languages are unintelligible to humans — by deploying LLMs as a grounding mechanism. It also signals a broader directional shift. The field's early phase (2016–2018) asked: can we get agents to invent language? The current phase asks: can we get them to invent language that humans can understand and use? Emergent Communication is maturing from a fascination with alien codes into an engineering discipline for human–AI collaboration.