Why Do Multi-Agent LLM Systems Fail? Lessons from MAST and How They Show Up in Finance

Multi-agent systems built on large language models (LLMs) are evolving fast. They are showing up in productivity tools, autonomous simulations, and, increasingly, financial applications from portfolio construction to trade execution.

But anyone who has built or evaluated these systems knows the truth: they fail, and not just occasionally. The failures are subtle, unpredictable, and frustrating.

Recently, I came across a paper that finally gave structure to this chaos: “Why Do Multi-Agent LLM Systems Fail?” The authors introduce MAST, the Multi-Agent System Failure Taxonomy—a detailed framework that breaks down why multi-agent systems go wrong, not just how.

What Is MAST?

The authors present MAST as the first empirically grounded taxonomy for classifying failure modes in multi-agent LLM systems. It is built from an analysis of seven frameworks across more than 200 tasks, with failures annotated by experts who reached high agreement (Cohen’s κ = 0.88). To scale the annotation beyond manual review, the team developed an LLM-as-a-Judge evaluator that detects these failures automatically.
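For readers who have not worked with Cohen’s κ before, here is a minimal sketch of how the statistic is computed for two annotators. The function name and the ten sample labels are my own illustration, not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of ten failure cases (illustrative only).
annotator_1 = ["spec", "spec", "misalign", "verify", "spec",
               "misalign", "verify", "verify", "spec", "misalign"]
annotator_2 = ["spec", "spec", "misalign", "verify", "spec",
               "misalign", "verify", "spec", "spec", "misalign"]

print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.85
```

The two annotators agree on 9 of 10 items (0.90 observed agreement), while their label frequencies imply 0.35 agreement by chance, which gives κ = (0.90 − 0.35) / (1 − 0.35) ≈ 0.85.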

It groups failures into three categories, with 14 specific failure types:

  1. Specification Issues
  2. Inter-Agent Misalignment
  3. Task Verification Failures
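One way to put the three categories to work is to tag failures in your own agent logs and count them per category. The sketch below assumes a simple tagging scheme of my own; `FailureTag`, `tally_by_category`, and the sample notes are hypothetical and do not come from the paper, and the 14 specific failure types are deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MastCategory(Enum):
    """The three top-level MAST categories named in the paper."""
    SPECIFICATION = "Specification Issues"
    INTER_AGENT_MISALIGNMENT = "Inter-Agent Misalignment"
    TASK_VERIFICATION = "Task Verification Failures"


@dataclass(frozen=True)
class FailureTag:
    """Hypothetical record linking one logged run to a MAST category."""
    run_id: str
    category: MastCategory
    note: str


def tally_by_category(tags):
    """Count tags per category, including categories with zero hits."""
    counts = Counter(tag.category for tag in tags)
    return {category: counts.get(category, 0) for category in MastCategory}


# Illustrative tags for three imaginary runs of a trading workflow.
tags = [
    FailureTag("run-001", MastCategory.SPECIFICATION, "risk limits missing from prompt"),
    FailureTag("run-002", MastCategory.TASK_VERIFICATION, "backtest result never checked"),
    FailureTag("run-003", MastCategory.TASK_VERIFICATION, "order size not validated"),
]

for category, count in tally_by_category(tags).items():
    print(f"{category.value}: {count}")
```

Even a rough tally like this shows where a workflow fails most often, which is a reasonable starting point for deciding what to fix first.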

MAST Failure Categories With Financial Agent Examples

1. Specification Issues (System Design)

2. Inter-Agent Misalignment (Agent Coordination)

3. Task Verification Failures (Quality Control)
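As a rough illustration of quality control in a financial workflow, the sketch below places a verification gate between a signal-generating agent and an execution agent. Everything here is an assumption for the sake of the example: `TradeSignal`, `verify_signal`, `execute_if_verified`, the symbol whitelist, and the quantity cap are hypothetical names and values, not something the paper prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeSignal:
    """Hypothetical message passed from a signal agent to an execution agent."""
    symbol: str
    side: str          # expected to be "buy" or "sell"
    quantity: int
    limit_price: float


def verify_signal(signal, max_quantity=1_000,
                  allowed_symbols=frozenset({"AAPL", "MSFT"})):
    """Return a list of problems; an empty list means the signal passed."""
    problems = []
    if signal.symbol not in allowed_symbols:
        problems.append(f"unknown symbol: {signal.symbol}")
    if signal.side not in ("buy", "sell"):
        problems.append(f"invalid side: {signal.side}")
    if not 0 < signal.quantity <= max_quantity:
        problems.append(f"quantity out of range: {signal.quantity}")
    if signal.limit_price <= 0:
        problems.append(f"non-positive limit price: {signal.limit_price}")
    return problems


def execute_if_verified(signal, send_order):
    """Forward the signal to `send_order` only when verification passes."""
    problems = verify_signal(signal)
    if problems:
        return {"status": "rejected", "problems": problems}
    send_order(signal)
    return {"status": "sent", "problems": []}


# Usage: a list stands in for the real execution agent.
sent_orders = []
good = TradeSignal("AAPL", "buy", 100, 187.5)
bad = TradeSignal("AAPL", "buy", 50_000, 187.5)

print(execute_if_verified(good, sent_orders.append)["status"])  # sent
print(execute_if_verified(bad, sent_orders.append)["status"])   # rejected
```

The point of the design is that the execution agent never sees an unverified signal, and every rejection carries a reason that can be logged and tagged later.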

Why MAST Matters in Financial Multi-Agent Systems

In finance, failure is not just an inconvenience; it is a risk. Poor coordination between a signal generator and an execution agent can lead to real losses. A missing verification step can let faulty strategies go live. MAST gives us:

Going Forward: What I Am Applying

Inspired by MAST, here is what I am starting to do in my own financial multi-agent workflows:

Final Thoughts

MAST does not just explain why LLM agents fail—it gives us tools to build smarter. If you are designing LLM agents for high-stakes applications like trading, portfolio management, or risk analysis, this taxonomy is a must-know. It helped me turn chaos into structure, and that is the first step toward reliable, scalable AI systems.