This content originally appeared on DEV Community and was authored by Faizal Ardian Putra
I was just reading a paper about Mercury, an AI language model that generates text and code incredibly fast by using a different method called “diffusion.”
Before I explain it a bit more, if you’re interested in reading the full paper, you can check it out at: https://arxiv.org/pdf/2506.17298
How Traditional AI Models Work
Mercury is an AI model similar to GPT, Gemini, Claude, etc., but the difference lies in how it generates answers.
Traditional AI models use autoregressive generation, working sequentially like a person talking: one word at a time, or in AI terms, one token at a time. Each token depends on all the previous ones, so the model is accurate but slow, because everything happens one step at a time.
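To make the one-by-one loop concrete, here is a minimal sketch in Python. This is not any real model's code; `predict_next` and the toy "model" (which just counts upward) are invented purely to show the shape of autoregressive generation: one model call per token, each call seeing everything generated so far.

```python
def autoregressive_generate(predict_next, prompt_tokens, n_new_tokens):
    """Generate one token at a time; each call sees all previous tokens."""
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        tokens.append(predict_next(tokens))  # one model call per new token
    return tokens

# Toy "model": predicts the previous token plus one.
print(autoregressive_generate(lambda toks: toks[-1] + 1, [0], 4))  # [0, 1, 2, 3, 4]
```

The key point is the loop: no matter how many GPU workers you have, call number five cannot start until call number four has finished.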
How Mercury Works: Diffusion
Mercury, however, uses what’s called diffusion. This model works in parallel, starting with a rough “noisy” draft of the entire answer and refining it all at once over several steps. It generates multiple tokens simultaneously.
Think of it like a sculptor starting with a rough block of stone and quickly chipping away to reveal the final statue. This approach is much faster on modern computer chips (GPUs).
Their big idea: from fixing to creating
If you can train an AI to be incredibly good at fixing something that’s broken, you can use that skill to create something new from scratch.
How they train the AI
Before the AI can do anything useful, it needs to be trained. This is how Mercury trains the AI:
Start with something perfect: They begin with something flawless, like a piece of code that actually runs.
Systematically ruin it: They intentionally add “noise” or damage to the code, such as adding gibberish words, deleting some, or shuffling others. This is done in stages, from slightly worse to completely unreadable and nonsensical.
Teach the AI to fix it: They show the AI the damaged version and tell it: “This is a broken version of the original. Your only job is to figure out what the original, perfect version looked like.”
They repeat this millions of times with millions of examples. The AI doesn’t learn how to create code; it learns how to restore broken code to its original, perfect state. This is called a denoising objective.
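The "systematically ruin it" step can be sketched in a few lines of Python. This is only an illustration of the idea, not Mercury's actual training code: here "damage" means masking out a random fraction of tokens, and the `corrupt` helper and mask token are invented for the example.

```python
import random

MASK = "<mask>"

def corrupt(tokens, noise_level, rng):
    """Damage a clean sequence by masking a random fraction of its tokens."""
    return [MASK if rng.random() < noise_level else tok for tok in tokens]

# A perfect piece of code, as a token sequence.
clean = ["def", "sort", "(", "xs", ")", ":", "return", "sorted", "(", "xs", ")"]

rng = random.Random(0)
for noise_level in (0.2, 0.5, 1.0):
    noisy = corrupt(clean, noise_level, rng)
    # Each (noisy, clean) pair is one training example:
    # the model sees `noisy` and must reproduce `clean`.
    print(f"{noise_level:.0%} noise:", " ".join(noisy))
```

At 100% noise the input is pure gibberish, which is exactly the starting point used at generation time below.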
How to generate a new answer
This is where the magic happens. Now that they have an expert "restorer" AI, how can they ask it to write new code to answer a prompt like "write a Python function to sort a list"?
Give it pure chaos: Instead of providing damaged code like in the training phase, they give it completely random gibberish that matches the length of the expected answer.
Ask the AI to “fix” it: They tell the AI: “This is a broken version of an actual perfectly written Python function to sort a list that has been damaged 100%. Restore it.”
The AI gets to work: The AI looks at the random noise and, using its restoration training, makes its best guess.
Step 1: The gibberish becomes a faint, blurry outline of code structure.
Step 2: The blurry structure sharpens into recognizable words like def, return, brackets, etc.
Step 3: The words arrange themselves into a final, perfect working piece of code.
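The three steps above can be sketched as a loop that refines the whole draft at once. Again, this is a toy illustration, not the paper's implementation: the real denoiser is a trained Transformer, while `toy_denoiser` here simply "unmasks" more of a hard-coded target each step to show the parallel, coarse-to-fine shape of the process.

```python
MASK = "<mask>"
N_STEPS = 3
TARGET = ["def", "sort", "(", "xs", ")", ":", "return", "sorted", "(", "xs", ")"]

def diffusion_generate(denoise_step, length, n_steps):
    """Start from pure noise (all masks) and refine the whole draft in parallel."""
    draft = [MASK] * length                # the 100%-damaged "answer"
    for step in range(n_steps):
        draft = denoise_step(draft, step)  # every position may update at once
    return draft

def toy_denoiser(draft, step):
    # Stand-in for the trained model: each step it grows more confident
    # and fills in a larger prefix of the answer.
    keep = (step + 1) * len(draft) // N_STEPS
    return TARGET[:keep] + draft[keep:]

print(" ".join(diffusion_generate(toy_denoiser, len(TARGET), N_STEPS)))
```

Note the contrast with the autoregressive loop: the number of model calls is the (small) number of refinement steps, not the number of tokens in the answer.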
The Secret to its speed: Working in parallel
Modern GPUs are like having thousands of tiny workers. The old way of AI only gives one worker a job at a time. The Mercury approach gives all thousands of workers a job to do simultaneously, which is why it’s incredibly fast.
A Final Piece: The AI’s “Brain”
The paper mentions the “Transformer architecture.” You can simply think of this as the specific design of the AI’s brain. They chose a very popular and well-understood brain design. This was a smart move because it meant they could use all the existing tools and optimizations that have been built for it over the years, making their job much easier.
So, in summary: they taught a popular type of AI brain a new trick (unscrambling) which allows it to generate entire answers at once, making it dramatically faster.
You can try it yourself at https://chat.inceptionlabs.ai/