Introducing MK1 Flywheel Beta

Summary

MK1 Flywheel, our inference runtime, is pushing the performance of AI models to the limit of what’s physically possible on GPUs.
[Update] The MK1 Flywheel Beta has concluded, and Flywheel is now in wide deployment, serving millions of active users.

Early Access to MK1 Flywheel

Do you ever wonder how companies like OpenAI, Anthropic, and Google serve their large language models economically? For instance, they can generate dozens of tokens per second for each user while keeping costs to a fraction of a penny per request. Although the details of these engineering feats are proprietary, you can be sure that they employ every technique available to optimize their inference stack.

Introducing MK1 Flywheel. Our goal with Flywheel is to give every company running AI models capabilities similar to (or better than) those of these elite AI powerhouses. For a quick demo, here’s a Llama-2 7B running more than twice as fast as the baseline model on an RTX 4090 GPU.

MK1 demo

You can learn more about the story behind MK1 Flywheel and how to get access here.
