The MK1 Blog
Stay up-to-date with the latest updates from MK1
Cut Costs and Accelerate LLM Inference using MK1 Flywheel on Modal
We are thrilled to announce that our LLM inference engine, MK1 Flywheel, is now available through Modal's developer-friendly cloud service.
MK1 Flywheel Unlocks the Full Potential of AMD Instinct for LLM Inference
With the release of our new inference engine, MK1 Flywheel, we are excited to report that the AMD Instinct series can achieve performance comparable to a compute-matched NVIDIA GPU. We've designed MK1 Flywheel for maximum performance on both AMD and NVIDIA chips: our benchmarks demonstrate up to 3.7x higher throughput compared to vLLM.
MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA and AMD
MK1 has built an optimized LLM inference runtime, called Flywheel, that delivers higher throughput and lower latency on NVIDIA Ampere, Ada, and Hopper architectures as well as AMD Instinct. Try it on Amazon SageMaker, and engage with us for large-scale deployment.