The MK1 Blog
Stay up-to-date with the latest updates from MK1
Unlocking real-time chat with 1M context Llama3-70B model on AMD’s MI300X
At a recent developer-community event hosted by TensorWave in San Francisco, MK1 demonstrated an open-source 1M-context Llama3-70B model running on AMD MI300X hardware.
AMD's MI300X Outperforms NVIDIA's H100 for LLM Inference
There has been much anticipation around AMD’s flagship MI300X accelerator. Its raw specs are unmatched, but the pressing question remains: can it outperform NVIDIA’s Hopper architecture in real-world AI workloads?
MK1 Flywheel Unlocks the Full Potential of AMD Instinct for LLM Inference
With the release of our new inference engine, MK1 Flywheel, we are excited to report that the AMD Instinct series can achieve performance comparable to a compute-matched NVIDIA GPU. We've designed MK1 Flywheel for maximum performance on both AMD and NVIDIA chips: our benchmarks demonstrate up to 3.7x higher throughput compared to vLLM.
MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA and AMD
MK1 has built an optimized LLM inference runtime, called Flywheel, that delivers higher throughput and lower latency on NVIDIA Ampere, Ada, and Hopper architectures as well as on AMD Instinct. Try it on Amazon SageMaker, and engage with us for large-scale deployment.