Unlocking real-time chat with 1M context Llama3-70B model on AMD’s MI300X
At a recent developer-community event hosted by TensorWave in San Francisco, MK1 demonstrated an open-source 1M-context Llama3-70B model running on AMD MI300X hardware.
AMD's MI300X Outperforms NVIDIA's H100 for LLM Inference
There has been much anticipation around AMD’s flagship MI300X accelerator. Its raw specs are unmatched, but the pressing question remains: can it outperform NVIDIA’s Hopper architecture in real-world AI workloads? We have some exciting early results to share.
MK1 Flywheel Unlocks the Full Potential of AMD Instinct for LLM Inference
There has been much anticipation around AMD’s flagship MI300X accelerator. Its raw specs are unmatched, but the pressing question remains: can it outperform NVIDIA’s Hopper architecture in real-world AI workloads?
MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA and AMD
MK1 has built an optimized LLM inference runtime, called Flywheel, that delivers higher throughput and lower latency on NVIDIA Ampere, Ada, and Hopper GPUs as well as AMD Instinct accelerators. Try it on Amazon SageMaker, and engage with us for large-scale deployment.
Introducing MK1 Flywheel Beta
We are pushing the performance of AI models to the limit of what’s physically possible. Introducing MK1 Flywheel, our enterprise LLM inference solution. Contact us to become an early partner in the closed beta.