Mistral AI is hiring an expert in serving and training large language models at high speed on GPUs. The role is based in San Francisco.
The role will involve:
- Writing low-level code to take full advantage of high-end GPUs (H100) and maximize their utilization.
- Rethinking various parts of the generative model architecture to make them more suitable for efficient inference.
- Integrating low-level efficient code into a high-level MLOps framework.
The successful candidate will have:
- High technical competence in writing custom CUDA kernels and pushing GPUs to their limits.
- Deep expertise in the distributed computing infrastructure of current-generation GPU clusters.
- An overall understanding of the field of generative AI, with knowledge of or interest in fine-tuning language models and applying them in real applications.
About Mistral AI:
Mistral AI is a European company training large generative models for industry. It releases its technology in a fully transparent way; a significant part of its IP is shared as permissively licensed open-source software. Mistral AI intends to be a technical leader of the open-source generative AI community.
We're a small team, mostly composed of seasoned researchers and engineers in the field of AI. We like to work hard and stay at the edge of science. We are creative, low-ego, and team-spirited, and we have all been passionate about AI for years. We hire people who thrive in competitive environments, and we hire passionate individuals from all over the world.