Position: ML Engineer - Full Time - Remote
Reports to: Head of Data
Salary Range:
Company Overview:
RunPod is a fast-growing start-up that empowers developer teams to deploy custom, full-stack AI apps simply and at scale. We seek a talented and experienced ML Engineer to join our dynamic team.
Job Summary:
As an ML Engineer, you will be responsible for building the next-generation, highly available, global GPU cloud computing service with open-source technologies to enable and accelerate RunPod’s rapid growth.
This system spans many diverse environments (containers, VMs, and bare-metal compute) and provides a cohesive and reliable abstraction for running AI workloads in them. You will get to be a technology thought leader, evangelize new, cutting-edge technologies, and solve complex problems. To be successful, you will need experience practicing infrastructure-as-code, strong software development fundamentals and skills, and strong systems knowledge and troubleshooting abilities.
Requirements:
- 2+ years of experience writing high-performance, well-tested, production-quality code
- 2+ years of software development experience and proficiency in Python
- Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking and storage, performance, and scale
- Experience working on applied ML/AI products in production
- Knowledge of distributed systems and HPC
- Experience with TensorFlow and JAX is a plus
- Pragmatic, methodical, well-organized, detail-oriented, and self-starting
- Experience with containerization, VPNs, and AI workloads a plus
- GPU programming, NCCL, and CUDA knowledge a plus
- Experience in at least one backend programming language a plus
- Familiarity with open-source inference and training stacks such as vLLM, TGI, TensorRT, and torchrun a plus
- Demonstrated experience with high-performance or distributed cloud microservice architectures, ideally including building and operating them at global scale, a plus
Responsibilities:
- Perform architecture and research work for AI workloads
- Work on the core RunPod AI platform
- Create services, tools, and developer documentation
- Create testing frameworks for robustness and fault-tolerance
Compensation Package:
RunPod's compensation package comprises three elements: salary, equity, and benefits. We are committed to pay fairness and aim for these three elements to be highly competitive with market rates. On top of this position's salary, equity will be a component of total compensation. The exact amount will be communicated at the time of offer issuance.
Join Us:
At RunPod, you’ll have the opportunity to work on cutting-edge technology and significantly impact the AI and ML fields. We encourage you to apply if you’re driven by innovation and excellence and want to be part of a team that values bold ideas and professional growth. Let's shape the future of technology together!
Non-Discrimination in Hiring Practices:
RunPod is committed to maintaining a workplace free from discrimination and upholding the principles of equality and respect for all individuals. Our hiring practices are designed to ensure fairness, objectivity, and inclusiveness, adhering to all applicable laws and regulations regarding nondiscrimination.