MLOps Engineer (AI Platform)
Omilia
- Prague
- Permanent employment
- Full-time
- Architect and build the automated, scalable infrastructure that powers our entire suite of AI models—from Agentic AI and NLU to Voice Biometrics and ASR—ensuring they operate flawlessly and securely for millions of users.
- Make key technical decisions, establishing the patterns, tools, and best practices that will guide our machine learning operations for years to come.
- Collaborate closely with world-class researchers, data scientists, ML engineers, and cloud architects to translate cutting-edge research into robust, production-grade products.
- Champion a culture of automation, governance, and performance across all our AI/ML initiatives.
- Infrastructure as Code (IaC) Foundation: You will design and implement our entire MLOps infrastructure on AWS from the ground up using Terraform, establishing best practices for security, scalability, and cost-efficiency.
- CI/CD for Machine Learning: You will build and own the end-to-end CI/CD pipelines using GitLab and Jenkins, automating everything from model training and validation to canary deployments and production rollbacks.
- Containerization & Orchestration at Scale: You will lead the productization of our complex ML models, containerizing them with Docker and deploying them on a robust Kubernetes platform that you will help architect, build, and manage with Helm.
- Proactive Observability: You will establish a culture of deep system insight by implementing and managing a comprehensive observability stack (e.g., Prometheus and Grafana), ensuring our models meet stringent performance, reliability, and security SLAs.
- 5+ years in a Senior DevOps, SRE, or MLOps role with a focus on production systems.
- Deep expertise in architecting and managing Kubernetes clusters in a production environment.
- Proven mastery of at least one major IaC tool (Terraform is strongly preferred).
- Strong proficiency in a systems-level scripting language (e.g., Python, Go).
- A track record of building and maintaining CI/CD pipelines for critical production services.
- Direct experience deploying and managing specific ML models (e.g., Agentic AI, NLU, ASR, TTS).
- Experience with dedicated ML workflow orchestration tools (e.g., Kubeflow, Apache Airflow).
- Familiarity with ML experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
- Experience deploying models on specialized hardware (e.g., GPUs, Inferentia, Trainium).
- Fixed compensation;
- Long-term employment with vacation counted in working days;
- Professional development (courses, training, etc.);
- Being part of successful cutting-edge technology products that are making a global impact in the service industry;
- Proficient and fun-to-work-with colleagues;
- Apple gear.