Senior MLOps Engineer
We are seeking a skilled Senior MLOps Engineer to join our team and drive the deployment, automation, and optimization of machine learning models in production. This role requires hands-on expertise in MLOps, cloud infrastructure, and pipeline automation, as well as experience working with distributed teams. If you are passionate about scaling AI solutions and ensuring model reliability in real-world applications, this opportunity is for you.
Key Responsibilities
Model Deployment & Infrastructure
* Design, build, and manage scalable cloud infrastructure (GCP, Azure) for deploying machine learning models in production environments.
* Develop and maintain CI/CD pipelines tailored for ML/NLP model deployments.
* Utilize Kubernetes and Docker for containerization and orchestration of models.
* Implement versioning, governance, and monitoring tools using MLOps frameworks such as MLflow, Kubeflow, or DVC.
* Apply compliance and security best practices when handling sensitive data.
Pipeline Automation
* Develop and maintain automated workflows for training, validation, deployment, and retraining of ML models.
* Work closely with data engineers and data scientists to streamline data preparation, model training, and evaluation processes.
* Implement mechanisms for automated model retraining based on performance metrics and evolving datasets.
Monitoring & Maintenance
* Build and integrate real-time monitoring systems to track model performance and detect issues such as drift and degradation.
* Optimize ML pipelines and infrastructure for high availability, fault tolerance, and scalability.
* Collaborate with engineering teams to resolve production issues related to models and pipelines.
Collaboration & Leadership
* Mentor junior MLOps engineers and provide technical guidance.
* Partner with data scientists and ML engineers to ensure a smooth transition of models from development to production.
* Contribute to Agile processes, participating in Scrum ceremonies and fostering a culture of continuous improvement.
Required Skills & Experience
MLOps & Infrastructure
* Proven experience deploying machine learning models into production and managing their lifecycle.
* Hands-on experience with MLOps tools (MLflow, Kubeflow, DVC, Weights & Biases, etc.).
* Strong knowledge of cloud platforms (GCP preferred, Azure also relevant).
* Proficiency in Kubernetes and Docker for deploying and managing containerized applications.
* Experience with infrastructure as code (Terraform, Helm) for managing cloud and Kubernetes resources.
* Understanding of GPU-accelerated computing for large-scale model inference.
Automation & Development
* Expertise in automating CI/CD pipelines for ML workflows using tools like Jenkins, GitLab CI/CD, or similar.
* Strong programming skills in Python, with additional experience in Scala and/or Java.
* Experience with ML frameworks and libraries, as well as distributed computing systems (Spark).
* Knowledge of software development best practices, including BDD/TDD and API development in Python.
Collaboration & Leadership
* Experience mentoring and guiding engineering teams within an agile environment.
* Strong communication skills, with the ability to collaborate effectively across teams and time zones.
Preferred Experience
* Experience with feature stores, embeddings, LLMs, and Retrieval-Augmented Generation (RAG) architectures.
* Experience optimizing ML models for specialized hardware, including GPUs.
* Familiarity with DevOps principles and automation tools for infrastructure management.
* Strong adherence to software testing methodologies (BDD/TDD).
* Knowledge of parallel computing concepts and high-performance computing (HPC).
What We Offer
* Competitive salary
* Performance-based company bonus
* Ongoing professional development opportunities
* Subscription to wellness apps