MLOps Engineer
Full-time
Entry Level
Negotiable
Posted 12 February 2026
Deadline: 12 March 2026
Description
About the Role:
We are seeking an experienced MLOps Engineer to design, build, and maintain the infrastructure and tools that enable our data science and machine learning teams to develop, deploy, and monitor production ML systems at scale. You will bridge the gap between data science and operations, ensuring reliable, efficient, and reproducible ML workflows.
Responsibilities:
Infrastructure & Platform Development
Design and implement scalable ML infrastructure on-premises and on cloud platforms
Build and maintain ML experimentation and production environments
Develop and manage container orchestration systems for ML workloads
Implement GPU resource management and optimization strategies
Design storage solutions for datasets, models, and artifacts
ML Pipeline & Automation
Create CI/CD pipelines for ML model training, validation, and deployment
Implement automated model retraining and versioning systems
Build orchestration workflows for data processing and model training
Develop automated testing frameworks for ML models and pipelines
Design and implement feature stores for feature engineering and reuse
Monitoring & Operations
Implement model monitoring systems for performance, drift, and data quality
Set up logging, alerting, and observability for ML systems
Establish model governance and compliance tracking
Create dashboards for model performance and infrastructure metrics
Develop incident response procedures for production ML systems
Collaboration & Best Practices
Partner with data scientists and AI engineers to productionize ML models
Establish MLOps best practices and standards across teams
Provide technical guidance on deployment architectures
Document processes, systems, and runbooks
Mentor junior engineers and data scientists on MLOps practices
Requirements:
Education
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
Master's degree preferred, but not required given sufficient practical experience
Experience
2+ years of experience as an ML, Software, or DevOps Engineer
Proven track record of building production ML systems at scale
Experience supporting data science teams in enterprise environments
Technical Skills
Strong proficiency in Python and some experience with at least one compiled systems language (C/C++, Go, or Rust)
Deep understanding of containerization (Docker, Kubernetes)
Hands-on experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, etc.)
Knowledge of ML frameworks (TensorFlow, PyTorch, scikit-learn)
Experience with workflow orchestration (Airflow, Kubeflow, Prefect, etc.)
Hands-on experience with experiment tracking tools (MLflow, ClearML)
Core Competencies
Solid understanding of the ML lifecycle and model development processes
Strong Linux/Unix systems administration skills
Experience with version control systems (Git) and branching strategies
Knowledge of networking, security, and compliance in cloud and on-prem environments
Understanding of distributed computing and parallel processing
Knowledge of microservices architecture and API design
Soft Skills:
Strong problem-solving and debugging abilities
Excellent communication skills with both technical and non-technical stakeholders
Ability to work independently and manage multiple priorities
Collaborative mindset with an emphasis on enabling others
Adaptability to a rapidly changing technology landscape
Pragmatic approach to balancing innovation with reliability
Preferred Qualifications:
If you have at least three of the skills listed below, please apply.
Technical skills:
Experience with cloud platforms (Azure ML, AWS SageMaker, or GCP Vertex AI)
Experience with GitOps practices and tools (ArgoCD, Flux, GitLab with GitOps) for declarative infrastructure and ML pipeline management
Experience with feature stores (Feast, Tecton, Hopsworks, or similar)
Experience with model monitoring solutions (Evidently, WhyLabs, Fiddler, Arize, whylogs)
Experience with ML explainability tools (SHAP, LIME, Captum, Alibi, InterpretML)
Hands-on experience with hyperparameter optimization tools (Optuna, Ray Tune, Hyperopt, Katib)
Experience with distributed training frameworks (Ray Train, Horovod, DeepSpeed, PyTorch DDP, Megatron)
Experience with model serving frameworks (TensorFlow Serving, TorchServe, Triton, MLServer, or similar)
Experience with data versioning tools (DVC, Pachyderm, lakeFS)
Experience with GPU optimization (CUDA, TensorRT, ONNX Runtime, flash-attention)
Knowledge of GPU allocation, sharing, management and profiling
LLM Ops:
Experience with LLM inference frameworks (vLLM, TGI, TensorRT-LLM)
Familiarity with agent orchestration frameworks (LangChain, LangGraph, LlamaIndex)
Experience with LLM optimization: quantization, KV cache management, continuous batching
Experience with prompt engineering and versioning tools (LangSmith, PromptLayer, Weights & Biases Prompts, Helicone)
We offer:
Five-day work week (5/2), 09:00-18:00;
Meal allowance;
Annual performance bonuses;
Corporate health program: VIP voluntary insurance and special discounts for gyms;
Access to Digital Learning Platforms.
Interested candidates can apply by clicking the "Apply" button.
How to Apply
Caspian Innovation Center
Vacancy Details
Vacancy ID
#8373
Job Type
Full-time
Experience Level
Entry Level