PIONEER® by SewerAI is condition assessment & asset management in the Cloud, powered by AutoCode™ AI computer vision.
ML Ops Engineer, AI
Location
United States
Posted
40 days ago
Salary
$130K - $160K / year
Seniority
Senior
Job Description
SewerAI
- Audit, secure, and optimize our existing cloud infrastructure (AWS) to ensure high availability, fault tolerance, and security for both training and production workloads.
- Design and maintain scalable architectures for serving deep learning models (PyTorch/TensorFlow), optimizing for low latency and high throughput in handling complex infrastructure data.
- Build and maintain automated pipelines for model testing, validation, deployment, and rollback.
- Architect efficient, scalable compute environments for training complex computer vision and time-series models on large datasets.
- Implement comprehensive monitoring for model drift, data quality, and system health, ensuring rapid response to performance degradation.
Job Requirements
- 4-6+ years of experience in MLOps, DevOps, or Data Engineering, with a strong emphasis on machine learning workloads.
- A security-first and stability-first mindset—you think about edge cases, failure modes, and system hardening by default.
- Strong collaborative instincts to work closely with Data Scientists, ensuring smooth handoffs from experimentation to production.
- Clear communication skills to articulate architectural decisions and tradeoffs to the broader technical team.
- Deep expertise in AWS (e.g., EC2, S3, EKS, SageMaker, Lambda) and cloud security best practices.
- Strong experience with Docker and Kubernetes for packaging and scaling ML applications.
- Proficiency with infrastructure-as-code tools like Terraform or AWS CloudFormation.
- Experience building robust automated pipelines using GitHub Actions, GitLab CI, or Jenkins.
- Strong Python skills with a focus on writing clean, production-grade, and well-tested code.
- Familiarity with model registry and tracking tools (e.g., MLflow, Weights & Biases).
Benefits
- Medical, Dental, Vision, Basic Life, 401(k), and more
- Unlimited PTO
- Tools and resources to support success
- Competitive compensation with high-growth potential