Imagine the future of business. Ideas for a Digital Renaissance.
Azure DevOps Engineer, ML Ops Engineer
Location
Latin America
Posted
4 days ago
Seniority
Lead
Job Description
In All Media
- Drive AI-Driven Automation: Design, implement, and experiment with cutting-edge workflows that apply AI/LLMs to automate complex DevOps operations, reducing human intervention and operational friction.
- Architect Scalable Cloud Infrastructure: Oversee and optimize robust infrastructure solutions within Azure, ensuring top-tier performance, security, and monitoring.
- Define CI/CD & IaC Strategy: Champion best-in-class Infrastructure as Code (IaC) and continuous integration/deployment strategies to enable seamless, reliable software delivery.
- Act as a Strategic Technical Advisor: Influence architectural decisions and system design without needing direct authority, fostering a strong culture of research and innovation.
- Enhance System Reliability & Observability: Participate in deep architectural reviews, code reviews, and system design sessions to ensure high availability and cost-efficiency.
Job Requirements
- 7+ years of experience in DevOps, Cloud Engineering, or Infrastructure architecture.
- Deep Azure Expertise: Proven mastery of Azure services, specifically Azure Kubernetes Service (AKS), cloud networking, compute, and native monitoring tools.
- Advanced Infrastructure as Code (IaC): Deep hands-on experience using Terraform, Bicep, or ARM templates.
- Robust Automation & Scripting: Strong programming proficiency in Python, Bash, or PowerShell to build custom automation tooling.
- AI/LLM Integration: Genuine, practical experience applying Large Language Models (LLMs) or AI tools to optimize real-world DevOps workflows.
- CI/CD Mastery: Solid background managing enterprise pipelines with GitHub Actions or Jenkins.
- English fluency for daily communication.
Benefits
- Flexible work arrangements
- Professional development
- Remote work options
More DevOps Engineer Jobs
Role Description
We are looking for a Staff Site Reliability Engineer to join our SRE Squad. This is a technical leadership role for someone who can set the direction of our cloud infrastructure strategy while still being deeply hands-on. Our team is responsible for governance and observability of the Oura AWS infrastructure. We combine the power of the ring and the app with backend services and integrations to provide a data-rich and secure platform. Our APIs power most Oura apps, services, and machine learning components. Good reliability and scalability of the Cloud platform provides the technical foundation for our growth.
As a Staff Engineer, you will operate with ownership that spans multiple teams and systems, driving high-impact solutions, setting technical direction, and raising the bar for engineering excellence across the organization. You will be the go-to technical leader for our most complex infrastructure challenges.
This is a remote US role. We have offices in San Francisco and San Diego for those who prefer hybrid or office settings. Oura employees in other major cities (like Boston and New York) occasionally gather informally at local co-working locations.
What You’ll Do
- Technical Strategy & Architecture: Set the technical direction for Oura’s AWS infrastructure and cloud platform. Define and drive the long-term architecture for reliability, scalability, and cost efficiency across all production systems.
- Infrastructure as Code Leadership: Own and evolve Oura’s infrastructure-as-code platform, establishing standards and patterns that teams across the organization adopt. Lead migrations of services onto shared best practices.
- Observability & Fault Tolerance: Architect and implement organization-wide observability, monitoring, and alerting strategies. Design fault-tolerant systems that handle user demand peaks and degrade gracefully under failure conditions.
- Cross-Team Project Leadership: Plan, scope, and execute complex, multi-team infrastructure initiatives. Lead and coordinate rollouts and phased releases of major platform changes, including cross-team migrations.
- Deployment & Release Engineering: Own the evolution of deployment pipelines and dependency management to ensure fast, robust, and safe testing and release of code across the engineering organization.
- Operational Excellence: Set the standard for operational excellence across the engineering org. Define and maintain SLAs, lead incident response for the most complex production issues, and drive a culture of reliability and continuous improvement.
- Security & Compliance: Ensure that our platform adheres to the latest security and compliance regulations. Advocate for privacy-by-design principles across all infrastructure decisions.
- Mentorship & Culture: Identify growth opportunities and coach engineers across teams to become stronger infrastructure practitioners and leaders. Share knowledge via documentation, tech talks, and design reviews. Influence engineering culture and build a culture of recognition.
- On-Call Leadership: Take part in and improve on-call processes. Lead troubleshooting of the most complex cross-system production issues and effectively manage crisis situations.
Qualifications
- A seasoned infrastructure leader: You have 8+ years of backend development and infrastructure experience, with a track record of leading complex, cross-team technical initiatives to successful delivery.
- An architectural thinker: You have architected and built data-intensive distributed systems in production environments at scale. You know when to apply the right architectural patterns and can make pragmatic tradeoffs between short-term goals and long-term technical investment.
- A technical force multiplier: You solve technical problems that few others can and enable your teams to tackle the hardest challenges in the domain. You are a role model for engineering excellence and set standards on system designs and coding practices.
- Deep in AWS: You have strong experience running, monitoring, and debugging production systems at scale on AWS. You are fluent with key AWS services like EKS, RDS, S3, SQS, Kinesis, Lambda, and DynamoDB, and can make informed decisions about service selection and architecture.
- A Kubernetes expert: You have extensive experience running and orchestrating containers with EKS or similar platforms. You can design and optimize Kubernetes configurations for reliability, security, and cost efficiency at scale.
- A systems-level problem solver: You have experience building production systems on serverless architectures, designing robust deployment pipelines (experience with GitHub Actions is a bonus), and operating complex infrastructure with a mindset of operational excellence and cost efficiency.
- A strong communicator: You can clearly explain complex technical problems with data and analysis to both engineering and cross-functional audiences. You are frequently sought out by product managers and engineering leaders to help shape technical direction.
- A leader and mentor: You actively drive alignment across squads and missions, mediate technical disagreements, and coach engineers to become better leaders. You thrive in ambiguity and help teams navigate complex, undefined problems.
Bonus Points
- Experience in healthcare, wearable technology, or supporting large enterprise customers.
- Strong experience with database management and data pipeline optimization.
- Solid programming skills in languages such as Python, Go, or JavaScript/TypeScript.
- Experience defining and driving SLO/SLI frameworks across an organization.
- Track record of contributing to or leading open-source infrastructure projects.
Benefits
- Competitive salary and equity packages
- Health, dental, vision insurance, and mental health resources
- An Oura Ring of your own plus employee discounts for friends & family
- 20 days of paid time off plus 13 paid holidays plus 8 days of flexible wellness time off
- Paid sick leave and parental leave
- Oura takes a market-based approach to pay, which may vary depending on your location. US locations are categorized into tiers based on a cost of labor index for that geographic area.
- Region 1: $198,050-$233,000
- Region 2: $180,200-$212,000
- Region 3: $169,150-$228,850
Role Description
We are seeking a Senior/Staff DevOps Engineer who thrives in a fast-paced startup environment and is passionate about building reliable, secure, and scalable infrastructure alongside a talented team. The ideal candidate blends deep operational expertise with strong software engineering instincts, partnering closely with engineering, security, and product stakeholders to design and operate the platform that powers identity verification for customers around the world.
- Improve and develop infrastructure and reliability across the full product lifecycle, from architecture and provisioning through deployment, monitoring, and continuous improvement.
- Ensure every service is observable, performant, and resilient, providing the team with the automation, tooling, and guardrails needed to ship quickly and safely.
- Treat security, privacy, and regulatory compliance as first-class, non-negotiable requirements in every system built.
- Maintain and advance compliance posture across frameworks such as SOC 2, ISO 27001, GDPR, etc.
- Foster a culture of shared accountability around operational excellence and security by leading incident response and facilitating open conversations about risk.
- Leverage modern AI-powered tooling to accelerate infrastructure and platform work.
Qualifications
- 6+ years of experience in DevOps, Site Reliability, Platform, or Infrastructure Engineering within a software engineering organization.
- Deep expertise with a major cloud provider (GCP preferred) and strong understanding of networking, security, and distributed systems.
- Strong hands-on experience with infrastructure-as-code (Terraform, Pulumi, and/or CloudFormation) and configuration management.
- Production experience with containers and orchestration (Docker, Kubernetes, or ECS) and with building robust CI/CD pipelines (GitHub Actions, CircleCI, or similar).
- Proficiency with observability and monitoring stacks (Datadog, Prometheus/Grafana, CloudWatch, or equivalent).
- Solid scripting and programming skills (Python, Go, Bash, or TypeScript/Node) to build automation and tooling.
- Strong grasp of cloud security best practices: IAM and least-privilege, secrets management, encryption, network security, and vulnerability management.
- Hands-on experience supporting compliance frameworks such as SOC 2, ISO 27001, GDPR, HIPAA including control implementation, evidence and audit readiness, and compliance automation.
- Proficiency with AI-powered development tools such as Claude Code, Cursor, GitHub Copilot, or equivalent.
- Experience leading incident response and participating in an on-call rotation for production systems.
- Excellent written and verbal communication skills; ability to document systems clearly and write actionable runbooks.
- Experience working in a startup environment is required.
- Experience managing or collaborating with distributed teams is essential.
- Familiarity with identity verification products or AI/ML-based solutions is a plus.
Requirements
- Design, build, and operate cloud infrastructure (GCP preferred) using infrastructure-as-code, with an emphasis on repeatability, security, and cost efficiency.
- Own and continuously improve CI/CD pipelines, automated integration and unit testing, provisioning, deployments, and rollbacks.
- Build and maintain observability across the platform, including monitoring, logging, tracing, alerting, and meaningful dashboards.
- Improve and advance security posture: secrets management, encryption in transit and at rest, IAM and least-privilege access, network segmentation, and vulnerability management.
- Drive compliance readiness by partnering with security and leadership to maintain, automate, and provide evidence for controls.
- Lead incident response and the on-call rotation; drive blameless postmortems and reduce mean-time-to-recovery.
- Define and uphold reliability targets (SLOs/SLIs), capacity planning, and performance tuning.
- Leverage AI-powered tooling to accelerate infrastructure-as-code, automation, and internal tooling.
- Partner with engineering to improve developer experience and deployment velocity.
- Drive a culture of operational excellence, reliability, security, and continuous improvement.
- Set technical direction for platform and infrastructure, and mentor engineers on DevOps, reliability, and security best practices.
- Continuously evaluate and adopt emerging AI-powered tools and workflows.
Benefits
- Equity compensation
- Remote working environment
- Self-managed paid time off
- 11+ annual company holidays
- 401(k)
- Health Care Benefits: Medical, Vision, Dental
- Wellness benefits: EAP, LifeHealth Online, One Medical, Perkspot
- Parental leave
- Evolve and maintain the enterprise Kubernetes platform (AKS/EKS), ensuring scalability, security and high availability of the environments.
- Build and enhance infrastructure and operations automation using Infrastructure as Code and GitOps practices.
- Develop and maintain CI/CD pipelines, supporting teams in the continuous delivery journey.
- Implement observability, monitoring and distributed tracing solutions to ensure visibility and reliability of services.
- Respond to critical incidents, perform root cause analysis and implement continuous improvements to the platform.
- Support development teams in adopting cloud, Kubernetes, observability and automation best practices.
- Evolve the internal engineering platform to improve developer experience and accelerate delivery of business value.
- Implement and optimize autoscaling strategies, capacity management and operational efficiency for cloud environments.
- Collaborate with cross-functional teams to define architecture, security and governance standards for Azure and AWS environments.
- Evaluate, test and implement new solutions and technologies focused on Platform Engineering, SRE and enterprise automation.
DevOps Engineer / Linux Administrator
ASM Research
It is the policy of ASM that an individual's race, color, religion, sex, disability, age, sexual orientation or national origin are not and will not be considered in any personnel or management decisions. We affirm our commitment to these fundamental policies. All recruiting, hiring, training, and promoting for all job classifications is done without regard to race, color, religion, sex, disability, or age. All decisions on employment are made to abide by the principle of equal employment.
Role Description
The DevOps Engineer / Linux Administrator supports and enhances enterprise Linux environments through automation, infrastructure management, CI/CD pipeline development, and system administration. This role is responsible for maintaining secure, reliable, and scalable Linux-based platforms while partnering with development, security, and operations teams to improve deployment efficiency, system performance, and operational stability.
- Administer, maintain, troubleshoot, and optimize enterprise Linux environments.
- Perform Linux system logging, auditing, patching, and performance tuning across production and non-production systems.
- Develop and maintain automation solutions, including scripting, for Linux administration and other application-related processes using Jenkins and Ansible Core.
- Troubleshoot, manually diagnose, and resolve Linux issues.
- Build and set up new development tools and infrastructure, drawing on knowledge of continuous integration, continuous delivery, deployment management (CI/CD), cloud technologies, container orchestration, and security.
- Modify existing software and scripts to correct errors, adapt to new infrastructure requirements, and improve performance.
- Analyze user needs and technical requirements to determine the feasibility of design and implementation within time and cost constraints.
- Collaborate with developers, engineers, security teams, and other stakeholders to design systems and define interfaces, capabilities, and performance requirements.
- Build and test end-to-end CI/CD pipelines to ensure the systems are safe against security threats.
- Provide accurate and realistic work effort estimates, commit to them, and deliver results accordingly.
- Create and maintain technical documentation, operational procedures, and knowledge transfer materials.
Qualifications
- 3+ years of experience implementing, administering, and troubleshooting Linux in an enterprise environment, including Linux patching with DNF and YUM.
- Strong experience building and supporting CI/CD pipelines. Must have strong working knowledge of Jenkins (Groovy), Ansible Core (YAML), GitLab CI/CD, FlexDeploy, or similar technologies.
- Strong experience with Ansible and Jenkins.
- Strong knowledge of DNS and networking, including network debugging with packet capture.
- Strong scripting knowledge in Python, Bash, Zsh, Ksh, Csh.
- Strong configuration management knowledge and experience.
- Experience working with REST APIs.
- Experience working in secure environments.
- Experience in an OCI environment on virtual images.
- Strong verbal, written, organizational, and process documentation skills.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent relevant experience.
- Strong hands-on experience with Linux administration, including patching with DNF and YUM, logging, auditing, performance tuning, and issue resolution.
- Experience with scripting and automation using several of the following: Python, Bash, Zsh, Ksh, or Csh.
- Experience working with REST APIs and integrating automation with external systems.
- Strong knowledge of DNS, networking fundamentals, and network troubleshooting, including packet capture analysis.
- Experience working in secure environments with a strong understanding of operational discipline and system hardening.
- Experience with configuration management and infrastructure automation.
- Experience supporting Linux systems in OCI environments using virtual images.
- Ability to provide accurate effort estimates, manage assigned priorities, and deliver work as committed.
- Strong verbal, written, organizational, and technical documentation skills.
- Experience supporting Linux platforms in highly regulated or government-secured environments.
- Familiarity with container orchestration, cloud-native deployment practices, and secure CI/CD implementations.
- Experience building hardened Linux images and supporting secure software delivery pipelines.
- Experience partnering across development, operations, and cybersecurity teams to improve deployment efficiency and platform reliability.
- Proven ability to identify process improvement opportunities and implement automation that reduces manual administration.
- Secret clearance required.
- U.S. citizenship required.
- Ability to work remotely.
- No travel required.
Benefits
- Compensation ranges for ASM Research positions vary depending on multiple factors, including but not limited to location, skill set, level of education, certifications, client requirements, contract-specific affordability, government clearance and investigation level, and years of experience.
- The compensation displayed for this role is a general guideline based on these factors and is unique to each role.
- Monetary compensation is one component of ASM's overall compensation and benefits package for employees.



