Job Closed

This listing is no longer active.

TrueML

TrueML is a fintech company building software to create positive experiences for consumers seeking financial health.

Senior DevOps Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2013 | H1B No Sponsor

Location

United States

Posted

3 days ago

Salary

$120K - $155K / year

Seniority

Senior

Job Description

  • Implement the technical roadmap for Infrastructure as Code (IaC), CI/CD evolution, and cloud-native architecture to support TrueML’s scaling needs.
  • Design, develop, and maintain self-service internal platforms to reduce developer cognitive load, enabling feature teams to deploy and manage services with minimal friction at increased velocity.
  • Act as a core steward for cloud spend (AWS), proactively identifying and driving cost-optimization initiatives across our infrastructure.
  • Build and maintain infrastructure architecture that supports strict High Availability (HA) requirements and robust Disaster Recovery (DR) protocols across multiple regions.
  • Implement and evolve comprehensive monitoring, logging, and distributed tracing systems, leveraging AIOps to move from reactive to predictive system maintenance.

Job Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 6+ years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering, working within high-performing senior engineering teams.
  • Expert-level mastery with AWS and hands-on experience managing multi-region, high-availability deployments.
  • Advanced experience with Kubernetes (K8s) and Docker, including cluster management, networking, and scaling in production environments.
  • High proficiency in Terraform to drive consistency and automation across all infrastructure layers (Experience with Atlantis is a plus).
  • Deep experience designing and maintaining complex pipelines (GitHub Actions, GitLab CI, or Jenkins) and mastery of scripting languages like Python, Go, or Bash.
  • Hands-on experience with modern monitoring, observability, and tracing stacks (Datadog, Observe) and a firm grasp of SRE principles (SLIs/SLOs/Error Budgets).
  • Experience acting as an Incident Commander or critical responder for high-severity outages.
  • Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into your engineering workflow to accelerate delivery, troubleshooting, and system monitoring.

Benefits

  • Flexible vacation
  • Medical/dental/vision insurance
  • Traditional/Roth retirement savings options
  • Company-paid disability and life insurance
  • Flexible Spending Account & Limited FSA
  • Family-friendly parental leave, volunteer and voting time off
  • On-demand wellness platform access for you and 5 friends and family
  • PerkSpot discount program for 900+ merchants nationwide

More DevOps Engineer Jobs

Azure DevOps Engineer, ML Ops Engineer

In All Media

Imagine the future of business. Ideas for a Digital Renaissance.

DevOps Engineer | 4 days ago
Contract | Remote | Team 1,001-5,000 | H1B No Sponsor

  • Drive AI-Driven Automation: Design, implement, and experiment with cutting-edge workflows that apply AI/LLMs to automate complex DevOps operations, reducing human intervention and operational friction.
  • Architect Scalable Cloud Infrastructure: Oversee and optimize robust infrastructure solutions within Azure, ensuring top-tier performance, security, and monitoring.
  • Define CI/CD & IaC Strategy: Champion best-in-class Infrastructure as Code (IaC) and continuous integration/deployment strategies to enable seamless, reliable software delivery.
  • Act as a Strategic Technical Advisor: Influence architectural decisions and system design without needing direct authority, fostering a strong culture of research and innovation.
  • Enhance System Reliability & Observability: Participate in deep architectural reviews, code reviews, and system design sessions to ensure high availability and cost-efficiency.

Latin America

Staff Site Reliability Engineer

ŌURA

Better lives through better sleep.

DevOps Engineer | 4 days ago
Full Time | Remote | Team 201-500 | H1B No Sponsor

Role Description

We are looking for a Staff Site Reliability Engineer to join our SRE Squad. This is a technical leadership role for someone who can set the direction of our cloud infrastructure strategy while still being deeply hands-on. Our team is responsible for governance and observability of the Oura AWS infrastructure. We combine the power of the ring and the app with backend services and integrations to provide a data-rich and secure platform. Our APIs power most Oura apps, services, and machine learning components. Good reliability and scalability of the Cloud platform provides the technical foundation for our growth. As a Staff Engineer, you will operate with ownership that spans multiple teams and systems, driving high-impact solutions, setting technical direction, and raising the bar for engineering excellence across the organization. You will be the go-to technical leader for our most complex infrastructure challenges.

This is a remote US role. We have offices in San Francisco and San Diego for those who prefer hybrid or office settings. Oura employees in other major cities (like Boston and New York) occasionally gather informally at local co-working locations.

What You’ll Do

  • Technical Strategy & Architecture: Set the technical direction for Oura’s AWS infrastructure and cloud platform. Define and drive the long-term architecture for reliability, scalability, and cost efficiency across all production systems.
  • Infrastructure as Code Leadership: Own and evolve Oura’s infrastructure-as-code platform, establishing standards and patterns that teams across the organization adopt. Lead migrations of services onto shared best practices.
  • Observability & Fault Tolerance: Architect and implement organization-wide observability, monitoring, and alerting strategies. Design fault-tolerant systems that handle user demand peaks and degrade gracefully under failure conditions.
  • Cross-Team Project Leadership: Plan, scope, and execute complex, multi-team infrastructure initiatives. Lead and coordinate rollouts and phased releases of major platform changes, including cross-team migrations.
  • Deployment & Release Engineering: Own the evolution of deployment pipelines and dependency management to ensure fast, robust, and safe testing and release of code across the engineering organization.
  • Operational Excellence: Set the standard for operational excellence across the engineering org. Define and maintain SLAs, lead incident response for the most complex production issues, and drive a culture of reliability and continuous improvement.
  • Security & Compliance: Ensure that our platform adheres to the latest security and compliance regulations. Advocate for privacy-by-design principles across all infrastructure decisions.
  • Mentorship & Culture: Identify growth opportunities and coach engineers across teams to become stronger infrastructure practitioners and leaders. Share knowledge via documentation, tech talks, and design reviews. Influence engineering culture and build a culture of recognition.
  • On-Call Leadership: Take part in and improve on-call processes. Lead troubleshooting of the most complex cross-system production issues and effectively manage crisis situations.

Qualifications

  • A seasoned infrastructure leader: You have 8+ years of backend development and infrastructure experience, with a track record of leading complex, cross-team technical initiatives to successful delivery.
  • An architectural thinker: You have architected and built data-intensive distributed systems in production environments at scale. You know when to apply the right architectural patterns and can make pragmatic tradeoffs between short-term goals and long-term technical investment.
  • A technical force multiplier: You solve technical problems that few others can and enable your teams to tackle the hardest challenges in the domain. You are a role model for engineering excellence and set standards on system designs and coding practices.
  • Deep in AWS: You have strong experience running, monitoring, and debugging production systems at scale on AWS. You are fluent with key AWS services like EKS, RDS, S3, SQS, Kinesis, Lambda, and DynamoDB, and can make informed decisions about service selection and architecture.
  • A Kubernetes expert: You have extensive experience running and orchestrating containers with EKS or similar platforms. You can design and optimize Kubernetes configurations for reliability, security, and cost efficiency at scale.
  • A systems-level problem solver: You have experience building production systems on serverless architectures, designing robust deployment pipelines (experience with GitHub Actions is a bonus), and operating complex infrastructure with a mindset of operational excellence and cost efficiency.
  • A strong communicator: You can clearly explain complex technical problems with data and analysis to both engineering and cross-functional audiences. You are frequently sought out by product managers and engineering leaders to help shape technical direction.
  • A leader and mentor: You actively drive alignment across squads and missions, mediate technical disagreements, and coach engineers to become better leaders. You thrive in ambiguity and help teams navigate complex, undefined problems.

Bonus Points

  • Experience in healthcare, wearable technology, or supporting large enterprise customers.
  • Strong experience with database management and data pipeline optimization.
  • Solid programming skills in languages such as Python, Go, or JavaScript/TypeScript.
  • Experience defining and driving SLO/SLI frameworks across an organization.
  • Track record of contributing to or leading open-source infrastructure projects.

Benefits

  • Competitive salary and equity packages
  • Health, dental, vision insurance, and mental health resources
  • An Oura Ring of your own plus employee discounts for friends & family
  • 20 days of paid time off plus 13 paid holidays plus 8 days of flexible wellness time off
  • Paid sick leave and parental leave

Oura takes a market-based approach to pay, which may vary depending on your location. US locations are categorized into tiers based on a cost of labor index for that geographic area.

  • Region 1: $198,050-$233,000
  • Region 2: $180,200-$212,000
  • Region 3: $169,150-$228,850

United States
$169.2K - $233K / year
Job Closed

Senior / Staff DevOps Engineer

Vouched

Award-winning AI for identity verification and KYC

DevOps Engineer | 4 days ago
Full Time | Remote | Team 11-50 | H1B Sponsor

Role Description

We are seeking a Senior/Staff DevOps Engineer who thrives in a fast-paced startup environment and is passionate about building reliable, secure, and scalable infrastructure alongside a talented team. The ideal candidate blends deep operational expertise with strong software engineering instincts, partnering closely with engineering, security, and product stakeholders to design and operate the platform that powers identity verification for customers around the world.

  • Improve and develop infrastructure and reliability across the full product lifecycle, from architecture and provisioning through deployment, monitoring, and continuous improvement.
  • Ensure every service is observable, performant, and resilient, providing the team with the automation, tooling, and guardrails needed to ship quickly and safely.
  • Treat security, privacy, and regulatory compliance as first-class, non-negotiable requirements in every system built.
  • Maintain and advance compliance posture across frameworks such as SOC 2, ISO 27001, GDPR, etc.
  • Foster a culture of shared accountability around operational excellence and security by leading incident response and facilitating open conversations about risk.
  • Leverage modern AI-powered tooling to accelerate infrastructure and platform work.

Qualifications

  • 6+ years of experience in DevOps, Site Reliability, Platform, or Infrastructure Engineering within a software engineering organization.
  • Deep expertise with a major cloud provider (GCP preferred) and strong understanding of networking, security, and distributed systems.
  • Strong hands-on experience with infrastructure-as-code (Terraform, Pulumi, and/or CloudFormation) and configuration management.
  • Production experience with containers and orchestration (Docker, Kubernetes, or ECS) and with building robust CI/CD pipelines (GitHub Actions, CircleCI, or similar).
  • Proficiency with observability and monitoring stacks (Datadog, Prometheus/Grafana, CloudWatch, or equivalent).
  • Solid scripting and programming skills (Python, Go, Bash, or TypeScript/Node) to build automation and tooling.
  • Strong grasp of cloud security best practices: IAM and least-privilege, secrets management, encryption, network security, and vulnerability management.
  • Hands-on experience supporting compliance frameworks such as SOC 2, ISO 27001, GDPR, HIPAA including control implementation, evidence and audit readiness, and compliance automation.
  • Proficiency with AI-powered development tools such as Claude Code, Cursor, GitHub Copilot, or equivalent.
  • Experience leading incident response and participating in an on-call rotation for production systems.
  • Excellent written and verbal communication skills; ability to document systems clearly and write actionable runbooks.
  • Experience working in a startup environment is required.
  • Experience managing or collaborating with distributed teams is essential.
  • Familiarity with identity verification products or AI/ML-based solutions is a plus.

Requirements

  • Design, build, and operate cloud infrastructure (GCP preferred) using infrastructure-as-code, with an emphasis on repeatability, security, and cost efficiency.
  • Own and continuously improve CI/CD pipelines, automated integration and unit testing, provisioning, deployments, and rollbacks.
  • Build and maintain observability across the platform, including monitoring, logging, tracing, alerting, and meaningful dashboards.
  • Improve and advance security posture: secrets management, encryption in transit and at rest, IAM and least-privilege access, network segmentation, and vulnerability management.
  • Drive compliance readiness by partnering with security and leadership to maintain, automate, and provide evidence for controls.
  • Lead incident response and the on-call rotation; drive blameless postmortems and reduce mean-time-to-recovery.
  • Define and uphold reliability targets (SLOs/SLIs), capacity planning, and performance tuning.
  • Leverage AI-powered tooling to accelerate infrastructure-as-code, automation, and internal tooling.
  • Partner with engineering to improve developer experience and deployment velocity.
  • Drive a culture of operational excellence, reliability, security, and continuous improvement.
  • Set technical direction for platform and infrastructure, and mentor engineers on DevOps, reliability, and security best practices.
  • Continuously evaluate and adopt emerging AI-powered tools and workflows.

Benefits

  • Equity compensation
  • Remote working environment
  • Self-managed paid time off
  • 11+ annual company holidays
  • 401(k)
  • Health Care Benefits: Medical, Vision, Dental
  • Wellness benefits: EAP, LifeHealth Online, One Medical, Perkspot
  • Parental leave

United States
$200K - $225K / year
Full Time | Remote | Team 10,001+ | Since 2011 | H1B No Sponsor

  • Evolve and maintain the enterprise Kubernetes platform (AKS/EKS), ensuring scalability, security and high availability of the environments.
  • Build and enhance infrastructure and operations automation using Infrastructure as Code and GitOps practices.
  • Develop and maintain CI/CD pipelines, supporting teams in the continuous delivery journey.
  • Implement observability, monitoring and distributed tracing solutions to ensure visibility and reliability of services.
  • Respond to critical incidents, perform root cause analysis and implement continuous improvements to the platform.
  • Support development teams in adopting cloud, Kubernetes, observability and automation best practices.
  • Evolve the internal engineering platform to improve developer experience and accelerate delivery of business value.
  • Implement and optimize autoscaling strategies, capacity management and operational efficiency for cloud environments.
  • Collaborate with cross-functional teams to define architecture, security and governance standards for Azure and AWS environments.
  • Evaluate, test and implement new solutions and technologies focused on Platform Engineering, SRE and enterprise automation.

Brazil