Job Closed

This listing is no longer active.

Dave

We started Dave for one reason: banks weren’t built for people like us, and we knew we deserved better.

Staff Site Reliability Engineer

DevOps Engineer | Other | Remote | Lead | Team 201-500 | H1B Sponsor

Location

United States

Posted

141 days ago

Salary

$208K - $330K / year

Seniority

Lead

Bachelor Degree | 8 yrs exp | English

DNS GCP JavaScript Kubernetes MySQL Python Redis SQL Terraform TypeScript

Job Description

• Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management.
• Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog, connecting reliability goals to real business outcomes.
• Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow.
• Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS, to support internal systems, partner integrations, and member-facing services.
• Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable.
• Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration, strengthening technical depth across the team.
• Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time.

Job Requirements

8+ years in software, infrastructure, or site reliability engineering.
5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability).
Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD.
Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling.
Experience defining and operating against SLIs, SLOs, and error budgets.
Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies.
Experience leading incident response, root cause analysis, and systemic remediation.

Benefits

Opportunity to tackle tough challenges, learn and grow from fellow top talent, and help millions of people reach their personal financial goals
Flexible hours and virtual first work culture with a home office stipend
Premium Medical, Dental, and Vision Insurance plans
Generous paid parental and caregiver leave
401(k) savings plan with matching contributions
Financial advisor and financial wellness support
Flexible PTO and generous company holidays, including Juneteenth and Winter Break
All-company in-person events once or twice a year and virtual events throughout to connect with your team members and leadership team

More DevOps Engineer Jobs

Senior Software Engineer II - CI Pipeline Engineer

Aledade

Self-described as "a new company with an old-fashioned goal," Aledade aims to put healthcare control back into the hands of doctors. Headquartered in Bethesda,

DevOps Engineer | 141 days ago

Other | Remote

As a Senior II Engineer on the CI Pipeline team, you will serve as a primary architect of our CI/CD vision, helping to ensure that as Aledade scales, our delivery speed and compliance posture accelerate together. You will initially lead the evolution of a "Universal Pipeline" – the initiative to make the "Right Way" the "Easy Way" by building automation and guardrails to ensure every deployment is HIPAA-compliant by default. Beyond the initial pipeline framework, you will be involved in the long-term strategy for our internal developer experience, moving into the test tooling infrastructure (interwoven into the CI pipeline), self-service tooling, and ephemeral environments to leverage those technologies. Your goal is to foster a high-velocity engineering culture where security, compliance, and audit evidence are seamless side-effects of a delivery lifecycle, not manual tasks.

Primary Duties:
- Develop and implement scalable and performant solutions.
- Partner, as a peer, with Engineering Managers, Product Managers, and stakeholders throughout Aledade to develop and execute technical roadmaps using Agile processes.
- Mentor and coach more junior engineers, including through thorough pull request reviews, and be receptive to critical feedback on your own work.

Minimum Qualifications:
- BS/BTech (or higher) in Computer Science, Engineering, or a related field.
- 6+ years of experience as an engineer building and managing highly automated CI/CD infrastructure and developer tooling as part of a cross-functional team.
- 3+ years of experience working with infrastructure-as-code and automation scripting (e.g., Python, Bash, or Go) to manage complex delivery pipelines.
- 3+ years of experience acting as a trusted technical decision-maker in a team setting, solving for short-term and long-term business value.
- 3+ years of experience coaching other engineers on testing strategies and pipeline integration.

Preferred KSAs:

Engineering & Custom Tooling
- Systems Programming: Proficiency in a high-level language (Python, Go, etc.) to build custom CLI tools, internal providers, or API integrations that extend the capabilities of off-the-shelf CI/CD products.
- Developer Experience (DX) Tooling: Experience building internal abstractions or "Golden Path" templates that simplify complex cloud interactions for product engineers.
- Infrastructure as Code (IaC): Expert-level Terraform or Pulumi skills used to treat the entire delivery platform as a version-controlled, testable software product.

Test Infrastructure & Orchestration
- Ephemeral Test Environments: Expertise in architecting "on-demand" testing environments (using Kubernetes/Namespaces or Docker) that allow developers to run full-stack integration tests within the pipeline.
- Test Tooling Integration: Experience building or integrating frameworks for Contract Testing (e.g., Pact), Synthetic Testing, and Automated Regression at scale.
- Mocking & Service Virtualization: Ability to provide engineers with the infrastructure needed to mock healthcare-specific dependencies (e.g., EHR simulators) within the CI flow.

Compliance & Security as Code
- Automated Governance: Experience building "Compliance as Code" into pipelines, ensuring that SOC2, SOX, and HIPAA audit evidence (the "Triple-Lock" of Author, Approver, and Scan results) is captured automatically.
- Secure Supply Chain: Proficiency in integrating security gates (including SAST, DAST, Secret Detection, and automated SBOM generation) into the automated delivery flow.
- Identity & Secrets Management: Deep understanding of managing sensitive credentials and least-privilege access for CI/CD runners in a cloud environment (AWS preferred).

Pipeline Architecture & Reliability
- Universal Pipeline Design: Expertise in building modular, reusable CI/CD templates (e.g., GitHub Actions) that standardize deployment patterns across diverse stacks (ECS, EKS, Databricks).
- Build Optimization: Proven ability to optimize monorepo build performance through intelligent caching, change-detection, and parallelization.
- Observability & DORA Metrics: Ability to instrument the delivery platform to track and improve core metrics like Deployment Frequency and Lead Time for Changes.

Physical Requirements:
- Sitting for prolonged periods of time.
- Extensive use of computers and keyboard.
- Occasional walking and lifting may be required.

Python Shell Terraform Pulumi Kubernetes Docker GitHub Actions AWS Infrastructure as Code CI/CD Observability / Monitoring

United States

Senior DevOps Engineer

Dev.Pro

Software Development Partner. Result-driven. Quality-obsessed.

DevOps Engineer | 141 days ago

Full Time | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

• Manage, scale, and optimize cloud environments used for data science workloads (primarily AWS, Databricks, dbt).
• Provision, maintain, and optimize compute clusters for ML workloads (e.g., Kubernetes, ECS/EKS, Databricks, SageMaker).
• Implement and maintain high-availability solutions for mission-critical analytics platforms.
• Develop CI/CD pipelines for model deployment, infrastructure-as-code (IaC), and automated testing using industry-standard toolchains.
• Build monitoring, alerting, and logging systems for cloud and ML infrastructure (e.g., Datadog, CloudWatch, Prometheus, Grafana, ELK).
• Automate provisioning, configuration, and deployments using tools such as Terraform, CloudFormation, and GitHub Actions.
• Collaborate with Data Engineering to maintain integrations between data pipelines and cloud systems.
• Share responsibility for provisioning and operating application networking capabilities that support data platforms, including API gateways, CDNs, application load balancers, TLS, and WAFs.
• Conduct periodic risk assessments, best practice reviews, and remediation efforts to strengthen security and resiliency.

AWS Docker Grafana Jenkins Kubernetes Prometheus Python Terraform

Brazil

Job Closed

Senior DevOps – Site Reliability Engineer

nDeavour Consulting

We are a staffing and IT recruitment company based in Sofia, Bulgaria.

DevOps Engineer | 141 days ago

Other | Remote | Team 1-10 | Since 2019 | H1B No Sponsor

• Build, operate, and evolve all AWS environments (production and non-production), ensuring they meet availability, performance, and recovery requirements.
• Own the security posture of the AWS environment. Design and enforce secure patterns for IAM, network segmentation (VPC), and ingress/egress controls.
• Participate in an on-call rotation as a primary AWS responder. You will drive technical triage and hands-on remediation for P1/P2 incidents, ensuring clear communication with stakeholders throughout.
• Define SLIs/SLOs to measure service health. Proactively identify reliability risks and reduce operational toil through automation and self-healing infrastructure.
• Provide hands-on support for ISO 27001 audits. You will manage technical documentation, maintain audit evidence, and ensure operational practices align with security control areas.
• Maintain and secure our CI/CD pipelines and own the standards for Infrastructure as Code (IaC) to ensure all changes are controlled and auditable.

AWS Terraform

United States

Job Closed

Senior Site Reliability Engineer, Linux Performance

Akamai Technologies

DevOps Engineer | 141 days ago

Full Time | Remote | Team 5,001-10,000 | H1B Sponsor

• Developing, testing, and distributing changes to automation, software, services, and tools the VHP team is responsible for.
• Designing and implementing enhancements to VHP observability infrastructure in order to identify and correct problems before they impact our customers.
• Working comfortably in new tooling, code, and environments, and automating what’s possible.
• Creating supporting tooling using Ansible or profiling to aid in performance investigations.
• Developing subject matter expertise in various components across our Compute environment.
• Collaborating with our support, operations, and engineering teams to investigate and troubleshoot complex problems.
• Participating in on-call rotations, guiding restoration and repair of service-impacting issues.

Ansible Chef Distributed Systems Kubernetes Linux Puppet SaltStack

Poland
