Dave

We started Dave for one reason: banks weren’t built for people like us, and we knew we deserved better.

Staff Site Reliability Engineer

DevOps Engineer | Other | Remote | Lead | Team 201-500 | H1B Sponsor

Location

United States

Posted

87 days ago

Salary

$208K - $330K / year

Seniority

Lead

Job Description

  • Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management.
  • Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog, connecting reliability goals to real business outcomes.
  • Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow.
  • Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS, to support internal systems, partner integrations, and member-facing services.
  • Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable.
  • Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration, strengthening technical depth across the team.
  • Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time.

Job Requirements

  • 8+ years in software, infrastructure, or site reliability engineering.
  • 5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability).
  • Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD.
  • Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling.
  • Experience defining and operating against SLIs, SLOs, and error budgets.
  • Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies.
  • Experience leading incident response, root cause analysis, and systemic remediation.

Benefits

  • Opportunity to tackle tough challenges, learn and grow from fellow top talent, and help millions of people reach their personal financial goals
  • Flexible hours and a virtual-first work culture with a home office stipend
  • Premium Medical, Dental, and Vision Insurance plans
  • Generous paid parental and caregiver leave
  • 401(k) savings plan with matching contributions
  • Financial advisor and financial wellness support
  • Flexible PTO and generous company holidays, including Juneteenth and Winter Break
  • All-company in-person events once or twice a year and virtual events throughout to connect with your team members and leadership team

More DevOps Engineer Jobs

Senior Software Engineer II - CI Pipeline Engineer

Aledade

Self-described as "a new company with an old-fashioned goal," Aledade aims to put healthcare control back into the hands of doctors. Headquartered in Bethesda, Maryland, the compan

DevOps Engineer | 87 days ago

As a Senior II Engineer on the CI Pipeline team, you will serve as a primary architect of our CI/CD vision, helping to ensure that as Aledade scales, our delivery speed and compliance posture improve in tandem. You will initially lead the evolution of a "Universal Pipeline": the initiative to make the "Right Way" the "Easy Way" by building automation and guardrails that make every deployment HIPAA-compliant by default. Beyond the initial pipeline framework, you will help shape the long-term strategy for our internal developer experience, expanding into test tooling infrastructure (interwoven with the CI pipeline), self-service tooling, and ephemeral environments. Your goal is to foster a high-velocity engineering culture where security, compliance, and audit evidence are seamless side effects of the delivery lifecycle, not manual tasks.

Primary Duties:
  • Develop and implement scalable, performant solutions.
  • Partner, as a peer, with Engineering Managers, Product Managers, and stakeholders throughout Aledade to develop and execute technical roadmaps using Agile processes.
  • Mentor and coach more junior engineers, including through thorough pull request reviews, and be receptive to critical feedback on your own work.

Minimum Qualifications:
  • BS/BTech (or higher) in Computer Science, Engineering, or a related field.
  • 6+ years of experience as an engineer building and managing highly automated CI/CD infrastructure and developer tooling as part of a cross-functional team.
  • 3+ years of experience working with infrastructure-as-code and automation scripting (e.g., Python, Bash, or Go) to manage complex delivery pipelines.
  • 3+ years of experience acting as a trusted technical decision-maker in a team setting, balancing short-term and long-term business value.
  • 3+ years of experience coaching other engineers on testing strategies and pipeline integration.

Preferred KSAs:

Engineering & Custom Tooling
  • Systems Programming: Proficiency in a high-level language (Python, Go, etc.) to build custom CLI tools, internal providers, or API integrations that extend the capabilities of off-the-shelf CI/CD products.
  • Developer Experience (DX) Tooling: Experience building internal abstractions or "Golden Path" templates that simplify complex cloud interactions for product engineers.
  • Infrastructure as Code (IaC): Expert-level Terraform or Pulumi skills, used to treat the entire delivery platform as a version-controlled, testable software product.

Test Infrastructure & Orchestration
  • Ephemeral Test Environments: Expertise in architecting on-demand testing environments (using Kubernetes namespaces or Docker) that allow developers to run full-stack integration tests within the pipeline.
  • Test Tooling Integration: Experience building or integrating frameworks for contract testing (e.g., Pact), synthetic testing, and automated regression testing at scale.
  • Mocking & Service Virtualization: Ability to provide engineers with the infrastructure needed to mock healthcare-specific dependencies (e.g., EHR simulators) within the CI flow.

Compliance & Security as Code
  • Automated Governance: Experience building "Compliance as Code" into pipelines, ensuring that SOC 2, SOX, and HIPAA audit evidence (the "Triple-Lock" of author, approver, and scan results) is captured automatically.
  • Secure Supply Chain: Proficiency in integrating security gates, including SAST, DAST, secret detection, and automated SBOM generation, into the automated delivery flow.
  • Identity & Secrets Management: Deep understanding of managing sensitive credentials and least-privilege access for CI/CD runners in a cloud environment (AWS preferred).

Pipeline Architecture & Reliability
  • Universal Pipeline Design: Expertise in building modular, reusable CI/CD templates (e.g., GitHub Actions) that standardize deployment patterns across diverse stacks (ECS, EKS, Databricks).
  • Build Optimization: Proven ability to optimize monorepo build performance through intelligent caching, change detection, and parallelization.
  • Observability & DORA Metrics: Ability to instrument the delivery platform to track and improve core metrics such as Deployment Frequency and Lead Time for Changes.

Physical Requirements:
  • Sitting for prolonged periods of time.
  • Extensive use of computers and keyboards.
  • Occasional walking and lifting may be required.

United States

Senior DevOps Engineer

Dev.Pro

Software Development Partner. Result-driven. Quality-obsessed.

DevOps Engineer | 87 days ago
Full Time | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

  • Manage, scale, and optimize cloud environments used for data science workloads (primarily AWS, Databricks, and dbt).
  • Provision, maintain, and optimize compute clusters for ML workloads (e.g., Kubernetes, ECS/EKS, Databricks, SageMaker).
  • Implement and maintain high-availability solutions for mission-critical analytics platforms.
  • Develop CI/CD pipelines for model deployment, infrastructure as code (IaC), and automated testing using industry-standard toolchains.
  • Build monitoring, alerting, and logging systems for cloud and ML infrastructure (e.g., Datadog, CloudWatch, Prometheus, Grafana, ELK).
  • Automate provisioning, configuration, and deployments using tools such as Terraform, CloudFormation, and GitHub Actions.
  • Collaborate with Data Engineering to maintain integrations between data pipelines and cloud systems.
  • Share responsibility for provisioning and operating application networking capabilities that support data platforms, including API gateways, CDNs, application load balancers, TLS, and WAFs.
  • Conduct periodic risk assessments, best-practice reviews, and remediation efforts to strengthen security and resiliency.

Brazil

Senior DevOps – Site Reliability Engineer

nDeavour Consulting

We are a staffing and IT recruitment company based in Sofia, Bulgaria.

DevOps Engineer | 87 days ago
Other | Remote | Team 1-10 | Since 2019 | H1B No Sponsor

  • Build, operate, and evolve all AWS environments (production and non-production), ensuring they meet availability, performance, and recovery requirements.
  • Own the security posture of the AWS environment, designing and enforcing secure patterns for IAM, network segmentation (VPC), and ingress/egress controls.
  • Participate in the on-call rotation as a primary AWS responder, driving technical triage and hands-on remediation for P1/P2 incidents while keeping stakeholders clearly informed throughout.
  • Define SLIs/SLOs to measure service health, proactively identify reliability risks, and reduce operational toil through automation and self-healing infrastructure.
  • Provide hands-on support for ISO 27001 audits: manage technical documentation, maintain audit evidence, and ensure operational practices align with security control areas.
  • Maintain and secure our CI/CD pipelines and own the standards for infrastructure as code (IaC) so that all changes are controlled and auditable.

United States
Job Closed
Full Time | Remote | Team 5,001-10,000 | H1B Sponsor

  • Developing, testing, and distributing changes to the automation, software, services, and tools the VHP team is responsible for.
  • Designing and implementing enhancements to VHP observability infrastructure to identify and correct problems before they impact our customers.
  • Working comfortably with new tooling, code, and environments, and automating what's possible.
  • Creating supporting tooling using Ansible or profiling to aid in performance investigations.
  • Developing subject matter expertise in various components across our Compute environment.
  • Collaborating with our support, operations, and engineering teams to investigate and troubleshoot complex problems.
  • Participating in on-call rotations, guiding restoration and repair of service-impacting issues.

Poland