Job Closed
This listing is no longer active.
Health, powered by you.
Senior DevOps Engineer
Location
California
Posted
174 days ago
Salary
$160K - $200K / year
Seniority
Senior
Job Description
Evidation
- Design, build, and maintain highly available, scalable infrastructure on AWS using Infrastructure as Code.
- Design and operate multi-tenant Kubernetes environments running on EKS, including cluster operations, workload management, autoscaling, and cost-optimized configurations.
- Drive Infrastructure-as-Code (IaC) best practices using Terraform and Pulumi, including modularization, testing, versioning, and safe deployment patterns.
- Contribute to the CI/CD ecosystem using GitHub Actions, reusable workflows, and secure secrets management; ensure fast, resilient, and traceable deployment pipelines.
- Build and maintain a containerization-based software delivery pipeline leveraging Docker, Helm charts, and GitHub workflows.
- Define and continuously improve monitoring, alerting, dashboards, and logging using Datadog.
- Evaluate operational data to identify performance, stability, and cost-efficiency opportunities.
- Provide advanced support for major incidents, performing root cause analysis, writing clear postmortems, and ensuring long-term corrective actions.
- Apply a security-first mindset to infrastructure architecture, IAM, network boundaries, and workload configurations.
- Implement work in alignment with controls in support of ISO 27001, SOC 2, HIPAA, and other regulated requirements.
- Collaborate with Security to operationalize secure-by-default infrastructure patterns.
- Collaborate with Engineering, Data, and Delivery teams to define requirements, translate technical needs, and deliver scalable solutions.
- Facilitate knowledge sharing through documentation, playbooks, incident reviews, and architectural discussions.
- Identify opportunities to add value beyond immediate requests: improving reliability, simplifying processes, and reducing operational load.
Job Requirements
- 8+ years of DevOps, SRE, Platform Engineering, or relevant experience supporting production cloud systems.
- Expert-level experience with AWS services.
- Expert-level experience managing Kubernetes environments, including Helm, KEDA, cluster lifecycle, and multi-environment deployments.
- Advanced CI/CD experience using GitHub Actions (workflows, reusable workflows, OIDC auth, environments) or a similar technology.
- Expert-level containerization skills (Docker, image optimization, registry management).
- Strong proficiency with Terraform and Pulumi for Infrastructure as Code.
- Hands-on experience with AI-assisted development tools (VS Code, GitHub Copilot, code generation workflows).
- Strong proficiency with scripting and coding automation tools.
- Experience in more than one of: Bash, Python, Ruby, or Go.
- Experience building reliable, observable systems using Datadog (metrics, logs, traces, monitors) or a similar solution.
- Strong understanding of distributed systems, networking, autoscaling, and operational patterns in cloud-native architectures.
- Strong debugging, problem-solving, and incident response skills across complex, multi-service systems.
Benefits
- salary + bonus + equity + benefits
More DevOps Engineer Jobs
DevSecOps Engineer
MeridianLink
Connecting You to Better: MeridianLink is the developer of the industry's first multi-channel loan origination system.
- The DevSecOps Engineer will assist with user issues while working with the Sr. DevSecOps Engineer.
- Expected to assist in designing, building, and testing scripts in native and tool-dependent languages for continuous integration and continuous delivery pipelines.
- Responsible for following the direction for the development of an automated framework for Security Tool deployment and development.
- The DevSecOps Engineer will use Security-as-Code principles and build templates to automate security vulnerabilities.
- The role will maintain interfaces with outside systems, analyze downtimes, and analyze proposed system modifications and upgrades.
- Expected to follow necessary monitoring, auditing, and reporting frameworks that produce artifacts supporting security and compliance needs.
Senior CUDA Driver, DevOps Engineer
NVIDIA
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! Applications for this job will be accepted at least until June 15, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes. NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
- Decomposing and modularizing build processes for reusability across multiple projects
- Debugging GitHub Actions/GitLab pipelines to ensure timely and efficient CI execution
- Working on scripting and infrastructure to handle dependencies across various environments and build systems
- Bringing up builds and CI across platforms (x64/arm64) and OSes (Linux/Windows/Mac) and other unreleased hardware and software
- Working with engineering leadership to identify the support matrix and define the scope of the build matrix
- Crafting and updating documentation and coordinating with partners to scope and take on multi-functional projects
- Automating scheduled work for all of the above

- Ensure high reliability of microservices running in OpenShift environments
- Lead and coordinate a technical team of 3–4 engineers for operational excellence
- Manage incident resolution and ticketing workflows via ServiceNow
- Collaborate with development teams to drive performance optimization and tuning
- Design, configure, and maintain monitoring dashboards (Grafana, Prometheus, etc.)
- Coordinate with Service Control Room to maintain effective alerting and response
- Oversee release processes of new features, hotfixes, and updates in production
Head of DevOps, Cloud & Infrastructure
EnterpriseAlumni
Corporate Alumni Engagement & Management Platform For The Enterprise
- Architect, build, and maintain scalable, secure, multi-regional cloud infrastructure on AWS
- Own our Infrastructure as Code practices using Terraform, ensuring reproducibility and auditability
- Design and optimize CI/CD pipelines across Jenkins and CircleCI, including iOS and Android build systems
- Manage container orchestration via EC2/ECS/ECR and Kubernetes as well as ingress/routing through Traefik
- Lead observability strategy using Grafana and Prometheus, ensuring comprehensive monitoring, alerting, and incident response capabilities
- Drive high availability and disaster recovery planning across regions
- Ensure infrastructure meets SOC 2, ISO 27001, and Cyber Essentials+ requirements
- Implement and maintain robust security practices, including encryption at rest, in transit, and in use
- Stay current on evolving compliance requirements for banking and professional services clients
- Lead security audits and remediation efforts
- Continuously monitor and optimize cloud spend, staying ahead of AWS pricing changes and leveraging reserved instances, savings plans, and right-sizing strategies
- Establish cost visibility and accountability across teams
- Present regular cost analyses and recommendations to leadership
- Build, mentor, and lead the DevOps and infrastructure team
- Set clear goals, provide regular feedback, and support career development
- Foster a culture of ownership, collaboration, and continuous improvement
- Manage vendor relationships and negotiate contracts where applicable
- Partner closely with development teams to ensure infrastructure supports application needs
- Communicate infrastructure strategy, risks, and trade-offs clearly to non-technical stakeholders
- Participate in incident response and establish on-call practices that balance reliability with team well-being




