Job Closed
This listing is no longer active.
Founded in 2004 and led by CEO Steve Chapman, Natera is a biotechnology company that offers genetic testing and diagnostics on a global scale.
Senior DevOps/SRE Engineer
Location
United States
Posted
130 days ago
Salary
$140.2K - $175.2K / year
Seniority
Senior
Job Description
Senior DevOps/SRE Engineer
Natera
- Own execution of the entire Laboratory Operations Software release process, ensuring smooth and timely software releases with minimal downtime.
- Continuously monitor the effectiveness of the release process and implement improvements to increase efficiency, reduce errors, and enhance overall quality.
- Act as an internal consultant and subject matter expert, coaching individual product teams on best-in-class DevOps practices, including infrastructure-as-code (IaC), monitoring, logging, and security integration.
- Embed with development teams to assess and improve DevOps maturity, delivery practices, and operational readiness.
- Design and implement a variety of projects to support rapid growth in application complexity and to enable innovation.
- Provide hands-on guidance in CI/CD, cloud infrastructure usage, Kubernetes operations, and observability.
- Help teams adopt existing infrastructure, platforms, and tooling provided by central Cloud / Platform teams.
- Promote and reinforce technical standards, guardrails, and best practices that allow teams to operate autonomously while remaining compliant and secure.
- Guide teams in applying organizational expectations around reliability, security, and cost management through automation rather than manual controls.
- Serve as a feedback channel to central platform and cloud teams, sharing adoption challenges and improvement opportunities.
- Continuously improve and automate infrastructure provisioning, configuration management, application deployment, and testing using tools like Terraform, Kubernetes, and CI/CD.
- Advocate for automation-first approaches to reduce operational toil and risk.
- Partner with teams to define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and operational dashboards for their services.
- Guide teams through incident response, post-incident reviews, and reliability improvements.
- Identify systemic reliability issues and escalate platform-level concerns to the appropriate owning teams.
- Drive capacity planning and performance tuning activities to ensure scalability and efficiency.
- Provide expert-level support for complex infrastructure and deployment issues escalated by the product teams.
- Assist teams in root cause analysis and long-term remediation.
- Create and maintain clear, comprehensive documentation, including runbooks and descriptions of the release process, CI/CD pipelines, and regression testing procedures.
- Share best practices and lessons learned across teams to raise overall DevOps maturity.
Job Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 7+ years of professional software engineering experience building production-grade systems with emphasis on automation, integrations and infrastructure tooling.
- Excellent problem-solving skills with the ability to troubleshoot complex issues in a fast-paced environment.
- Excellent communication, coaching, and collaboration skills, with the ability to work effectively across teams and convey technical concepts to non-technical stakeholders.
- Deep understanding of Site Reliability Engineering (SRE) principles, including SLIs, SLOs, error budgets, and toil reduction.
- Expertise in setting up and managing comprehensive monitoring, logging, and alerting systems.
- Proven experience with incident response and leading post-incident review (post-mortem) processes.
- Experience with capacity planning, performance analysis, and optimization of distributed systems.
- Strong expertise in CI/CD tools (e.g., Jenkins, GitLab CI).
- Practical experience building complex CI/CD pipelines.
- Proficiency in at least one programming language (e.g., Java, Python).
- Strong command of AWS stack.
- Proficiency in Docker, Kubernetes and Helm.
- Experience working with databases (SQL, MySQL, PostgreSQL).
- Experience with version control systems (e.g., Git).
- Experience working with Terraform.
Benefits
- Comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents.
- Free testing for employees and their immediate families.
- Fertility care benefits.
- Pregnancy and baby bonding leave.
- 401k benefits.
- Commuter benefits.
- Generous employee referral program.
More DevOps Engineer Jobs
- Work with the technical lead to ensure solutions are aligned with the ATIS enterprise architecture.
- Design, build, and maintain CI/CD pipelines using technical resources that integrate secure code scanning.
- Implement DevSecOps best practices to enable continuous delivery with Army and program-specific security controls.
- Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC) tools.
- Integrate and manage security tools within the CI/CD pipeline.
- Collaborate with cross-functional teams to align DevSecOps capabilities with Agile delivery.
- Monitor pipeline and environment performance, perform troubleshooting, and resolve integration and deployment issues.
- Enforce compliance with DoW Risk Management Framework (RMF), NIST SP 800-53, and STIG requirements.

- Lead and manage the DevOps team and the Bitcoin Depot cloud infrastructure.
- Migrate applications/services from cloud providers to AWS.
- Work alongside the software engineering team to develop CI/CD pipelines.
- Develop Terraform scripts for deploying applications/services in AWS infrastructure.
- Monitor and assist in resolving system issues as they arise.
- Ensure high availability, performance, and cost efficiency of cloud infrastructure.
- Collaborate with Software Engineering, IT Operations, and Security teams.
Senior Site Reliability Engineer, Observability
ScienceLogic
We are a leader in AIOps, providing modern IT operations with actionable insights to predict and resolve problems faster.
- Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
- Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
- Define, deploy, and maintain system and service monitors.
- Be the authority for various monitoring technologies, such as Prometheus, AWS CloudWatch, Scylla Manager, and New Relic, to provide next-generation monitoring solutions for ScienceLogic SaaS.
- Employ advanced monitoring practices and technologies to detect and automatically resolve platform issues before they impact the customer's experience.
- Participate in architecture and operations reviews.
- Identify and automate measurement of operations SLAs and SLOs using SLIs.
- Triage incident response, document SOPs and runbooks, and train NOC team members.
- Participate in a shared on-call manager rotation for escalations during incidents and outages, occasionally during off hours.
- Provide dashboarding and analytics solutions to internal teams based on requirements.
SRE Analyst, Senior
Pottencial Seguradora S.A
We are the largest insurtech in Brazil and leaders in the surety insurance market!
- Collaborate with development, infrastructure, and security teams to design, build, and maintain reliable and scalable systems.
- Participate in planning and executing load, chaos, and failover tests, focusing on risk mitigation and identification of bottlenecks.
- Develop and maintain automation tools for monitoring, deployment, rollback, and incident response.
- Monitor and respond to critical incidents, conducting root cause analysis (RCA) and proposing preventive actions.
- Support the evolution of CI/CD processes, infrastructure as code (IaC), and security.
- Lead automation, observability, and performance initiatives for critical systems.
- Design, implement, and evolve monitoring, metrics, distributed tracing, and logging solutions.
- Conduct incident reviews (postmortems) with root cause analysis and structured action plans.
- Identify and apply continuous improvements to SLOs, SLIs, and SLAs.
- Act as a focal point for failure mitigation, recovery, and continuity plans.
- Drive a culture of reliability and resilience across the organization.
- Mentor junior and mid-level professionals, promoting technical training and best practices.




