Job Closed
This listing is no longer active.
Pioneering innovative medical device and digital health solutions that treat and keep people out of the hospital.
SRE Engineer
Location
United States
Posted
135 days ago
Salary
Not disclosed
Seniority
Senior
Job Description
ResMed
- Ensure the reliability, availability, and resiliency of ResMed’s digital products by designing and operating fault-tolerant systems
- Partner with product and platform teams to define and improve service health using operational and customer-experience metrics
- Design, implement, and maintain monitoring, alerting, logging, and tracing solutions that provide real-time visibility into system behavior and customer experience
- Analyze system performance, scalability, and capacity, and drive optimizations to improve efficiency and stability in cloud environments
- Build automation and tooling to support deployments, scaling, incident response, and operational workflows
- Participate in an on-call rotation as part of a globally distributed team, lead incident response efforts, troubleshoot production issues, conduct postmortems, and drive continuous improvement initiatives
- Collaborate with security and compliance partners to support secure, privacy-aware, and compliant operations
- Work closely with engineering teams to improve developer experience, operational maturity, and overall customer experience
Job Requirements
- Experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
- Experience operating Kubernetes-based production systems
- Hands-on experience with AWS and infrastructure-as-code tools
- Experience designing and supporting CI/CD pipelines and automated deployments
- Proficiency in Python for automation, tooling, or backend services
- Solid understanding of distributed systems and networking concepts
- Experience with monitoring and observability platforms such as Datadog and CloudWatch
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options
More DevOps Engineer Jobs
Site Reliability Engineer
Origami Risk
Origami Risk is a leading provider of integrated risk, compliance, safety, healthcare, and P&C insurance SaaS solutions.
- Leads post-incident investigations for the Site Reliability team.
- Conducts in-depth post-incident analyses to identify root causes and develops preventive strategies.
- Drafts clear and insightful RCAs for customer delivery.
- Cross-trains colleagues on how to best leverage observability tools during incident and performance investigations.
- Provides visibility to all stakeholders throughout the entire Site Reliability process.
- Collaborates with cross-functional teams to implement system enhancements that improve scalability and stability.
- Develops client-focused dashboards and alerts to proactively identify performance challenges.
- Monitors and continuously improves our time-to-resolution metrics.
- Maintains and configures core observability tools to ensure optimum performance and that key metrics and data are available for incident response and performance investigations.
- Provides an actionable feedback loop to Observability and Engineering teams toward improving MELT and development patterns.
- Contributes to the development of automation tools to streamline incident response.
- Works proactively to prevent incidents and reduce their impact on our platform.
- Partners with the larger Cloud Operations, SRE, and Engineering teams, and the business at large, to advance our SaaS platforms.
- Participates in on-call rotation with other team members as needed.
- Other duties as assigned.
Senior DevOps Azure Specialist
Equisoft
A global provider of insurance and investment software solutions.
- Collaborate with the development team to facilitate the development process
- Automate and align the process of building (CI), deploying (CD), maintaining and upgrading the technologies supporting the application
- Diagnose production problems and coordinate with the development team to align code deployment
- Manage access and environment controls
- Administer the development environment and support the development team
- Write relevant documentation on new technologies and processes that are implemented
- Evaluate the performance, availability and security of our systems and recommend the restructuring of existing configurations
- Respond to requests and investigate problems to apply corrective measures to the systems under their responsibility
- Optimize the use of the cloud to reduce operational costs
- Design, implement, and maintain secure CI/CD pipelines aligned with DoD Enterprise DevSecOps Reference Design (DSOP).
- Automate deployment of secure environments using Terraform, Ansible, or CloudFormation for DoD or FedRAMP-compliant systems.
- Integrate static code analysis (SAST), dynamic testing (DAST), container scanning and various security toolsets within pipelines to enforce continuous compliance.
- Implement and manage DoD STIGs, DISA baselines, and RMF controls in Infrastructure as Code (IaC).
- Collaborate with security, development, and operations teams to ensure alignment with DoD RMF, NIST SP 800-53, and/or FedRAMP.
DevOps Engineer
Creyos (formerly Cambridge Brain Sciences)
A simple & scientifically-validated digital platform for assessing cognitive function.
- Establish best practices and standards for the DevOps team including policies, procedures, runbooks, and disaster recovery processes
- Develop and implement security best practices for AWS, including adherence to international regulations, review of the framework, cloud failover, scaling, and issue tracing
- Provide support to Engineers and Customer Success Managers, including troubleshooting failed builds as well as dev/QA/production issues
- Evaluate and update Terraform project structures and reusable modules
- Develop and refine processes for debugging and network configurations
- Manage reporting and action steps for key metrics, including alerts for all key system indicators
- Work closely with the engineering team to refine and enhance our production and development setups, and develop a continuous improvement approach to software development, testing, and deployment




