Job Closed
This listing is no longer active.
Bringing earlier detection to brain health.
Site Reliability Engineer
Location
United States
Posted
118 days ago
Seniority
Senior
Job Description
Site Reliability Engineer
Linus Health
- Leverage infrastructure as code (Terraform) to build and maintain complex production and analytics workflows, including networking and containerized services.
- Rapidly diagnose and resolve faults in system services as part of a 24/7 on-call rotation focused on actionable alerting and eliminating toil.
- Improve speed of delivery by developing and maintaining CI/CD pipelines.
- Develop infrastructure automation leveraging Terraform, Python, and TypeScript.
- Improve system availability, security, compliance, cost effectiveness, and performance.
- Estimate work, prioritize tasks, track dependencies, report progress, and highlight blockers.
- Participate in continuous improvement initiatives, advocate for SRE best practices, and stay current with emerging technologies and trends.
- Be part of a team where your focus will be on building, measuring, and refining the systems infrastructure that runs our software.
Job Requirements
- Ability to program effectively in one or more high-level languages, such as TypeScript, Python, or Java.
- Ability to use Terraform effectively to manage infrastructure in AWS or other public cloud environments.
- Experience building and supporting serverless applications on AWS using services such as API Gateway, Lambda, Fargate, and Glue.
- Experience with containerization and orchestration tools.
- Outstanding communication and collaboration skills.
- A proactive engineering approach to spotting problems, areas for improvement, and performance bottlenecks.
Benefits
- Equal opportunity employer
- Providing reasonable accommodations for candidates with disabilities
More DevOps Engineer Jobs
Dev Ops Engineer, Level 5
Scratch Financial
Scratch Financial is the world's simplest patient financing solution.
- Participates as a technical expert providing advanced knowledge in vendor devices and management systems.
- Plans and directs development teams and troubleshoots internal application issues.
- Provides technical solutions for network engineering and operational problems.
- Interfaces with vendors and engineering organizations.
- Provides leadership to Network Engineers and the CIEC Development team.
- Build, lead, and develop a high-performing team of Site Reliability Engineers responsible for our hybrid cloud infrastructure in AWS, with an on-premise extension in Hetzner.
- Design, document, and lead the implementation of reliable and secure infrastructure solutions following industry best practices.
- Oversee technical analysis, cost estimation and optimization, platform and system design, architectural compliance, resource planning, and delivery milestones.
- Engage in hands-on technical work alongside the team to maintain deep understanding of the infrastructure, and lead incident response during critical issues.
- Define team goals and strategy, building strong relationships with internal stakeholders across the organisation.
- Manage and coordinate the on-call rotation, including escalation processes, across infrastructure and software engineering teams.
- Champion engineering best practices and drive continuous improvement in production environment quality and reliability.
Electrical Reliability Engineer
Orbital Engineering, Inc.
Orbital Engineering, Inc. specializes in a range of industrial services, including engineering and design, construction management, quality assurance/quality control, safety, and a
- Manage risk-based electrical infrastructure programs for client operating facilities across the United States.
- Develop, implement, and manage risk-based electrical infrastructure programs covering substations, switchgear, MCCs, transformers, protective relays, and power distribution systems.
- Establish asset criticality frameworks incorporating probability of failure, consequence of failure, safety impact, production impact, and regulatory exposure.
- Analyze inspection and testing data to identify degradation trends, failure modes, and emerging reliability risks.
- Support owners/operators in developing multi-year electrical capital plans aligned with risk, reliability targets, and business objectives.
- Recommend repair vs. replace decisions based on lifecycle cost, risk reduction, and system performance.
- Develop and maintain electrical reliability standards, inspection procedures, and engineering guidelines aligned with applicable codes and industry best practices.
- Serve as a technical liaison between engineering, maintenance, operations, safety, and management teams.
- Communicate complex technical risk concepts clearly to non-technical stakeholders and leadership.
- Build and maintain scalable E2E automation frameworks for web (Vue.js / Flutter Web), mobile (Flutter iOS/Android), and APIs.
- Validate critical marketplace workflows and ensure reliability of our business flows and data domain.
- Embed automated testing into Azure DevOps pipelines (parallelisation, orchestration, reliability, performance).
- Improve delivery infrastructure and environments using IaC (Terraform and/or Azure Bicep).
- Manage test data and environments with reusable tooling and consistent configuration.
- Monitor pipeline health, reduce flaky tests, and remove bottlenecks across CI.
- Track and report quality and delivery metrics via dashboards and test reports.
- Collaborate across Engineering, QA, Product, and DevOps in an Agile delivery process.
- Contribute to PHP (Laravel) and Vue.js development as capacity permits.
- Apply a practical understanding of AI agents in modern development workflows.



