Job Closed
This listing is no longer active.
Partnering with health systems to find time for the best care.
Senior Site Reliability Engineer
Location: United States
Posted: 169 days ago
Salary: $125K - $169K / year
Seniority: Senior
Job Description
DexCare
- Design, scale, and operate resilient, cloud-native infrastructure in AWS, with a strong emphasis on EKS, IAM, RBAC, and modern security-first practices.
- Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security, enabling velocity without compromising safety.
- Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).
- Write and maintain Terragrunt and Terraform modules and infrastructure-as-code automation.
- Develop internal tools and scripts in Python to automate operational workflows and reduce manual overhead.
- Document everything, from runbooks to standards, so teams stay aligned and systems stay stable.
- Actively contribute to Agile workflows using Jira, with clear tracking of work, priorities, and progress.
- Participate in on-call rotations, postmortems, and continuous improvement efforts, always with a blameless, team-first mindset.
Job Requirements
- 4+ years in a Senior SRE or DevOps role supporting production cloud infrastructure at scale.
- Deep experience with AWS (IAM, EKS, VPC, EC2, Secrets Manager, Serverless) and RBAC.
- Hands-on proficiency with Terraform, Terragrunt, Helm, and container orchestration.
- Proven experience building and maintaining GitHub Actions for CI/CD, including GitHub Advanced Security features like secret scanning and code policy enforcement.
- Strong Datadog experience — building dashboards, tuning alerts, setting up monitors, and interpreting telemetry.
- Solid Python scripting experience for automation and internal tools.
- You value clear, accurate documentation as a core part of engineering, not an afterthought.
- Comfortable working in Agile/Scrum environments with well-tracked Jira workflows.
- Practical experience with resource analysis and infrastructure optimization.
Benefits
- Eligible for Annual Bonus
- Healthcare benefits, short/long-term disability coverage, life insurance, and 401k
- Paid Parental Leave
- Nine paid holidays & Unlimited PTO