CAKE.com

Deliciously simple way to run a business and empower your team 💫

Site Reliability Engineer, SRE

DevOps Engineer | Other | Remote | Senior | Team 201-500 | Since 2009 | H1B No Sponsor | Company Site | LinkedIn

Location

United States

Posted

70 days ago

Seniority

Senior

Bachelor Degree | 5 yrs exp | English

Ansible, AWS, Docker, Jenkins, Linux, Packer, Puppet, Terraform, Unix

Job Description

• Scale and secure our rapidly growing infrastructure
• Automate critical processes
• Ensure a seamless experience for new users
• Make sure the infrastructure keeps up with the growth
• Ensure system scalability and high traffic handling
• Define and deploy monitoring, alerting, and logging systems
• Respond to and resolve production incidents
• Conduct thorough post-mortems
• Monitor server logs for abnormalities
• Design, manage and maintain automation tools for operational processes

Job Requirements

5+ years of relevant work experience
Working experience with AWS
Docker
Git
CI/CD tools like Gitlab CI, Jenkins, etc.
Experience with IaC tools like Terraform, CloudFormation, Ansible, Puppet, Packer
Proficiency with Linux and other Unix-based systems
Experience setting up build automation
Excellent understanding of security and safety best practices
Bachelor’s degree in Computer Science or equivalent work experience
Excellent written and verbal English communication skills
Ability to work with mixed US and EU based teams

Benefits

No overtime
No work on weekends
No late working hours
In-house learning programs
Tech lectures
Knowledge sharing
Remote work with provided MacBook

More DevOps Engineer Jobs

Site Reliability Engineer

Tillster

We’re a unified commerce platform that enables QSR restaurants to deliver personalized brand experiences & drive sales.

DevOps Engineer | 70 days ago | Full Time | Remote | Team 201-500 | Since 2002 | H1B Sponsor | Company Site | LinkedIn

• Analyzing and troubleshooting large-scale distributed systems in the public cloud
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
• Improve and maintain monitoring and logging solutions that measure availability, latency and overall system health of production systems
• Provision and manage cloud Infrastructure through automation and infrastructure as code
• Restore healthy operation of applications and services through sustainable incident response and blameless postmortems
• Follow and monitor security and compliance best practices
• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

Ansible, AWS, Distributed Systems, Python, TypeScript

Portugal

Job Closed

Staff Site Reliability Engineer, Platform Engineering

Paxos

DevOps Engineer | 70 days ago | Other | Remote | Company Site

• Architect, build, and operate resilient, scalable, and self-healing cloud infrastructure on AWS.
• Lead the evolution of Kubernetes and platform services to enable secure, automated, and multi-region operations.
• Define and enforce Infrastructure as Code (IaC) standards using Terraform, AWS CDK, and Crossplane to ensure consistency, security, and auditability.
• Drive automation across provisioning, configuration, and monitoring pipelines to reduce manual effort and operational risk.
• Establish and champion reliability, observability, and performance standards across Tier-1 services, ensuring alignment with regulatory and partner requirements.
• Partner with product engineering to enhance CI/CD velocity, service resilience, and visibility through shared tooling, SLOs, and platform patterns.
• Lead incident reviews, root-cause analyses, and systemic reliability improvements, embedding learnings into runbooks and design practices.
• Optimize cloud infrastructure for cost, performance, and fault tolerance, driving data-driven operational excellence.
• Mentor and upskill engineers, shaping architectural direction and influencing design decisions across multiple teams.
• Contribute to the technical strategy and roadmap for Paxos’ infrastructure platform, aligning platform scalability with business growth and compliance objectives.

AWS, Amazon EC2, Kubernetes, PostgreSQL, Python, Terraform

New York

$210K - $240.8K / year

Senior Site Reliability Engineer

Paxos

DevOps Engineer | 70 days ago | Other | Remote | Company Site

• Design, build, and operate scalable, highly available cloud infrastructure primarily on AWS.
• Manage and evolve our Kubernetes environments to support the deployment and operation of modern, containerized applications.
• Define and implement Infrastructure as Code (IaC) using tools like Terraform, CDK, or Crossplane.
• Automate infrastructure provisioning, configuration, maintenance, and monitoring to reduce manual effort and improve reliability.
• Apply best practices around security, observability, and cost optimization across infrastructure and services.
• Manage and optimize database technologies, with a focus on Amazon RDS and Aurora.
• Partner with development teams to ensure seamless deployment and integration of new features and updates.
• Investigate and resolve incidents, perform root cause analysis, and implement long-term fixes.
• Participate in on-call rotations and provide support for critical production systems.
• Contribute to SRE best practices, internal tooling, and team knowledge sharing.

AWS, Amazon EC2, Kubernetes, PostgreSQL, Python, Terraform

New York

$172K - $197.0K / year

Job Closed

Senior Customer Reliability Engineer – Infrastructure

Astronomer

Modern Data Orchestration

DevOps Engineer | 70 days ago | Full Time | Remote | Team 201-500 | Since 2018 | H1B Sponsor | Company Site | LinkedIn

• Provide solutions to customers to make them successful using our products.
• Troubleshoot customer environments and engage in active triaging with customers.
• Build out our monitoring and alerting systems.
• Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.
• Help direct the architecture of the products and contribute where possible.
• Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
• Participate remotely within a fully distributed team.
• Enhance and enrich customer documentation.
• Work with the latest technology and multi-cloud implementations.

Airflow, AWS, Azure, Distributed Systems, GCP, Kubernetes, Linux, Python

Ireland

Job Closed
