CAKE.com

Deliciously simple way to run a business and empower your team 💫

Site Reliability Engineer, SRE

DevOps Engineer | Other | Remote | Senior | Team 201-500 | Since 2009 | H1B No Sponsor

Location

United States

Posted

70 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Scale and secure our rapidly growing infrastructure
  • Automate critical processes
  • Ensure a seamless experience for new users
  • Make sure the infrastructure keeps up with the growth
  • Ensure system scalability and high traffic handling
  • Define and deploy monitoring, alerting, and logging systems
  • Respond to and resolve production incidents
  • Conduct thorough post-mortems
  • Monitor server logs for abnormalities
  • Design, manage and maintain automation tools for operational processes

Job Requirements

  • 5+ years of relevant work experience
  • Working experience with AWS
  • Docker
  • Git
  • CI/CD tools like GitLab CI, Jenkins, etc.
  • Experience with IaC tools like Terraform, CloudFormation, Ansible, Puppet, Packer
  • Proficiency with Linux and other Unix-based systems
  • Experience setting up build automation
  • Excellent understanding of security and safety best practices
  • Bachelor’s degree in Computer Science or equivalent work experience
  • Excellent written and verbal English communication skills
  • Ability to work with mixed US and EU based teams

Benefits

  • No overtime
  • No work on weekends
  • No late working hours
  • In-house learning programs
  • Tech lectures
  • Knowledge sharing
  • Remote work with provided MacBook
