Job Closed
This listing is no longer active.
We’re a unified commerce platform that enables quick-service restaurant (QSR) brands to deliver personalized brand experiences and drive sales.
Site Reliability Engineer
Location
Portugal
Posted
72 days ago
Salary
Not specified
Seniority
Senior
Job Description
Site Reliability Engineer
Tillster
- Analyze and troubleshoot large-scale distributed systems in the public cloud.
- Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Improve and maintain monitoring and logging solutions that measure the availability, latency, and overall health of production systems.
- Provision and manage cloud infrastructure through automation and infrastructure as code.
- Restore healthy operation of applications and services through sustainable incident response and blameless postmortems.
- Follow and monitor security and compliance best practices.
- Take a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Job Requirements
- Ability to program in one or more high-level languages (e.g., TypeScript, Python)
- Configuration management and infrastructure as code (e.g., CloudFormation, Ansible)
- Monitoring and alerting tools (e.g., AWS CloudWatch, New Relic)
- Incident management and on-call (e.g., PagerDuty)
- Gather and analyze metrics to assist in performance tuning and fault finding
- Bachelor's degree from a four-year college or university; or three to four years of related experience and/or training; or an equivalent combination of education and experience.
- 3+ years of software engineering and/or IT operations and infrastructure experience preferred
Benefits
- Compensation competitive with the market and geographic location.
- Meal allowance for each day worked available through meal card.
- Home/Office allowance reimbursement per calendar month, pro-rated based on employment start date.
- Health insurance: Tillster pays the premium for employee private health insurance. Employees have the option to add their spouse/dependents at the employee’s cost.
- Holidays: Up to 14 national and local/municipal holidays in accordance with applicable Portuguese labour law, depending on your employment start date.
- Vacation: Up to 22 days of vacation every holiday year, pro-rated based on employment start date.
- Education, Learning & Development: We offer Udemy Learning courses and ongoing learning and development opportunities.
More DevOps Engineer Jobs
Staff Site Reliability Engineer, Platform Engineering
Paxos
Paxos is a regulated blockchain infrastructure company building transparent and transformative financial solutions.
- Architect, build, and operate resilient, scalable, and self-healing cloud infrastructure on AWS.
- Lead the evolution of Kubernetes and platform services to enable secure, automated, and multi-region operations.
- Define and enforce Infrastructure as Code (IaC) standards using Terraform, AWS CDK, and Crossplane to ensure consistency, security, and auditability.
- Drive automation across provisioning, configuration, and monitoring pipelines to reduce manual effort and operational risk.
- Establish and champion reliability, observability, and performance standards across Tier-1 services, ensuring alignment with regulatory and partner requirements.
- Partner with product engineering to enhance CI/CD velocity, service resilience, and visibility through shared tooling, SLOs, and platform patterns.
- Lead incident reviews, root-cause analyses, and systemic reliability improvements, embedding learnings into runbooks and design practices.
- Optimize cloud infrastructure for cost, performance, and fault tolerance, driving data-driven operational excellence.
- Mentor and upskill engineers, shaping architectural direction and influencing design decisions across multiple teams.
- Contribute to the technical strategy and roadmap for Paxos’ infrastructure platform, aligning platform scalability with business growth and compliance objectives.
Senior Site Reliability Engineer
Paxos
Paxos is a regulated blockchain infrastructure company building transparent and transformative financial solutions.
- Design, build, and operate scalable, highly available cloud infrastructure primarily on AWS.
- Manage and evolve our Kubernetes environments to support the deployment and operation of modern, containerized applications.
- Define and implement Infrastructure as Code (IaC) using tools like Terraform, CDK, or Crossplane.
- Automate infrastructure provisioning, configuration, maintenance, and monitoring to reduce manual effort and improve reliability.
- Apply best practices around security, observability, and cost optimization across infrastructure and services.
- Manage and optimize database technologies, with a focus on Amazon RDS and Aurora.
- Partner with development teams to ensure seamless deployment and integration of new features and updates.
- Investigate and resolve incidents, perform root cause analysis, and implement long-term fixes.
- Participate in on-call rotations and provide support for critical production systems.
- Contribute to SRE best practices, internal tooling, and team knowledge sharing.
- Provide solutions that help customers succeed with our products.
- Troubleshoot customer environments and engage in active triage with customers.
- Build out our monitoring and alerting systems.
- Build and maintain automation so that daily operational tasks are handled as efficiently as possible.
- Help direct the architecture of the products and contribute where possible.
- Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide “white glove” guidance on the path to production.
- Participate remotely within a fully distributed team.
- Enhance and enrich customer documentation.
- Work with the latest technology and multi-cloud implementations.
- Architect, implement, and maintain secure cloud infrastructure (Azure, AWS, or GCP).
- Lead containerization and orchestration efforts using Docker and Kubernetes.
- Design, build, and optimize CI/CD pipelines incorporating automated testing.
- Develop and maintain infrastructure as code (Terraform, ARM, or equivalent).
- Establish observability standards using modern monitoring tools.
- Own reliability engineering practices, including disaster recovery strategies.
- Implement secure architecture patterns, IAM controls, and secrets management.
- Collaborate closely with software engineers to embed DevOps best practices.
- Contribute to architectural decisions and define cloud platform standards.
- Lead post-incident sessions for root-cause analysis when needed.



