Job Closed
This listing is no longer active.
Our mission is to build everyday entertainment platforms For Everybody.
Site Reliability Engineer
Location
United States
Posted
101 days ago
Seniority
Senior
Job Description
Sporty Group
- Planning and securely deploying into new regions
- Improving all aspects of our AWS infrastructure
- Monitoring all releases to make sure they go smoothly
- Managing multiple K8s clusters
- Researching new technologies and having the opportunity to implement them with the team
Job Requirements
- 4+ years of experience in an SRE or DevOps position; if you're a Software Engineer looking to transition, that's also great!
- You're a veteran in AWS technologies
- Experience deploying and releasing into new regions
- You've managed multiple Kubernetes clusters in commercial environments
- Experience with monitoring and logging in large-scale environments
Benefits
- A competitive salary plus individual performance-based bonuses every quarter
- 28 days paid annual leave
- Our core working hours are 10am-3pm in your local time zone with flexibility outside of this
- Referral bonuses & flash bonuses
- Top-of-the-line equipment
- Annual company retreats to provide great internal networking opportunities
More DevOps Engineer Jobs
- Design and implement AWS infrastructure using Terraform
- Manage CI/CD pipelines for zero-downtime deployments
- Optimize cloud resources for performance and cost-efficiency
- Conduct security audits and compliance checks
- Maintain and optimize database performance with backups
- Operate and improve large-scale Kubernetes infrastructure in production
- Build and maintain GitOps-driven infrastructure workflows (ArgoCD/Flux)
- Develop automation and tooling that enable self-service infrastructure
- Improve system reliability, scalability, and observability
- Reduce operational toil through thoughtful engineering solutions
- Partner with product engineers to support safe, reliable deployments
- Contribute to infrastructure cost visibility and efficiency initiatives
Senior DevOps Engineer, Application
Sedona Digital
Experts in software development and cloud technologies.
- Collaborate with the development teams to plan, deploy, and administer applications running on AWS and Kubernetes
- Maintain the AWS infrastructure using Infrastructure as Code principles and Terraform
- Initiate and drive the adoption of technologies and good patterns for development and operations
- Mentor and oversee other engineers
- Lead and manage specific technical domains or projects, ensuring architectural consistency, scalability, and alignment with business objectives
- Oversee and improve engineering processes and practices, identifying opportunities for automation, optimization, and enhanced system reliability
- Implement and oversee cloud migration projects
- Provide guidance for re-platforming and refactoring the current cloud infrastructure
- Communicate the progress of cloud migration projects to senior leadership
- Preserve business continuity 24/7 with minimum downtime and financial impact
- Investigate and perform regular assessments of cloud deployments for compliance with the company's standards and best practices
- Ensure the company's deployment standards and pillars are followed in cloud solutions and resources
- Stay up to date with the latest tools and trends in the industry
- Attend and provide training on new and current technologies and services
Senior Site Reliability Engineer, Security
CentralReach
Elevating Autism & IDD Care through Technology
- Take responsibility for availability, latency, performance, efficiency, monitoring/observability, emergency response, capacity planning, setting and maintaining SLOs, SLIs, and error budgets, and creating dashboards.
- Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
- Manage site stability, performance, and reliability, and maintain uptime for production environments.
- Develop a fully automated multi-environment observability stack based on the existing system and extend it to predict capacity needs based on usage patterns.
- Strive for automation to reduce toil and increase development velocity.
- Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
- Identify changes to the product architecture from a reliability, performance, and availability perspective with a data-driven approach.
- Document resolution runbooks and standard operating procedures.
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Collaborate with software development teams in the release management process to shape the future roadmap and establish strong operational readiness across teams.
- Implement reliability and observability tools (such as New Relic, Prometheus, and Grafana).
- Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.




