Job Closed

This listing is no longer active.

Yelp

Yelp helps people find great local businesses through crowdsourced reviews and paid advertising services. As an employer, Yelp aims to build a world-class team.

Site Reliability Engineer, Production Reliability

Location

Canada

Posted

61 days ago

Salary

$135K - $185K / year

Seniority

Senior

Job Description

  • Bring your curiosity, tenacity and experience
  • Work with engineers across Yelp to support new features and services
  • Integrate tools to monitor platform stability and performance
  • Help scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs
  • Ensure the reliability of Yelp’s primary datastores (MySQL and Cassandra)
  • Troubleshoot site issues using industry-leading tools like Splunk, Grafana, and Prometheus
  • Automate everything with Python, Puppet, Git, Jenkins, Terraform and more!
  • Develop custom tools when off-the-shelf solutions don’t work at our scale, and contribute upstream to open source projects
  • Design and implement new systems, tests, and procedures
  • Participate in light on-call rotations

Job Requirements

  • Mastery of Linux (we use Ubuntu but any distro is fine)
  • Command of your favorite modern programming language (Python, TypeScript, Ruby, Go, Rust, Java, C++, etc.) and an appreciation for delivering safe and secure services.
  • A solid understanding of the fundamental technologies used to deliver services on the Internet (TCP/IP, HTTP, DNS, etc.).
  • Experience with public cloud platforms (we use AWS and GCP, but others are also fine) and related tooling (Terraform, Puppet, Chef, Ansible, etc.).
  • Experience with Linux containerisation and orchestration (e.g., Docker, Podman and Kubernetes).
  • Self-motivated to investigate, fix and improve Yelp in an ever-changing environment.
  • Experience leading, collaborating on and sharing technical activities with global teams.
  • Willingness to own the total lifecycle of a system.

Benefits

  • health insurance
  • flexible work arrangements
  • paid time off
