Job Closed

This listing is no longer active.

Jobgether

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Sr. Site Reliability Engineer

Location

United States

Posted

78 days ago

Salary

$150K - $200K / year

Seniority

Senior

Job Description

Role Description

This role provides a high-impact opportunity to ensure the stability, scalability, and reliability of critical cloud services across a large-scale production environment. You will combine hands-on technical expertise with strategic ownership, driving automation, monitoring, and incident response to deliver consistently high-performing systems. Working closely with engineering, product, and operations teams, you will influence system design, embed reliability practices, and lead cross-functional initiatives that reduce operational toil. The ideal candidate thrives in a collaborative, fast-paced environment, enjoys solving complex problems, and has deep experience with modern cloud infrastructure, automation, and distributed systems.

- Own and drive the availability, durability, and performance of key services across all production environments
- Lead complex technical projects from discovery to resolution, demonstrating high-level ownership
- Define, implement, and enforce service health standards, including SLIs, SLOs, and error budget policies
- Lead incident response, post-incident reviews, and implement long-term reliability improvements and architectural enhancements
- Mentor team members and act as a subject matter expert in ITIL/OSS processes, including incident, change, problem, and capacity management
- Architect and deploy scalable automation solutions to reduce manual tasks and improve operational efficiency
- Maintain and improve monitoring, logging, alerting frameworks, and CI/CD pipelines using tools like Prometheus, Grafana, ELK, Terraform, Ansible, and Jenkins
- Collaborate with engineering, product, and operations teams on resilient system design, capacity planning, disaster recovery, and vendor management
- Develop and maintain operational playbooks, runbooks, and documentation to promote a reliability-first culture

Qualifications

- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
- 8+ years of progressive experience in site reliability, systems engineering, or operations
- Extensive experience designing, scaling, and operating large-scale, production-grade distributed systems
- Expert-level Linux administration and advanced troubleshooting skills
- Proficiency in at least one modern scripting/programming language (Python or Go strongly preferred)
- Experience with container orchestration platforms (Kubernetes, Docker) and microservices architecture
- Expertise with infrastructure-as-code and HashiCorp tools (Terraform, Vault, Nomad)
- Strong understanding of incident response, root cause analysis, and operational best practices
- Knowledge of ITIL/OSS practices, SLIs/SLOs, and cloud platforms (AWS, GCP, Azure)
- Excellent problem-solving, collaboration, and communication skills, with a proactive approach to operational improvements

Benefits

- Competitive salary range of $150,000 – $200,000, plus RSU grants and ESPP program
- Comprehensive healthcare coverage, including dental and vision
- Flexible vacation policy, maternity/paternity leave, and childcare bonuses
- MacBook Pro and generous stipend to personalize your workstation
- Fertility treatment support and learning & development programs
- Commuter benefits and a culture supporting a healthy work-life balance
- Opportunities to work in a diverse, inclusive, and globally distributed team

Related Job Pages

More Software Engineer Jobs

Virtual Computer Science (California Certified)

Fullmind Learning

Fullmind Learning, formerly iTutor, is an e-learning company on a mission to ensure all children have access to an exceptional education. The company partners w

- $36 hourly. Rates are negotiable and subject to change.
- Fully remote, 1099 contract opportunity.
- Monday to Thursday, 11:10-11:50 AM PST.
- Valid California Teacher Certification is required to teach Computer Science.
- Must be authorized to work in the United States.

Join our pool of educators who have access to our educator portal, where you can select the jobs aligned to your certification as they become available according to our school and district partners!

This is a 1099 Independent Contractor position following the school district's calendar. Immediate start dates are based on available placement opportunities upon completion of the application process.

Fullmind partners with hundreds of U.S. schools to ensure every child has access to education. We fill teacher vacancies by live-streaming certified educators directly to students. As a Fullmind educator, you’ll deliver virtual instruction and guide students to course completion! Learn more: https://www.fullmindlearning.com/teach

As a Fullmind educator, you will:

- Have access to our educator portal where you can select the jobs you take on as a Fullmind educator.
- Promote creativity and excitement in the virtual learning environment.
- Create strategies to engage and nurture student learning and student relationships.
- Create lesson plans aligned with the class curriculum.
- Keep track of student grades and performance

California
Job Closed
Manager, Engineering Operations

Tucows Domains

The largest wholesale domain registrar in the world. We are OpenSRS, Enom, Ascio, Hover and Tucows Registry Services.

Full Time | Remote | Team 51-200 | H1B No Sponsor

• Lead the Engineering Operations Team
• Build, mentor, and support a team of infrastructure and operations engineers.
• Set clear priorities, goals, and performance expectations for the team.
• Oversee the reliability and performance of infrastructure across AWS and hybrid environments.
• Drive improvements in observability, incident response, and operational processes.
• Partner with cross-functional groups such as Security Operations, Compliance and diverse Engineering teams to design and operate scalable, resilient infrastructure.

Canada
$154K - $168K / year

• Directs the activities of the engineering organization in conjunction with the team leads
• Ensures processes and training are in place to deliver predictable, high-quality engineering deliverables
• Helps drive a culture of accountability, initiative, and ownership within the engineering organization
• Produces regular data and reporting to understand progress towards goals, team and individual performance, and the quality of engineering deliverables
• Works in concert with the Product and Support teams to ensure quality issues are properly identified and handled
• Establishes an understandable and scalable structure to help develop and scale the level of engineering team members
• Ensures that high-quality team members are attracted to the team, selected, and retained
• Works with Product to ensure upcoming work is understood, accurately estimated, and technically ready
• Always focuses on building up and enabling others, but occasionally rolls up sleeves and leads from the front
• Periodically presents to leadership and the company on engineering organization and results

All locations: Colorado | Utah
$200K - $220K / year
Job Closed
Full Time | Remote | Team 51-200

Software Developer

Solovis is a leading portfolio management and analytics platform helping institutional investors navigate today's complex global markets with clarity and confidence. Backed by Insight Partners, we're building the next chapter of growth by investing in people and product to raise the bar on quality and client outcomes. Our team is driven by a culture of disciplined execution, humility, and curiosity where AI is at the core of how we operate, innovate, and serve clients. At Solovis, you'll join a tech-forward, growth-minded team that believes in learning fast, thinking big, and delivering meaningful impact for asset owners worldwide.

Our companies are not the largest or flashiest, but they are among the best-run software businesses, creating value for customers and shareholders at an accelerated pace. To date, our team has built six platform companies, each culminating in multiple liquidity transactions with multi-billion-dollar valuations.

The Software Developer delivers cutting-edge software solutions that drive the company's rapid growth and reinforces the company's position as a leader in the enterprise software market. The role involves collaborating closely with product and development teams to design, develop, and deploy software that meets high standards for quality and innovation. The position will leverage your expertise in agile methodologies, unit testing, and collaboration to ensure software solutions that are scalable, reliable, and exceed customer expectations. The Software Developer also participates in the entire software development lifecycle, from conceptualization to deployment, ensuring that the company remains agile and responsive to market demands.

Key Responsibilities

- Ensure consistent and reliable sprint performance by effectively managing and completing committed tasks.
- Contribute to the team's timely delivery, consistently meeting or exceeding the Release to Schedule targets.
- Consistently achieve or surpass software quality targets by effectively managing and reducing escaped defects in software releases.
- Consistently work towards reducing the backlog of Escaped Open Defects, improving product quality over time.
- Produce scalable and efficient code that performs well, with documentation that is comprehensive and easy to understand.

Key Qualities

- Proficiency in utilizing data and analytics to make informed decisions and drive business strategies
- Ability to identify potential issues before they arise and effectively address them to mitigate risks and capitalize on opportunities
- Detail-oriented and organized, this individual prioritizes effectively, excels at time management, and anticipates well in decision making

Skills

- Experience in agile methodology/scrum
- Experience in unit testing
- Experience in web development

United States
Job Closed