The fastest way to visualize, understand and debug software. Find the critical issues that logs and metrics can’t see.
Senior Site Reliability Engineer
Location
United Kingdom
Posted
1 day ago
Salary
£127.7K - £150.2K / year
Seniority
Senior
Job Description
Honeycomb.io
- Help Honeycomb scale our backend systems to support our highest-volume customers.
- Build organizational trust through transparent communication, giving and receiving direct and kind feedback.
- Work with other backend teams to dive deep into our stack to make sure we’re getting the most out of our infrastructure.
- Be trained as an Incident Commander, become one, and then train others.
- Help SRE and Honeycomb develop a healthy cross-Atlantic engineering culture.
- Participate in the team’s on-call rotation as the EU side of a new follow-the-sun rotation.
- Help the organization navigate tradeoffs between reliability and its other goals and priorities.
- Optional: act as an external ambassador through blog posts, conference talks, and presentations with support from our DevRel team.
Job Requirements
- Strong experience in AWS and Kubernetes
- Experience performing cost analysis and reduction
- Solid Helm, Terraform, and CI/CD experience
- Project management skills
- Software engineering experience (Golang is a plus, and so is performance engineering)
- Experience with Kafka or another high-volume distributed system
- Excellent written and spoken communication skills, with the ability to tailor your communication for your audience and give direct feedback when you notice something wrong
- A curiosity to learn how people and systems work, and the willingness to make them partners in your initiatives
- Familiarity with observability concepts (SLOs, instrumentation) and data-driven decision making
- Comfort operating in ambiguity, with a bias for action and experimentation
- Interest in both the technical and human sides of reliability engineering
- Experience working in geographically distributed teams
Benefits
- A stake in our success - generous equity with employee-friendly stock program
- It’s not about how strong of a negotiator you are - our pay is based on transparent levels relative to experience
- Time to recharge with unlimited PTO
- A distributed-first mindset and culture (really!)
- Home office, co-working, and internet stipend
- Full benefits coverage for employees, with additional coverage available for dependents
- Up to 16 weeks of paid parental leave, regardless of path to parenthood
- Annual development allowance
More DevOps Engineer Jobs
DevOps Engineer
Benzinga
Benzinga is a dynamic and innovative financial media outlet that empowers investors with high-quality, unique content.
- Radiate knowledge about the service's infrastructure and reliability to the rest of the development team.
- Identify parts of the system that do not scale, provide immediate palliative measures, and drive systemic resolution of contributing root cause(s).
- Plan the growth of Benzinga's infrastructure.
- Document every action so your learnings turn into repeatable actions and then into automation.
- Improve the deployment process to make it as boring as possible.
- Define, provision, and manage our production infrastructure using Kubernetes and cloud-native serverless deployed by way of Terraform.
- Proactively identify and reduce security risks, in alignment with ongoing SOC2 auditing and reporting.
- Develop security training and guidance to internal development teams.
- Design, build and maintain core infrastructure pieces that allow Benzinga to scale, supporting thousands of concurrent users.
- Be on an on-call rotation to respond to benzinga.com availability incidents and provide support for service engineers with customer incidents.
- Debug production issues across all services and levels of the stack.
- Make monitoring and alerting notify on symptoms and not on outages.
- Manage day-to-day maintenance and evolution of Benzinga's Prometheus monitoring and alerting infrastructure.
- Bundle Prometheus monitoring as an out-of-the-box monitoring solution for Benzinga products.
- Build and maintain the benzinga.com public monitoring gateway.
- Help migrate our current performance monitoring solution to Prometheus.
- Improve coverage of Benzinga performance monitoring.
- Create automated alerts to notify team members of regression.
Role Description
We are looking for a Sr. DevOps Engineer who has hands-on experience building and managing a cloud-based infrastructure. Additionally, this engineer will be responsible for development cycles in continuous integration/continuous deployment mode, process monitoring, and more broadly, constructing a “safety culture” within SmithRx’s DevOps practice. Our user base is currently doubling annually, and you would share the responsibility of orchestrating a reliable, sustainable, and scalable infrastructure.
- Help build and maintain a container-based infrastructure that is elegant, redundant, scalable and compliant, and support the rest of the team doing the same.
- Be part of the SmithRx Agile development team to deliver end-to-end automation of deployment, monitoring, and infrastructure management in a cloud environment.
- Gain a deep understanding of the challenges that SmithRx faces, technical and otherwise; collaborate with other teams to identify and carry out effective solutions.
- Work closely with our development team to develop and maintain CI/CD pipelines in a reproducible and secure manner.
- Monitor and troubleshoot infrastructure issues, and perform root cause analysis when necessary.
- Collaborate with developers to ensure that applications and services are built with scalability, reliability, and security in mind.
- Organize the highest levels of systems and infrastructure availability, acting proactively.
- Be a pillar of a collaborative learning culture through exploration of new technologies, application of best practices, and any other innovations you would like to experiment with.
- Install, configure, test and maintain operating systems, application software and system management tools.
- Develop custom scripts to increase system efficiency and lower the human intervention time on any tasks.
- Evaluate application performance, identify potential bottlenecks, develop solutions, and implement them with the help of developers.
- Be effective in maintaining SmithRx security program controls and best practices.
- Understand the health regulatory space and maintain HIPAA-compliant systems.
- Make pragmatic decisions about technical tradeoffs, infrastructure costs, and resource utilization.
- Be a part of on-call PagerDuty rotations.
Qualifications
- 5+ years of experience in DevOps.
- BS or advanced degree in computer science or other related field.
- Experience deploying and monitoring web applications in AWS.
- Worked as an active team member for both product development and the operations teams to provide the best DevOps practices and supported their applications with feasible approaches.
- Security-first mindset, including demonstrated experience building secure development and test environments integrated into CI/CD pipelines and software release cycles.
- Experience building and maintaining a container-based infrastructure and Kubernetes.
- Experience with Infrastructure as Code (Terraform experience a plus) and infrastructure testing strategies.
- Experience with infrastructure automation, systems reliability, load balancing, monitoring, logging.
- Experience with fully automating CI/CD pipelines with associated tools such as CircleCI and GitHub Actions.
- Experience working in and architecting for regulated environments with data privacy regulations like GDPR, HIPAA preferred.
- Experience working with and managing SQL and NoSQL databases like Redis, Redshift, AWS RDS, PostgreSQL, BigQuery, and Snowflake.
- Strong scripting skills, including shell scripts, Perl, Ruby, Python, Go, Groovy, Helm, etc.
Benefits
- Highly competitive wellness benefits including Medical, Pharmacy, Dental, Vision, and Life Insurance and AD&D Insurance.
- Flexible Spending Benefits.
- 401(k) Retirement Savings Program.
- Short-term and long-term disability.
- Discretionary Paid Time Off.
- Paid Company Holidays.
- Wellness Benefits.
- Commuter Benefits.
- Paid Parental Leave benefits.
- Employee Assistance Program (EAP).
- Well-stocked kitchen in office locations.
- Professional development and training opportunities.
Company Description
SmithRx is a rapidly growing, venture-backed Health-Tech company. Our mission is to disrupt the expensive and inefficient Pharmacy Benefit Management (PBM) sector by building a next-generation drug acquisition platform driven by cutting-edge technology, innovative cost-saving tools, and best-in-class customer service. With hundreds of thousands of members onboarded since 2016, SmithRx has a solution that is resonating with clients all across the country. We pride ourselves on our mission-driven and collaborative culture that inspires our employees to do their best work. We believe that the U.S. healthcare system is in need of transformation, and we come to work each day dedicated to making that change a reality. At our core, we are guided by our company values:
- Integrity: Our purpose guides our actions and gives us confidence in the path ahead. With unwavering honesty and dependability, we embrace the pressure of challenging the old and exemplify ethical leadership to create the new.
- Courage: We face continuous challenges with grit and resilience. We embrace the discomfort of the unknown by balancing autonomy with empathy, and ownership with vulnerability. We boldly challenge the status quo to keep moving forward, always.
- Together: The success of SmithRx reflects the strength of our partnerships and the commitment of our team. Our shared values bind us together and make us one. When one falls, we all fall; when one rises, we all rise.
Site Reliability Engineer
OneStream Software
A comprehensive cloud-based platform to modernize the Office of the CFO.
- Implement application/infrastructure observability solutions to ensure desired application availability, reliability, and performance
- Participate in regular On-Call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings
- Proactively partner with Product and Engineering teams to identify, develop, deploy, and maintain reliable systems and services
- Influence and create new designs, architectures, standards, and methods for large-scale systems
- Sustain a high level of reliability for key services and automated systems
- Automate processes to improve reliability, performance, and availability
- Update technical documentation, workflows, and knowledge base articles
- Provide feedback in pull requests and peer coding reviews
- Implement codified automated solutions that build integrations between Dynatrace, Azure DevOps and Jira
- Solid knowledge in focused areas of OneStream Software
- Ability to mentor others in several technical areas
- Understanding practical use of SOC/FedRAMP controls to assist Compliance and Security teams
Staff Database Reliability Engineer – Oracle, Cloud
Rackspace Technology
Where enterprise AI runs and outcomes scale
- Work from home as a Staff Database Reliability Engineer
- Manage databases in a cloud environment using tools like Terraform and AWS
- Collaborate with cross-functional teams to ensure high availability



