Job Closed
This listing is no longer active.
Hashgraph, formerly Swirlds Labs, is a software company home to some of the brightest minds in web3.
Senior Site Reliability Engineer – Azure
Location
United States
Posted
54 days ago
Salary
Not disclosed
Seniority
Senior
Job Description
• Design and build secure, scalable Azure infrastructure from first principles for a production-grade distributed system
• Develop and own Terraform-based infrastructure as code, enabling repeatable and automated deployments
• Translate product and customer requirements into technical architecture and execution plans
• Build and enhance platform services, APIs, and integrations that extend HashSphere capabilities
• Partner across engineering, security, and product teams to deliver enterprise-ready infrastructure solutions
• Contribute to operational excellence, including reliability, observability, and incident response
• Support customer deployments and production environments through Tier 2 infrastructure support
Job Requirements
- Proven experience designing and building production-grade systems on Azure
- Ability to turn ambiguous requirements into structured technical solutions and then into delivered systems
- Strong technical communication skills with both engineering and non-technical stakeholders
- High ownership mindset with a bias for action and accountability
- Collaborative approach with a focus on building durable, scalable solutions
- Azure cloud services (networking, compute, identity, security, storage)
- Terraform (infrastructure as code at production scale)
- Programming experience in Go and/or Python
- Experience building greenfield infrastructure environments
- Distributed systems, high-availability architectures, or platform engineering
- CI/CD and automation tooling for infrastructure lifecycle management