Job Closed
This listing is no longer active.
Site Reliability Engineer
Location
United States + 29 more
All locations: United States | Canada | Brazil | Colombia | Argentina | Chile | Venezuela | Bolivia | Ecuador | French Guiana | Guyana | Paraguay | Peru | Suriname | Uruguay | Mexico | Costa Rica | El Salvador | Guatemala | Honduras | Nicaragua | Panama | Dominican Republic | Puerto Rico | Bahamas | Guadeloupe | Haiti | Jamaica | Martinique | Montserrat
Posted
108 days ago
Job Description
asymmetric.re
Role Description
We are looking for a Site Reliability Engineer to join Asymmetric Research, initially on a six-month contract engagement, with a strong opportunity to extend into a full-time position. In this role, you will design, operate, and scale mission-critical blockchain infrastructure supporting leading L1/L2 networks and DeFi protocols. You will work within a high-trust team to deliver secure, highly available, production-grade systems while driving automation, reliability, and operational excellence across our globally distributed environments.
- Manage and maintain a globally distributed blockchain infrastructure fleet
- Design, architect, deploy, and operate production-grade infrastructure services
- Implement and maintain infrastructure-as-code across development, staging, and production environments
- Ensure high availability and performance of mission-critical systems
- Contribute to automation, CI/CD pipelines, and operational tooling
- Monitor system health and respond to incidents with strong troubleshooting fundamentals
- Uphold the highest standards of integrity, professionalism, and operational discipline
Qualifications
- 2+ years of experience in a Site Reliability, DevOps, or Infrastructure Engineering role
- Strong experience managing Linux systems and network infrastructure
- Hands-on experience with load balancers and high-availability technologies (e.g., HAProxy, ALB/ELB)
- Experience with configuration management tools (e.g., Ansible, Chef, Puppet, SaltStack)
- Solid troubleshooting skills across hardware, networking, and software systems
- Development experience in Go, Python, or Rust
- Experience building and maintaining CI/CD pipelines and automated deployment workflows
- Experience with open-source monitoring and observability tools (e.g., Grafana, Loki, Prometheus, Alertmanager)
Requirements
- Experience operating distributed systems using tools such as Nomad or Kubernetes
- Familiarity with blockchain infrastructure, including Bitcoin, Ethereum, Solana, Cosmos, or Move-based ecosystems
Benefits
- 25 days of paid vacation
- Office and equipment stipend
- Pension / 401K programs
- Life Insurance
- Premium Healthcare
- Competitive Base Salary
- Lucrative Bonus Programs
More DevOps Engineer Jobs
DevOps Cloud Engineer, Open LMS
Learning Technologies Group plc
LTG is a leader in corporate digital learning and talent management.
• Using automation and Infrastructure as Code (IaC) to continuously improve reliability, scalability, and performance of services deployed on AWS.
• Performance tuning and configuration of both Linux system and application parameters supporting highly concurrent web stacks.
• Managing infrastructure through code using configuration management and IaC templating software such as Terraform and Puppet.
• Documenting procedures and knowledge base articles throughout problem resolution and architecture development processes.
• Monitoring the availability, performance, and health of production systems to meet service level objectives using monitoring systems such as Icinga, Prometheus, Grafana, CloudWatch, and Loki.
• Participating in emergency incident response on-call rosters.
• Practicing blameless postmortems that lead to improvements in resiliency and reductions in alert fatigue.
Senior IAM Operations – Reliability Engineer
Genesys
Orchestrating billions of remarkable experiences in more than 100 countries – through cloud, digital and AI technology.
• Resolve IAM-related incidents through hands-on troubleshooting and remediation, serving as an escalation point for other junior engineers on the team.
• Monitor observability, AIOps, and event management platforms to identify anomalies, authentication failures, provisioning delays, and emerging IAM-related incidents.
• Perform incident triage and correlation to determine probable cause and appropriate routing for deeper investigation.
• Validate automated remediation workflows and assist in identifying repeated manual IAM tasks that could be automated.
• Participate in early-stage automation and AI-readiness activities by documenting remediation steps, key patterns, and operational edge cases related to identity services.
• Reduce alert noise by suggesting adjustments to IAM-related thresholds, suppression logic, or detection rules.
• Support post-incident reviews by providing relevant data, timelines, and insights related to identity service behavior.
• Collaborate with Cloud, Network, Security, Endpoint, and ServiceNow teams to support incident resolution and improve IAM operational processes.
• Assist with access lifecycle, certification, or remediation workflows by troubleshooting failures, validating outcomes, and performing manual intervention when automation isn’t available.
• Ensure accuracy of identity event data, alerts, and service mappings to support effective correlation within monitoring and CMDB systems.
• Troubleshoot and resolve IAM-related incidents across authentication, authorization, provisioning, deprovisioning, and access lifecycle workflows.
• Analyze logs, events, and telemetry from IAM platforms (e.g., Okta system logs, Microsoft Entra ID (formerly Azure Active Directory) sign-in logs, directory events) to determine service impact and root causes.
• Support correlation of IAM events with dependencies across cloud applications, SaaS platforms, endpoints, and network access paths.
• Participate in validating IAM automation workflows such as Joiner/Mover/Leaver processes, access provisioning, and deprovisioning flows and apply fixes and minor enhancements to automation or AIOps capabilities.
• Assist in identifying IAM-related automation opportunities by documenting repeated failure modes and manual remediation steps.
• Support certificate, trust, and integration troubleshooting for IAM-connected applications and services.
• Maintain IAM-focused dashboards and alerts, ensuring clear signals and early detection of user-impacting identity issues.
• Provide knowledge-sharing to team members and peers regarding common IAM troubleshooting patterns and operational best practices.
• Participate in IAM readiness activities for new application onboarding, platform changes, or lifecycle process updates by reviewing operational and monitoring requirements.
• Spend your days working to automate and improve reliability and continue to push FlightAware's infrastructure forward, ensuring it is resilient and reproducible.
• Be responsible for service availability, performance, monitoring, incident response, and capacity planning.
• Create, improve, and manage environments to ensure decisions on resource allocation, problem identification, and capacity planning are based on accurate data-driven insights.
• Maintain a physical infrastructure using Kubernetes, Linux, and Ceph, and a cloud infrastructure in AWS as part of the Site Reliability Engineering team.
• Influence technology decisions and direction to grow and support the FlightAware platform.
• Collaborate closely with fellow SREs on your team and extend your collaboration across other FlightAware teams and disciplines to design dependable and scalable solutions and services.
• Identify, implement, and champion process improvements to enhance productivity, collaboration, and delivery efficiency, while ensuring alignment with company goals and industry best practices.
DevOps Engineering Manager
Roadpass Digital
Our brands help inspire, educate, and empower millions of RVers and roadtrippers to enjoy camping and the open road.
• Drives the delivery of company-level features and initiatives while prioritizing work alignment with product and business goals
• Continuously monitors deadlines and removes any blockers to ensure delivery at a team level
• Be a hands-on member of the team by contributing to code as an independent contributor at a senior level
• Design, develop and deliver scalable and automated services and architecture
• Architect, design, and implement solutions with native AWS Services and other cloud/managed services as necessary
• Ensure solutions are architected and delivered using best practices and technologies
• Communicate the benefits and drawbacks of infrastructure choices to technical and non-technical stakeholders
• Create and apply reusable automation libraries across the company
• Manage centralized monitoring and alerting for infrastructure, and enable developers to extend with application-level monitoring
• Set up infrastructure for easy reporting and accountability across products
• Troubleshoot production issues and perform on-call duties
• Plan and coordinate infrastructure and operations for new projects and acquisitions
• Monitors, advises on, and implements solutions to address security and risks for company Infrastructure/Ops
• Evaluate DevOps priorities for the company
• Manages and prioritizes day-to-day tasks for the DevOps team
• Holds regular 1:1 meetings with direct reports, allowing for two-way feedback
• Lead, mentor, and manage a team of DevOps engineers, fostering a culture of collaboration, accountability, and continuous improvement
• Owns performance reviews and career growth for direct reports, and actively participates in hiring processes.



