Job Closed

This listing is no longer active.

MLabs

We are a Haskell, Rust, Blockchain and AI consultancy.

Senior Site Reliability Engineer

DevOps Engineer, Other, Remote, Senior, Team 51-200, H1B No Sponsor

Location

United States

Posted

86 days ago

Salary

Seniority

Senior

English, Terraform, Kubernetes, Helm, Argo CD, CI/CD, AWS, Google Kubernetes Engine, Amazon EKS, Azure Kubernetes Service, Infrastructure as Code, Distributed Systems, Observability / Monitoring

Job Description

Senior Site Reliability Engineer (Enterprise Platform)

Location: Remote (US); open to Europe for candidates happy to overlap with EST
Compensation: Competitive

We are hiring on behalf of our client, a high-growth software company supporting the development of a premier open-source, EVM-compatible public ledger built for global enterprise and Web3 use cases. They are currently hiring a Senior Site Reliability Engineer for their "greenfield" enterprise-focused team.

This team is building a private and consortium distributed ledger platform designed specifically for sectors with high security and privacy requirements, such as financial services, healthcare, and supply chain.

This is a hands-on, high-impact role where you will own the design, deployment, and reliability of mission-critical, multi-region infrastructure. This is not a traditional support role; they are looking for an engineer who has operated real systems at scale and is eager to take end-to-end ownership of architecture and operational standards from the ground up.

Key Responsibilities:
- Systems Architecture: Design and operate highly available, multi-region distributed systems with rigorous recovery strategies (RTO/RPO).
- Infrastructure as Code: Own large-scale IaC using Terraform, developing reusable modules and multi-account patterns with policy guardrails.
- Kubernetes Orchestration: Scale production environments (EKS, GKE, or AKS) utilizing GitOps (ArgoCD), Helm, and strict network policies.
- CI/CD Leadership: Build secure pipelines supporting blue/green and canary deployments, artifact signing, SBOMs, and automated rollback strategies.
- SRE Advocacy: Define and improve SLOs, error budgets, and observability metrics to drive measurable reductions in MTTR.
- Collaboration: Partner with the Head of SRE and VP of Engineering to translate complex business requirements into reliable, secure platform services.

Job Requirements

7+ years of experience in SRE, Platform Engineering, or Infrastructure Engineering operating production distributed systems.
Multi-Cloud Mastery: Deep expertise in AWS or GCP, with experience running multi-region production environments and disaster recovery testing.
Containerization: Hands-on experience with Kubernetes at scale, including GitOps workflows and production-grade security controls.
Security Mindset: Strong background in Zero Trust principles, secrets management (Vault), and compliance frameworks (SOC 2, HIPAA, or NIST).
Tooling: Extensive experience with Terraform-first infrastructure in large-scale, real-world environments.
Nice to Have:
Experience with distributed ledger technology (DLT) or blockchain systems, particularly private/consortium deployments.
Familiarity with EVM-based systems and smart contract tooling (Solidity, Hardhat).
Experience operating active-active, globally distributed architectures.
Background in supporting financial services or other highly regulated industries.

Benefits

Incentive Package: Competitive base salary with Performance Bonuses.
Ownership: Equity and Token participation.
Future-Proofing: 401k and comprehensive health insurance (for US-based employees).
Innovation: The opportunity to build a "greenfield" platform from scratch within a stable, venture-backed organization.
Impact: Work on infrastructure that powers the world’s leading organizations across multiple sectors.
Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.
Commitment to Equality and Accessibility:
At MLabs, we are committed to offering equal opportunities to all candidates. We ensure that there is no discrimination, that job adverts are accessible, and that information is provided in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing human-resources@mlabs.city.
MLabs Ltd collects and processes the personal information you provide such as your contact details, work history, resume, and other relevant data for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting legal@mlabs.city.


More DevOps Engineer Jobs

DevSecOps Engineer

Perkbox

Helping businesses care for, connect with and celebrate their people— no matter where they are or what they want 🎈

DevOps Engineer, 86 days ago

Full Time, Remote, Team 201-500, Since 2015, H1B No Sponsor

• Take ownership of our security posture across our AWS and Azure estates
• Work closely with DevOps and Developer teams to integrate security into delivery pipelines
• Enhance threat detection, manage vulnerability scanning, and ensure infrastructure resilience
• Monitor applications and respond promptly to security alerts
• Perform static and dynamic security testing as part of pipeline enhancements
• Document security procedures and report on findings with clarity and precision

AWS, Azure, Kubernetes, React, React Native

Bulgaria

Job Closed

Site Reliability Engineer

Layer7.mx

Telephone Contact Solutions for Businesses

DevOps Engineer, 86 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Design and implement the centralized monitoring and alerting system (alerts must reach the system, not the customer).
• Define reliability metrics (SLOs, SLIs, SLAs) and ensure they are met.
• Analyze and prevent availability incidents, identifying patterns and root causes.
• Collaborate with DevOps and Data to design architectures that are resilient by design.
• Document runbooks, dashboards, and incident response protocols.
• Lead postmortem reviews focused on continuous improvement and organizational learning.

Grafana, Prometheus

Worldwide

Job Closed

Development, security, and operations Engineer

MetaPhase Consulting

MetaPhase Consulting is a business management and technology consulting company that specializes in providing its services to commercial clients, nonprofit organizations, and gover

DevOps Engineer, 86 days ago

Other, Hybrid

Enhance developer experience by maintaining tools for secure code delivery, implement secure configurations in collaboration with teams, and ensure compliance with government security standards while supporting incident response protocols.

AWS, Kubernetes, Terraform, Docker, CI/CD, Python, Linux, Observability / Monitoring, Jenkins, Ansible, Shell

Virginia

System Reliability Engineer/DevOps

GROWE

GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals. DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and driving Growe to success. BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.

DevOps Engineer, 86 days ago

Other, Remote, Team 11-50

Role Description

Growe welcomes those who are excited to:
- Ensure availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices;
- Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;
- Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;
- Implement and maintain metrics, logs, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility;
- Identify bottlenecks, tune systems, and improve infrastructure performance;
- Monitor resources, forecast growth, and implement scaling strategies;
- Integrate security best practices into IaC, CI/CD pipelines, and deployments;
- Support vulnerability management;
- Participate in 24/7 rotations (once a week) for timely resolution of critical incidents;
- Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems;
- Maintain operational runbooks, incident reports, and system documentation.

Qualifications
- 3+ years in a DevOps, SRE, or related role;
- Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, and CloudWatch;
- Proficiency with Terraform, Terragrunt, and Atlantis for reproducible and version-controlled infrastructure;
- Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash);
- Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates, ChartMuseum);
- Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver;
- Hands-on experience with Grafana, VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems like PagerDuty, VMAlert, and Alertmanager;
- Proficiency with Grafana Loki, OpenSearch, and Vector Agent for centralized logging;
- Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate management (ACM, Vault, SOPS), and application security best practices;
- Familiarity with Cloudflare services, including caching, DNS, and Workers;
- Exposure to AWS Cost Explorer, KubeCost, and custom cost export tools;
- Certifications: AWS, Terraform, Kubernetes, or Helm are a plus.

Requirements
- Problem-Solving Mindset: Approaches complex issues methodically and finds practical solutions under pressure;
- Analytical Thinking: Able to interpret metrics, logs, and system behavior to make informed decisions;
- Attention to Detail: Ensures accuracy in infrastructure changes, configurations, and deployment processes;
- Adaptability: Comfortable learning new tools, technologies, and adjusting to changing environments;
- Collaboration & Teamwork: Works effectively with cross-functional teams and communicates clearly;
- Ownership & Responsibility: Takes accountability for tasks, incidents, and service reliability;
- Continuous Learning: Keeps up-to-date with DevOps, SRE, cloud, and security best practices;
- Effective Communication: Can explain technical concepts clearly to both technical and non-technical stakeholders.

Company Description

We are seeking those who align with our core values:
- GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;
- DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and driving Growe to success;
- BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.

AWS, Terraform, Kubernetes, Docker, Helm, GitLab CI, Prometheus, Grafana, Amazon CloudWatch, Ansible, Python, Shell, Amazon ECS, Amazon EKS, Amazon RDS, Amazon S3, Argo, Nginx, OpenSearch, Firewalls, HashiCorp Vault, Cloudflare

United States + 171 more

Job Closed
