Job Closed

This listing is no longer active.

nDeavour Consulting

We are a staffing and IT recruitment company based in Sofia, Bulgaria.

Senior Systems/Reliability Engineer

DevOps Engineer | Other | Remote | Senior | Team 1-10 | Since 2019 | H1B No Sponsor

Location

United States

Posted

70 days ago

Seniority

Senior

Bachelor Degree | 5 yrs exp | English

Ansible, AWS, Azure, Kubernetes, Linux, Terraform

Job Description

• Operational Stability & Reliability: Own the health and performance of our hybrid (AWS/On-prem) estate.
• Infrastructure Maturation: Lead the effort to document our environment, creating the architecture diagrams and runbooks necessary to eliminate single points of failure.
• Pragmatic Kubernetes Management: Operate and improve our existing production Kubernetes clusters.
• Technical Authority & Peer Leadership: Serve as a senior technical sounding board and mentor.
• Developer Enablement: Partner with development teams to provide the automation, standards, and guardrails that allow them to own their own deployments safely.
• Sustainable On-Call: Participate in a sustainable on-call rotation.
• Security & Compliance Support: Own the operational application of ISO 27001 controls within your remit.

Job Requirements

5+ years in a Senior SysOps, DevOps, or SRE role with proven experience managing high-pressure production environments.
Strong Linux Administration: Confident troubleshooting of production issues (services, logs, performance, and networking).
Hybrid Infrastructure Experience: Practical knowledge of AWS (primary) and Azure, with a comfort level managing both cloud-native and physical/legacy infrastructure.
Automation & IaC: Proficiency with Terraform and configuration management (e.g., Ansible).
Kubernetes Competency: Experience operating and improving Kubernetes in a production setting.
Reliability Mindset: A track record of identifying risks and fixing root causes to improve system monitoring and quality.
Familiarity with physical data center environments or Cisco networking (Desirable).
Experience in regulated environments (Healthcare, Finance, or similar) (Desirable).
Experience supporting ISO 27001 or SOC2 audits (Desirable).

Benefits

Remote Office – Flexible hybrid form of working
Parking Space – Free parking spots provided
Fun Office Space – Game zone and relaxation area available
Health Insurance – Additional private health insurance, including dental care plan
Personal Development – Company-sponsored training budget to further develop your skills
Employee Referral Program – Receive a bonus for referring a friend
Holidays – Extra 5 days after your 1st and 5th year with the company
Social Events – We love to celebrate success together
Family Insurance – Option to add insurance for family members
Sports Cards – 100% sponsored by the company

Related Categories

DevOps Engineer

Related Job Pages

Senior Site Reliability Engineer

Akuity

Remove complexity, add velocity.

DevOps Engineer | 70 days ago

Other | Remote | Team 11-50 | Since 2021 | H1B No Sponsor

• Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
• Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
• Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
• Partner with engineering teams to build reliability into new features before they ship to production
• Participate in an on-call rotation and act as incident commander for high-severity production events
• Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
• Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
• Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

AWS, Amazon EC2, Grafana, Kubernetes, Prometheus, Python

United States

Job Closed

Lead Site Reliability Engineer

Gifthealth

DevOps Engineer | 70 days ago

Other | Remote

• Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
• Embeds reliability, performance, and operational best practices into application code and development workflows
• Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
• Leads incident response, debugging, and root cause analysis across application and platform layers
• Implements and evolves observability (logging, metrics, tracing) within application and service code
• Partners with engineering teams on architecture, capacity planning, and technical standards

AWS, Azure, Docker, GCP, Prometheus, Ruby, Ruby on Rails, Terraform

United States

$123K - $154K / year

Job Closed

Site Reliability Engineer – SaaS

Infiterra

Infiterra helps IT Distributors and MSPs transform and grow. Our platform automates each step from quote to bill.

DevOps Engineer | 70 days ago

Full Time | Remote | Team 51-200 | Since 2012 | H1B No Sponsor

• Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
• Monitor systems proactively and respond effectively to production incidents.
• Drive improvements in MTTR (Mean Time to Resolution).
• Perform structured root cause analysis and contribute to long-term preventive actions.
• Participate in an evolving on-call model as we mature toward structured production support.
• Manage and optimize Azure infrastructure across compute, networking, and identity components.
• Work hands-on with AKS clusters as part of our growing Kubernetes adoption.
• Maintain networking components including load balancers and private endpoints.
• Contribute to improving platform resilience and scalability as demand grows.
• Design and improve observability practices, including metrics, logs, and alerting standards across production systems.
• Contribute to and improve Infrastructure as Code practices (Terraform or similar), ensuring consistent and repeatable deployments.
• Reduce manual operational effort through scripting and automation.
• Work closely with DevOps to ensure smooth CI/CD integration and reliable production deployments.
• Support Security initiatives related to infrastructure hardening.
• Partner with DevOps on deployment reliability and configuration changes impacting production.

Azure, Kubernetes, Linux, Terraform

Greece

Job Closed

Site Reliability Engineer

Illumination Systems Arizona

Arizona's Lighting & Controls Agency.

DevOps Engineer | 70 days ago

Full Time | Remote | Team 51-200 | Since 1937 | H1B No Sponsor

• Enhance, optimize, validate, and automate core MinIO software for performance, scalability, and security.
• Help build and deliver high-performance distributed storage solutions with a focus on cloud-native architectures.
• Validate the MinIO software against customer environments and requirements, ensuring no surprises at customer deployments.
• Improve existing features, fix critical issues, and contribute to open-source repositories.
• Collaborate with other engineers to refine architecture, APIs, and integrations.
• Write efficient, well-documented, and maintainable code.
• Conduct performance benchmarking and debugging of complex storage environments.
• Work closely with customers to address issues and manage expectations.

Cloud, Distributed Systems, Kubernetes, Microservices, Rust, Go

South Korea
