Job Closed

This listing is no longer active.

nDeavour Consulting

We are a staffing and IT recruitment company based in Sofia, Bulgaria.

Senior Systems/Reliability Engineer

DevOps Engineer, Other, Remote, Senior, Team 1-10, Since 2019, H1B No Sponsor

Location

United States

Posted

70 days ago

Salary

0

Seniority

Senior

Bachelor Degree, 5 yrs exp, English, Ansible, AWS, Azure, Kubernetes, Linux, Terraform

Job Description

  • Operational Stability & Reliability: Own the health and performance of our hybrid (AWS/On-prem) estate.
  • Infrastructure Maturation: Lead the effort to document our environment, creating the architecture diagrams and runbooks necessary to eliminate single points of failure.
  • Pragmatic Kubernetes Management: Operate and improve our existing production Kubernetes clusters.
  • Technical Authority & Peer Leadership: Serve as a senior technical sounding board and mentor.
  • Developer Enablement: Partner with development teams to provide the automation, standards, and guardrails that allow them to own their own deployments safely.
  • Sustainable On-Call: Participate in a sustainable on-call rotation.
  • Security & Compliance Support: Own the operational application of ISO 27001 controls within your remit.

Job Requirements

  • 5+ years in a Senior SysOps, DevOps, or SRE role with proven experience managing high-pressure production environments.
  • Strong Linux Administration: Confident troubleshooting of production issues (services, logs, performance, and networking).
  • Hybrid Infrastructure Experience: Practical knowledge of AWS (primary) and Azure, with a comfort level managing both cloud-native and physical/legacy infrastructure.
  • Automation & IaC: Proficiency with Terraform and configuration management (e.g., Ansible).
  • Kubernetes Competency: Experience operating and improving Kubernetes in a production setting.
  • Reliability Mindset: A track record of identifying risks and fixing root causes to improve system monitoring and quality.
  • Familiarity with physical data center environments or Cisco networking (Desirable).
  • Experience in regulated environments (Healthcare, Finance, or similar) (Desirable).
  • Experience supporting ISO 27001 or SOC2 audits (Desirable).

Benefits

  • Remote Office – Flexible hybrid working model
  • Parking Space – Free parking spots provided
  • Fun Office Space – Game zone and relaxation area available
  • Health Insurance – Additional private health insurance, including dental care plan
  • Personal Development – Company-sponsored training budget to further develop your skills
  • Employee Referral Program – Receive a bonus for referring a friend
  • Holidays – Extra 5 days after your 1st and 5th year with the company
  • Social Events – We love to celebrate success together
  • Family Insurance – Option to add insurance for family members
  • Sports Cards – 100% sponsored by the company

More DevOps Engineer Jobs

Senior Site Reliability Engineer

Akuity

Remove complexity, add velocity.

DevOps Engineer, 70 days ago
Other, Remote, Team 11-50, Since 2021, H1B No Sponsor

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
  • Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
  • Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
  • Partner with engineering teams to build reliability into new features before they ship to production
  • Participate in an on-call rotation and act as incident commander for high-severity production events
  • Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
  • Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
  • Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

United States
Job Closed

  • Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
  • Embeds reliability, performance, and operational best practices into application code and development workflows
  • Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
  • Leads incident response, debugging, and root cause analysis across application and platform layers
  • Implements and evolves observability (logging, metrics, tracing) within application and service code
  • Partners with engineering teams on architecture, capacity planning, and technical standards

United States
$123K - $154K / year
Job Closed

Site Reliability Engineer – SaaS

Infiterra

Infiterra helps IT Distributors and MSPs transform and grow. Our platform automates each step from quote to bill.

DevOps Engineer, 70 days ago
Full Time, Remote, Team 51-200, Since 2012, H1B No Sponsor

  • Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
  • Monitor systems proactively and respond effectively to production incidents.
  • Drive improvements in MTTR (Mean Time to Resolution).
  • Perform structured root cause analysis and contribute to long-term preventive actions.
  • Participate in an evolving on-call model as we mature toward structured production support.
  • Manage and optimize Azure infrastructure across compute, networking, and identity components.
  • Work hands-on with AKS clusters as part of our growing Kubernetes adoption.
  • Maintain networking components including load balancers and private endpoints.
  • Contribute to improving platform resilience and scalability as demand grows.
  • Design and improve observability practices, including metrics, logs, and alerting standards across production systems.
  • Contribute to and improve Infrastructure as Code practices (Terraform or similar), ensuring consistent and repeatable deployments.
  • Reduce manual operational effort through scripting and automation.
  • Work closely with DevOps to ensure smooth CI/CD integration and reliable production deployments.
  • Support Security initiatives related to infrastructure hardening.
  • Partner with DevOps on deployment reliability and configuration changes impacting production.

Greece
Job Closed

Site Reliability Engineer

Illumination Systems Arizona

Arizona's Lighting & Controls Agency.

DevOps Engineer, 70 days ago
Full Time, Remote, Team 51-200, Since 1937, H1B No Sponsor

  • Enhance, optimize, validate, and automate core MinIO software for performance, scalability, and security.
  • Help build and deliver high-performance distributed storage solutions with a focus on cloud-native architectures.
  • Validate the MinIO software against customer environments and requirements, ensuring no surprises at customer deployments.
  • Improve existing features, fix critical issues, and contribute to open-source repositories.
  • Collaborate with other engineers to refine architecture, APIs, and integrations.
  • Write efficient, well-documented, and maintainable code.
  • Conduct performance benchmarking and debugging of complex storage environments.
  • Work closely with customers to address issues and manage expectations.

South Korea