Order.co, formerly known as Negotiatus, has developed cloud-based spend management software that helps its customers “centralize and streamline the purchasing process.” As an emp

Senior Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 146 · Since 2016

Location

New York

Posted

19 days ago

Salary

$175K - $200K / year

Seniority

Senior

Bachelor Degree, English, AWS, Cloud, Distributed Systems, Kubernetes, Linux, Microservices, Ruby, Ruby on Rails, Terraform

Job Description

• Ensure software systems are reliable, scalable, performant, and operationally efficient
• Design, build, and operate highly available, scalable, and fault-tolerant infrastructure and platform services
• Define and maintain service level objectives (SLOs), service level indicators (SLIs), and error budgets across platform systems
• Lead incident response efforts for complex production outages; drive root-cause analysis and long-term remediation actions
• Develop infrastructure automation and self-service tooling to reduce operational toil and improve engineering velocity
• Build and maintain CI/CD pipelines, deployment automation, and release engineering workflows
• Design and maintain comprehensive monitoring, logging, tracing, and alerting systems for distributed services

Job Requirements

Strong foundation in computer science fundamentals: data structures, algorithms, and system design
Familiarity with building production-grade applications and services using Ruby and Ruby on Rails
Deep expertise with Linux systems administration and production troubleshooting
Strong experience operating cloud infrastructure at scale, particularly within AWS environments
Experience with Kubernetes, container orchestration, and cloud-native infrastructure patterns
Proficiency with infrastructure as code tools such as Terraform or CloudFormation
Expertise designing and operating CI/CD pipelines and deployment automation systems
Deep understanding of observability tooling including Datadog, OpenTelemetry, or similar platforms
Strong knowledge of distributed systems reliability patterns including redundancy, failover, autoscaling, rate limiting, and graceful degradation
Experience supporting distributed microservices architectures and event-driven systems

Benefits

Competitive compensation including base salary, bonus, and equity
Employer-sponsored 401(k) with match
Comprehensive medical, dental, and vision coverage
Flexible time off and hybrid work environment

Related Categories

DevOps Engineer

More DevOps Engineer Jobs

Senior DevOps Engineer

Hunt St

We help Aussie companies find top 3% remote talent in the Philippines & Nepal for a single finder's fee.

DevOps Engineer · 20 days ago

Full Time · Remote · Team 1-10 · H1B No Sponsor

Role Description

We are seeking an experienced and highly skilled Senior DevOps Engineer to join our engineering team. This role is critical to the development and deployment of our infrastructure, ensuring robust CI/CD pipelines, infrastructure as code (IaC), cloud environment optimization, and seamless collaboration across development, QA, and operations teams. The ideal candidate is passionate about automation, performance, scalability, and reliability.

Key Responsibilities
- Maintaining and improving the resiliency of our core applications and our hybrid infrastructure platform
- Continuously improving the platform infrastructure through automation and standardisation
- Contributing complementary skills and expertise to the teams while continuing to learn from peers and senior colleagues
- Ensuring that all of our core services are up to date and security-patched
- Working closely with development teams to ensure applications are configured for security, efficiency, and scalability

Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 5+ years of experience in DevOps, Systems Engineering, or a related field
- Linux native; if Linux is not your preferred OS, this may not be the role for you
- Great communication skills (verbal and written)
- Strong experience with the following:
  - Linux administration
  - Bash scripting
  - Kubernetes
  - Docker
  - AWS
- Good knowledge of networking, DNS, load balancing, and CDNs

Preferred Qualifications
- Experience with Terraform and Ansible
- Experience working on and supporting container-based CI/CD pipelines
- Keen interest in SecOps practices
- AWS certifications (we will fully support any AWS certification you are seeking)
- Experience configuring observability platforms for monitoring and alerting (including Prometheus and New Relic)
- Experience with HashiCorp Vault, Redis, RabbitMQ, or MSSQL
- Experience with any programming language (e.g., Node.js, PHP, TypeScript, Python)

Work Arrangement & Expectations

This is a remote role that will be set up as an independent contractor engagement. To ensure alignment and transparency, successful candidates will be expected to:
- Be available for meetings and collaboration during core [AEST or PHT] business hours
- Disclose any existing ongoing roles or client work
- Reflect this engagement on their LinkedIn profile (clearly marked as “Independent Contractor”)

Philippines

A$4K / month

Senior Cloud & DevOps Engineer, AWS

Siteup

Innovate Develop Succeed

DevOps Engineer · 20 days ago

Full Time · Remote · Team 1-10 · Since 2024 · H1B No Sponsor

• Design, deploy, and manage containerized workloads using Amazon ECS (Elastic Container Service) and Amazon EKS (Elastic Kubernetes Service).
• Build and maintain CI/CD pipelines to automate software delivery workflows.
• Develop and manage Docker container images, registries (ECR), and container lifecycle best practices.
• Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or CDK.
• Monitor, troubleshoot, and optimize cloud infrastructure performance, availability, and cost.
• Enforce security best practices across containerized environments (IAM roles, network policies, secrets management).
• Collaborate with software engineers to containerize applications and migrate workloads to ECS/EKS.
• Manage Kubernetes cluster configurations, namespaces, Helm charts, and service mesh integrations.
• Define and maintain observability standards using tools like CloudWatch, Prometheus, Grafana, or Datadog.
• Participate in on-call rotations and incident response processes.

AWS, Cloud, Docker, EC2, Flux, Grafana, Jenkins, Kubernetes, Node.js, Prometheus, Python, Terraform

United Kingdom

Lead Site Reliability Engineer

Akka (formerly Lightbend)

Responsive by Design, Akka apps are elastic, agile, and resilient.

DevOps Engineer · 20 days ago

Full Time · Remote · Team 51-200 · Since 2011 · H1B No Sponsor

• Own Service Level Objectives/Service Level Indicators (SLOs/SLIs) and error budgets across multi-cloud clusters (EKS, GKE, AKS); drive blameless post-mortems and systemic remediation.
• Lead capacity planning with our customers, cluster lifecycle management, and Kubernetes and database upgrade cycles.
• Define and enforce runbooks, on-call rotations, and escalation paths for the wider engineering organisation.
• Own and evolve the IaC layer: Helm charts, Crossplane compositions, and FluxCD GitOps pipelines.
• Design and maintain cloud-resource provisioning workflows that span all three cloud providers, with consistent policy controls.
• Architect and operate connectivity patterns: AWS PrivateLink / Transit Gateway, GCP NCC, Azure VNet Peering, and cross-region ingress with Contour/Envoy.
• Maintain and evolve the Linkerd service mesh for mTLS, workload identity (OIDC), and zero-trust authorisation policies.
• Drive PKI hygiene with cert-manager: root/intermediate CA rotation, ACME certificate lifecycle, and secret management via KMS-backed Kubernetes vaults.
• Own the observability stack: Prometheus, Cortex (multi-tenant metrics), OpenTelemetry sidecars, centralised log pipelines, and Groundcover / Grafana dashboards.
• Establish alerting standards and SLO-based alerting rules; ensure distributed traces are actionable across JVM, Rust, and Go workloads.
• Actively participate in on-call and lead the technical response for platform-level incidents.
• Set engineering standards and review infrastructure changes across the team.
• Partner with Security, Product, and Application Engineering to translate reliability requirements into platform capabilities.
• Grow a team of 3–5 SREs through code review, architecture sessions, and career conversations.

AWS, Azure, Cloud, DNS, Flux, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Rust, Shell Scripting, Go

United States

Job Closed

Staff Site Reliability Engineer – Site Experience

Reddit, Inc.

Dive into anything

DevOps Engineer · 20 days ago

Full Time · Remote · Team 501-1,000 · Since 2005 · H1B No Sponsor

Lead Reliability Engineering for User Experience
• Drive reliability, scalability, and operational excellence for critical user-facing systems and services. Improve performance and resiliency across APIs, content delivery, feed generation, search, messaging, and real-time experiences.
• Partner with product and infrastructure engineering teams to design systems that remain highly available and performant under massive global load. Guide architectural decisions around failover, redundancy, graceful degradation, traffic management, and capacity planning.
• Identify systemic risks and reliability bottlenecks across services, dependencies, deployments, and infrastructure. Build proactive mitigation strategies and drive engineering improvements that reduce incidents and improve service health.
• Eliminate repetitive operational work through automation and tooling. Build systems that improve deployment safety, incident response, remediation workflows, and reliability guardrails.
• Lead complex incident response efforts across engineering teams. Drive blameless postmortems, identify root causes, and ensure sustainable long-term fixes are implemented.
• Define and champion best practices around reliability engineering, SLIs/SLOs, capacity management, release engineering, and operational maturity across the company.
• Provide technical leadership and mentorship to engineers across SRE and software engineering teams. Help shape reliability culture and raise the operational excellence bar across the organization.

Cloud, Distributed Systems, Linux, Python, Go

United Kingdom
