Order.co

Order.co, formerly known as Negotiatus, has developed cloud-based spend management software for its customers to “centralize and streamline the purchasing process.”

Senior Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 146 · Since 2016

Location: New York
Posted: 19 days ago
Salary: $175K - $200K / year
Seniority: Senior

Job Description

  • Ensure software systems are reliable, scalable, performant, and operationally efficient
  • Design, build, and operate highly available, scalable, and fault-tolerant infrastructure and platform services
  • Define and maintain service level objectives (SLOs), service level indicators (SLIs), and error budgets across platform systems
  • Lead incident response efforts for complex production outages; drive root-cause analysis and long-term remediation actions
  • Develop infrastructure automation and self-service tooling to reduce operational toil and improve engineering velocity
  • Build and maintain CI/CD pipelines, deployment automation, and release engineering workflows
  • Design and maintain comprehensive monitoring, logging, tracing, and alerting systems for distributed services

Job Requirements

  • Strong foundation in computer science fundamentals: data structures, algorithms, and system design
  • Familiarity with building production-grade applications and services using Ruby and Ruby on Rails
  • Deep expertise with Linux systems administration and production troubleshooting
  • Strong experience operating cloud infrastructure at scale, particularly within AWS environments
  • Experience with Kubernetes, container orchestration, and cloud-native infrastructure patterns
  • Proficiency with infrastructure as code tools such as Terraform or CloudFormation
  • Expertise designing and operating CI/CD pipelines and deployment automation systems
  • Deep understanding of observability tooling including Datadog, OpenTelemetry, or similar platforms
  • Strong knowledge of distributed systems reliability patterns including redundancy, failover, autoscaling, rate limiting, and graceful degradation
  • Experience supporting distributed microservices architectures and event-driven systems

Benefits

  • Competitive compensation including base salary, bonus, and equity
  • Employer-sponsored 401(k) with match
  • Comprehensive medical, dental, and vision coverage
  • Flexible time off and hybrid work environment

More DevOps Engineer Jobs

Senior DevOps Engineer

Hunt St

We help Aussie companies find top 3% remote talent in the Philippines & Nepal for a single finder's fee.

DevOps Engineer · 20 days ago
Full Time · Remote · Team 1-10 · H1B No Sponsor

Role Description

We are seeking an experienced and highly skilled Senior DevOps Engineer to join our engineering team. This role is critical to the development and deployment of our infrastructure, ensuring robust CI/CD pipelines, infrastructure as code (IaC), cloud environment optimization, and seamless collaboration across development, QA, and operations teams. The ideal candidate is passionate about automation, performance, scalability, and reliability.

Key Responsibilities

  • Maintaining and improving the resiliency of our core applications and our hybrid infrastructure platform
  • Providing continued improvement to the platform infrastructure through automation and standardisation
  • Providing complementary skills and expertise to the teams and continuously learning from peers and seniors
  • Ensuring that all of our core services are up to date and security patched
  • Working closely with development teams to ensure applications are configured for security, efficiency and scalability

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field
  • 5+ years of experience in DevOps, Systems Engineering, or a related field
  • Linux native; if you do not use Linux as your preferred OS this may not be the role for you
  • Great communication skills (verbal and written)
  • Strong experience with the following: Linux administration, Bash scripting, Kubernetes, Docker, AWS
  • Good knowledge of networking, DNS, load balancing and CDNs

Preferred Qualifications

  • Experience with Terraform and Ansible
  • Experience working on and supporting container-based CI/CD pipelines
  • Keen interest in SecOps practices
  • AWS Certifications (we will fully support any AWS certification you are seeking)
  • Experience configuring observability platforms for monitoring and alerting (including Prometheus and New Relic)
  • Experience with HashiCorp Vault, Redis, RabbitMQ or MSSQL
  • Experience with any programming languages (e.g. Node.js, PHP, TypeScript, Python)

Work Arrangement & Expectations

This is a remote role that will be set up as an independent contractor engagement. To ensure alignment and transparency, successful candidates will be expected to:

  • Be available for meetings and collaboration during core [AEST or PHT] business hours
  • Disclose any existing ongoing roles or client work
  • Reflect this engagement on their LinkedIn profile (clearly marked as “Independent Contractor”)

Philippines
A$4K / month
Full Time · Remote · Team 1-10 · Since 2024 · H1B No Sponsor

  • Design, deploy, and manage containerized workloads using Amazon ECS (Elastic Container Service) and Amazon EKS (Elastic Kubernetes Service).
  • Build and maintain CI/CD pipelines to automate software delivery workflows.
  • Develop and manage Docker container images, registries (ECR), and container lifecycle best practices.
  • Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or CDK.
  • Monitor, troubleshoot, and optimize cloud infrastructure performance, availability, and cost.
  • Enforce security best practices across containerized environments (IAM roles, network policies, secrets management).
  • Collaborate with software engineers to containerize applications and migrate workloads to ECS/EKS.
  • Manage Kubernetes cluster configurations, namespaces, Helm charts, and service mesh integrations.
  • Define and maintain observability standards using tools like CloudWatch, Prometheus, Grafana, or Datadog.
  • Participate in on-call rotations and incident response processes.

United Kingdom

Lead Site Reliability Engineer

Akka (formerly Lightbend)

Responsive by Design, Akka apps are elastic, agile, and resilient.

DevOps Engineer · 20 days ago
Full Time · Remote · Team 51-200 · Since 2011 · H1B No Sponsor

  • Own Service Level Objectives/Service Level Indicators (SLOs/SLIs) and error budgets across multi-cloud clusters (EKS, GKE, AKS); drive blameless post-mortems and systemic remediation.
  • Lead capacity planning with our customers, cluster lifecycle management, and Kubernetes and database upgrade cycles.
  • Define and enforce runbooks, on-call rotations, and escalation paths for the wider engineering organisation.
  • Own and evolve the IaC layer: Helm charts, Crossplane compositions, and FluxCD GitOps pipelines.
  • Design and maintain cloud-resource provisioning workflows that span all three cloud providers, with consistent policy controls.
  • Architect and operate connectivity patterns: AWS PrivateLink / Transit Gateway, GCP NCC, Azure VNet Peering, and cross-region ingress with Contour/Envoy.
  • Maintain and evolve the Linkerd service mesh for mTLS, workload identity (OIDC), and zero-trust authorisation policies.
  • Drive PKI hygiene with cert-manager: root/intermediate CA rotation, ACME certificate lifecycle, and secret management via KMS-backed Kubernetes vaults.
  • Own the observability stack: Prometheus, Cortex (multi-tenant metrics), OpenTelemetry sidecars, centralised log pipelines, and Groundcover / Grafana dashboards.
  • Establish alerting standards and SLO-based alerting rules; ensure distributed traces are actionable across JVM, Rust, and Go workloads.
  • Actively participate in on-call and lead the technical response for platform-level incidents.
  • Set engineering standards and review infrastructure changes across the team.
  • Partner with Security, Product, and Application Engineering to translate reliability requirements into platform capabilities.
  • Grow a team of 3–5 SREs through code review, architecture sessions, and career conversations.

United States
Job Closed
Full Time · Remote · Team 501-1,000 · Since 2005 · H1B No Sponsor

  • Lead Reliability Engineering for User Experience
  • Drive reliability, scalability, and operational excellence for critical user-facing systems and services. Improve performance and resiliency across APIs, content delivery, feed generation, search, messaging, and real-time experiences.
  • Partner with product and infrastructure engineering teams to design systems that remain highly available and performant under massive global load. Guide architectural decisions around failover, redundancy, graceful degradation, traffic management, and capacity planning.
  • Identify systemic risks and reliability bottlenecks across services, dependencies, deployments, and infrastructure. Build proactive mitigation strategies and drive engineering improvements that reduce incidents and improve service health.
  • Eliminate repetitive operational work through automation and tooling. Build systems that improve deployment safety, incident response, remediation workflows, and reliability guardrails.
  • Lead complex incident response efforts across engineering teams. Drive blameless postmortems, identify root causes, and ensure sustainable long-term fixes are implemented.
  • Define and champion best practices around reliability engineering, SLIs/SLOs, capacity management, release engineering, and operational maturity across the company.
  • Provide technical leadership and mentorship to engineers across SRE and software engineering teams. Help shape reliability culture and raise the operational excellence bar across the organization.

United Kingdom