Job Closed

This listing is no longer active.

Robin AI

We make contracts simple. For everyone.

SRE

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

South Africa

Posted

71 days ago

Salary

Not disclosed

Seniority

Senior

Bachelor's Degree | 3 yrs exp | English | AWS | Python | Terraform

Job Description

SRE

Robin AI

  • Help build and maintain the cloud infrastructure and applications that power the Legal AI platform
  • Collaborate with engineering teams on monitoring, incident response, and deployment strategies
  • Ensure high availability and reliability of proprietary models and services
  • Standardise and implement observability practices in a service-based architecture
  • Design, deploy, and operate infrastructure to support product teams
  • Add automation around manual operational tasks
  • Participate in and improve on-call and incident-handling processes

Job Requirements

  • 3+ years of experience in DevOps or Site Reliability Engineering roles
  • Proficiency in at least one backend programming language (We use Python)
  • Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform
  • Comfortable troubleshooting across the full stack
  • Knowledge of observability frameworks and tools (We use OpenTelemetry, CloudWatch & Datadog)
  • Excellent problem-solving and communication skills
  • Experience with AI/ML infrastructure deployments is a plus

Benefits

  • Competitive salary
  • Generous equity scheme - everyone gets to be an owner of Robin AI!
  • 20 days PTO, in addition to the public holidays observed in South Africa.
  • We prioritise promotions for high performers and help you to progress your career.

More DevOps Engineer Jobs

Senior – Principal Site Reliability Engineer

DataCrunch

Premium dedicated GPU servers and clusters. Raw performance at an unmatched price.

DevOps Engineer | 71 days ago
Full Time | Remote | Team 11-50 | H1B No Sponsor

  • Ensure the reliability, scalability, and performance of HPC and cloud systems.
  • Build and maintain automation, observability, and monitoring frameworks for compute clusters.
  • Collaborate with ML, data, and infrastructure teams to deliver high-availability systems.
  • Develop and enhance CI/CD pipelines, deployment workflows, and on-call processes.
  • Participate in architecture design and long-term infrastructure strategy discussions.
  • Participate in a 24/7 on-call rotation, with at least one full on-call week per month.

Germany
Job Closed
DevOps Engineer | 71 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Design, build and operate our AWS- and Kubernetes-based platform
  • Own one or more areas and act as the go-to person in the team
  • Operate production AWS environments and Kubernetes clusters
  • Maintain the observability stack: metrics, logs, traces, instrumentation
  • Define SLOs, dashboards and alerting for teams
  • Work on Kubernetes networking, Ingress controllers and traffic routing
  • Build and maintain Terraform modules for AWS and Kubernetes
  • Support connectivity between cloud and on-prem systems
  • Participate in design reviews, incident reviews and on-call

Germany

API Reliability Engineer

Empower

We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.

DevOps Engineer | 71 days ago
Full Time | Remote | Team 10,001+ | H1B Sponsor

  • Own and improve the reliability, performance, and scalability of API services in production.
  • Troubleshoot and resolve P1/P2 production incidents end-to-end, analyzing issues across application, infrastructure, and integrations.
  • Work closely with API developers to identify and address reliability issues and application-level security vulnerabilities in service design and implementation.
  • Contribute targeted code-level or configuration fixes to resolve issues and prevent recurrence.
  • Participate in root cause analysis (RCA) and drive durable, long-term fixes.
  • Improve API resilience through patterns such as timeouts, retries, circuit breakers, and graceful degradation.
  • Establish and enhance observability and service health metrics, including logs, metrics, traces, and SLOs, using Datadog and Splunk.
  • Define and monitor SLAs/SLOs for API performance and availability.
  • Work with API Gateway and ALB/NLB for traffic management, routing, and system reliability.
  • Contribute to CI/CD pipelines using Jenkins to ensure safe and consistent deployments.
  • Contribute to disaster recovery readiness and system resilience planning.
  • Collaborate across engineering teams to improve system design and operational readiness.
  • Participate in an on-call rotation for critical incidents (P1/P2).

United States
$87.4K - $123.4K / year
Job Closed

Senior DevOps Engineer

Lucidworks

Leaders in AI-Powered Search

DevOps Engineer | 71 days ago
Full Time | Remote | Team 201-500 | H1B Sponsor

  • Build the automation tools that ensure our internal and external customers receive resources quickly and painlessly while making our team's lives easier
  • Work closely with engineering teams to deliver a high-quality product that meets all of our customers' needs
  • Aim for at least 99.9% uptime across all of our managed customers
  • Work on several major projects, including automating parts of our infrastructure, creating new monitors and alerts, and building new tooling for both team and company-wide use
  • Take ownership of Lucidworks' company-wide cloud-first initiative by making onboarding as smooth as possible for new customers

United States
$128K - $176K / year
Job Closed