Job Closed

This listing is no longer active.

Megaport

Connectivity simplified. megaport.com

Senior Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 201-500, Since 2013, H1B Sponsor

Location

Australia

Posted

70 days ago

Seniority

Senior

Bachelor's degree, 5 years' experience, English

AWS, Cassandra, Cloud, Kubernetes, Linux, PostgreSQL, Python, Terraform, Go

Job Description

• Improving production reliability and system resilience within an SRE-scoped team
• Championing high standards of work and industry best practices
• Communicating with teams and stakeholders at all stages
• Bringing fresh ideas to the table and encouraging others
• Diving into complex technical problems with a can-do attitude
• Working across numerous technologies in a fast-changing industry
• Participating in on-call rotation, incident response, and blameless post-incident reviews
• Writing code, handling alerts, improving solutions, and supporting others
• Playing a crucial role in the success of your company and team

Job Requirements

5+ years administering Linux systems and related infrastructure in production environments
A collaborative SRE mindset, with familiarity with SLIs/SLOs/SLAs, error budgets, blast radius, and blameless postmortems
A focus on automation, reducing toil, and preventing problem recurrence
A track record of writing runbooks that work for the broader team, not just yourself
Strong fundamentals in Kubernetes and its broader ecosystem
Cloud infrastructure experience (AWS strongly preferred; bare-metal experience is a bonus)
Strong tool-development skills in Bash, plus Python or Go (preferred) or a similar language
Infrastructure-as-code tooling experience (Terraform preferred)
CI/CD and version control experience (GitHub preferred)
Database experience with at least one of Postgres, Cassandra, or ClickHouse (preferred)
Experience operating a production observability stack (metrics, logs, traces), with an eye for signal over noise
Comfortable working on live production infrastructure, with strong troubleshooting instincts and ownership of incident response
A history of continual professional development
A self-directed working style suited to an async, globally distributed team, and comfort picking up adjacent work when the situation calls for it

Benefits

Flexible working environments
Birthday Leave
Generous study and training allowance + 5 days paid study leave
Creative, fun, and contemporary workspaces
Motivated team of industry experts and new talent
Success celebrated with ‘Legend’ and ‘Kudos’ awards
Health and wellness program
