Megaport

Connectivity simplified. megaport.com

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 201-500 | Since 2013 | H1B Sponsor

Location

Australia

Posted

21 days ago

Seniority

Senior

Job Description

  • Improving production reliability and system resilience within an SRE scoped team
  • Championing high standards of work and industry best practices
  • Communicating with teams and stakeholders at all stages
  • Bringing fresh ideas to the table and encouraging others
  • Diving into complex technical problems with a can-do attitude
  • Working across numerous technologies in a fast-changing industry
  • Participating in on-call rotation, incident response, and blameless post-incident reviews
  • Writing code, handling alerts, improving solutions, and supporting others
  • Playing a crucial role in the success of your company and team

Job Requirements

  • 5+ years administering Linux systems and related infrastructure in production environments
  • A collaborative SRE mindset, with familiarity around SLIs/SLOs/SLAs, error budgets, blast radius, and blameless postmortems
  • A focus on automation, reducing toil, and preventing problem recurrence
  • A track record of writing runbooks that work for the broader team, not just yourself
  • Strong Kubernetes and broader ecosystem fundamentals
  • Cloud infrastructure experience; AWS strongly preferred and bare-metal is a bonus
  • Strong tool development skills - Bash, plus Python or Go (preferred) or a similar language
  • Infrastructure-as-code tooling experience - Terraform preferred
  • CI/CD and version control, GitHub preferred
  • Database experience - one of Postgres, Cassandra, or ClickHouse preferred
  • Experience operating a production observability stack (metrics, logs, traces), with an eye for signal over noise
  • Comfortable working on live production infrastructure, with strong troubleshooting instincts and ownership of incident response
  • A history of continual professional development
  • A self-directed style suited to an async, globally distributed team, and comfort picking up adjacent work when the situation calls for it

Benefits

  • Flexible working environments
  • Birthday Leave
  • Generous study and training allowance + 5 days paid study leave
  • Creative, fun, and contemporary workspaces
  • Motivated team of industry experts and new talent
  • Celebrated success with ‘Legend’ and ‘Kudos’ Awards
  • Health and wellness program

More DevOps Engineer Jobs

Senior DevOps Engineer

Talent Hackers

Top talent from the fastest-growing continent on earth.

DevOps Engineer | 21 days ago
Full Time | Remote | Team 11-50 | Since 2024 | H1B No Sponsor

  • Built and maintained observability, alerting, and triage systems
  • Improved system reliability and incident response
  • Established and managed multi-stage environments (dev, staging, prod)
  • Strengthened infrastructure security across IAM, networking, and secrets management
  • Designed and implemented CI/CD pipelines with automated testing and deployment
  • Supported SOC 2 compliance by implementing monitoring, access controls, and audit-ready infrastructure
  • Developed OAuth-based authentication and enabled client-specific SSO integrations
  • Improved performance and efficiency through infrastructure and database optimization
  • Enhanced job scheduling, alerting, and internal tooling to increase engineering efficiency

South Africa

Site Reliability Engineer

Review ALL

We are your Recruitment Team!

DevOps Engineer | 21 days ago
Full Time | Remote | Team 11-50 | Since 2023 | H1B No Sponsor

  • Own reliability for our global bare metal fleet: monitoring, alerting, incident response, post-mortems
  • Build and maintain internal tooling: NetBox (infrastructure source of truth), Python/Go services
  • Drive automation for hardware lifecycle: provisioning, decommissioning, firmware updates, network changes
  • Collaborate with platform engineers on the provisioning stack
  • Participate in on-call rotation

Brazil

Senior DevOps Engineer

HostPapa

Let Papa take care of you!

DevOps Engineer | 22 days ago
Full Time | Remote | Team 51-200 | Since 2006 | H1B No Sponsor

  • Design, evolve, and operate scalable and elastic cloud architectures for multi-tenant SaaS platforms
  • Continuously challenge and improve existing infrastructure and architectural decisions to remove performance, scalability, and operability bottlenecks
  • Design and maintain cloud-native and hybrid solutions, integrating cloud platforms with on-prem systems when required
  • Build, maintain, and improve CI/CD pipelines that enable fast, safe, and repeatable deployments
  • Promote and enforce Infrastructure as Code (IaC) practices using Terraform
  • Automate provisioning, configuration, scaling, and recovery to reduce manual operational effort
  • Improve deployment strategies in collaboration with SRE teams to increase reliability and predictability
  • Design and operate containerized platforms using Docker and Kubernetes
  • Support and evolve microservices architectures, ensuring deployment safety, isolation, and scalability
  • Operate and support production and pre-production environments and troubleshoot complex infrastructure issues
  • Participate in incident response and on-call rotations when required, working with SREs to reduce operational toil
  • Maintain clear and up-to-date documentation for infrastructure, pipelines, and operational procedures
  • Partner closely with engineering teams to improve developer experience, delivery velocity, and platform reliability
  • Support other tasks or projects as assigned to meet team and business needs

Canada

Senior Site Reliability Engineer

TechInsights

The most trusted source of semiconductor analysis and market information

DevOps Engineer | 22 days ago
Full Time | Remote | Team 201-500 | Since 1989 | H1B No Sponsor

  • Own SLOs, SLIs, and error budgets for all production services; drive error budget discipline across engineering
  • Design reliability patterns for AI agent pipelines: LLM observability, tool-use tracking, failure detection, and graceful degradation
  • Architect for blast radius containment: agent failures must have bounded customer impact through isolation, circuit breaking, and rapid recovery
  • Mature our Canada Central/West active-active architecture toward 24-hour RTO with full regional failover
  • Lead incident response and post-incident reviews that produce durable fixes; maintain DR procedures through regular testing
  • Serve as the primary reliability liaison to Software and AI Engineering, translating requirements into actionable standards
  • Partner with AI Engineering on compute provisioning, model serving, inference latency, and workload isolation
  • Own CI/CD pipeline strategy (Bitbucket Pipelines, GitHub Actions): set standards, optimize deployment frequency, and ensure teams can ship confidently
  • Drive IDP adoption and enable teams on SRE practices: on-call readiness, SLO definition, runbook development, and self-service tooling
  • Represent reliability in architectural discussions; surface risk before it's committed to design
  • Operate Datadog as the single pane of glass for service health, infrastructure, and agentic pipeline telemetry
  • Extend observability to AI workloads: LLM latency, token consumption, agent completion rates, and pipeline throughput
  • Build golden path templates in Backstage and/or Atlassian Compass so teams ship reliably without routine SRE involvement
  • Own infrastructure as code via Terraform and GitOps; enforce IaC policy in partnership with Trust Assurance
  • Own FinOps visibility into AWS cost segments; model cloud cost impact as AI/ML workloads scale
  • Formally mentor junior and intermediate SRE engineers, with accountability for their technical growth and career progression
  • Build AI-assisted automation to progressively reduce toil and scale the team's operational capacity

Poland
zł18.8K - zł20K / year