Job Closed
This listing is no longer active.
Building the first global platform for replacement parts, starting with auto parts.
Site Reliability Engineer
Location
New Zealand
Posted
125 days ago
Seniority
Senior
Job Description
Partly
- Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of the Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.
- Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Make recommendations for resource allocation or architecture changes to improve cost-efficiency without sacrificing reliability or performance.
- Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance, and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integrations from code to production.
- Software Engineering: Make sure our software meets high production readiness standards. When you see a problem or an opportunity to improve, you drive the solution.
- Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.
Job Requirements
- Software Engineering: You excel at developing and maintaining large, established software systems beyond simple scripts and utilities. You definitely know what makes software maintainable and you are able to write robust code.
- Computer science fundamentals: You are firmly grounded in data structures, concurrency, architecture, APIs, testing, and design patterns.
- System engineering fundamentals: You most likely know how to deploy and use a memory or stack-sampling profiler, how to locate excessive lock contention, how to identify network issues, etc.
- SRE Expertise: Hands-on experience with modern SRE practices and tooling – for example, containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.
- Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and with the Linux operating system. You can tune servers, manage databases/storage, and wrangle Kubernetes clusters.
- Ownership & Leadership: High degree of ownership and bias for action, with a proactive approach to solving problems. You take initiative and don’t wait to be told what to do. You have demonstrated leadership through mentoring junior engineers or leading small teams/projects, even if not formally a manager. We’re seeking a track record of ownership over critical systems and successful delivery of complex projects.
- Collaboration & Communication: Excellent communication skills (written and verbal) and a collaborative attitude. You can work across teams and departments – from explaining technical issues to non-technical colleagues, to coordinating with engineers on deployments. You value teamwork and knowledge sharing.
- Adaptability: Willingness to wear multiple hats and adapt to evolving needs. In a fast-growing startup environment, requirements can change – you’re excited by the chance to learn new skills, take on new challenges, and grow with the role.
- Bonus Points: Experience in a high-growth startup environment, which means you’re used to the pace and ambiguity. Any prior experience maintaining security compliance and certifications in a company is a plus. If you have used specific tools we use (GCP, ArgoCD, GitLab CI, Kafka, etc.), that’s great – if not, you can learn quickly. Significant experience running production workloads on Apache Cassandra and/or Postgres is also a plus, as is having developed software in Rust and being able to mentor other developers on Rust best practices.
Benefits
- Take time when you need it.
- Zero-hierarchy & no ‘new joiner projects’.
- Dedicated Employee Experience Team.
- Competitive base salary + equity.
- Parental leave and flexible return to work.
- Flexible working hours.
- Focus Days & Ergonomic workspace.
- Generous relocation allowance.
- Brand new, architecturally designed offices in Christchurch CBD and on Auckland’s Karangahape Road.
- Team connection.
- Sustainable Workplace.
- Regular L&D opportunities.
- Quarterly full team weeks.
- Annual global Offsite.



