Job Closed
This listing is no longer active.
Building the first global platform for replacement parts, starting with auto parts.
Site Reliability Engineer
Location
Australia
Posted
104 days ago
Salary
0
Seniority
Senior
Job Description
Site Reliability Engineer
Partly
- Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.
- Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Make recommendations for resource allocation or architecture changes to improve cost-efficiency without sacrificing reliability or performance.
- Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integrations from code to production.
- Software Engineering: Make sure our software meets high production readiness standards. When you see a problem or an opportunity to improve, you drive the solution.
- Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.
Job Requirements
- Software Engineering: You excel at developing and maintaining large, established software systems beyond simple scripts and utilities. You definitely know what makes software maintainable and you are able to write robust code.
- Firmly grounded computer science fundamentals: Including data structures, concurrency, architecture, APIs, testing, and design patterns.
- System engineering fundamentals: You most likely know how to deploy and use a memory or stack-sampling profiler, how to locate excessive lock contention, how to identify network issues, etc.
- SRE Expertise: Hands-on experience with modern SRE practices and tooling – for example, containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.
- Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and Linux operating system. You can tune servers, manage databases/storage, and wrangle Kubernetes clusters.
- Ownership & Leadership: High degree of ownership and bias for action, with a proactive approach to solving problems. You take initiative and don’t wait to be told what to do. You have demonstrated leadership through mentoring junior engineers or leading small teams/projects, even if not formally a manager. We’re seeking a track record of ownership over critical systems and successful delivery of complex projects.
- Collaboration & Communication: Excellent communication skills (written and verbal) and a collaborative attitude. You can work across teams and departments – from explaining technical issues to non-technical colleagues, to coordinating with engineers on deployments. You value teamwork and knowledge sharing.
- Adaptability: Willingness to wear multiple hats and adapt to evolving needs. In a fast-growing startup environment, requirements can change – you’re excited by the chance to learn new skills, take on new challenges, and grow with the role.
- Bonus Points:
- Experience in a high-growth startup environment, which means you’re used to the pace and ambiguity.
- Any prior experience maintaining security compliance and certifications in a company is a plus.
- If you have used specific tools we use (GCP, ArgoCD, GitLab CI, Kafka, etc.), that’s great – if not, you can learn quickly.
- Significant experience running production workloads on Apache Cassandra and/or Postgres databases.
- Experience developing software in the Rust programming language, and the ability to mentor other developers on Rust best practices.
Benefits
- Take time when you need it. Our Partly time-off framework is simple - no questions asked. We work extremely hard and we trust our people to take the time they need to recharge.
- Zero-hierarchy & no ‘new joiner projects’. Solve robust engineering challenges from day one, without bureaucracy blocking you. We want you to get stuck in and see how you make an impact on our customers.
- Dedicated Employee Experience Team. A dedicated team to help you feel connected, planted & make it really easy for you to do your best work.
- Competitive base salary + equity. While we're a startup, we offer competitive salaries and attractive equity options for all global full-time employees. We believe everyone should have a stake in being an owner.
- Parental leave and flexible return to work. Return to work how it best suits you. Primary carers can return with 4-day weeks (100% pay for the first 12 weeks). Secondary carers get 10 days full pay.
- Flexible working hours. We combine flexibility with an in-office presence. Choose when and where it suits you to work - no mandatory 9-5. Have more control over when, and how you work to maintain presence with life commitments, as well as maximise professional productivity, growth and development.
- Focus Days & Ergonomic workspace. Two days per week with zero meetings, dedicated solely to uninterrupted deep work, plus sit-stand desks, ergonomic chairs, and plenty of quiet spaces.
- Generous relocation allowance. We support your move to join our motivating environment with some of the world’s sharpest minds.
- Brand new, architecturally designed offices in Christchurch CBD and on Auckland’s Karangahape Road. Enjoy free snacks, drinks on tap, top quality coffee, social areas, and some of the best cafes a stone's throw away.
- Team connection. Monthly team lunches, celebrating our wins, happy hours and more!
- Sustainable Workplace. We’re committed to sustainability: join us to make a massive global environmental impact by eliminating waste.
- Regular L&D opportunities. Whether it’s during a ‘Lunch n Learn’ or hearing from a unicorn CEO at a Fireside chat, you’ll have the opportunity to constantly learn.
- Quarterly full team weeks. Join the entire company to meet at the nearest centralised location (Christchurch, Manila or London) to connect, realign on our strategy and stay motivated.
- Annual global Offsite. Step away from the day-to-day for big-picture strategy, deep team bonding in a remote location with the entire global team.
More DevOps Engineer Jobs
DevOps Engineer
Booz Allen Hamilton
Booz Allen Hamilton is an award-winning provider of strategic innovation, management consulting, technology, and engineering services. Founded in 1914, the comp
The Opportunity:
Everyone is trying to “harness the cloud,” but not everyone knows how. As a DevOps engineer, you’re eager to develop, manage, and secure a container platform that meets your client’s needs and takes advantage of cloud capabilities. We need you to help us develop container management software to solve some of our clients’ toughest challenges.

As a platform DevOps Engineer at Booz Allen, you can use your technical skills to affect mission-forward change. On our team, you’ll strengthen your skills using the latest cloud technologies as you look for ways to improve your client’s environment with current container software to ensure seamless orchestration.

Using your DevOps platform knowledge, you’ll support your team as you inform strategy and design while ensuring standards are met throughout the containerization process. You’ll work with your team to recommend resources that will help your client manage and securely adopt containers. Additionally, you’ll gain DevOps skills and experience while supporting the development of critical cloud platforms.

Work with us to use cloud platform technology for good. Join us. The world can’t wait.
You Have:
- 2+ years of experience with containerization technologies
- 2+ years of experience with container orchestration platforms
- 2+ years of experience managing software deployments through CI/CD pipelines
- Experience developing enterprise cloud-native solutions and applying basic principles, theories, and concepts
- Experience with OOP scripting or programming languages
- Ability to work with AWS or Azure
- Ability to work with container orchestration platforms
- Secret clearance
- Bachelor's degree

Nice If You Have:
- Knowledge of automation, programming and scripting languages, infrastructure automation, and microservices
- Knowledge of triaging and resolving issues related to both open source and commercial tools in public cloud environments
- Top Secret clearance

Clearance:
Applicants selected will be subject to a security investigation and may need to meet eligibility requirements for access to classified information.

Compensation
At Booz Allen, we celebrate your contributions, provide you with opportunities and choices, and support your total well-being. Our offerings include health, life, disability, financial, and retirement benefits, as well as paid leave, professional development, tuition assistance, work-life programs, and dependent care. Our recognition awards program acknowledges employees for exceptional performance and superior demonstration of our values. Full-time and part-time employees working at least 20 hours a week on a regular basis are eligible to participate in Booz Allen’s benefit programs. Individuals that do not meet the threshold are only eligible for select offerings, not inclusive of health benefits. We encourage you to learn more about our total benefits by visiting the Resource page on our Careers site and reviewing Our Employee Benefits page.
Salary at Booz Allen is determined by various factors, including but not limited to location, the individual’s particular combination of education, knowledge, skills, competencies, and experience, as well as contract-specific affordability and organizational requirements. The projected compensation range for this position is $61,900.00 to $141,000.00 (annualized USD). The estimate displayed represents the typical salary range for this position and is just one component of Booz Allen’s total compensation package for employees. This posting will close within 90 days from the Posting Date.

Identity Statement
As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.

Work Model
Our people-first culture prioritizes the benefits of flexibility and collaboration, whether that happens in person or remotely. If this position is listed as remote or hybrid, you’ll periodically work from a Booz Allen or client site facility. If this position is listed as onsite, you’ll work with colleagues and clients in person, as needed for the specific role.

Commitment to Non-Discrimination
All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, local, or international law.
ABOUT BASETEN
Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products.

THE ROLE
As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents. We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

EXAMPLE INITIATIVES
You'll get to work on these types of projects as part of our Infrastructure team:
- Multi-cloud capacity management
- Inference on B200 GPUs
- Multi-node inference
- Fractional H100 GPUs for efficient model serving

RESPONSIBILITIES
- Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
- Establish standards and best practices for reliability and performance across the infrastructure.
- Automate processes when relevant, particularly for managing CI/CD pipelines.
- Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
- Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.

REQUIREMENTS
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
- 5+ years of professional work experience in a fast-paced, high-growth environment.
- Extensive experience with Kubernetes.
- Experience in building and maintaining scalable infrastructure.
- Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
- Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
- Ability to own projects end-to-end, from project specification to execution.
- No prior machine learning experience required, but should be open to learning about it.

BENEFITS
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employee and dependents.
- Generous PTO policy including company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
- Paid parental leave.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
- Build and maintain the Multigres Operator: Maintain our Go-based Kubernetes operator that orchestrates distributed Postgres deployments.
- Architect cloud deployment infrastructure: Design and implement robust deployment patterns for EKS and other Kubernetes platforms.
- Manage storage and networking layers: Work with CSI drivers, persistent volumes, and cross-cloud networking to ensure data reliability and connectivity.
- Develop deployment tooling: Create internal tools and automation for provisioning, scaling, and managing Multigres clusters.
- Ensure operational excellence: Build monitoring, alerting, and diagnostic capabilities into the deployment layer.
- Collaborate across teams: Work with database engineers, SRE, and product teams to deliver seamless deployment experiences.
About Andesite:
After decades defending the nation's most sensitive networks, we founded Andesite with a clear mission: to build security products that transform how humans and AI collaborate to defend against increasingly sophisticated cyber threats. We’re a diverse team of cyber and security experts, passionate technologists, and experienced product builders. We come from some of the largest national security, tech, cybersecurity, and data organizations on the planet. We've raised more than $38 million from investors like General Catalyst and Red Cell Partners. The future of cybersecurity isn't about better technology alone; it's about reimagining how humans and machines work together. Come build with us.

The Role:
We are looking for a Senior Release Engineer to own the bridge between our engineering "factory" and our diverse customer environments. You will be responsible for the "definition of done" for our software, ensuring that our weekly SaaS updates are seamless and our self-managed bundles are robust, compliant, and audit-ready. This is a high-impact, hybrid role. You will spend 50% of your time on technical automation (CI/CD, packaging, artifact signing) and 50% on release orchestration (coordinating with Customer Support, Field Engineering, and government compliance officers). You will have the support of a number of others across departments to deliver regular product updates to our customers!

What You'll Do:

Technical Delivery & Packaging
- Design and maintain the pipelines that produce our Single-Tenant SaaS updates and our Self-Managed customer bundles.
- Ensure "Build Once, Deploy Anywhere" consistency across standard cloud and restricted GovCloud environments.
- Manage artifact lifecycle, including versioning, container registries, and software signing to meet federal security standards.

Release Orchestration & Compliance
- Act as the primary technical point of contact for ISSM (Information System Security Manager) approvals for GovCloud deployments.
- Maintain the "Version Map": tracking which customers are on which versions and managing the complexities of "version lag" for those who opt out of weekly updates.
- Coordinate across teams to validate bundles before they are shipped to customer-managed environments.

Automation and Metrics
- Continually improve our release operations and processes through automation.
- Develop and track metrics for release operations, and recommend and develop solutions to improve alongside the engineering team.

Communication & Documentation
- Lead "Go/No-Go" decisions, synthesizing input from QA, Support, and Product.
- Empower Customer Support and Sales Engineering by providing them with clear "Known Issues" lists and migration paths for each release, with the support of the engineering and product team for input.

What You Have:
- 5+ years in DevOps, Release Engineering, or SRE, specifically in a company that ships both SaaS and On-Prem/Self-Managed software.
- Strong communication and coordination skills: you have high agency and can own work end to end through ambiguous situations.
- Deep experience with Docker and Helm.
- Deep experience with AWS.
- Familiarity with SOC2, FedRAMP, and/or IL4/IL5/IL6 environments. You understand that "compliance" isn't a hurdle; it's a requirement of the build.

- A competitive salary, bonus, and equity package
- 100% employer paid, comprehensive health insurance including medical, dental, and vision for you and your family
- Unlimited PTO, with your manager’s approval
- Flexible work environment where you manage your work day
- A remote-first environment, with occasional travel to collaborate with customers, your team, and teammates from across the company in person
- 14 weeks of fully-paid parental leave

Salary range: $170,000 - $210,000. This represents the typical salary range for this position based on experience, skills, and other factors.
Andesite is an equal opportunity employer, and qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. We encourage candidates from all backgrounds to apply, even if you don't feel like you're a perfect fit. If you're passionate about contributing to our mission, we'd love to hear from you!


