Relevant, scalable, and blazing-fast search and discovery experiences
Senior Site Reliability Engineer, AI Research
Location
Australia
Posted
150 days ago
Seniority
Senior
Job Description
Algolia
- Support and evolve the reliability of platforms used by the AI Research team
- Ensure production services meet expectations for availability, latency, and operational readiness
- Design infrastructure and operational patterns that prioritize iteration speed while maintaining appropriate safeguards for production systems
- Work closely with researchers and engineers in a cross-functional setting
- Participate directly in team planning and execution, from early exploration through production rollout
- Help researchers self-serve infrastructure safely and effectively
- Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps
- Own and improve CI/CD pipelines for services written primarily in Go
- Design and operate observability systems using tools such as Datadog
- Participate in an on-call rotation (relatively light)
Job Requirements
- Strong experience operating cloud-first infrastructure
- Hands-on experience running production services on Kubernetes
- Proficiency with infrastructure-as-code (Terraform) and CI/CD systems
- Experience supporting production services written in Go (Python experience is a plus)
- Solid grounding in service reliability, incident response, and operational best practices
- Comfort working in environments with ambiguity, where problems are not always well defined up front
Benefits
- Flexible workplace strategy
More DevOps Engineer Jobs
SDLC Security Operations Engineer – DevSecOps
NorthBay Solutions
Cloud Transformation for the Enterprise
- Embed security controls into CI/CD pipelines and engineering workflows
- Integrate and operate security controls across CI/CD pipelines
- Implement and manage SAST/DAST
- Establish secure build and release practices
- Drive remediation workflows with developers
Senior SRE/DevOps Engineer
Scentbird
Monthly subscription service that revolutionizes the way we discover fragrance and beauty.
- Use your shift to prevent incidents from ever happening.
- Run our infrastructure with AWS, Docker and Kubernetes.
- Make monitoring and alerting trigger on symptoms rather than on outages.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Improve the deployment process to make it as boring as possible.
- Design, build and maintain core infrastructure pieces that allow Scentbird to scale.
- Debug production issues across services and levels of the stack.
- Plan the growth of Scentbird’s infrastructure.

- Design, implement, and maintain CI/CD pipelines.
- Manage and optimize Azure infrastructure.
- Collaborate with development teams to improve deployment workflows.
- Plan, execute, and manage application deployments across development, staging, and production environments.
- Work directly with clients to understand requirements, provide technical guidance, and resolve issues.
- Prepare documentation and reports for internal teams and clients.

- Design, implement, and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, or Azure DevOps
- Develop and maintain Infrastructure-as-Code using Terraform
- Manage and scale container orchestration environments using Kubernetes, including experience with larger production-grade clusters
- Ensure cloud infrastructure is optimized, secure, and monitored effectively
- Collaborate with data science teams to support ML model deployment and operationalization
- Implement MLOps best practices, including model versioning, deployment strategies (e.g., blue-green), monitoring (data drift, concept drift), and experiment tracking (e.g., MLflow)
- Build and maintain automated ML pipelines to streamline model lifecycle management



