Job Closed
This listing is no longer active.
Senior Site Reliability Engineer
Location
United States
Posted
73 days ago
Salary
Not specified
Seniority
Senior
Job Description
Wand AI
- Build, maintain, and operate scalable production infrastructure.
- Own reliability and availability for key services and environments.
- Contribute to the design and operation of Kubernetes-based infrastructure.
- Develop and maintain Infrastructure-as-Code frameworks (e.g., Terraform).
- Improve monitoring, alerting, and observability across systems.
- Participate in on-call rotations and respond to production incidents.
- Investigate root causes of incidents and contribute to postmortems and reliability improvements.
- Improve system performance, availability, and fault tolerance.
- Contribute to CI/CD pipeline improvements to increase release safety and predictability.
- Support the deployment and operation of data platforms and ML workloads.
- Help standardize environments and infrastructure across internal systems and customer deployments.
- Troubleshoot issues across infrastructure, services, and deployment pipelines.
- Work closely with QA and engineering teams to improve production readiness and release stability.
- Contribute to automation efforts that reduce operational toil.
Job Requirements
- Strong hands-on experience in Site Reliability Engineering or DevOps roles.
- Experience working with cloud infrastructure (AWS preferred).
- Experience operating production systems and responding to incidents.
- Experience with Kubernetes in production environments.
- Strong experience with Infrastructure-as-Code (Terraform or similar).
- Experience working with CI/CD pipelines and deployment automation.
- Experience with monitoring, logging, and observability tooling.
- Strong troubleshooting and debugging skills in distributed systems.
- Experience supporting data platforms or ML workloads in production environments.
- Strong collaboration and communication skills.
Benefits
- Flexible working hours
- Professional development opportunities
More DevOps Engineer Jobs
- Collaborate closely with architects, developers, QA, and security teams to ensure smooth and reliable environment operations.
- Work in close partnership with the platform team, based on shared ownership, knowledge exchange, and mutual support.
- Own and operate containerized application platforms based on Docker and Kubernetes, ensuring reliability, scalability, and operational excellence.
- Design and deliver dynamic test environments at scale, including multiple parallel, per-merge-request (branch-based) deployments.
- Build, maintain, and standardize CI/CD pipelines by creating reusable templates and components in GitLab CI.
- Drive deployment automation and GitOps practices.
- Identify operational bottlenecks and implement automation to reduce manual effort and improve delivery speed.
- Embed security-by-design across the SDLC, including pipeline hardening and automated security checks.
- Build and operate observability platforms: monitoring, logging, and diagnostics (Prometheus, Grafana, ELK/EFK/Loki, etc.).
- Participate in on-call and incident response, including troubleshooting, root-cause analysis, and post-mortems.
- Take end-to-end ownership of the solutions you build (“you build it, you run it”).

- Enhance, optimize, validate, and automate core MinIO software for performance, scalability, and security.
- Help build and deliver high-performance distributed storage solutions with a focus on cloud-native architectures.
- Validate the MinIO software against customer environments and requirements so that customer deployments hold no surprises.
- Improve existing features, fix critical issues, and contribute to open-source repositories.
- Collaborate with other engineers to refine architecture, APIs, and integrations.
- Write efficient, well-documented, and maintainable code.
- Conduct performance benchmarking and debugging of complex storage environments.
- Work closely with customers to address issues and manage expectations.

- Manage Linux/Unix environments, including network and application configuration.
- Implement and manage CI/CD pipelines (e.g., GitHub Actions, Azure DevOps).
- Automate development, operations, and deployment tasks.
- Work with Docker containers for application deployment.
- Manage and support cloud infrastructure (Cloud / PaaS).
- Ensure security practices (SecOps), including SSH and certificates.
- Work with relational and NoSQL databases.
- Work with distributed/clustered systems.
- Use enterprise tools (BPM, ESB, messaging such as RabbitMQ).
- Document systems, networks, and processes.
- Gather requirements, propose solutions, and communicate with the team.
- Work in global teams or lead DevOps workstreams.

- Work in DevOps teams and workstreams in the global environments of professional services companies, with a focus on:
- Linux/Unix environments and system administration;
- Configuration management;
- Multiple Linux scripting languages (bash, awk, sed, Python);
- Cloud-hosted systems;
- Hardware capabilities;
- Transport security, SSH, and certificates;
- Use of Docker containers for application deployment;
- Relational databases;
- NoSQL databases;
- Distributed, clustered, or grid computing systems.




