Job Closed

This listing is no longer active.

Articul8 AI

Solving the world's toughest problems with Generative AI.

Senior Site Reliability Engineer – Chaos Engineering

DevOps Engineer | Full Time | Remote | Senior | Team 11-50 | H1B No Sponsor

Location

Brazil

Posted

152 days ago

Salary

Not listed

Seniority

Senior

Job Description

  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.
  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
  • Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
  • Implement and enforce security best practices across all systems and environments.
  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Job Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 5+ years of experience in DevOps, SRE, or similar roles
  • Strong experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
  • Solid background in containerization technologies (Docker, Kubernetes)
  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
  • Strong understanding of CI/CD pipelines and automation
  • Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems
  • Experience with chaos engineering tools such as Chaos Monkey, Gremlin, or similar frameworks
  • Familiarity with container orchestration platforms like Kubernetes and related chaos tools

Preferred

  • Experience supporting AI/ML systems in production
  • Knowledge of GPU infrastructure management and optimization
  • Familiarity with distributed systems and high-performance computing
  • Experience with database systems (SQL and NoSQL)
  • Certifications in cloud platforms (AWS, GCP, Azure)
  • Experience with chaos engineering and resilience testing
  • Knowledge of security best practices and compliance requirements

Benefits

  • Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow’s AI at Articul8 AI!
  • NOTE: This position is available via CLT contract only. Thank you!

More DevOps Engineer Jobs

DevOps Engineer

BlueMatrix

The leading technology provider for the global investment research industry.

DevOps Engineer | 152 days ago
Full Time | Remote | Team 51-200 | Since 1999 | H1B No Sponsor

  • Implement and maintain CI/CD pipelines using GoCD and GitLab.
  • Manage Terraform and Terragrunt modules to provision and maintain infrastructure.
  • Automate configuration management and environment setup using Ansible.
  • Administer and optimize Linux-based systems across hybrid cloud environments.
  • Support database cluster configurations (e.g., MySQL, Cassandra) and troubleshoot issues.
  • Deploy and maintain Docker and Kubernetes environments across multiple tiers.
  • Contribute to infrastructure observability using AWS CloudWatch and log pipelines.
  • Support secrets management, IAM policies, and environment-specific access control using SSM and AWS best practices.

India
₹1,500K - ₹3,000K / year

Cloud DevOps Engineer

Motivity

The only clinically-driven all-in-one practice management solution for ABA. Data collection, scheduling, billing, + more

DevOps Engineer | 155 days ago
Other | Remote | Team 11-50 | Since 2015 | H1B Sponsor

  • Take on varied roles within a small, growing team of engineers
  • Tackle full stack development concerns in the frontend, backend and infrastructure
  • Work closely with the team on architecture, design and code reviews, while continuing to spend the majority of their time doing hands-on development
  • Work closely with business stakeholders to ensure requests meet the needs of the business and clinical product leaders
  • Provide technical support as necessary to customers and third-party vendors
  • Identify and resolve technical issues

United States
Job Closed
Full Time | Remote | Team 1-10 | H1B No Sponsor

  • Develop your personal brand and professional visibility.
  • Create engaging training courses tailored to learners' needs.
  • Contribute to the development of IT skills for our learners worldwide.

France

Senior DevOps Engineer

eSimplicity

An engineering firm that delivers high-quality Healthcare IT, Cybersecurity, and Telecommunication solutions.

DevOps Engineer | 155 days ago
Other | Remote | Team 51-200 | Since 2016 | H1B No Sponsor

  • Design, build, and maintain secure CI/CD pipelines using GitHub Actions to deliver applications and infrastructure
  • Embed security controls, tools (SAST, DAST, SCA), and processes throughout the software development lifecycle
  • Manage and secure cloud infrastructure using Infrastructure as Code (IaC) with Terraform and Terragrunt
  • Implement and manage security for containerized applications using Docker
  • Collaborate with development teams (Java, Python, Django) to identify and remediate security vulnerabilities in code and dependencies
  • Automate security monitoring, logging, and incident response procedures within the AWS cloud environment
  • Ensure systems and applications meet federal compliance standards (e.g., FISMA, NIST) and CMS-specific security requirements
  • Support the security of data platforms and services, including Databricks and Redshift
  • Work with cross-functional teams to foster a culture of security awareness and best practices

Maryland
$106.3K - $136.6K / year
Job Closed