Captions

Your AI-powered creative studio.

Software Engineer – Site Reliability Engineer

Location

India

Posted

142 days ago

Salary

Not disclosed

Seniority

Mid Level

Job Description

  • You will be responsible for the availability and integrity of the infrastructure that underpins Alkira’s Cloud Networking platform
  • You will hold the production systems together and troubleshoot issues that arise in production deployments
  • Provide 24x7 coverage as part of scheduled shifts and an on-call rotation
  • Use tools such as Prometheus, Grafana, and Jira to monitor, manage, triage, and document infrastructure issues in real time
  • Automate infrastructure deployment using CI/CD
  • Build the tools needed to evolve how we maintain and monitor our solution
  • Develop and execute system and integration test plans

Job Requirements

  • At least 2 years’ experience in management of production systems
  • Self-starter with a solution-oriented mindset. You see potential challenges as opportunities to learn and grow
  • Experience with cloud providers such as AWS, Azure, or GCP
  • Experience with computer networking and network technologies
  • Experience with CI/CD pipelines such as Concourse CI or Jenkins
  • Experience with Kubernetes
  • Excellent problem-solving skills and ability to quickly grasp new concepts
  • HashiCorp Certified: Terraform Associate certification is highly desirable

Benefits

  • Health insurance
  • Professional development opportunities

More DevOps Engineer Jobs

Senior Software Reliability Engineer – AI

MixMode

Automated threat detection, unparalleled network visibility, & deep guided investigation powered by Self-Supervised AI.

DevOps Engineer · 143 days ago
Other · Remote · Team 11-50 · H1B No Sponsor

  • Own the reliability, performance, and operational health of production AI systems, focusing on improving complex, existing services.
  • Lead efforts to refactor and harden the AI codebase to improve observability, maintainability, and resilience.
  • Diagnose and resolve issues across distributed systems, including latency, throughput, data pipelines, and resource utilization.
  • Design and build monitoring, alerting, and debugging tools for high-availability services.
  • Partner with researchers and ML engineers to productionize models at scale.
  • Establish best practices for testing, deployment, capacity planning, and incident response.
  • Serve as a technical leader during on-call rotations, driving incident response, postmortems, and continuous system improvements.

California
Job Closed

Staff Site Reliability Engineer

PathAI

Improving patient outcomes with AI-powered pathology.

DevOps Engineer · 144 days ago
Other · Remote · Team 501-1,000 · Since 2016 · H1B Sponsor

  • Advancing the state of our operations by implementing SRE best practices, with a focus on users, monitoring, and automation.
  • Engineering infrastructure patterns for cloud environments in Amazon Web Services, building in security, reliability, and scalability.
  • Designing, building, and operating our data center to support our rapidly growing Machine Learning team.
  • Integrating on-premises data center environments with existing cloud infrastructure to create a seamless hybrid cloud environment.
  • Improving the reliability and resilience of our infrastructure through root-cause analysis and by reviewing gaps in its design and implementation.
  • Participating in platform on-call rotations and assisting with urgent incident response.

Massachusetts
$165.8K - $224.5K / year

Senior Staff DevOps Engineer

MetaMask

The World’s Leading Web3 Wallet

DevOps Engineer · 144 days ago
Other · Remote · Team 51-200 · Since 2016 · H1B No Sponsor

  • Deliver, upgrade, and maintain infrastructure to high cybersecurity standards (ISO/SOC 2)
  • Drive our code deployment (CI/CD)
  • Set up, configure, and run development/test and staging/production infrastructure across multiple products, critical applications, and cloud providers (AWS, Azure)
  • Collaborate with developers, SREs, product managers, and other roles within the business group
  • Empower development teams day to day while thinking strategically and planning for platform growth

United States
$160K - $218K / year

DevOps Engineer

Impiricus

The future of HCP-Pharma connectivity. Impiricus is the HCP-preferred platform to engage with Pharma.

DevOps Engineer · 144 days ago
Other · Remote · Team 11-50 · Since 2020 · H1B No Sponsor

  • Design, build, and maintain scalable AWS infrastructure using Infrastructure as Code tools such as Terraform or AWS CloudFormation.
  • Develop and manage CI/CD pipelines leveraging AWS services (e.g., CodePipeline, CodeBuild, CodeDeploy) and/or third-party tools.
  • Operate and optimize containerized and serverless workloads using services such as EKS, ECS, Lambda, and Fargate.
  • Monitor, log, and troubleshoot systems using Amazon CloudWatch, AWS X-Ray, and related observability tools to ensure high availability.
  • Implement AWS security best practices, including IAM, network security (VPCs, security groups), and secrets management.
  • Automate infrastructure operations, scaling, and maintenance using scripting and AWS-native automation services.
  • Lead incident response and post-incident reviews, driving continuous improvements in reliability, performance, and cost optimization.
  • Support additional infrastructure and operational responsibilities as needed.

New York
$110K - $130K / year
Job Closed