IBMC

Driving Business Success in Indonesia.

Senior AWS DevOps Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 51-200, Since 2022, H1B No Sponsor

Location

Indonesia

Posted

143 days ago

Salary

Seniority

Senior

English, AWS, DynamoDB, Amazon EC2, Elasticsearch, NoSQL, Prometheus, Python, Terraform

Job Description

• Infrastructure Excellence: Design, build, and maintain robust AWS infrastructure to support scalable, secure, and high-availability applications.
• Automation & CI/CD: Manage and continuously improve CI/CD pipelines to streamline deployments and ensure maximum system reliability.
• Security & Compliance: Implement AWS security best practices, including IAM roles/policies, secure networking (VPC/VPN), and data protection measures.
• Performance & Cost Optimization: Proactively monitor and optimize system performance, scalability, and cost efficiency across all AWS environments.
• Resilience & Troubleshooting: Troubleshoot complex infrastructure issues, develop disaster recovery strategies, and ensure overall operational resilience.
• Technical Mentorship: Provide guidance and mentorship on AWS and DevOps best practices to the wider engineering team to foster a culture of excellence.

Job Requirements

• Cloud Expertise: Proven experience designing, deploying, and maintaining production-grade systems on AWS.
• Hands-on Experience: Deep technical proficiency in:
  • Compute: EC2 (scaling & maintenance) and AWS Lambda (cost optimization).
  • Networking & Security: API Gateway, AWS VPN solutions, and IAM.
  • Databases: DynamoDB, Elasticsearch/OpenSearch, and ScyllaDB (or similar distributed NoSQL).
• DevOps Culture: Strong experience maintaining and improving automated CI/CD pipelines.
• Monitoring: Solid experience in performance optimization and monitoring tools (CloudWatch, Prometheus, or similar).
• Communication: Excellent communication skills with a proven ability to bridge the gap between technical execution and business needs.

Preferred (Nice-to-Have)

• IaC: Proficiency in Infrastructure as Code using Terraform or CloudFormation.
• Serverless: Familiarity with AWS SAM or Serverless Framework.
• Automation: Proficiency in Python, Bash, or PowerShell for automation and monitoring.
• Industry Background: Previous experience in Fintech, SaaS, or high-availability environments.

Benefits

Competitive Salary: Aligned with your seniority and technical expertise.
Remote & Flexible: Fully remote setup with a results-oriented culture and global collaboration.
Time Off: Paid annual leave and public holidays.
Growth: Professional development opportunities, including training and support for AWS certifications.
Innovation: Exposure to cutting-edge technologies in a supportive and collaborative international environment.

More DevOps Engineer Jobs

Principal Site Reliability Engineer – AI-first SRE

The New York Times

DevOps Engineer, 143 days ago

Full Time, Remote, Team 1,001-5,000, Since 1851, H1B Sponsor

• Architect and maintain self-healing systems with 99.9%+ availability targets.
• Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
• Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
• Build AIOps-based observability and auto-remediation pipelines.
• Apply predictive modeling to forecast failures before they impact users.
• Lead chaos, performance, and resilience testing programs.
• Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance.
• Mentor engineers and drive reliability standards across teams.
• Partner with platform, data, and product teams to ensure stability aligns with business goals.
• Support major incident response, incident review, and participate in on-call rotations.

AWS GCP Grafana Kubernetes Prometheus Python Terraform

Argentina

Job Closed

Software Engineer – Site Reliability Engineer

Captions

Your AI-powered creative studio.

DevOps Engineer, 144 days ago

Full Time, Remote, Team 11-50, H1B No Sponsor

• You will be responsible for the availability and integrity of the infrastructure that underpins Alkira’s Cloud Networking platform
• You hold the production systems together; troubleshoot issues that arise in production deployment
• Provide 24x7 coverage as a part of scheduled shift and on-call rotation
• Work with multiple tools like Prometheus, Grafana, Jira etc. to monitor, manage, triage and document infrastructure issues in real time
• Automate infrastructure deployment using CI/CD
• Build necessary tools to evolve how we maintain and monitor our solution
• Develop and execute system and integration test plans

AWS Azure GCP Grafana Jenkins Kubernetes Prometheus Terraform

India

Senior Software Reliability Engineer – AI

MixMode

Automated threat detection, unparalleled network visibility, & deep guided investigation powered by Self-Supervised AI.

DevOps Engineer, 146 days ago

Other, Remote, Team 11-50, H1B No Sponsor

• Own the reliability, performance, and operational health of production AI systems, focusing on improving complex, existing services.
• Lead efforts to refactor and harden the AI codebase to improve observability, maintainability, and resilience.
• Diagnose and resolve issues across distributed systems, including latency, throughput, data pipelines, and resource utilization.
• Design and build monitoring, alerting, and debugging tools for high-availability services.
• Partner with researchers and ML engineers to productionize models at scale.
• Establish best practices for testing, deployment, capacity planning, and incident response.
• Serve as a technical leader during on-call rotations, driving incident response, postmortems, and continuous system improvements.

Distributed Systems Java Apache Kafka Kotlin Kubernetes MySQL PostgreSQL Python Scala Apache Spark

California

Job Closed

Staff Site Reliability Engineer

PathAI

Improving patient outcomes with AI-powered pathology.

DevOps Engineer, 146 days ago

Other, Remote, Team 501-1,000, Since 2016, H1B Sponsor

• Advancing the state of our operations by implementing SRE best practices, focusing on users, monitoring, and automation.
• Engineering infrastructure patterns for cloud environments in Amazon Web Services, building in security, reliability, and scalability.
• Designing, building, and operating our data center to support our rapidly growing Machine Learning team.
• Integrating on-premises data center environments with existing cloud infrastructure to create a seamless hybrid cloud environment.
• Improving the reliability and resilience of our infrastructure through root-cause analysis and by reviewing gaps in its design and implementation.
• Participating in platform on-call rotations and assisting with urgent incident response.

Ansible AWS Grafana Prometheus Python Terraform

Massachusetts

$165.8K - $224.5K / year
