Job Closed

This listing is no longer active.

While the position is remote, we’re only able to proceed with candidates who already hold the legal right to live and work in South Africa.

Senior DevOps Engineer

DevOps Engineer, Full Time, Remote, Senior

Location

South Africa

Posted

3 days ago

Salary

Seniority

Senior

Job Description

Role Description

We’re searching for a Senior DevOps Engineer to help scale and automate our cloud infrastructure, software delivery pipelines, and deployment processes across modern cloud-based environments.

- Build, configure, and maintain production-grade Kubernetes environments from the ground up across cloud and hybrid infrastructure.
- Develop, scale, and maintain robust Infrastructure as Code (IaC) pipelines, primarily leveraging Terraform to orchestrate cloud resources.
- Automate and secure cloud infrastructure in Azure, AWS, or both.
- Manage and optimize advanced Helm deployments and containerized workloads.
- Implement and manage observability and alerting using the Prometheus/Grafana stack (Loki, Alloy, Tempo) to monitor platform performance and reliability.
- Support database orchestration and optimization strategies (such as PITR for PostgreSQL or managing Apache Pinot and Cassandra) across environments.
- Work closely with engineering teams to streamline software delivery and ensure seamless CI/CD processes.
- Troubleshoot complex operational issues, mitigate infrastructure vulnerabilities, and enforce strict cloud security profiles.
- Innovate and automate to generate business value.
- Support and mentor junior DevOps team members.

Qualifications

- Minimum 5 years’ experience in complex product environments.
- Strong experience in DevOps, Cloud Engineering, or Platform Engineering.
- Extensive Linux administration and engineering experience.
- Proven track record of provisioning cloud infrastructure.
- Hands-on experience building, administering, and maintaining Kubernetes clusters.
- Deep experience with either AWS or Azure infrastructure, core cloud services, and secure networking.
- Strong scripting capability in Bash, Python, or PowerShell.
- The ability to read and comprehend segments of application code, such as Node.js or Java.
- A solid understanding of cloud security, governance, and operational best practices (e.g., identity federation, workload identities).

Requirements

- Nice to have: Certified Kubernetes Administrator (CKA) or equivalent CNCF certification.
- Experience in fintech or regulated environments.
- Advanced Database Administration experience, including performance optimization, handling CDC artifacts, or deduplication strategies.
- Experience with GitLab CI/CD.

Company Description

While the position is remote, we’re only able to proceed with candidates who already hold the legal right to live and work in South Africa.

Related Categories

DevOps Engineer

Related Job Pages

Remote Full-time Jobs (US)

More Remote Jobs

More DevOps Engineer Jobs

Deployment Engineer

Armada

DevOps Engineer, 3 days ago

Full Time, Remote, Team 51-200, H1B Sponsor

• Execute installation, commissioning, startup, and infrastructure validation activities for modular data center deployments
• Perform hands-on technical work across electrical, mechanical, controls/BAS, networking, and low-voltage systems
• Execute deployment procedures, commissioning plans, operational readiness testing, and infrastructure validation processes
• Conduct site assessments and maintain field documentation including commissioning reports, punch lists, and as-built updates
• Ensure deployment activities are completed in accordance with Armada operational, engineering, and safety standards
• Troubleshoot infrastructure issues across power, cooling, controls, monitoring, and network-connected systems
• Read, interpret, and apply electrical schematics, mechanical drawings, and control diagrams during deployment and operational activities
• Utilize field diagnostic tools and test equipment to identify and resolve infrastructure issues
• Participate in root cause analysis and corrective action implementation during deployment and operational incidents
• Exercise independent technical judgment while escalating high-risk or complex issues appropriately
• Execute BMS, EPMS, and DCIM integration and validation activities
• Support system startup, alarm validation, monitoring verification, and operational turnover processes
• Verify infrastructure systems meet operational, performance, and quality requirements prior to deployment completion
• Participate in incident response and operational recovery activities as required
• Partner with Senior Deployment Engineers, Engineering, Manufacturing, Supply Chain, and Customer Operations teams during deployment execution
• Coordinate field activities with vendors, subcontractors, and third-party service providers
• Provide operational feedback and field observations that improve deployment quality, repeatability, and infrastructure reliability
• Maintain clear communication with internal and external stakeholders throughout deployment activities

United States

$113.8K - $142.2K / year

Senior Site Reliability Engineer

The Leaflet

An independent platform for cutting-edge, progressive, legal, and political opinion.

DevOps Engineer, 3 days ago

Full Time, Remote, Team 11-50, H1B No Sponsor

• Ensure the availability, reliability, and performance of high-traffic Java-based applications in a distributed environment
• Troubleshoot and resolve complex issues across production and non-production environments
• Participate in pre- and post-deployment performance testing and monitoring to continuously improve application performance
• Design, build, and operate agentic AI workflows that automate operational tasks such as alert triage and root cause analysis

Grafana Java Kubernetes Python Go

Poland

Senior Site Reliability Engineer

The Leaflet

An independent platform for cutting-edge, progressive, legal, and political opinion.

DevOps Engineer, 3 days ago

Full Time, Remote, Team 11-50, H1B No Sponsor

• Ensure the availability, reliability, and performance of high-traffic Java-based applications in a distributed environment.
• Troubleshoot and resolve complex issues across production and non-production environments.
• Participate in pre- and post-deployment performance testing and monitoring to continuously improve application performance.
• Optimize Java application performance with a focus on JVM tuning, efficient resource utilization, and horizontal scaling.
• Deploy and manage the Grafana stack (Grafana, Prometheus, Loki, Mimir, Alloy) to deliver real-time monitoring, logging, and alerting.
• Implement and refine observability strategies that enhance visibility into application and infrastructure health.
• Create and maintain dashboards, alerts, and log queries for comprehensive system health monitoring.
• Integrate AI/ML models into the observability pipeline for anomaly detection, predictive alerting, and intelligent alert correlation and noise reduction.
• Design, build, and operate agentic AI workflows that automate operational tasks such as alert triage, root cause analysis, runbook execution, and incident summarization.
• Develop tool-calling LLM agents that interact with infrastructure APIs (Kubernetes, Grafana, Jira, Slack, PagerDuty) to execute diagnostic and remediation actions autonomously or with human-in-the-loop approval.
• Build and maintain MCP (Model Context Protocol) servers and integrations that expose internal systems as tool surfaces for AI agents.
• Evaluate, select, and operationalize LLM frameworks and orchestration platforms (e.g., LangChain, LangGraph, CrewAI, n8n, or custom solutions) for production-grade agentic systems.
• Implement guardrails, evaluation harnesses, and feedback loops to ensure AI agent outputs are accurate, safe, and continuously improving.
• Champion the adoption of AI-assisted development and operations practices across the SRE and broader engineering organization.
• Support the operations team’s incident response efforts, conduct post-mortems, and identify root causes to prevent recurrence.
• Leverage AI tools to accelerate incident timelines, auto-generate post-mortem drafts, and surface patterns across historical incidents.
• Document and share lessons learned, contributing to a culture of continuous improvement.
• Identify repetitive operational workflows and engineer AI-augmented or fully automated replacements.
• Build self-service tools and chatbot interfaces that allow engineering teams to query system status, retrieve logs, and execute standard operating procedures through natural language.
• Measure and report on toil reduction metrics to quantify the impact of automation initiatives.
• Work closely with developers, architects, and data/ML engineers to design solutions that improve reliability and leverage AI capabilities.
• Collaborate with DevOps and NOC teams to support the application platform.
• Communicate SRE practices, AI/automation capabilities, and operational insights to technical and non-technical stakeholders.
• Provide feedback on application performance, potential improvements, and observability metrics.

Ansible AWS Azure Cloud Google Cloud Platform Grafana Java Kubernetes Prometheus Python Terraform Go

Florida

Senior DevOps Engineer - Azure

Innovecs

We are a global digital services company

DevOps Engineer, 3 days ago

Full Time, Remote, Team 501-1,000, Since 2010, H1B No Sponsor

Role Description

We are looking for a Senior DevOps Engineer - Multi-Cloud.

Qualifications

- Experience in DevOps practices and tools.
- Strong knowledge of cloud platforms (AWS, Azure, GCP).
- Proficiency in scripting and automation.
- Experience with CI/CD pipelines.
- Familiarity with containerization technologies (Docker, Kubernetes).

Requirements

- Ability to work in a remote-first environment.
- Strong communication skills.
- Problem-solving mindset.
- Team player with a collaborative approach.

Benefits

- Flexible hours and remote-first mode.
- Competitive compensation.
- Complete Hardware/Software setup – anything you need for work.
- Open-door culture, transparent communication, and top management at a handshake distance.
- Health insurance, vacation, sick leaves, holidays, paid maternity/paternity leave.
- Access to our learning & development center: workshops, webinars, training platform, and edutainment events.
- Virtual team buildings and social activities.

Company Description

Innovecs is a global digital services company with a presence in the US, the UK, the EU, Israel, Australia, and Ukraine. Specializing in software solutions, the Innovecs team has experience in Supply Chain, Healthtech, Collaboration Tech, and Gaming.

- Included in the Inc. 5000, the list of fastest-growing private companies in the US.
- Ranked as one of the best global outsourcing service providers by IAOP.
- Honored with the Global Good Awards for Employee Engagement & Wellbeing.
- Won gold at the Employer Brand Management Awards.
- Included in the Global Top 100 Inspiring Workplaces Ranking.

Worldwide
