Job Closed

This listing is no longer active.

Finom

Financial solutions for entrepreneurs and freelancers - combining business account benefits with multiple services

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 2019 | H1B No Sponsor

Location

Bulgaria

Posted

58 days ago

Salary

Seniority

Senior

Bachelor Degree | English | AWS | Google Cloud Platform | Grafana | Kubernetes | Prometheus | Terraform

Job Description

• Lead the Platform Evolution: Design and operate our Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.
• Build "Paved Roads": Own and evolve our PaaS strategy, using GitOps (ArgoCD) and CI/CD (GitLab) to empower domain teams to deploy independently.
• Architect Reliability: Define and implement our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).
• Drive Infrastructure-as-Code: Lead the automation of our infrastructure using Terraform, ensuring all resources are standardized and version-controlled.
• Own the Error Budget: Partner with engineering teams to establish and manage SLOs, SLAs, and incident management frameworks.
• Disaster Recovery Mastery: Design and participate in regular DR drills, implementing blue/green and active/passive strategies across regions to ensure service continuity.
• Innovate Operations: Proactively apply AI-driven approaches to improve operational efficiency and automated bottleneck detection.

Job Requirements

Strong hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production environments
Deep experience with GCP (AWS is a strong plus) and Terraform for large-scale infrastructure
Solid experience with ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy
Deep knowledge of the Prometheus/Grafana stack and implementing tracing/logging at scale
Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities
English level B2+ for effective cross-functional communication

Benefits

Make a genuine impact on the product
Work in the EU
Become a stock options holder
Receive unwavering support and care
Work & Swim program
Equal Opportunity Statement

More DevOps Engineer Jobs

Lead Site Reliability Engineer

Coupa Software

Spend is the fuel to help your company deliver performance, profitability, and purpose!

DevOps Engineer | 58 days ago | Full Time | Remote | Team 1,001-5,000 | Since 2006 | H1B Sponsor

• Build, deploy, and troubleshoot microservices in Kubernetes and Amazon EKS, ensuring scalability and reliability.
• Design secure, highly available web applications with a focus on capacity planning and performance optimization.
• Deploy and manage the lifecycle of LLMs and embedding models, defining KPIs to measure and improve AI application performance.
• Evaluate and integrate emerging technologies such as RAG systems, MCP servers, AI Agents, and agentic workflows into our platform.
• Manage AWS core and GenAI services (S3, IAM, EKS, Bedrock, etc.) using infrastructure-as-code tools like Terraform and Chef, while maintaining observability through tools like New Relic or PagerDuty.
• Collaborate across product, platform, and engineering teams on architecture design, security patching, incident response, and release management to ensure the reliability of our ML and GenAI infrastructure.

AWS | Azure | Chef | Cloud | DNS | Google Cloud Platform | Kubernetes | Linux | Microservices | MySQL | Python | Terraform

India

Job Closed

Senior I O Engineer - Azure Cloud Ops and Linux

UnitedHealth Group

UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of

DevOps Engineer | 58 days ago | Full Time | Remote

Role Description

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. You will enjoy the flexibility to telecommute* from anywhere within the U.S. as you take on some tough challenges.

- Solution new Cloud approaches events that surface to improve the operations of Cloud Infrastructure
- Review, improve and approve architectural modifications to existing Cloud Infrastructure through formal change management processes
- Installation and configuration of application components of solutions
- Serve as a key resource on complex and/or critical issues related to application and server performance
- Expert in Windows Server, Windows Desktop, Red Hat Linux and Oracle Linux operating systems
- Manage all scheduled maintenance across all servers, in support of service level agreements
- Experience with both SharePoint and VMWare is a plus
- Submission of change control for all scheduled maintenance
- Responsible for strong working knowledge of security compliance according to State and Federal security policies and regulations
- Validation and support of disaster recovery plan for all server and application components of OSGS State contracts
- Leverage enterprise-approved AI tools to streamline workflows, automate tasks, and drive continuous improvement

Qualifications

- Bachelor’s degree
- 5+ years of experience with Linux and Windows systems administration, maintenance and security
- 5+ years of experience with general auditing/troubleshooting experience on all levels (network, Linux, software, hardware)
- 3+ years of experience with day-to-day troubleshooting of application and database connectivity problems
- 3+ years of experience with proactive application monitoring and trouble resolution
- 3+ years of experience with report writing in support of service level agreement reporting requirements
- 3+ years of Azure Administration experience
- 3+ years of experience on key Azure services with three or more of the following: Jenkins, Terraform, Github, Azure Web Services, Kubernetes, Azure DevOps, Key Vault, DNS, Identity Services, Front Door, Traffic Manager, Azure Monitor, App Insights, and Network Watcher
- 3+ years of System Administration experience with Windows and Unix
- 3+ years of experience with Network Administration (Firewall, ACLs/NAT, design, upgrades, security)
- 3+ years of experience with TCP/IP and networking fundamentals
- 3+ years of experience with scripting and automation tools like Python/Shell/Perl
- 2+ years of experience with process and procedure documentation for contractual required documentation, including an operations manual for ongoing support of the environment
- Willingness to support 24x7 production environments
- Willingness to complete AZ-204 Certification or equivalent within 6 months of hire

Requirements

- All Telecommuters will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

Benefits

- Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc.
- Comprehensive benefits package
- Incentive and recognition programs
- Equity stock purchase
- 401k contribution (all benefits are subject to eligibility requirements)
- The salary for this role will range from $91,700 to $163,700 annually based on full-time employment.

Application Deadline

- This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected.
- Job posting may come down early due to volume of applicants.

United States

$91.7K - $163.7K / year

Job Closed

Release Manager – DevOps, Enterprise Platforms

Gruve

Data to Possibilities

DevOps Engineer | 58 days ago | Full Time | Remote | Team 201-500 | Since 2024 | H1B No Sponsor

• Maintain the release calendar for in-scope products, coordinating timing, dependencies, and stakeholder communication across APAC and EMEA teams.
• Run release readiness reviews and facilitate go/no-go decisions, ensuring acceptance criteria, test evidence, security sign-off, and operational runbooks are complete before deployment.
• Execute deployments to staging and production environments, including coordination of pre- and post-release validation, smoke tests, and rollback if needed.
• Operate and continuously improve CI/CD pipelines (e.g., Jenkins, GitHub Actions, Azure DevOps), reducing manual steps and lead time for changes.
• Drive change management in line with ITIL practices and applicable regulatory frameworks (e.g., GxP, 21 CFR Part 11), maintaining a complete and audit-ready release record.
• Coordinate hotfix and emergency change processes, including incident-driven releases, while protecting overall system stability.
• Support healthy lower environments (dev, QA, staging) by helping manage availability, configuration parity, and refresh cadence.
• Track and report release metrics such as deployment frequency, lead time for changes, change failure rate, and mean time to recovery (DORA metrics).
• Act as the regional release point of contact for APAC and EMEA stakeholders, escalating risks and decisions clearly and on time.
• Document release processes, runbooks, and lessons learned, and share best practices with engineering teams across regions.

AWS | Azure | Cloud | Docker | Google Cloud Platform | Jenkins | Kubernetes

India

Job Closed

Lead DevOps Engineer

Education Perfect

A complete digital toolkit for teaching and learning.

DevOps Engineer | 58 days ago | Full Time | Remote | Team 201-500 | Since 2011 | H1B No Sponsor

• Improve EP’s developer experience through workflow automation, self-service tooling, reusable infrastructure patterns, and careful standardisation.
• Maintain and enhance EP’s GitHub Actions-based build and deployment pipelines to increase engineering productivity and product quality.
• Ensure cost-effective use of third-party providers, such as AWS, Datadog, and Akamai. Monitor and optimize cloud spend.
• Develop and enforce high technical standards across the engineering team for performance, reliability, security, and maintainability.
• Create high-level infrastructure designs that address the ongoing scalability, reliability, security, and performance needs of the platform.
• Collaborate across teams and functions to help define our architecture and technical roadmap.
• Help design and enforce security controls to ensure EP adheres to key compliance frameworks, such as ISO 27001 and GDPR.
• Own the end-to-end availability, reliability, security and performance of the EP platform.
• Develop automation, observability, and processes to keep EP highly available, scalable, and resilient.
• Participate in on-call rotations and incident response.
• Educate and empower software engineers to think operationally when designing services, and to operate what they build.
• Embed AI into the development toolchain, including AI-powered security reviews, code analysis, policy enforcement, and observability. Leverage AI to drive personal and team productivity.
• Foster a healthy and collaborative culture, in line with EP’s core values.
• Make pragmatic decisions and sensible tradeoffs informed by high-level business objectives.

AWS | Cloud | Distributed Systems

New Zealand

Job Closed
