Job Closed

This listing is no longer active.

Built on more than 130 years of experience, GE Vernova, a division of General Electric (GE), is leading a new era of energy by electrifying the world while work

SRE Production DevOps

DevOps Engineer, Full Time, Remote, Mid Level

Location

Worldwide

Posted

84 days ago

Seniority

Mid Level

Job Description

Role Description

The Production DevOps Engineer serves as a critical link in the "Middle-Mile" of software delivery for GE Vernova’s Grid Software SaaS products. This role is responsible for ensuring that software moves from development to production environments through a standardized, secure, and highly observable path. You will own the Change Management Process, serving as a primary authority for production deployments to ensure that new SaaS product versions do not compromise the stability of global energy grid operations.

This position requires a strong technical background in automation and a disciplined approach to release safety in a 24/7 operational environment. The engineer works independently and is seen as a Technical Leader. The role requires a deep understanding of concurrent software development and its effect on build management and on releasing builds across versions and environments.

Job Requirements

3–5 years of experience in DevOps, SRE, or Release Engineering roles for cloud-native SaaS applications.
Bachelor's Degree in Computer Science or “STEM” Majors (Science, Technology, Engineering and Math) with advanced experience.
Hands-on experience with Jenkins, Artifactory, GitHub Actions and ArgoCD for automated software delivery.
Proficiency in managing workloads on Kubernetes, specifically with EKS clusters.
Strong skills in Ansible and Terraform for configuration management and infrastructure-as-code.
Solid understanding of AWS cloud services (VPC, IAM, EKS, RDS, S3, MSK, etc.) in a production setting.
Experience using Prometheus, Grafana, Splunk, Datadog or Dynatrace to monitor deployment health and system performance.
Experience building dynamic build pipelines using Groovy Script, Python, Bash or Go languages.
Proven ability to manage production changes and troubleshoot under pressure in a high-stakes environment.
Familiarity with regulated industries and security frameworks such as NERC CIP, SOC2, ISO 27001, IEC 62443 is highly preferred.
Strong ability to document technical procedures and communicate clearly with stakeholders during global shift handovers.

Benefits

Relocation Assistance Provided: No
#LI-Remote - This is a remote position

Key Performance Indicators (KPIs)

Contribution towards the 4-hour SLA target for Customer Onboarding Speed.
Help maintain 99.99% availability of mission-critical grid SaaS products.
Maintaining a low rate of failed production deployments through improved quality gates for Change Failure Rate.
Ensuring fast restoration of service through automated rollbacks and clear runbooks for Mean Time to Recover (MTTR).
Automating repetitive manual tasks to ensure at least 50% of time is spent on engineering improvements for Toil Reduction.

Business Acumen

Strong problem-solving abilities and capable of articulating specific technical topics or assignments.
Experience in building scalable and highly available distributed systems.
Skilled in breaking down problems and estimating time for development tasks.
Evangelizes how our technology solves customer problems from a technology and business perspective.

Leadership

Demonstrates clarity of thinking to work through limited information and vague problem definitions.
Influences through others; builds direct and "behind the scenes" support for ideas.
Proactively identifies and removes project obstacles or barriers on behalf of the team.
Shares knowledge, power, and credit, establishing trust, credibility, and goodwill.

Personal Attributes

Able to work under minimal supervision.
Excellent communication skills and the ability to interface with senior leadership with confidence and clarity.
Skilled in providing oversight and mentoring team members. Shows ability to effectively delegate work.
Applies values, business strategy, policies, precedent, and experience to make complex decisions in ambiguity and with uncertain consequences.

More DevOps Engineer Jobs

Senior DevSecOps Consultant – AWS, Kubernetes, Terraform

Trility Consulting

Start delivering technology solutions that simplify, automate, and secure your business.

DevOps Engineer, 84 days ago

Contract, Remote, Team 51-200, H1B No Sponsor

• Support and optimize cloud infrastructure for a data lake environment within AWS
• Develop and maintain Infrastructure as Code using Terraform to ensure scalable, repeatable deployments
• Manage and support Kubernetes-based workloads, including deployment and configuration using Helm
• Collaborate with data and platform teams to ensure infrastructure supports data ingestion, processing, and reporting needs
• Write and maintain Python scripts to support automation, integration, and operational tasks
• Monitor and troubleshoot infrastructure and platform issues across cloud and containerized environments
• Implement and maintain security best practices across cloud resources, Kubernetes, and data platform components
• Contribute to documentation, runbooks, and operational standards to support long-term platform sustainability
• Partner with cross-functional teams to support ongoing enhancements and stabilization of the data platform

AWS Distributed Systems Kubernetes Python Terraform

Colorado + 18 more

Job Closed

Senior Site Reliability Engineer

Backblaze

Backblaze is the cloud storage innovator delivering a modern alternative to traditional cloud providers.

DevOps Engineer, 84 days ago

Full Time, Remote, Team 201-500, Since 2007, H1B Sponsor

• Own and drive the availability, durability, and performance of critical services across all production environments.
• Lead and champion complex projects from problem discovery through complete, cross-functional resolution, demonstrating high-level technical ownership.
• Define, establish, and enforce service health standards, including working with engineering leadership to implement SLIs, SLOs, and error budget policies for multiple services.
• Lead critical incident response and post-incident reviews, translating findings into strategic, long-term service improvements and architectural changes.
• Mentor others and act as a subject matter expert in following and evolving established ITIL/OSS processes (incident, change, problem, and capacity management).
• Design and architect scalable automation solutions to eliminate toil and improve the efficiency of operational tasks across the entire platform.
• Drive the strategic direction of monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK), and integrate them for comprehensive observability.
• Build, maintain, and secure advanced CI/CD pipelines, configuration management, and complex infrastructure as code solutions (Terraform, Ansible, Jenkins).
• Write production-grade code (Bash, Python, Go, etc.) to develop new reliability tools and enhance existing systems.
• Act as a principal partner to engineering, product, and operations teams, consulting on resilient system design, architecture, and operation.
• Lead and formalize the Production Readiness Review (PRR) process, ensuring robust operational handoff for all new services and features.
• Lead capacity planning and disaster recovery strategy across critical infrastructure components.
• Manage the relationship with vendors and service providers to troubleshoot systemic issues and ensure strict adherence to SLA performance.
• Drive the creation of high-quality documentation, proactively share advanced learnings, and cultivate a reliability-first engineering culture across teams.
• Own the creation, maintenance, and dissemination of operational playbooks, runbooks, and detailed system documentation.
• Proactively identify systemic, recurring issues and architect and drive the implementation of long-term improvements and strategic design action plans.
• Be a leading voice in promoting and embedding reliability-focused practices within development and operations teams.

Ansible Distributed Systems Docker Grafana Jenkins Kubernetes Linux Microservices Prometheus Python Terraform HashiCorp Vault

United States

$150K - $200K / year

DevOps – Site Reliability Engineer

Oowlish

We make innovation simple, convenient and right...we just make it HAPPEN

DevOps Engineer, 84 days ago

Full Time, Remote, Team 51-200, Since 2017, H1B No Sponsor

• Join a growing AI-focused SaaS startup as a DevOps & Site Reliability Engineer
• Responsible for maintaining, optimizing, and scaling infrastructure supporting the platform
• Work closely with development and product teams to improve deployment processes
• Monitor systems and respond proactively to incidents

AWS Azure Docker GCP Grafana Jenkins Kubernetes Prometheus

Brazil

Job Closed

SRE Analyst – Mid-level

Vivo (Telefônica Brasil)

Through connection, we want you to discover new points of view and enjoy everything that really matters.

DevOps Engineer, 84 days ago

Full Time, Remote, Team 10,001+, Since 1998, H1B No Sponsor

• Perform troubleshooting and functional analysis of incidents in non-production environments;
• Provide support for applications in testing environments;
• Implement and manage monitoring tools to ensure visibility into system performance and proactively detect issues;
• Lead incident response, conducting post-incident (postmortem) analyses to identify root causes and prevent recurrence;
• Develop scripts and tools to automate repetitive tasks, improving operational efficiency and reducing human error;
• Analyze system capacity and plan scalability to meet demand, ensuring services remain available and responsive;
• Collaborate with development teams to implement changes safely and efficiently, minimizing impact on the staging environment;
• Work closely with security teams to ensure security practices are integrated into the testing lifecycle;
• Create and maintain technical documentation and operational runbooks, and train teams on best practices and tools;
• Work together with QA analysts to continuously improve system reliability and efficiency.

Apache HTTP Server Cassandra Linux MongoDB OpenShift Oracle Database PostgreSQL Python

Brazil

Job Closed
