Kong Inc.

The cloud connectivity company. Powering connections to build a reliable digital world.

Staff Site Reliability Engineer – Project Volcano

DevOps Engineer | Full Time | Remote | Lead | Team 201-500 | Since 2017 | H1B No Sponsor

Location

United States

Posted

23 hours ago

Salary

Not disclosed

Seniority

Lead

Bachelor Degree | English | Kubernetes | PostgreSQL

Job Description

  • Own reliability for Volcano end-to-end: Define and drive SLOs, error budgets, and incident response practices for all Volcano services
  • Architect the platform's infrastructure: Design and build the multi-region Kubernetes infrastructure
  • Build the GitOps and CI/CD backbone: Establish deployment automation, canary pipelines, and preview environment provisioning
  • Scale managed data services: Design, operate, and harden multi-tenant PostgreSQL clusters
  • Drive observability from day one: Instrument every Volcano service with meaningful SLIs
  • Lead cross-functional reliability work: Collaborate with the OCTO team, product engineering, and security to bake reliability into Volcano's architecture
  • Set SRE culture and standards: Mentor engineers on reliability principles; lead postmortems
  • Evaluate and adopt emerging technologies: Given Volcano's greenfield nature, evaluate edge runtimes, serverless compute, vector databases, and AI-native infrastructure components.

Job Requirements

  • BS in Computer Science or equivalent
  • Substantial experience at Staff or Principal IC level in SRE/Platform Engineering
  • Proven track record building SRE or platform engineering practices for developer-facing platforms or PaaS/SaaS products
  • Deep Kubernetes expertise: multi-tenant cluster design, networking (CNI, service mesh, ingress), autoscaling, and security hardening.

Benefits

  • Flexible work arrangements
  • Professional development opportunities

More DevOps Engineer Jobs

Full Time | Remote | Team 51-200 | Since 2015 | H1B Sponsor

  • As a Senior DevOps Engineer at Incode, you’ll be a senior individual contributor on the platform team responsible for operating, scaling, and securing the production systems behind Incode’s identity infrastructure.
  • You’ll own outcomes, not tickets, across our Kubernetes and AWS environments, partnering closely with software engineers, security, and leadership to make our platform faster, safer, and quieter.
  • You won’t just keep the lights on; you’ll improve reliability, reduce operational friction through automation, and lead durable fixes when things break.

United States

Partner AI Deployment Engineer – AWS

OpenAI

Creating safe AGI that benefits all of humanity.

DevOps Engineer | 23 hours ago
Full Time | Remote | Team 201-500 | Since 2015 | H1B Sponsor

  • Serve as the senior technical counterpart to AWS field leadership, building trust and credibility across regions and teams.
  • Influence joint account strategy and technical direction for high-priority opportunities.
  • Shape how OpenAI engages with AWS by defining engagement models, prioritization frameworks, and best practices.
  • Proactively identify and drive net-new opportunities and high-impact use cases across the AWS ecosystem.
  • Lead technical strategy for large, ambiguous, and high-stakes enterprise engagements.
  • Guide customers from early ideation through architecture design, prototyping, and production deployment.
  • Act as a technical decision-maker and escalation point, de-risking complex implementations.
  • Design and communicate end-to-end AI architectures leveraging OpenAI and AWS services.
  • Build and guide development of prototypes, POCs, and reference implementations to accelerate adoption.
  • Establish best practices for scalable, secure, and production-ready GenAI systems.
  • Enable AWS and partners through scalable technical motions (workshops, playbooks, reference architectures, demos).
  • Develop reusable solution patterns and assets that can be deployed independently by AWS teams and SIs.

India
Part Time | Remote | Team 51-200

Role Description

We are looking for an experienced DevOps Engineer to support and maintain a custom-developed application running in Microsoft Azure. The role focuses on CI/CD automation, Kubernetes administration, production support, incident management, monitoring, troubleshooting, and implementation of DevSecOps best practices. You will be responsible for ensuring the reliability, security, and smooth deployment of applications across pre-production and production environments.

  • Contract: B2B
  • Start: June 2026
  • Duration: Until December 2026 (extension possible)
  • Location: Fully Remote
  • Workload:
    ◦ First 2 months: approximately 50% allocation (10 MDs/month)
    ◦ Following months: approximately 25% allocation (5 MDs/month)
  • Language: English

Responsibilities

  • Manage and support cloud-based applications hosted in Microsoft Azure
  • Design, maintain, and optimize CI/CD pipelines using Jenkins and GitHub Actions
  • Perform deployments across development, pre-production, and production environments
  • Administer and support Kubernetes (AKS) clusters and containerized workloads
  • Implement and maintain Infrastructure as Code solutions using Terraform and Ansible
  • Monitor application and infrastructure performance, troubleshoot incidents, and ensure service reliability
  • Collaborate with development teams to improve deployment processes and operational efficiency
  • Implement security best practices and support DevSecOps initiatives
  • Manage access control and permissions using IAM and RBAC principles
  • Support patch management, vulnerability remediation, and platform maintenance
  • Work with logging, monitoring, and observability tools to proactively identify and resolve issues
  • Document operational procedures, deployment processes, and technical solutions

Qualifications

  • Strong commercial experience as a DevOps Engineer in cloud-native environments
  • Advanced knowledge of Microsoft Azure
  • Hands-on experience with Kubernetes (AKS) administration and troubleshooting
  • Strong experience with Docker and containerized applications
  • Experience with Jenkins and GitHub Actions for CI/CD automation
  • Experience with Git and GitHub version control
  • Knowledge of Infrastructure as Code tools such as Terraform and Ansible
  • Experience with Linux administration and Shell scripting
  • Experience with monitoring and logging solutions such as Splunk and Azure Monitor
  • Understanding of DevSecOps principles and security best practices
  • Experience with IAM and RBAC access management models
  • Strong troubleshooting, incident management, and production support experience
  • Excellent communication skills in English

Nice to Have

  • Experience with ArgoCD
  • Experience with PostgreSQL administration or support
  • Experience with Prisma Cloud, Snyk, and SonarQube
  • Experience with ServiceNow and JIRA
  • Experience working in enterprise-scale cloud environments

Worldwide

Senior, Data Engineer - Reliability Data

PG&E Corporation

The Pacific Gas and Electric Corporation, more commonly referred to as the PG&E Corporation, was officially incorporated in 1905 with the merger of the San Fran

Title: Senior, Data Engineer - Reliability Data
Location: Oakland, United States

Job Description

Requisition ID: 172361
Job Category: Information Technology
Job Level: Individual Contributor
Business Unit: Strategy & Growth
Work Type: Hybrid
Job Location: Oakland

Department Overview

The System Performance, Reliability and Resiliency Strategy team within the overall Electric Transmission and Distribution Engineering organization is responsible for planning, organizing, and managing the resources necessary to successfully execute PG&E's Electric Reliability Strategy and initiatives. Within this department, the Reliability Data team plays a key role in developing and curating all reliability data and data pipelines so that they meet auditable standards.

Position Summary

Designs, develops, and leads the implementation of data pipelines and processes to extract, transform, and deliver high-quality reliability data from diverse and complex sources. Defines and applies transformation logic to ensure data is accurate, consistent, and structured to meet enterprise stakeholder needs. Establishes and maintains comprehensive metadata, including data lineage, transformation logic, and audit documentation, to support transparency, governance, and regulatory compliance. Supports the System Performance, Reliability and Resiliency Strategy team by ensuring the availability of accurate, auditable, and actionable data used to inform PG&E's Electric Reliability Strategy and initiatives. Drives improvements in data quality, pipeline performance, and governance practices to meet evolving regulatory and business requirements. Leads collaboration with cross-functional teams, data owners, and leadership to resolve complex data challenges, enhance data processes, and align solutions with enterprise objectives. Provides guidance to junior team members and contributes to the development of best practices and continuous improvement across the data lifecycle.

This position follows a hybrid work model, requiring employees to report to their assigned office location at least two or three days per week. The remaining days may be worked remotely, depending on business needs. The headquarters is located in the Oakland General Office.

PG&E is providing the salary range that the company in good faith believes it might pay for this position at the time of the job posting. This compensation range is specific to the locality of the job. The actual salary paid to an individual will be based on multiple factors, including, but not limited to, particular skills, education, licenses or certifications, experience, market value, geographic location, collective bargaining agreements, and internal equity. Although we estimate the successful candidate hired into this role will be placed towards the middle or entry point of the range, the decision will be made on a case-by-case basis related to these factors. This job is also eligible to participate in PG&E's discretionary incentive compensation programs.

A reasonable salary range is:

  • Bay Area Minimum: $122,000
  • Bay Area Mid-Point: $158,000
  • Bay Area Maximum: $194,000

Job Responsibilities

  • Conceptualizes and generates infrastructure that allows big data to be accessed and analyzed.
  • Partners with various departments to understand and incorporate standards information and requirements into work procedures.
  • Deploys machine learning algorithms in production environments.
  • Resolves application programming analysis problems of moderate to complex scope within procedural guidelines. May seek assistance from the supervisor or more skilled programmers/analysts on unusual or especially complex issues that cross multiple functional/technology areas.
  • Works on complex data and analytics-centric problems having a moderate impact that require in-depth analysis and judgment to obtain results or solutions.
  • Plans work to meet assigned general objectives; progress is reviewed upon completion, and solutions may provide an opportunity for creative/non-standard approaches.
  • Communicates recommendations orally and in writing.
  • Mentors/guides less experienced colleagues.

Qualifications

Minimum:
  • BA/BS in Computer Science, Management Information Systems, a related field of study, or equivalent experience.
  • 5 years of experience with the data engineering/ETL ecosystem, using tools such as Palantir Foundry, Spark, Informatica, SAP BODS, or OBIEE.

Desired:
  • Experience with machine learning algorithm deployment.

Knowledge, Skills, Abilities, and Competencies:
  • Business Intelligence and data access tool knowledge.
  • Knowledge of software engineering principles such as unit testing, CI/CD, and source control.

California
$122K - $194K / year