Job Closed
This listing is no longer active.
SRE Platform Engineer
Location
Worldwide
Posted
68 days ago
Salary
0
Seniority
Mid Level
Job Description
SRE Platform Engineer
GE Vernova
Role Description
The Platform System Reliability Engineer is the primary operations engineer and operator of our EKS Kubernetes environment, which serves as the foundation for our global grid software SaaS products. This role focuses on the "middle-mile" of software delivery, ensuring that the underlying compute, networking, and storage layers are secure, hardened, scalable, and resilient to support critical energy infrastructure in the cloud. You will be responsible for the full lifecycle of production clusters, from initial bootstrapping through performance tuning, patching, and securing.
Job Requirements
- Bachelor's Degree in Computer Science or “STEM” Majors (Science, Technology, Engineering and Math) with advanced experience.
- 6–8 years in SRE or Platform Engineering roles supporting mission-critical, 24/7 cloud environments.
- 5 years of experience operating production-grade Kubernetes clusters at scale.
- Expert-level knowledge of multi-cluster management and performance tuning, plus experience implementing observability tools such as Prometheus/Grafana, Dynatrace, Splunk, and Datadog.
- Deep hands-on experience with AWS core services (EKS, EC2, ALB, S3, RDS, MSK).
- Proficiency in Terraform, Ansible, and Python or Go for infrastructure automation, and in deployment tools such as ArgoCD or Flux.
- Strong understanding of, and hands-on experience with, cloud networking concepts such as VPCs, routing, and load balancing, and security configurations such as encryption and certificate management.
Benefits
- Relocation Assistance Provided: Yes
- #LI-Remote - This is a remote position
Roles and Responsibilities
Day 0: Provision & Infrastructure Hardening
- Kubernetes Cluster Orchestration: Help design and deploy hardened EKS clusters across multiple AWS regions, ensuring consistent security baselines.
- Infrastructure as Code (IaC): Build and maintain reusable Terraform and Ansible modules for automated provisioning of cloud infrastructure services including networking services, compute, storage, queue and cache, etc.
- Security Architecture: Implement "Policy as Code" guardrails and secure network perimeters (ESPs) in alignment with NERC CIP and IEC 62443 standards.
- Operationalize Cloud Infrastructure: Standardize the runbooks and operating processes required to run critical infrastructure with the highest reliability.
Day 1: Platform Readiness & Scaling
- Resource Governance: Define and enforce Kubernetes resource quotas, limit ranges, and Pod Priority classes to ensure mission-critical services receive prioritized compute resources.
- Connectivity & Ingress: Manage the ingress strategy and service mesh architecture to facilitate secure, performant connectivity between distributed microservices.
- Acceptance Testing: Lead platform-level smoke tests, load tests, and disaster recovery exercises to validate that the infrastructure can meet 99.99% uptime targets.
- Sizing & Optimization: Partner with application teams to right-size containerized workloads, optimizing for both performance and cloud cost (FinOps).
Day 2: Operational Excellence & Tier 3 Support
- L3 Escalation: Act as the highest technical escalation point for complex Kubernetes internals, troubleshooting issues such as failed pods, memory leaks, and network partitions.
- Incident Response: Lead root cause analysis (RCA) for platform-level outages, implementing systemic fixes to prevent recurring failures.
- Toil Elimination: Proactively identify and automate repetitive operational tasks—such as cluster upgrades and OS patching—to ensure the team spends at least 50% of their time on engineering improvements.
- Observability Integration: Institutionalize platform monitoring using Prometheus and Grafana, creating dashboards that surface the "Golden Signals" of cluster health.
Preferred Qualifications
- Practical knowledge of NERC CIP, SOC2, ISO 27001, or IEC 62443 compliance standards in a SaaS context.
- AWS Certified DevOps Engineer – Professional, CKA (Certified Kubernetes Administrator), or SRE Practitioner Certification.
- Experience supporting mission-critical systems in energy, utilities, or other high-stakes industrial sectors.
Personal Attributes
- High level of energy and enthusiasm with the ability to thrive in a rapidly changing environment.
- Demonstrated customer focus – evaluates decisions through the eyes of the customer; builds strong customer relationships; creates processes with customer viewpoint; partners with customers.
- Change oriented – actively generates process improvements; champions and drives change initiatives; confronts.
- Ability to work with global teams, act independently and as part of a team.
- Strong analytical and problem-solving skills - communicates in a clear and succinct manner and effectively evaluates information/data to make decisions; anticipates obstacles and develops plans to resolve.
More Infrastructure Engineer Jobs
Role Description
We are looking for a Systems and Infrastructure Engineer with experience in NOC and SOC environments, responsible for designing, implementing, monitoring, and maintaining the organization's technology infrastructure. This role is key to ensuring the availability, performance, and security of systems in both on-premise and cloud environments. You will collaborate with multidisciplinary teams to ensure operational continuity, proactive incident detection, and a stronger cybersecurity posture.
Responsibilities
- Design, implement, and maintain IT infrastructure, including servers, storage, networks, and virtualization platforms (on-premise and cloud).
- Operate in NOC/SOC environments, monitoring systems, networks, and security events to ensure service continuity and timely incident handling.
- Configure and administer physical and virtual infrastructure components aligned with business requirements.
- Monitor system performance, capacity, and availability, implementing improvements to ensure high availability and reliability.
- Perform system administration tasks: installation, configuration, maintenance, and upgrades.
- Automate infrastructure processes through scripting, automation tools, and Infrastructure as Code (IaC).
- Manage backup and disaster recovery processes.
- Implement security controls, access management, and encryption mechanisms to protect information.
- Conduct vulnerability assessments and security scans, and support incident response.
- Administer and optimize cloud services (compute, storage, networking, and identity).
- Monitor cloud consumption and costs, proposing optimization strategies.
- Stay current with technology trends and propose continuous improvements.
Qualifications
- Solid knowledge of server, network, and infrastructure administration.
- Experience in NOC (Network Operations Center) and SOC (Security Operations Center) environments.
- Knowledge of virtualization (VMware, Hyper-V, or similar).
- Experience with cloud platforms (AWS, Azure, or GCP).
- Proficiency in PowerShell, Bash, or other scripting languages.
- Knowledge of automation and Infrastructure as Code (Terraform, Ansible, etc.).
- Knowledge of cybersecurity: access controls, encryption, vulnerability management, and compliance.
- Analytical and problem-solving ability.
- Communication and teamwork skills.
- Organization, attention to detail, and the ability to handle multiple tasks.
Requirements
- Bachelor's degree in Systems, Information Technology, or a related field (desirable).
- Demonstrable experience in systems administration, infrastructure engineering, or similar roles.
- Experience implementing and supporting complex infrastructure.
- Experience with automation and version control tools.
Benefits
- Direct hire by the company.
- 100% payroll scheme.
- Statutory benefits.
- Savings fund.
- 30-day year-end bonus (aguinaldo).
- Life insurance.
- Major medical insurance.
- Grocery vouchers.
- Operates as part of a team responsible for the 24x7 availability of Geisinger's network, cloud, and data center infrastructure
- Monitors and maintains the IT infrastructure
- Deploys and maintains the IT infrastructure
- Informs appropriate personnel of new features, limitations, and considerations from upgrades or new products
- Participates in the evaluation of new hardware products and upgrades
- Assists with troubleshooting and diagnosing errors in equipment
Date Posted: 2026-03-23
Country: United States of America
Location: US-CT-REMOTE
Position Role Type: Remote
U.S. Citizen, U.S. Person, or Immigration Status Requirements: Must be authorized to work in the U.S. without the company’s immigration sponsorship now or in the future. The company will not offer immigration sponsorship for this position. The company will not seek an export authorization for this role.
Security Clearance Type: None/Not Required
Security Clearance Status: Not Required
Are you ready to explore the world of aerospace and defense? Do you want to learn from and collaborate with some of the greatest minds in the industry? At RTX, our internships, co-ops and full-time careers provide an exceptional foundation to work on complex problems, advance your skills and create a safer, more connected world. Discover opportunities to make a difference at RTX.
RTX Corporation is an Aerospace and Defense company that provides advanced systems and services for commercial, military and government customers worldwide. It comprises three industry-leading businesses – Collins Aerospace, Pratt & Whitney, and Raytheon. Its 185,000 employees enable the company to operate at the edge of known science as they imagine and deliver solutions that push the boundaries in quantum physics, electric propulsion, directed energy, hypersonics, avionics and cybersecurity. The company, formed in 2020 through the combination of Raytheon Company and the United Technologies Corporation aerospace businesses, is headquartered in Arlington, VA.
We’re looking for a college intern to join our Infrastructure organization within RTX's Digital Technology (DT) function. As a Digital Technology Infrastructure Intern, you’ll have the opportunity to work on cutting-edge technologies and key projects in areas such as Transformational Strategy, Hosting Solutions, and Identity Services.
This internship is a great opportunity to develop interpersonal, analytical, and leadership skills in a collaborative and hands-on environment. The following position is to join our RTX Enterprise Services team. What You’ll Do: As a Digital Technology Infrastructure Intern, your responsibilities may vary by team and focus area, but key tasks include: - Supporting strategic initiatives to enhance capabilities, drive process optimization, and improve communication. - Assisting with digital identity and access management, including change management, process documentation, and resource planning. - Contributing to solutions involving Snowflake data warehouse, data analytics, automation, and Power BI visualizations. - Assisting with cloud engineering, hosting operations, infrastructure services, compliance, onboarding, and documentation. - Collaborating with cross-functional teams to integrate digital infrastructure into real estate and facility projects. - Performing technical project management across various disciplines. What You’ll Learn: - Core concepts in digital technology infrastructure, such as cloud hosting, identity services, and enterprise strategy. - Insights into the daily work of technology professionals within a global organization. - Mentorship and networking opportunities with experienced RTX professionals. Qualifications You Must Have: - Currently pursuing an undergraduate or graduate degree in Computer Science, Information Technology, Engineering, Mathematics, or a related field, with an expected graduation date no earlier than August 2026. (Please upload your unofficial transcript with your application.) - 1+ years of experience with Microsoft Office. - At least one previous internship experience. Qualifications We Prefer: - GPA of 3.5 or higher preferred. - Strong problem-solving skills and attention to detail. - Exceptionally self-driven, organized, and able to work independently or as part of a team. 
- Excellent written and verbal communication skills. - Previous RTX internship experience. - Prior coursework or personal projects related to IT Infrastructure, Data Analytics, or data lake tools (Snowflake, Matillion, Databricks, etc.) is a plus Learn More & Apply Now! Location: This position is remote. Please consider the following role type definition as you apply for this role: Remote: This position is currently designated as remote. Employees who are working in remote roles will work primarily offsite (from home) and may be expected to travel to an RTX facility as needed. As part of our commitment to maintaining a secure hiring process, candidates may be asked to attend select steps of the interview process in-person at one of our office locations, regardless of whether the role is designated as on-site, hybrid or remote. The salary range for this role is 37,000 USD - 82,000 USD. The salary range provided is a good faith estimate representative of all experience levels. RTX considers several factors when extending an offer, including but not limited to, the role, function and associated responsibilities, a candidate’s work experience, location, education/training, and key skills. Hired applicants may be eligible for benefits, including but not limited to, medical, dental, vision, life insurance, short-term disability, long-term disability, 401(k) match, flexible spending accounts, flexible work schedules, employee assistance program, Employee Scholar Program, parental leave, paid time off, and holidays. Specific benefits are dependent upon the specific business unit as well as whether or not the position is covered by a collective-bargaining agreement. Hired applicants may be eligible for annual short-term and/or long-term incentive compensation programs depending on the level of the position and whether or not it is covered by a collective-bargaining agreement. 
Payments under these annual programs are not guaranteed and are dependent upon a variety of factors including, but not limited to, individual performance, business unit performance, and/or the company’s performance. This role is a U.S.-based role. If the successful candidate resides in a U.S. territory, the appropriate pay structure and benefits will apply. RTX anticipates the application window closing approximately 40 days from the date the notice was posted. However, factors such as candidate flow and business necessity may require RTX to shorten or extend the application window. RTX is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or veteran status, or any other applicable state or federal protected class. RTX provides affirmative action in employment for qualified Individuals with a Disability and Protected Veterans in compliance with Section 503 of the Rehabilitation Act and the Vietnam Era Veterans’ Readjustment Assistance Act.
Infrastructure Specialist/PM
Harris Computer Systems
Based in Ottawa, Ontario, Canada, Harris Computer Systems provides mission-critical software solutions for organizations across the United States and Canada, including healthcare c
Picis is on the lookout for an experienced Infrastructure Specialist to join our team! The Infrastructure Specialist will be responsible for the installation, maintenance, and monitoring of network, server, and telecom hardware to support internal users, and will complete hands-on implementation, upgrading, and troubleshooting of network and server hardware infrastructure.
What your impact will be:
- Design and implement network communications and server solutions.
- Propose and implement system enhancements.
- Work on problems of moderate scope.
- Demonstrate good judgment in selecting methods and techniques for obtaining solutions.
- Interact with internal and some external personnel.
- Create and maintain small-scope project plans and report on status to the PS Director of PM.
- Run other small non-technical projects as time allows.
What we're looking for:
- Education in Computer Science, Information Technology, or relevant work experience.
- 2+ years of experience in the installation, maintenance, and troubleshooting of network, server, and telecom hardware.
- Proven ability to address moderate-level technical challenges and provide timely solutions.
- Hands-on experience with network and server infrastructure implementation and upgrades.
- Good communication skills to collaborate.
What will make you stand out:
- Project management experience is a strong plus.
- Experience in leading small projects and using project management tools.
What we offer: - Plenty of opportunities to grow your career - Comprehensive medical, dental, and vision benefits - 3 weeks of vacation plus 5 personal days to recharge - Employee stock ownership and RRSP program - A chance to give back through community involvement - Flexible work arrangements to suit your lifestyle About us: Join Picis, a global leader in perioperative and critical-care information systems, where you'll be part of a team that is genuinely pleased with their colleagues and thrives in a positive, comfortably fast-paced environment. Our collaborative culture is driven by a shared mission to innovate in life-critical hospital areas, enhancing patient care and staff engagement through cutting-edge automated solutions. You'll work alongside experienced professionals, including a professional services team comprised of Registered Nurses and former perioperative staff, who bring deep clinical expertise and a customer-centric perspective to every challenge. Discover a company dedicated to long-term commitment, continuous improvement, and empowering healthcare providers worldwide. About Harris Computer: Harris provides mission critical software solutions for the Public Sector, Healthcare, Utilities and Private Sector verticals throughout North America, Europe, Asia and Australia. Working for Harris is the perfect opportunity to fulfill your professional goals as well as achieve your personal dreams! Our employees enjoy a casual work environment that offers comfort while providing superior service to our customers. We offer a comprehensive benefit package as well as other additional "Perks"! - We empower our employees to make a difference - We have an award-winning culture - We offer opportunity to learn - We are financially strong and we are owned by the largest software company in Canada (CSI) - We have fun! Follow us on social media to learn more about our company values, culture and initiatives! 
- Instagram: @weareharris
- LinkedIn: Harris Computer
Salary Range: The hiring range for this role is $75,000 to $90,000 USD per year. Final compensation will be based on experience, skills, market conditions, and internal equity.



