Job Closed

This listing is no longer active.

Cresta

Real-Time Intelligence for Contact Centers

Senior Infrastructure Engineer/SRE

DevOps Engineer | Other | Remote | Senior | Team 51-200 | H1B Sponsor

Location

United States

Posted

98 days ago

Salary

$205K - $270K / year

Seniority

Senior

Job Description

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
  • Infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.
  • Automate operations and engineering. Focus on automation so we can spend energy where it matters.
  • Building machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.

Job Requirements

  • 5+ years experience in DevOps, Site Reliability Engineering, Production Engineering, or equivalent field.
  • Deep proficiency with coding languages such as Golang or Python.
  • Deep familiarity with container-related security best practices.
  • Production experience working with Kubernetes, and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns. Experience with GPU-enabled clusters is a bonus.
  • Production experience with Kubernetes templating tools such as Helm or Kustomize.
  • Production experience with IaC tools such as Terraform or CloudFormation.
  • Production experience working with AWS and services such as IAM, S3, EC2, and EKS.
  • Production experience with other cloud providers such as Google Cloud and Azure is a bonus.
  • Production experience with database software such as PostgreSQL.
  • Experience with GitOps tooling such as Flux or Argo.
  • Experience with CI/CD such as GitHub Actions.

Benefits

  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO to take the time you need, when you need it
  • Paid parental leave for all new parents welcoming a new child
  • Retirement savings plan to help you plan for the future
  • Remote work setup budget to help you create a productive home office
  • Monthly wellness and communication stipend to keep you connected and balanced
  • In-office meal program and commuter benefits provided for onsite employees

More DevOps Engineer Jobs

DevOps Engineer

GFT Technologies

As a pioneer for digital transformation GFT develops sustainable solutions across new technologies.

DevOps Engineer | 98 days ago
Full Time | Remote | Team 10,001+ | Since 1987 | H1B No Sponsor

  • Manage the support ticket queue: Prioritize and resolve issues efficiently, handling work items according to SLA commitments and available resources.
  • Troubleshoot technical problems: Assist in diagnosing and resolving technical issues and escalations of incoming requests/incidents.
  • Investigate and diagnose issues: Conduct thorough investigation to diagnose and resolve technical problems across our application toolset.
  • Administrative tasks in Atlassian tools: Perform administrative tasks and server maintenance using the CLI in Atlassian tools (Jira, Confluence, and Bitbucket Data Center).
  • Atlassian customization: Perform project-level and space-level customization for Jira Data Center.
  • Product upgrades: Assist with product upgrades to ensure our systems are up to date.
  • Gen AI tool support: Support developers in configuring and administering Gen AI tools.
  • Monitor application health: Continuously monitor application health to ensure optimal performance.

Costa Rica

Senior Tech Lead – SRE

Humana

Louisville, Kentucky-based Humana is a leading healthcare company that offers a variety of health, wellness, and insurance products and services designed to offer an integrated app

DevOps Engineer | 98 days ago

Become a part of our caring community and help us put health first

Humana is seeking an experienced Senior Tech Lead for our Site Reliability Engineering (SRE) team. This role will champion reliability, scalability, and performance of our critical systems. The ideal candidate will demonstrate strong technical leadership, mentor team members, and collaborate across engineering and business units to drive best practices in reliability and DevOps.

Job Role + Responsibilities

  • Lead SRE team initiatives focused on system reliability, automation, and operational excellence.
  • Architect and implement solutions to enhance availability, performance, and scalability of cloud and on-premises services.
  • Oversee incident management processes, ensuring timely response and thorough root cause analysis.
  • Develop and refine monitoring, alerting, and reporting frameworks; ensure actionable insights for service health.
  • Guide adoption of Infrastructure as Code (IaC) and CI/CD pipelines to streamline deployments and reduce risk.
  • Collaborate with software engineering and product teams to integrate reliability requirements into design and development.
  • Mentor engineers on SRE principles, fostering a culture of continuous improvement and operational rigor.
  • Establish service level objectives (SLOs), service level indicators (SLIs), and error budgets in partnership with stakeholders.
  • Manage on-call rotations, ensuring effective coverage and knowledge sharing.
  • Document architectural decisions, operational procedures, and incident retrospectives.
  • Operational Excellence for AI Systems: Identify AI/ML use cases, and influence and implement SRE best practices including SLIs/SLOs for ML workloads, automated remediation, and capacity modeling.
  • Observability & Monitoring for ML: Define and implement monitoring strategies for model drift, data anomalies, pipeline failures, system performance, and user experience.

Key responsibilities of this role include:

  • Proactive risk identification and mitigation during deployments to ensure system stability.
  • Ensuring long-term stability by addressing technical debt.
  • Maintaining observability and performance of critical pharmacy applications.
  • Supporting timely restoration of services during outages, with 24/7 coverage to meet enterprise Service Level Agreements (SLAs).
  • Driving incident response and root cause analysis to prevent recurrence and improve system resilience.
  • Driving operational excellence for AI systems.

Use your skills to make an impact

Required Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
  • 7+ years of relevant experience in SRE, DevOps, or software engineering, including 2+ years in a technical leadership role.
  • Minimum 5 years' relevant experience with Python, PySpark, Azure Databricks, Snowflake, SQL, Oracle, Postgres, File Transfer, REST API, and Kafka.
  • Proficiency with cloud platforms (AWS, Azure, GCP), container orchestration, and automation tools.
  • Strong scripting and programming skills (e.g., Python, Go, Bash).
  • Deep understanding of distributed systems, networking, and security principles.
  • Proven experience leading large-scale incident response and postmortem processes.
  • Excellent communication and stakeholder management skills.
  • Experience building automation around CI/CD (ADO YAML pipelines), testing, and validation.

Preferred Qualifications

  • Experience in regulated industries (healthcare, finance, etc.).
  • Certifications in cloud or DevOps technologies.
  • Familiarity with observability tools (Datadog, Prometheus, Grafana, etc.).

Work-At-Home Requirements

To ensure Home or Hybrid Home/Office associates’ ability to work effectively, the self-provided internet service of Home or Hybrid Home/Office associates must meet the following criteria:

  • At minimum, a download speed of 25 Mbps and an upload speed of 10 Mbps is recommended; a wireless, wired cable, or DSL connection is suggested.
  • Satellite, cellular, and microwave connections can be used only if approved by leadership.
  • Associates who live and work from Home in the state of California, Illinois, Montana, or South Dakota will be provided a bi-weekly payment for their internet expense.
  • Humana will provide Home or Hybrid Home/Office associates with telephone equipment appropriate to meet the business requirements for their position/job.
  • Work from a dedicated space lacking ongoing interruptions to protect member PHI / HIPAA information.

Travel

While this is a remote position, occasional travel to Humana's offices for training or meetings may be required.

Scheduled Weekly Hours

40

Pay Range

The compensation range below reflects a good faith estimate of starting base pay for full time (40 hours per week) employment at the time of posting. The pay range may be higher or lower based on geographic location, and individual pay will vary based on demonstrated job related skills, knowledge, experience, education, certifications, etc.

$106,900 - $147,000 per year

This job is eligible for a bonus incentive plan. This incentive opportunity is based upon company and/or individual performance.

Description of Benefits

Humana, Inc. and its affiliated subsidiaries (collectively, “Humana”) offers competitive benefits that support whole-person well-being. Associate benefits are designed to encourage personal wellness and smart healthcare decisions for you and your family while also knowing your life extends outside of work. Among our benefits, Humana provides medical, dental and vision benefits, 401(k) retirement savings plan, time off (including paid time off, company and personal holidays, volunteer time off, paid parental and caregiver leave), short-term and long-term disability, life insurance and many other opportunities.

Application Deadline: 03-05-2026

About us

Humana Inc. (NYSE: HUM) is committed to putting health first – for our teammates, our customers and our company. Through our Humana insurance services and CenterWell healthcare services, we make it easier for the millions of people we serve to achieve their best health, delivering the care and service they need, when they need it. These efforts are leading to a better quality of life for people with Medicare, Medicaid, families, individuals, military service personnel, and communities at large.

Equal Opportunity Employer

It is the policy of Humana not to discriminate against any employee or applicant for employment because of race, color, religion, sex, sexual orientation, gender identity, national origin, age, marital status, genetic information, disability or protected veteran status. It is also the policy of Humana to take affirmative action, in compliance with Section 503 of the Rehabilitation Act and VEVRAA, to employ and to advance in employment individuals with disability or protected veteran status, and to base all employment decisions only on valid job requirements. This policy shall apply to all employment actions, including but not limited to recruitment, hiring, upgrading, promotion, transfer, demotion, layoff, recall, termination, rates of pay or other forms of compensation and selection for training, including apprenticeship, at all levels of employment.

United States
Job Closed
Other | Remote | Team 51-200

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

We are seeking a passionate and skilled Developer Experience (DX) / Developer Productivity Engineer (DPE) to join our team. Our mission is to enhance the developer experience by creating efficient tools, automating workflows, and improving overall productivity and security for our engineering teams. This role focuses on supporting developers with the best tools and environments while ensuring our products meet stringent security standards.

  • Work closely with our Agile development team to create and optimize automated tooling to support the Software Development Lifecycle.
  • Design, develop, maintain, and optimize our continuous integration (CI) and continuous deployment (CD) solution using Jenkins Shared Libraries in Groovy.
  • Design, develop, automate, and continuously ensure the security of our containerized tooling and microservices.
  • Maintain the security of all project infrastructure, alongside the program's operations and security teams.

Qualifications

  • Minimum 10+ years total IT experience.
  • Minimum 2+ years creating CI/CD pipelines in Jenkins, Jenkins files using Jenkins Shared Libraries in Groovy.
  • Minimum 2+ years' experience deploying web applications to Kubernetes clusters.
  • Minimum 4+ years' experience with containerizing microservices with tools such as Docker, Dockerfiles.
  • Minimum 2+ years' experience utilizing Kubernetes management tools such as Rancher (preferred), OpenShift, VMware Tanzu, GKE with Anthos, etc.
  • Minimum 4+ years' experience developing Java applications.
  • Minimum 2+ years' experience developing React/TypeScript front ends.
  • Experience optimizing Jest unit tests and improving Node.js build processes for performance and efficiency.
  • Strong understanding of building Java applications.
  • Proficient with Gradle or Maven (Gradle preferred).
  • Proficient with Linux and Bash.
  • Understanding of application architecture and the tooling.
  • Understanding of git branching strategies for developer productivity.
  • Experience and understanding of the entire Software Development Lifecycle (SDLC).
  • Strong understanding of code testing.
  • Deep knowledge of various types of testing (unit, integration, acceptance, etc.) and ability to orchestrate them.
  • Experience with setting up and optimizing automated testing pipelines.
  • Ability to troubleshoot and resolve issues related to test failures, flakiness, and bottlenecks in the testing process.
  • Understanding of code coverage metrics.
  • Bonus points for experience in an acceptance test framework such as Cucumber.
  • Experience with static analysis (SAST) tools such as SonarQube (preferred), Fortify, Checkmarx, or similar.
  • Must possess demonstrated experience with securing code, operating systems, and containers.
  • Experience with fully automating CI/CD pipelines end-to-end, from code commits to production.
  • Effective verbal and non-verbal communication with peers and clients.
  • Ability to work in an open team environment.
  • Security-first mindset.

United States
Job Closed

Site Reliability Engineering Lead

Renishaw

LexisNexis® Risk Solutions provides customers with solutions and decision tools that combine public and industry specific content with advanced technology and analytics to assist them in evaluating and predicting risk and enhancing operational efficiency. We use the power of data and advanced analytics to help our customers make better, timelier decisions. By bringing clarity to information, we ultimately help make communities safer, insurance rates more accurate, commerce more transparent, business decisions easier and processes more efficient. You can learn more about LexisNexis Risk at the link below: LexisNexis Risk Solutions

DevOps Engineer | 98 days ago
Other | Remote | Team 5,001-10,000

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

This role will lead a team of SREs who use the DORA DevOps metrics as their North Star and who are systematically driving toward operational excellence.

  • Deliver cloud-native solutions and patterns that are highly elastic.
  • Empower stakeholders and reduce toil through self-service pipelines.
  • Drive best practices in cloud-native, security, and DRY through your team.
  • Mentor your team in solving deep technical issues, advanced cloud infrastructure topics, and complex coding problems.
  • Set an example of methodical, systematic task execution for your team.
  • Work with project managers and stakeholders to provide status and reporting.
  • Act as an ambassador to other teams, finding common ground and defining clear agreements.
  • Drive projects to schedule.
  • Perform code reviews with an eye toward rigor and best practice.
  • Apply continuous process improvement techniques across the operation.
  • Automate everything. You will ensure that everything is IaC, and that your team’s infra can be rebuilt from the ground up without relying on manual configurations or the Azure Portal.

Qualifications

  • Leadership/design of application and/or infrastructure migration projects from on-prem to cloud.
  • Proven expert in partnering and leading technology resources in solving complex business needs.
  • Cloud architecture design and implementation to solve key business needs and meet team goals.
  • Experience with Terraform.
  • Experience with GitHub Actions.
  • Deep knowledge of Azure Networking including private endpoints, private DNS zones, peerings, NSG rules, route tables, and VNETs.
  • Experience with Azure Function Apps.
  • Knowledge of IaaS in Azure.
  • Understanding of Microsoft Entra ID, including RBAC, service principals, and managed identities.
  • Expertise in a scripting language, either PowerShell or Bash.
  • Experience implementing data virtualization technologies.
  • Experience managing cloud storage technologies (blob, table, file, queues, and services).
  • Strong and enthusiastic technologist, able to demonstrate broad technical cloud knowledge.

Requirements

  • Performance analysis and tuning.
  • Experience with Purview, Fabric, Unity Catalog highly desired.
  • Experience with Azure Data Factory highly desired.
  • Experience with Azure Machine Learning and Azure Databricks.

Benefits

  • Base pay range (home-based, Illinois): $124,200 - $230,800.
  • If performed in Chicago, IL, the base pay range is $130,200 - $241,800.
  • This job is eligible for an annual incentive bonus.

United States
Job Closed