Job Closed

This listing is no longer active.

CVS Health

Bringing our heart to every moment of your health.

Software Development Engineer – SRE

DevOps Engineer · Full Time · Remote · Senior · Team 10,001+ · Since 1963 · H1B No Sponsor

Location

Texas

Posted

49 days ago

Salary

$72.1K - $144.2K / year

Seniority

Senior

Job Description

  • Design and maintain a comprehensive observability platform using Grafana, Prometheus, Loki, and Tempo.
  • Implement proactive monitoring and alerting for:
      • Microservices and APIs (latency, error rates, availability)
      • Batch jobs, scheduled workloads, and ETL/data pipelines (success/failure, duration, SLA adherence)
      • Server and container health (CPU, memory, disk, network, capacity trends)
      • Database health and performance (availability, replication, query latency, resource utilization)
      • Application and infrastructure logging, including centralized log ingestion, indexing, and search
  • Build actionable alerts with clear runbooks, ownership, and escalation paths to minimize mean time to detect (MTTD) and mean time to resolve (MTTR).
  • Partner with application, platform, and DevOps teams to instrument services with metrics, traces, and structured logs.
  • Continuously improve signal quality by reducing alert noise, eliminating false positives, and optimizing thresholds based on historical trends.
  • Create and maintain dashboards for real-time operational visibility and executive-level health reporting.
  • Support incident response and post-incident reviews by providing high-fidelity telemetry and contributing to root cause analysis.

Job Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Production Operations
  • Hands-on expertise with Prometheus, Grafana, Loki, and Tempo in large-scale, production environments
  • Strong understanding of monitoring distributed systems spanning both On-Premises and Cloud environments (GCP, Azure)
  • Experience defining SLOs/SLIs and building alerting strategies based on reliability engineering best practices
  • Exceptional attention to detail with the ability to think through complex systems end-to-end, anticipate edge cases, failure modes, and cascading impacts, and proactively design monitoring and alerting to cover both common and rare operational scenarios

Benefits

  • medical, dental, and vision coverage
  • paid time off
  • retirement savings options
  • wellness programs
  • other resources, based on eligibility

More DevOps Engineer Jobs

Customer Engineer – Infrastructure – Azure Monitor - F/M/D

CNX

We're Concentrix. The intelligent transformation partner. Solution-focused. Tech-powered. Intelligence-fueled. The global technology and services leader that powers the world’s best brands, today and into the future.

DevOps Engineer · 49 days ago
Full Time · Remote · Team 10,001

Job Title: Customer Engineer – Infrastructure – Azure Monitor - F/M/D

Job Description

We’re solution-focused, tech-powered, intelligence-fueled. With unique data and insights, deep industry expertise, and advanced technology solutions, we’re the intelligent transformation partner that powers a world that works, helping companies become refreshingly simple to work, interact, and transact with. We shape new game-changing careers in over 70 countries, attracting the best talent.

The Concentrix Technical Products and Services team is the driving force behind Concentrix’s transformation, data, and technology services. We integrate world-class digital engineering, creativity, and a deep understanding of human behavior to find and unlock value through tech-powered and intelligence-fueled experiences. We combine human-centered design, powerful data, and strong tech to accelerate transformation at scale. You will be surrounded by the best in the world, providing market-leading technology and insights to modernize and simplify the customer experience. Within our professional services team, you will deliver strategic consulting, design, advisory services, market research, and contact center analytics that provide insights to improve outcomes and value for our clients, helping us achieve our vision.

Our game-changers around the world have devoted their careers to ensuring every relationship is exceptional. And we’re proud to be recognized with awards such as “World's Best Workplaces,” “Best Companies for Career Growth,” and “Best Company Culture,” year after year. Join us and be part of this journey towards greater opportunities and brighter futures.

Customer Engineer – Infrastructure – Azure Monitor

Job Description: The Azure Monitor Customer Engineer will work directly with customers, as a consultant and technical advisor, to:
  • Design, deploy, review, and assess the health of the infrastructure
  • Upgrade and maintain deployments
  • Troubleshoot issues with infrastructure and agents
  • Tune and optimize for performance
  • Assist with reporting and visualizations
  • Implement new management packs
  • Assist in the development of custom management packs
  • Provide training in all areas of Azure Monitor to ensure customer goals are met

Ideal candidate experience:
  • 15+ years working as an in-depth expert and technology owner or consultant for Azure Monitor
  • Ability to present to multiple levels of customer leadership
  • Ability to act as a consultant and architect for multiple customers
  • Broad knowledge across multiple monitoring scenarios:
      • Windows and Linux operating systems
      • Azure Monitor
      • KQL (Kusto Query Language), advanced level
      • URL and network monitoring
      • Connecting to ITSM systems
      • Dashboards, reporting, and visualizations
      • PowerShell scripting
  • Deep knowledge in at least 3 of the above categories

Technical Skills Requirements:

Azure Monitor: broad knowledge of ALL the areas below, with deep understanding of at least 4 of the following:
  • Deep understanding of Azure Monitor architecture (metrics vs. logs, data flow, ingestion, retention)
  • Strong knowledge of:
      • Log Analytics workspaces
      • Azure Monitor Metrics
      • Diagnostic settings
      • Resource-level vs. platform-level telemetry
  • Ability to explain when to use Azure Monitor vs. Azure Data Explorer, Grafana, or third-party tools

Additionally, be able to:
  • Write complex KQL queries across multiple tables
  • Use parse, extend, and mv-expand; joins, time series, and summarize patterns; and performance-optimized queries
  • Build reusable queries, functions, and summary rules for cost and performance optimization
  • Debug slow or expensive queries

Tools: Visual Studio, Silect, MPViewer, Alert Update Connector, PowerShell
Linux OS and Linux Monitoring
Report Development
Network Monitoring
URL Monitoring

Related Skills:
  • System Center Orchestrator
  • System Center Data Protection Manager
  • System Center Virtual Machine Manager
  • System Center Service Manager

This position requires fluency in German and English.

Location: DEU (Work-at-Home)
Time Type: Full time

Germany

Cloud DevOps Engineer

Peraton Corporation

Peraton Corporation, a national security company headquartered in Herndon, Virginia, supplies solutions for mission-critical programs and systems. Founded in 2017, Peraton's missio

DevOps Engineer · 49 days ago

Role Description

The Department of the Interior (DOI) CHS III program will lead the way for Cloud Hosting and Applications Modernization across DOI and its subordinate bureaus. CHS III will facilitate the migration of legacy on-premises applications to a modern, secure, and scalable multi-cloud platform. From sensors in active volcanic regions to earthquake detection data, CHS III will be DOI's central cloud processing and data solution.

We are seeking a skilled and proactive Cloud DevOps Engineer to join our dynamic team. The successful candidate will collaborate closely with individual science labs and cross-functional teams to understand and assess the requirements, feasibility, and implementation of new workloads on AWS. Responsibilities include:
  • Planning and executing migrations from on-premises environments to AWS
  • Designing scalable infrastructure
  • Automating deployment processes using cutting-edge AWS DevOps tools

This position offers the opportunity to work on innovative projects that leverage AWS cloud technologies to drive operational efficiency and build resilient, secure cloud environments.

Peraton is seeking a Cloud DevOps Engineer to perform the following tasks:
  • Incident Management: Participate in root cause analysis of system incidents.
  • Ticket Queue Management: Respond to and resolve tickets in the operations support queue in a timely and efficient manner, ensuring issues are tracked, documented, and closed according to SLAs.
  • Infrastructure as Code (IaC): Use tools such as AWS CloudFormation, GitLab, and GitLab Runners to manage and maintain AWS infrastructure, including EC2, RDS, S3, ELB, VPC, IAM roles and policies, security groups, and related services.
  • Automation: Automate routine operational tasks to improve system reliability, security monitoring, and configuration management using AWS Step Functions, Lambda functions, SSM documents, AWS EventBridge, SNS, and SQS, with scripting languages such as Python, Bash, and PowerShell.
  • Configuration Management: Implement configuration management solutions (e.g., Ansible, Chef, or AWS Systems Manager).
  • Monitoring and Logging: Set up and maintain observability tools (CloudWatch, Prometheus, Grafana, ELK stack, etc.) to monitor infrastructure and application performance.
  • Security and Compliance: Ensure security best practices (IAM policies, encryption, VPC design, secrets management) and compliance with organizational standards.
  • Cost Optimization: Analyze and optimize AWS cloud costs without compromising performance or scalability.
  • Collaboration and Leadership: Work cross-functionally with customer DevOps engineers, attend weekly team meetings, and report status on open tickets and DevOps assignments.

Qualifications
  • Candidate must be a US citizen.
  • Candidate must have an active Public Trust or the ability to obtain one.
  • 5 years with BS/BA; 3 years with MS/MA; 0 years with PhD.
  • 3 or more years of professional experience designing, deploying, and managing cloud-based solutions using core AWS services such as EC2, S3, RDS, Lambda, VPC, CloudFormation, and IAM for secure and scalable infrastructure.
  • Experience with continuous integration and continuous delivery (CI/CD) pipelines using AWS CodePipeline, CodeBuild, CodeDeploy, GitLab, GitLab Runners, or other third-party tools to automate application deployments.
  • Experience with Infrastructure as Code tools such as AWS CloudFormation, Terraform, Ansible, or AWS CDK to automate cloud resource provisioning and configuration management.
  • Experience with Python, Bash, AWS CLI, or other scripting languages to automate deployments, streamline workflows, troubleshoot issues, and resolve operational challenges within AWS environments.
  • Candidate must hold at least one of the following certifications or be able to attain one within 90 days:
      • AWS Certified CloudOps Engineer – Associate
      • AWS Certified DevOps Engineer – Professional
      • AWS Certified Developer – Associate or Professional
      • AWS Certified Solutions Architect – Associate or Professional

Preferred Qualifications
  • One or more AWS Specialty Certifications, such as:
      • AWS Certified Advanced Networking – Specialty
      • AWS Certified Security – Specialty
      • AWS AI Fundamentals
  • Experience with incident management, root cause analysis, and resolving high-priority incidents in large, multi-tenant environments using Jira, Confluence, etc.
  • Exemplary communication and analytical skills, and technical knowledge across the client environment.
  • Ability to produce concise and clear technical documentation.
  • Experience working in an Agile environment.
  • Experience leveraging AI tools to accelerate workflows and code generation, and to produce concise and clear AI-assisted technical documentation.
  • Experience using or supporting applications such as Dremio, Tableau, Posit, and/or Nebari.

Benefits
  • Target Salary Range: $86,000 - $138,000. This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations.
  • Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.

United States
$86K - $138K / year
Job Closed

Senior DevOps Engineer – Cybersecurity Platform

Sigma Software Group

We support enterprises, product houses, and startups with custom software solutions development and IT consulting.

DevOps Engineer · 49 days ago
Full Time · Remote · Team 1,001-5,000 · Since 2002 · H1B No Sponsor

  • Architect, scale, and maintain self-managed Redis, Kafka, Elasticsearch/OpenSearch, and MongoDB clusters
  • Collaborate with Product and Engineering teams to design resilient architectures for high-scale, real-time cloud operations
  • Proactively identify bottlenecks and security risks, implementing robust solutions with minimal supervision
  • Ensure high availability and disaster recovery for distributed systems through Infrastructure-as-Code and CI/CD best practices
  • Optimize performance and reliability of cloud-native systems, focusing on observability and automated recovery processes
  • Mentor team members and contribute to DevOps knowledge-sharing across the organization

Poland
Job Closed

Senior DevOps Engineer – Cybersecurity Platform

Sigma Software Group

We support enterprises, product houses, and startups with custom software solutions development and IT consulting.

DevOps Engineer · 49 days ago
Full Time · Remote · Team 1,001-5,000 · Since 2002 · H1B No Sponsor

  • Architect, scale, and maintain self-managed Redis, Kafka, Elasticsearch/OpenSearch, and MongoDB clusters
  • Collaborate with Product and Engineering teams to design resilient architectures for high-scale, real-time cloud operations
  • Proactively identify bottlenecks and security risks, implementing robust solutions with minimal supervision
  • Ensure high availability and disaster recovery for distributed systems through Infrastructure-as-Code and CI/CD best practices
  • Optimize performance and reliability of cloud-native systems, focusing on observability and automated recovery processes
  • Mentor team members and contribute to DevOps knowledge-sharing across the organization

Romania
Job Closed