Bringing our heart to every moment of your health.
Staff Observability Data Infrastructure Engineer
Location
Maryland | Michigan | Minnesota | Texas
Posted
52 days ago
Salary
$130.3K - $260.6K / year
Seniority
Lead
Job Description
CVS Health
- Design, build, and operate high-volume log, metric, and trace pipelines using Databricks, cloud data lakes, and distributed processing engines
- Architect and evolve an Observability Lakehouse aligned with OpenTelemetry (OTEL) data models and standards
- Implement ingestion and transformation workflows using technologies such as Cribl, Vector, Jenkins, GitHub Actions, or equivalent tools
- Normalize, model, and enrich telemetry data to support detection engineering, forensics, and operational analytics
- Develop scalable ETL/ELT frameworks, Delta Lake architectures, and automated data quality validation for unstructured and semi-structured data
- Partner with Security Engineering, SRE, Cloud, and SOC teams to improve enterprise visibility and detection accuracy
- Build and maintain CI/CD pipelines and reusable Infrastructure-as-Code (IaC) patterns for observability platform deployment
- Identify and resolve performance, latency, cost, and reliability issues across telemetry pipelines
- Contribute to engineering standards, documentation, and knowledge sharing across observability and security platforms
Job Requirements
- 7+ years of experience building and operating log, metric, and trace pipelines in Data, Security Data, or Observability Engineering roles
- 5+ years of hands-on experience with Databricks, Apache Spark, or other large-scale distributed data platforms
- 5+ years of experience working across cloud platforms (AWS, Azure, or GCP)
- 5+ years of production experience using SQL and Python in data-intensive environments
- 3+ years of experience with enterprise observability platforms (Splunk, Datadog, Elastic, or equivalent)
- 3+ years of experience with high-throughput ingestion and streaming technologies such as Cribl, Vector, or Kafka
- 3+ years of experience designing telemetry systems aligned to OpenTelemetry (OTEL) or similar standards
- Familiarity with Delta Lake, Unity Catalog, metadata management, and data lineage
- Understanding of security governance, auditing, access controls, and sensitive data handling
- Hands-on experience with Infrastructure as Code (Terraform, ARM/Bicep, CloudFormation)
Benefits
- Medical, dental, and vision coverage
- Paid time off
- Retirement savings options
- Wellness programs
- Comprehensive benefits package
More Infrastructure Engineer Jobs
Lead Cloud Infrastructure Engineer / Site Reliability Engineer (SRE)
Corelight
Corelight is the cybersecurity company that transforms network and cloud activity into evidence that elite defenders use to proactively hunt for threats, accelerate response to cyber incidents, gain complete network visibility, and create powerful analytics using machine-learning and behavioral analysis tools. We are the fastest-growing Network Detection and Response (NDR) platform in the industry. We are proud of our culture and values: driving diversity of background and thought, low-ego results, applied curiosity, and tireless service to our customers and community. Corelight is committed to a geographically dispersed yet connected employee base with employees working remotely and from office locations worldwide.
Do you want to help make the world safe from cyber attack? At Corelight, we believe that the best approach to cybersecurity risk starts with the network. Attackers can evade endpoint detection, firewalls and many other technologies - but they can’t avoid leaving digital footprints on the networks they traverse. Built on open-source innovations from Zeek, Suricata and YARA and refined through years of real-world use, Corelight transforms network footprints from physical, virtual and cloud networks into actionable insights. Our customers use these insights to speed incident response and proactively hunt for threats.
As a Lead Cloud Infrastructure Engineer / Site Reliability Engineer (SRE), you will ensure the stability, performance, and security of our Federal region’s cloud platform. You’ll manage infrastructure and operations with a focus on availability, latency, performance optimization, monitoring, incident response, and capacity planning. This role requires maintaining a FedRAMP-compliant environment and working closely with teams to meet the highest standards of security and compliance. We adopt an "everything as code" approach, leveraging automation and best practices to create an efficient, reliable, and scalable infrastructure. You will be instrumental in maintaining core infrastructure services that are robust, secure, and capable of processing high volumes of data seamlessly.
The successful candidate must be a U.S. citizen and may need to perform work that the U.S. government has specified can only be carried out by a U.S. citizen on U.S. soil.
Responsibilities
- Collaborate with software engineering teams to ensure the reliability, performance, and security of the Federal region’s infrastructure.
- Design, deploy, and scale AI/ML/LLM infrastructure across cloud platforms (AWS, Azure, or GCP) ensuring high reliability and performance.
- Manage and optimize Kubernetes environments (EKS, AKS, GKE) for AI services, data pipelines, and model operations.
- Build and automate end-to-end data and model pipelines for fine-tuning, inference, and RAG workloads using Terraform, Python, and CI/CD tooling.
- Utilize automation tools such as GitOps, CI/CD pipelines, and containerization technologies (Docker, Kubernetes) to streamline ML/LLM tasks across the Large Language Model lifecycle.
- Implement monitoring, observability, and reliability best practices using Prometheus, Grafana, ELK/EFK, Langfuse, and SLI/SLO/SLA frameworks.
- Participate in 24x7 on-call rotations, leading incident response, performance tuning, and cost optimization across SaaS Platform and production workloads.
- Own infrastructure end to end, leading scaling initiatives, deployments, and automation, and providing technical leadership across the team.
Qualifications/Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field, or equivalent experience.
- 8+ years in SRE, DevOps, Platform Engineering, MLOps, or Cloud Infrastructure roles.
- 4+ years of production experience with Kubernetes (EKS, GKE, AKS) and containerization tools like Docker.
- Strong programming skills in Python and proficiency in Bash, Go, or PowerShell.
- Proficiency with Infrastructure-as-Code tools (Terraform, CloudFormation).
- Experience with Kubernetes Operators, Helm, GitOps (ArgoCD, Flux), or Service Mesh (Istio, Linkerd).
- Exposure to serverless compute (AWS Lambda, Azure Functions).
- Experience building or automating data and model pipelines for AI/ML/LLM workloads (e.g., RAG, fine-tuning, inference).
- Strong understanding of observability and monitoring using Prometheus, Grafana, ELK/EFK, Langfuse, or similar platforms.
- Familiarity with SLI/SLO/SLA practices, incident response, and reliability engineering in production environments.
Preferred Qualifications (Nice to Have)
- Cloud certifications (AWS, Azure, or GCP – e.g., Solutions Architect, DevOps Engineer).
- Experience with agentic AI frameworks (CrewAI, LangGraph, AutoGen).
- Background in hybrid or on-prem AI deployments, including OpenShift or Rancher.
- Familiarity with configuration management (Ansible, Chef, Puppet).
- Contributions to open-source AI/ML, DevOps, or platform tooling.
- Experience with multimodal AI or model observability platforms (RAGAS, AgentOps, Langtrace), Distributed Tracing, OpenTelemetry.
- Knowledge of performance tuning, cost efficiency, or capacity planning for AI/LLM infrastructure.
- Understanding of security controls and FedRAMP compliance for cloud and various workloads.
Additional Requirements
Due to the criteria and security levels required for Corelight’s FedRAMP program, this position requires:
- U.S. citizenship at the time of hire.
- Residence within the contiguous United States.
- Willingness to undergo a Single Scope Background Investigation, if required.
Fueled by investments from top-tier venture capital organizations such as Crowdstrike, Accel and Insight, Corelight is the fastest growing network detection and response platform in the industry. Our customers trust us to protect mission-critical assets in leading enterprises, government, and research institutions worldwide. We are leading the way with AI-assisted workflows, machine learning models, cloud security and SaaS-based solutions to arm defenders with the tools and knowledge they need to disrupt cyber attacks. Our team of passionate innovators is dedicated to solving some of the toughest challenges in cybersecurity, while fostering a collaborative, inclusive, and growth-oriented culture. Corelight is committed to a geographically distributed yet connected employee base with employees working from home and office locations around the world.
At Corelight, we take pride in the diversity of our backgrounds and perspectives, and we are committed to fostering an inclusive environment that strengthens our company. By embracing a wide range of experiences, backgrounds, neurodiversity, talents, and approaches to problem-solving, we aim to create a workplace where everyone can thrive and contribute their best. We are looking forward to meeting you. Check us out at www.corelight.com
Notice of Pay Transparency: The compensation for this position may vary depending on factors such as your location, skills and experience. Depending on the nature and seniority of the role, a percentage of compensation may come in the form of a commission-based or discretionary bonus. Equity and additional benefits will also be awarded.
Compensation Range
$172,000 - $219,000 USD
Lead Infrastructure Engineer
TKO Group Holdings, Inc.
IMG is a leading global sports marketing agency, specializing in media rights management and sales, multi-channel content production and distribution, brand partnerships, strategic consulting, digital services, and events management. It powers growth of revenues, fanbases, and IP for more than 200 federations, associations, events, and teams. IMG is a subsidiary of TKO Group Holdings, Inc. (NYSE: TKO).
Who We Are
IMG is a leading global sports marketing agency, specializing in media rights management and sales, multi-channel content production and distribution, brand partnerships, strategic consulting, digital services, and events management. It powers growth of revenues, fanbases and IP for more than 200 federations, associations, events, and teams, including the National Football League, English Premier League, International Olympic Committee, National Hockey League, Major League Soccer, ATP and WTA Tours, the AELTC (Wimbledon), Euroleague Basketball, CONMEBOL, DP World Tour, and The R&A, as well as UFC, WWE, and PBR. IMG is a subsidiary of TKO Group Holdings, Inc. (NYSE: TKO), a premium sports and entertainment company.
TKO Group Holdings, Inc. (NYSE: TKO) is a premium sports and entertainment company. TKO owns iconic properties including UFC, the world’s premier mixed martial arts organization; WWE, the global leader in sports entertainment; and PBR, the world’s premier bull riding organization. Together, these properties reach 1 billion households across 210 countries and territories and organize more than 500 live events year-round, attracting more than three million fans. TKO also services and partners with major sports rights holders through IMG, an industry-leading global sports marketing agency; and On Location, a global leader in premium experiential hospitality.
The Infrastructure Lead acts as the senior onsite technical authority and IT Incident Commander for all IT matters while on site, migration-related or otherwise. The role leads Windows application migrations, absorbs Helpdesk and project escalations, triages concurrent incidents in real time, and maintains operational control during cutovers. It shields the Quest migration engineer from noise while continuously improving tactics and playbooks. This role requires exceptional situational awareness, extremely high technical aptitude, decisive judgment under pressure, and the ability to manage chaos in ambiguous migration scenarios.
What You’ll Do
- Act as the senior onsite IT authority and Incident Commander during migrations
- Lead Windows application migrations tied to the project
- Absorb and triage escalations from Helpdesk, project teams, and stakeholders
- Serve as the final onsite escalation point for all IT issues, migration-related or otherwise
- Manage multiple concurrent incidents while maintaining a global view of migration health
- Monitor, assign, and work Hypercare ServiceNow Assignment Group incidents
- Continuously refine escalation paths, playbooks, and migration tactics
What Success Looks Like
- Operational issues are contained onsite
- Systemic problems are identified and mitigated early
- Migration execution quality improves consistently
- Stakeholder confidence remains high during cutovers
What We’re Looking For
- Senior-level Windows infrastructure and application migration experience
- Proven experience operating in high-pressure incident or cutover environments
- Strong judgment, composure, and decision-making under ambiguity
- Experience supporting enterprise migrations, M&A programs, or complex IT transformations
TKO EEO Statement: TKO is an Equal Opportunity Employer and complies with all applicable federal, state, and local laws regarding non-discrimination in employment. TKO makes employment decisions based on merit and qualifications, without considering an employee’s or applicant’s race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, disability, marital status, veteran status, or any other basis prohibited under federal, state or local laws governing non-discrimination in employment in every location in which the Company has facilities. TKO also provides reasonable accommodations for qualified individuals with disabilities in accordance with the Americans with Disabilities Act (ADA) and applicable state or local laws.
For information about Privacy and Information Security for TKO employment candidates, please review our Privacy Policy. For information regarding Terms of Use for this and other TKO websites, please review our Terms of Use.
Senior Infrastructure Software Engineer
Dropbox
Dropbox is the one place to keep life organized and keep work moving.
Role Description
As an Infrastructure Engineer, your role will be crucial in shaping and constructing the robust systems that not only support our current flagship products but also lay the groundwork for the next wave of engineering innovations. From optimizing user experiences across various projects to ensuring seamless scalability and data integrity, you'll be at the forefront of shaping the technological backbone of our platform. Collaborating closely with cross-functional teams, you'll leverage your expertise to tackle audacious challenges and push the boundaries of what's possible. Your contributions will directly impact millions of users, as every line of code you write furthers our mission to revolutionize the way people work and collaborate. Join us in redefining the future, where your passion for building scalable, reliable systems will drive meaningful change on a global scale.
Our Engineering Career Framework is viewable by anyone outside the company and describes what’s expected for our engineers at each of our career levels. Check out our blog post on this topic and more here.
Responsibilities
- Build infrastructure capable of managing metadata for hundreds of billions of files, handling hundreds of petabytes of user data, and facilitating millions of concurrent connections.
- Lead the expansion of Dropbox's function as the data-fabric, connecting hundreds of millions of applications, devices, and services globally, while also driving initiatives to enhance interoperability and adaptability across diverse ecosystems.
- Measure and optimize Dropbox's analytics platform to maintain its status as one of the most advanced in the industry for extracting meaningful insights from vast data volumes.
- Collaborate with cross-functional teams to innovate and implement solutions that enhance the performance, reliability, and security of Dropbox's infrastructure, ensuring a seamless experience for users worldwide.
- Proactively identify new opportunities and drive improvements in current project states, advocating for and implementing changes that potentially impact broader business initiatives across teams or products.
- Proficiency in effectively navigating through ambiguous situations and uncertainties, demonstrating adaptability and strategic thinking to steer projects towards successful outcomes.
On-call work may be necessary occasionally to help address bugs, outages, or other operational issues, with the goal of maintaining a stable and high-quality experience for our customers.
Requirements
- BS, MS, or PhD in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience
- 8+ years of professional software development experience
- Proven track record constructing and managing expansive, multi-threaded, geographically dispersed backend systems
- Proficient in programming and debugging across a range of languages such as Python, Go, C/C++, or Java
- Extensive experience working with operating system internals, filesystems, databases, networks, and compilers considered advantageous
- Ability to navigate and thrive in ambiguous situations, showcasing adaptability and open-ended problem solving.
- Capable of taking ownership of long-term projects and seeing them through to completion.
- Ability to set medium-to-long term strategy for business-impacting projects.
Preferred Qualifications
- Proven track record of defining technical roadmaps for the team.
Compensation
Poland Pay Range: 333 200 zł - 450 800 zł PLN
Senior Infrastructure Software Engineer, DevFleet
Dropbox
Dropbox is the one place to keep life organized and keep work moving.
Role Description
As an Infrastructure Engineer on the Developer Platform team, your role will be crucial in shaping and constructing the robust systems that not only support our current flagship products but also lay the groundwork for the next wave of engineering innovations. From optimizing user experiences across various projects to ensuring seamless scalability and data integrity, you'll be at the forefront of shaping the technological backbone of our platform. Collaborating closely with cross-functional teams, you'll leverage your expertise to tackle audacious challenges and push the boundaries of what's possible. Your contributions will directly impact millions of users, as every line of code you write furthers our mission to revolutionize the way people work and collaborate. Join us in redefining the future, where your passion for building scalable, reliable systems will drive meaningful change on a global scale.
Our Engineering Career Framework is viewable by anyone outside the company and describes what’s expected for our engineers at each of our career levels. Check out our blog post on this topic and more here.
Responsibilities
- Build infrastructure capable of managing metadata for hundreds of billions of files, handling hundreds of petabytes of user data, and facilitating millions of concurrent connections.
- Lead the expansion of Dropbox's function as the data-fabric, connecting hundreds of millions of applications, devices, and services globally, while also driving initiatives to enhance interoperability and adaptability across diverse ecosystems.
- Measure and optimize Dropbox's analytics platform to maintain its status as one of the most advanced in the industry for extracting meaningful insights from vast data volumes.
- Collaborate with cross-functional teams to innovate and implement solutions that enhance the performance, reliability, and security of Dropbox's infrastructure, ensuring a seamless experience for users worldwide.
- Mentor and guide junior team members, sharing knowledge and best practices to cultivate a culture of continuous learning and professional growth within the infrastructure engineering team.
- Stay current with emerging technologies and industry trends to continuously enhance Dropbox's infrastructure and maintain a competitive edge in the market.
On-call work may be necessary occasionally to help address bugs, outages, or other operational issues, with the goal of maintaining a stable and high-quality experience for our customers.
Requirements
- BS, MS, or PhD in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience
- 9+ years of professional software development experience
- Proven track record constructing and managing expansive, multi-threaded, geographically dispersed backend systems
- Proficient in programming and debugging across a range of languages such as Python, Go, C/C++, or Java
- Proficiency with operating system internals, filesystems, databases, networks, and compilers.
- Proven track record of defining & delivering well-scoped milestones/projects
- Ability to independently define right solutions for ambiguous, open-ended problems
Preferred Qualifications
- Familiarity with semaphores and mutexes
Compensation
Poland Pay Range: 333 200 zł - 450 800 zł PLN

