Job Closed
This listing is no longer active.
Fulfilling the promise of precision medicine through quality and innovation.
Staff DevOps Engineer
Location
United States | Canada
Posted
60 days ago
Salary
$140K - $179K / year
Seniority
Lead
Job Description
Staff DevOps Engineer
Caris Life Sciences
At Caris, we understand that cancer is an ugly word, a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care; we’re changing lives. We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do. But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare, driven by innovation, compassion, and purpose. Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.

Position Summary
Serve as a Staff DevOps Engineer specializing in AWS and Kubernetes to design, implement, and optimize scalable, secure cloud-native infrastructure. Lead PoC initiatives, oversee monitoring solutions, and translate SOX compliance into actionable cloud implementation plans. Break down silos by building a comprehensive team knowledge base, ensuring broad support capabilities. Provide technical leadership in cloud migration, security, and DevOps best practices, driving innovation and operational excellence across the organization.

Job Responsibilities
- Lead the design, implementation, and management of Kubernetes clusters on AWS EKS, ensuring high availability, scalability, and security. Implement and manage advanced features including autoscaling, monitoring, logging, and security policies.
- Spearhead proof-of-concept (PoC) initiatives for new tools and environments, evaluating their potential benefits for the organization.
- Manage the full lifecycle of Kubernetes clusters, including regular upgrades, patch management, version control, and performance optimization.
- Provide expert-level support and guidance to teams for deploying and optimizing applications on Kubernetes, including container orchestration and service mesh implementation.
- Design and implement monitoring and alerting solutions for applications and infrastructure using CloudWatch, Prometheus, and Datadog.
- Develop observability standards and dashboards, leveraging AI/AIOps approaches and SRE agents to enable anomaly detection, alert noise reduction, and automated root cause analysis.
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform or AWS CDK, and implement CI/CD pipelines for efficient application deployment and image management.
- Design and implement security solutions, including the deployment and management of security tools, and translate SOX compliance requirements into actionable implementation plans for cloud environments.
- Lead initiatives for cloud migration and modernization of legacy applications, collaborating with cross-functional teams to support their cloud and infrastructure needs.
- Provide technical leadership and mentorship to junior engineers on cloud technologies and DevOps practices, implementing knowledge-sharing initiatives to ensure broad support capabilities across the team.
- Stay current with emerging AWS services and features, evaluating their potential benefits and optimizing cloud resource utilization and cost-efficiency.
- Develop and maintain comprehensive documentation, including a team knowledge base, runbooks, and process documentation to eliminate information silos.
- Proactively identify areas of inefficiency and develop strategic plans for process improvements across the DevOps and cloud infrastructure landscape.
- Participate in on-call rotations to support critical cloud infrastructure and respond to emergency issues as needed.

Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 7+ years of experience in DevOps or Site Reliability Engineering roles.
- 5+ years of hands-on experience with AWS services and cloud architecture.
- 5+ years of hands-on experience with Kubernetes, including deep expertise in cluster management, troubleshooting, and optimization.
- Strong proficiency in at least one programming language (e.g., Python, Go, Java).
- Extensive experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation, AWS CDK).
- Deep understanding of containerization technologies (Docker) and orchestration platforms (Kubernetes), including security best practices.
- Experience with CI/CD tools and methodologies, particularly GitLab CI and GitHub Actions.
- Strong knowledge of networking concepts and implementation in cloud environments.
- Excellent problem-solving skills and ability to troubleshoot complex systems.
- Proven ability to lead PoC initiatives and evaluate new technologies.
- Demonstrated experience in creating and maintaining technical documentation and knowledge bases.
- Demonstrated ability to identify operational inefficiencies and develop strategic plans for process improvements in complex cloud and DevOps environments.
- Strong analytical skills with the ability to translate technical insights into actionable business recommendations.
- Strong communication and mentoring skills, with the ability to effectively transfer knowledge to team members of varying experience levels.
- Proficient in Microsoft Office Suite, specifically Word, Excel, and Outlook, with general working knowledge of the Internet for business use.

Preferred Qualifications
- AWS Professional level certifications (e.g., Solutions Architect Professional, DevOps Engineer Professional).
- Kubernetes certifications (e.g., CKA, CKAD, CKS).
- Experience with multiple cloud platforms (e.g., AWS, GCP) for multi-cloud architectures.
- Knowledge of database technologies, including MySQL, PostgreSQL, and DynamoDB.
- Proficiency with specific monitoring and observability tools such as Prometheus, Grafana, and ELK stack.
- Familiarity with serverless architectures and microservices.
- Hands-on experience with configuration management tools (e.g., Ansible, Chef, Puppet).
- Experience in implementing knowledge management systems or tools in a DevOps environment.
- Contributions to open-source projects or personal projects demonstrating cloud expertise.

Physical Demands
- Ability to sit for extended periods while working on a computer.

Training
- All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.

Other
- This position may require periodic travel and some evenings, weekends and/or holidays.
- Job may require after-hours response to emergency issues and on-call availability as required.
- Willingness to pursue ongoing professional development and stay current with emerging technologies in the field.
- Job responsibilities may be modified or expanded at the discretion of management to meet changing business needs and organizational requirements.

Annual Hiring Range
$140,000 - $179,000
Actual compensation offer to candidate may vary from posted hiring range based upon geographic location, work experience, education, and/or skill level. The pay ratio between base pay and target incentive (if applicable) will be finalized at offer.

Description of Benefits
- Highly competitive and inclusive medical, dental and vision coverage options
- Health Savings Account for medical expenses and dependent care expenses
- Flexible Spending Account to pay for certain out-of-pocket expenses
- Paid time off, including: vacation, sick time and holidays
- 401k match and Financial Planning tools
- LTD and STD insurance coverages, as well as voluntary benefit options
- Employee Assistance Program
- Pet Insurance
- Legal Assistance
- Tuition Assistance

Conditions of Employment: Individual must successfully complete pre-employment process.

This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time. Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.
More DevOps Engineer Jobs
DevOps Engineer
Robots and Pencils
Robots & Pencils is an applied AI engineering firm building the next frontier of business architecture. We design and ship AI co-workers that integrate into enterprise operations and deliver measurable results for our clients. Founded in 2009, we are smaller, faster, and more senior by design, with teams averaging 15+ years of experience.

Role Description
We're looking for a DevOps/Backend Engineer (Level 4) to own infrastructure, CI/CD, and backend services for a large-scale platform serving a major public university. You will be responsible for maintaining and evolving a multi-repo AWS serverless architecture spanning:
- Lambda functions
- Step Functions
- DynamoDB
- API Gateway
- SQS
- CloudFront
- Containerized services running on ECR
This is not a greenfield project. You'll be joining a mature, production system with real users depending on it daily. The platform orchestrates complex business workflows through event-driven serverless pipelines, integrating with multiple external enterprise APIs and internal microservices. You'll work closely with backend and frontend engineers to keep this system reliable, observable, and shipping safely across multiple environments. By joining us, you leverage our Advanced AWS Partnership and the highly exclusive AWS Patterns Partnership, a distinction held by only 11 companies worldwide out of 190,000.

Qualifications
- 7+ years of professional software engineering experience
- Deep hands-on experience with AWS serverless services: Lambda, API Gateway, Step Functions, DynamoDB, SQS, CloudFront, ECR
- Strong proficiency writing and maintaining Terraform at scale
- Production experience with CI/CD pipeline design and operation (Jenkins preferred)
- Solid understanding of containerization (Docker) and image registry management (ECR)
- Experience with multi-environment promotion strategies and deployment safety patterns

Requirements
Backend Development & Operations
- Develop and maintain Node.js and Python Lambda functions powering REST APIs, data aggregation, and business logic
- Build and optimize Step Function workflows orchestrating multi-service operations
- Integrate with external enterprise APIs and internal microservices, handling authentication, rate limiting, and error propagation
- Implement deferred processing patterns using SQS for handling concurrent and long-running operations
Infrastructure & IaC
- Own and evolve Terraform modules provisioning API Gateway, Lambda, DynamoDB, Step Functions, SQS, CloudFront, and ECR across multiple AWS accounts
- Manage multi-account AWS environments with proper IAM boundaries, secrets management (Vault), and S3 state backends
- Design and implement infrastructure changes that support zero-downtime deployments and safe rollback strategies
- Maintain and optimize Docker image builds and ECR lifecycle policies for containerized services
CI/CD & Release Engineering
- Own Jenkins pipelines that build, test, and promote artifacts through staged environments with manual approval gates
- Improve build reliability, speed, and developer feedback loops across multiple repositories
- Implement and maintain pre-commit hooks, automated testing gates, and lint/type-check enforcement
- Coordinate production deployments following established release procedures
Observability & Reliability
- Build monitoring, alerting, and logging infrastructure for a distributed serverless system
- Troubleshoot production issues spanning API Gateway, Lambda cold starts, Step Function timeouts, and DynamoDB throttling
- Establish and maintain SLOs for critical user-facing workflows
- Improve error handling and retry strategies across event-driven workflows

Benefits
- Work on a production platform that directly impacts real users at scale
- Operate within a mature, multi-repo AWS serverless architecture with real complexity
- Collaborate with experienced engineers across backend, frontend, and infrastructure disciplines
- Grow within a globally recognized AWS partner ecosystem with access to cutting-edge cloud practices
- Influence DevOps culture, tooling, and reliability standards across the engineering organization
DevOps Engineer (Automation Systems) - Freelance AI Trainer
Mindrift
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid.
Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements. This is an estimate, not a guaranteed workload, and applies only while the project is active.
Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Mindrift is looking for skilled DevOps / Automation Engineers (Infrastructure & Scaling) to join the Tendem project (https://tendem.ai/) and build and maintain scalable infrastructure for automation workflows within our hybrid AI + human environment. In this role, as an AI Pilot (that’s how we refer to this position at Mindrift), you’ll collaborate with Tendem Agents that handle repetitive tasks, while you provide infrastructure expertise, system reliability, and performance optimization to ensure stable and scalable automation pipelines. This part-time remote opportunity is ideal for professionals with hands-on experience in cloud infrastructure, system deployment, and supporting high-load automation environments.

What We Do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
This is a freelance role for a Tendem project. As a DevOps / Automation Engineer, you'll design, deploy, and maintain infrastructure supporting automation workflows, ensuring system stability, scalability, and high availability across environments.

Key Responsibilities
- Deploy and maintain self-hosted automation environments (e.g., n8n instances or similar systems).
- Design and manage infrastructure to support high-volume workflows and large-scale data processing.
- Scale automation systems to handle concurrent workloads and performance-intensive tasks.
- Set up monitoring, logging, and alerting systems to track workflow performance and detect failures.
- Implement alerts for critical events (e.g., node failures, performance drops, system instability).
- Ensure system uptime, reliability, and fault tolerance across automation pipelines.

Compensation
On this project, contributors can earn up to $60 per hour equivalent, depending on their level and pace of contribution. Compensation varies across projects depending on scope, complexity, and required expertise. Please note that other projects on the platform may offer different earning levels based on their requirements.
