Powering Change
Senior DevOps Engineer, Drupal
Location
United States
Posted
13 hours ago
Salary
$147K - $170K / year
Seniority
Senior
Job Description
Senior DevOps Engineer, Drupal
MetroStar
- Design, develop, and maintain custom Drupal modules, integrations, and content models using modern Drupal engineering practices.
- Build and support headless CMS capabilities using JSON:API to deliver content to downstream applications and digital experiences.
- Develop automated testing, deployment, and CI/CD pipelines to ensure reliable and secure software releases.
- Collaborate with DevOps and platform teams to deploy and operate Drupal applications in containerized cloud environments.
- Perform application updates, security patching, performance tuning, and ongoing platform maintenance.
- Troubleshoot and resolve application, integration, and production issues to maintain system stability and availability.
- Contribute to platform modernization initiatives, including cloud-native architectures, infrastructure improvements, and Drupal version upgrades.
- Partner with cross-functional teams to translate business requirements into scalable, user-centered technical solutions.
- Participate in code reviews and engineering best practices to ensure high-quality, maintainable software.
- Document technical designs, implementation approaches, and operational procedures to support long-term platform sustainability.
Job Requirements
- 10+ years of Drupal development with strong production experience
- Demonstrated experience with modern Drupal custom engineering practices
- Experience running Drupal in a containerized environment (Docker)
- Comfort with Kubernetes at an operational level (reading manifests, coordinating with a platform team on deployments)
- Experience with CI/CD as it applies to Drupal: automated config import, drush deploy patterns, database update hooks, cache strategy on deploy
- Ability to write PHPUnit and Functional tests for custom Drupal code and knowledge of efficient testing strategies
- Demonstrated experience with Drupal as a headless CMS serving content via JSON:API
- Understanding of entity normalization, resource types, includes, and filtering patterns in JSON:API
Benefits
- Health, dental, and vision insurance
- 401(k) retirement plan with company match
- Paid time off (PTO) and holidays
- Parental leave and dependent care
- Flexible work arrangements
- Professional development opportunities
- Employee assistance and wellness programs
More DevOps Engineer Jobs
- Working network engineer experience and knowledge of network components and vendors such as Arista, Aruba and Cisco switches, Juniper and Cisco routers, Palo Alto and Cisco firewalls, Aruba and Arista wireless, Aruba ClearPass authentication, Arista CloudVision (CVP), F5 load balancers, BlueCat and Micetro DNS, and the like
- Working knowledge of routing protocols, routing/switching technologies, enterprise DNS, firewalls, VPN, load balancing, Internet proxies, wireless, and authentication
- Software-Defined Network (SDN) deployment and support knowledge (Versa SD-WAN, Red Hat OpenShift, VMware, and NSX experience preferred)
- Ability to troubleshoot large-scale data center enterprise network environments and resolve problems
- Ability to develop software and tools to proactively monitor the global network, support capacity planning and load testing, and respond to network events
- Ability to develop network automation software and tools to proactively enhance operational support of the network
- Ability to communicate and collaborate closely with multiple development and operations teams to build reliable system designs
- Understanding of network monitoring and observability, and of taking appropriate action on fault and performance, packet analysis, logging, configuration management, and automation
- Must understand, configure, and have knowledge of modern network automation scripting, including Python, Ansible, etc.
- Support 24x7x365 global network data center operations
- Participate in the on-call rotation schedule for after-hours and weekend support
- Ability to understand and work in a complex network with moderate supervision in a global team environment
- Excellent customer service and English communication skills (written and oral)
- Fluency in ticketing systems (e.g., ServiceNow) and ITIL processes and procedures
- Assist in the Change Management process as necessary, including participating in maintenance activities during off hours, including weekends
Site Reliability Engineer
Fabric
The national pay range for this role is $165,000.00 - $210,000.00 per year. Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications. Certain roles may also be eligible for additional compensation. If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications. Expected compensation ranges for this role may change over time.
Role Description
As a Site Reliability Engineer, you will own and evolve the infrastructure powering healthcare experiences for millions of patients. This role bridges the gap between traditional infrastructure excellence and the future of AI-driven operations. You will act as a primary architect for our AWS and Kubernetes (EKS) environment, ensuring the platform is resilient, scalable, and compliant while exploring how agentic workflows can modernize SRE practices.
What You'll Do
- Infrastructure & Kubernetes Orchestration
  - Designing, deploying, and maintaining production Kubernetes (EKS) clusters to ensure enterprise-grade availability for our users.
  - Eliminating manual configuration by building and managing a scalable infrastructure state entirely through Terraform.
  - Optimizing the AWS footprint (specifically EC2, RDS, and S3) to balance high performance with cost-efficiency and reliability.
- AI-Assisted Operations & Automation
  - Exploring and deploying agentic workflows for AI-assisted runbooks that automate complex operational decisions and repetitive tasks.
  - Building and evolving deployment pipelines using GitHub Actions or Semaphore to ensure delivery is both rapid and safe.
  - Focusing on toil reduction by developing internal tools that replace manual operational work with intelligent, autonomous systems.
- Observability & Incident Management
  - Driving the evolution of the observability stack in Datadog by implementing the sophisticated metrics, traces, and logs needed to meet SLOs.
  - Leading incident response efforts and facilitating the blameless postmortems that help systematically reduce recovery time (MTTR).
  - Defining and monitoring the SLIs and SLOs that ensure the platform consistently meets rigorous healthcare performance standards.
- Compliance & Collaboration
  - Ensuring every piece of infrastructure remains fully compliant with HIPAA and other critical healthcare regulatory requirements.
  - Mentoring engineers across the company on reliability best practices and contributing a clinical-safety perspective to cross-functional design reviews.
Qualifications
- 5+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
- Expert technical depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
- Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
- Deeply proficient coding and scripting skills in Python, Bash, Ruby, or Go.
- Preferred experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
- A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.
Requirements
- You are a deeply proficient engineer who excels at the intersection of cloud infrastructure, automation, and system design.
- You possess a meticulous approach to observability and a passion for finding the "root cause" rather than just applying a patch.
- You enjoy exploring the "next frontier" of SRE, including how AI and agentic tools can make operations more efficient.
- You thrive in fast-paced environments where technical rigor is balanced with pragmatism and clinical-grade safety.
This Might Not Be The Right Fit If...
- You prefer working on static infrastructure rather than evolving systems through code and automation.
- You are uncomfortable with the "agile" pace of tech-driven platform development or integrating AI tools into your daily workflow.
- You prefer a siloed role that does not involve active participation in incident response or collaborative postmortems.
Benefits
- The national pay range for this role is $135,000.00 - $160,000.00 per year.
- Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications.
- Certain roles may also be eligible for additional compensation, including a comprehensive benefits package such as medical, dental, vision, unlimited PTO, and a 401(k) plan, stock options and bonuses.
- If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications.
- Expected compensation ranges for this role may change over time.
Site Reliability Engineer
Referral Board
Remote's Total Rewards philosophy is to ensure fair, unbiased compensation and fair equity pay along with competitive benefits in all locations in which we operate. We do not agree to or encourage cheap-labor practices, and we therefore ensure that we pay above in-location rates. At Remote, we foster internal mobility as a key element of our culture of employee growth and development, supported by a compensation philosophy that guarantees pay equity and fairness.
Role Description
We are Cloud Infrastructure SREs that integrate, scale, and evolve multi-cloud infrastructure across 4 Cloud Service Providers, 70+ globally distributed regions, and tens of thousands of hosts to power Elastic Cloud. We tackle hard problems at scale through automation, Infrastructure as Code (IaC), configuration management, and purpose-built software that eliminates toil and improves reliability. If that scale of challenge genuinely excites you, we'd love to hear from you.
What you will be doing
- Engineering software to automate large-scale systems: building internal tools and services, not just running scripts.
- Optimizing the reliability and lifecycle of hosts across multiple cloud providers.
- Strengthening our observability posture by crafting alerting and monitoring systems that drive incident prevention over incident response.
- Scaling global infrastructure and evolving the infrastructure management processes to meet growing demand.
- Contributing to code reviews, sharing your work, planning what we need to do next, and mentoring teammates.
- Being part of a balanced SRE on-call rotation: responding to incidents, improving runbooks, leading postmortems, and championing reliability improvements.
Qualifications
- Experience building software with Golang. You are also comfortable reviewing others' code and have opinions about what good code looks like.
- Production experience operating large-scale cloud compute (hundreds of hosts or more) via automated workflows.
- Deep experience with Linux systems: you are at home in the terminal debugging at the OS level.
- Proficiency working with containerized workloads in production.
- A customer-first, systems-thinking approach to operational problems: you care about root causes, not just symptoms.
- Comfortable working across time zones in both real-time and asynchronous contexts.
- You write clear and maintainable documentation such as software designs, runbooks, architecture diagrams/decisions, postmortems, etc.
- You communicate project status regularly and clearly, flag blockers early, and follow through on action items.
- A sensible approach to AI integration: identifying where AI tools genuinely reduce operational burden and embedding them into workflows without adding complexity.
Bonus Points
- Production experience with any of: Terraform, Puppet, Ansible, Argo CD, Argo Workflows, CUE, Docker, Kubernetes, Ubuntu, or Ubuntu Live Patch.
- Experience being on-call during incidents and using observability tools (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues, quantify impact, and confirm mitigations.
- Hands-on experience engineering solutions with the Elastic Stack.
Compensation
Compensation for this role is in the form of base salary. This role does not have a variable compensation component.
The typical starting salary range for new hires in this role is:
- $143,100 - $175,000 USD
The typical starting salary range for this role in select locations (including Seattle WA, Los Angeles CA, the San Francisco Bay Area CA, and the New York City Metro Area) is:
- $143,100 - $175,000 USD
Benefits
- Competitive pay based on the work you do here and not your previous salary.
- Health coverage for you and your family in many locations.
- Ability to craft your calendar with flexible locations and schedules for many roles.
- Generous number of vacation days each year.
- Increase your impact: we match up to $2000 (or local currency equivalent) for financial donations and service.
- Up to 40 hours each year to use toward volunteer projects you love.
- Embracing parenthood with a minimum of 16 weeks of parental leave.
Additional Information
As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do. Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals.
- You monitor and maintain our servers.
- You manage internal monitoring.
- You are responsible for the development and maintenance of the IT infrastructure.
- You manage our infrastructure as code (e.g., with Terraform, Ansible).
- You integrate security checks directly into our CI/CD pipelines (shift-left approach) and continuously improve them.
- You perform regular vulnerability scans and penetration tests.
- You handle secrets management and the secure handling of credentials.
- You ensure compliance with security standards and regulatory requirements (e.g., ISO 27001, GDPR).
- You conduct threat modeling and assess security risks early.
- You document processes and infrastructure.
- You monitor current trends and introduce new tools and best practices.
- You advise colleagues on security-related issues.



