Agile | Adaptive | Ardent
Release Train Engineer – DevOps Systems Lead
Location
Texas
Posted
3 days ago
Seniority
Senior
Job Description
VetsEZ
- Lead the Agile Release Train, facilitating key SAFe ceremonies including PI Planning, ART Sync, System Demos, Inspect & Adapt, and backlog refinement.
- Coach Scrum Masters, Product Owners, and teams on SAFe principles, flow, and continuous improvement practices.
- Drive program-level execution, tracking progress toward PI objectives and managing risks, issues, and dependencies across teams.
- Collaborate with Product Management, System Architects, and stakeholders to ensure alignment of vision, roadmap, and program backlog.
- Lead and mentor the DevOps Systems Team responsible for CI/CD pipelines, build automation, and shared platform services.
- Prioritize and manage the team backlog to balance feature delivery, platform reliability, technical debt reduction, and automation initiatives.
- Coordinate cross-team release planning and execution, ensuring alignment of release schedules, feature readiness, and environment availability.
- Establish and monitor ART and DevOps flow metrics, including throughput, lead time, MTTR, deployment frequency, and change failure rate, to guide improvement efforts.
- Prepare and deliver clear, concise briefings and status reports to senior leadership, identifying key achievements, risks, and decision points.
- Align with other RTEs, program managers, and vendor partners to coordinate dependencies across trains, contracts, and external systems.
Job Requirements
- Bachelor’s degree in Engineering, Computer Science, Information Systems, Business, or a related discipline, or equivalent experience.
- A minimum of 7 years of experience in large-scale software delivery, with at least 3 years in an RTE, Program Manager, or similar leadership role in a SAFe environment.
- Demonstrated experience leading an Agile Release Train or equivalent multi-team program using the SAFe framework.
- Strong understanding of DevOps and DevSecOps practices, CI/CD pipelines, build and release management, and platform operations.
- Experience working with technologies such as Jenkins, GitHub Actions, or GitLab CI; artifact repositories; containerization including Docker and Kubernetes; and cloud platforms such as AWS.
- Familiarity with InterSystems IRIS or similar enterprise platforms, or experience coordinating closely with platform and infrastructure engineering teams.
- Proven ability to facilitate large-scale planning events, manage cross-team dependencies, and resolve impediments in complex environments.
- Excellent communication, facilitation, and stakeholder management skills, including experience briefing senior leadership.
- Experience supporting large-scale software development, data integration, or healthcare IT projects.
- Expertise with the SAFe framework; active RTE or SAFe Agilist certification is highly desirable.
- Ability to obtain a Government clearance.
- Experience with the Department of Veterans Affairs, VA tools, processes, and frameworks is a strong plus.
- Experience in federal contracting environments and familiarity with federal security and compliance requirements.
- Hands-on background in DevOps, systems engineering, or software development earlier in career.
- Experience with Jira, Confluence, and Git-based repositories for program and team-level tracking and collaboration.
- Relevant certifications such as SAFe RTE, SAFe SPC, AWS, or DevOps-related certifications are a plus.
Benefits
- Medical/Dental/Vision
- 401k with Matching
- Corporate Laptop
- PTO + Federal Holidays
- Training opportunities
More DevOps Engineer Jobs
DevOps Engineer II
360 Social Agency
We provide 360 services for Digital Marketing, Event Management & Web Development
- Assist in managing multiregion and multicloud infrastructure, ensuring resiliency, scalability, and performance.
- Support infrastructure provisioning and deployments primarily on GCP, while gaining exposure to other cloud providers.
- Design, deploy, and maintain agentic AI workflows and automation systems, integrating LLMs, orchestration frameworks, APIs, and observability tooling to improve operational efficiency, incident response, and developer productivity.
- Collaborate with development teams to design and maintain CI/CD pipelines in GitLab CI.
- Work on Kubernetes cluster management (GKE and potentially other managed K8s offerings).
- Contribute to GitOps-based deployments using ArgoCD.
- Help automate infrastructure with Terraform and other infrastructure-as-code tools.
- Monitor system health and participate in the on-call rotation, contributing to incident response, troubleshooting, and root cause analysis to improve reliability.
- Document processes, create runbooks, and help improve operational practices.
Staff Engineer, Site Reliability
Babylist
Babylist eases the path to parenthood, offering helpful content, a curated store, and a universal online baby registry through which new parents can discover, r
Who We Are
Babylist is the leading platform for expecting and new families. More than 10 million people shop with Babylist every year, making it the go-to destination for seamless purchasing, guidance, and expert recommendations. As a modern, AI-forward tech company, Babylist has expanded from a universal registry into a full ecosystem (the Babylist Shop, Babylist Health, Babylist Money, NYC and LA showrooms, branded content, and more), generating $750M in revenue in 2025. Building the generational brand in baby, Babylist is reshaping the $235B kids and baby market and helping parents feel confident, connected, and cared for at every step.
Our Ways of Working
Babylist is remote-first with team members across the U.S. and Canada who move fast, think smart, and use AI as part of how they work every day: not as an experiment, but as an expectation. We come together twice a year to build the relationships behind the work, and we hire people who are genuinely excited about what's possible and prove it through how they show up.
How We Build
Babylist is in the middle of a fundamental shift in how software gets made, and we are not tiptoeing into it. We are rebuilding our engineering culture around a simple belief: AI changes everything. How teams are structured, how decisions get made, how fast ideas become working software. Our engineers own problems end to end, working directly with product, design, and business partners with short feedback loops and real stakeholder access. We ship, learn, and iterate fast. When something is not working, we throw it out and start over; project failure and personal failure are not the same thing here. AI tools are as natural to our workflow as an IDE or version control. We are not exploring this, we are living it. Our engineers use AI to explore tradeoffs, pressure-test designs, and move from problem to solution in hours instead of days. They generate code with AI so they can stay focused on the decisions that actually require human judgment, not the routine ones. More velocity means more time for craft: better test coverage, stronger architecture, and deeper customer understanding. We hold ourselves to a higher quality bar because of AI, not in spite of it. We are building this playbook in real time, and we are looking for people who want to build it with us. If you have already changed how you work because of AI, or you are ready to, and you care more about shipping something great than following a prescribed process, we should talk.
Our Tech Stack
- Ruby on Rails
- AWS
- Sidekiq
- MySQL
- Redis
What the Role Is
Babylist's Platform team is the foundation every engineering team builds on, and this role is at the center of keeping it reliable, fast, and scalable. As a Staff SRE, you'll own the infrastructure and reliability practices that support 9 million+ users and the engineers who build for them. Babylist started as an e-commerce and registry platform, and we're actively growing beyond that into health, media, mobile, and new product surfaces that don't exist yet. The Platform team is the foundation that makes all of it possible. This isn't a maintenance role: you'll be actively evolving how we build and operate AWS infrastructure, CI systems, and developer tooling. You'll work cross-functionally across all of Babylist Engineering, which means your decisions have wide leverage.
Who You Are
- Deep hands-on Terraform expertise: you own IaC, not just contribute to it
- Proven AWS experience at scale (EKS, RDS, cloud networking, DNS, CDNs, load balancers): you know the gotchas
- Experienced operating Kubernetes in production: you've debugged the hard stuff, not just deployed the easy stuff
- Comfortable designing and improving CI/CD systems (CircleCI, GitHub Actions, or similar): you care about developer velocity, not just pipeline uptime
- Strong observability instincts (Datadog, Sentry, PagerDuty, Cronitor): you build alerting that's actionable, not noisy
- Experienced with on-call and incident management: you've run the post-mortems and actually changed things afterward
- Comfortable supporting developers across local, staging, and production: you're a resource, not a gatekeeper
- You naturally reach for AI in your work: at Babylist, every team uses AI daily. You're already using it to move faster and improve your output, and you stay curious about what's coming next.
How You Will Make An Impact
- Infrastructure ownership: manage and evolve our AWS environment using Terraform, keeping EKS clusters, databases, and core services current and performant
- CI/CD reliability: own the speed and reliability of our CI systems for the full Engineering org; every deploy starts here
- Developer support: be the person engineers turn to when environments break; unblock them fast across local, staging, and production
- Monitoring & alerting standards: establish and socialize best practices so the right people get paged for the right reasons
- Incident response: lead or support incident response, drive post-incident reviews, and close the loop so the same thing doesn't happen twice
- Platform strategy: contribute to architectural decisions that shape how Babylist's infrastructure evolves over the next several years
Why This Role
- Platform is the team every engineering team depends on: your work has outsized leverage across the entire product org, not just one area
- The infrastructure is solid but actively evolving: you're not inheriting chaos, you're shaping what comes next
- This is a staff-level role with real cross-team visibility: you'll influence how Babylist engineers build and ship, not just keep the lights on
- You'll work on systems that support millions of families at a high-stakes life moment: the scale is real and the product context makes the reliability work matter
About Compensation
We use a market-based approach to compensation. The starting salary range for this role is $226,673 to $271,991. Your starting salary will be based on your location, experience, and qualifications, with increases over time tied to performance, role growth, and internal pay equity.
Why You Will Love Working At Babylist
Our Culture
- We work with focus and intention, then step away to recharge
- We believe in exceptional management and invest in tools and opportunities to connect with colleagues
- We build products that positively impact millions of people's lives
- AI is intentionally embedded in how we work, create, and scale, supporting innovation and impact
Growth & Development
- Competitive pay and meaningful opportunities for career advancement
- We believe technology and data can solve hard problems
- We're committed to career progression and performance-based advancement
Compensation & Benefits
- Competitive salary with equity and bonus opportunities
- Company-paid medical, dental, and vision insurance
- Retirement savings plan with company matching and flexible spending accounts
- Generous paid parental leave and PTO
- Remote work stipend to set up your office
- Perks for physical, mental, and emotional health, parenting, childcare, and financial planning
Important Notices
Recorded Interviews. Babylist uses an interview recording tool to record and transcribe interviews for evaluation purposes in accordance with applicable privacy laws. By participating in an interview, you consent to this recording and transcription.
Interview Integrity. AI is part of how we work at Babylist, and we expect you to use it too. Your application and interviews should still reflect you and your own thinking. We'll tell you when AI is encouraged. Misrepresentation at any stage may result in removal from consideration for this and future roles.
Connections at Babylist. If you have a family member or close personal relationship with a current Babylist employee, please let your recruiter know. This helps us keep our process fair and transparent for everyone.
Protect Yourself from Scams. All official outreach comes from the Babylist Talent Team via @babylist.com email addresses only. We will never ask for payment or personal financial information. If you receive outreach via WhatsApp, Telegram, or a non-Babylist email, it's not us. Verify open roles at babylist.com/careers.
DevOps Engineer II
Learneo
Pioneering a platform of builder-driven productivity and learning businesses.
- Assist in managing multiregion and multicloud infrastructure, ensuring resiliency, scalability, and performance.
- Support infrastructure provisioning and deployments primarily on GCP, while gaining exposure to other cloud providers.
- Design, deploy, and maintain agentic AI workflows and automation systems, integrating LLMs, orchestration frameworks, APIs, and observability tooling to improve operational efficiency, incident response, and developer productivity.
- Collaborate with development teams to design and maintain CI/CD pipelines in GitLab CI.
- Work on Kubernetes cluster management (GKE and potentially other managed K8s offerings).
- Contribute to GitOps-based deployments using ArgoCD.
- Help automate infrastructure with Terraform and other infrastructure-as-code tools.
- Monitor system health and participate in the on-call rotation, contributing to incident response, troubleshooting, and root cause analysis to improve reliability.
- Document processes, create runbooks, and help improve operational practices.
- Own and evolve observability strategy, including monitoring, alerting, dashboards, logging, and distributed tracing.
- Define and manage SLIs, SLOs, and reliability metrics.
- Lead incident response, postmortems, and continuous improvement initiatives.
- Improve MTTD and MTTR through automation and operational excellence.
- Integrate observability into CI/CD pipelines and software delivery workflows.
- Build and maintain reliable cloud infrastructure on AWS and Kubernetes.
- Mentor engineers and promote SRE best practices across the organization.




