Twilio is a Platform-as-a-Service (PaaS) company established in 2007. In support of a flexible workplace, Twilio has previously posted freelance, flexible schedule, part-time, hybr
Software Architect, Reliability Engineering
Location
California | Colorado | Illinois | New Jersey | New York | Maryland | Massachusetts | Minnesota | Vermont | Washington
Posted
100 days ago
Salary
$227.8K - $335K / year
Seniority
Lead
Job Description
• Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
• Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
• Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
• Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
• Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
• Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
• Establish and champion reliability practices and drive systemic improvements.
• Mentor and grow engineers and technical leaders.
• Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.
Job Requirements
- 15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including experience at the principal or architect level.
- Strong experience in driving strategic technical decisions and defining long-term technical vision.
- In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organization.
- Experience driving cross-org technical architecture outcomes.
- Knowledge of cloud architecture, DevOps practices, and large-scale systems design with microservices.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.
- Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS.
- Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.
- Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.
- Proficient in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.
- Experience designing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations.
- Experience running cross-functional post-incident reviews and driving improvements.
- Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
- Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.
- Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.
- Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.
- Ability to influence and build effective working relationships with all levels of the organization.
Benefits
- health care insurance
- 401(k) retirement account
- paid sick time
- paid personal time off
- paid parental leave
More DevOps Engineer Jobs
• Standardise and automate artefact generation across multiple platforms
• Develop, manage, and continuously improve end-to-end release processes
• Optimise source control workflows and CI/CD pipelines
• Manage and assist in microservice development and deployment life cycles
• Maintain and improve build systems and infrastructure reliability
• Implement and manage configuration management solutions
• Apply and enforce basic security best practices across pipelines and infrastructure
• Debug, troubleshoot, and resolve pipeline and infrastructure issues efficiently
• Collaborate cross-functionally with engineering, QA, and production teams
• Document processes and contribute to operational best practices

Program Description: NOTE: This is a short-term position with an expected duration of April 1, 2026 to September 15, 2026. This program provides IT services focused on building, securing, and operating the Department of Veterans Affairs LGY’s home loan product-line technology. The contract’s purpose is to modernize and sustain critical home loan technology systems that support LGY’s delivery of mortgage-related services to program stakeholders, and to provide continuous delivery and security integration.
Position Description: The DevSecOps Engineer/Solution Architect is responsible for defining architectural direction, making crucial architectural decisions, and ensuring implementations meet the specified criteria. Ideal candidates should have substantial experience in an AWS environment, managing Jenkins servers with multiple CI/CD pipelines across various environments, and working with GitHub and GitOps. Proficiency with Terraform/Ansible or similar tools is preferred. Experience with AWS CodeBuild and CodeDeploy is a plus.
Responsibilities:
Architecture & Technical Leadership
· Define the end-to-end DevSecOps and cloud architecture approach for CI/CD, infrastructure automation, and deployment patterns.
· Make and document architectural decisions (standards, patterns, tool selection, tradeoffs) aligned to program requirements.
· Establish architectural guardrails and acceptance criteria to ensure implementations meet security and operational expectations.
· Produce and maintain architecture artifacts (logical/physical diagrams, reference architectures, standards, decision records).
CI/CD Platform Engineering (Jenkins + Multi-Environment Delivery)
· Architect, configure, and maintain Jenkins servers supporting multiple CI/CD pipelines across Dev/Test/Stage/Prod environments.
· Design scalable pipeline patterns (shared libraries, templates, standard stages, approvals, and promotion strategies).
· Implement strategies for high availability, performance, access control, and plugin governance for Jenkins.
· Drive pipeline reliability through standardized build/deploy processes, error handling, and repeatable automation.
GitHub & GitOps Enablement
· Integrate pipelines with GitHub (branching strategies, PR workflows, hooks/webhooks, checks, release tagging).
· Establish and support GitOps workflows (declarative configuration, environment promotion, and drift management).
· Promote consistent repository and workflow standards across engineering teams.
Infrastructure as Code (IaC) & Automation
· Design and implement Infrastructure as Code using Terraform, Ansible, or comparable tools to enable repeatable and secure provisioning.
· Build automation for environment creation, configuration management, and compliance alignment.
· Ensure IaC follows best practices: modularity, versioning, secure secrets handling, and policy enforcement.
Cloud Engineering & Deployment Strategy (AWS)
· Architect and oversee AWS environment usage, ensuring alignment to cloud best practices and program constraints.
· Define secure deployment patterns and connectivity requirements across AWS accounts/environments.
· Collaborate with security and operations to ensure logging/monitoring, identity/access, and encryption requirements are satisfied.
Integration with AWS Native CI/CD (Plus)
· Where applicable, incorporate AWS CodeBuild and AWS CodeDeploy into delivery workflows or migration plans.
· Evaluate and recommend when Jenkins vs. AWS native CI/CD services is the best fit, and define integration approaches.
Stakeholder Collaboration & Delivery Assurance
· Partner with application teams, cloud engineers, security, and SRE/operations to ensure delivery solutions meet functional and non-functional requirements.
· Provide technical oversight and mentorship to DevSecOps engineers and platform teams.
· Participate in planning, backlog refinement, and technical reviews to ensure architectural alignment.

Program Description: NOTE: This is a short-term position with an expected duration of April 1, 2026 to September 15, 2026. This program provides IT services focused on building, securing, and operating the Department of Veterans Affairs LGY’s home loan product-line technology. The contract’s purpose is to modernize and sustain critical home loan technology systems that support LGY’s delivery of mortgage-related services to program stakeholders, and to provide continuous delivery and security integration.
Position Description: This position focuses on creating and modifying pipelines using GitHub Enterprise Cloud repositories. The role requires expertise in developing and maintaining pipelines using Jenkins servers and troubleshooting deployment issues. Candidates should incorporate metrics such as Mean Time To Build (MTTB) and Mean Time To Deploy (MTTD). Experience with multiple CI/CD tools, GitHub Actions, and code scanning tools like CodeQL, Fortify, SonarQube, and Nexus is desired. Familiarity with automation tools such as Selenium, Cucumber, Maven, and AWS CodeBuild/CodeDeploy is advantageous.
Responsibilities:
CI/CD Pipeline Engineering
· Design, implement, and maintain CI/CD pipelines aligned to team and program delivery practices.
· Create and modify pipeline definitions and workflows tied to GitHub Enterprise Cloud repositories.
· Develop and maintain pipeline jobs and shared libraries on Jenkins (pipelines-as-code, scripted/declarative approaches as applicable).
· Standardize pipeline patterns and reusable templates to reduce duplication and improve maintainability.
Deployment Troubleshooting & Operational Support
· Diagnose and resolve build failures, deployment issues, and environmental inconsistencies across lower and higher environments.
· Perform root cause analysis (RCA) and implement corrective actions to prevent recurring failures.
· Partner with engineering, QA, security, and platform teams to remediate pipeline blockers and streamline deployments.
DevSecOps Metrics & Continuous Improvement
· Instrument and report delivery metrics including MTTB and MTTD; identify bottlenecks and implement improvements.
· Monitor pipeline performance (queue time, build duration, failure rates, flaky tests) and drive optimization.
· Improve automation coverage and reduce manual steps through pipeline enhancements.
Security & Code Quality Integration (“Shift Left”)
· Integrate code scanning and quality gates into pipelines using tools such as CodeQL, Fortify, and SonarQube, and artifact/repository controls like Nexus.
· Ensure pipelines enforce consistent security and quality checks prior to merge/release.
· Collaborate with security stakeholders to tune scanning thresholds, manage findings, and support remediation workflows.
Automation Enablement
· Implement or enhance automation steps using tools such as Selenium, Cucumber, and Maven.
· Support automated build/test/deploy stages and improve feedback loops to developers.
Documentation & Enablement
· Document pipeline standards, usage guides, and operational runbooks.
· Provide guidance and mentoring to teams on CI/CD best practices, branching strategies, and pipeline troubleshooting.

• You will understand how our services are performing and the capabilities around their capacity and security, and may need to assist in responding to incidents or outages.
• Participate in design, creation, implementation, and maintenance tasks to ensure our hybrid (multi-instance) infrastructure and architecture surpass our customers' needs.
• Ensure Backup & Restore capabilities are designed and implemented to industry best practices. Implement autonomous systems and reporting for repeatable testing of recovery point and time objectives.
• Ensure infrastructure and services are monitored to meet our service level objectives.
• Ensure automated alerting and automated escalation of critical issues.
• Strive to achieve complete automation of repetitive & arduous tasks.
• Continuously improve operational cost-modeling and cost-tracking.
• Encourage GitOps/Infrastructure-as-code best practices.
• Provide general support for operational tools & services (outages, upgrades, triage, developer assistance, etc.).



