Docusign

Bringing Agreements to Life

Senior Site Reliability Engineer

DevOps Engineer · Other · Remote · Senior · Team 5,001-10,000 · Since 2003 · H1B Sponsor

Location

United States

Posted

78 days ago

Salary

$157.5K - $254.4K / year

Seniority

Senior

Job Description

Company Overview

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now, these were disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

What you'll do

We are looking for a self-motivated, driven, and creative Senior Site Reliability Engineer (Senior SRE) to join the Site Reliability team and lead reliability initiatives for high-impact services. Metrics and analytics drive engineering at Docusign and ensure that we dedicate valuable engineering cycles to the right places. This role is a unique opportunity to impact the entire Docusign team and drive adoption.

In this role, you will:
- Own the reliability, scalability, and performance of one or more critical systems
- Lead the design and implementation of automation to eliminate toil and reduce operational risk
- Drive improvements in observability, incident response, and production readiness across teams
- Partner closely with product engineering, platform, security, and release management to ship changes safely and quickly

Senior SREs at Docusign operate as hands-on technical leaders: they set the reliability bar for their domain, mentor other engineers, and lead cross-functional projects that materially improve availability and customer experience.
Ideally, you have a background in software development, incident management, service catalogs, request tracing systems, time-series telemetry platforms, application performance management tools, or log management tools. The role requires an on-call rotation every 4 weeks. This position is an individual contributor role reporting to the Senior Manager, SRE.

Responsibilities
- Design, implement, and operate highly available, scalable services in cloud environments (primarily Azure, with some multi-cloud scenarios)
- Define and evolve SLOs/SLIs, error budgets, and capacity strategies for owned services; use them to guide engineering trade-offs and release decisions
- Analyze patterns in incidents and outages; own long-term reliability improvements for your domain and contribute to reliability strategy across services
- Write high-quality code that is easy to maintain and test
- Ensure design and architecture are extensible across projects, and participate in technical design and code reviews
- Identify operational toil and lead automation efforts to eliminate it, including deployment, runbook, and remediation workflows that make incidents rarer and faster to resolve
- Develop robust, well-tested tooling and shared libraries that are adopted across multiple teams
- Improve CI/CD pipelines and guardrails to reduce change failure rate while increasing deployment velocity
- Design and implement logging, metrics, tracing, and alerting for complex distributed systems; ensure signals are actionable and aligned to business impact
- Build and automate tools and solutions for incident impact analysis and effective mitigation
- Participate in, and often lead, incident response for Sev0–Sev2 events: triage, mitigation, coordination, and clear communication
- Perform and contribute to blameless post-incident reviews, root-cause analysis, and follow-through on corrective actions
- Work with Operations and Incident Command teams during and after incidents to drive excellence in the Incident Management Process
- Compose and analyze dashboards to highlight areas of the business that need attention and help drive organizational KPIs
- Create and respond to system-generated alerts to maintain system health
- Work with Operations and Engineers to fill any gaps in alerting and telemetry
- Act as the primary SRE partner for one or more engineering teams: shaping architecture, reviewing designs, and embedding reliability best practices
- Mentor and coach other SREs and software engineers on topics such as debugging, observability, incident management, and performance optimization
- Contribute to and help standardize SRE practices, runbooks, and production readiness criteria across CPE and product teams
- Work with Product Management, collaborators, and other developers to understand design requirements and provide estimates for development
- Learn and grow in all key technologies at Docusign and be a partner to Engineering and Operations teams

Job Designation
Remote: Employee is not required to be in or near an office frequently and works from a designated remote work location for the majority of the time.

Positions at Docusign are assigned a job designation of either In Office, Hybrid or Remote and are specific to the role/job. Preferred job designations are not guaranteed when changing positions within Docusign. Docusign reserves the right to change a position's job designation depending on business needs and as permitted by local law.
What you bring

Basic
- 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with ownership of production systems at scale (or equivalent experience)
- Experience coding in at least one modern language (e.g., Go, Python, C#, Java), with the ability to design, implement, test, and debug production-grade automation and services
- Practical experience operating large-scale services in public cloud (Azure preferred; AWS/GCP acceptable with willingness to learn Azure)
- Experience with Linux, networking fundamentals, and common infrastructure components (load balancers, DNS, certificates, queues, caches, databases)
- Experience with observability stacks (e.g., Prometheus/Grafana, OpenTelemetry/Chronicle, centralized logging)
- Experience with CI/CD systems and deployment strategies (blue/green, canary, rolling updates)
- Experience with incident management and on-call operations for 24x7 services
- Experience building dashboards and analyzing metrics

Preferred
- Strong analytical and problem-solving skills
- Experience in high-availability, regulated, or customer-facing SaaS environments
- Background in reliability practices such as chaos testing, capacity modeling, and performance tuning
- Exposure to release management/unified release practices and safe rollout strategies (feature flags, staged rollouts, configuration-driven changes)
- Demonstrated leadership driving cross-team initiatives: reliability programs, migrations, or major refactors
- Strong written and verbal communication skills; ability to explain complex technical topics to both engineers and non-technical stakeholders

Wage Transparency
Pay for this position is based on a number of factors including geographic location and may vary depending on job-related knowledge, skills, and experience.
Based on applicable legislation, the pay ranges for the following locations are:
- California: $157,500.00 - $254,350.00 base salary
- Illinois, Colorado, Massachusetts and Minnesota: $151,200.00 - $213,600.00 base salary
- Washington, Maryland, New Jersey and New York (including NYC metro area): $151,200.00 - $222,450.00 base salary
- Washington DC: $157,500.00 - $222,450.00 base salary
- Ohio: $131,900.00 - $186,275.00 base salary

This role is also eligible for the following:
- Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
- Stock: This role is eligible to receive Restricted Stock Units (RSUs).

Global benefits provide options for the following:
- Paid Time Off: earned time off, as well as paid company holidays based on region
- Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
- Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
- Retirement Plans: select retirement and pension programs with potential for employer contributions
- Learning and Development: options for coaching, online courses and education reimbursements
- Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events

Life at Docusign

Working here
Docusign is committed to building trust and making the world more agreeable for our employees, customers and the communities in which we live and work. You can count on us to listen, be honest, and try our best to do what’s right, every day. At Docusign, everything is equal.
We each have a responsibility to ensure every team member has an equal opportunity to succeed, to be heard, to exchange ideas openly, to build lasting relationships, and to do the work of their life. Best of all, you will be able to feel deep pride in the work you do, because your contribution helps us make the world better than we found it. And for that, you’ll be loved by us, our customers, and the world in which we live.

Accommodation
Docusign is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need such an accommodation, or a religious accommodation, during the application process, please contact us at accommodations@docusign.com. If you experience any issues, concerns, or technical difficulties during the application process, please get in touch with our Talent organization at taops@docusign.com for assistance.

States Not Eligible for Employment
This position is not eligible for employment in the following states: Alaska, Hawaii, Maine, Mississippi, North Dakota, South Dakota, Vermont, West Virginia and Wyoming.

Equal Opportunity Employer
It's important to us that we build a talented team that is as diverse as our customers and where all employees feel a deep sense of belonging and thrive. We encourage great talent who bring a range of perspectives to apply for our open positions. Docusign is an Equal Opportunity Employer and makes hiring decisions based on experience, skill, aptitude and a can-do approach. We will not discriminate based on race, ethnicity, color, age, sex, religion, national origin, ancestry, pregnancy, sexual orientation, gender identity, gender expression, genetic information, physical or mental disability, registered domestic partner status, caregiver status, marital status, veteran or military status, or any other legally protected category.

More DevOps Engineer Jobs

Senior DevSecOps Engineer

Clear-Com

Trusted global provider of professional real-time communication solutions.

DevOps Engineer · 79 days ago
Full Time · Remote · Team 51-200 · H1B No Sponsor

  • Own platform security and reliability improvements across our GCP environment.
  • Harden identity and network controls in GCP (IAM patterns, service accounts and workload identity, organization policies, and network segmentation controls).
  • Build security into CI/CD by implementing and enforcing scanning and policy controls (SAST, SCA, secret detection, and container/image scanning).
  • Drive vulnerability management and supply chain risk reduction across services, dependencies, container images, and build pipelines.
  • Lead threat modeling and security design reviews for new features and material architecture changes.
  • Improve security observability and detection quality by tuning telemetry, reducing noise, and building high-signal detections and dashboards.
  • Lead investigations and coordinate incident response for security alerts and incidents, and drive follow-ups from post-mortems into preventative improvements.
  • Champion secure SDLC practices through standards, documentation, guardrails, and coaching for product engineering teams.
  • Define and maintain end-user device security standards, including requirements for security agents such as EDR and remote access tooling, and partner with stakeholders for operational execution.
  • Support compliance and audit readiness by conducting internal security reviews and helping align practices with frameworks and regulations (SOC 2, GDPR, NIST), including evidence support where needed.

Canada
$160K - $190K / year
Job Closed

Genomics Data System DevOps Engineer – Temp

Natera

Founded in 2004 and led by CEO Steve Chapman, Natera is a company in the biotechnology market that offers genetic testing and diagnostics on a global scale. Ope

DevOps Engineer · 79 days ago

  • Work with the Data Science Production Engineering team to design and develop a Genomics Data System.
  • Develop solutions to retrieve and archive genomics data files from various data sources and platforms, e.g., DNAnexus, AWS HealthOmics.
  • Develop databases that record and manage metadata of the genomics data files.
  • Develop automation solutions for genomics data file retrieval and metadata management.
  • Test, deploy, and maintain these systems.
  • Write technical documentation for these systems.
  • Provide training and technical support for other team members.

United States
Job Closed

Senior Systems Operations Engineer

DistroKid

We're the easiest way for music creators to get music into Spotify, Apple Music, and all major streaming services.

DevOps Engineer · 79 days ago
Full Time · Remote · Team 51-200 · Since 2013 · H1B No Sponsor

  • Design, deploy, and manage scalable and highly available cloud infrastructure on AWS, with deep expertise in core services (EC2, EKS, S3, RDS, IAM, VPC, and beyond).
  • Develop and maintain disaster recovery plans leveraging AWS capabilities for backup and replication to ensure business continuity.
  • Collaborate with engineering and security teams to improve infrastructure health, security, and long-term scalability.
  • Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards; implement module versioning and lifecycle strategies.
  • Direct the migration of manual infrastructure to code; establish patterns and best practices for IaC adoption across the team.
  • Implement IaC testing strategies, including validation, linting, and integration testing, using tools such as Terraform-Compliance or Checkov.
  • Architect and maintain complex Bitbucket pipeline configurations for multi-environment IaC deployments; implement pipeline security best practices.
  • Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.
  • Use AI-assisted development and operations tools (e.g., Cursor, Claude) to accelerate troubleshooting, code review, and documentation generation.
  • Evaluate and implement AI-powered automation to reduce operational toil, improve repeatability, and scale platform capabilities.
  • Define and implement SLOs for services; guide and/or participate in incident response and conduct blameless postmortems.
  • Implement chaos engineering practices to proactively identify system weaknesses before they impact production.
  • Build and maintain comprehensive monitoring solutions using tools such as CloudWatch and Datadog to track performance and drive optimization.
  • Develop automation scripts and tools in Python, Bash, or similar languages to streamline operations and eliminate manual toil.
  • Build self-service capabilities for development teams to reduce cognitive load and enable developer autonomy across the organization.
  • Guide the solution architecture and end-to-end implementation of DistroKid’s first Internal Developer Portal (IDP).
  • Define the IDP roadmap and success criteria in partnership with engineering leadership; establish golden paths, service catalogs, and self-service workflows that reduce deployment friction and accelerate developer productivity.
  • Drive adoption of the IDP across engineering teams; gather feedback, iterate on the platform, and measure impact through developer experience metrics and reduced time-to-deploy.
  • Guide cost optimization initiatives; implement rightsizing recommendations, reserved-capacity strategies, and tagging standards for cost allocation.
  • Monitor and optimize AWS resource usage; select appropriate services and configurations to meet performance requirements cost-effectively.
  • Direct planning, decision-making, and execution for infrastructure projects; own workstreams end-to-end.
  • Partner cross-functionally with engineering, security, and product teams; communicate impact in terms of company strategy and OKRs.
  • Provide technical mentorship to junior and mid-level engineers; invest in team growth and foster a culture of continuous learning.
  • Maintain and contribute to infrastructure documentation, runbooks, and architectural decision records to ensure knowledge sharing and operational consistency.

United States
$155K - $170K / year
Job Closed
Other · Remote · Team 5,001-10,000 · H1B Sponsor

Join the team leading the next evolution of virtual care. At Teladoc Health, you are empowered to bring your true self to work while helping millions of people live their healthiest lives. Here you will be part of a high-performance culture where colleagues embrace challenges, drive transformative solutions, and create opportunities for growth. Together, we’re transforming how better health happens.

Summary of Position
The Principal Platform Engineer (DevOps / Developer Experience) is a senior individual contributor who accelerates platform delivery by pairing strong software engineering with deep platform/operations expertise. This role sets the technical bar, works collaboratively across teams, and delivers reusable patterns that improve delivery speed, reliability, and developer productivity.

Essential Duties and Responsibilities

Accelerate Top Priorities
- Act as a technical “force multiplier” on the highest-priority initiatives; clarify approach, resolve ambiguity, and drive work to completion with high quality and pragmatic trade-offs.
- Reduce cross-team friction by defining clear interfaces, breaking work into deliverable increments, and enabling parallelization through strong architecture boundaries.

Raising Engineering Standards
- Establish and model best practices for engineering excellence: design docs/RFCs, architecture reviews, code review discipline, and effective automated testing strategies.
- Drive API-first and “platform as a product” behaviors: define and promote consistent platform interfaces that reduce bespoke integrations and siloed solutions.

Build Paved Roads
- Create reusable platform capabilities (templates/modules/golden paths) that reduce reinvention and speed up delivery for teams.
- Drive automation opportunities (including agentic/AI-enabled workflows) that improve operational and delivery efficiency.

Improve Operational Excellence
- Lead cross-cutting improvements that enhance stability and reduce toil: observability standards, alert hygiene, incident learning loops, and resilience patterns.
- Partner with operations and platform stakeholders to measurably improve reliability outcomes and reduce operational drag on platform delivery teams.

Partner and Mentor
- Coach senior/staff engineers by pairing on real work, running reviews, and teaching pragmatic system-level thinking.
- Set clear examples of technical leadership, collaboration, and accountability without formal people management responsibility.

On-call Participation
- Participate in the on-call rotation and contribute to restoration, root cause learning, and prevention.

Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- 15+ years of hands-on software engineering designing, building, testing, deploying and operating large-scale distributed systems in cloud-native environments.
- 5+ years operating at Staff or Principal scope, leading multi-quarter, cross-team technical initiatives that span 3+ teams and deliver organization-level outcomes.
- 8+ years of experience designing and operating microservices-based systems, including API design and versioning, authentication and authorization frameworks (e.g., OAuth, OIDC, IAM), and Infrastructure-as-Code (e.g., Terraform, CloudFormation, ARM).
- Deep hands-on experience (5+ years) in at least three of the following: Kubernetes and container orchestration platforms; public cloud infrastructure (AWS/Azure/GCP); CI/CD systems and deployment automation; Infrastructure-as-Code and configuration management; and production operations, reliability tooling, and on-call systems.
- Demonstrated ownership of production systems supporting business-critical workloads, including participation in incident response, post-incident reviews, and reliability improvements at scale.

Preferred Qualifications
- Proven ability to operate as a self-directed technical leader, navigating ambiguity, defining problem spaces, and driving clarity and alignment across multiple teams.
- Demonstrated success influencing technical direction across globally distributed teams and multiple levels of the organization without formal authority.
- Strong written and verbal communication skills, with the ability to translate complex technical concepts for engineering, product and executive audiences.
- Experience designing or evolving internal platforms or self-service capabilities that materially improve developer experience, delivery throughput, or operational efficiency.
- Strong background in observability (metrics, logs, traces), incident management, and reliability practices, with a track record of improving system health and reducing operational toil.
- Deep understanding of performance optimization, system resilience, and observability in high-scale production environments.
- Experience working in regulated industries such as healthcare or fintech, including familiarity with compliance-driven architectural and security considerations.
- Familiarity with healthcare data standards (e.g., FHIR, HL7) and platform security best practices.

The base salary range for this position is $180,000 - $210,000. In addition to a base salary, this position is eligible for a performance bonus and benefits (subject to eligibility requirements) listed here: Teladoc Health Benefits 2026. Total compensation is based on several factors including, but not limited to, type of position, location, education level, work experience, and certifications. This information is applicable for all full-time positions.

We follow a Flexible Vacation Policy, intended for rest, relaxation, and personal time. All time off must be approved by your manager prior to use. You will also receive 80 hours of Paid Sick, Safe, and Caregiver Leave annually. This applies to full-time positions only.
If you are applying for a part-time role, your recruiter can provide additional details.

As part of our hiring process, we verify identity and credentials, conduct interviews (live or video), and screen for fraud or misrepresentation. Applicants who falsify information will be disqualified.

Teladoc Health will not sponsor or transfer employment work visas for this position. Applicants must be currently authorized to work in the United States without the need for visa sponsorship now or in the future.

Why join Teladoc Health?
- Teladoc Health is transforming how better health happens. Learn how when you join us in pursuit of our impactful mission.
- Chart your career path with meaningful opportunities that empower you to grow, lead, and make a difference.
- Join a multi-faceted community that celebrates each colleague’s unique perspective and is focused on continually improving, each and every day.
- Contribute to an innovative culture where fresh ideas are valued as we increase access to care in new ways.
- Enjoy an inclusive benefits program centered around you and your family, with tailored programs that address your unique needs.
- Explore candidate resources with tips and tricks from Teladoc Health recruiters and learn more about our company culture by exploring #TeamTeladocHealth on LinkedIn.

As an Equal Opportunity Employer, we never have and never will discriminate against any job candidate or employee due to age, race, religion, color, ethnicity, national origin, gender, gender identity/expression, sexual orientation, membership in an employee organization, medical condition, family history, genetic information, veteran status, marital status, parental status, or pregnancy. In our innovative and inclusive workplace, we prohibit discrimination and harassment of any kind.

Teladoc Health respects your privacy and is committed to maintaining the confidentiality and security of your personal information.
In furtherance of your employment relationship with Teladoc Health, we collect personal information responsibly and in accordance with applicable data privacy laws, including but not limited to, the California Consumer Privacy Act (CCPA). Personal information is defined as: Any information or set of information relating to you, including (a) all information that identifies you or could reasonably be used to identify you, and (b) all information that any applicable law treats as personal information. Teladoc Health’s Notice of Privacy Practices for U.S. Employees’ Personal information is available at this link.

United States
$180K - $210K / year