Our mission is to enable effortless credit based on true risk.
Principal Software Engineer – Site Reliability
Location
United States
Posted
108 days ago
Salary
$195.3K - $270.4K / year
Seniority
Lead
Job Description
Principal Software Engineer – Site Reliability
Upstart
- Lead the definition, advocacy, and adoption of SRE principles across engineering teams
- Partner with leadership to shape long-term reliability, resiliency, and observability strategies
- Champion distributed tracing, real user monitoring (RUM), and key performance metrics such as Largest Contentful Paint (LCP) to improve system visibility and user experience
- Build and scale self-healing systems to minimize manual intervention and reduce downtime
- Drive enterprise-wide improvements to incident response processes, including those related to Machine Learning systems
- Collaborate closely with Development Productivity and Quality teams to improve engineering velocity without sacrificing reliability
- Influence technical and operational roadmaps through data-driven insights and hands-on technical contributions
- Own and deliver cross-functional initiatives from concept through execution, applying program management skills to align stakeholders and achieve results
Job Requirements
- 10+ years combined experience across Software Engineering and Site Reliability Engineering, with a balanced background in both disciplines
- Proven track record as an SRE thought leader and evangelist, driving adoption of reliability best practices across organizations
- Strong communication and mentoring skills to influence engineers across disciplines
- Proficiency in Python, Go, and JavaScript/TypeScript
- Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
- Experience building internal tooling from scratch in agile development environments
- Expertise with observability, distributed tracing, RUM, LCP, and performance monitoring tools (e.g., Datadog, Prometheus)
- Experience with on-call and incident management, including large-scale or ML-related incidents
- Strong background in automation and building self-healing systems
- Hands-on experience with LLM/GenAI to improve SRE efficiency and processes
- Program management skills, including the ability to propose innovative solutions, influence leadership, improve processes, and drive cross-functional projects to completion
Benefits
- Competitive compensation, including base pay, bonus opportunities, and annual equity grants that vest quarterly
- Generous 401(k) plan with Upstart matching $2 for every $1 contributed, up to $15,000 per year
- Employee Stock Purchase Plan (ESPP) with discounted stock purchase options for eligible employees
- Affordable medical, dental, and vision coverage, with multiple plan options; Upstart covers 90% to 100% of the cost, depending on the plans you choose
- Health Savings Account contributions from Upstart for eligible plans
- Income protection benefits, including company-paid Basic Life, AD&D, and Short- and Long-Term Disability coverage, with options to purchase supplemental coverage
- Paid time off, sick and safe time, and company holidays
- Paid family and parental leave to support caregiving and major life moments
- Family-centered benefits through Carrot and Cleo, supporting fertility, parenthood, and caregiving
- Employee Assistance Program (EAP) offering mental health support and life-centered resources
- Financial wellness resources, including access to financial planning tools and a financial concierge service
- Annual wellness allowance to support your physical and emotional well-being and personal development, based on what matters most to you
- Annual productivity allowance to invest in relevant tools and resources you need to do your best work, no matter where you work from
- Connection and community through team events and onsites, all-company updates, and employee resource groups (ERGs)
- Onsite perks, including catered lunches and fully stocked micro-kitchens when working from one of our four offices in the Bay Area, Austin, Columbus, and New York City (opening Summer 2026!)




