Giving lenders the tools to scale and modernize through integration to our API-first, cloud-native platform.
Site Reliability Engineer
Location
United States
Posted
105 days ago
Salary
$128K - $160K / year
Seniority
Senior
Job Description
Site Reliability Engineer
Peach
- Help build an effective, inclusive SRE team.
- Keep reliability over 99.99%.
- Design, develop, and maintain new data products for our customers.
- Automate reporting and financial processes for the company.
- Provide architectural expertise to product teams optimizing for availability and performance.
- Participate in infrastructure on-call and the incident response process.
- Create infrastructure that is compliant with fintech regulatory frameworks.
Job Requirements
- 5+ years of experience working in SaaS, high-availability environments (over 99.99% uptime)
- Helm (5+ years)
- Google Cloud (5+ years)
- Python (5+ years)
- Terraform (5+ years)
- CI (5+ years)
- U.S. Work Authorization
Benefits
- Health insurance
- 401(k) matching
- Paid time off
- Flexible work arrangements
- Professional development opportunities