Job Closed
This listing is no longer active.
Senior DevOps Engineer
Location
California
Posted
132 days ago
Salary
$150K - $200K / year
Seniority
Senior
Job Description
Scribe
- Architect, maintain, and scale critical infrastructure.
- Ensure system reliability and optimize performance.
- Implement modern deployment strategies that enable engineering teams to ship with confidence.
- Tackle problems around scale, reliability, and performance optimization.
- Empower engineers to be more productive and ship at scale by building infrastructure that removes friction.
Job Requirements
- 7+ years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering roles
- Deep expertise with AWS services, particularly EKS, VPC, S3, and RDS/Aurora, and with the AWS Well-Architected Framework
- Extensive experience with Kubernetes internals, cluster administration, and multi-environment management
- Strong proficiency writing and maintaining infrastructure as code using Terraform, including module development and state management
- Production experience with CNCF projects such as Helm for package management and Karpenter for node provisioning
- Proven experience developing and managing CI/CD pipelines in GitHub Actions
- Hands-on experience with observability tooling including Honeycomb, Sentry, and AWS CloudWatch
- Deep expertise implementing zero-downtime deployment strategies with automated rollbacks
- Strong networking fundamentals including TCP/IP, DNS, load balancing, and traffic routing
- Experience managing async job processing systems and message queues at scale
- Strong scripting and automation skills (Python, Go, Bash, or similar)
- Proven track record of capacity planning, performance optimization, and cost management initiatives
Benefits
- Health, dental, and vision insurance for you and your dependents.
- Flexible paid time off, plus company holidays to rest and reset.
- Employees can contribute to a 401(k) plan to help plan for their future.
- Paid parental leave to help you care for and bond with your growing family.
- SF-based employees receive daily catered lunches at our office.
- Commuter benefits for our office-based team make getting to and from HQ simpler.
- Remote? Hybrid? Wherever you work, we’ll support your setup with a home office stipend.
More DevOps Engineer Jobs
Senior/Staff Site Reliability Engineer
Circle
Circle helps businesses and developers harness the power of stablecoins for payments and internet commerce worldwide.
- Build and maintain production infrastructure estate
- Empower agile development teams with a high-performance CI/CD pipeline
- Design, maintain, and secure cloud infrastructure using Infrastructure-as-Code tools
- Automate operational tasks using Go, Python, and serverless solutions
- Manage and monitor Kubernetes clusters for multiple production workloads
- Ensure system reliability and security by participating in on-call rotations
- Plan, test, and implement disaster recovery strategies
- Leverage AI-powered solutions for managing infrastructure and optimizing performance
- Mentor and support team growth
Integration Platform Reliability Engineer
Dealer Tire
We’re more than tires and parts. We’re a team on a mission to revolutionize the automobile dealer channel.
- Develop and maintain platform documentation, training materials and systems documentation.
- Work with infrastructure engineers to maintain the stability and reliability of the integration platforms.
- Monitor the performance and availability of the integration platforms, recommend, and implement remediations to issues.
- Follow Dealer Tire and affiliate policies regarding change control and incident management.
- Keep integration platform toolsets patched and up to date according to the appropriate policies.
- Assist Integration Specialists with platform inter-operability issues.
- Work with infrastructure teams to support the infrastructure roadmap.
- Work with Enterprise Information Security to maintain the appropriate security posture based upon policy and contract requirements.
- Assist in designing and implementing the Integration roadmap.
- Recommend, design, develop and implement small and medium scale automations.
- Help in the development of junior team members.
Project Management
- Work closely with functional groups to assure project tasks are completed accurately, timely and with quality.
- Document project tasks for scope boundaries, high-level requirements, and other aspects as needed.
- Work with functional staff to develop training materials for Information Technology systems.
Business Requirements
- Document high-level and detailed business requirements related to required platform capabilities.
- Communicate business requirements to the Information Technology staff.
- Collaborate on Information Technology projects with various functional areas and staff to clarify business requirements, answer questions and resolve problems throughout projects and on-going support.
- Collaborate with team members on the development and on-going use of audits and controls to monitor the daily transmission and collection of EDI/Integration files.
Implementation Management/Change Management
- Assist users and Integration Specialists in the review of business requirements.
- Coordinate work with various functional staff to develop user acceptance test objectives, test plans and test scripts to verify and ensure that business requirements are achieved.
- Perform system tests and user acceptance tests of Information Technology system changes as agreed with various functional staff in test plans.
- Maintain technical analysis documents, technical designs, test scripts and test results to verify Integration Platform requirements are understood and addressed completely, correctly, and consistently.
- Execute unit and integrated system test reviews as appropriate with Information Technology staff.
- Apply established mechanisms for tracking business requirements throughout the life of Information Technology projects.
- Use mechanisms to verify all requirements have been addressed completely, correctly, and consistently.
- Work with various functional and IT staff to apply established change management procedures for Information Technology projects to control scope.
- Facilitate documentation and approval of requirements changes of either a business or technical nature, during Information Technology projects, using change management procedures.
- Perform production implementations, some of which may require after hour activities to complete.
Other Responsibilities
- Timely and accurate time and status reporting.
- Adherence to security policies.
- Learn and leverage knowledge to develop mappings on tools utilized by the team.
Senior Site Reliability Engineer
CoderPad
CoderPad is the leading technical interview platform for all engineering and software development teams.
- Design, operate, and evolve production infrastructure across AWS, GCP, Heroku, and Kubernetes.
- Own and improve monitoring, alerting, and SLOs for customer-facing services.
- Lead and participate in incident response, postmortems, and long-term remediation.
- Build and maintain infrastructure-as-code, CI/CD pipelines, and automation (Terraform, GitLab CI, Kubernetes tooling).
- Drive scalability, performance, and resilience across a real-time SaaS platform.
- Ensure security, patching, and operational hygiene across all environments.
- Partner with product and engineering teams to enable safe, fast, and reliable releases.
- Actively contribute to cost visibility and cloud optimization.
- Architect & Scale Infrastructure: Design and implement multi-cluster, multi-region Kubernetes deployments using EKS, GKE, and AKS. Build infrastructure that scales across regions and cloud providers.
- Own Production Systems: Take end-to-end ownership of production infrastructure. Drive incident response, postmortems, and improvements to prevent recurrence.
- Infrastructure as Code at Scale: Build and maintain Terraform modules for complex infrastructure patterns. Manage thousands of configuration files across clusters, regions, and environments using GitOps principles.
- GitOps & Deployment Excellence: Design and optimize ArgoCD ApplicationSets and Helm chart architectures. Build deployment pipelines that enable safe, automated releases across hundreds of microservices.
- Performance & Reliability Engineering: Analyze system performance, identify bottlenecks, and implement optimizations. Improve SLOs through capacity planning, autoscaling, and architectural improvements.
- Observability & Monitoring: Build and enhance monitoring, alerting, and observability using Prometheus, Grafana, Loki, and custom tooling. Drive visibility into complex distributed systems.
- Security & Compliance: Implement security controls, compliance frameworks, and best practices across cloud infrastructure. Design secure multi-tenant architectures.
- Technical Leadership: Mentor engineers, establish best practices, and drive technical decisions. Collaborate with platform, SRE, and product teams to deliver reliable infrastructure.



