Job Closed

This listing is no longer active.

Slate is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, marital status, parental status, cultural background, organizational level, work styles, tenure, or life experiences, or for any other reason. Slate is committed to providing reasonable accommodation for qualified individuals with disabilities in our job application procedures. If you need assistance or an accommodation due to a disability, you may contact us at slate-talent_acquisition@slate.auto.

Principal Software Engineer, DevOps

DevOps Engineer · Full Time · Remote · Lead · Team 201-500

Location

United States

Posted

10 days ago

Seniority

Lead

Job Description

Role Description

Slate is looking for a Principal Software Engineer, DevOps to own the infrastructure strategy, developer platform, and operational excellence that power our consumer, internal, and B2B digital products. This is a hands-on technical leadership role, not a manager track. You’ll set direction, make architectural decisions, write and review code, and raise the bar for how we build and ship software across the organization. The ideal candidate thrives at the intersection of cloud infrastructure, developer experience, and security, and is equally comfortable in an architecture review and a terminal window.

What You’ll Do

AWS Infrastructure & Strategy

- Own the architecture, design, and long-term strategy for Slate’s AWS environments, spanning consumer-facing applications, internal tools, and B2B partner integrations.
- Define and enforce cloud infrastructure standards, including multi-account strategy, VPC design, IAM governance, and cost management.
- Lead AWS Well-Architected Framework reviews across all workloads; identify risk areas and drive remediation roadmaps.
- Evaluate and adopt managed AWS services (e.g., ECS/EKS, RDS, Lambda, API Gateway, CloudFront) to reduce operational overhead and improve resilience.
- Partner with engineering teams to right-size infrastructure, optimize spend, and build toward a scalable, production-grade architecture as Slate grows from startup to scale.

SDLC Patterns & Developer Platform

- Define and champion SDLC standards across our Next.js/Vercel consumer web platform, internal applications, and API services, covering branching strategy, code review, CI/CD pipelines, environment promotion, and release management.
- Build and operate CI/CD pipelines (GitHub Actions or equivalent) that enable fast, safe deployments across consumer, internal, and B2B surfaces.
- Drive adoption of infrastructure-as-code (Terraform or CDK) to ensure environments are reproducible, auditable, and drift-free.
- Champion developer experience: reduce toil, improve local dev parity with production, and create self-service tooling so engineers ship with confidence.
- Establish and document engineering runbooks, deployment playbooks, and incident response procedures.

Observability & Reliability

- Lead the design and implementation of Slate’s observability strategy: centralized logging, distributed tracing, and metrics across all application tiers.
- Select and standardize observability tooling (e.g., Datadog, OpenTelemetry, CloudWatch, Grafana) and ensure consistent instrumentation across services.
- Define SLOs, SLIs, and error budgets for critical customer-facing flows (checkout, vehicle configuration, API uptime) and build alerting to support them.
- Establish Sev1/Sev2 incident response processes, including on-call rotation design, runbooks, post-mortem culture, and MTTA/MTTR tracking.
- Proactively identify reliability risks and drive chaos engineering or game-day exercises to validate system resilience.

Security & Compliance

- Own cloud security posture across AWS environments: implement guardrails using SCPs, Config Rules, Security Hub, and IAM least-privilege policies.
- Lead application security practices including SAST/DAST integration in CI pipelines, secrets management (e.g., AWS Secrets Manager, HashiCorp Vault), and dependency vulnerability scanning.
- Collaborate with enterprise IT and legal/compliance on data privacy requirements relevant to our consumer ecommerce and ownership platform (e.g., PCI-DSS scoping, CCPA).
- Drive periodic penetration testing, threat modeling, and remediation prioritization across consumer and internal systems.
- Evaluate and adopt security tooling (e.g., Prowler, Wiz, Snyk) to maintain continuous visibility into risk exposure.

Technical Leadership & Cross-Functional Collaboration

- Serve as the technical authority for DevOps and infrastructure decisions; influence architecture across the full digital stack.
- Mentor and upskill engineers on infrastructure, DevOps practices, and cloud-native patterns without requiring them to become specialists.
- Partner closely with the Head of Digital, software engineering leads, enterprise IT, and infosec to align infrastructure decisions with business and product priorities.
- Represent infrastructure and platform concerns in product planning, sprint reviews, and roadmap discussions.
- Contribute to engineering hiring: technical screens, interview design, and onboarding for DevOps and backend hires.

Qualifications

- 10+ years of professional software engineering experience, with at least 5 years focused on cloud infrastructure, platform engineering, or DevOps in a production environment.
- Deep, hands-on AWS expertise across core services (EC2, ECS/EKS, RDS/Aurora, Lambda, CloudFront, Route 53, IAM, VPC, S3, SQS/SNS, API Gateway, and more).
- Demonstrated experience conducting or leading AWS Well-Architected Framework reviews and driving remediation.
- Strong infrastructure-as-code proficiency: Terraform and/or AWS CDK in production environments.
- Proven track record designing and operating CI/CD pipelines for web applications and APIs (GitHub Actions, CircleCI, or equivalent).
- Solid application security experience: SAST/DAST, secrets management, IAM hardening, and vulnerability remediation workflows.
- Experience building and improving observability stacks (structured logging, distributed tracing, dashboards, alerting) for production web systems.
- Proficiency in at least one backend language used in web/API contexts (Node.js/TypeScript strongly preferred; Python also valued).
- Excellent written and verbal communication skills; able to translate complex infrastructure topics for non-technical stakeholders.
- Experience working in fast-moving startup or scale-up environments where you’ve had to make pragmatic tradeoffs under ambiguity.

Preferred

- AWS certifications (Solutions Architect Professional, DevOps Engineer Professional, or Security Specialty).
- Experience with Vercel or similar edge/CDN-first deployment platforms alongside AWS-hosted backends.
- Familiarity with ecommerce or consumer-facing web platforms at scale (high-traffic, transactional systems).
- Experience in automotive, consumer electronics, or other hardware-adjacent industries.
- Exposure to PCI-DSS scoping or CCPA/privacy-by-design in a consumer product context.
- Experience with AI-assisted developer tooling and incorporating it into engineering workflows.
- Background working with external API integrations and B2B partner connectivity (OAuth, webhooks, API gateway patterns).

Benefits

- This is a rare opportunity to own infrastructure and DevOps practice from the ground up at a company that’s building something genuinely new, and where your work will be felt by customers from day one.

Company Description

Slate is an Equal Employment Opportunity employer. We are committed to building a workplace that reflects the diversity of the communities we serve and where everyone feels welcome.

More DevOps Engineer Jobs

Software Engineer II - Cloud Infra/DevOps

Insight

Now is the time to bring your expertise to Insight. We are not just a tech company; we are a people-first company. We believe that by unlocking the power of people and technology, we can accelerate transformation and achieve extraordinary results. Insight is a Fortune 500 Solutions Integrator with deep expertise in cloud, data, AI, cybersecurity, and intelligent edge, guiding organizations through complex digital decisions.

DevOps Engineer · 11 days ago

Full Time · Remote · Team 10,001

Role Description

As a Cloud Engineer II you will:

- Design and implement scalable, secure, and high-performing cloud solutions across Azure/AWS.
- Lead infrastructure deployments using Infrastructure as Code (Terraform, Bicep, etc.).
- Design and optimize CI/CD pipelines for automated and reliable deployments.
- Implement and manage containerized workloads using Docker and Kubernetes (AKS/EKS).
- Apply cloud security best practices (IAM, network security, encryption, governance).
- Implement advanced monitoring, logging, and observability solutions.
- Perform deep troubleshooting, performance optimization, and cost optimization.
- Own end-to-end modules and drive technical delivery.

Qualifications

- 3-5 years of experience in cloud infrastructure, DevOps, or related roles.
- Strong hands-on experience with Azure and/or AWS (multi-cloud is a plus).
- Strong expertise in:
  - Azure: Azure Virtual Machines, VNet, Azure Storage, Azure Functions, Azure Monitor, Azure DNS, Azure AD, Load Balancers, Traffic Manager, and Application Gateway.
  - AWS: EC2, VPC, S3, IAM, Route 53, CloudWatch, Elastic Load Balancer (ALB/NLB), Auto Scaling, and related networking services.

Benefits

- Freedom to work from another location, even an international destination, for up to 30 consecutive calendar days per year.
- Core values of Hunger, Heart, and Harmony, which guide everything we do.

India

Site Reliability Engineer – SRE

Bright Vision Technologies

DevOps Engineer · 11 days ago

Full Time Remote

• Ensure the availability, performance, and operational excellence of large-scale distributed systems in production.
• Live at the boundary between development and operations, applying strong software engineering principles to infrastructure and operations problems.
• Continuously push the platform toward higher reliability with lower operational toil.
• Define, instrument, and continually refine service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services.
• Lead incident response and resolution for production issues.
• Design and implement comprehensive monitoring, logging, and tracing strategies.
• Build and maintain robust on-call processes, runbooks, and escalation paths.
• Automate operational toil aggressively by writing production-grade tooling.
• Architect and operate large-scale Kubernetes clusters and container-based workloads.
• Design CI/CD pipelines that promote safe, frequent, and observable releases.
• Lead capacity planning and performance engineering activities.
• Partner closely with application development teams to embed reliability practices early in design.
• Drive continuous improvement of security posture in collaboration with security teams.
• Mentor engineers across the organization on SRE practices.

Distributed Systems Grafana Java Kubernetes Linux Prometheus Python Go

United States

Site Reliability Engineer – SRE

Bright Vision Technologies

DevOps Engineer · 11 days ago

Full Time Remote

• Ensure the availability, performance, and operational excellence of large-scale distributed systems in production.
• Apply strong software engineering principles to infrastructure and operations problems.
• Continuously push the platform toward higher reliability with lower operational toil.
• Define, instrument, and continually refine service-level objectives (SLOs).
• Lead incident response and resolution for production issues.
• Design and implement comprehensive monitoring, logging, and tracing strategies.
• Build and maintain robust on-call processes, runbooks, and escalation paths.
• Automate operational toil aggressively.
• Architect and operate large-scale Kubernetes clusters and container-based workloads.
• Design CI/CD pipelines that promote safe, frequent, and observable releases.
• Lead capacity planning and performance engineering activities.
• Collaborate with application development teams to embed reliability practices.
• Strengthen the platform’s resiliency through chaos engineering.
• Drive continuous improvement of security posture in collaboration with security teams.
• Mentor engineers across the organization on SRE practices.

Distributed Systems Grafana Java Kubernetes Linux Prometheus Python Go

California

Job Closed

Software Engineer II - Cloud Infra/DevOps

Insight Enterprises, Inc.

- 14,000+ engaged teammates globally
- #20 on Fortune’s World’s Best Workplaces™ list
- $9.2 billion in revenue
- Received 35+ industry and partner awards in the past year
- $1.4M+ total charitable contributions in 2023 by Insight globally

DevOps Engineer · 11 days ago

Full Time Remote

India
