Launch Potato

Launch Potato is on a mission to build and scale digital brands by solving complex problems in product development, engineering, data science, creative, and more. Launch Potato wan

Lead Engineer, DevOps & SRE

Location

United States

Posted

14 days ago

Salary

$160K - $190K / year

Seniority

Lead

Job Description

Role Description

Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture. Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.

Outcomes

- Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
- Complete the AWS multi-account migration: move production workloads to an isolated account with zero unplanned downtime.
- Deliver a SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
- Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
- Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
- Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, spend by service/team/environment.

Qualifications

- 5+ years of production AWS infrastructure experience with deep Terraform expertise.
- Hands-on experience building an SRE function from scratch, with complete ownership of it.
- Experience at a multi-site company where PaaS or microservices are required.
- CI/CD pipeline ownership in one or more previous roles.
- Experience with PagerDuty and with standing up an on-call rotation.

Requirements

- 5+ years hands-on with AWS, Terraform, CI/CD pipeline ownership, and SRE tooling (OpenTelemetry, Grafana, PagerDuty or equivalent) in a production environment.

Competencies

- Ownership orientation: You don't wait to be assigned a problem. If something is broken, undocumented, or a risk, you flag it and fix it. If the runbooks don't exist yet, you write them.
- Documentation discipline: You write things down: runbooks, decision rationale, architecture patterns, incident post-mortems. The next person should be able to understand your work without asking you.
- Cost consciousness: You think about the business impact of infrastructure decisions. You can explain a spending anomaly to a CFO in plain language. You know what things cost before you build them.
- Calm under pressure: Production incidents happen. You triage clearly, communicate proactively with technical and non-technical stakeholders, and run a tight post-mortem without blame. You've been woken up at 3am. You can handle it.
- Cross-functional communication: You can work with product engineers, legal/compliance, and executive leadership in the same week without switching communication modes awkwardly. You speak both engineer and business.
- Proactive reliability: A good SRE reacts to outages. A great SRE catches degradation before it becomes an outage. You build alerting against the patterns, not just the failures.

Total Compensation

Base salary is set according to market rates for the nearest major metro and varies based on Launch Potato’s Levels Framework. Your compensation package includes a base salary, profit-sharing bonus, and competitive benefits. Launch Potato is a performance-driven company, which means once you are hired, future increases will be based on company and personal performance, not annual cost of living adjustments.

More DevOps Engineer Jobs

Lead Engineer, DevOps – SRE

Launch Potato

DevOps Engineer, posted 14 days ago

• Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
• Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
• Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
• Complete the AWS multi-account migration: move production workloads to an isolated account with zero unplanned downtime.
• Deliver a SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
• Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
• Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
• Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, spend by service/team/environment.

United States
Lead DevOps/SRE Engineer

Launch Potato

DevOps Engineer, posted 14 days ago

Role Description

Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture. Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.

Outcomes

- Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
- Complete the AWS multi-account migration: move production workloads to an isolated account with zero unplanned downtime.
- Deliver a SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
- Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
- Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
- Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, spend by service/team/environment.

Qualifications

- 5+ years of production AWS infrastructure experience with deep Terraform expertise.
- Hands-on experience building an SRE function from scratch, with complete ownership of it.
- Experience at a multi-site company where PaaS or microservices are required.
- CI/CD pipeline ownership in one or more previous roles.
- Experience with PagerDuty and with standing up an on-call rotation.
- 5+ years hands-on with AWS, Terraform, CI/CD pipeline ownership, and SRE tooling (OpenTelemetry, Grafana, PagerDuty or equivalent) in a production environment.

Requirements

- Ownership orientation: You don't wait to be assigned a problem. If something is broken, undocumented, or a risk, you flag it and fix it. If the runbooks don't exist yet, you write them.
- Documentation discipline: You write things down: runbooks, decision rationale, architecture patterns, incident post-mortems. The next person should be able to understand your work without asking you.
- Cost consciousness: You think about the business impact of infrastructure decisions. You can explain a spending anomaly to a CFO in plain language. You know what things cost before you build them.
- Calm under pressure: Production incidents happen. You triage clearly, communicate proactively with technical and non-technical stakeholders, and run a tight post-mortem without blame. You've been woken up at 3am. You can handle it.
- Cross-functional communication: You can work with product engineers, legal/compliance, and executive leadership in the same week without switching communication modes awkwardly. You speak both engineer and business.
- Proactive reliability: A good SRE reacts to outages. A great SRE catches degradation before it becomes an outage. You build alerting against the patterns, not just the failures.

Benefits

- Base salary: $160,000 to $190,000 per year, paid semi-monthly.
- Your compensation package includes a base salary, profit-sharing bonus, and competitive benefits.
- Performance-driven company: Future increases will be based on company and personal performance, not annual cost of living adjustments.

United States
$160K - $190K / year
Senior DevOps Engineer

Eltropy Inc.

Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated in a single platform bolstered by AI, skill-based routing, and other contact center capabilities.

- Customers are our North Star
- No Fear - Tell the truth
- Team of Owners

Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

DevOps Engineer, posted 14 days ago
Full Time, Remote, Team 51-200

Role Description

We are seeking a skilled and motivated Sr. DevOps Engineer to join our engineering team. As a DevOps Engineer at Eltropy, you will play a central role in building, securing, and automating the infrastructure that supports our modern, high-scale communication and payment platform. This role requires expertise in Google Cloud Platform (GCP), Kubernetes, and infrastructure automation, with a strong focus on security, networking, and operational excellence.

- Design and manage cloud infrastructure on Google Cloud Platform (GCP) with a focus on security, scalability, and cost-efficiency.
- Architect and maintain Kubernetes clusters, enabling robust, production-grade container orchestration.
- Develop and maintain fully automated CI/CD pipelines to support reliable software delivery across environments.
- Implement infrastructure-as-code (IaC) using Terraform or equivalent tools for reproducible and auditable deployments.
- Configure and manage PostgreSQL databases, ensuring high availability, performance tuning, and backup automation.
- Define and enforce networking configurations (VPC, subnets, firewall rules, routing, ingress/egress control, DNS).
- Apply and monitor security best practices across infrastructure, including IAM policies, secrets management, TLS/SSL, and threat prevention.
- Monitor systems using tools like Prometheus, Grafana, and Stackdriver; build alerts and dashboards to ensure observability and uptime.
- Participate in incident response, root cause analysis, and postmortems.
- Continuously evaluate, optimize, and improve operational processes, deployment speed, and infrastructure resilience.

Qualifications

- 5+ years of hands-on experience in DevOps, SRE, or Cloud Infrastructure Engineering.
- Strong experience with GCP services (e.g., GKE, IAM, Cloud Run, Cloud SQL, Cloud Functions, Pub/Sub).
- Proven expertise in deploying and managing Kubernetes environments in production.
- Proficiency in automating deployments, infrastructure configuration, and container lifecycle management.
- Deep understanding of networking fundamentals, including DNS, load balancing, NAT, VPNs, TLS/SSL, and routing policies.
- Demonstrated experience implementing CI/CD pipelines using GitHub Actions, ArgoCD, Jenkins, or similar.
- Solid knowledge of PostgreSQL and experience managing databases at scale.
- Familiarity with monitoring, logging, and alerting systems.
- Practical knowledge of cloud security principles, vulnerability management, IAM policies, and secrets handling.
- Ability to work collaboratively, communicate effectively, and take ownership of mission-critical infrastructure.

Bonus Skills

- Experience with Cloudflare (DNS, CDN, WAF, Zero Trust, rate limiting, page rules).
- Proficiency with Terraform, Helm, or Ansible.
- Familiarity with SRE practices, runbooks, SLAs/SLOs, and disaster recovery planning.
- Awareness of cost optimization techniques and multi-region HA architectures.
- Knowledge of compliance and audit-readiness for fintech or regulated industries.

Company Description

Eltropy is a rocket ship FinTech on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated into a single platform bolstered by AI, skill-based routing, and other contact center capabilities.

- Customers are our North Star
- No Fear – Tell the Truth
- Team of Owners

Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran, or disability status.

If you're a seasoned DevOps engineer with a passion for automation, reliability, and secure cloud infrastructure, we’d love to hear from you. Apply now and help us build the backbone of tomorrow’s financial engagement platform.

India