Launch Potato is on a mission to build and scale digital brands by solving complex problems in product development, engineering, data science, creative, and more. Launch Potato wan
Lead DevOps/SRE Engineer
Location
United States
Posted
21 days ago
Salary
0
Seniority
Senior
Job Description
Launch Potato
- Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
- Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
- Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
- Complete the AWS multi-account migration: move production workloads to an isolated account with zero unplanned downtime.
- Deliver a SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
- Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
- Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
- Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, and spend by service/team/environment.
Job Requirements
- 5+ years of production AWS infrastructure experience with deep Terraform expertise.
- Hands-on experience building an SRE function from scratch, with complete ownership of it.
- Experience with a multi-site company where PaaS or microservices are required.
- CI/CD pipeline ownership in one or more previous roles.
- PagerDuty experience, including standing up an on-call rotation.
- 5+ years hands-on with AWS, Terraform, CI/CD pipeline ownership, and SRE tooling (OpenTelemetry, Grafana, PagerDuty or equivalent) in a production environment.
Benefits
- profit-sharing bonus
- competitive benefits
More DevOps Engineer Jobs
Senior Devops Engineer
Eltropy Inc.
Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated in a single platform bolstered by AI, skill-based routing, and other contact center capabilities.
- Customers are our North Star
- No Fear - Tell the Truth
- Team of Owners
Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
Role Description
We are seeking a skilled and motivated Sr. DevOps Engineer to join our engineering team. As a DevOps Engineer at Eltropy, you will play a central role in building, securing, and automating the infrastructure that supports our modern, high-scale communication and payment platform. This role requires expertise in Google Cloud Platform (GCP), Kubernetes, and infrastructure automation, with a strong focus on security, networking, and operational excellence.
- Design and manage cloud infrastructure on Google Cloud Platform (GCP) with a focus on security, scalability, and cost-efficiency.
- Architect and maintain Kubernetes clusters, enabling robust, production-grade container orchestration.
- Develop and maintain fully automated CI/CD pipelines to support reliable software delivery across environments.
- Implement infrastructure-as-code (IaC) using Terraform or equivalent tools for reproducible and auditable deployments.
- Configure and manage PostgreSQL databases, ensuring high availability, performance tuning, and backup automation.
- Define and enforce networking configurations (VPC, subnets, firewall rules, routing, ingress/egress control, DNS).
- Apply and monitor security best practices across infrastructure, including IAM policies, secrets management, TLS/SSL, and threat prevention.
- Monitor systems using tools like Prometheus, Grafana, and Stackdriver; build alerts and dashboards to ensure observability and uptime.
- Participate in incident response, root cause analysis, and postmortems.
- Continuously evaluate, optimize, and improve operational processes, deployment speed, and infrastructure resilience.
Qualifications
- 5+ years of hands-on experience in DevOps, SRE, or Cloud Infrastructure Engineering.
- Strong experience with GCP services (e.g., GKE, IAM, Cloud Run, Cloud SQL, Cloud Functions, Pub/Sub).
- Proven expertise in deploying and managing Kubernetes environments in production.
- Proficiency in automating deployments, infrastructure configuration, and container lifecycle management.
- Deep understanding of networking fundamentals, including DNS, load balancing, NAT, VPNs, TLS/SSL, and routing policies.
- Demonstrated experience implementing CI/CD pipelines using GitHub Actions, ArgoCD, Jenkins, or similar.
- Solid knowledge of PostgreSQL and experience managing databases at scale.
- Familiarity with monitoring, logging, and alerting systems.
- Practical knowledge of cloud security principles, vulnerability management, IAM policies, and secrets handling.
- Ability to work collaboratively, communicate effectively, and take ownership of mission-critical infrastructure.
Bonus Skills
- Experience with Cloudflare (DNS, CDN, WAF, Zero Trust, rate limiting, page rules).
- Proficiency with Terraform, Helm, or Ansible.
- Familiarity with SRE practices, runbooks, SLAs/SLOs, and disaster recovery planning.
- Aware of cost optimization techniques and multi-region HA architectures.
- Knowledge of compliance and audit-readiness for fintech or regulated industries.
Company Description
Eltropy is a rocket ship FinTech on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated into a single platform bolstered by AI, skill-based routing, and other contact center capabilities.
- Customers are our North Star
- No Fear – Tell the Truth
- Team of Owners
Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran, or disability status.
If you're a seasoned DevOps engineer with a passion for automation, reliability, and secure cloud infrastructure, we’d love to hear from you. Apply now and help us build the backbone of tomorrow’s financial engagement platform.
Cloud IaaS-Azure Devops
Zensar
At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized.
We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.
Role Description
We are seeking an Azure Infrastructure Engineer to lead Azure Platform Engineering, AIOps enablement, and enterprise-grade API integrations. This is a hands-on, senior technical role responsible for building and operating a scalable Internal Developer Platform (IDP) that enables deep automation, observability, and AI-driven operations through Python-based services and API-first integrations across Azure, DevOps, monitoring, and ITSM ecosystems.
Key Responsibilities
Platform Engineering & Leadership
- Own the Azure platform architecture, roadmap, and engineering standards.
- Define golden paths, reusable platform services, and self-service capabilities.
- Act as the technical authority and escalation point for platform and CloudOps integrations.
Azure Infrastructure & Infrastructure-as-Code
- Architect and govern enterprise-scale Azure infrastructure using Terraform (IaC-first).
- Ensure consistent implementation of landing zones, networking, identity, and governance.
- Review Terraform modules for security, scalability, and operational readiness.
CI/CD, Automation & Python Engineering
- Define and lead Azure DevOps CI/CD standards using YAML pipelines.
- Build Python-based automation, including:
  - Orchestration and workflow engines
  - Platform utilities, health checks, and guardrails
  - Alert- and event-driven remediation
- Integrate automation into CI/CD and operational workflows.
API Integration & Platform Connectivity (Core Focus)
- Serve as the API Integration Specialist, owning integrations across:
  - Azure services (ARM, Azure Monitor, Log Analytics, AKS, Key Vault)
  - Azure DevOps (pipelines, repos, artifacts, boards)
  - Observability platforms (Grafana, Prometheus, Loki, Azure Monitor)
  - ITSM / Operations tools (ServiceNow, incident and ticketing systems)
- Design API-driven workflows using Python (REST, webhooks, event-driven).
- Standardize authentication, authorization, and secrets management.
- Ensure integrations are secure, resilient, observable, and automation-ready.
Kubernetes & Cloud-Native Platforms
- Provide technical leadership for AKS platform architecture and operations.
- Define standards for networking, security, scaling, and lifecycle management.
- Guide teams on Helm-based deployments and cloud-native patterns.
AIOps & Observability
- Own the AIOps strategy for Azure CloudOps, including:
  - Alert correlation and noise reduction
  - Anomaly detection and predictive insights
  - API-driven and Python-based automated remediation
- Standardize observability data for AIOps consumption.
- Continuously improve MTTR, availability, and operational efficiency.
Cross-Team Leadership
- Mentor and lead platform and Azure infrastructure engineers.
- Partner with Security, FinOps, SRE, and CloudOps teams.
- Produce SOPs, runbooks, and API contracts suitable for L1/L2 and AIOps automation.
Business-as-Usual (BAU) & Rotational Support
- Own BAU platform and CloudOps operations, ensuring stability and reliability.
- Participate in rotational on-call support, acting as Lead escalation for:
  - Production incidents and service degradation
  - AKS, Azure infrastructure, and CI/CD failures
- Drive incident triage, RCA, and post-incident reviews, feeding outcomes into:
  - Platform improvements
  - Automation and AIOps use cases
  - SOPs and runbooks
- Progressively automate BAU operations using Python, APIs, and CI/CD workflows.
- Partner with L1/L2 teams to shift left via automation and AIOps.
Qualifications
- Bachelor’s degree or equivalent experience.
- 8–12+ years in Azure infrastructure, DevOps, or platform engineering.
- Prior experience in a Lead technical role owning platforms or automation.
Requirements
- Deep expertise in Microsoft Azure (compute, networking, identity, governance).
- Strong hands-on experience with Terraform at enterprise scale.
- Advanced experience with Azure DevOps CI/CD and YAML pipelines.
- Strong experience with AKS / Kubernetes.
- Strong Python proficiency for automation and integrations.
- Strong experience with REST APIs, webhooks, and event-driven architectures.
- Proven leadership in platform engineering, DevOps, or CloudOps.
- Practical experience or strong exposure to AIOps and observability.
Nice to Have
- FastAPI/Flask, Helm, GitOps, policy-as-code, FinOps.
Company Description
At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do.
- At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus.
- Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City.
DevOps Engineer
iManage
Founded in 2015, iManage is a people-focused software company that offers work-product management solutions for accounting, financial services, real estate, law
Automate operational tasks and develop software solutions while collaborating with cross-functional teams. Create resilient, cloud-native platforms and participate in on-call rotations to enhance service reliability and security.
Site Reliability Engineer - FedRAMP
Cisco ThousandEyes
Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network, even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.
The application window is expected to close on: 06/30/2026
Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.
Meet the Team
The Cisco ThousandEyes FedRAMP team builds and operates our US GovCloud platform. This team is responsible for architecting, delivering, and maintaining our FedRAMP offering. Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network - even the ones they don't own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco's leading Networking, Security, Collaboration, and Observability portfolios.
Your Impact
As part of this role, you will be responsible for maintaining services in a FedRAMP-compliant environment and therefore must be a U.S. citizen. This position may also perform work that the U.S. government has specified can only be performed by a U.S. citizen on U.S. soil.
- Lead, inspire, and develop a talented SRE team, fostering a culture of innovation, collaboration, and excellence.
- Drive strategic vision for the management and continued expansion of FedRAMP-compliant infrastructure and systems, ensuring excellence in operations and security processes.
- Collaborate closely with cross-functional teams, including development, product management, and security to define and implement FedRAMP-compliant processes and strategies across the broader Cisco ThousandEyes platform.
- Provide oversight and direction for how ThousandEyes approaches the continuous monitoring, logging, and auditing of systems to ensure compliance with FedRAMP controls.
- Stay current with industry best practices, evolving security threats, and updates to FedRAMP guidelines, and apply this knowledge to enhance the security posture of all platforms and systems.
Minimum Qualifications
- You have led a distributed team of 5+ engineers, can demonstrate strong technical vision for your team, and ensure consistent delivery on objectives.
- You have a total of 5+ years of experience building and supporting mission-critical services with a focus on automation, availability, and performance, and you have worked on large-scale distributed systems including multi-tiered architecture.
- You have experience identifying and analyzing cyber security risks, familiarity with security best practices, vulnerability management, and incident response processes.
- You have experience formulating a team's technical strategy and roadmap; you've collaborated and partnered effectively with several other teams to execute on shared goals.
- You understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters.
- You are a US citizen.
Preferred Qualifications
- Experience building and/or operating FedRAMP environments.
- Strong understanding of the FedRAMP framework, its controls, and compliance requirements.
Why Cisco?
At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.
Message to applicants applying to work in the U.S. and/or Canada:
The starting salary range posted for this position is $165,000.00 to $241,400.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.
U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.
U.S. employees are eligible for paid time away as described below, subject to Cisco's policies:
- 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
- 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco
- Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at rate of 4.92 hours per pay period for full-time employees
- Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
- 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
- Additional paid time away may be requested to deal with critical or emergency issues for family members
- Optional 10 paid days per full calendar year to volunteer
For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies.
Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows:
- 0.75% of incentive target for each 1% of revenue attainment up to 50% of quota;
- 1.5% of incentive target for each 1% of attainment between 50% and 75%;
- 1% of incentive target for each 1% of attainment between 75% and 100%; and
- Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation.
For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.
The applicable full salary ranges for this position, by specific state, are listed below:
- New York City Metro Area: $165,000.00 - $277,600.00
- Non-Metro New York state & Washington state: $146,700.00 - $247,000.00
* For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined.
** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
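As a rough illustration of the quota-based incentive tiers quoted in this posting, the sketch below adds up the payout as a percentage of incentive target for a given attainment percentage. The function name and the `over_quota_rate` parameter are hypothetical; the posting says only that rates above 100% attainment are "at or above 1%", so the default of 1.0 is an assumed floor, not a stated figure.

```python
def incentive_payout_pct(attainment_pct: float, over_quota_rate: float = 1.0) -> float:
    """Return the payout as a percentage of incentive target.

    Tiers come from the posting: 0.75% per point up to 50% attainment,
    1.5% per point from 50% to 75%, and 1% per point from 75% to 100%.
    over_quota_rate is an assumption (the posting gives only a floor of 1%).
    """
    tiers = [(50.0, 0.75), (75.0, 1.5), (100.0, 1.0)]  # (upper bound, rate per point)
    payout = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if attainment_pct <= lower:
            break
        # Pay only for the slice of attainment that falls inside this tier.
        payout += (min(attainment_pct, upper) - lower) * rate
        lower = upper
    if attainment_pct > 100.0:
        # Uncapped above quota; the rate here is the assumed floor.
        payout += (attainment_pct - 100.0) * over_quota_rate
    return payout


if __name__ == "__main__":
    for pct in (50, 75, 100):
        print(pct, incentive_payout_pct(pct))
```

Under these tiers the payout reaches 37.5% of target at 50% attainment, 75% at 75% attainment, and 100% at full quota, so the steeper middle tier is what brings payout back in line with attainment.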


