Job Closed
This listing is no longer active.
We’re revolutionizing the way the industry thinks about pet wellbeing: petco.com/wholehealth
Manager, DevOps – AI
Location
Texas
Posted
65 days ago
Salary
$193.3K - $194.7K / year
Seniority
Lead
Job Description
Manager, DevOps – AI
Petco
• Lead and scale technology functions that support AI/ML workloads in a cloud-native environment.
• Oversee the design, implementation, and maintenance of automated deployment pipelines, cloud infrastructure, and monitoring systems to ensure high availability, scalability, and security of AI-driven applications.
• Collaborate closely with data scientists, machine learning engineers, and software developers to streamline model deployment, accelerate experimentation, and optimize operational practices.
• Drive cloud cost optimization strategies, including effective resource allocation, right-sizing, and leveraging cloud-native tools to reduce operational expenses while maintaining performance and reliability.
• Serve as a technical leader with deep knowledge of cloud platforms (AWS, Azure, or GCP), infrastructure-as-code, containerization, and AI workflows.
• Mentor the team and foster a culture of continuous improvement, reliability, and efficiency.
Job Requirements
- Bachelor’s degree or equivalent in Computer Science or a related field of study, and 7 years of progressive experience in Development Operations.
- Employer will accept a Master’s degree or equivalent in Computer Science or a related field of study and 3 years of post-baccalaureate experience in Development Operations in lieu of a Bachelor’s degree or equivalent and 7 years of progressive experience.
- Applicants must have demonstrated experience with:
  1) 7 years (3 years with Master’s) of experience in cloud operations.
  2) 4 years (2 years with Master’s) of experience in a supervisory capacity or serving as a direct lead.
  3) 3 years of experience assisting with architecture design and providing support for development teams’ management, migrations, and deployments from a data center to public cloud Serverless Lambda, Cloud Databases, and EKS.
  4) 3 years of experience using GitLab pipelines, Terraform, Vault, and Helm to manage and deploy public cloud infrastructure, Git merges, pull requests, and CI/CD processes.
  5) 3 years of experience working in a SOX-audited environment or a similar governmental body environment, directly interfacing with auditors to review and respond to audits and findings.
  6) 3 years of experience implementing cloud security frameworks.
  7) 3 years of experience maintaining a large (at least $5M) annual cloud budget, with experience in cost control and cost-saving measures, including savings plans, reserved instances, Spot marketplace, and preferred pricing models.
  8) 2 years of experience with cloud networking, routing, load balancing, CIDR blocks, security groups, access lists, and policies.
  9) Managing large and complex cloud infrastructure migration projects.
  10) Delivering technical presentations to senior management with varying technical backgrounds on operational efficiency and innovation projects.
  Any and all experience may be gained concurrently.
More DevOps Engineer Jobs
Looking for your chance to make a real impact? Firespring, Nebraska's first Certified B Corporation®, is looking for an amazing human to join our team. We’re known for providing marketing, printing and strategic guidance to thousands of brands, businesses and nonprofits in all 50 states and all over the world. Our mission is to accelerate client prosperity so we collectively do more good. Please let us know if this position sounds like your dream job.
Job Description
Firespring looks for something different in the engineers we hire. Our engineers work in a dynamic environment where the average day is spent coding, problem-solving, and strategically thinking about how to improve our systems. We believe in building quality infrastructure that we can be proud of, and we don’t operate with a lot of red tape. We aren’t afraid to swap out pieces of our infrastructure with a better solution if it makes sense for the future. In this role, you will play a key part in supporting our infrastructure while developing the tools and dashboards that keep our internal teams running smoothly. We are looking for a proactive, future-thinking individual with the self-confidence to take ownership of their work.
Tasks and Responsibilities
- Design and develop internal applications and custom dashboards to streamline company operations and improve system visibility.
- Assist in maintaining secure, scalable AWS infrastructure using CloudFormation to treat infrastructure as code (IaC) with version control and change sets.
- Support and improve deployment pipelines using AWS CodePipeline and CodeBuild to ensure our release processes are seamless and reliable.
- Monitor system health using centralized logs and alerts, helping to build the dashboards that ensure we identify issues before our users do.
- Perform routine administration and troubleshooting of Linux-based systems while assisting with security standards via AWS Config and Security Hub.
Qualifications
- Background in object-oriented programming with the ability to write scripts in Python, PHP, or Bash to automate tasks and build web application backends.
- Comfortable working in a Linux environment with a solid understanding of command-line navigation, networking basics, and DNS.
- You aren’t intimidated by new technology; instead, you enjoy the challenge of learning new tools and growing your technical skillset.
- A natural problem-solver who enjoys digging into the root cause of an issue and taking ownership of the fix.
- Exposure to or an interest in modern building blocks like Docker, MySQL, and NGINX in a production-grade environment.
Location
We would love to host you in our Lincoln office, but we know that doesn’t work for everybody. If you would like to work remotely, we can accommodate you!
Compensation & Benefits
- Salary—You don’t need to go to the grocery store to bring home the bacon. We reward candidates who wow us by offering competitive pay.
- Hours—40 hrs, flexibility required.
- 401(k)—Your parents preached about the importance of saving. Now we’re helping you get it done. Firespring provides professional financial advisors who will help you make a plan and guide your investments.
- Fun—Millions of people go to work, punch the clock from 8 to 5 and hate every moment of it. That’s not the case here. We prioritize loving your experience here and have a group of people dedicated to creating activities inside and outside the office. To put it mildly, we’re serious about having fun, and it shows in our work and our relationships too.
- Miscellaneous Benefits—Not all benefits are about the Benjamins, baby. Some of the things you’ll enjoy while working here include unlimited soft drinks, tea and beer. Dress code? We want you to have personal freedom; just stick to the general guidelines of your role and you be you.
Ready to come aboard? Let’s make this happen.
While we genuinely appreciate your interest in employment with Firespring, we can only respond to the most qualified candidates. Firespring is an EEO/AA employer.
Senior Site Reliability Engineer
Medrio
Accelerate clinical research with the fastest, easiest, and most flexible eClinical tools.
• As a Medrio Senior Site Reliability Engineer, you will be a part of the ITOps group responsible for maintaining all environments supporting the SDLC for Medrio’s platform.
Lead DevOps Engineer
Nearsure
Remove the barriers to growth by scaling your team fast with top-notch Latin American IT talent.
• Lead the design and execution of the migration from VMware to OpenShift.
• Define target architecture, including cluster design, networking, and security.
• Provide technical leadership and mentorship to DevOps engineers.
• Collaborate with stakeholders to define migration strategies (rehost, replatform, refactor).
• Design and implement CI/CD and GitOps practices (e.g., ArgoCD, Jenkins, GitLab CI).
• Ensure high availability, scalability, and security of the platform.
• Oversee containerization of legacy applications.
• Establish Infrastructure as Code practices using tools like Terraform or Ansible.
• Implement observability solutions (monitoring, logging, tracing).
• Identify risks and define mitigation strategies during migration.
Senior Site Reliability Engineer – Kubernetes Platform
SysEleven GmbH
Mastering Cloud. Accelerating Business.
• Design and implement observability solutions using Prometheus, Loki and Mimir.
• Analyze, troubleshoot and further develop proprietary Kubernetes controllers.
• Develop and maintain production applications.
• Operate, automate and continuously evolve the MKA platform.
• Enhance internal tooling solutions.




