Job Closed
This listing is no longer active.
Changing the way people find what they love.
Manager, Data – AI Platform Engineering
Location: United States
Posted: 144 days ago
Salary: $146.3K - $195K / year
Seniority: Senior
Job Description
Stitch Fix
• Lead execution in a player-coach capacity for Stitch Fix’s next-gen Data, ML, and GenAI platforms.
• Contribute towards modernization of data and ML foundations to support unified signals, adaptive models, experimentation velocity, and scalable AI/ML workloads.
• Provide foundational APIs, SDKs, frameworks, and self-service tools that make it easy for data scientists, ML engineers, analysts, and application teams to build and deploy AI solutions quickly, safely, and at scale.
• Partner with Data Science, Engineering, and Product teams to translate Data/ML/GenAI platform capabilities into production-grade features and intelligent experiences that deliver measurable business value.
• Drive responsible AI and data adoption by creating reusable templates, documentation, and enablement programs.
• Contribute towards improving governance practices, including data contracts, lineage, metric definitions, access policies, and responsible AI guardrails, for trust, safety, and compliance.
• Ensure operational excellence through platform reliability, performance, observability, cost efficiency, and simplification of legacy systems.
• Lead and develop high-performing engineering teams, fostering a culture of clarity, excellence, and trust.
• Balance speed of innovation with platform stability, ensuring engineering efforts are tightly aligned to business priorities and long-term client value.
Job Requirements
- 5+ years in software, data, ML, or platform engineering; 1+ years leading engineering individual contributors is a plus.
- Demonstrated success contributing towards large-scale data platforms, ML platforms, or AI/GenAI platforms in cloud environments.
- Experience delivering platform modernization, unification, and multi-year architectural transformation.
- Strong software engineering foundation, with experience designing and building large-scale distributed systems and resilient, high-quality APIs and services using modern programming languages and cloud-native architectures.
- Track record operating and evolving modern data infrastructure, including some of the following: distributed compute and storage technologies (Spark, Trino, Iceberg), real-time processing frameworks (Kafka/Flink), metadata / catalog systems, and Kubernetes-based orchestration.
- Expertise across the ML lifecycle: feature engineering, training pipelines, model deployment and serving, monitoring, validation, fine-tuning, and MLOps best practices.
- Proven capability in building self-service platform abstractions and tooling that enable teams to develop, experiment, and deploy data and ML products efficiently.
- Experience with modern GenAI architectures: semantic retrieval, knowledge-grounded indexing, LLM orchestration, agent workflows, and evaluation frameworks.
- Familiarity with modern ML frameworks like PyTorch and Ray is a plus.
- Strategic thinker able to align platform investments with business priorities and emerging AI opportunities.
- Potential to be a strong people leader, with a track record of contributing to inclusive, high-performing engineering teams.
- Excellent communicator who can influence both technical and business stakeholders across domains.
Benefits
- This position is eligible for an annual bonus
- Eligible for medical, dental, vision, and other benefits
More Platform Engineer Jobs
Platform Engineer
Qodea
Qodea (formerly Appsbroker CTS) is Europe's largest Google Premier only transformation partner.
• Design and implement scalable, reliable, and cost-effective cloud solutions using Google Cloud Platform services and products.
• Analyse business requirements and recommend appropriate cloud technologies and architectures.
• Develop and maintain cloud infrastructure code using tools such as Terraform or Google Cloud Deployment Manager.
• Manage and automate the deployment, scaling, and management of cloud infrastructure using tools such as Kubernetes or Google Cloud Deployment Manager.
• Optimise infrastructure costs by analysing usage and implementing cost-saving measures such as reserved instances or instance rightsizing.
• Implement infrastructure-as-code practices to ensure consistency, repeatability, and version control of cloud infrastructure.
• Implement and maintain security controls and policies to ensure the confidentiality, integrity, and availability of cloud infrastructure and data.
• Monitor and respond to security incidents and vulnerabilities, and perform regular security assessments and audits.
• Ensure compliance with industry standards and regulations such as GDPR, HIPAA, and PCI DSS.
• Monitor cloud infrastructure and services for performance, availability, and security issues using tools such as Stackdriver or Prometheus.
• Perform root cause analysis and troubleshoot issues related to cloud infrastructure and services, and implement corrective actions and preventive measures.
• Continuously improve the reliability and resilience of cloud infrastructure by implementing best practices such as fault-tolerance, redundancy, and disaster recovery.
• Proactive management of customer related tasks and projects, by applying the learned skills and knowledge.
• Proactiveness in: identifying and suggesting improvements where applicable, detecting missing documentation and filling in the gaps.
• Providing technical support whilst attending meetings with stakeholders as needed.
• Acting as a mentor for the more junior colleagues by sharing your knowledge and expertise, and guiding them through the successful resolution of customer related tasks.
Senior Platform Engineer
Qodea
Qodea (formerly Appsbroker CTS) is Europe's largest Google Premier only transformation partner.
• Design and implement scalable, reliable, and cost-effective cloud solutions using Google Cloud Platform services and products.
• Analyse business requirements and recommend appropriate cloud technologies and architectures.
• Develop and maintain cloud infrastructure code using tools such as Terraform or Google Cloud Deployment Manager.
• Manage and automate the deployment, scaling, and management of cloud infrastructure using tools such as Kubernetes or Google Cloud Deployment Manager.
• Optimise infrastructure costs by analysing usage and implementing cost-saving measures such as reserved instances or instance rightsizing.
• Implement infrastructure-as-code practices to ensure consistency, repeatability, and version control of cloud infrastructure.
• Implement and maintain security controls and policies to ensure the confidentiality, integrity, and availability of cloud infrastructure and data.
• Monitor and respond to security incidents and vulnerabilities, and perform regular security assessments and audits.
• Ensure compliance with industry standards and regulations such as GDPR, HIPAA, and PCI DSS.
• Monitor cloud infrastructure and services for performance, availability, and security issues using tools such as Stackdriver or Prometheus.
• Perform root cause analysis and troubleshoot issues related to cloud infrastructure and services, and implement corrective actions and preventive measures.
• Continuously improve the reliability and resilience of cloud infrastructure by implementing best practices such as fault-tolerance, redundancy, and disaster recovery.
• Leading and managing potential new projects for our customers by gathering and understanding the requirements, providing estimates of the effort required and ETAs for deliverables, attending regular customer meetings, and supporting the CSM with technical input where needed.
• Acting as a mentor for the more junior colleagues by sharing your knowledge and expertise, and guiding them through the successful resolution of customer related tasks.
Container Platform Operations Engineer – Kubernetes
Owens & Minor
Empowering Our Customers To Advance Healthcare
• Provide advanced operational support and troubleshoot complex issues to minimize downtime.
• Collaborate with development and operations teams to implement features and integrate user-facing elements.
• Optimize container platform environments for performance, scalability, and security, following best practices.
• Monitor system performance and resolve issues to ensure uptime and reliability.
• Deploy and manage Kubernetes-based clusters, ensuring high availability, scalability, and security.
• Configure infrastructure components for optimal performance and reliability.
• Administer and maintain CI/CD pipelines using GitOps tools (e.g., Argo CD), ensuring security and compliance.
• Perform advanced Linux administration, including installation and configuration, across cloud environments.
• Document operational, maintenance, and upgrade procedures for clarity and accessibility.
• Support knowledge sharing and assist team members in troubleshooting and best practices.
Senior Director of Engineering, Platform Engineering
Zeitview
At Zeitview, we deliver advanced inspection software for high-value infrastructure.
• Lead, mentor, and grow global, distributed engineering teams, including tech leads, engineers and potential managers.
• Build a high-performance engineering culture centered on ownership, accountability, quality, and continuous improvement.
• Provide regular coaching, feedback, performance evaluations, and career development guidance.
• Drive effective hiring, onboarding, succession planning, and retention strategies.
• Own the technical strategy and long-term vision for platform engineering.
• Partner closely with architects and tech leads to define platform standards, architectural patterns, and best practices.
• Ensure the platform serves as a strong foundation enabling rapid and safe product innovation.
• Stay sufficiently hands-on to understand the complexity and risk of initiatives and guide execution and delivery.
• Act as a key partner to Product, Data, Infrastructure, Security, and other cross-functional leaders.



