hims & hers

hims & hers offers a modern approach to health and wellness.

Staff Machine Learning Systems Engineer – MLOps

Systems Engineer | Full Time | Remote | Lead | Team 201-500 | Since 2017 | H1B No Sponsor

Location

United States

Posted

1 day ago

Salary

$210K - $250K / year

Seniority

Lead

Job Description

  • Own and scale the AI compute and deployment platform.
  • Own and evolve our containerized application deployment platform and related systems for AI workloads, encompassing general process and job orchestration (e.g. Kubernetes): cluster operations, node lifecycle, autoscaling (Karpenter), storage (EBS CSI), and workload isolation across staging and production.
  • Build and maintain GitOps-based deployment pipelines (Helm/Kustomize overlays, environment promotion) that let teams ship AI services safely and repeatably.
  • Design ephemeral/preview environments, feature-branched deployments, and nightly release pipelines so teams can validate AI changes in production-like conditions before release.
  • Drive efficiency and cost management across compute, autoscaling, and inference infrastructure.
  • Operate and scale inference infrastructure and a multi-provider LLM AI gateway (e.g. Bedrock, Vertex, and other providers), including credentials, rate limits, and failover.
  • Build reliable serving patterns for LLM-powered workflows: routing, grounding, tool execution, and context assembly at the platform level.
  • Create reusable infrastructure abstractions and contracts that standardize how AI services are deployed, configured, and consumed across the company.
  • Own the LLM/AI observability and tracing stack, provisioning and scaling systems like Langfuse, Datadog (dd-trace), OpenTelemetry tracing (OTLP), and the underlying datastores (e.g. ClickHouse), so AI behavior is auditable and debuggable in production.
  • Build analytics and monitoring pipelines that surface latency, error, quality, and regression signals to engineering and clinical stakeholders.
  • Define SLOs, alerting, on-call runbooks, and incident response for AI infrastructure; lead troubleshooting and continuously raise platform reliability.
  • Own and improve the monorepo build system and CI/CD pipelines for AI workloads, including eval workflows, Docker image builds, automated PR checks and convention enforcement, and cross-platform test execution.
  • Own shared infrastructure tooling, CLIs, and IaC modules (Terraform, Scalr) that AI and product engineers use daily.
  • Identify and eliminate platform bottlenecks (reducing CI/CD cycle times, build latency, and deployment friction) to improve developer velocity across the Applied AI organization.
  • Build IAM, OIDC, and secrets management as first-class infrastructure: scoped, least-privilege roles, write-only secret rotation, and cross-account access audits.
  • Encode security-by-default, scope boundaries, and access controls into the platform so AI services are HIPAA-compliant and privacy-first.
  • Partner with clinical, legal, security, and data platform teams (including Databricks/Unity Catalog access governance) to enforce compliant, auditable data access.
  • Drive multi-quarter infrastructure initiatives, from cluster and deployment architecture to inference platform, GPU compute strategy, and observability evolution.
  • Write and lead technical design documents and design reviews, define infrastructure standards and development-workflow conventions, and contribute to technical governance across AI engineering.
  • Mentor engineers on reliability engineering, infrastructure-as-code, and MLOps best practices, and bridge the gap between prototypes and production-grade systems.

Job Requirements

  • 8+ years of professional experience in infrastructure, platform, DevOps, or SRE engineering — with at least 3 years focused on ML/AI systems in production.
  • Deep, hands-on experience with Kubernetes (ideally EKS) and the cloud-native ecosystem — autoscaling, GitOps, Helm/Kustomize, operating clusters at scale, and general process/job orchestration.
  • Strong infrastructure-as-code skills (Terraform) and experience designing secure cloud architectures: IAM, OIDC, secrets management, and least-privilege access.
  • Strong proficiency in Python, with experience building production infrastructure tooling, CLIs, and data/observability pipelines.
  • 2+ years of experience operating LLM-based systems in production (LLMOps) — inference routing, serving, tracing, and the reliability patterns needed to run them at scale.
  • Hands-on experience with observability/tracing stacks (Datadog, OpenTelemetry, Langfuse, or equivalent) and metrics/log/trace pipelines.
  • Experience designing and maintaining CI/CD pipelines, build systems, and developer tooling for fast-moving engineering teams.
  • A systems-and-operations mindset: you think about failure modes, SLOs, observability, security, and long-term maintainability before shipping.
  • Experience writing and leading technical design documents (TDDs/RFCs) for infrastructure-scale initiatives.
  • Strong collaboration skills across engineering, ML, product, security, and clinical teams.
  • A deep appreciation for safety, privacy, and security — ideally with experience in a regulated domain such as healthcare, fintech, or life sciences.

Benefits

  • Competitive salary & equity compensation for full-time roles
  • Unlimited PTO, company holidays, and quarterly mental health days
  • Comprehensive health benefits including medical, dental & vision, and parental leave
  • Employee Stock Purchase Program (ESPP)
  • 401k benefits with employer matching contribution
  • Offsite team retreats

More Systems Engineer Jobs

Role Description

The Senior Systems Engineer is responsible for leading and supporting Modern Work migration engagements for customers modernizing collaboration, messaging, identity, and endpoint management platforms. This role works directly with customers and internal delivery teams to assess current environments, define migration strategies, and implement successful transitions across Modern Work workloads including:

  • Microsoft Exchange Online
  • SharePoint Online
  • Teams
  • OneDrive
  • Intune
  • Microsoft Entra ID

The ideal candidate brings strong project delivery skills and hands-on experience planning and executing Modern Work migrations. This individual understands how to move organizations from legacy on-premises or private cloud Microsoft platforms into modern collaboration and productivity environments while balancing user experience, technical risk, security, and compliance requirements. The role requires strong customer communication, collaboration across engineering and project teams, and practical experience with Office 365 tenant-to-tenant migrations, collaboration workloads, identity, and endpoint management. Experience with Microsoft 365 government cloud environments and Microsoft Azure SaaS workloads is strongly preferred.

This position is part of NexusTek’s Professional Services organization, which includes specialized Engineering Practice Teams and a comprehensive Project Management Office. The primary focus of the role is to deliver Modern Work migration consulting and lead technical implementation services throughout the project lifecycle. As business needs require, the Senior Systems Engineer may also support:

  • Discovery
  • Solution design
  • Migration planning
  • Pre-sales activities
  • Process improvement initiatives for Modern Work and regulated cloud engagements

Qualifications

  • Minimum of 5 years delivering Modern Work, collaboration, messaging, identity, or endpoint modernization projects using Microsoft technologies.
  • Demonstrated experience migrating and supporting Exchange Online, SharePoint Online, Teams, OneDrive, Microsoft Entra ID, and Intune.
  • Practical experience with tenant-to-tenant migrations, hybrid coexistence scenarios, cutover planning, and post-migration remediation.
  • Strong knowledge of Modern Work security, compliance, and governance concepts including Conditional Access, retention, data protection, and administrative controls.
  • Hands-on experience with identity architecture and authentication services, including Microsoft Entra ID, federation, single sign-on, directory synchronization, and hybrid identity scenarios.
  • Experience integrating Modern Work and identity solutions with MFA and security platforms such as Microsoft Defender, Entra Conditional Access, Duo, Okta, and similar third-party access management or security technologies.
  • Experience with migration and administration tooling such as native Microsoft migration capabilities, PowerShell, and third-party migration platforms.
  • Experience with Azure services relevant to Modern Work and migration projects, including identity integration, virtual networking, storage, and cloud infrastructure dependencies.
  • Ability to translate technical requirements into clear migration plans, customer communications, documentation, and operational handoff materials.
  • Experience working in or supporting Microsoft 365 GCC or GCC High environments, including awareness of sovereign cloud constraints, compliance drivers, and workload limitations, is strongly preferred.

Requirements

  • Actively drive Modern Work migration projects to successful outcomes by engaging customers and internal teams as a lead engineer and trusted advisor.
  • Lead discovery sessions to understand current-state platforms, business objectives, compliance drivers, user impact, and migration dependencies.
  • Work with project management and delivery teams to ensure migration plans, milestones, risks, and technical deliverables remain aligned and executable.
  • Develop practical migration approaches that address mailbox, file, collaboration, identity, device, and security workloads with minimal business disruption.
  • Support pilot migrations, user readiness activities, cutover events, and post-migration issue resolution.
  • Create and maintain clear technical documentation, runbooks, configuration records, and transition materials for implemented solutions.
  • Participate in sales discovery to validate requirements, shape scope, identify assumptions and risks, and recommend migration strategies for Modern Work engagements.
  • Review and provide input on Statements of Work (SOWs) for technical accuracy, migration feasibility, assumptions, risks, and acceptance criteria.
  • Partner with Sales and Project Management from pre-sales through delivery to ensure the engagement is set up for a successful outcome.
  • Lead customer discussions that connect migration decisions to security, compliance, adoption, and long-term operational goals.
  • Maintain familiarity with structured delivery methodologies and adapt to fast-paced customer environments with strong judgment and just-in-time learning.

Benefits

  • Four weeks of annual accrued PTO
  • Seven paid national holidays
  • Medical, dental, vision options
  • Company-paid life insurance, short and long-term disability
  • Voluntary benefits such as critical illness and accident
  • Voluntary Legal Shield and identity theft protection
  • Discretionary annual 401k match plan
  • Generous employee referral bonus plan
  • Employee Assistance Program
  • Access to 90,000+ courses in ADP My Learning
  • StandOut employee engagement tools
  • Eligible to apply for a Pluralsight license

United States
$120K / year

Staff Machine Learning Systems Engineer, Embeddings Platform

Reddit

Reddit is an online platform utilized by thousands of communities to connect and converse about a wide variety of topics, including TV and movie fan theories, s

  • Architect and lead the development of next-generation, large-scale machine learning techniques.
  • Define and execute the ML strategy, identifying opportunities to enhance personalization and recommendation quality across Reddit.
  • Lead research initiatives on scalable machine learning systems and real-time model adaptation, bringing cutting-edge advancements into production.
  • Partner with ML infrastructure teams to build high-performance, distributed training systems that efficiently scale across multiple GPUs and cloud environments.
  • Establish and optimize real-time serving architectures for large-scale embeddings, ensuring low-latency inference and high throughput.
  • Collaborate cross-functionally with teams in Feed Ranking, Ads, Content Understanding, and Core ML to integrate ML models into Reddit’s key AI-driven systems.
  • Mentor and guide senior and mid-level ML engineers, fostering a culture of excellence, innovation, and knowledge sharing.
  • Stay at the forefront of AI research, evaluating and introducing new modeling paradigms to keep Reddit’s ML ecosystem cutting-edge.
  • Drive technical discussions, present findings to leadership, and contribute to long-term ML planning and decision-making.

United States
$253.3K - $354.6K / year

Product Designer - UX/UI & Design Systems

Pavago

Pavago specializes in connecting businesses with top-tier offshore talent in operations, sales, and marketing, offering a comprehensive recruitment solution designed to reduce cost

Role Description

We’re hiring a Product Designer (UX/UI) to own the end-to-end product experience across web and mobile applications. This role combines:

  • UX research
  • Product thinking
  • UI design
  • Design systems
  • User experience optimization

You’ll work closely with Product Managers and Engineers to create intuitive, scalable, and visually polished digital experiences that improve usability, engagement, and customer satisfaction. If you think in user journeys, flows, and systems, not just screens, this role is built for you.

What You’ll Do

UX Research & Product Thinking

  • Conduct user research, usability testing, and behavioral analysis
  • Translate insights into personas, journey maps, workflows, and user flows
  • Identify friction points and recommend UX improvements
  • Use analytics tools like Mixpanel and Amplitude to support product decisions

UX/UI Design Execution

  • Design wireframes, prototypes, and high-fidelity user interfaces
  • Create responsive experiences across web and mobile platforms
  • Build polished UI experiences aligned with modern SaaS standards
  • Use Figma, Adobe XD, or Sketch to deliver production-ready designs

Design Systems & Scalability

  • Build and maintain scalable design systems and component libraries
  • Define UI standards for typography, colors, spacing, and interaction patterns
  • Ensure consistency across all product surfaces and features
  • Improve collaboration between design and engineering teams

Accessibility & User Experience

  • Design accessible experiences aligned with WCAG 2.1 standards
  • Ensure usability across devices, browsers, and screen sizes
  • Create intuitive, user-friendly interfaces that reduce friction and improve engagement

Experimentation & Optimization

  • Run A/B tests on layouts, user flows, and interaction patterns
  • Analyze user behavior and improve conversion, retention, and engagement
  • Iterate quickly based on feedback, analytics, and usability findings

Collaboration & Developer Handoff

  • Work closely with Product Managers and Engineers throughout the product lifecycle
  • Deliver developer-ready assets using Figma Dev Mode or Zeplin
  • Provide clear interaction specs, design documentation, and UI guidance

Qualifications

  • 2+ years of Product Design or UX/UI Design experience
  • Strong portfolio showcasing: UX problem-solving, product thinking, UI execution, responsive design work
  • Proficiency with: Figma, Adobe XD, Sketch
  • Experience creating: user flows, wireframes, interactive prototypes, design systems
  • Strong understanding of: UX principles, accessibility, usability testing, responsive design

Requirements

Nice to Have:

  • Experience designing: SaaS products, consumer applications, enterprise software
  • Experience with scalable design systems
  • Basic front-end knowledge: HTML, CSS, JavaScript
  • Familiarity with: Mixpanel, Amplitude, product analytics tools

Benefits

  • Full ownership of product and UX/UI design
  • Direct impact on product experience and customer engagement
  • Work closely with Product and Engineering teams
  • Opportunity to shape scalable design systems
  • Strong growth opportunities into: Senior Product Designer, Design Lead, Head of Design

What a Typical Day Looks Like

  • Conduct UX research and analyze user behavior
  • Design wireframes and high-fidelity interfaces
  • Build prototypes and improve user flows
  • Collaborate with Product Managers and Engineers
  • Maintain and expand design systems
  • Run usability tests and optimize experiences based on feedback

Key Metrics (KPIs)

  • Improved usability and user satisfaction
  • Increased engagement and retention
  • Reduced friction in user flows
  • On-time delivery of design assets
  • Adoption and consistency of design systems
  • Reduced implementation issues for developers

Apply Now

If you:

  • Think in user journeys and product experiences
  • Combine UX thinking with strong visual design
  • Use data and feedback to improve products
  • Care deeply about usability, accessibility, and scalable systems

This role is a strong fit for you.

All locations: Brazil | Colombia | Argentina | Mexico

Lead Systems Designer

Insomniac Games

Founded in 1994, Insomniac Games has been developing "the world's best games with the world's best people" for more than two decades. The company has two studio

  • Accountable for design systems quality and on-time delivery
  • Directly supervises Systems Designers
  • Oversees design, prototyping, and iteration of gameplay systems
  • Collaborates with Gameplay Programming counterparts
  • Writes and updates gameplay system documentation
  • Maintains systems design schedule and prioritizes work
  • Reviews design work and drives iteration and quality

California
$152.9K - $229.3K / year