Nebius

Nebius is a European AI infrastructure company based in Amsterdam, North Holland, the Netherlands, specializing in full-stack AI solutions. The company offers l

Principal ML Solutions Architect

Location

United States

Posted

15 hours ago

Salary

$208K - $261K / year

Seniority

Lead

Job Description

Role Description

This position sits within Nebius Token Factory, our serverless platform for running and customizing open-source LLMs in production. Token Factory allows for serverless inference and fine-tuning (LoRA, full FT, RFT) backed by in-house optimizations like custom speculative decoding, quantization, cache-aware routing, and dedicated endpoints. Customers come to us to move from prototype to scaled production without the cost and complexity of building and tuning their own inference stack.

We're looking for a Principal ML Solutions Architect to act as the most senior technical authority for customers leveraging Token Factory's serverless inference and fine-tuning platforms.

Your responsibilities will include:

- Own the most complex, highest-stakes customer engagements from architecture through production across multiple modalities, driving measurable business value.
- Optimize LLM inference at the framework and hardware level and codify the resulting best practices into reusable playbooks for the team.
- Lead supervised and reinforcement fine-tuning efforts to maximize model quality.
- Design and implement production-ready LLM solutions using Token Factory's inference services.
- Provide deep technical expertise in prompt engineering, RAG architectures, model selection, and cost/performance trade-offs at scale.
- Partner closely with product, engineering, and research to surface customer needs, prototype platform features, and directly influence the roadmap.
- Guide customers from PoC to production with a focus on performance, reliability, and cost efficiency, and define the standards by which the team does so.
- Mentor Senior and mid-level Solutions Architects; raise the technical bar of the team through review, enablement, and knowledge sharing.
- Represent Token Factory externally through talks, blog posts, and conferences.

Qualifications

- 8+ years of experience in ML/AI systems, with at least 4 years focused on LLMs and generative AI.
- Demonstrated technical leadership: owning ambiguous, high-impact problems end to end and influencing decisions across teams and customers.
- Expert knowledge of the LLM ecosystem: model architectures, fine-tuning approaches, and inference internals.
- Deep, hands-on command of inference optimization: quantization, KV-cache management, batching, routing, etc.
- Hands-on experience with running LLMs in production at scale: deploying, operating, and debugging inference workloads down to the framework level.
- LLM fine-tuning, including SFT/LoRA and data preparation/curation; experience with RL-based fine-tuning.
- LLM evaluation: building task-specific benchmarks and offline/online eval pipelines, including LLM-as-a-judge setups.
- Inference frameworks and libraries (vLLM, SGLang, TensorRT-LLM), including the ability to read, modify, and contribute to their internals.
- Deploying LLM-powered applications using APIs from OpenAI, Anthropic, or open-source models.
- Strong Python programming skills.
- Excellent communication skills, with the ability to clearly explain technical concepts to diverse audiences, from engineers to executives.

Requirements

- Contributions or maintainership in major OSS inference/ML projects (vLLM, SGLang, TensorRT-LLM).
- Published research, conference talks, or widely-read technical writing in the LLM/serving space.
- Deep work with multimodal AI models (vision-language, speech).
- Proficiency with DevOps tooling (Docker, Kubernetes) and infrastructure-as-code.
- Experience building or owning internal tooling/automation for ML workflows at scale.

Benefits

- Health Insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
- 401(k) Plan: Up to 4% company match with immediate vesting.
- Parental Leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote Work Reimbursement: Up to $85/month for mobile and internet.
- Disability & Life Insurance: Company-paid short-term, long-term, and life insurance coverage.
- Competitive compensation and benefits packages.
- Career growth and learning opportunities.
- Flexibility and ownership.
- Collaborative and innovative culture.
- Opportunity to work on impactful AI projects.
- International environment and talented teams.

More Solutions Engineer Jobs

Senior Associate - Workload Automation Engineer

New York Life

New York Life is headquartered in New York, New York, and offers a portfolio of products for life insurance, long-term care insurance, retirement, investment an

Solutions Engineer · 17 hours ago
Full Time · Remote · Team 12,000 · Since 1845

Title: Senior Associate - Workload Automation Engineer
Location: United States
Requisition ID: 93700
Department: Tech Data AI Ventures
Job Function: Tech Data AI Ventures

Job Description

Role Summary

Serve as the engineering owner for New York Life's enterprise workload automation platform, ensuring the reliability, scalability, and resilience of mission-critical scheduling and batch processing services across on-premises and cloud environments.

In this role, you will operate and enhance enterprise scheduling platforms, calendars, and workload orchestration services while designing resilient restart and recovery patterns that enable critical business processes to run predictably and consistently. You will help establish standards for job design, logging, monitoring, and audit readiness, supporting a secure, compliant, and automation-first operating model.

Using platform engineering and Site Reliability Engineering (SRE) practices, you will automate operational processes, define and monitor service-level objectives (SLOs), improve observability, and lead incident response efforts to minimize downtime and improve system resilience. Your work will help ensure critical workload automation services consistently meet performance expectations and business service level agreements.

What You'll Do

Platform Engineering & Reliability

- Operate and maintain enterprise scheduling controllers and agents across on-premises and cloud environments.
- Manage calendars, SLAs, alerting, and escalation processes.
- Design resilient restart, recovery, and rerun frameworks for critical workloads.
- Define platform standards, governance practices, SLIs, and SLOs.
- Lead platform upgrades, configuration management, and lifecycle maintenance.

Automation & Operational Excellence

- Implement job-as-code and configuration-as-code practices using Git and CI/CD pipelines.
- Develop automation solutions using PowerShell, Python, APIs, SQL, Terraform, and related technologies.
- Improve workload orchestration, dependency management, and operational consistency.
- Build monitoring, dashboards, alerts, and health checks to improve visibility and reliability.
- Lead incident triage, root cause analysis, and post-incident improvements.

Observability & Operations

- Integrate workload automation platforms with monitoring and observability solutions.
- Build dashboards, metrics, and alerts to improve visibility and operational efficiency.
- Lead incident triage, root cause analysis, recovery efforts, and post-incident improvements.
- Optimize workload performance, resource utilization, and platform reliability.

Partnership & Governance

- Partner with application, cloud, database, security, and operations teams to ensure reliable workload execution.
- Provide guidance on workload design, scheduling strategies, dependency management, and error handling.
- Support audit, compliance, and operational governance requirements.
- Document standards, playbooks, and best practices while mentoring team members.

What You'll Bring

- 5–8+ years of experience in workload automation, platform engineering, Site Reliability Engineering (SRE), production operations, or related environments.
- Hands-on experience with Stonebranch preferred, or another enterprise workload automation platform such as BMC Control-M, AutoSys, ESP, CA-7, IBM Workload Scheduler (TWS), Redwood RunMyJobs, ActiveBatch, JAMS, Tidal, Automic (UC4), or OpCon.
- Experience supporting enterprise batch processing, dependency modeling, workload orchestration, SLA management, and recovery processes.
- Strong scripting and automation skills using PowerShell, Python, Bash, SQL, REST APIs, JSON, and YAML.
- Experience with Git, CI/CD pipelines, Infrastructure as Code (Terraform, CloudFormation, AWS CDK, or similar), and automation tooling (Ansible, JFrog Artifactory, etc.).
- Strong AWS experience, including EC2, S3, Lambda, RDS/DynamoDB, VPC networking, observability, and high-availability architectures.
- Proven design and implementation of restart/rerun patterns, dependency modeling, and idempotent batch frameworks.
- Excellent coordination skills across incident and change processes, with clear, concise communication to technical and non-technical stakeholders.
- Strong troubleshooting, communication, and stakeholder management skills.

Nice to Have

- Experience in financial services, insurance, healthcare, or other highly regulated industries.
- Experience standardizing workload automation platforms and operational governance practices.
- Experience integrating workload automation platforms with enterprise monitoring and observability solutions.
- AWS, ITIL, CISSP, or related certifications.

How Success Will Be Measured

- Reduced SLA jeopardy events, breaches, and recovery times (MTTR).
- Increased adoption of standardized job templates, recovery patterns, and automated validation checks.
- Improved audit readiness through consistent logging, documentation, and evidence collection.
- Reduced manual intervention and alert noise while improving workload completion rates and platform reliability.

Pay Transparency

Salary Range: $90,000-$128,500
Overtime eligible: Exempt
Discretionary bonus eligible: Yes
Sales bonus eligible: No

Actual base salary will be determined based on several factors, including but not limited to an individual's experience, skills, qualifications, and job location. Additionally, employees are eligible for an annual discretionary bonus. In addition to base salary, employees may also be eligible to participate in an incentive program.

Company Overview

At New York Life, our 180-year legacy of purpose and integrity fuels our future. As we evolve into a more technology-, data-, and AI-enabled organization, we remain grounded in the values that drive lasting impact. Our diverse business portfolio creates opportunities to make a difference across industries and communities, inviting bold thinking, collaborative problem-solving, and purpose-driven innovation. Here, you'll find the rare balance of long-standing stability and forward momentum, supported by an inclusive team that honors tradition while embracing progress. As a Fortune 100 mutual company, we offer a place to grow your skills, contribute to meaningful work, and deliver solutions that matter. Your ideas drive what's next, and your growth powers it.

Our Benefits

We provide a full package of benefits for employees and have unique offerings for a modern workforce, including leave programs, adoption assistance, and student loan repayment programs. Based on feedback from our employees, we continue to refine and add benefits to our offering, so that you can flourish both inside and outside of work.

New York
$90K - $128.5K / year

Solutions Engineer

eFlexervices

Your customer-centric, performance-driven, trustworthy offshoring partner.

Solutions Engineer · 19 hours ago
Full Time · Remote · Team 51-200 · Since 2001 · H1B No Sponsor

• Write, review, and debug transform scripts that modify storefront responses at the edge, ensuring correctness across device types, locales, and customer groups.
• Actively monitor customer trials and react to issues or anomalies fast, whether that's a code change or an internal escalation; speed of solution is key.
• Use browser DevTools, Honeycomb, and our client's internal APIs to diagnose caching anomalies, TTFB regressions, and personalization conflicts in production environments.
• Own the technical relationship with a portfolio of merchant accounts, acting as the primary engineering point of contact for escalations.
• Build internal tooling, documentation, and playbooks that scale the team's ability to onboard and support customers faster.
• Surface product feedback from the field: you'll have more first-hand exposure to real-world edge cases than anyone else in the company, and your input will directly shape the roadmap.

Philippines
Full Time · Remote · Team 1,001-5,000 · H1B Sponsor

• Assist with Modern Platforms EUC Strategy
• Meet with clients to develop statements of work, BOMs, or presentations
• Attend customer meetings and present to staff from Helpdesk to C-Level
• Collect business and technical requirements
• Drive whiteboard conversations and deliver technical presentations
• Support proof-of-concept efforts and create Bill of Materials (BOMs)
• Create and deliver proposals
• Provide daily work direction to the solutions architect vertical
• Serve as primary escalation path for solutions architects and clients
• Provide regular status updates to engineering managers

United States
Full Time · Remote · Team 10,001+ · Since 1978 · H1B No Sponsor

• Design and implement scalable, secure, and resilient solutions based on microservices and event-driven architectures
• Define integration patterns using APIs, event buses, and synchronous/asynchronous communication models
• Collaborate with development, infrastructure, security, and business teams to deliver architectural solutions aligned with business needs
• Create architecture documentation, technical designs, and integration standards
• Participate in architecture reviews and support technical decision-making throughout the software development lifecycle

Mexico