Trase

AI, Uncomplicated.

Principal AI Researcher – Agentic Systems, AI Infrastructure

LLM Engineer | Machine Learning Engineer | Full Time | Remote | Lead | Team 11-50 | Since 2023 | H1B No Sponsor

Location

Virginia | Washington

Posted

4 days ago

Salary

$250K - $300K / year

Seniority

Lead

Postgraduate Degree | 12 yrs exp | English | Cloud | Java | Python

Job Description

  • Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction.
  • Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact.
  • Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure.
  • Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls.
  • Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis.
  • Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization.
  • Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows.
  • Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments.
  • Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership.
  • Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.

Job Requirements

  • 12–15+ years of experience in machine learning, AI systems, or applied AI research, including experience operating at a Principal, Distinguished, or equivalent technical level.
  • Strong research and publication track record, including authored papers, major technical contributions, or active participation in frontier AI research.
  • Experience publishing at top-tier conferences or contributing influential open-source, research, or AI infrastructure systems.
  • Experience conducting large-scale experimentation requiring significant compute infrastructure, evaluation workflows, and iterative model/system analysis.
  • Deep expertise in one or more areas including agentic systems, LLMs and generative AI, multi-agent systems, reasoning systems, reinforcement learning, orchestration infrastructure, AI systems reliability, NLP, multimodal systems, or deep learning.
  • Hands-on experience with agent-based systems, prompt engineering, RAG, RLHF, SLMs, fine-tuning/post-training techniques, tool integration, memory systems, and human-in-the-loop orchestration.
  • Proven experience building, deploying, and operating enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
  • Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, observability, evaluation frameworks, regression testing, and failure modes.
  • Strong systems thinking and demonstrated ability to partner cross-functionally with engineering and product organizations to move research into production systems.
  • Strong programming and prototyping skills in Python and modern ML infrastructure stacks, with experience in Java or related systems languages preferred.
  • Experience deploying AI/ML systems in regulated, constrained, or enterprise environments, and demonstrated ability to lead technical direction from research through production impact.

Benefits

  • Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
  • 100% employer-paid, comprehensive health care, including medical, dental, and vision, for you and your family.
  • Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
  • Unlimited PTO, with management approval.
  • Opportunities for professional development and continued learning.
  • Optional 401K, FSA, and equity incentives available.
  • Mental health benefits are available through Tara Mind.
