LLM Engineer Remote Jobs in Virginia (US)
This page tracks remote LLM engineer openings that are location-eligible for Virginia.
Open jobs
91
Hiring companies this week
7
Salary sample
$72,960 - $250,000
Jobs added last hour
0
91 Jobs
45 Companies
Role Description

Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, every scheduler tweak you land shows up directly in a customer's p99 and on our P&L.

This is a high-impact seat. It is also a high-autonomy seat as you'll be given the room to lead the technical direction of inference optimization at Kimchi, not execute someone else's roadmap.

The problem: running LLMs in production is a moving target. The "right" model and serving configuration for a workload depend on:
- Traffic shape
- Sequence-length distribution
- Batch dynamics
- GPU SKU
- Memory bandwidth
- Quantization tolerance
- A dozen other variables that shift week to week

Most teams pick a model once, over-provision GPUs, and absorb the cost. Kimchi is the system that makes that decision automatically - continuously matching workloads to the most cost-efficient, best-performing LLM and serving configuration on a customer's infrastructure. We're building the optimization layer between the model and the hardware, and we need engineers who understand both sides deeply.

Qualifications
- 5+ years building real ML systems, with a portfolio that shows depth in inference or training infrastructure (not just model training notebooks).
- Strong Python - production services, not scripts.
- Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working mental model of why an inference engine performs the way it does on a given GPU.
- Fluency with quantization tradeoffs - you've measured quality regressions, not just compression ratios.
- Comfort with distributed systems: collective communication, sharding strategies, and the practical failure modes of multi-GPU and multi-node setups.
- A bias toward measurement. You instrument before you optimize, and you can tell the difference between a real win and a benchmark artifact.
- Self-direction. This role comes with a wide mandate; you should be excited by that, not unsettled by it.

Requirements
- Push throughput. Continuous batching, speculative decoding, chunked prefill, kernel-level tuning across vLLM, SGLang, and TensorRT-LLM. Find the ceiling on each GPU SKU, then raise it.
- Cut latency. Attack TTFT and TPOT separately. Profile, identify the actual bottleneck (compute, memory bandwidth, scheduling, networking), and fix it - not the bottleneck you assumed.
- Get more out of the KV cache. Paged attention, prefix caching, eviction policies, cache reuse across requests, quantized KV. This is where a lot of the unrealized throughput lives, and it's an area you'll own.
- Quantize without regressing quality. INT8, INT4, FP8 across weights, activations, and KV. Empirical work: measure quality on real workloads, not just perplexity benchmarks.
- Shrink cold starts and memory footprint. Faster init, smarter weight loading, tighter memory accounting - the difference between a model that scales and one that doesn't.
- Scale across nodes. Distributed inference topologies, network-aware placement, checkpointing strategies that don't bottleneck on storage or interconnect.
- Set the technical direction. Decide what we benchmark, what we adopt, and what we build ourselves. Bring the team along with strong writeups and reproducible experiments.

Benefits
- Competitive salary (depending on the level of experience).
- Enjoy a flexible, remote-first global environment.
- Collaborate with a global team of cloud experts and innovators, passionate about pushing the boundaries of Kubernetes technology.
- Equity options.
- Get quick feedback with a fast-paced workflow. Most feature projects are completed in 1 to 4 weeks.
- Spend 10% of your work time on personal projects or self-improvement.
- Learning budget for professional and personal development - including access to international conferences and courses that elevate your skills.
- Annual hackathon to spark new ideas and strengthen team bonds.
- Team-building budget and company events to connect with your colleagues.
- Equipment budget to ensure you have everything you need.
- Extra days off to help maintain a healthy work-life balance.

Hiring process
- Screening call with Recruiter
- Hiring Manager interview
- Technical interview (system design)
- Live coding
- Culture Check interview with an executive

As part of our standard hiring process, we would like to inform you that a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr. Please note that Cast AI does not provide any form of visa sponsorship/work permit.
World’s largest social discovery company, uniting 70+ brands with 500M+ users
• Conducting experiments with LLMs: Explore and analyze the effectiveness of different architectures and techniques (SFT, RLHF, Adapters, etc.) to enhance the capabilities of AI models. • Developing and implementing evaluation methodologies: Implement and maintain robust frameworks to assess the performance, accuracy, and user satisfaction of AI bots, including offline and online metrics. • Optimizing models for inference: Improve the efficiency and speed of AI models during inference to ensure they meet the performance and scalability requirements for production environments. • Collaborating with cross-functional teams: Work closely with data scientists, software engineers, and product managers to integrate AI solutions into the overall product pipeline.
• Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction. • Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact. • Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure. • Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls. • Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis. • Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization. • Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows. • Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments. • Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership. • Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.
Impact Through Innovation
Astreya provides IT support services with a special focus on increasing productivity and employee satisfaction for its business clients. The company was founded
Role Description

The AI Infrastructure Datacenter Technical Project Manager Level 2 serves as a senior project leader responsible for managing large-scale AI infrastructure programs, complex technical deployments, and cross-functional strategic initiatives. This role drives execution excellence across compute, GPU, storage, networking, and data center infrastructure domains while ensuring alignment with business and operational objectives.

Key Responsibilities
- Lead large-scale AI infrastructure deployment programs across multiple sites, regions, or business units.
- Drive end-to-end project execution for GPU clusters, AI compute environments, storage platforms, high-speed networks, and data center infrastructure.
- Develop integrated project plans, implementation strategies, and operational readiness frameworks.
- Manage cross-functional coordination between engineering, operations, supply chain, vendors, and executive stakeholders.
- Identify and mitigate program risks, schedule impacts, technical dependencies, and operational constraints.
- Lead infrastructure migration, expansion, upgrade, and modernization initiatives.
- Drive governance reviews, project reporting, KPI tracking, and executive-level communications.
- Coordinate infrastructure acceptance testing, deployment validation, and production readiness activities.
- Mentor junior project managers and contribute to PMO process standardization and operational maturity.
- Support vendor negotiations, technical evaluations, and infrastructure planning initiatives.

Scope & Complexity
- Leads highly complex infrastructure programs with multiple concurrent workstreams.
- Manages enterprise-scale AI infrastructure deployments and operational initiatives.
- Influences program execution standards, governance models, and delivery methodologies.

Qualifications
- Advanced understanding of AI infrastructure technologies including GPU platforms, storage systems, networking, and data center operations.
- 8+ years of technical project or program management experience within infrastructure environments.
- Proven experience leading large-scale infrastructure deployment or transformation programs.
- Strong risk management, executive communication, and stakeholder alignment skills.
- Experience coordinating multi-vendor and cross-functional technical teams.
- Ability to manage complex schedules, budgets, and operational dependencies.
- Relevant certifications preferred (PMP, PgMP, ITIL, Agile, CCNA, etc.).

Requirements
- Salary Range: $72,960.00 - $100,800.00 USD (Salary)
- Please note that the salary information provided herein is base pay only (gross); it does not include other forms of compensation which may or may not apply to this specific position, namely, performance-based bonuses, benefits-related payments, or other general incentives - none of which are guaranteed, may be subject to specific eligibility requirements, and are wholly within the discretion of Astreya to remit.
- Further, the salary information noted above is a range that consists of a minimum and maximum rate of pay for this specific position. Where an applicant or employee is placed on this range will depend and be contingent on objective, documented work-related considerations like education, experience, certifications, licenses, preferred qualifications, among other factors.

Benefits
- Medical provided through UHC (PPO, HSA, Surest options)
- Medical provided through Kaiser (HMO option only) for California employees only
- Dental provided through UHC
- Nationwide Vision provided by UHC
- Flexible Spending Account for Health & Dependent Care
- Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
- Continuing Education and Professional Development via various integrated platforms, e.g. Udemy and Coursera
- Corporate Wellness Program provided by Goomi Group
- Employee Assistance Program
- Wellness Days
- 401k Plan
- Basic and Supplemental Life Insurance
- Short Term & Long Term Disability
- Critical Illness, Critical Hospital, and Voluntary Accident Insurance
- Tuition Reimbursement (available 6 months after start date, capped)
- Paid Time Off (accrued and prorated, maximum of 120 hours annually)
- Paid Holidays
- Any other statutory leaves, paid time, or other ancillary benefits required under state and federal law
Purpose-built connectivity solutions for intelligent systems
• Lead and manage global AI, Storage, and Networking hardware design programs, ensuring on-time delivery, scope control, and budget adherence • Drive program governance, risk management, and execution excellence across all phases of product development • Provide regular program updates, risk assessments, and financial reporting to executive leadership through structured reviews (e.g., Leadership Program Reviews) • Oversee the successful launch of complex hardware platforms, including AI GPU-based systems • Manage high-priority, resource-constrained programs while maintaining quality and schedule commitments • Enable innovation in next-generation AI infrastructure and high-performance computing platforms • Lead end-to-end program management for UALink / PCIe Gen6 switch tray development supporting rack-scale AI platforms and GPU clusters • Coordinate design, validation, and manufacturing readiness of switch trays across EVT, DVT, and PVT phases • Drive integration of switch silicon, retimer cards, cabling, and system-level connectivity within full rack-scale architectures • Collaborate with ODMs and partners to align on design specifications, cost models, and development schedules • Manage technical trade-offs across performance, signal integrity, power delivery, thermals, and scalability for high-density GPU deployments • Partner with Tier-1 customers to deliver Joint Design Manufacturing (JDM) programs • Ensure alignment with customer-specific requirements in design, supply chain, and quality • Build strong customer relationships and act as a trusted advisor throughout the product lifecycle • Lead globally distributed teams across engineering (HW/SW), supply chain, manufacturing, and quality organizations • Coordinate teams across multiple regions (e.g., North America and Asia) to drive seamless execution • Guide programs through EVT, DVT, and PVT phases, ensuring technical validation across electrical, thermal, power, signal integrity, and mechanical domains
• ARMRA is seeking a Sr. Director of AI Infrastructure to own, build, and scale the foundational AI infrastructure across the entire organization. • As ARMRA's dedicated AI leader, you will be the single owner of technical standards, governance, and the scalable architecture that connects all of ARMRA's business systems into a unified intelligent automation layer. • This is a critical, high-visibility, company-wide role. • You will inherit that foundation and take full ownership of it — driving improvements, closing gaps, implementing new capabilities, and expanding the infrastructure as the business evolves. • You will own and continuously develop a uniform, resilient, and high-performance AI playbook that serves every function of the business equally — from Retail and Operations to Marketing, Finance, and Customer Experience. • You understand that the best AI architecture is not about training models — it's about connecting systems, enforcing governance, and making intelligent automation reliable enough to trust across every surface of the business.
• Own Messaging & Positioning • Develop and refine SubQ’s core messaging, positioning, narrative, and category strategy across products, research initiatives, and GTM efforts. • Translate Complex Technical Concepts • Work closely with researchers and engineers to deeply understand complex AI infrastructure concepts and turn them into clear, credible, and differentiated messaging for technical audiences. • Drive Product Launches • Lead messaging and GTM support for product launches, benchmark releases, research announcements, demos, and technical campaigns. • Shape the Company Narrative • Help define how SubQ communicates its long-term vision, technical differentiation, and market positioning within the broader AI ecosystem. • Partner Across Teams • Collaborate cross-functionally with engineering, research, leadership, recruiting, GTM, and content teams to ensure consistency and clarity across all external communication. • Support Founder-Led Content • Partner with leadership on high-impact thought leadership, technical commentary, social content, conference materials, and strategic communications. • Understand the Market • Track trends across AI/ML, LLMs, inference, agents, long-context systems, developer tooling, open-source ecosystems, and frontier AI research to inform positioning and messaging strategy. • Develop Technical GTM Strategy • Help shape developer-facing and technical GTM initiatives including launch strategy, content strategy, competitive positioning, and ecosystem engagement.
Building Advanced AI & Cloud Native Software Using The "Virtual Silicon Valley" Model. Let’s Talk AI, Cloud and Outcomes.
• Develop and deploy ML models and LLM-based solutions for product outcome prediction • Build workflows where users input product details and receive predictive insights • Enhance an existing AI agent into a decision-making/prediction system • Work with product and assortment data for feature engineering and modeling • Integrate LLM capabilities to improve reasoning and contextual understanding • Continuously evaluate and improve model performance
Cloud, Python, PyTorch, Scikit-Learn, Java, AWS