LLM Engineer Remote Jobs in Florida (US)
This page tracks remote LLM engineer openings that are location-eligible for Florida.
Open jobs: 93
Hiring companies this week: 5
Salary sample: $72,960 - $150,000
Jobs added last hour: 0
93 Jobs
47 Companies
Role Description
Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, and every scheduler tweak you land shows up directly in a customer's p99 latency and on our P&L. This is a high-impact seat. It is also a high-autonomy seat: you'll be given room to lead the technical direction of inference optimization at Kimchi, not execute someone else's roadmap.
The problem: running LLMs in production is a moving target. The "right" model and serving configuration for a workload depend on:
- Traffic shape
- Sequence-length distribution
- Batch dynamics
- GPU SKU
- Memory bandwidth
- Quantization tolerance
- A dozen other variables that shift week to week
Most teams pick a model once, over-provision GPUs, and absorb the cost. Kimchi is the system that makes that decision automatically, continuously matching workloads to the most cost-efficient, best-performing LLM and serving configuration on a customer's infrastructure. We're building the optimization layer between the model and the hardware, and we need engineers who understand both sides deeply.
Qualifications
- 5+ years building real ML systems, with a portfolio that shows depth in inference or training infrastructure (not just model-training notebooks).
- Strong Python: production services, not scripts.
- Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working mental model of why an inference engine performs the way it does on a given GPU.
- Fluency with quantization tradeoffs: you've measured quality regressions, not just compression ratios.
- Comfort with distributed systems: collective communication, sharding strategies, and the practical failure modes of multi-GPU and multi-node setups.
- A bias toward measurement. You instrument before you optimize, and you can tell the difference between a real win and a benchmark artifact.
- Self-direction. This role comes with a wide mandate; you should be excited by that, not unsettled by it.
Requirements
- Push throughput. Continuous batching, speculative decoding, chunked prefill, and kernel-level tuning across vLLM, SGLang, and TensorRT-LLM. Find the ceiling on each GPU SKU, then raise it.
- Cut latency. Attack TTFT (time to first token) and TPOT (time per output token) separately. Profile, identify the actual bottleneck (compute, memory bandwidth, scheduling, networking), and fix it, not the bottleneck you assumed.
- Get more out of the KV cache. Paged attention, prefix caching, eviction policies, cache reuse across requests, and quantized KV. This is where a lot of the unrealized throughput lives, and it's an area you'll own.
- Quantize without regressing quality. INT8, INT4, and FP8 across weights, activations, and KV. Empirical work: measure quality on real workloads, not just perplexity benchmarks.
- Shrink cold starts and memory footprint. Faster init, smarter weight loading, and tighter memory accounting: the difference between a model that scales and one that doesn't.
- Scale across nodes. Distributed inference topologies, network-aware placement, and checkpointing strategies that don't bottleneck on storage or interconnect.
- Set the technical direction. Decide what we benchmark, what we adopt, and what we build ourselves. Bring the team along with strong writeups and reproducible experiments.
Benefits
- Competitive salary (depending on level of experience).
- A flexible, remote-first global environment.
- Collaborate with a global team of cloud experts and innovators passionate about pushing the boundaries of Kubernetes technology.
- Equity options.
- Quick feedback in a fast-paced workflow. Most feature projects are completed in 1 to 4 weeks.
- Spend 10% of your work time on personal projects or self-improvement.
- Learning budget for professional and personal development, including access to international conferences and courses that elevate your skills.
- Annual hackathon to spark new ideas and strengthen team bonds.
- Team-building budget and company events to connect with your colleagues.
- Equipment budget to ensure you have everything you need.
- Extra days off to help maintain a healthy work-life balance.
Hiring process
- Screening call with a recruiter
- Hiring manager interview
- Technical interview (system design)
- Live coding
- Culture check interview with an executive
As part of our standard hiring process, a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr. Please note that Cast AI does not provide any form of visa sponsorship or work permit.
One of the world's largest social discovery companies, uniting 70+ brands with 500M+ users
• Conducting experiments with LLMs: Explore and analyze the effectiveness of different architectures and techniques (SFT, RLHF, Adapters, etc.) to enhance the capabilities of AI models.
• Developing and implementing evaluation methodologies: Implement and maintain robust frameworks to assess the performance, accuracy, and user satisfaction of AI bots, including offline and online metrics.
• Optimizing models for inference: Improve the efficiency and speed of AI models during inference to ensure they meet the performance and scalability requirements for production environments.
• Collaborating with cross-functional teams: Work closely with data scientists, software engineers, and product managers to integrate AI solutions into the overall product pipeline.
IT services that put people at the center of your business
Role Description
The AI Infrastructure Datacenter Technical Project Manager Level 2 serves as a senior project leader responsible for managing large-scale AI infrastructure programs, complex technical deployments, and cross-functional strategic initiatives. This role drives execution excellence across compute, GPU, storage, networking, and data center infrastructure domains while ensuring alignment with business and operational objectives.
Key Responsibilities
- Lead large-scale AI infrastructure deployment programs across multiple sites, regions, or business units.
- Drive end-to-end project execution for GPU clusters, AI compute environments, storage platforms, high-speed networks, and data center infrastructure.
- Develop integrated project plans, implementation strategies, and operational readiness frameworks.
- Manage cross-functional coordination between engineering, operations, supply chain, vendors, and executive stakeholders.
- Identify and mitigate program risks, schedule impacts, technical dependencies, and operational constraints.
- Lead infrastructure migration, expansion, upgrade, and modernization initiatives.
- Drive governance reviews, project reporting, KPI tracking, and executive-level communications.
- Coordinate infrastructure acceptance testing, deployment validation, and production readiness activities.
- Mentor junior project managers and contribute to PMO process standardization and operational maturity.
- Support vendor negotiations, technical evaluations, and infrastructure planning initiatives.
Scope & Complexity
- Leads highly complex infrastructure programs with multiple concurrent workstreams.
- Manages enterprise-scale AI infrastructure deployments and operational initiatives.
- Influences program execution standards, governance models, and delivery methodologies.
Qualifications
- Advanced understanding of AI infrastructure technologies, including GPU platforms, storage systems, networking, and data center operations.
- 8+ years of technical project or program management experience within infrastructure environments.
- Proven experience leading large-scale infrastructure deployment or transformation programs.
- Strong risk management, executive communication, and stakeholder alignment skills.
- Experience coordinating multi-vendor and cross-functional technical teams.
- Ability to manage complex schedules, budgets, and operational dependencies.
- Relevant certifications preferred (PMP, PgMP, ITIL, Agile, CCNA, etc.).
Requirements
- Salary Range: $72,960.00 - $100,800.00 USD (Salary)
- Please note that the salary information provided herein is base pay only (gross); it does not include other forms of compensation that may or may not apply to this specific position, namely performance-based bonuses, benefits-related payments, or other general incentives, none of which are guaranteed, may be subject to specific eligibility requirements, and are wholly within the discretion of Astreya to remit.
- Further, the salary information noted above is a range consisting of a minimum and maximum rate of pay for this specific position. Where an applicant or employee is placed on this range will depend on, and be contingent on, objective, documented work-related considerations such as education, experience, certifications, licenses, and preferred qualifications, among other factors.
Benefits
- Medical provided through UHC (PPO, HSA, Surest options)
- Medical provided through Kaiser (HMO option only) for California employees only
- Dental provided through UHC
- Nationwide vision provided by UHC
- Flexible Spending Account for Health & Dependent Care
- Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
- Continuing Education and Professional Development via various integrated platforms (e.g., Udemy and Coursera)
- Corporate Wellness Program provided by Goomi Group
- Employee Assistance Program
- Wellness Days
- 401k Plan
- Basic and Supplemental Life Insurance
- Short Term & Long Term Disability
- Critical Illness, Critical Hospital, and Voluntary Accident Insurance
- Tuition Reimbursement (available 6 months after start date, capped)
- Paid Time Off (accrued and prorated, maximum of 120 hours annually)
- Paid Holidays
- Any other statutory leaves, paid time, or other ancillary benefits required under state and federal law
Purpose-built connectivity solutions for intelligent systems
• Lead and manage global AI, Storage, and Networking hardware design programs, ensuring on-time delivery, scope control, and budget adherence
• Drive program governance, risk management, and execution excellence across all phases of product development
• Provide regular program updates, risk assessments, and financial reporting to executive leadership through structured reviews (e.g., Leadership Program Reviews)
• Oversee the successful launch of complex hardware platforms, including AI GPU-based systems
• Manage high-priority, resource-constrained programs while maintaining quality and schedule commitments
• Enable innovation in next-generation AI infrastructure and high-performance computing platforms
• Lead end-to-end program management for UALink / PCIe Gen6 switch tray development supporting rack-scale AI platforms and GPU clusters
• Coordinate design, validation, and manufacturing readiness of switch trays across EVT, DVT, and PVT phases
• Drive integration of switch silicon, retimer cards, cabling, and system-level connectivity within full rack-scale architectures
• Collaborate with ODMs and partners to align on design specifications, cost models, and development schedules
• Manage technical trade-offs across performance, signal integrity, power delivery, thermals, and scalability for high-density GPU deployments
• Partner with Tier-1 customers to deliver Joint Design Manufacturing (JDM) programs
• Ensure alignment with customer-specific requirements in design, supply chain, and quality
• Build strong customer relationships and act as a trusted advisor throughout the product lifecycle
• Lead globally distributed teams across engineering (HW/SW), supply chain, manufacturing, and quality organizations
• Coordinate teams across multiple regions (e.g., North America and Asia) to drive seamless execution
• Guide programs through EVT, DVT, and PPVT phases, ensuring technical validation across electrical, thermal, power, signal integrity, and mechanical domains
• ARMRA is seeking a Sr. Director of AI Infrastructure to own, build, and scale the foundational AI infrastructure across the entire organization.
• As ARMRA's dedicated AI leader, you will be the single owner of technical standards, governance, and the scalable architecture that connects all of ARMRA's business systems into a unified intelligent automation layer.
• This is a critical, high-visibility, company-wide role.
• You will inherit that foundation and take full ownership of it: driving improvements, closing gaps, implementing new capabilities, and expanding the infrastructure as the business evolves.
• You will own and continuously develop a uniform, resilient, and high-performance AI playbook that serves every function of the business equally, from Retail and Operations to Marketing, Finance, and Customer Experience.
• You understand that the best AI architecture is not about training models; it's about connecting systems, enforcing governance, and making intelligent automation reliable enough to trust across every surface of the business.
• Own Messaging & Positioning: Develop and refine SubQ's core messaging, positioning, narrative, and category strategy across products, research initiatives, and GTM efforts.
• Translate Complex Technical Concepts: Work closely with researchers and engineers to deeply understand complex AI infrastructure concepts and turn them into clear, credible, and differentiated messaging for technical audiences.
• Drive Product Launches: Lead messaging and GTM support for product launches, benchmark releases, research announcements, demos, and technical campaigns.
• Shape the Company Narrative: Help define how SubQ communicates its long-term vision, technical differentiation, and market positioning within the broader AI ecosystem.
• Partner Across Teams: Collaborate cross-functionally with engineering, research, leadership, recruiting, GTM, and content teams to ensure consistency and clarity across all external communication.
• Support Founder-Led Content: Partner with leadership on high-impact thought leadership, technical commentary, social content, conference materials, and strategic communications.
• Understand the Market: Track trends across AI/ML, LLMs, inference, agents, long-context systems, developer tooling, open-source ecosystems, and frontier AI research to inform positioning and messaging strategy.
• Develop Technical GTM Strategy: Help shape developer-facing and technical GTM initiatives, including launch strategy, content strategy, competitive positioning, and ecosystem engagement.
Building Advanced AI & Cloud-Native Software Using the "Virtual Silicon Valley" Model. Let's Talk AI, Cloud and Outcomes.
• Develop and deploy ML models and LLM-based solutions for product outcome prediction
• Build workflows where users input product details and receive predictive insights
• Enhance an existing AI agent into a decision-making/prediction system
• Work with product and assortment data for feature engineering and modeling
• Integrate LLM capabilities to improve reasoning and contextual understanding
• Continuously evaluate and improve model performance
Founded in 1981, Infosys is an information technology and services company providing consulting, outsourcing, technology, and next-generation services to clients.
• Research, design, develop, and deploy innovative AI/ML and Generative AI solutions across various mission-focused problem sets
• Support development of advanced AI capabilities using technologies such as large language models and cloud-native AI services, including AWS Bedrock
• Participate in all phases of the software engineering lifecycle, including research, requirements analysis, solution design, model development, integration, deployment, evaluation, and testing
• Oversee day-to-day management, reliability, and performance of corporate IT systems and endpoints
• Manage and scale cloud-first IT infrastructure (identity, device management, collaboration tools)
• Own lifecycle management of employee hardware, software, and access
• Support and manage internal AI tooling and infrastructure used across the company
• Partner with Engineering and Security to ensure responsible, secure, and scalable use of AI systems
• Evaluate and implement AI-driven tools to improve internal productivity and automation
• Own the SaaS application ecosystem (e.g., Microsoft 365, Slack, Zoom)
• Design and automate onboarding/offboarding workflows, including provisioning and deprovisioning access
• Ensure proper access controls, least-privilege principles, and audit readiness across systems
• Maintain integrations between identity providers (Entra ID/Okta) and downstream applications
• Manage and maintain office IT infrastructure, including networking, Wi-Fi, conferencing systems, and physical IT environments
• Ensure secure and reliable connectivity for both in-office and hybrid employees
• Drive and support security and compliance initiatives (e.g., DPO, TIA, SOC 2 readiness)
• Build and maintain clear documentation and scalable IT support processes
Cloud, AWS, Python, PyTorch, Scikit-Learn, Azure