Model Serving Engineer

Engineer Full Time Remote Mid Level

Location

United States

Posted

4 days ago

Salary

$100K - $150K / year

Seniority

Mid Level

AI/ML AI Observability/Monitoring Distributed Systems Python Rust C++ LLM Kubernetes

Job Description

Role Description

We are seeking a Model Serving Engineer to design, build, and operate high-performance, highly reliable inference platforms for serving large machine learning models in production. The role focuses on the systems engineering side of AI deployment, including:

- Request routing
- Batching
- Caching
- Autoscaling
- GPU utilization
- End-to-end observability across diverse model workloads

The ideal candidate brings strong distributed systems and performance engineering expertise, has shipped serving systems at scale, and understands the trade-offs between latency, throughput, cost, and quality in ML serving.

Qualifications

- Bachelor’s or Master’s degree in Computer Science or a related field
- Six or more years of experience in distributed systems, infrastructure, or ML platform engineering
- Strong proficiency in Python and a systems language such as Go, Rust, or C++
- Deep experience operating high-throughput, low-latency services in production
- Hands-on experience with LLM or large model inference frameworks such as vLLM or TensorRT-LLM
- Strong understanding of GPU architecture, memory hierarchies, and accelerator utilization
- Familiarity with Kubernetes, autoscaling, and modern cloud platforms
- Experience with observability stacks including metrics, tracing, and structured logging
- Solid grounding in performance engineering and capacity planning
- Strong communication and incident response skills

Requirements

- Design and operate model serving platforms supporting diverse workloads including LLMs, vision models, and recommendation systems
- Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing
- Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints
- Build autoscaling and capacity management systems that balance latency, throughput, and cost
- Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads
- Integrate model serving with API gateways, identity systems, and observability platforms
- Implement caching, prompt deduplication, and response reuse strategies where appropriate
- Drive end-to-end observability including latency histograms, queue dynamics, GPU utilization, and error tracking
- Develop deployment workflows including canary releases, shadow testing, and automated rollback
- Operate incident response for high-availability AI services and drive durable reliability improvements
- Collaborate with ML and product teams to support new model releases and capability rollouts
- Implement security controls including request signing, content filtering, and abuse detection at the serving layer
- Document operational procedures, performance characteristics, and tuning guidance for internal teams
- Stay current with AI serving research and translate advances into production capabilities

Benefits

- Competitive base salary commensurate with experience
- Full-time, direct W2 employment with Bright Vision Technologies
- 100% remote work opportunity

More Engineer Jobs

Senior Deployment Engineer

Derq

Intelligent transportation systems for smarter, safer roads

Engineer 4 days ago

Full Time Remote Team 11-50 Since 2016 H1B No Sponsor

Role Description

We are looking for a Senior Deployment & Configuration Engineer to lead the setup, configuration, deployment, and reliability of our software and hardware solutions across client projects. In this role, you will:

- Own complex deployments, manage server and system configurations, validate production readiness, monitor live systems, and troubleshoot technical issues after go-live.
- Work closely with Project Management, Software Systems, and DevOps teams to improve deployment quality, standardize configurations, and keep performance and uptime high.
- Support process improvements, documentation, and knowledge transfer to help scale delivery across projects.

Key Responsibilities

- Configure and deploy Derq’s edge and cloud‑based traffic safety solutions across customer sites.
- Use Derq’s Deployment Manager and internal tools to onboard new sites and configure intersection layouts, detection zones, event mappings, cameras, sensors, traffic controllers, and V2X integrations.
- Support deployment planning and readiness, coordinating configuration activities, identifying risks early, and ensuring prerequisites are completed prior to go‑live.
- Act as a senior technical escalation point for deployment and configuration issues, working closely with Engineering, Product, Project Management, and Support teams.
- Troubleshoot complex technical issues across Linux servers, networking, sensors, cameras, traffic controllers, cloud connectivity, and software deployments.
- Monitor live systems, investigate alerts, support incident response, and drive root cause analysis for recurring operational issues.
- Develop scripts, automation, and tooling improvements to reduce manual effort and improve deployment quality, consistency, and speed.
- Contribute to GitHub repositories, including configuration files, deployment workflows, and internal tooling.
- Create and maintain deployment guides, runbooks, troubleshooting documentation, and configuration best practices.
- Support knowledge sharing across the team and help improve the consistency, quality, and scalability of Derq’s deployment operations.

Qualifications

- 5–7 years of experience in deployment, configuration, systems engineering, or a related technical role.
- Strong hands‑on experience with Linux server administration, including troubleshooting, log analysis, service monitoring, and performance diagnostics.
- Solid networking knowledge, including TCP/IP, routing, DNS, DHCP, NAT, VPNs, VLANs, firewalls, and port forwarding.
- Experience deploying, configuring, and supporting production systems in customer environments.
- Hands‑on scripting experience with Python and Bash/Shell to automate operational tasks and improve deployment efficiency.
- Experience working with Git and GitHub workflows, including commits, pull requests, and version control best practices.
- Good understanding of cloud and infrastructure concepts, particularly AWS, Docker, and CI/CD pipelines.
- Experience with edge computing, IoT systems, connected devices, video analytics, or hardware‑integrated software solutions.
- Strong ability to troubleshoot across software, infrastructure, networking, and connected devices.
- Experience creating technical documentation, including runbooks, configuration guides, and troubleshooting procedures.
- Experience with ITS, traffic controllers, SPaT/MAP, V2X, smart city deployments, or traffic management systems is a strong plus.

Linux GitHub Observability/Monitoring TCP/IP DNS Firewalls Python Shell Git AWS Docker CI/CD IoT

Brazil

Delivery Engineer – Data

Ollion

Ollion is the global, born-in-the-cloud consultancy working together to unify business-shaping tech for good.

Engineer 4 days ago

Full Time Remote Team 501-1,000 Since 2023 H1B No Sponsor

• Collaborate with business and technical leaders to design and build robust data engineering solutions
• Implement scalable ELT/ETL pipelines, cloud data warehouse structures, and dimensional data models
• Design and implement intuitive, enterprise-grade dashboards and semantic models using Power BI
• Utilize advanced proficiency in SQL and Python for complex data manipulation and workflow automation
• Partner with project leads to enforce rigorous security best practices
• Develop and maintain comprehensive technical documentation for data flows, schemas, and operational processes
• Rapidly master cutting-edge, cloud-native technologies and acquire deep domain expertise across diverse client industries

Amazon Redshift AWS Azure BigQuery Cloud ETL Google Cloud Platform Python SQL

Texas

$70K - $85K / year

Engineering Manager

QuinStreet

QuinStreet offers a decentralized online marketplace that empowers consumers by matching them with brands that meet their needs. A leader among “research and compare” networks,

Engineer 4 days ago

Full Time Remote

Role Description

We are seeking an Engineering Manager to lead the technical side of our platform engineering, integrations, and customer delivery experience. This role is critical to helping customers successfully integrate with our SaaS platform while enabling cross-functional teams to execute on complex, technical initiatives. As an Engineering Manager, you will act as the primary technical leader across Engineering, Sales, Business Development, Account Management, and Operations. You will drive platform capability, provide real-time technical guidance, and elevate documentation and processes as we scale. This role is ideal for someone who enjoys being collaborative, deeply technical, and highly solution-oriented—while also influencing how the engineering organization operates internally.

Responsibilities

- Technical Leadership & Delivery
  - Partner with Sales and Business Development as the technical lead throughout the sales cycle and delivery process.
  - Join customer calls to explain API functionality, sandbox environments, data formats, authentication, error handling, and platform constraints.
  - Lead technical reviews and tailored solution discussions aligned to customer and product requirements.
  - Answer complex technical questions live to remove blockers and accelerate execution.
  - Provide detailed post-call follow-ups including technical summaries, architecture diagrams, and next steps.
- Platform & API Subject Matter Expertise
  - Serve as the internal and external SME on our SaaS platform, APIs, integrations, and supported workflows.
  - Advise customers and internal teams on best-practice integration approaches and implementation strategies.
  - Support Business Development by clearly communicating carrier availability, supported programs, and technical feasibility.
  - Collaborate with Engineering to confirm technical requirements, edge cases, and delivery timelines.
- Integration Support & Technical Triage
  - Act as an escalation point for integration-related issues, reducing friction across Account Management, Operations, and Engineering.
  - Help define, support, and improve SLAs for technical tickets and customer integrations.
  - Identify recurring integration issues and drive root-cause resolution with Engineering.
  - Streamline technical workflows to reduce back-and-forth and improve time-to-resolution.
- Documentation, Enablement & Scale
  - Create and maintain high-quality technical documentation, including FAQs, integration guides, and implementation resources.
  - Analyze common integration questions and convert them into scalable documentation and knowledge-base content.
  - Improve internal enablement by aligning Sales, AMs, and Ops on accurate technical messaging and expectations.
- Cross-Functional Leadership
  - Partner closely with Engineering to align technical realities with customer-facing messaging.
  - Provide feedback on product gaps, integration friction, and roadmap risks.
  - Support clearer communication around product changes, launch timelines, and roadmap updates.

Qualifications

- 5+ years of experience as an Engineering Manager, Solutions Engineer, or Technical Lead in a SaaS environment.
- Strong experience with APIs, integrations, and technical system design.
- Proven ability to explain complex technical concepts to both technical and non-technical audiences.
- Experience supporting customer integrations, implementations, or onboarding.
- Excellent written and verbal communication skills, including technical documentation.
- Highly organized, proactive, and able to operate independently in ambiguous environments.

Requirements

- Experience in API-first, fintech, insurtech, or highly regulated SaaS platforms.
- Familiarity with carrier integrations or complex third-party ecosystems.
- Experience creating presales architecture diagrams or technical enablement materials.
- Exposure to ticketing systems and SLA-driven support models.

Benefits

- The expected salary range for this position is $130,000 USD to $155,000 USD annually.
- This salary range is an estimate, and the actual salary may vary based on the Company's compensation practices.
- The salary may be adjusted based on applicant's geographic location.
- The position is also eligible to receive performance bonus or commission and equity in the form of restricted stock units.
- This position is eligible to participate in the Company's standard employee benefits programs, which currently include health care benefits, retirement benefits, paid days off, and any other tax-reportable benefits.

Company Description

QuinStreet is an equal opportunity employer. We do not discriminate on the basis of race, color, religion, national origin, pregnancy status, sex, age, marital status, disability, sexual orientation, gender identity or any other characteristics protected by law. Please see QuinStreet’s Employee Privacy Notice here.

United States

$130K - $155K / year

Forward Deployed Engineer

Sutherland

We make digital 𝐡𝐮𝐦𝐚𝐧™ #MakeDigitalHuman

Engineer 4 days ago

Full Time Remote Team 10,001+ Since 1986 H1B Sponsor

• Partner directly with enterprise customers to understand business challenges and identify AI opportunities.
• Design, develop, and deploy AI-powered applications using Google Cloud and Generative AI technologies.
• Build solutions using LLMs, RAG, AI agents, APIs, and enterprise data platforms.
• Develop scalable backend services, integrations, and cloud-native applications.
• Lead technical workshops, architecture discussions, and solution design sessions with customers.
• Rapidly prototype, test, and iterate on AI use cases to drive business value.
• Collaborate with product managers, data scientists, and engineering teams to deliver production-ready solutions.
• Ensure solutions meet standards for scalability, security, reliability, and performance.
• Support customer deployments and serve as a trusted technical advisor throughout project delivery.
• Stay current with emerging AI and cloud technologies and help drive innovation across the organization.

BigQuery Cloud Distributed Systems Google Cloud Platform Java JavaScript Microservices Python Go

India
