Job Closed

This listing is no longer active.

Model Serving Engineer

Location

United States

Posted

10 days ago

Salary

100K - 150K / year

Seniority

Mid Level

Job Description

Model Serving Engineer

Bright Vision Technologies

Role Description

We are seeking a Model Serving Engineer to design, build, and operate high-performance, highly reliable inference platforms for serving large machine learning models in production. The role focuses on the systems engineering side of AI deployment, including:
- Request routing
- Batching
- Caching
- Autoscaling
- GPU utilization
- End-to-end observability across diverse model workloads

The ideal candidate brings strong distributed systems and performance engineering expertise, has shipped serving systems at scale, and understands the trade-offs between latency, throughput, cost, and quality in ML serving.

Qualifications
- Bachelor’s or Master’s degree in Computer Science or a related field
- Six or more years of experience in distributed systems, infrastructure, or ML platform engineering
- Strong proficiency in Python and a systems language such as Go, Rust, or C++
- Deep experience operating high-throughput, low-latency services in production
- Hands-on experience with LLM or large-model inference frameworks such as vLLM or TensorRT-LLM
- Strong understanding of GPU architecture, memory hierarchies, and accelerator utilization
- Familiarity with Kubernetes, autoscaling, and modern cloud platforms
- Experience with observability stacks, including metrics, tracing, and structured logging
- Solid grounding in performance engineering and capacity planning
- Strong communication and incident response skills

Responsibilities
- Design and operate model serving platforms supporting diverse workloads, including LLMs, vision models, and recommendation systems
- Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing
- Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints
- Build autoscaling and capacity management systems that balance latency, throughput, and cost
- Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads
- Integrate model serving with API gateways, identity systems, and observability platforms
- Implement caching, prompt deduplication, and response reuse strategies where appropriate
- Drive end-to-end observability, including latency histograms, queue dynamics, GPU utilization, and error tracking
- Develop deployment workflows, including canary releases, shadow testing, and automated rollback
- Operate incident response for high-availability AI services and drive durable reliability improvements
- Collaborate with ML and product teams to support new model releases and capability rollouts
- Implement security controls at the serving layer, including request signing, content filtering, and abuse detection
- Document operational procedures, performance characteristics, and tuning guidance for internal teams
- Stay current with AI serving research and translate advances into production capabilities

Benefits
- Competitive base salary commensurate with experience
- Benefits package

How to Apply

For immediate consideration, please send your resume to [email protected] or contact us at (908) 676-4399. Learn more about Bright Vision Technologies at www.bvteck.com.

More Engineer Jobs

Demo Engineer

Okta

The World's Identity Company

Engineer | 10 days ago
Full Time | Remote | Team 5,001-10,000 | Since 2010 | H1B Sponsor

Role Description

Demo Engineering sits at the intersection of go-to-market and product, serving as a global accelerator for Okta's technical sales success. Our team provides our internal and partner technical sellers worldwide with the capabilities to quickly and securely showcase how Okta secures identity for employees, customers, and AI agents.

As a member of this team, you'll have a global impact on Okta’s go-to-market technology strategy: the demo components and customer experiences you build will be used by hundreds of Solution Engineers across dozens of countries, reaching thousands of customers and prospects. You'll work in a highly collaborative, engineering-driven culture where we prioritize reusability, automation, and self-service enablement. We're building the platform and tooling that make Okta's field organization more effective, scalable, and customer-focused.

What you get to do in this role:
- Design and build reusable demonstration assets that encapsulate product configurations used across multiple demonstrations for solution-oriented outcomes.
- Work directly with field teams to understand customer perspectives and capture their expectations, preferences, and aversions in a “Voice of the Customer.”
- Work with the Field Readiness team to drive adoption of the Demo Platform within the go-to-market organization.
- Build analytics and reporting to track demo usage and effectiveness.
- Collaborate with Product Managers and Engineering to prepare assets that support demonstrations of new product introductions.
- Participate in the release planning process to influence product direction based on customer feedback.
- Create supporting documents for demos, such as technical demonstration guides and video examples.

Qualifications
- Bachelor’s degree or equivalent experience.
- 5+ years of developer experience with Enterprise SaaS products.
- Full-stack development experience: React.js, Node.js, AWS Serverless patterns (Lambda, DynamoDB, SQS, and SNS).
- Experience designing and building for multi-tenancy and tenant isolation.
- Experience with infrastructure-as-code or configuration management tools.
- API integration experience for connecting applications to identity platforms.
- Working technical knowledge of digital identity and authentication (OAuth, OIDC, and SAML).

Requirements
- Global first: comfortable working in and supporting a globally distributed team.
- Self-directed and proactive: identifies opportunities to improve and drives solutions without waiting for direction.
- Collaborative: thrives in cross-functional environments working with field teams, product, and engineering.
- Systems thinker: balances immediate needs with long-term scalability and reusability.
- Customer-centric approach: ability to gather and synthesize "Voice of Customer" feedback into actionable requirements.

Nice to Have
- Experience with demo engineering teams or technical marketing.
- Experience with presales.
- Experience with Okta or Auth0.

Benefits
- The OTE range for this position for candidates located in the San Francisco Bay Area is between $179,000 and $246,000 USD.
- The annual OTE range for this position for candidates located in California (excluding the San Francisco Bay Area), Colorado, Illinois, New York, and Washington is between $160,000 and $220,000 USD.
- Okta offers equity (where applicable) and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies.

United States
$160K - $246K / year
Job Closed

Role Description

We are looking for an Observability Engineer to design and operate the metrics, logging, tracing, and alerting platforms that give engineering teams confidence in the systems they run. The role spans the full observability stack, from collection agents and pipelines to long-term storage, dashboards, and alerting workflows, with a strong focus on usability, signal quality, and operational ROI. The ideal candidate has built and operated observability platforms at scale, understands the trade-offs between open-source and SaaS approaches, and can translate noisy telemetry into actionable insight for both engineers and business stakeholders.

Key Responsibilities
- Design and operate enterprise-grade observability platforms covering metrics, logs, traces, events, and synthetic monitoring.
- Architect Prometheus/Thanos/Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog deployments for high availability and scale.
- Develop standards for service instrumentation, including OpenTelemetry adoption, metric naming, label cardinality, and structured logging conventions.
- Define and enforce SLOs, SLIs, and error budgets, and build the dashboards and alerts that operationalize them.
- Build alerting strategies that minimize noise, surface actionable signals, and integrate cleanly with on-call workflows in PagerDuty, Opsgenie, or similar tools.
- Operate large-scale time-series and log storage platforms, balancing retention, query performance, and cost.
- Design distributed tracing pipelines and help teams use traces to diagnose latency and reliability issues.
- Develop self-service tooling, paved-road libraries, and templates that make it easy for product teams to adopt observability standards.
- Drive cost management and label-cardinality discipline across the observability estate.
- Lead incident response readiness improvements through better dashboards, alerting hygiene, and post-incident analysis tooling.
- Partner with SRE and platform teams to integrate observability into deployment pipelines, canary analysis, and progressive delivery workflows.
- Evaluate and recommend observability vendors and open-source tools based on cost, capability, and operational maturity.
- Mentor engineering teams on observability fundamentals, debugging techniques, and SLO-driven operations.
- Maintain documentation, onboarding guides, and runbooks for the observability platform.

Qualifications
- Bachelor’s degree in Computer Science or a related field.
- Five or more years of experience in SRE, platform engineering, or observability roles.
- Deep hands-on experience with Prometheus, Grafana, and at least one major commercial observability platform such as Datadog, New Relic, or Splunk.
- Strong understanding of OpenTelemetry, distributed tracing, and structured logging.
- Proficiency in at least one general-purpose language such as Go, Python, or Java.
- Experience operating high-cardinality, high-throughput metrics and log pipelines.
- Strong understanding of SLOs, error budgets, and SRE principles.
- Experience integrating observability with CI/CD and incident management tooling.
- Solid grasp of Linux internals, networking, and container platforms.
- Excellent communication and collaboration skills.

Preferred Qualifications
- Experience with Thanos, Mimir, Cortex, Loki, or Tempo at scale.
- Contributions to OpenTelemetry or observability open-source projects.
- Familiarity with eBPF-based observability tooling.
- Experience driving observability cost optimization initiatives.
- Exposure to regulated environments with audit-grade logging requirements.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected]. Learn more about Bright Vision Technologies at www.bvteck.com.

United States
$100K - $150K / year
Job Closed

• Lead a two-person fire damper and grease extraction team in Reading/Slough
• Carry out fire damper inspections and drop testing
• Clean commercial kitchen extract systems to TR19 Grease standards
• Identify and report defects
• Ensure all works are completed safely and to a high professional standard
• Manage team productivity and maintain health & safety
• Complete accurate reports with photographic evidence

United Kingdom
£32.5K / year
ESN Engineer - Isle of Skye

Ericsson

We create limitless connectivity to improve lives, redefine business and pioneer a sustainable future. #ImaginePossible

Engineer | 10 days ago
Full Time | Remote | Team 10,001+ | Since 1876 | H1B Sponsor

Join our Team

Applications are open to candidates from the Isle of Skye area.

Salary: £33,000-£36,000 per annum + allowances

About this opportunity:

As an ESN Engineer, you will own a designated group of sites, working as part of a four-person team covering adjacent territories. The role requires you to be self-managing, proactively maintaining your sites and collaborating closely with other ESN engineers and the wider FSO team to solve problems, with the goal of boosting network resilience and network availability.

This position requires a flexible work schedule, including participation in a 24/7 on-call rota and potential shift work to handle tasks outside normal business hours. You must be prepared to travel and stay away from home for service recovery, mainly on the remote islands. In a major network event, you should be available to lead and complete the recovery in your own and adjoining areas.

This role is based in Scotland and reports directly to the Regional Manager. We currently have vacancies on the Isle of Skye! Applicants should already reside in, or be willing to relocate to, the Isle of Skye area.
What you will do:
- Conduct corrective maintenance activities and manage site facility operations.
- Take full responsibility for assigned Emergency Service Network (ESN) sites, proactively maintaining them to prevent failures and leading coordination with other technical teams to resolve any issues that arise.
- Take a leadership role in problem management for your area, resolving major issues in your own and neighbouring areas.
- Work individually and as part of the FSO and wider Ericsson teams to achieve right-first-time resolution of work within customer SLAs without compromising safe working practices.
- Carry out reactive and proactive maintenance on multi-customer, multi-technology networks (mostly EE/ESN), achieving agreed performance targets.
- Participate in a 24x7 callout rota covering all equipment in the Service Area.
- Assist in the development and review of FSO processes and procedures with a view to maximising efficiency.

The skills you bring:
- Radio Frequency (RF) transmission and receiver systems employed in mobile communication equipment
- AC/DC power systems (single-phase and three-phase)
- UPS systems and deployed ESN generators
- Satellite backhaul systems
- Rectification and power distribution systems
- GSM, UMTS, and LTE fundamentals
- Cellular network fundamentals, mobility management, and cell planning
- Radio Access Network and Core Network architecture
- Antenna systems, Mast Head Amplifiers, feeder systems, and connectors

United Kingdom