Senior Deep Learning Tools Engineer – CUDA Tile

Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California | Utah | Washington

Posted

24 days ago

Salary

$152K - $241.5K / year

Seniority

Senior

Bachelor Degree | 5 yrs exp | English | Python | PyTorch | TensorFlow

Job Description

NVIDIA

  • Design and develop performance testing frameworks for deep learning compilers and workloads
  • Build and maintain automated pipelines (CI/CD) to continuously track performance across models, hardware, and compiler changes
  • Implement benchmarking systems to measure latency, throughput, and efficiency of AI and HPC workloads
  • Analyze performance trends over time and identify regressions, bottlenecks, and optimization opportunities
  • Partner with compiler and architecture teams to debug and resolve performance issues
  • Develop tools and dashboards for performance visualization, reporting, and insights
  • Enable scalable testing across diverse GPU systems and environments
  • Improve infrastructure to ensure reliable, reproducible, and high-signal performance data

Job Requirements

  • BS, MS, or PhD (or equivalent experience) in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, or related field
  • 5+ years of software engineering experience, including experience in performance engineering, benchmarking, or systems optimization
  • Strong programming skills in Python (C++ is a plus)
  • Experience with CI/CD systems and automation frameworks
  • Familiarity with hardware-aware performance analysis (GPUs, accelerators, or similar systems)
  • Experience working with deep learning frameworks such as PyTorch, TensorFlow, JAX, or TensorRT
  • Background in data analysis, profiling, and regression tracking
  • Ability to debug complex system-level issues across software and hardware layers

Benefits

  • Competitive salaries
  • Comprehensive benefits package
  • Equity options
