Senior Deep Learning Tools Engineer – CUDA Tile
Location
California | Utah | Washington
Posted
24 days ago
Salary
$152K - $241.5K / year
Seniority
Senior
Job Description
NVIDIA
- Design and develop performance testing frameworks for deep learning compilers and workloads
- Build and maintain automated pipelines (CI/CD) to continuously track performance across models, hardware, and compiler changes
- Implement benchmarking systems to measure latency, throughput, and efficiency of AI and HPC workloads
- Analyze performance trends over time and identify regressions, bottlenecks, and optimization opportunities
- Partner with compiler and architecture teams to debug and resolve performance issues
- Develop tools and dashboards for performance visualization, reporting, and insights
- Enable scalable testing across diverse GPU systems and environments
- Improve infrastructure to ensure reliable, reproducible, and high-signal performance data
Job Requirements
- BS, MS, or PhD (or equivalent experience) in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, or related field
- 5+ years of software engineering experience, including experience in performance engineering, benchmarking, or systems optimization
- Strong programming skills in Python (C++ is a plus)
- Experience with CI/CD systems and automation frameworks
- Familiarity with hardware-aware performance analysis (GPUs, accelerators, or similar systems)
- Experience working with deep learning frameworks such as PyTorch, TensorFlow, JAX, or TensorRT
- Background in data analysis, profiling, and regression tracking
- Ability to debug complex system-level issues across software and hardware layers
Benefits
- Competitive salaries
- Comprehensive benefits package
- Equity options
More Engineer Jobs
Role Description
Incubate and validate new ML initiatives end-to-end. On Innovation, you’ll build adoption-ready prototype vertical slices spanning data flows, model serving, evaluation, and product integration, then hand off clear artifacts so delivery teams can productize and own them long-term.
- Build ML prototype vertical slices that connect ingest/processing to inference and visible product outcomes (search, insights, UX flows).
- Create evaluation harnesses and decision artifacts: datasets, baselines, quality/latency/cost metrics, and go/no-go recommendations.
- Package prototypes for adoption: containerize services, define reproducible deployments, and produce runbooks/checklists.
- Partner with Research and Data Engineering on dataset curation, annotation loops, experiment tracking, and safe iteration.
- Make prototypes operationally credible: instrumentation, monitoring, and security/compliance basics (PII handling, provenance mindset).
Qualifications
- 3+ years ML engineering/MLOps experience (level dependent), with evidence of shipping real systems.
- Strong Python and hands-on PyTorch/Transformers; comfortable taking models from notebook to reproducible services.
- Practical Kubernetes + containers experience; able to deploy and troubleshoot in production-like clusters (including offline/air-gapped constraints).
- Strong evaluation discipline and monitoring mindset; comfortable communicating tradeoffs clearly.
- Eligible to work in Germany; EU/NATO citizenship preferred and export-control screening applies.
Nice-to-haves
- GPU serving/optimization experience (Triton/KServe, ONNX/TensorRT, batching, quantization).
- Streaming/pipeline tooling (Kafka, Ray, Beam/Flink/Spark) and search/vector/graph integrations.
- German language (B1+) and/or experience with regulated/public-sector datasets and workflows.
Benefits
- Modern ML stack in real constraints: Kubernetes, streaming, and hybrid/on-prem/air-gapped deployments.
- Remote-first in Germany with regular Berlin workshops, 30 days vacation, equipment & learning budget.
- High leverage: your prototypes and handoffs unblock multiple delivery teams.
Productivity and Efficiency Engineer
FiscalNote
The leading technology provider of global policy and market intelligence.
- Conduct structured process audits to document workflows, identify inefficiencies, and quantify manual effort costs.
- Develop and prioritize a roadmap for process improvements and tool integrations based on impact assessments.
- Design and implement solutions end to end, including but not limited to AI agents, API integrations, and custom internal tools.
- Integrate existing tools like Slack, Jira, and Google Workspace to enhance system interoperability.
- Define success metrics, monitor the adoption of new processes, and iteratively improve solutions based on feedback.
- Contribute to and expand the organization’s process automation playbook for future initiatives.
Senior Forward Deployed Engineer
SBI, The Growth Advisory
Driven by Insights, Delivered from Experience
- Work directly with clients to identify their highest-impact problems
- Translate client problems into structured approaches
- Design and deploy scalable, AI-powered solutions
- Operate side-by-side with clients building in real time
- Iterate quickly and demonstrate value throughout the engagement
- Define, solve, and ensure solutions are adopted in high-stakes environments
Staff Engineer
Workato
Workato is a computer software company that has developed an enterprise automation platform with easy-to-use automation and integrations. The company fosters a collaborative, diver
Role Description
Workato Inc. seeks Staff Engineer in Palo Alto, CA.
- Design and develop production-grade distributed services in Rust using async/Tokio, focusing on concurrency, performance, and scalability.
- Own the full service lifecycle from system design and implementation through deployment and operations.
- Build and optimize data-processing and transformation pipelines with emphasis on throughput, latency, and memory efficiency.
- Create and maintain integration tests with real service dependencies in containerized environments.
- Improve test determinism, stability, and reliability across distributed systems.
- Deploy and operate services across development, staging, and production environments using infrastructure-as-code practices.
- Implement safe rollout and rollback procedures using GitOps and CI/CD workflows.
- Develop and evolve observability systems including logs, metrics, and distributed tracing.
- Define service-level objectives (SLOs), configure alerts, and lead incident response and post-incident reviews.
- Design and maintain distributed cluster coordination systems using gossip-based membership and leader-election mechanisms for resilience and scalability.
- Plan and execute performance benchmarking and load testing, including capacity modeling and regression detection.
- Drive performance optimization initiatives across distributed services.
- Apply fuzz testing techniques to critical components to improve reliability and security.
- Practice chaos engineering in lower environments through fault injection, network partitioning, and resource pressure testing to validate resilience and recovery objectives.
- Participate in architecture reviews and code reviews.
- Contribute to technical design documents and RFCs.
- Mentor peers and collaborate cross-functionally on service integrations and stateful components.
- Full-time telecommuting permitted from anywhere in the United States.
Qualifications
- Bachelor’s degree (or foreign equivalent) in Computer Science, Management, or a closely related field.
- 5 years of progressively responsible experience in the job offered or a related occupation.
Requirements
- 3 years of experience with Rust, including Tokio, asynchronous programming, concurrency, performance optimization, and allocator profiling.
- 2 years of experience with Apache DataFusion and Apache Arrow, including Parquet, data pipelines, query planning, and vectorized execution.
- 3 years of experience creating integration tests with real dependencies using Docker and Testcontainers.
- 2 years of experience with behavior-driven testing for distributed services using frameworks such as Gherkin and Cucumber.
- 2 years of experience with performance benchmarking, including throughput and latency analysis, regression detection, and capacity planning.
- 2 years of experience with load testing using Locust and wrk, including test scenario design, ramp-up strategies, and analysis of latency, throughput, and error rates.
- 1 year of experience with chaos engineering and fault injection, including network partitions, process termination, and resource pressure testing for resilience validation.
- 2 years of experience designing and scaling distributed backend services, including rate limiting, fair queuing, back-pressure control, cluster coordination, gossip-based membership protocols, and leader election.
- 3 years of experience with Kubernetes for production deployments, rollouts, and rollbacks across multiple environments.
- 3 years of experience with Terraform and infrastructure-as-code practices for service provisioning and configuration.
- 3 years of experience with advanced Redis patterns, including counters, streams/pub-sub, distributed locks, and idempotency controls.
- 2 years of experience with PostgreSQL, including SQL optimization, JSON/JSONB, indexing, and locking, as well as columnar OLAP databases such as ClickHouse.
- 2 years of experience with Ruby for backend and service tooling, including fuzz testing and library development.
- 2 years of experience with Java or Kotlin for backend services.
- 3 years of experience implementing observability and CI/CD systems, including Prometheus, OpenTelemetry, GitHub Actions, and ArgoCD.
- 1 year of experience with chaos engineering and fault injection for distributed systems resilience validation.
Benefits
- Salary: $264,514.00 to $285,000.00 per annum.
- 40 hours per week; M-F, 9:00 a.m. to 5:00 p.m.
- Must be legally authorized to work in the U.S. without sponsorship.



