Bringing real world currency to the blockchain.
AI Research Engineer – Kernel, Inference Optimization
Location
Netherlands
Posted
9 days ago
Seniority
Senior
Job Description
Tether.to
- Design and deploy model serving architectures that deliver high throughput and low latency
- Ensure pipelines run efficiently across environments including resource-constrained devices and edge platforms
- Establish clear performance targets for latency and memory usage
- Build, run, and monitor controlled inference tests
- Track key performance indicators like response latency and memory consumption
- Document iterative results and compare outcomes against benchmarks
- Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
- Work with cross-functional teams to integrate optimized frameworks into production pipelines
- Define success metrics for improved performance and scalability
Job Requirements
- A degree in Computer Science or a related field
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
- Knowledge of Metal Shading Language (MSL)
- Comfort with writing custom compute shaders from scratch
- Proven experience in low-level kernel optimizations and inference optimization on mobile devices
- Prior contributions that led to measurable improvements in inference latency, throughput, and memory footprint for domain-specific applications
- A deep understanding of modern model serving architectures and inference optimization techniques
- Strong expertise in writing GPU kernels for mobile devices
- Practical experience in developing and deploying end-to-end inference pipelines
- Ability to apply empirical research to overcome challenges in model serving
- Proficiency in designing robust evaluation frameworks and iterating on optimization strategies
- Experience with distributed inference systems that use tensor parallelism, pipeline parallelism, and expert parallelism
- Understanding of pruning, quantization, FlashAttention, KV caching, and speculative decoding (e.g., EAGLE)
Benefits
- Work remotely from anywhere in the world
- Opportunity to innovate in the fintech space
- Collaborate with global talent
- Competitive compensation packages
- Flexible work arrangements
More AI Research Scientist Jobs
AI Research Engineer – Multi-Modal Reinforcement Learning
Tether.to
- Conduct research on reinforcement learning algorithms for multimodal models
- Design and build reinforcement learning infrastructure that supports scalable training
- Develop and refine reward modeling strategies
- Create and curate multimodal simulation environments and datasets
- Analyze and optimize policy performance across modalities
- Investigate and develop next-generation reinforcement learning paradigms
- Publish research findings in top-tier conferences
AI Research Engineer, Model Compression, Quantization
Tether.to
- Drive innovation in model compression and efficient deployment for advanced multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs)
- Reduce model footprint and computational cost while preserving accuracy, enabling high-performance AI to run efficiently across resource-constrained edge devices
- Apply and advance compression techniques such as quantization, knowledge distillation, and pruning
- Build robust compression pipelines, establish performance and fidelity metrics, and address bottlenecks in production inference
- Conduct end-to-end research and engineering on vision-language models, covering training, evaluation, and optimization across the full model development lifecycle
- Design and implement post-training pipelines including supervised fine-tuning, knowledge distillation, and reinforcement learning from human feedback
- Develop and maintain high-quality multimodal datasets, including data curation, filtering, and balancing for domain-specific tasks
- Drive model efficiency and deployability, adapting models for resource-constrained environments using compression and optimization techniques
- Design and implement evaluation frameworks and benchmarks to measure model performance, robustness, and real-world task success
- Build and scale training workflows across distributed GPU infrastructure
- Identify and resolve bottlenecks in training pipelines to achieve state-of-the-art model quality on target benchmarks
- Contribute to and leverage open-source ecosystems including models, datasets, and tooling to accelerate development
- Stay current with the latest research in multimodal learning and vision-language systems, translating relevant findings into practical improvements
- Publish research findings in top-tier AI conferences and journals where applicable
AI Research Engineer – Agentic Post-training
Tether.to
- Conduct end-to-end research and engineering initiatives to advance post-training of agentic and tool-use models and achieve state-of-the-art results
- Drive broad, cross-cutting model improvements, including factuality, instruction adherence, tool/function use, multi-agent coordination, and reasoning calibration
- Design and enhance large-scale post-training systems, including data pipelines, training workflows, evaluation frameworks, and benchmark infrastructure
- Develop rigorous evaluation suites and diagnostic tools to assess model readiness for deployment
- Strengthen feedback loops from real-world product usage, incorporating both explicit and implicit user signals into post-training
- Collaborate with tooling, product, and training teams to improve the usefulness, reliability, and agentic capabilities of frontier models
- Liaise closely with research, engineering, and cross-functional teams to determine which integrations are production-ready for inclusion in major model releases
