Bringing real world currency to the blockchain.
AI Research Engineer – Kernel, Inference Optimization
Location
United Kingdom
Posted
8 days ago
Salary
Not specified
Seniority
Senior
Job Description
Tether.to
- Drive innovation in model serving and inference architectures for advanced AI systems.
- Focus on optimizing model deployment and inference strategies to deliver highly responsive, efficient, and scalable performance across real-world applications.
- Work on a wide spectrum of systems, ranging from resource-efficient models designed for limited hardware environments to complex, multi-modal architectures that integrate data such as text, images, and audio.
- Develop, test, and implement novel serving strategies and inference algorithms.
- Engineer robust inference pipelines, establish comprehensive performance metrics, and identify and resolve bottlenecks in production environments.
- Enable high-throughput, low-latency, low-memory-footprint, and scalable AI performance that delivers tangible value in dynamic, real-world scenarios.
Job Requirements
- A degree in Computer Science or related field.
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
- Must have knowledge of Metal Shading Language (MSL).
- Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential.
- Your contributions should have led to measurable improvements in inference latency, throughput, and memory footprint for domain-specific applications, particularly on resource-constrained devices and edge platforms.
- A deep understanding of modern model serving architectures and inference optimization techniques is required.
- Must have strong expertise in writing GPU kernels for mobile devices (i.e., smartphones) as well as a deep understanding of model serving frameworks and engines.
- Practical experience in developing and deploying end-to-end inference pipelines, from optimizing models for efficient serving to integrating these solutions on resource-constrained devices, is required.
- Demonstrated ability to apply empirical research to overcome challenges in model serving, such as latency optimization, computational bottlenecks, and memory constraints.
- You should be proficient in designing robust evaluation frameworks and iterating on optimization strategies to continuously push the boundaries of inference performance and system efficiency.
- Experience with distributed inference systems: designing and optimizing high-performance inference engines using techniques like Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism to handle massive models on GPU clusters.
- Deep understanding of the math and structure behind Diffusion Models and Vision Transformers.
- Understanding of Pruning, Quantization, FlashAttention, KV Cache, Speculative Decoding (e.g., EAGLE), etc.
Benefits
- Flexible working hours
- Professional development opportunities



