Bringing real world currency to the blockchain.
AI Research Engineer, Model Compression, Quantization
Location
Brazil
Posted
7 days ago
Salary
0
Seniority
Senior
Job Description
Tether.to
- Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality
- Leverage knowledge distillation to transfer capabilities from larger teacher models to smaller student models, enabling efficient multimodal reasoning across text, image, and audio inputs
- Implement pruning techniques to remove redundant parameters and attention heads, reducing computational overhead without sacrificing task performance
- Analyze trade-offs between model efficiency (size, latency, memory) and accuracy across quantization, distillation, and pruning methods; propose improvements based on empirical findings
- Research and apply mixed-precision quantization and other advanced compression strategies (e.g., adaptive pruning schedules, distillation with intermediate feature matching) to optimize the accuracy–performance balance
- Stay current with the latest research in model compression, including emerging techniques for multimodal and generative architectures
- Document methodologies, experiments, and results clearly to support reproducibility, internal collaboration, and stakeholder communication
- Author technical papers and publish findings in top-tier conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ACL, AAAI) to advance the field of model compression for multimodal AI
Job Requirements
- A degree in Computer Science or related field
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
- Experience with PyTorch or an equivalent deep learning framework
- Hands-on experience with model quantization including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ)
- Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones
- Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones
- Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques
- Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations)
Benefits
- Flexible working arrangements
- Professional development opportunities
More AI Research Scientist Jobs
AI Research Engineer – Multi-Modal Reinforcement Learning
Tether.to
- Conduct research on reinforcement learning algorithms for multimodal models
- Design and build reinforcement learning infrastructure that supports scalable training
- Develop and refine reward modeling strategies
- Create and curate multimodal simulation environments and datasets
- Analyze and optimize policy performance across modalities
- Investigate and develop next-generation reinforcement learning paradigms
- Publish research findings in top-tier conferences
AI Research Engineer, Model Compression, Quantization
Tether.to
- Drive innovation in model compression and efficient deployment for advanced multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs).
- Reduce model footprint and computational cost while preserving accuracy, enabling high-performance AI to run efficiently across resource-constrained edge devices.
- Apply and advance compression techniques such as quantization, knowledge distillation, and pruning.
- Build robust compression pipelines, establish performance and fidelity metrics, and address bottlenecks in production inference.

- Conduct end-to-end research and engineering on vision-language models, covering training, evaluation, and optimization across the full model development lifecycle.
- Design and implement post-training pipelines including supervised fine-tuning, knowledge distillation, and reinforcement learning from human feedback.
- Develop and maintain high-quality multimodal datasets, including data curation, filtering, and balancing for domain-specific tasks.
- Drive model efficiency and deployability, adapting models for resource-constrained environments using compression and optimization techniques.
- Design and implement evaluation frameworks and benchmarks to measure model performance, robustness, and real-world task success.
- Build and scale training workflows across distributed GPU infrastructure.
- Identify and resolve bottlenecks in training pipelines to achieve state-of-the-art model quality on target benchmarks.
- Contribute to and leverage open-source ecosystems including models, datasets, and tooling to accelerate development.
- Stay current with the latest research in multimodal learning and vision-language systems, translating relevant findings into practical improvements.
- Publish research findings in top-tier AI conferences and journals where applicable.
AI Research Engineer – Agentic Post-training
Tether.to
- Conduct end-to-end research and engineering initiatives to advance post-training of agentic and tool-use models to achieve SOTA results.
- Drive broad, cross-cutting model improvements, including factuality, instruction adherence, tool/function use, multi-agent coordination, and reasoning calibration.
- Design and enhance large-scale post-training systems, including data pipelines, training workflows, evaluation frameworks, and benchmark infrastructure.
- Develop rigorous evaluation suites and diagnostic tools to assess model readiness for deployment.
- Strengthen feedback loops from real-world product usage, incorporating both explicit and implicit user signals into post-training.
- Collaborate with tooling, product, and training teams to improve the usefulness, reliability, and agentic capabilities of frontier models.
- Closely liaise with research, engineering and cross-functional teams to determine which integrations are production-ready for inclusion in major model releases.
