Our security platform combines AI and domain expertise, enabling teams to ship code faster with higher confidence.
AI Research Engineer
Location
United States | Canada
Posted
68 days ago
Salary
$170K - $210K / year
Seniority
Mid Level
Job Description
AI Research Engineer
Cantina
Role Description
We are looking for a talented AI Research Engineer to join our computer vision research team. In this role, you will work closely with our research team, implementing, training, and evaluating state-of-the-art image and video generation models. You will own the engineering execution that turns research ideas into working systems:
- Building robust data pipelines
- Running and stabilizing large-scale training
- Implementing models from papers
- Optimizing for speed/efficiency
- Running rigorous evaluations
This is a high-impact implementation and execution role. This role is ideal for engineers who enjoy building reliable ML systems and scaling research ideas into production-quality training pipelines. The ideal candidate is someone who gets deep satisfaction from:
- Making complex systems work
- Translating research ideas into reliable, scalable code
- Debugging training instabilities
- Delivering measurable improvements in training stability, model quality, and inference efficiency
This is an excellent opportunity to work closely with experienced researchers, gain deep hands-on exposure to cutting-edge model training techniques, latest research methods in diffusion/transformer-based generation, large-scale experimentation, and efficiency innovations, all while contributing directly to production-grade models.
Job Requirements
- 2–5 years of hands-on experience building and training ML systems, with strong ownership of results
- Fluency in PyTorch: comfortable reading, writing, and debugging both training and inference code
- Experience training or fine-tuning generative models (diffusion models, transformers, VAEs, or similar) from scratch or near-scratch
- Solid understanding of distributed training workflows and practical debugging of large training runs
- Demonstrated ability to read and implement AI research papers in computer vision
- Familiarity with cutting-edge computer vision models and research literature in the image and video domain
- Experience building data pipelines for large-scale image or video datasets
- Strong debugging skills: comfortable diagnosing both engineering bugs and training failures
- Strong engineering mindset: writing clean, reliable, debuggable code; profiling tools; handling numerical issues at scale
- Build and maintain end-to-end data pipelines for large-scale image and video datasets: collection, filtering, augmentation, conditioning alignment, and efficient storage/sampling
- Implement model architectures (diffusion, autoregressive, flow-based, diffusion transformers, etc.) and maintain high-throughput PyTorch training loops for large-scale image and video diffusion models
- Run and manage large-scale training experiments on multi-GPU and multi-node setups (DDP, FSDP, DeepSpeed)
- Debug training instabilities, loss spikes, and convergence issues
- Apply quantization, pruning, and knowledge distillation techniques to compress models without sacrificing quality
- Collaborate with researchers and translate state-of-the-art research papers into working implementations in our internal codebase (e.g., new attention mechanisms, sampling schedules, or conditioning methods)
- Build and maintain evaluation pipelines of image quality, video consistency, and perceptual metrics
- Set up and maintain human annotation and evaluation pipelines using services like AWS GroundTruth
- Profile and optimize training speed, GPU memory utilization, and iteration time
- Implement inference optimizations to reduce latency and compute cost
- Work with acceleration toolchains such as torch.compile, Triton, TensorRT, or ONNX where appropriate
Benefits
- Competitive salary and generous company equity
- Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
- 42 days of paid time off, including:
  - 15 PTO days
  - 10 sick days
  - 15 company holidays
  - 2 floating holidays
- Generous parental leave & fertility support
- 401(k) retirement savings plan
- Lifestyle spending account – $500/month to use however you’d like
- Complimentary lunch and snacks for in-office employees
- One Medical membership, and more!