Vision-Language-Action (VLA) Annotator

Computer Vision Engineer | Machine Learning Engineer | Full Time | Remote | Mid Level

Location

United States

Posted

73 days ago

Salary

$25 / hour

Seniority

Mid Level

Job Description

Role Description

We are looking for a detail-oriented and technically capable Vision-Language-Action (VLA) Annotator to join our data operations team in Phoenix, Arizona. In this role, you will be responsible for reviewing, labeling, and quality-checking multimodal datasets used to train and evaluate autonomous driving and robotics models. Your work directly impacts the safety and performance of AI systems operating in the real world.

This is a full-time, 40-hour-per-week position requiring sustained focus, sound judgment, and the ability to apply structured annotation guidelines to complex, real-world scenarios, including frequent edge cases.

Key Responsibilities:

- Review and annotate video footage, sensor telemetry, and camera feeds from autonomous vehicle test drives and robotics platforms.
- Assess vehicle and robotic behavior in 3D space using 2D camera inputs, including approach angles, following distances, trail alignment, and controlled stop quality.
- Use time-series telemetry data (speed, throttle, steering, and braking charts) to make precise trim and segmentation decisions on data clips.
- Apply annotation guidelines consistently while exercising independent judgment on ambiguous or edge-case scenarios.
- Identify and flag unsafe, incomplete, or anomalous driving behaviors (e.g., rolling stops, improper following distance, out-of-distribution maneuvers).
- Maintain high throughput and accuracy standards; participate in regular quality audits and calibration sessions.
- Work within annotation platforms (e.g., Encord, CVAT, Label Studio, or similar) to complete labeling tasks efficiently.
- Document and communicate recurring issues or ambiguities in the data to improve pipeline quality.

Qualifications

- Bachelor's degree with a STEM background preferred (Engineering, Computer Science, Physics, Mathematics, GIS, or related field).
- Demonstrated ability to interpret vehicle or robotic behavior in 3D space from 2D camera feeds.
- Experience reading and interpreting sensor data, telemetry charts, lab instrumentation output, or signal processing data.
- Regular driving experience, ideally in varied or off-road conditions.
- Ability to follow precise, rule-based guidelines while also applying sound judgment on frequent edge cases.
- Comfort with sustained video review tasks.

Requirements

- Prior experience in QA, data annotation, or lab/research settings is a strong signal.
- Prior annotation or data labeling experience, especially in autonomy or robotics datasets.
- Familiarity with geospatial tools, map interfaces, or GIS platforms.
- Hands-on experience with Encord, Label Studio, CVAT, Scale AI, or comparable labeling platforms.
- Background in autonomous vehicles, ADAS systems, or driver safety analysis.

Benefits

- This is a remote position.

Related Categories

Computer Vision Engineer | AI Engineer | Machine Learning Engineer | AI Research Scientist | LLM Engineer | NLP Engineer

Related Job Pages

Remote Full-time Jobs (US) | More Remote Jobs

More Computer Vision Engineer Jobs

Computer Vision Engineer

Snap Inc.

Kind. Smart. Creative.

Computer Vision Engineer | 74 days ago

Full Time | Remote | Team 5,001-10,000 | Since 2011 | H1B No Sponsor

Snap Inc is a technology company. We believe the camera presents the greatest opportunity to improve the way people live and communicate. Snap contributes to human progress by empowering people to express themselves, live in the moment, learn about the world, and have fun together. The Company’s three core products are Snapchat, a visual messaging app that enhances your relationships with friends, family, and the world; Lens Studio, an augmented reality platform that powers AR across Snapchat and other services; and its AR glasses, Spectacles.

The Spectacles team is pushing the boundaries of technology to bring people closer together in the real world. Our fifth-generation Spectacles, powered by Snap OS, showcase how standalone, see-through AR glasses make playing, learning, and working better together.

We’re looking for a Software Engineer to join the Spectacles team at Snap Inc!

What you’ll do:

- Lead or participate in the design, architecture and implementation of device calibration software and process to support both prototype and large scale production phase
- Design and implement software or system development components at all device or component calibration steps, e.g. multi-cam calibration, visual-inertial calibration, sensor synchronization, influence and reliability testing etc.
- Onsite support for factory activities (hardware bringup, factory test integration)
- Participate in design reviews, code review with peers and stakeholders to create reliable solutions
- Collaborate with computer vision and mechanical engineers on calibration station designs
- Analyze and integrate existing calibration algorithms for highest efficiency in terms of accuracy and throughput

Knowledge, Skills & Abilities:

- Strong knowledge in C++
- Comfortable with large code bases, code reviews and version control (git)
- Basic skills to work with mechanical and robotic installations
- Excellent communication skills; ability to work with cross-functional teams
- Ability to travel internationally and to travel 50% of the time to factory near Taipei

Minimum Qualifications:

- BSc in a relevant technical field such as computer science or electrical engineering or equivalent years of experience
- 7+ years of experience in native software development (C++)
- Experience in one or more of the following areas: Camera, IMU, sensor fusing, graphics and display, with a view towards writing performant and energy efficient solutions

Preferred Qualifications:

- Masters, PhD, or industrial experience in a relevant engineering discipline
- Practical experience contributing to the design, validation, and transfer-to-mass-production phase for consumer electronic devices
- Ability to use Python for scripting, automation of build/test processes, and developer productivity tools
- Experience in one of the following areas: Robotics, Mechanics, Actuators, High-speed Cameras, Camera Calibration
- Solid knowledge of computer vision fundamentals such as distortion models, optimization, 3D geometry and linear algebra

If you have a disability or special need that requires accommodation, please don’t be shy and provide us some information.

"Default Together" Policy at Snap: At Snap Inc. we believe that being together in person helps us build our culture faster, reinforce our values, and serve our community, customers and partners better through dynamic collaboration. To reflect this, we practice a “default together” approach and expect our team members to work in an office 4+ days per week.

At Snap, we believe that having a team of diverse backgrounds and voices working together will enable us to create innovative products that improve the way people live and communicate. Snap is proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification, in accordance with applicable federal, state, and local laws. EOE, including disability/vets.

Our Benefits: Snap Inc. is its own community, so we’ve got your back! We do our best to make sure you and your loved ones have everything you need to be happy and healthy, on your own terms. Our benefits are built around your needs and include paid parental leave, comprehensive medical coverage, emotional and mental health support programs, and compensation packages that let you share in Snap’s long-term success!

C++ | Git | Python

Taiwan

Job Closed

Computer Vision Annotator - Contractor

SWORD Health

SWORD Health is a virtual musculoskeletal care provider that is on a mission to free 2 million people from post-surgical and chronic pain. The company’s platf

Computer Vision Engineer | 74 days ago

Full Time | Remote

At Sword, we’re building AI to heal billions and unlock humanity’s full potential. In doing so, we’re pioneering AI Care, a fundamentally new approach to healthcare built for medical reasoning, safety, and real-time treatment, not generic technology applied after the fact. As both a clinical-centric frontier AI lab and an applied AI platform, Sword is reimagining how care is delivered at scale, removing traditional barriers like appointments, waiting rooms, and stigma so more people can access the care they need and ultimately get back to lives lived in full.

Since 2020, Sword has expanded across physical therapy, women’s health, cardiometabolic, and mental health, and is now moving beyond the session to a fully AI-native, 24/7 care program that brings physical activity, therapeutic exercise, psychotherapy, nutrition, and behavior change into one connected experience. More than 700,000 members across three continents have completed over 10 million AI sessions, helping 1,000+ enterprise clients avoid more than $1 billion in unnecessary healthcare costs. Backed by 42 clinical studies, 44+ patents, and more than $500 million raised from leading investors including Khosla Ventures, General Catalyst, and Founders Fund, Sword is defining a new standard for healthcare.

Role

Computer vision models learn from labeled data; they are only as good as the annotations behind them. Your job is to label images and videos of people performing physical exercises, across a range of annotation tasks, using Sword's internal tooling. This is focused, detail-oriented work that spans multiple annotation types and evolves as our pipelines grow. You are expected to pick up new task types quickly and maintain high consistency across large volumes.

The Data & Tooling team (part of the Algorithms org) builds the labeled datasets that train Sword's proprietary computer vision models. We develop the data foundation that enables our AI to see, understand, and interpret human movement.

AI Proficiency at Sword Health

AI fluency is a core expectation at Sword Health. Every candidate is assessed against our three-level framework. Be ready to share real examples of how AI is already part of how you work.

- Explorer (Level 1): Uses AI daily to boost personal productivity
- Builder (Level 2): Creates workflows and tools that elevate the whole team
- Integrator (Level 3): Embeds AI into products and processes at scale

Every hire must demonstrate at least Level 1. The expected level will vary depending on the seniority of the role.

What you’ll be doing

- Annotating images and videos of people performing physical exercises across a variety of labeling tasks
- Reviewing and quality-checking work produced by other annotators
- Applying structured guidelines consistently across high volumes
- Flagging quality issues, edge cases, and ambiguous content
- Adapting to new annotation task types as team needs evolve
- Working full-time (40 hours/week), remotely, using Sword's internal and web-based annotation tools

What you need to have

- High attention to detail: precision and consistency are the job
- Comfortable with high-volume, repetitive digital work for sustained periods
- Quick to understand and apply structured guidelines
- Experience with data annotation, QA, or content review is a plus
- Basic familiarity with human anatomy is helpful but not required (full training provided)
- Reliable internet connection and a desktop or laptop computer
- No clinical or technical background required

Compensation: €7.50 per hour, full-time contract. Fixed hours, stable workload.

Note: This position does not offer relocation assistance. Candidates must possess a valid EU visa and be based in Portugal.

Sword Health complies with applicable Federal and State civil rights laws and does not discriminate on the basis of Age, Ancestry, Color, Citizenship, Gender, Gender expression, Gender identity, Gender information, Marital status, Medical condition, National origin, Physical or mental disability, Pregnancy, Race, Religion, Caste, Sexual orientation, and Veteran status.

Portugal

Vision-Language Model (VLM) Engineer

Wide and Wise

Computer Vision Engineer | 74 days ago

Full Time | Remote | Team 2-10

We are seeking a highly skilled Vision-Language Model (VLM) Engineer to design, develop, and deploy state-of-the-art multimodal AI systems. You will work at the intersection of computer vision and natural language processing, contributing to cutting-edge products that combine image and text understanding.

Key Responsibilities:

- Design and implement vision-language models for tasks such as image captioning, visual question answering, and cross-modal retrieval
- Train, fine-tune, and evaluate multimodal models using large-scale datasets
- Optimize model performance for scalability and real-world deployment
- Collaborate with cross-functional teams including data scientists, software engineers, and product managers
- Stay up to date with the latest research in multimodal AI and apply it to production systems

Required Qualifications:

- Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, or a related field
- Strong experience with Python and deep learning frameworks (e.g., PyTorch or TensorFlow)
- Solid understanding of machine learning, computer vision, and NLP concepts
- Experience with multimodal models or related architectures (e.g., transformers)
- Familiarity with handling large datasets and distributed training

Preferred Qualifications:

- Experience with models such as CLIP, BLIP, or similar multimodal architectures
- Knowledge of model deployment (Docker, APIs, cloud services)
- Publications or contributions to AI research projects
- Experience working with real-world AI applications

Turkey

Teaching Associate - Computer Science

California State University

California State University is a state system of higher education that encompasses 23 campuses and eight off-campus centers across California. It is among the largest public four-y

Computer Vision Engineer | 77 days ago

Part Time

Instruct and manage classroom activities, prepare course materials, assess student performance, and assist students during office hours, ensuring effective learning and engagement in the Computer Science curriculum.

California
