Job Closed
This listing is no longer active.
Menus. Orders. Simplified.
Data Architect – Annotation
Location
India
Posted
159 days ago
Salary
0
Seniority
Senior
Job Description
ItsaCheckmate
• Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows.
• Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance.
• Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems.
• Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets.
• Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards.
• Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution.
• Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines.
• Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs.
• Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets.
• Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy.
• Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them.
• Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes.
• Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals.
• Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners.
Job Requirements
- Experience with data annotation and hands-on use of platforms such as Labelbox, Label Studio, or Langfuse for managing large-scale labeling workflows.
- Proficiency in Python and SQL for data extraction, validation, and workflow automation in a data operations or data engineering context.
- Hands-on experience using LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation.
- Demonstrated experience working with large-scale / high-volume datasets.
- At least one prior role where data workflow automation is explicitly part of the job scope or responsibilities.
- Ability to perform statistical analysis to detect data anomalies, annotation bias, and quality issues.
- Strong requirement-elicitation and communication skills, with a process-driven and detail-oriented mindset when working with cross-functional teams.
Qualifications:
- B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field)
- 5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master's degree
- Demonstrated proficiency in SQL for reporting and Python for automation and scripting
- Academic or applied research experience related to NLP or LLM benchmarking datasets is a strong plus
- Must be flexible to work during US hours (until at least 1:30 PM EST) for this role.




