Careerflow.ai

Commitments Required: 40 hours per week with overlap of 6 hours with PST. Engagement type: Contractor (no medical/paid leave). Duration of contract: 6 months with opportunity to extend; expected start date is 1st week of Jun-2026. Location: North America and LATAM.

LLM - AI Quality Analyst

Artificial Intelligence | Part Time | Remote | Mid Level | Team 11-50

Location

Northern America | Latin America (LATAM) | Europe

Posted

15 days ago

Salary

0

Seniority

Mid Level

Job Description

Role Description

We're looking for sharp, detail-oriented analysts to join a global team evaluating a personalization feature for a leading AI assistant. In this role, you'll put the AI through its paces, testing how well it uses real personal context from past conversations, Gmail, Google Search, and YouTube activity to give genuinely helpful, tailored responses. This isn't a passive review job; you'll design your own prompts drawn from your actual experiences, then rigorously evaluate the AI's responses across multiple quality dimensions. Think of yourself as part product tester, part quality critic.

Responsibilities

- Design and run multi-turn conversational prompts (typically 1-5 turns) that challenge the AI to draw on your personal data and experiences.
- Assess whether the model understood your intent and applied personalization appropriately, and identify where it missed the mark.
- Evaluate Grounding: flag any claims the model makes about you that aren't supported by evidence, including hallucinations and flawed inferences.
- Evaluate Integration: check whether personal data is woven into responses naturally, or whether the AI comes across as robotic or over-explanatory.
- Conduct side-by-side (SxS) comparisons of two model responses, ranking them on helpfulness, usability, and overall quality.
- Write clear, structured rationales for your rankings, citing specific turns in the conversation to back up your assessments.
- Check the "Debug Info" to confirm that chat summaries and external data sources were properly referenced.
- Keep your evaluation data clean by deleting test conversations after each session so they don't influence future results.

Qualifications

- Strong reading and writing proficiency in Portuguese, the primary language for the project.
- Exceptional analytical thinking, particularly when evaluating nuanced or ambiguous AI-generated content.
- Willingness to use your primary personal Google account (not a test account) and enable personal data sources for evaluation.
- Ability to design creative, context-rich prompts based on your own personal experiences.
- A keen eye for subtle differences between model responses, such as over-narrating, forced connections, or weak personalization.
- Strong written communication skills. You'll need to write clear, defensible evaluation notes regularly.
- Comfort working independently and staying self-directed in a fully remote setup.
- A reliable desktop or laptop with a solid internet connection.
- Full-time availability in your local time zone, with 4 hours of daily overlap with PST.

Preferred Qualifications

- Prior experience in data annotation, AI quality evaluation, content moderation, or a related field.
- A BS/BA degree (or equivalent experience) in a field like Linguistics, Journalism, Computer Science, Policy, Law, Ethics, or another analytically rigorous discipline.
- Familiarity with personalization concepts and an intuition for when AI responses feel genuine versus when they seem off.

Engagement Details

- Type: Short-term contractor engagement (2 months)
- Location: Remote, open to candidates in LATAM, USA, and Europe (excluding Portugal)
- Hours: Minimum 30 hours per week, with a 40-hour option available
- Overlap requirement: 4 hours per day of overlap with PST
- Start date: Immediate

Hiring Process

- Shortlisted candidates will receive a Job Interest Form to confirm availability and fit.
- After profile review, you'll be sent a skills assessment to complete within 24 hours.
- Candidates who pass the assessment will be contacted to walk through pre-onboarding requirements.
