Job Closed

This listing is no longer active.

Weekday (YC W21)

We are a Y-Combinator-backed startup building your AI-powered Recruiter Agent

AI Red-Teamer - Adversarial AI Testing English

AI Engineer | Machine Learning Engineer | Other | Remote | Mid Level | Team 11-50 | Since 2021 | H1B No Sponsor

Location

United States

Posted

99 days ago

Salary

$50-$111 per hour

Seniority

Mid Level

English

Job Description

This role is for one of our clients.

Compensation: $50-$111 per hour

We are seeking AI Red-Teamers to help test and strengthen modern AI systems through adversarial evaluation. In this role, you will challenge AI models with carefully designed inputs to uncover weaknesses, surface vulnerabilities, and generate high-quality data that improves the safety, reliability, and robustness of conversational AI.

This work focuses on proactively identifying potential risks before they appear in real-world use. By systematically probing AI systems, you will help ensure they respond safely, accurately, and responsibly across a wide range of scenarios.

This role may include reviewing AI outputs that reference sensitive topics such as bias, misinformation, or harmful behaviors. All work is text-based, and participation in higher-sensitivity projects is optional and supported with clear guidelines and wellness resources.

Job Requirements

What You’ll Do

  • Red-team AI models and agents by testing jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies
  • Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks
  • Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent evaluation
  • Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that teams can act upon
  • Work across multiple projects, supporting different AI systems and evaluation objectives

Who You Are

  • You have prior red-teaming experience, such as adversarial AI testing, cybersecurity, or socio-technical risk analysis
  • You naturally think adversarially, exploring ways to push systems to their limits and uncover weaknesses
  • You prefer structured methodologies, using frameworks and benchmarks rather than ad-hoc testing
  • You communicate risks and vulnerabilities clearly to both technical and non-technical audiences
  • You are comfortable working across multiple projects and adapting to new evaluation challenges

Nice-to-Have Specialties

  • Adversarial Machine Learning: jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, or model extraction techniques
  • Cybersecurity: penetration testing, exploit development, reverse engineering
  • Socio-technical risk analysis: harassment or misinformation testing, abuse pattern analysis
  • Creative adversarial thinking: backgrounds in psychology, acting, writing, or other disciplines that support unconventional attack strategies

What Success Looks Like

  • You uncover vulnerabilities and failure modes that automated tests miss
  • Your work produces reproducible artifacts and datasets that improve AI system resilience
  • Evaluation coverage expands with more realistic adversarial scenarios tested before deployment
  • AI systems become safer and more reliable due to your rigorous testing and insights

Why Join

  • Contribute directly to frontier work in AI safety and adversarial testing
  • Help improve the robustness, safety, and trustworthiness of modern AI systems
  • Gain hands-on experience working with human data-driven AI evaluation methodologies

Compensation may vary depending on the project, customer requirements, level of expertise, and content sensitivity involved in each engagement.

Contract and Payment Terms

  • Engagement will be as an independent contractor
  • This is a fully remote role that can be completed on your own schedule
  • Projects may be extended, shortened, or concluded early depending on project needs and performance
  • Work performed will not involve access to confidential or proprietary information from any employer, client, or institution
  • Payments are issued weekly via Stripe or Wise based on services rendered

Please note: Candidates requiring H1-B or STEM OPT sponsorship cannot be supported for this role at this time.

Related Job Pages

More AI Engineer Jobs

The Reality

Most companies claim to be “AI-powered” but treat LLMs as a magic API. At hillock., you won’t be shipping “proofs of concept”; you’ll engineer production-grade systems that actually move the revenue needle for clients.

What You’ll Actually Do

You’re the engineer who turns “Our engine found 847 high-intent prospects matching your ICP” into automated pipelines that generate measurable pipeline and revenue. This isn’t about building demos; it’s about shipping reliable AI-powered systems that let client execs sleep at night.

AI System Design & GTM Automation

  • Turn proprietary ICP and market research into robust go-to-market automations (programmatic outreach, SEO, ABM, or influencer marketing) using AI/ML tools and frameworks.
  • Eliminate random acts of marketing: architect data-driven, repeatable GTM workflows instead of hand-crafted campaign hacks.
  • Drive measurable results: enable 73% of clients to hit pipeline targets in Q1 through automation and precision, not “one-size-fits-all” playbooks.

Technical Execution & Performance Tracking

  • Build, deploy, and maintain AI-powered GTM tools and integrations, such as automated prospecting, lead scoring, campaign orchestration, and attribution systems.
  • Prioritize business-critical metrics: pipeline velocity, cost per qualified lead, and revenue contribution, not vanity stats.
  • Ensure reliability and scalability: ship software that works across multiple client accounts without constant babysitting.

Cross-functional Collaboration

  • Work directly with GTM strategists and client leads who have taken companies from zero to $100M ARR; we make sure engineering aligns with business impact.
  • Communicate technical concepts to CEOs/CMOs and non-technical stakeholders, translating AI possibilities into commercial outcomes.
  • Collaborate on internal product development that multiplies humans with automation rather than replacing them.

What We’re Looking For

The Essentials

  • 3+ years building production software in GTM/growth, marketing tech, data engineering, or AI/ML environments (track record > tenure).
  • Experience deploying and integrating ML, LLM, or automation tools into real-world marketing or sales workflows.
  • Comfort juggling multiple projects and priorities while delivering predictable outcomes at pace.

What Sets You Apart

  • You see marketing as an engineering problem, not just a branding race.
  • Strong with data analysis, pipelines, and workflow automation. You’ve worked with AI-driven platforms and open-source tools, and have a builder’s mindset.
  • Passion for tracking what moves the dial: you debug both code and metrics, and can explain “why it works” to a founder or marketer.

The Real Test

  • You know the difference between fragile AI demos and production-grade automation.
  • You’re not afraid to push back when commercial ideas exceed what tech can do, or to rapidly prototype until it can.
  • You dismiss buzzwords and prioritize accountability.

Why hillock. Works Differently

  • No Agency Theater: You’ll engineer with direct access to proprietary AI, ICP research, and workflow automation tools that dwarf what most “AI agencies” have.
  • No Random Acts of Marketing: Build unified GTM systems with measurable impact.
  • Real Technical Depth: Collaborate with top-tier strategists and operators, not just “visionaries” with no shipping record.
  • Actually Remote-First: Results matter more than location.

Most GTM engineering roles mean translating vague growth goals into technical hand-waving. Here, you’ll ship production AI that drives revenue, and always know if it’s working.

Ready to engineer the future of AI-powered GTM? Tell us your best story: where did you save a go-to-market team from technical chaos, or build something nobody thought possible?

All locations: United States | Canada

AI/ML Software Engineer (RL Environments) (Contract)

Careerflow.ai

Commitments Required: 40 hours per week with overlap of 6 hours with PST. Engagement type: Contractor (no medical/paid leave). Duration of contract: 6 months with opportunity to extend; expected start date is 1st week of Jun-2026. Location: North America and LATAM.

AI Engineer | 99 days ago
Full Time | Remote | Team 11-50

About the Role

We're seeking experienced Machine Learning Engineers and Software Engineers with ML experience to design and build high-quality RL training environments for LLM agents. As an RL Environment Engineer, you'll create diverse machine learning tasks that challenge and improve language models, working with minimal supervision to deliver consistent, quality outputs.

What You'll Do

  • Design and build tasks for machine learning domains that target specific language models and difficulty distributions
  • Iterate rapidly on task designs based on customer feedback, with 24-hour turnaround times
  • Create diverse, challenging scenarios that test language model capabilities and expose their limitations
  • Hit the ground running with minimal onboarding time

What We're Looking For

  • Strong machine learning background through coursework, previous work experience, or personal projects
  • Python fluency: you write clean, efficient Python code regularly
  • Heavy LLM user who understands current model capabilities and failure modes through daily hands-on experience
  • Self-directed and creative: you can generate novel ML task ideas in your domain without constant guidance
  • High responsibility and integrity: you deliver quality work consistently and meet deadlines
  • Availability overlap with PST 9am-5pm (minimum 3 hours required)

Work Details

  • Location: Remote
  • Type: Contractor
  • Time Commitment: 40 hours a week. Must have at least 3 hours of overlap with PST business hours (9am-5pm)

Selection Process

  • Screening
  • HackerRank assessment
  • 1-week paid task
  • Full time

United States

Founding Engineer — Distributed Systems (Games + AI)

Phygtl, Inc.

At Phygtl, we are solving the problem for Students suffering from social isolation, a challenge that affects over 60% of today's students, contributing significantly to rising depression and anxiety rates. We develop technology that reduces isolation by up to 40%, by providing a novel way to foster community on campus while bridging connections between physical and digital environments. Our team includes PhDs from institutions such as Stanford, UC Berkeley, MIT, and Carnegie Mellon, alongside professionals from companies like Niantic, Meta, Magic Leap, Riot Games, Ubisoft, and Zynga. Led by a Silicon Valley-based serial entrepreneur, our startup is united by a commitment to rescue next-gen from Social Atrophy.

AI Engineer | 99 days ago
Full Time | Remote | Team 11-50

The Problem

Young people are more connected online than ever, yet lonelier than any generation before them. Most digital platforms are optimized for passive consumption on screens, pulling people away from the environments they live in. We are exploring the opposite: technology that brings people together in real-world environments. To do that, software must understand place, context, and coordinated action, not just users and clicks.

What We Are Building

Vyry is infrastructure that allows physical places to accumulate persistent digital meaning through human participation. When students coordinate in quests, they are not simply completing activities. They are creating artifacts, relationships, and identity traces that remain anchored to real locations. Over time, campuses begin to develop a persistent experiential layer, where places contain memories, co-created assets, and discovery paths generated by the community itself. The experiences themselves are not the product. They are the behavioral interface through which the system learns how humans coordinate and co-create meaning in physical environments.

Phygtl is building a system where:

  • physical environments become stateful systems
  • human coordination generates persistent artifacts
  • places accumulate structured memory and identity
  • distributed services maintain the world state of these environments

Recent proofs:

  • 12k → 175k students reached in 2025
  • 25% of MAUs complete quests
  • pilots show up to 40% reduction in social isolation

The team combines consumer product, spatial, and game-world expertise, with seasoned talent from Roblox, Niantic, and Ubisoft, alongside a research layer that includes 2 professors and 4 PhDs.

Future Direction

Today Vyry operates as a mobile app. The next phase evolves the system into infrastructure for smart glasses. The long-term architecture enables developers and brands to create context-aware activities inside the spatial environment, similar to how platforms like Roblox enabled creators to build experiences on top of a shared world layer. This requires building distributed systems capable of coordinating environments, users, AI agents, and third-party developers simultaneously.

What You Would Work On

You would help design and build the distributed systems layer that powers the platform. This includes:

  • real-time coordination systems for large numbers of concurrent users
  • distributed world state for artifacts, quests, identities, and spatial context
  • APIs and infrastructure enabling developer and brand participation
  • orchestration of multiple AI agents operating within the system
  • scalable service architecture for persistent spatial environments
  • AI platform infrastructure such as vector search, retrieval systems, and agent orchestration that power contextual reasoning inside the system
  • search and discovery infrastructure allowing users and developers to navigate artifacts, identities, and environments across the network

This role sits between game infrastructure, distributed systems, and AI platform architecture.

Hard Problems We’re Exploring

  • Persistent shared world state: artifacts, identities, and quests evolving across sessions and users
  • Massively concurrent spatial coordination: enabling real-time participation across many users interacting in the same environments
  • Distributed AI orchestration: production systems where multiple AI agents operate in parallel while maintaining reliability and latency guarantees
  • Platform infrastructure: evolving the system from a single product into infrastructure other developers can build on

Role

We are looking for a hands-on distributed systems engineer from the gaming industry who wants to become a founding engineer. You would:

  • design distributed system architecture
  • define service boundaries and coordination mechanisms
  • build systems for concurrency, persistence, and scale
  • integrate AI-driven pipelines into production infrastructure
  • shape the platform architecture as the system evolves beyond the app

We are not looking for someone to simply build backend services, but for someone who wants to architect the distributed systems layer of a new spatial computing platform.

Compensation

This is an equity-first founding role. Compensation is equity-only until our next funding round (target H2 2026). You can join either:

  • Part-time, while keeping your current role
  • Full-time

This role is designed for someone who wants to be a founding engineer at a startup without having to start one from scratch.

Who This Is For

Engineers who:

  • have deep experience building distributed systems
  • have worked on real-time or game-scale infrastructure
  • think in systems architecture rather than isolated services
  • are comfortable designing for concurrency, reliability, and scale
  • have experience building AI or ML platforms, such as vector search or retrieval systems, RAG pipelines or LLM services, agent orchestration platforms, or AI safety or synthetic data pipelines
  • are already experimenting with AI-native development workflows

This often resonates with engineers who have spent years inside large companies and now want to build something category-defining with ownership.

Who This Is NOT For

Please do not apply if:

  • you are looking for a salaried job today
  • you want a freelance contract or short-term gig
  • you are bridging time until your next role
  • you are primarily a traditional CRUD backend developer

This is a founding role with startup emotions.

If Interested

Send:

  • GitHub or portfolio
  • examples of distributed systems or infrastructure you built
  • anything you’ve built involving AI orchestration, agents, or platform systems

United States

Senior AI Engineer

eBay

One of the world's largest ecommerce marketplaces, eBay was founded in 1995 with an online platform designed to provide an open, trustworthy forum for sellers a

AI Engineer | 100 days ago

This description is a summary of our understanding of the job description. Click on 'Apply' button to find out more.

Role Description

This opportunity is for builders who thrive between ambiguity and execution. At eBay, you will help define and deliver the next wave of AI-powered marketplace experiences by turning emerging ideas into measurable outcomes. Success in this role means creating new capabilities that improve customer and business impact at scale.

We are looking for a Senior AI Engineer with a strong focus on AI research and innovation, combining deep experimentation with systems-building execution. You will own the full lifecycle of applied AI initiatives: problem framing, experimentation, prototyping, and production hardening. The role combines research depth with strong systems execution across modeling, serving, and product integration.

This is not a pure research role and not a pure backend role. You will own end-to-end vertical slices and partner closely with software engineers, researchers, data engineers, product, and design.

What You Will Do

Research and Modeling

  • Design, train, and evaluate systems across LLMs, retrieval, ranking, personalization, multimodal AI, and agent architectures.
  • Apply fine-tuning, distillation, retrieval-augmented generation (RAG), and orchestration strategies to improve quality and efficiency.
  • Run structured experiments and reason clearly about quality, latency, reliability, and cost tradeoffs.
  • Translate ambiguous product problems into modeling and system strategies that can be validated with users.

Applied AI Systems and Platform

  • Design and build advanced AI capabilities across LLM workflows, retrieval systems, ranking, personalization, multimodal experiences, and agent-led interactions.
  • Develop reusable platform components, APIs, and best practices for AI application development.
  • Evaluate and integrate new models, tools, and frameworks with a critical, production-first lens.
  • Contribute to responsible AI practices, including safety, monitoring, and governance.

Systems and Scaling

  • Build scalable inference pipelines and services for high-throughput, low-latency workloads.
  • Optimize batching, streaming, caching, and request orchestration in distributed and async environments.
  • Improve production systems across latency, throughput, reliability, observability, and unit economics.
  • Partner with infrastructure teams to leverage GPU-enabled and cloud-native environments effectively.

Application Integration and Prototyping

  • Integrate AI capabilities into real APIs, applications, and user experiences.
  • Design end-to-end systems from data ingestion and model serving to user interaction.
  • Build prototypes that demonstrate user value, not only model-level metrics.
  • Own delivery from idea to deployed prototype and production iteration.

Qualifications

  • 8+ years of software engineering and/or machine learning engineering experience.
  • 4+ years of focused experience building and deploying AI-centric systems.
  • 2+ years of hands-on experience with LLM-based agents, autonomous workflows, or multi-agent orchestration.
  • Strong programming skills in Java or similar JVM languages, with working proficiency in Python.
  • Production experience with modern ML tooling and frameworks (for example: PyTorch, Transformers, scikit-learn).
  • Proven experience taking AI-powered products from prototype to production with strong maintainability and operational quality.
  • Experience designing scalable backend services and APIs for real-world traffic.
  • Familiarity with distributed systems, async processing, and reactive architectures.
  • Ability to move from ambiguous problem statements to working systems with clear technical judgment.
  • Strong communication skills and ability to collaborate across research, engineering, and product.

Preferred Qualifications

  • Experience with Spring-based service development.
  • Experience with Docker and Kubernetes in production environments.
  • Familiarity with big data and processing ecosystems (for example: Spark, Hadoop).
  • Experience with streaming systems (for example: Kafka, Flink, Beam).
  • Experience with RAG pipelines, vector stores, tool-use frameworks, and multimodal model integration.
  • Exposure to GPU optimization and performance tuning (for example: CUDA, inference optimization techniques).
  • Experience building conversational AI systems (intents, entities, dialog flows, and interaction design).
  • Ability to create lightweight frontend or internal tooling to accelerate prototyping and validation.

Benefits

  • The base pay range for this position is expected in the range below: $156,800 - $255,300.
  • Base pay offered may vary depending on multiple individualized factors, including location, skills, and experience.
  • The total compensation package for this position may also include other elements, including a target bonus and restricted stock units (as applicable).
  • A full range of medical, financial, and/or other benefits (including 401(k) eligibility and various paid time off benefits, such as PTO and parental leave).

United States
$156.8K - $255.3K / year