Technology-enabled cybersecurity services company focused on Pentesting-as-a-Service (PTaaS).
Senior Application Security Tester, AI Red Team Subject Matter Expert
Location
United States
Posted
30 days ago
Seniority
Senior
Job Description
Evolve Security
The Senior Application Security Tester & AI Red Team Subject Matter Expert is a senior-level offensive security role for a tester who has mastered modern web and API security and is now defining how Evolve Security tests AI-enabled applications, large language models, and agentic systems. This role wears two hats: hands-on senior application penetration tester for our most complex client engagements, and the firm-wide subject matter expert who builds, scales, and represents Evolve Security’s AI red team practice. The senior tester executes assessments with full autonomy, owns the technical relationship with client security and engineering leadership, mentors mid-level engineers and OSOC analysts, and is the recognized internal authority on offensive AI/ML testing methodology, tooling, and threat modeling.
Job Requirements
- Typical Experience: 5–8+ years of offensive security experience with a deep concentration in web application and API penetration testing, plus demonstrable hands-on work testing AI/ML systems such as LLM-backed applications, RAG pipelines, fine-tuned models, multi-agent systems, or production ML inference. A track record of dozens of completed assessments, published research, conference talks, CVEs, or open-source contributions is expected.
- Domain Expertise: Mastery of web application and API security beyond the OWASP Top 10: business logic abuse, complex authentication and authorization flows (OAuth 2.0 / OIDC, SAML, JWT, mTLS), SSRF chains, deserialization, request smuggling, prototype pollution, and modern SPA / GraphQL attack surface. Equally fluent in the OWASP Top 10 for LLM Applications and OWASP ML Top 10: prompt injection (direct, indirect, multi-modal), jailbreaks and safety bypasses, insecure output handling, training data poisoning and extraction, model denial of service, supply chain vulnerabilities in model and plugin ecosystems, excessive agency in agentic systems, sensitive data leakage from system prompts and embeddings, and vector store / RAG poisoning.
- Technical Skills: Expert with the modern offensive toolchain, including Burp Suite Pro (with custom extensions), OWASP ZAP, Nuclei, Postman, Nmap, Metasploit, and BloodHound, and able to build bespoke tooling when the off-the-shelf option falls short. Comfortable with AI red-teaming tooling such as Garak, PyRIT, Promptfoo, Giskard, and adversarial ML libraries, and confident designing custom evaluation harnesses against client-specific LLM and agent stacks. Strong scripting and small-tool development in Python, with working knowledge of JavaScript / TypeScript, Bash, and PowerShell. Familiar with the components of modern AI applications: vector databases (Pinecone, Weaviate, pgvector), embedding models, retrieval pipelines, agent frameworks (LangChain, LlamaIndex, CrewAI), and tool-use protocols including MCP.
- Soft Skills: Excellent written and verbal communication: produces publication-quality reports with no editorial rework, leads CISO and engineering-leader briefings, and de-escalates contested findings with technical rigor. Mentors mid-level engineers and OSOC analysts through code review, paired testing, and methodology coaching. Comfortable representing Evolve Security externally through webinars, podcasts, conference CFPs, and client thought-leadership content.
- Certifications (Preferred, not required): OSWE, OSCP, OSEP, GWAPT, GXPN, Burp Suite Certified Practitioner; AI/ML-adjacent credentials and contributions such as AI Red Team certifications, published prompt injection research, MITRE ATLAS contributions, or SANS SEC545/SEC595.
Expertise that aligns to our approach
- Lead end-to-end web application and API penetration tests as the senior technical owner, scoping the engagement, executing the assessment, and presenting findings to client security and engineering leadership.
- Apply structured testing techniques aligned to OWASP WSTG and OWASP API Security Top 10 to assess authentication, session management, access control (vertical and horizontal privilege escalation), input validation, error handling, and business logic flaws.
- Design and execute AI red team engagements against LLM-backed applications, RAG systems, and agentic workflows — covering prompt injection (direct, indirect, multi-modal), jailbreak resilience, system prompt and tool-use exfiltration, training data and embedding leakage, insecure output handling, and excessive agency in tool-using agents.
- Map AI findings to the OWASP Top 10 for LLM Applications, OWASP ML Top 10, MITRE ATLAS, and the NIST AI Risk Management Framework so client stakeholders can defend severity and remediation calls internally.
- Test the full AI application surface: model endpoints, prompt and response pipelines, retrieval augmentation, vector stores, fine-tuning pipelines, plugin / tool integrations (including MCP servers), guardrail and safety layers, and supporting cloud infrastructure.
- Demonstrate proficiency in manual exploit development for both classical web vulnerabilities (XSS, SQLi, SSRF, IDOR, CSRF, deserialization) and LLM-specific attacks (jailbreak chains, indirect prompt injection via RAG content, agent hijacking via crafted tool outputs).
- Validate authentication mechanisms — OAuth, OIDC, SAML, MFA implementations, and JWT — and how they extend into AI-specific surfaces such as agent identity, per-user tool scoping, and prompt-level authorization.
- Assess session management, secrets handling, and data-flow controls in AI applications, including how user data ends up in prompts, logs, vector stores, and model fine-tunes.
- Execute client-side testing using browser dev tools and proxy-based inspection, evaluating DOM-based vulnerabilities, insecure local storage, and AI-driven client behaviors (e.g., embedded copilots and in-page agents).
- Test REST and GraphQL APIs using a combination of dynamic, manual, and automated methods; extend the same rigor to model and agent APIs.
- Perform code-assisted (grey-box) and full source review when available, identifying logic flaws, insecure configurations, and dangerous patterns specific to AI integrations (untrusted-content-into-prompt, unbounded tool use, missing output sanitization).
- Build, maintain, and contribute to Evolve Security’s AI red team methodology, payload libraries, evaluation harnesses, and reporting templates — and serve as the firm-wide reviewer for AI-related findings.
- Mentor mid-level penetration testing engineers and OSOC analysts through paired testing, technical review, knowledge-sharing sessions, and contributions to internal training and the academy.
- Represent Evolve Security externally through conference talks, blog posts, webinars, and client thought-leadership content on application security and AI red teaming.
- Communicate findings clearly, with strong emphasis on business impact, reproducibility, and strategic remediation guidance that engineering teams can actually ship.
Success in the first 6 months looks like:
- Published, version-controlled AI red team methodology covering LLM applications, RAG systems, and agentic workflows, adopted across Evolve Security engagements.
- A reusable AI red team toolkit (custom Garak/PyRIT probes, payload libraries, evaluation harnesses) ready for any tester to use on a client engagement.
- Senior technical ownership of at least one strategic, AI-focused client account.
- Mentorship cadence in place with mid-level engineers and OSOC analysts; demonstrable uplift in their AI-related findings and reporting quality.
- At least one piece of public thought leadership (talk, blog, or research) attributed to Evolve Security.
Benefits
Who is Evolve Security?
Evolve Security is a cybersecurity services firm headquartered in Chicago, IL. We are dedicated to improving our clients’ security posture by providing continuous penetration testing, training services, and talent solutions.
In addition to our professional cybersecurity service offerings, Evolve Security offers a cybersecurity bootcamp, “Evolve Academy”, currently ranked the #1 cybersecurity bootcamp in the world. The Cybersecurity Bootcamp in Chicago provides immersive training, giving students the concrete and practical skills needed on the job. Students gain real work experience through live security assessment work that they perform for not-for-profit organizations.
We are passionate about directly improving our customers’ security posture, and we proudly train others to help meet the need for qualified cybersecurity talent.
Benefits Include
- Healthcare Benefits
- 401(k) Match
- Parental Leave
- Flexible Paid Time Off
- Annual vacation reimbursement