Job Closed
This listing is no longer active.
Principal Engineer
Location
United States
Posted
99 days ago
Salary
0
Seniority
Lead
Job Description
Principal Engineer
Prove AI
We are looking for a Principal Engineer who will sit at the technical center of our AI platform, owning architecture, standards, and production excellence across the stack. This is not a pure research role, nor a model R&D position. You will lead the productionization of GenAI systems, owning evaluation pipelines, tracing and observability, inference orchestration, safety guardrails, dataset/prompt workflows, and developer-facing tooling. You will translate ambiguous product problems into reliable, scalable platform capabilities that customers trust with production workloads. You will operate as a force multiplier: setting architectural standards, mentoring senior engineers, influencing roadmap trade-offs, and partnering directly with product and executive leadership to shape technical direction.

Required Experience
- 10+ years of software engineering experience with recent hands-on coding
- Experience operating at Principal / Distinguished / Staff / Lead Engineer level (or equivalent while still coding)
- Proven ownership of production GenAI systems (RAG, eval frameworks, inference services, safety/guardrails, tracing/observability)
- Deep expertise in Python
- Strong practical capability in TypeScript, React, and Node.js
- Strong cloud and platform experience (AWS/GCP/Azure, Kubernetes, CI/CD, distributed systems)
- Experience driving multi-team architecture decisions
- Ability to communicate trade-offs clearly to senior leadership

Strong Plus
- RAG evaluation frameworks (ragas, custom eval harnesses)
- Vector databases (Pinecone, Weaviate, Milvus, Qdrant)
- OpenTelemetry instrumentation
- Performance optimization of inference systems
- Exposure to blockchain/tamper-evidence concepts

- Own technical vision and architecture across GenAI platform capabilities
- Lead end-to-end productionization of AI features (not prototypes)
- Define and operate readiness criteria for AI releases (eval gates, rollout strategy, rollback mechanisms, SLOs)
- Design and own core platform capabilities:
  - Evaluation pipelines
  - Tracing and observability systems
  - Prompt/version/dataset workflows
  - Guardrails and safety systems
  - Inference orchestration layers
- Establish and track metrics for latency (p95/p99), cost efficiency, quality, and safety
- Write clear architectural proposals that influence multi-team decisions
- Partner with Product to translate ambiguous user pain into staged technical delivery
- Raise the engineering bar through reviews, documentation, mentoring, and principled system design
- Contribute hands-on across Python services, Node/TypeScript APIs, and React UI when necessary

- Join at the ground floor of a GenAI infrastructure company defining a new category
- Shape the technical foundation before the architecture calcifies
- Work directly with senior product and engineering leadership
- Build production systems that AI teams depend on daily

- Fully remote
- ESOP equity
- Flexible hours
- Generous PTO
- Global offsites
- Education support
- Clear advancement path
More AI Engineer Jobs
We are looking for a Principal Engineer – AI/ML Platform who will own the architecture, productionization, and operational excellence of our machine learning and LLM infrastructure. This is not a research scientist role. You will define how GenAI systems are evaluated, deployed, monitored, governed, and continuously improved at scale. You will shape standards across model integration, evaluation frameworks, inference systems, safety mechanisms, telemetry instrumentation, and AI/ML workflow automation. You will operate at the intersection of AI engineering, distributed systems, and platform architecture, partnering closely with Product and Engineering leadership to ensure our AI systems are reliable, observable, safe, and economically scalable in enterprise production environments.

Required Experience
- 10+ years of software engineering experience with significant recent hands-on AI/ML work
- Proven ownership of production AI/ML or LLM systems at scale (not research or prototypes)
- Deep expertise in LLM productionization (RAG, finetuning, evaluation, guardrails, model monitoring)
- Strong Python expertise
- Experience with modern AI frameworks (PyTorch, TensorFlow, JAX, Scikit-learn)
- Hands-on AI/MLOps experience (CI/CD for ML, deployment automation, experiment tracking, monitoring)
- Experience with cloud platforms (AWS/GCP/Azure), Kubernetes, and distributed systems
- Experience implementing evaluation pipelines and observability instrumentation
- Demonstrated technical leadership influencing multi-team architectural direction

Strong Plus
- Experience with ML workflow orchestration platforms (Kubeflow, MLflow, Vertex AI, SageMaker)
- Expertise in model governance, bias evaluation, compliance, and drift detection
- Domain expertise in NLP, agentic systems, recommender systems, or similar applied AI areas
- Open-source AI/ML contributions
- Master’s or PhD in ML/AI-related field

- Define and own architecture for scalable AI/ML and LLM systems, including:
  - Inference pipelines
  - Evaluation frameworks
  - Model lifecycle workflows
  - Monitoring and observability systems
- Translate ambiguous business requirements into robust AI platform designs and staged delivery plans
- Make strategic decisions on:
  - Model integrations and gateways
  - Retrieval-augmented generation (RAG) approaches
  - Evaluation methodologies
  - Safety and guardrail systems
- Establish standards for model readiness, evaluation gates, rollout/rollback mechanisms, and drift detection
- Build and deploy production-grade LLM capabilities integrated into distributed systems with clear SLOs and telemetry
- Design scalable AI/MLOps and AIOps practices across training, testing, deployment, and monitoring
- Improve data pipelines, feature workflows, and lineage processes supporting model evaluation and inference
- Instrument tracing and model observability using OpenTelemetry and modern telemetry standards
- Own evaluation pipelines tracking latency, cost, accuracy, hallucination rates, and prompt/version drift
- Provide clear trade-off analyses balancing model performance, cost efficiency, safety, and maintainability
- Write structured technical proposals that guide executive investment and roadmap decisions
- Mentor engineers in AI productionization, experimentation discipline, and distributed systems design
- Raise the engineering bar through principled reviews, documentation, and mechanism-driven standards

- Shape the AI production architecture of a category-defining GenAI infrastructure company
- Define how enterprise-grade AI systems are observed, evaluated, and remediated
- Build mechanisms that scale beyond individual engineers
- Influence roadmap and platform strategy at a formative stage

- Fully remote
- ESOP equity
- Flexible hours
- Generous PTO
- Global offsites
- Education support
- Clear advancement opportunities
Prompt Engineer
Pencil
Pencil uses AI to make ads. Our mission is to make marketing effective and effortless. We want to become the default way ads get made, because AI ads are 10x faster and cheaper to make, and 2x better performing, than making them without AI. Pencil was founded in 2018 with a team from Google, Facebook and Uber with backing from Sequoia and Entrepreneur First. We were acquired by The Brandtech Group in 2023 to pursue a shared vision of bringing GenAI to the Fortune 500.
At Pencil, we’re building the agentic OS for marketing. We aren't just integrating generative AI; we are building the machine that makes it professional, brand-safe, and scalable. We’re moving beyond simple text-in/text-out interfaces toward complex multi-agent architectures that can handle the nuanced demands of global brands and the agility of small businesses. We are looking for a Prompt Engineer who thinks in systems, not just sentences. You will be responsible for the "brain" of Pencil: designing the agent logic, tool-calling structures, and evaluation loops that power our core creative engine and bespoke client solutions.

The Role: Architecting the Creative Brain
You won’t just be "prompting"; you will be engineering behavior. You will bridge the gap between creative intent and machine execution, ensuring that our agents are robust, predictable, and capable of high-fidelity output across text, image, and video. Your work will fall into two high-impact pillars:
- Core Systems: Designing and scaling the foundational agents that power the Pencil SaaS platform.
- Client Solutions: Architecting custom workflows for world-class brands that require specific "brand DNA" and complex creative logic.

- 3+ Years of Direct GenAI Experience: You have a deep, intuitive, and technical understanding of LLMs (GPT-4, Claude, Gemini) and multimodal models (Stable Diffusion, Midjourney, Video Gen).
- Systems Thinking: You don’t just write a prompt; you think about the latent space, the context window, and how one agent's output becomes another’s input.
- Technical Literacy: While this isn't a "Software Engineer" role, you should be comfortable with Python, JSON structures, and API documentation. Experience with orchestration frameworks (LangChain, CrewAI, AutoGen) is a major plus.
- Evaluation Obsession: You believe that if you can’t measure a prompt’s performance, you shouldn’t ship it. You are familiar with benchmarking and A/B testing AI outputs.
- The "Creative/Technical" Bridge: You can sit in a room with a Creative Director and translate "make it feel more punchy" into a temperature adjustment and a few-shot prompting strategy.

You’ll Thrive Here If...
- You find "hallucinations" to be a logic puzzle to be solved, not just a bug.
- You are excited by the challenge of making an AI follow a 50-page brand book with 100% fidelity.
- You want to build the infrastructure that defines how the next generation of advertising is created.

- Design and refine prompts, workflows, and AI agents that enable clients to generate high-quality creative outputs at scale.
- Collaborate with the Delivery teams to understand client goals and translate them into prompt-based solutions.
- Build reusable prompt frameworks and templates for recurring use cases.
- Test, evaluate, and iterate on prompts and agents to improve output quality, efficiency, and user satisfaction.
- Define and document best practices for prompt creation, testing, and deployment within Pencil’s platform.
- Partner with Product and Engineering teams to identify new capabilities or improvements based on client feedback.
- Contribute to internal playbooks, demos, and guides that showcase how to get the most out of our agents and AI tools.

KPIs & Success Measures
- Agent adoption rate: Growth in the number of active clients using Pencil-built agents and workflows.
- Creative generation volume: Increase in the number of generations produced through agents you’ve developed.
- Output quality: Improvement in client satisfaction scores or internal quality benchmarks.
- Efficiency: Reduction in time-to-first-output or manual setup time for client projects.
- Knowledge sharing: Contributions to prompt libraries, playbooks, and team enablement materials.

- 25 days PTO plus public holidays, although we operate a Flexible Time Off scheme.
- Health insurance / private medical cover.
- Monthly stipend towards wellness, fitness, and learning and development.
- Remote - work from anywhere in your home country.
- Enhanced parental leave policies, whether you become a parent through birth, adoption or surrogacy.
- Access to our Pencil office in The Shard, London for our UK employees.
- Flexible working hours.
AI Red-Teamer - Adversarial AI Testing English
Weekday (YC W21)
We are a Y-Combinator-backed startup building your AI-powered Recruiter Agent
This role is for one of our clients.
Compensation: $50-$111 per hour

We are seeking AI Red-Teamers to help test and strengthen modern AI systems through adversarial evaluation. In this role, you will challenge AI models with carefully designed inputs to uncover weaknesses, surface vulnerabilities, and generate high-quality data that improves the safety, reliability, and robustness of conversational AI.

This work focuses on proactively identifying potential risks before they appear in real-world use. By systematically probing AI systems, you will help ensure they respond safely, accurately, and responsibly across a wide range of scenarios.

This role may include reviewing AI outputs that reference sensitive topics such as bias, misinformation, or harmful behaviors. All work is text-based, and participation in higher-sensitivity projects is optional and supported with clear guidelines and wellness resources.
The Reality
Most companies claim to be “AI-powered” but treat LLMs as a magic API. At hillock., you won’t be shipping “proofs of concept”; you’ll engineer production-grade systems that actually move the revenue needle for clients.

What You’ll Actually Do
You’re the engineer who turns “Our engine found 847 high-intent prospects matching your ICP” into automated pipelines that generate measurable pipeline and revenue. This isn’t about building demos; it’s about shipping reliable AI-powered systems that let client execs sleep at night.

AI System Design & GTM Automation
- Turn proprietary ICP and market research into robust go-to-market automations (whether programmatic outreach, SEO, ABM, or influencer marketing) using AI/ML tools and frameworks.
- Eliminate random acts of marketing: Architect data-driven, repeatable GTM workflows instead of hand-crafted campaign hacks.
- Drive measurable results: Enable 73% of clients to hit pipeline targets in Q1 through automation and precision, not “one-size-fits-all” playbooks.

Technical Execution & Performance Tracking
- Build, deploy, and maintain AI-powered GTM tools and integrations: think automated prospecting, lead scoring, campaign orchestration, and attribution systems.
- Prioritize business-critical metrics: pipeline velocity, cost per qualified lead, revenue contribution, not vanity stats.
- Ensure reliability and scalability: ship software that works across multiple client accounts without constant babysitting.

Cross-functional Collaboration
- Work directly with GTM strategists and client leads who have taken companies from zero to $100M ARR; we make sure engineering aligns to business impact.
- Communicate technical concepts to CEOs/CMOs and non-technical stakeholders, translating AI possibilities into commercial outcomes.
- Collaborate on internal product development that multiplies humans with automation, not replaces them.

What We’re Looking For

The Essentials
- 3+ years building production software in GTM/growth, marketing tech, data engineering, or AI/ML environments (track record > tenure).
- Experience deploying and integrating ML, LLM, or automation tools into real-world marketing or sales workflows.
- Comfort juggling multiple projects and priorities, delivering predictable outcomes at pace.

What Sets You Apart
- You see marketing as an engineering problem, not just a branding race.
- Strong with data analysis, pipelines, and workflow automation. You’ve worked with AI-driven platforms, open-source tools, and have a builder’s mindset.
- Passion for tracking what moves the dial: you debug both code and metrics, and can explain “why it works” to a founder or marketer.

The Real Test
- You know the difference between fragile AI demos and production-grade automation.
- Not afraid to push back when commercial ideas exceed what tech can do, or to rapidly prototype until it can.
- Dismiss buzzwords, prioritize accountability.

Why hillock. Works Differently
- No Agency Theater: You’ll engineer with direct access to proprietary AI, ICP research, and workflow automation tools that dwarf what most “AI agencies” have.
- No Random Acts of Marketing: Build unified GTM systems with measurable impact.
- Real Technical Depth: Collaborate with top-tier strategists and operators, not just “visionaries” with no shipping record.
- Actually Remote-First: Results matter more than location.

Most GTM engineering roles mean translating vague growth goals into technical hand-waving. Here, you’ll ship production AI that drives revenue, and always know if it’s working.

Ready to engineer the future of AI-powered GTM? Tell us your best story: where did you save a go-to-market team from technical chaos, or build something nobody thought possible?
