Modern work management platform
Senior Software Engineer II – Applied AI and Evaluations
Location
United States
Posted
60 days ago
Salary
$175K - $245K / year
Seniority
Senior
Job Description
Smartsheet
- Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
- Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
- Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
- Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
- Close the feedback loop: ensure that every change has a measurable, attributable quality signal
- Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
- Establish repeatable methodology that scales beyond any single agent or subagent
Job Requirements
- 8+ years of software engineering experience, with at least 2 years working directly with LLMs in production
- Deep, hands-on experience with prompt engineering and context engineering; you understand how model behavior changes with framing, structure, and input design
- Strong working knowledge of RAG architectures: chunking strategies, embedding models, retrieval evaluation, and failure diagnosis
- Experience building or extending LLM evaluation frameworks; you have designed scorers, worked with golden datasets, and thought carefully about what good looks like
- Fluency in agent system design; you don't need to own the architecture, but you can engage as a peer on architectural tradeoffs that affect quality
- Strong Python skills; comfortable working in data-heavy environments (Databricks, Delta tables, or equivalent)
- Ability to communicate complex quality findings (written and verbal) to both technical and non-technical stakeholders; you can explain what’s broken, why it matters, and what needs to happen next without losing the room
- Strong cross-functional judgment: you know when to escalate, when to resolve independently, and how to build credibility across engineering, product, and AI platform teams
- A bias for clarity in ambiguous situations: when failure modes are murky and trade-offs are real, you bring structure and a clear point of view rather than waiting for consensus
- Legally eligible to work in the U.S. on an ongoing basis
- BS or MS in Computer Science, a related field, or equivalent industry experience
Benefits
- Employer-subsidized medical/vision and dental coverage for full-time employees
- 401k Match to help you save for your future (50% of your contribution up to the first 6% of your eligible pay)
- Monthly stipend to support your work and productivity
- Flexible Time Away Program, plus Sick Time Off
- US employees are automatically covered under Smartsheet-sponsored life insurance, short-term, and long-term disability plans
- US employees receive 12 paid holidays per year
- Up to 24 weeks of Parental Leave
- Personal paid Volunteer Day to support our community
- Opportunities for professional growth and development including access to Udemy online courses
- Company Funded Perks, including a counseling membership, local retail discounts, and your own personal Smartsheet account
- Teleworking options from any registered location in the U.S. (role specific)
More Full-stack Engineer Jobs
Software Engineer, Borrower Experience
Upstart
Our mission is to enable effortless credit based on true risk.
- Design and deliver borrower-facing features across web, mobile, and AI-powered experiences to improve self-service outcomes and payment success
- Translate servicing workflows into intuitive, scalable product experiences that reduce friction and inbound contact volume
- Build and maintain full-stack systems integrating backend services with modern frontend frameworks
- Drive experimentation through A/B testing, analytics, and observability tools to improve borrower outcomes
- Partner with Product, Design, Analytics, and Engineering to deliver high-quality solutions
- Contribute to architectural decisions that unify systems into a cohesive borrower experience
About Etera
Etera is building the first AI-native corporate travel platform for the GCC market, designed from the ground up for how businesses in the region actually operate. We're a lean, high-conviction team replacing legacy TMCs with autonomous agents, real-time inventory, and a product experience that doesn't feel like it was designed in 2009.
The Role
We're looking for a Senior AI Fullstack Engineer who can own the full stack, from a smooth React Native mobile experience to TypeScript backend services and the AI/LLM layer that powers our multi-agent infrastructure (OPUS Prime). If you think in systems, ship fast, and find it natural to reason about both user-facing product and underlying model behavior, this role was built for you.
What You'll Own
- Mobile-first product: Build and maintain Etera's React Native app (Expo/TypeScript): booking flows, real-time updates, multilingual UX (Arabic/English).
- Backend services: Design and extend our TypeScript backend, including booking APIs, supplier integrations (PKfare, RateHawk, Hotelbeds), and webhook orchestration.
- LLM integration: Work directly with Claude (Anthropic API) and our multi-agent OPUS Prime system, building, tuning, and extending AI pipelines for travel intent detection, QA, and automated engineering reports.
- End-to-end delivery: Take features from spec to production. Own the full lifecycle: architecture, implementation, testing, deployment.
- Infra & reliability: Firebase Auth (OTP, Apple, Google), EAS Build/TestFlight pipelines, and CI/CD hygiene across the stack.
- Code quality: Participate in security audits and peer review within a small, high-trust engineering team.
Tech Stack
- Frontend / Mobile: React Native + Expo, TypeScript, Firebase Auth (OTP, Apple, Google), EAS Build / TestFlight, Native iOS/Android context.
- Backend / AI: TypeScript (end-to-end), Anthropic API / Claude, Mastra (agent orchestration framework), LangChain (chains, tools, memory), Zeabur / cloud deployment, REST + Webhook integrations.
You've Built and Run Agent Teams. Plural.
This is not a role for someone who has experimented with LLMs on side projects. We're looking for engineers who have designed, shipped, and actively maintained multi-agent systems, and who understand what it takes to keep them reliable in production.
- Proven agent team ownership: You have built one or more autonomous agent teams from scratch: defined their scope, wired their tools, managed inter-agent communication, and own their continued operation.
- Self-validating mindset: You don't hand off agent output blindly. You act as the primary validation layer for the systems you build, designing evaluation criteria, catching failure modes, and iterating on agent behavior based on real output quality.
- Cross-domain fluency: Your agents have touched multiple domains, not just one narrow use case. You can reason about agent behavior across booking logic, data quality, QA pipelines, code review, or content generation without needing a specialist in each area.
- Maintenance discipline: You have a track record of keeping agent systems healthy over time: prompt drift, tool failures, context window management, regression testing. Not just shipping and moving on.
- Full-stack integration: You connect agent teams to real product surfaces, not just notebooks or internal tools. Your work shows up in APIs, mobile interfaces, or backend workflows that real users depend on.

