Smartsheet

Modern work management platform

Senior Software Engineer II – Applied AI and Evaluations

Full-stack Engineer | Software Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 2005 | H1B Sponsor

Location

United States

Posted

60 days ago

Salary

$175K - $245K / year

Seniority

Senior

Bachelor Degree | 8 yrs exp | English | Python

Job Description

  • Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
  • Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
  • Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
  • Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
  • Close the feedback loop: ensure that every change has a measurable, attributable quality signal
  • Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
  • Establish a repeatable methodology that scales beyond any single agent or subagent

Job Requirements

  • 8+ years of software engineering experience, with at least 2 years working directly with LLMs in production
  • Deep, hands-on experience with prompt engineering and context engineering; you understand how model behavior changes with framing, structure, and input design
  • Strong working knowledge of RAG architectures: chunking strategies, embedding models, retrieval evaluation, and failure diagnosis
  • Experience building or extending LLM evaluation frameworks; you have designed scorers, worked with golden datasets, and thought carefully about what good looks like
  • Fluency in agent system design; you don't need to own the architecture, but you can engage as a peer on architectural tradeoffs that affect quality
  • Strong Python skills; comfortable working in data-heavy environments (Databricks, Delta tables, or equivalent)
  • Ability to communicate complex quality findings (written and verbal) to both technical and non-technical stakeholders; you can explain what’s broken, why it matters, and what needs to happen next without losing the room
  • Strong cross-functional judgment; you know when to escalate, when to resolve independently, and how to build credibility across engineering, product, and AI platform teams
  • A bias for clarity in ambiguous situations; when failure modes are murky and trade-offs are real, you bring structure and a clear point of view rather than waiting for consensus
  • Legally eligible to work in the U.S. on an ongoing basis
  • BS or MS in Computer Science, a related field, or equivalent industry experience

Benefits

  • Employer-subsidized medical, vision, and dental coverage for full-time employees
  • 401k Match to help you save for your future (50% of your contribution up to the first 6% of your eligible pay)
  • Monthly stipend to support your work and productivity
  • Flexible Time Away Program, plus Sick Time Off
  • US employees are automatically covered under Smartsheet-sponsored life insurance and short-term and long-term disability plans
  • US employees receive 12 paid holidays per year
  • Up to 24 weeks of Parental Leave
  • Personal paid Volunteer Day to support our community
  • Opportunities for professional growth and development including access to Udemy online courses
  • Company-funded perks, including a counseling membership, local retail discounts, and your own personal Smartsheet account
  • Teleworking options from any registered location in the U.S. (role specific)

Related Job Pages

More Full-stack Engineer Jobs

Software Engineer, Borrower Experience

Upstart

Our mission is to enable effortless credit based on true risk.

Full Time | Remote | Team 1,001-5,000 | Since 2012 | H1B Sponsor

  • Design and deliver borrower-facing features across web, mobile, and AI-powered experiences to improve self-service outcomes and payment success
  • Translate servicing workflows into intuitive, scalable product experiences that reduce friction and inbound contact volume
  • Build and maintain full-stack systems integrating backend services with modern frontend frameworks
  • Drive experimentation through A/B testing, analytics, and observability tools to improve borrower outcomes
  • Partner with Product, Design, Analytics, and Engineering to deliver high-quality solutions
  • Contribute to architectural decisions that unify systems into a cohesive borrower experience

United States
$142K - $196.6K / year
Job Closed

About Etera

Etera is building the first AI-native corporate travel platform for the GCC market, designed from the ground up for how businesses in the region actually operate. We're a lean, high-conviction team replacing legacy TMCs with autonomous agents, real-time inventory, and a product experience that doesn't feel like it was designed in 2009.

The Role

We're looking for a Senior AI Fullstack Engineer who can own the full stack, from a smooth React Native mobile experience to TypeScript backend services and the AI/LLM layer that powers our multi-agent infrastructure (OPUS Prime). If you think in systems, ship fast, and find it natural to reason about both user-facing product and underlying model behavior, this role was built for you.

What You'll Own

  • Mobile-first product: Build and maintain Etera's React Native app (Expo/TypeScript), including booking flows, real-time updates, and multilingual UX (Arabic/English).
  • Backend services: Design and extend our TypeScript backend, including booking APIs, supplier integrations (PKfare, RateHawk, Hotelbeds), and webhook orchestration.
  • LLM integration: Work directly with Claude (Anthropic API) and our multi-agent OPUS Prime system, building, tuning, and extending AI pipelines for travel intent detection, QA, and automated engineering reports.
  • End-to-end delivery: Take features from spec to production. Own the full lifecycle: architecture, implementation, testing, deployment.
  • Infra & reliability: Firebase Auth (OTP, Apple, Google), EAS Build/TestFlight pipelines, and CI/CD hygiene across the stack.
  • Code quality: Participate in security audits and peer review within a small, high-trust engineering team.

Tech Stack

  • Frontend / Mobile: React Native + Expo, TypeScript, Firebase Auth (OTP, Apple, Google), EAS Build / TestFlight, Native iOS/Android context.
  • Backend / AI: TypeScript (end-to-end), Anthropic API / Claude, Mastra (agent orchestration framework), LangChain (chains, tools, memory), Zeabur / cloud deployment, REST + Webhook integrations.

You've Built and Run Agent Teams. Plural.

This is not a role for someone who has experimented with LLMs on side projects. We're looking for engineers who have designed, shipped, and actively maintained multi-agent systems, and who understand what it takes to keep them reliable in production.

  • Proven agent team ownership: You have built one or more autonomous agent teams from scratch (defined their scope, wired their tools, managed inter-agent communication) and own their continued operation.
  • Self-validating mindset: You don't hand off agent output blindly. You act as the primary validation layer for the systems you build: designing evaluation criteria, catching failure modes, and iterating on agent behavior based on real output quality.
  • Cross-domain fluency: Your agents have touched multiple domains, not just one narrow use case. You can reason about agent behavior across booking logic, data quality, QA pipelines, code review, or content generation without needing a specialist in each area.
  • Maintenance discipline: You have a track record of keeping agent systems healthy over time: prompt drift, tool failures, context window management, regression testing. Not just shipping and moving on.
  • Full-stack integration: You connect agent teams to real product surfaces, not just notebooks or internal tools. Your work shows up in APIs, mobile interfaces, or backend workflows that real users depend on.

Switzerland
Job Closed

Germany
Job Closed

Ireland
Job Closed