Grafana Labs

Grafana Labs supports organizations’ monitoring, visualization and observability goals. 950,000+ active installations

Staff AI Engineer

AI Engineer | Machine Learning Engineer | Full Time | Remote | Lead | Team 501-1,000 | Since 2014 | H1B Sponsor

Location

United States

Posted

47 days ago

Salary

$175.0K - $210.0K / year

Seniority

Lead

Job Description

  • Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation
  • Build modular, composable agentic systems using orchestration frameworks (LangChain, CrewAI, Anthropic MCP, or similar) that operate 24/7 across teams
  • Develop reusable agentic skills that agents invoke across interfaces (Slack, dashboards, internal apps, CLIs)
  • Implement observability and feedback loops including logging, performance metrics, prompt iteration, model evaluation, and cost management
  • Establish governance and compliance standards for AI workflows including access controls, audit trails, PII handling, and human-in-the-loop escalation paths
  • Build MCP servers, APIs, CLIs, and microservices connecting AI models to business systems (BigQuery, Slack, CRMs, email, calendars, analytics tools)
  • Architect data flows for retrieval-augmented generation (RAG), connecting LLMs to internal knowledge bases, customer data, and real-time business context
  • Build serverless or containerized services (GCP Cloud Functions, Cloud Run) that scale with usage and integrate with Grafana's cloud infrastructure
  • Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to scope high-impact automation problems, identify bottlenecks, and build solutions with measurable business outcomes
  • Design and deploy workflows using orchestration tools (n8n, Workato, or custom platforms) with CI/CD, testing, and production reliability standards
  • Build systems designed for self-service with documentation, playbooks, and enablement materials that let partner teams operate independently

Job Requirements

  • 8+ years of software engineering experience with depth in backend development, systems integration, or data/analytics engineering
  • 2+ years hands-on experience applying LLMs/AI to production workflows, not just prototypes
  • Strong proficiency in Python and JavaScript/Node.js with Git-based workflows, code review practices, and testing discipline
  • Hands-on experience with LLM frameworks and patterns including prompt engineering, RAG, function calling/tool use, structured output parsing, and evaluation
  • Experience building and operating multi-agent systems at scale including agent decomposition, orchestration patterns (sequential chains, router/dispatcher, parallel fan-out), state management, and production monitoring
  • Diagnose business problems before writing code; think in workflows and outcomes, not just functions
  • Deep familiarity with Google Cloud Platform, BigQuery, and serverless/containerized services (Cloud Functions, Cloud Run)
  • Understanding of LLM failure modes and production mitigations including confidence thresholds, fallback logic, human escalation, and cost/latency management
  • Proven ability to identify high-leverage problems, push back on low-impact requests, and deliver end-to-end with minimal direction
  • Fluent with AI-assisted development tools (GitHub Copilot, Cursor, Claude Code); use AI to build AI systems
  • Clear technical communicator, able to explain complex systems in simple terms to both engineers and business stakeholders

Benefits

  • Equity
  • Bonus (if applicable)
  • Restricted Stock Units (RSUs)
  • 30 days annual leave

Related Job Pages

More AI Engineer Jobs

Full Time | Remote | Team 201-500

About the team

Kiloverse is the operational force that helps ventures grow, building the systems, teams, and infrastructure that turn ideas into scalable businesses. Behind every fast-growing product, there’s a group of people who think like owners, act fast, and execute with precision. Here, you’ll find people who want to break big in their careers, working side by side with entrepreneurs and founders to build ventures that redefine the health, wellness, beauty, or travel industries.

We are seeking an Automation Developer to join our fast-paced Automations Team at Kiloverse. You’ll be responsible for designing, building, and maintaining automations that improve operational efficiency across various teams using tools like n8n and Zendesk, and for integrating services using JavaScript, PHP (Laravel), and React. This is a great opportunity for someone passionate about building scalable and meaningful process automations using modern tools and languages.

Get ready to

  • Develop and maintain workflows in n8n, integrating with tools like Zendesk, CRMs, internal services, and 3rd-party APIs.
  • Build custom logic and integrations using JavaScript and PHP (Laravel).
  • Collaborate with cross-functional teams (Support, Product, Engineering) to gather requirements and implement process automations.
  • Monitor, debug, and optimize automation flows for performance, reliability, and scalability.
  • Maintain clean documentation of workflows, APIs, and automation logic.
  • Contribute to the technical architecture of our automation ecosystem.
  • Participate in code reviews and uphold coding best practices.

We expect you to have

  • 4+ years of experience in software development or automation-focused roles.
  • Proficiency in JavaScript and PHP (Laravel framework).
  • Experience with low-code/no-code platforms like n8n (or similar: Zapier, Make, Node-RED).
  • Experience integrating REST APIs and webhooks.
  • Good understanding of software architecture and data flow in software systems.
  • Ability to write clean, maintainable, and testable code.
  • Strong debugging and problem-solving skills.
  • Familiarity with monitoring/logging tools (e.g., Sentry, Datadog).
  • Experience with version control systems (e.g., Git) and CI/CD pipelines.
  • Background in customer service or operations engineering is a plus.
  • Strong communication skills for collaborating with technical and non-technical teams.
  • Problem-solving mindset with attention to detail.
  • Ability to manage multiple tasks and prioritize effectively.

Salary

Gross salary range is 4700 - 6700 EUR/month.

Location

This position is remote, with the opportunity to work from our office in Barcelona, Spain.

Speaking of perks:

  • Own your wellness: 7 extra days off, 3 for weddings.
  • Shape your growth path: Builders never stop learning. Coaching, mentorship, training, conferences, online courses, subscriptions, books: you’ve got a €1100 yearly budget to fuel personal or team growth. You pick the path; we’ll back you all the way.
  • Drive global impact: This is where ambition scales. Build impact across 30+ brands worldwide. Launch bold projects. Even step in as a co-founder; we reward those who push further. And if you bring in more great people, you get €1500 for every successful referral.

*additional conditions apply based on your residence location.

Spain
€4.7K - €6.7K / month

We’re looking for a Senior Software Engineer - AI Agent Platform to join our tech team and help build the future of neuro-contextual advertising at global scale.

Who We Are

At Seedtag, our mission is to transform advertising by proving that effectiveness and user privacy can truly coexist. As the leading Neuro-Contextual Advertising Company, we combine Artificial Intelligence, Natural Language Processing, Computer Vision, and neuroscience to understand not only what content is about, but how it makes people feel and what they intend to do next. Our proprietary AI, Liz, enables brands to connect with audiences across the open web and Connected TV without cookies or user tracking. Founded in 2014 by two ex-Googlers, Seedtag has grown to 700+ Seedtaggers in 17 countries, backed by €250M in funding, and operates today as a global ad-tech leader. If you enjoy solving complex engineering challenges and building AI-driven systems at scale, you’ll feel right at home here.

Your Challenge

As Senior Software Engineer - AI Agent Platform, you will:

  • Design and build an enterprise-grade AI agent platform that handles real workloads, real users, and real constraints.
  • Own critical parts of the platform: agent orchestration, memory and context systems, tool integration layers, and external communication protocols.
  • Solve problems like managing conversation context at scale, coordinating multi-agent workflows, optimising LLM interactions, and exposing agent capabilities through standard protocols.
  • Make architectural decisions that balance performance, maintainability and developer experience.
  • Work closely with data scientists and product teams to ship agent capabilities that drive business outcomes.

This space moves fast. Frameworks, protocols, and best practices evolve constantly. We build solid foundations but stay ready to adapt. You should be comfortable with change and eager to stay at the state of the art.

Our Core Values

  • Outcome over Output: We measure success by impact and value, not by volume of features or lines of code.
  • Failure Is Allowed, Learning Is a Must: Experimentation is key to innovation. We test early, iterate often, and learn fast.
  • We Are All Scouts: We take ownership and leave things better than we found them.
  • We Are Data-Driven: Data informs our decisions and helps us continuously improve our systems and results.

Tech Stack

We operate at large scale, supporting up to 120k requests per second, with ML models responding in under 10 milliseconds and processing 20 TB of data daily. Our stack includes:

  • Languages: Python, Go, TypeScript, Scala
  • Cloud & Infrastructure: Kubernetes, GCP, AWS
  • Streaming: Kafka, Kinesis
  • Data: Datalakes, GCS, Redis, Druid, MongoDB
  • Architecture: 100+ microservices, DevOps-driven, continuous deployment

What You’ll Need to Succeed

  • 5+ years building and operating backend systems in production, with strong distributed systems fundamentals.
  • Proficiency in Python and ideally Go. You care about code quality and maintainability.
  • Hands-on experience building LLM-based applications or AI agent systems using agent SDKs and frameworks (Pydantic AI, LangGraph, OpenAI Agents SDK, or similar).
  • Experience with Kubernetes, GCP or AWS, observability tools, CI/CD pipelines.
  • Understanding of agent patterns: tool calling, context management, and multi-step workflows.
  • Experience working with AI-assisted development tools.
  • Strong judgment and communication skills. You navigate ambiguity and make sound technical decisions.
  • Track record of owning projects end-to-end in fast-moving environments.

Why Join Seedtag?

  • A key moment of growth with real ownership and global impact.
  • Flexible work model with 100% remote or hybrid options. (Remote contracts available in Spain, Italy, UK, Belgium, Netherlands, France, and Germany.)
  • Continuous learning through a learning platform and optional language classes.
  • A supportive, trust-based culture that values well-being.
  • Team activities, offsites, and opportunities to connect beyond work.

Additional Perks

  • Home office setup budget up to €1,000
  • Paid trips to our HQ in Madrid
  • MacBook Pro M3

Ready to Join the Seedtag Adventure?

At Seedtag, we create an environment where everyone can thrive. If you need accommodations during the hiring process, let us know and we’ll ensure a positive experience. Send us your CV and let’s build the future of neuro-contextual advertising together.

All locations: Belgium | France | Italy | Netherlands | Spain | United Kingdom
Full Time | Remote | Team 201-500 | H1B No Sponsor

  • Integrate Generative AI models, such as LLMs, with external APIs, tools, and databases using secure and efficient orchestration patterns.
  • Design, develop, and deploy AI workflows and Agentic AI solutions, enabling the seamless orchestration of intelligent agents to plan and perform tasks while leveraging autonomous and/or human-in-the-loop paradigms.
  • Implement and optimize multi-agent systems, leveraging standards and protocols such as Model Context Protocol (MCP), and emerging frameworks for agent interoperability and access to external resources.
  • Develop evaluation frameworks, metrics, and checkpoints for agent autonomy, performance, and safety, ensuring compliance with moderation, security, and ethical standards.
  • Evaluate, analyse, and gather insights from structured and unstructured data leveraging Generative AI models and pipelines.
  • Ensure robust AI agent operations by applying observability, monitoring, and MLOps best practices, facilitating reliable deployment pipelines and continuous performance optimization.
  • Orchestrate AI model selection, tuning, and performance validation to meet specific agent-based application needs.
  • Communicate complex AI concepts, systems, and decisions effectively to technical and non-technical stakeholders, promoting transparency and trust in AI delivery.
  • Foster an environment of innovation and collaboration, engaging and encouraging teams to solve complex problems and share ideas that drive innovative approaches.

Brazil
Full Time | Remote | Team 201-500 | H1B No Sponsor

  • Design and implement context architectures that enable AI systems to access, interpret, and reason over enterprise knowledge.
  • Develop and maintain ontologies, schemas, and knowledge representations that structure domain knowledge across systems, ensuring consistency, reusability, and scalability.
  • Define and optimize context assembly pipelines, including retrieval strategies, ranking logic, memory handling, and prompt/context composition for LLM-based systems.
  • Build and manage semantic layers over structured and unstructured data, enabling effective grounding of AI agents in real-world business context.
  • Design and implement knowledge graphs and context graphs to model relationships between entities, actions, and outcomes across enterprise systems.
  • Collaborate with AI Engineers and Data teams to align embeddings, chunking strategies, and vector storage with ontology and semantic design.
  • Establish standards for context quality, including evaluation frameworks for relevance, coherence, completeness, and business impact.
  • Enable interoperability across AI systems by defining shared context interfaces, schemas, and protocols (e.g., MCP or API-based context services).
  • Continuously refine context systems based on agent performance, feedback loops, and operational insights.
  • Translate complex semantic and contextual concepts into actionable implementations for both technical and non-technical stakeholders.

Brazil