Buildkite

Buildkite is the fastest, most reliable way to deploy and test code at any scale.

Staff Engineer - Compute & Agents

AI Engineer | Machine Learning Engineer | Full Time | Remote | Lead | Team 11-50 | Since 2013 | H1B No Sponsor

Location

Australia | New Zealand

Posted

64 days ago

Salary

0

Seniority

Lead

Job Description


At Buildkite, our mission is to unblock every developer on the planet. We've rethought how software delivery should work and have built a platform that is fast, reliable, secure, and able to scale to the needs of the most demanding high-growth tech companies globally, including Airbnb, Shopify, Canva, PagerDuty, Lyft, and Pinterest.

Job Overview

We're hiring a Staff Engineer to join our Compute and Agents team. In this role, you'll help set technical direction for the team, lead architectural decisions across complex systems, and drive the work that has the highest impact on Buildkite's infrastructure and developer experience. You'll shape how we build and scale our agent infrastructure, hosted compute, and MCP service, balancing reliability, performance, and security at scale.

This is a hands-on role. You'll write code, own the hardest problems, and raise the engineering bar across the team through standards, mentoring, and the quality of your own work.

🔧 About the Team

The Compute and Agents team builds, maintains, and iterates on the infrastructure that keeps Buildkite humming, from our open-source agent trusted by engineers around the world to our hosted agents and MCP service. It's a team that sits at an intersection of deeply technical infrastructure work and the fast-moving world of AI agents, with our own agent in the middle.

There's real ownership here. You'll work on problems that matter to developers everywhere, shipping work you can point to and be proud of, alongside a sharp, kind team that loves digging into hard problems together.

🚀 What You'll Do

- Help guide the technical direction for the Compute and Agents team, shaping architecture and system design decisions
- Lead the hardest cross-system integrations and drive solutions to complex infrastructure challenges
- Design and evolve systems, patterns, and standards that improve reliability, performance, and operability across teams
- Write code, review code, and pair with engineers, leading by example on quality and craft
- Build alignment across teams and stakeholders on technical decisions, balancing speed, quality, and impact
- Collaborate with engineers, designers, and product managers to shape solutions that solve customer problems
- Mentor and upskill engineers within the team, raising capability and setting shared standards
- Lead cross-team reliability, performance, and cost initiatives; set operational readiness expectations for major changes
- Participate in customer research and discovery sessions to inform product and architectural decisions

🎨 Skills & Experience We Value

Core Skills:

- Strong communication skills, with empathy and kindness in both written and verbal collaboration
- Proven experience leading technical direction within or across engineering teams
- Repeated ownership of large or complex production systems, with clear evidence of impact beyond a single team
- Ability to make strategic trade-offs, articulate decisions clearly, and build alignment in ambiguous situations
- Strong judgment about where to invest versus where to simplify
- Comfortable working directly with customers and incorporating feedback into product development
- Familiarity with CI/CD systems, developer tooling, and DevOps concepts

Technical Stack:

- Go: our primary language, used across our agents and infrastructure. Strong experience or a genuine eagerness to go deep is a must.
- Kubernetes & AWS: we run cloud-native infrastructure at scale. Experience with container orchestration and distributed systems is a big plus.
- Terraform: we manage our infrastructure as code with Terraform. Familiarity with IaC principles and writing reusable, maintainable Terraform is valued.
- Ruby on Rails: the backbone of the Buildkite platform. The majority of our product is built here, so comfort with Rails or a willingness to pick it up matters.
- React & GraphQL: our frontend and API layer, used across the platform alongside Rails.
- PostgreSQL: experience with query optimisation, schema design, or managing relational databases under load is a plus.

🗓 A Typical Day Might Include

- Leading an architecture discussion for a new capability or system change
- Reviewing pull requests with detailed, constructive feedback
- Pairing with an engineer on a complex infrastructure problem
- Investigating performance bottlenecks or scaling challenges
- Driving alignment across teams on a cross-cutting technical decision
- Joining a customer discovery session to understand how teams use agents at scale
- Writing code on the hardest or most ambiguous parts of the work

✨ Why Join Buildkite

At Buildkite, we value kindness, autonomy, and collaboration. You'll be part of a remote-first company where your work can make a meaningful impact, empowering engineers worldwide to build and deliver better software faster.

- Competitive compensation and benefits package
- Flexible, remote-first culture
- Opportunities for professional growth, leadership, and technical ownership
- Work alongside talented, passionate engineers building world-class developer tools
- A collaborative, inclusive, and innovative culture where your ideas make a real impact

🌈 Equal Opportunity Employer

At Buildkite, we value diversity and celebrate all types of skills, backgrounds, and experiences. We're dedicated to fostering an inclusive environment and providing reasonable accommodations throughout our recruitment process.

If you need any accommodations or support during the application or interview process, please reach out to us at accommodations@buildkite.com.
