Job Closed
This listing is no longer active.
The all-in-one Agentic SEO and AI Visibility platform - Get found everywhere people search
Platform Reliability Engineer (Agentic AI)
Location
United States
Posted
90 days ago
Salary
$70K - $120K / month
Seniority
Mid Level
Job Description
Platform Reliability Engineer (Agentic AI)
Search Atlas
The Mission: Building the Autonomous Nervous System
Search Atlas is moving beyond suggestions to full execution. Our agent, Atlas Brain, handles SEO, AEO, Google Ads, and AI Content Generation autonomously, with zero manual intervention. While Platform Engineers build self-service tools for developers, you ensure those tools enable autonomous AI execution with 99.99% reliability. You're not keeping dashboards alive; you're building the engine that allows an AI Agent to replace manual marketing execution. If the platform is reliable, the agent is unstoppable.
What You Will Do:
Architect the Autonomous Backbone
Design and maintain the Kubernetes-based platform (EKS/GKE) that hosts Atlas Brain and its distributed agentic workers, handling millions of requests across SEO crawling, content generation, and ad optimization pipelines.
Engineer for Zero-Touch
Automate every aspect of infrastructure using Terraform, ArgoCD, and Go/Python. If you have to do it twice, it must be a script. Enable true "zero manual execution" at the infrastructure level.
Scale Agentic Workflows
- Optimize ML inference pipelines for real-time agent decision-making
- Architect high-concurrency crawling systems that feed Atlas Brain's intelligence
- Ensure sub-second latency for agent task execution (SEO, Content, AI Builder)
- Handle high-frequency data pipelines: real-time bidding, SERP monitoring, content generation at scale
Define Radical Reliability for AI
Establish SLOs/SLIs specifically for AI execution success rates and agent task completion, not just "uptime." Design self-healing systems that preemptively resolve failures before they impact autonomous workflows.
Observability for Agent Decisions
Build distributed tracing and monitoring for complex agentic interactions: trace agent decision trees across SEO/AEO/Ads workflows, enabling rapid diagnosis of "why the agent made that choice." Implement OpenTelemetry, Prometheus, and Grafana for full visibility into autonomous execution.
Safety & Guardrails
Implement guardrails and safety controls for autonomous agent execution in marketing contexts, ensuring AI actions align with business rules, budget constraints, and compliance requirements. Design human-in-the-loop escalation paths for edge cases.
Cost & Performance Governance
Proactively optimize cloud spend and resource allocation (Karpenter/KEDA) as we scale to thousands of agencies. Balance performance with cost efficiency for unpredictable AI workloads.
Technical Requirements
Experience: 6+ years in Platform Engineering, SRE, or Infrastructure roles within high-growth SaaS environments, with proven experience supporting AI/ML systems at scale.
Infrastructure as Code: Mastery of Terraform, ArgoCD, and GitOps workflows.
Container Orchestration: Expert-level Kubernetes (EKS/GKE) networking, scaling, security, and multi-tenancy patterns.
MLOps for Agents (Must-Have):
- Hands-on experience with MLOps pipelines for autonomous agents
- Model versioning and deployment strategies for continuous agent improvement
- Prompt management and A/B testing of agent behaviors
- Guardrails for safe tool execution and decision boundaries
- Scaling AI inference services (LLMs, embeddings, classification models)
Languages: Proficiency in Python for building custom platform tools and automation.
Observability: Deep expertise in distributed tracing and monitoring for complex, event-driven systems, specifically for debugging AI agent decision chains.
Data-Intensive Systems: Experience with high-frequency data pipelines, web crawling at scale, real-time processing, and low-latency requirements.
Why This Is Different
Unlike traditional SRE roles focused on keeping services up, you're building the infrastructure that enables autonomous AI to execute business-critical marketing tasks. Every millisecond of latency you eliminate, every self-healing mechanism you deploy, directly impacts whether Atlas Brain can truly replace manual agency work. This is not traditional SRE: you're building the autonomous nervous system for AI execution.
What Success Looks Like
- Atlas Brain executes millions of marketing tasks daily with <0.1% failure rate
- Zero infrastructure-related incidents requiring manual intervention during business hours
- Platform scales from hundreds to thousands of agency clients without reliability degradation
- Complete observability into agent behavior: "We know not just that the agent acted, but why"
Ready to build the platform that makes autonomous marketing execution a reality?
Job Requirements
- 6+ years in Platform Engineering, SRE, or Infrastructure roles within high-growth SaaS environments—with proven experience supporting AI/ML systems at scale.
- Mastery of Terraform, ArgoCD, and GitOps workflows.
- Expert-level Kubernetes (EKS/GKE) networking, scaling, security, and multi-tenancy patterns.
- Hands-on experience with MLOps pipelines for autonomous agents.
- Proficiency in Python for building custom platform tools and automation.
- Deep expertise in distributed tracing and monitoring for complex, event-driven systems—specifically for debugging AI agent decision chains.
- Experience with high-frequency data pipelines, web crawling at scale, real-time processing, and low-latency requirements.
- Model versioning and deployment strategies for continuous agent improvement.
- Prompt management and A/B testing of agent behaviors.
- Guardrails for safe tool execution and decision boundaries.
- Scaling AI inference services (LLMs, embeddings, classification models).
Benefits
- Opportunity to build the infrastructure that enables autonomous AI to execute business-critical marketing tasks.
- Every millisecond of latency you eliminate, every self-healing mechanism you deploy, directly impacts whether Atlas Brain can truly replace manual agency work.
What Success Looks Like
- Atlas Brain executes millions of marketing tasks daily with <0.1% failure rate.
- Zero infrastructure-related incidents requiring manual intervention during business hours.
- Platform scales from hundreds to thousands of agency clients without reliability degradation.
- Complete observability into agent behavior: "We know not just that the agent acted, but why."
More Platform Engineer Jobs
Our IT Infrastructure Organization is actively searching for a Blue Yonder Senior Platform Engineer who will be the “go-to” resource for business users needing assistance with advanced projects, changes, issues, and problems, and who will resolve those needs within the Blue Yonder platform. Lineage is at the beginning of a global rollout of the Blue Yonder TMS platform, and this individual will play a crucial role in the successful deployment and ongoing maintenance of this platform. Qualified individuals will have approximately 12 years of IT experience, including Blue Yonder TMOD & TM, and will approach this role with a global thinking mindset.
This role combines part business analyst, part platform expert, and part program/project manager to help lead the platform rollout and support the business after the successful execution. The ideal candidate will possess deep expertise in TMS systems (BY specifically), have a solid understanding of supply chain logistics, know or be able to coordinate needed expertise to provide solutions, and be adept at collaborating with cross-functional teams to ensure a seamless platform rollout and that SLAs and enhancement needs are managed during steady state. The Senior Platform Engineer helps to shape the overall application/platform strategy and project portfolio by assisting in soliciting demand from the various business areas as well as ensuring valuable innovative technical solutions are brought forward and considered.
This is a fully remote, US-based role with travel (including international) required.
Please note: We are unable to sponsor work authorization now or in the future for this role.
Key Responsibilities
- Quickly learns the various current core business processes within all the trucking markets of Lineage globally and the underlying systems and manual processes that support those processes at a very high level
- Maintains and develops internal business relationships and IT relationships to learn and influence the various key business and IT stakeholders.
- Partners with business and technology partners to elicit, analyze, translate, and document requirements/needs and drives solutions to solve those needs
- Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions.
- Determines platform solutions or configurations by studying information needs, conferring with users, and studying systems flow, data usage, and work processes.
- Works with executive and senior leaders to create and maintain an operating plan to achieve the strategic vision and operating platform that defines the people, processes, tools, and technology
- Provides internal support for evaluation of needed system functionality in existing and proposed applications to ensure solutions are optimized for quality and ongoing support.
- Helps to design and maintain integrations between the BY platform and other standard/strategic solutions globally. Evaluates non-standard integration needs to determine the best step for those needs.
- Manages the delivery of configuration and/or development needs to meet business requirements. Guides the business in the proper solution for those needs.
- Coordinates with business areas to ensure proper testing, documentation, and training for solutions
- Coordinates the project resources to ensure that projects are delivered on time and within budget
- Helps to ensure proper support and maintenance for IT systems are in place and participates in major incident management processes
- Helps to ensure proper licensing agreements are in place for COTS/SaaS applications, costs are managed properly, and renewals are timely
- Helps to ensure proper SOX controls and security protocols are in place for systems within the Redistribution division
- Monitors system performance and proposes enhancements to improve efficiency and user experience.
- Stays updated on industry trends and advancements in transportation management systems.
- Must be willing to travel (including internationally) during peak project milestones (anticipated to be ~30% travel)
Skills
- In-depth knowledge of the Blue Yonder TMS SaaS solutions with implementation or upgrade experience
- Must have an in-depth knowledge of TMOD and TM (both Optimization and Execution) within Blue Yonder.
- Strong understanding of transportation, logistics and supply chain processes.
- Proven self-management and project management skills with the ability to manage multiple tasks, situations and deadlines.
- Able to understand business functionality (distribution business preferred) and translate it into application requirements
- Excellent understanding and experience with managing business complexity, project interdependencies, and organization change
- Intellectual curiosity and the ability to question thought partners across functional areas
- Outstanding written, verbal, and visual communication skills
- Proficient in data analysis and problem-solving skills. The ability to quickly troubleshoot problems that may arise from the transportation business area
- Ability to partner with other technical subject matter experts (SMEs) both internally and externally to identify the path (or paths) of resolution
- Strong understanding of basic system engineering, information risk and security guidelines, and architecture standards
- In-depth understanding of the various software development lifecycles (e.g. Agile, Waterfall, etc.)
Experience
- Minimum of 12 years of IT experience or applicable work experience
Education Requirements
- Bachelor's degree in Computer Science or Business area; MBA a plus
Why Lineage?
This is an excellent position to begin your career path within Lineage! Success in this role enables greater responsibilities and promotions! A career at Lineage starts with learning about our business and how each team member plays a part each and every day to satisfy our customers’ requirements. Beyond that, you’ll help us grow and learn on our journey to be the very best employer in our industry. We’ll ask you for your opinion and ensure we do our part to keep you developing and engaged as we grow our business. Working at Lineage is energizing and enjoyable. We value respect and care about our team members.
Lineage is an Equal Employment Opportunity Employer and is committed to compliance with all federal, state, and local laws that prohibit workplace discrimination and unlawful harassment and retaliation. Lineage will not discriminate against any applicant on the basis of race, color, age, national origin, religion, physical or mental disability or any other protected status under federal, state and local law.
Benefits
Lineage provides safe, stable, reliable work environments, medical, dental, and basic life and disability insurance benefits, 401k retirement plan, paid time off, annual bonus eligibility, and a minimum of 7 holidays throughout the calendar year.
Associate Director, Platform Engineering – BioAgent
BeOne Medicines
Cancer has no borders. Neither do we.
- Define and execute a platform engineering strategy aligned with organizational objectives
- Translate product and business needs into a prioritized technical roadmap
- Establish platform standards and best practices
- Oversee design, implementation, and lifecycle management of key platform services
- Drive platform innovation that supports low-code digitalization
- Ensure high availability, scalability, and performance of platform services
- Ensure compliance with security, disaster recovery, data governance, and regulatory requirements
- Establish service-level objectives and metrics for platform health
- Partner closely with product management and stakeholders to define requirements
- Collaborate with GTS to align platform initiatives
- Build, mentor, and retain a strong platform engineering team
- Support resource planning, budgeting, forecasting, and vendor/partner management
Associate Director, Platform Engineering - BioAgent
BeiGene
BeOne is committed to fair and equitable compensation practices. Actual compensation packages are determined by several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, certifications, relevant education or training, and specific work location.
We are proud to be an equal opportunity employer. BeOne does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, disability, national origin, veteran status or any other basis covered by appropriate law. In order to ensure reasonable accommodation for individuals protected by Section 503 of the Rehabilitation Act of 1973, the Vietnam Era Veterans’ Readjustment Assistance Act of 1974, Title I of the Americans with Disabilities Act of 1990, and any other applicable federal, state or local laws, applicants who require reasonable accommodation in the job application process may contact accommodationsus@beonemed.com.
BeOne continues to grow at a rapid pace with challenging and exciting opportunities for experienced professionals. When considering candidates, we look for scientific and business professionals who are highly motivated, collaborative, and most importantly, share our passionate interest in fighting cancer.
The Associate Director, Platform Engineering is responsible for building and operating an enterprise-grade platform that accelerates digital transformation across key business processes. This role owns the core engineering capabilities that enable scalable digital products, spanning workflow automation, analytics enablement, AI-driven capabilities, and foundational infrastructure services. The Associate Director will lead a high-performing platform engineering team and partner closely with product and cross-functional stakeholders to ensure the platform is robust, secure, compliant, and widely adopted.
Major Responsibilities
- Define and execute a platform engineering strategy aligned with organizational objectives to improve delivery speed, operational efficiency, and reuse across digital products.
- Translate product and business needs into a prioritized technical roadmap (platform capabilities, reliability investments, security/compliance enhancements) and deliver against it with clear milestones.
- Establish platform standards, reference architectures, and best practices to enable consistent engineering outcomes across teams.
- Oversee design, implementation, and lifecycle management of key platform services such as authentication and authorization, data caching, data bus/pub-sub, and logging, monitoring, and observability foundations.
- Drive platform innovation that supports low-code business process digitalization, analytics enablement, and AI-assisted automation at scale.
- Ensure high availability, scalability, and performance of platform services, including capacity planning and operational excellence practices.
- Ensure compliance with security, disaster recovery, data governance, and regulatory requirements applicable to enterprise digital solutions.
- Establish service-level objectives (SLOs), operational metrics, incident management processes, and continuous improvement mechanisms for platform health.
- Partner closely with product management, software engineering, data science, UX/UI, and business stakeholders to define requirements, validate platform usability, and drive adoption.
- Collaborate with GTS to align platform initiatives with enterprise architecture, security standards, and broader infrastructure strategies.
- Build, mentor, and retain a strong platform engineering team; foster a culture of excellence, ownership, collaboration, and continuous learning.
- Support resource planning, budgeting, forecasting, and vendor/partner management to ensure efficient delivery and sustainable operations.
Qualifications
- 8+ years of progressive experience in platform engineering, infrastructure engineering, or enterprise software engineering, with significant leadership experience overseeing complex initiatives.
- Proven track record delivering scalable, reliable platforms that enable multiple products/teams, with strong stakeholder management across functions.
- Strong understanding of Agile delivery practices and end-to-end product/platform lifecycle management.
- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and experience with hybrid cloud/on-prem environments.
- Deep understanding of networking, security, and modern infrastructure practices (DevOps, CI/CD, Infrastructure as Code).
- Familiarity with containerization and orchestration (Docker, Kubernetes) and observability tooling (e.g., Prometheus/Grafana/Splunk or equivalents).
- Ability to translate technical decisions into clear business value and communicate effectively with executive stakeholders.
- Exceptional leadership and communication skills with the ability to influence across engineering and business teams.
- Strong problem-solving and strategic thinking; comfortable making decisions under ambiguity and operational pressure.
BeOne Global Competencies
When we exhibit our values of Patients First, Collaborative Spirit, Bold Ingenuity and Driving Excellence, through our twelve global competencies below, we help get more affordable medicines to more patients around the world.
- Fosters Teamwork
- Provides and Solicits Honest and Actionable Feedback
- Self-Awareness
- Acts Inclusively
- Demonstrates Initiative
- Entrepreneurial Mindset
- Continuous Learning
- Embraces Change
- Results-Oriented
- Analytical Thinking/Data Analysis
- Financial Excellence
- Communicates with Clarity
Salary Range: $158,400.00 - $208,400.00 annually
BeOne is committed to fair and equitable compensation practices. Actual compensation packages are determined by several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, certifications, relevant education or training, and specific work location. Packages may vary by location due to differences in the cost of labor. The recruiter can share more about the specific salary range for a preferred location during the hiring process. Please note that the listed range reflects the base salary or hourly range only. Non-Commercial roles are eligible to participate in the annual bonus plan, and Commercial roles are eligible to participate in an incentive compensation plan. All Company employees have the opportunity to own shares of BeOne Medicines Ltd. stock because all employees are eligible for discretionary equity awards and to voluntarily participate in the Employee Stock Purchase Plan. The Company has a comprehensive benefits package that includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and Wellness.
We are proud to be an equal opportunity employer. BeOne does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need. In order to ensure reasonable accommodation for individuals protected by Section 503 of the Rehabilitation Act of 1973, the Vietnam Era Veterans’ Readjustment Assistance Act of 1974, Title I of the Americans with Disabilities Act of 1990, and any other applicable federal, state or local laws, applicants who require reasonable accommodation in the job application process may contact accommodationsus@beonemed.com.
- Tactically apply agentic and generative AI to optimize platform development, operations, and growth
- Collaborate with application teams to identify pain points and bottlenecks in workflows
- Design, implement, and maintain platform infrastructure that ensures high availability and predictable performance
- Drive platform and infrastructure change hands-on, from architecture through implementation
- Provide mentorship and technical leadership to other platform engineers
- Lead design and architectural discussions for complex platform systems
- Contribute to the continuous improvement of platform processes and developer experience



