We are a leader in supply chain software solutions, helping organizations streamline operations, reduce costs, and improve efficiency.
Senior AI Platform Engineer
Location: United States
Posted: 41 days ago
Salary: 0
Seniority: Senior
Job Description
Infios
Role Description
We are looking for a Senior AI Platform Engineer with deep expertise in spec-driven AI SDLC, strong hands-on experience with AWS AI infrastructure (Bedrock, Bedrock Agents, Agent Core), and fluency across Java, TypeScript, and Python. You will champion a specification-first approach to AI development: translating product requirements into rigorous AI specs, building LLM-powered and agentic applications using Spring AI, and owning the full lifecycle from prototype through production on AWS.

What a day in the life looks like:
- Define AI feature specifications upfront, including acceptance criteria, evaluation metrics, prompt contracts, and expected behaviors.
- Own end-to-end AI feature delivery across the full AI SDLC: spec definition, prototyping, development, evaluation, deployment, and production monitoring.
- Build production-grade LLM and agentic AI applications using Spring AI, including RAG pipelines, agent orchestration, tool-use patterns, guardrails, and human-in-the-loop workflows.
- Architect and operate AWS AI infrastructure (Bedrock, Bedrock Agents, Agent Core, SageMaker) alongside core AWS services (ECS/EKS, Lambda, S3, DynamoDB, RDS, API Gateway).
- Design and implement scalable microservices and distributed systems in Java, TypeScript, and Python that power the Archer AI platform.
- Build CI/CD pipelines for AI workloads, including LLM evaluation pipelines and automated regression testing for AI outputs, using Terraform, CloudFormation, Docker, Kubernetes, and GitHub Actions.
- Drive AI-specific operational practices: observability, drift detection, quality scoring, feedback loops, and incident response for non-deterministic systems.
- Communicate technical concepts clearly to both technical and non-technical stakeholders; author AI specs, design documents, and architectural decision records.
- Mentor engineers, conduct thorough code reviews, and champion engineering excellence.
Qualifications
- Deep expertise in the AI software development lifecycle with a specification-first mindset.
- Experience authoring AI feature specs (acceptance criteria, evaluation metrics, prompt contracts).
- Track record of shipping AI-powered features through multiple product cycles with engineering rigor.
- Strong hands-on experience with Amazon Bedrock, Bedrock Agents, Agent Core, SageMaker, and Amazon Q.
- Solid knowledge of core AWS infrastructure including compute (ECS/EKS, Lambda), databases (RDS, DynamoDB, ElastiCache), networking (VPC, ALB, CloudFront), and security (IAM, KMS, Secrets Manager).
- Experience architecting AI infrastructure pipelines with cost optimization and high availability.
- Hands-on experience building production applications with Spring AI.
- Solid understanding of LLM application patterns (prompt management, RAG, context orchestration, vector stores, evaluation) and agentic workflows (multi-step agents, tool-use orchestration, planning loops).
- 5+ years of professional software engineering with strong proficiency across Java (Spring Boot, Spring Cloud), TypeScript (Node.js, modern frameworks), and Python (AI tooling, evaluation frameworks).
- Experience designing and operating distributed systems at scale.
- Familiarity with event-driven architectures, message brokers (Kafka, SQS/SNS), caching (Redis, ElastiCache), and relational/NoSQL database design.
- Proficiency in CI/CD pipelines, Infrastructure as Code (Terraform, CloudFormation), containerization (Docker, Kubernetes/EKS), and GitOps workflows.
- Excellent analytical skills and the ability to tackle complex, ambiguous challenges independently.
- Outstanding written and verbal communication: able to articulate technical concepts to diverse audiences and collaborate effectively across teams.
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).
Benefits
- At Infios, we're not just looking for employees; we're looking for partners in innovation, growth, and purpose.
- We meet you on your journey, equipping you with the tools and opportunities to build the future you envision.
- We are committed to creating a safe and welcoming environment where every individual’s unique experiences and perspectives are valued.
More Platform Engineer Jobs
Staff Software Engineer - Platform Engineering
Cohere Health
Cohere Health is a Software-as-a-Service (SaaS) company focused on improving the patient journey by enhancing the quality of care at lower costs, as well as emp
Opportunity Overview:
We’re looking for a Staff Software Engineer - Platform Engineering to help shape the future of how we build, deploy, and operate software at Cohere. In this role, you’ll drive improvements across DevOps, SRE, and developer experience, building a platform that is scalable, reliable, and intuitive for engineers to use. You’ll partner closely with engineering, product, and operations to enable high-quality, high-velocity delivery across the organization, with a direct impact on how our teams ship software at scale.

What you’ll do:
- Lead development of scalable, reliable, high-performance platform infrastructure
- Drive deployment and release strategies to improve safety, consistency, and speed
- Improve developer experience through CI/CD enhancements and faster feedback cycles
- Establish observability and SRE practices to strengthen monitoring, reliability, and incident response
- Design environments that support consistent testing, validation, and safe releases
- Partner cross-functionally and provide technical leadership, mentoring teams on platform and reliability best practices

What you’ll need:
- 9+ years of experience across the software development lifecycle, with a focus on platform engineering, DevOps, or SRE
- Proven ability to design and build scalable, reliable platform systems supporting production workloads
- Strong backend experience (Java/Spring or similar) and hands-on experience with AWS, Kafka, and CI/CD pipelines
- Deep understanding of system design, observability, and production operations
- Experience with Infrastructure as Code (e.g., Terraform) and containerization/orchestration (e.g., Kubernetes, ECS)
- Familiarity with event-driven architectures, modern deployment strategies (canary, blue-green), and messaging systems
- Strong collaboration skills, with the ability to work cross-functionally and proactively improve platform performance and developer productivity
- Bachelor’s degree in Computer Science, Software Engineering, or related field (or equivalent experience)

Pay & Perks:
💻 Fully remote opportunity with about 5% travel
🩺 Medical, dental, vision, life, disability insurance, and Employee Assistance Program
📈 401K retirement plan with company match; flexible spending and health savings account
🏝️ Flex Time Off + company holidays
👶 Up to 14 weeks of paid parental leave
🐶 Pet insurance

The salary range for this position is $185,000 to $210,000 annually, as part of a total benefits package that includes health insurance, 401k, and bonus. In accordance with applicable state laws, Cohere is required to provide a reasonable estimate of the compensation range for this role. Individual pay decisions are ultimately based on a number of factors, including but not limited to qualifications for the role, experience level, skillset, and internal alignment.

Interview Process*:
- Connect with Talent Acquisition for a Preliminary Phone Screening
- Meet your Hiring Manager!
- System Design Interview
- Technical Discussion
- Cross Functional Interview
*Subject to change

About Cohere Health:
Cohere Health’s clinical intelligence platform delivers AI-powered solutions that streamline access to quality care by improving payer-provider collaboration, cost containment, and healthcare economics. Cohere Health works with over 660,000 providers and handles over 12 million prior authorization requests annually. Its responsible AI auto-approves up to 90% of requests for millions of health plan members.

With the acquisition of ZignaAI, we’ve further enhanced our platform by launching our Payment Integrity Suite, anchored by Cohere Validate™, an AI-driven clinical and coding validation solution that operates in near real-time. By unifying pre-service authorization data with post-service claims validation, we’re creating a transparent healthcare ecosystem that reduces waste, improves payer-provider collaboration and patient outcomes, and ensures providers are paid promptly and accurately.
Cohere Health’s innovations continue to receive industry-wide recognition. We’ve been named to the 2025 Inc. 5000 list, included in the Gartner® Hype Cycle™ for U.S. Healthcare Payers (2022-2025), and ranked as a Top 5 LinkedIn™ Startup for 2023 & 2024. Backed by leading investors such as Deerfield Management, Define Ventures, Flare Capital Partners, Longitude Capital, and Polaris Partners, Cohere Health drives more transparent, streamlined healthcare processes, helping patients receive faster, more appropriate care and higher-quality outcomes.

The Coherenauts, as we call ourselves, who succeed here are empathetic teammates who are candid, kind, caring, and embody our core values and principles. We believe that diverse, inclusive teams do the most impactful work. Cohere is deeply invested in ensuring that we have a supportive, growth-oriented environment that works for everyone. We can’t wait to learn more about you and meet you at Cohere Health!

Equal Opportunity Statement:
Cohere Health is an Equal Opportunity Employer. We are committed to fostering an environment of mutual respect where equal employment opportunities are available to all. To us, it’s personal.
Platform Engineer, FinOps
Natera
We are a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women’s health, and organ health.
• Maintain and evolve scalable cost ingestion pipelines, specifically focused on the AWS Cost and Usage Report (CUR).
• Help embed AI-driven intelligence into the platform to enable predictive scaling and automated cost-saving recommendations.
• Leverage GitLab CI/CD pipelines to version, deploy, and automate FinOps governance workflows.
• Manage FinOps-related infrastructure using Terraform to ensure controlled development and release practices.
• Support ingestion and normalization of cost and usage data from SaaS platforms using APIs and integrations.
• Build and maintain advanced Amazon QuickSight dashboards, utilizing dataset modeling and row-level security to provide clear insights to engineering teams.
• Implement and operationalize AI-driven cost alerts.
• Integrate natural language interfaces into our Internal Developer Portal (IDP).
• Implement monitoring and data quality checks to ensure cost data accuracy and completeness.
• Support the modeling and tracking of Savings Plans and Reserved Instances to ensure high coverage and utilization.
• Enforce tagging standards and cost allocation models to ensure financial discipline keeps pace with technical innovation.
• Assist in developing "day two" operational tools that use agentic reasoning to solve complex cost attribution challenges.
Platform Security Engineer
Partly
Building the first global platform for replacement parts, starting with auto parts.
Role Description
The Platform Security Engineer will own Partly's security posture while contributing to platform reliability, reporting to the Platform Lead. This role combines infrastructure security with platform reliability:
- Not a pure "checkbox compliance" role; we need someone who can implement technical controls and work hands-on with infrastructure.
- You'll be the first dedicated security hire at Partly, building processes from scratch while partnering closely with our SRE team.

Qualifications
- 5+ years in security engineering, platform engineering, or SRE with a strong security focus.
- Hands-on Kubernetes security experience.
- Compliance framework experience (ISO 27001, SOC 2, or PCI-DSS).
- Cloud security expertise (GCP experience preferred).
- Infrastructure-as-code practitioner (Terraform, ArgoCD, GitOps workflows).
- Clear communicator, able to translate technical vulnerabilities into business risk.

Requirements
- Keep Partly reliable and secure.
  - Participate in on-call rotation alongside the SRE team.
  - Own security incident response planning and testing.
  - Lead post-incident reviews for security-related incidents and participate in availability incidents.
  - Build security event monitoring and alerting.
- Own our security posture and compliance.
  - Prepare for and pass security audits (ISO 27001, future SOC 2).
  - Maintain continuous compliance via Vanta.
  - Respond to enterprise customer security questionnaires.
  - Maintain and communicate the risk register to engineering and leadership.
- Harden our infrastructure.
  - Implement principle of least privilege across the stack.
  - Drive network segmentation and zero-trust progress.
  - Make production access read-only by default for developers.
- Manage vulnerabilities systematically.
  - Implement and operate our vulnerability scanning pipeline.
  - Own the vulnerability triage process.
  - Coordinate remediation with service owners and report on metrics and trends.
Benefits
- Healthy, Catered Lunches: Fresh, healthy lunches every workday in our offices.
- Healthy Body, Healthy Mind: $1,500 annual wellness allowance on a Partly-branded card.
- Family Comes First: 3 months of fully paid parental leave for primary caregivers.
- Getting Here Is On Us: Paid 24/7 car park or commute allowance.
- Workspaces That Inspire: Architecturally designed offices built for collaboration.
- Office-First with Flexibility: Default to office work with a high trust environment.
- We Celebrate Together: Weekly happy hours, monthly lunches, and annual global offsite.
- Relocation: Generous relocation allowance for those moving to Partly HQ.
Founding Software Engineer, Platform
GovWell
GovWell is an AI-powered platform committed to transforming how governments serve communities by delivering modern, configurable, and secure SaaS platforms for
Title: Founding Software Engineer, Platform
Location: New York, NY
Department: Product & Engineering

Job Description:
About GovWell
We the people, and the taxpayers, deserve good government. Yet today, interacting with government services is often frustrating and inefficient. GovWell is building the AI operating system transforming how governments serve communities, starting with local agencies. GovWell replaces legacy software for municipalities and counties, empowering public servants to radically streamline public services and cut internal processing time for permits and licenses by up to 90%.

Founded in 2023, GovWell powers 5,000+ mission-critical processes for agencies in 30+ states serving millions of residents. The company has raised $10M in seed funding from Work-Bench and Bienville Capital, and the team works in person at GovWell HQ in New York City. Read more about our founding story in TechCrunch.

Why GovWell?
- A mission that matters: Building AI-powered products to fix outdated government systems isn’t just a technical challenge; it’s a historic opportunity to improve our foundational relationship with government and ensure trillions of tax dollars result in high quality services. WATCH: Mission & Vision with CEO & Co-Founder Troy LeCaire
- Real-world impact: GovWell’s product is the system of record for government services that affect millions of Americans. From streamlining permitting for small businesses to accelerating affordable housing development, your work will make an immediate difference.
- Join a startup in hyper-growth: We’ve found product-market fit and are scaling the business very quickly (4X ARR growth in the last year). As an early team member, you’ll learn what it takes to build a successful startup. You’ll work closely with the founders while enjoying exceptional autonomy and ownership over your work.
Role overview
GovWell is hiring a Founding Software Engineer, Platform to build and operate the core backend systems that enable our engineering teams to ship quickly and safely at scale. This role sits at the intersection of product-facing backend engineering and the platform foundations that keep GovWell reliable, secure, and fast-moving. You will write production backend code while owning the systems that power deployment, observability, incident response, and security across the company. Your work will directly shape developer velocity, system reliability, and GovWell’s ability to support mission-critical government workflows as we grow.

This is a hands-on role with significant ownership and long-term impact on how GovWell builds, deploys, and operates software. The role reports to the CTO and operates in a hybrid model with regular in-person collaboration in our New York City office (3+ days per week).

What you’ll do
- Own GovWell’s deployment and reliability platform, ensuring systems are production-safe, observable, and resilient as customer and engineering load grows.
- Build and evolve CI/CD foundations that improve deployment speed, safety, and repeatability across multiple teams shipping independently.
- Establish and maintain observability systems, including metrics, logging, tracing, alerting, and on-call foundations.
- Lead incident response practices, including playbooks, escalation paths, postmortems, and continuous reliability improvements.
- Own platform-level security and compliance foundations, including SOC 2 readiness, vulnerability management, and access controls.
- Improve infrastructure efficiency and cost performance in partnership with engineering leadership.
- Build internal tooling that enables product engineers to focus on feature development rather than infrastructure overhead.

Who you are
- 5+ years of experience as a backend, infrastructure, or platform engineer operating production SaaS systems at scale.
- Strong systems engineer with deep understanding of reliability, distributed systems, and operational failure modes.
- Comfortable owning CI/CD, observability, incident response, and production operations end-to-end.
- Advanced user of modern AI tools (Cursor, Copilot, GPT-based workflows).
- Strong coder (Node/TypeScript or similar); you build systems, not just configure tools.
- High-ownership, startup-oriented operator comfortable owning a broad platform surface area.
- Clear written and oral communicator who can document systems, write RFCs, and lead postmortems.

Our tech stack
- AWS
- Kubernetes
- Datadog
- GitHub Actions
- React
- TypeScript
- Node.js
- GraphQL
- PostgreSQL
- RedwoodJS
- OpenAI API

Compensation and benefits
Compensation within the posted salary band will be commensurate with experience. All offers will include:
- Competitive base salary.
- Equity / stock options.
- Medical, dental, and vision insurance.
- 401(k) program.
- Flexible PTO.



