Dev.Pro

Software Development Partner. Result-driven. Quality-obsessed.

Senior AI Platform Engineer

Platform Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 2011 | H1B No Sponsor

Location

Argentina

Posted

12 days ago

Salary

Not listed

Seniority

Senior

Job Description

Role Description

We are seeking a Senior AI Platform Engineer to own the core platform layer that powers every EasyBee AI agent in production, from multi-tenant agent configuration and schema architecture to data pipeline contracts, evaluation harnesses, and customer onboarding automation. This role sits at the intersection of backend platform engineering, LangGraph-based orchestration, and AI evaluation systems.

- Own the infrastructure that makes all features possible: the agent orchestration graph, the customer configuration schema, end-to-end conversation logging, automated eval pipelines, and the scripts that deploy new customers in under 30 minutes.
- If you love owning systems that other engineers depend on, shipping at high velocity across a wide surface area, and taking pride in leaving codebases cleaner than you found them, we want to hear from you.

Qualifications

- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Proficient or advanced use of agentic workflows for coding in tools like Cursor AI or Claude Code.
- 4+ years building and owning production-grade backend systems in Python.
- Proven experience owning a core platform or shared infrastructure layer used by multiple teams or customers.
- Hands-on track record with multi-tenant system design: schema isolation, config-driven parameterization, and deployment automation.
- Experience building evaluation harnesses for LLM-based systems with quantitative metrics.

Requirements

- Strong systems thinking: ability to see how schema decisions in the core platform ripple downstream to eval, logging, onboarding, and customer deployments.
- Comfort owning wide surface area: this role crosses platform, data, eval, and ops without a narrow specialization.
- High individual shipping velocity: ability to close multiple GitHub issues per day with clean PRs and minimal back-and-forth.
- Strong schema discipline: treats data contracts as first-class artifacts, not afterthoughts.
- Ability to work autonomously with minimal supervision in a fast-moving startup environment.
- Strong written communication for PR descriptions, Notion documentation, and deployment SOPs.

Benefits

- 30 paid days off each year: use them for vacation, holidays, or personal time.
- 5 paid sick days, up to 60 days of medical leave, and 6 paid days off for family events like weddings, funerals, or having a baby.
- Partially covered health insurance after probation.
- Wellness bonus for gym memberships, sports nutrition, and similar needs.

What success looks like

- A core platform where every new customer can be deployed in under 30 minutes from a configuration template plus KB content, with zero custom code per customer.
- An eval pipeline with automated simulation scenarios and distributed tracing that runs on every PR and blocks deployment when scores drop below threshold.
- A conversation logging system that captures every production interaction with full metadata, enabling the data strategy and future fine-tuning.
- A clean, schema-validated platform codebase where new nodes, customers, and capabilities can be added with predictable behavior and no silent regressions.
- A deployment SOP so reliable that any engineer on the team can onboard a new customer without escalation.

More Platform Engineer Jobs

Senior Platform Engineer

Lob

Full Time | Remote | Team 125 | Since 2013

Lob was founded in 2013 by technical co-founders with a vision to connect the world one mailbox at a time. Today, we're transforming the way businesses use direct mail and bringing the power of technology to a traditionally manual channel. Our modern logistics and fulfillment engine helps businesses to build and scale high-quality, personalized direct mail programs without the operational burden. As we grow to meet the evolving needs of our customers and expand our product offerings, we’re building a team to shape the future of direct mail.

About The Role

We are looking for a Senior Platform Engineer to help scale and improve the reliability, observability, performance, and cost efficiency of our platform infrastructure. This role is focused on observability engineering and infrastructure optimization across AWS environments. The ideal candidate has deep hands-on experience with Datadog, OpenTelemetry, and HashiCorp Nomad, and understands how to build highly visible, scalable, and operationally efficient systems while actively reducing unnecessary infrastructure spend. You will work closely with engineering teams to improve telemetry, monitoring, performance testing, platform reliability, and cloud infrastructure efficiency across a fast-moving distributed environment, including leveraging modern AI-driven tooling and operational workflows where appropriate.

What You’ll Work On

- Building and improving observability across distributed systems and services
- Designing dashboards, alerting, metrics, tracing, and telemetry pipelines
- Improving operational visibility using Datadog and OpenTelemetry
- Helping evolve and mature the organization’s observability strategy and tooling
- Supporting and improving HashiCorp Nomad orchestration environments
- Identifying and implementing AWS cost-saving opportunities across compute, storage, and platform infrastructure
- Improving infrastructure utilization and operational efficiency across Nomad workloads
- Optimizing S3 storage utilization, lifecycle management, and storage costs
- Designing and maintaining performance testing environments and tooling
- Running load and performance tests to identify bottlenecks and scalability issues
- Managing and tuning Elasticsearch/OpenSearch environments
- Troubleshooting production performance issues across services, infrastructure, and databases
- Partnering with engineering teams to improve platform reliability, scalability, and infrastructure efficiency

Responsibilities

- Lead observability initiatives across infrastructure and applications
- Design and maintain monitoring, telemetry, dashboards, tracing, and alerting systems
- Build actionable visibility into platform health, reliability, and performance
- Improve incident detection, troubleshooting, and operational response capabilities
- Define observability standards and best practices across engineering teams
- Drive infrastructure cost optimization initiatives across AWS services and platform environments
- Analyze infrastructure utilization and recommend performance and cost efficiency improvements
- Maintain and improve infrastructure-as-code standards and workflows
- Design, build, and maintain scalable performance testing environments and tooling
- Execute and analyze load/performance testing initiatives
- Support and improve Nomad-based orchestration environments
- Troubleshoot complex production and infrastructure issues across distributed systems
- Collaborate closely with engineering teams to improve scalability, reliability, operational visibility, and infrastructure efficiency
- Create and maintain operational documentation and platform best practices

Qualifications

- 7+ years of experience in platform engineering, infrastructure engineering, or site reliability engineering
- Strong hands-on experience with HashiCorp Nomad
- Deep expertise with Datadog
- Strong experience implementing and operating observability platforms using OpenTelemetry and modern monitoring tooling
- Experience with Grafana or similar visualization and observability platforms
- Strong understanding of distributed tracing, metrics, logging, and monitoring best practices
- Experience building dashboards, alerts, telemetry pipelines, and operational visibility tooling
- Strong experience identifying and implementing AWS cost optimization strategies in production environments
- Strong knowledge of S3 optimization, lifecycle management, and storage cost reduction
- Experience building and running performance/load testing environments
- Strong troubleshooting and performance analysis skills across distributed systems
- Strong experience operating infrastructure in AWS environments
- Strong experience with Terraform and infrastructure-as-code practices
- Experience balancing platform reliability, observability, and infrastructure cost efficiency at scale
- Experience working with distributed and event-driven architectures using technologies such as Redis, SQS, or Temporal
- Experience managing and tuning Elasticsearch or OpenSearch clusters
- Experience working in fast-paced engineering environments
- Strong communication and collaboration skills

Nice to Have

- Exposure to PostgreSQL RDS to Aurora migrations
- Experience with Kubernetes
- Experience with CI/CD systems and deployment automation
- Experience with Go, Python, or TypeScript

Since great engineers come from a variety of backgrounds, it doesn’t particularly matter if you have a specific degree; we want to hear about your contributions in a real-world setting.

Compensation information

The compensation for this role consists of a base salary + additional RSUs.

Annual Base Salary: $160,000 - $177,500

“Lob’s salary ranges are based on market data, relative to our size, industry and stage of growth. Salary is one part of total compensation, which also includes equity, perks and competitive benefits. Salary decisions are based on many factors including geographic location, qualifications for the role, skillset, proficiency and experience level. Lob reasonably expects to pay candidates who are offered roles within the provided salary ranges.”

We offer remote working opportunities in AZ, CA, CO, DC, FL, GA, IA, IL, MA, MD, MI, MN, MT, NE, NC, NH, NJ, NV, NY, OH, OR, PA, RI, TN, TX, UT, and WA, unless specified otherwise in the job description above.

If you are looking for a progressive, fun-spirited, and mentally stimulating environment, come join us at Lob!

Our Commitment to Diversity

Lob is an equal opportunity employer and values diversity of backgrounds and perspectives to cultivate an environment of understanding to have greater impact on our business and customers. We encourage under-represented groups to apply and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or criminal history in accordance with local, state, and/or federal laws, including San Francisco’s Fair Chance Ordinance.

Recent awards

- #88 on BuiltIn's Best Remote Midsize Companies to Work For in 2025
- BuiltIn Best Remote Midsize Companies to Work For in 2024
- BuiltIn Best Midsize Companies to Work For 2022

United States
$160K - $177.5K / year

Power Platform Specialist

Encora Digital

Encora, a leader in digital engineering, drives innovation by crafting cutting-edge, cloud-first, data-first, and AI-first solutions that redefine industries. Since its inception i

Role Description

We at Coforge are hiring a Power Platform Specialist with the following skill set.

- Design and develop advanced Power BI dashboards and Power Apps to visualize complex industrial engineering data.
- Configure and manage the Microsoft Power Platform environment, ensuring seamless integration with the Microsoft Fabric ecosystem.
- Collaborate with Data Engineers and Data Scientists to consume processed data models and LLM outputs for reporting.
- Build internal tools and automated workflows that facilitate human validation of AI-driven engineering calculations.
- Optimize report performance and maintain data integrity across all visualization and application layers.

Qualifications

- Expertise in Power BI: Advanced report creation, data modeling, and data visualization techniques.
- Power Apps Development: Experience building canvas and model-driven applications for internal business use.
- Fabric Integration: Practical experience connecting Power Platform tools to the Microsoft Fabric environment.
- Data Transformation: Proficiency in Power Query (M) and basic DAX for data preparation.

Requirements

- Power Automate: Experience creating automated workflows to streamline data processes and notifications.
- Advanced Analytics: Proficiency in DAX and SQL for complex data modeling and calculations.
- Cloud Ecosystem: Familiarity with the Azure platform and Entra ID (formerly Azure AD) for authentication.
- UI/UX Design: Ability to create intuitive and user-friendly interfaces for technical and engineering data.
- Domain Knowledge: Interest or experience in industrial engineering or hydraulics data visualization.
- Cross-functional Collaboration: Experience working within integrated pods alongside Data Engineers and AI developers.

Company Description

At Coforge, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.

Brazil
Job Closed

Senior AI Platform Engineer – HexCore, Eval Systems

Dev.Pro

Software Development Partner. Result-driven. Quality-obsessed.

Full Time | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

• Own and evolve the core platform repository
• Build and maintain cross-client isolation across customer configuration
• Own the end-to-end conversation logging system
• Own the eval suite end-to-end
• Build and maintain onboarding automation scripts that deploy a new customer in under 30 minutes
• Ensure all platform APIs meet latency targets

Brazil

Azure Platform Engineer

Toucanberry Tech

Helping the Reinsurance sector embrace modern technology

Contract | Remote | Team 11-50 | H1B No Sponsor

• Manage and enhance a live Azure-based production environment
• Lead the migration from Ansible to Terraform
• Support the shift to an event-driven architecture using Azure Service Bus
• Help evaluate and integrate a third-party AI solution
• Troubleshoot issues and optimise performance and reliability
• Collaborate with engineers to deliver iterative platform improvements

United Kingdom
£250 - £380 / day
Job Closed