Job Closed
This listing is no longer active.
Senior Platform Engineer, Quality Systems
Location
United States
Posted
109 days ago
Salary
$160K - $220K / year
Seniority
Senior
Job Description
Senior Platform Engineer, Quality Systems
Astra (astrafi.com)
Location: Remote (US)
Type: Full-time
Experience: 5+ years

About Astra
Astra is a high-impact engineering team building mission-critical financial infrastructure. Our fintech platform powers payment processing, fraud detection, and financial compliance for businesses handling 100M+ in weekly transaction volume. We maintain 99.9%+ uptime, process complex cross-border payments, and coordinate with multiple third-party financial services while meeting strict regulatory requirements.

What You'll Do
- Build Release Confidence Systems: Design and implement automated smoke tests and workflow validations for our most important payment flows, and integrate them into CI and release gates
- Improve Environment Fidelity: Make testing and staging behave more like production by improving stability, data hygiene, environment health checks, and repeatable test scenarios
- Develop Test Infrastructure: Build frameworks and harnesses for API and event-driven workflow testing, including partner simulations and contract checks where appropriate
- Enable Developer Productivity: Build internal self-service tooling and automation that reduces manual steps, improves debugging workflows, and increases engineering velocity
- Improve Release Observability and Documentation: Strengthen release dashboards, visibility, and documentation practices to improve triage, bug tracking, and regression prevention over time
- Strengthen CI and Developer Workflows: Improve CI speed and signal, reduce flaky tests, and make failures easy to debug with useful artifacts and visibility
- Add Operational Guardrails: Partner with engineering to define monitors, canaries, rollout strategies, and rollback patterns for higher-risk changes
- Close the Loop on Regressions: Turn production regressions into durable prevention by adding automated checks, checklist steps, improved documentation, or better safety rails

What We're Looking For

Requirements
- 5+ years of software engineering experience building and operating production systems
- Demonstrated experience building test infrastructure, automation frameworks, CI gates, or developer productivity tooling
- Strong foundation in systems engineering, debugging, and designing pragmatic solutions that improve reliability and velocity
- Experience with distributed systems, asynchronous workflows, events, retries, and state transitions

Education
- Bachelor's degree in Computer Science, Engineering, or related field required

Technical Skills
- Software Engineering: Strong coding ability in Python or another backend language (we use Python 3)
- Test Engineering: Experience building workflow and integration tests beyond unit tests, including harnesses, fixtures, and test data strategies
- CI and Tooling: Experience improving CI pipelines, reducing flakiness, and increasing the signal of automated checks
- Cloud Infrastructure: Google Cloud Platform, or similar cloud platforms
- Observability: Logs, metrics, tracing, dashboards, alerting, and using observability to improve release safety

Preferred Experience
- Fintech or High-Reliability Systems: Payments, money movement, reconciliation, risk, or compliance systems
- Partner Integrations: Testing and operating systems dependent on external APIs and webhooks
- Release Engineering: Building release gates, canaries, rollout plans, and rollback playbooks
- Test Environment Strategy: Experience improving staging fidelity and maintaining stable test fixtures for complex systems

Why This Role Matters
- Direct Impact: You’ll be a big part of a small engineering team and your work will directly improve how we ship
- High Leverage: The tooling and automation you build will compound over time as the product and team scale
- Faster, Safer Releases: Help the team ship more frequently with smaller releases and higher confidence
- Complex Problems: Work on sophisticated workflow and reliability challenges that few engineers get to solve
- Team Enablement: Make it easier for engineers to build and validate changes without fear of unpredictable regressions

What We Offer
- Competitive compensation with equity in a growing fintech company
- Remote-first culture with flexible working arrangements
- Small team, big impact: your work directly shapes our platform
- Professional growth: lead technical decisions and drive foundational improvements
- Modern tech stack: work with cutting-edge cloud technologies
- Mission-driven: build systems that power financial innovation

Remote Work and Culture
Astra is a remote-first company hiring only within the U.S. We value thoughtful collaboration, clarity, and initiative. We’re proud to be an equal opportunity employer and are committed to building a diverse and inclusive team.

How to Apply
We’re looking for engineers who enjoy building high-leverage systems that improve reliability and velocity. If you’re excited to own release confidence through tooling, automation, and platform improvements, we’d love to hear from you.
Job Requirements
The Role
We’re looking for a Senior Platform Engineer, Quality Systems to build the systems that make releases safe and fast. This is a platform engineering role with a clear mandate around release confidence and developer enablement. You’ll build test infrastructure and automated workflow validation for our core payment flows, and you’ll also own adjacent platform improvements in CI, pre-production environments, and release observability.
This role requires strong systems thinking, hands-on engineering, and comfort working across distributed services and asynchronous workflows (webhooks, delayed processing, external dependencies). The goal is to help the team ship more frequently with higher confidence by building high-leverage tooling and automation that becomes part of our day-to-day engineering workflow.
More Platform Engineer Jobs
Platform Support Engineer
Saviynt
The #1 Converged Identity Platform with Intelligent Access Governance for Employees, Third Parties & Machines.
• Strong pod-level troubleshooting skills in AKS/EKS (not just restarting pods).
• Analyze application and DB (RDS, MySQL) performance issues.
• Oversee the monitoring of our SaaS applications and underlying infrastructure (Kubernetes on AWS and Azure, VPN connections, customer applications, Elastic Search, MySQL) for alerts and performance issues.
• Ensure adherence to defined SLAs (Service Level Agreements) and KPIs (Key Performance Indicators) for operational performance.
• Plan and coordinate scheduled maintenance activities with minimal impact to service availability.
Senior Data Platform Engineer – Financial Services
Truelogic Software
Premium boutique software development company that helps brands with big ideas to make a difference in people’s lives.
• Manage High Availability solutions including Always On Availability Groups, Failover Clustering, and HA/DR strategies.
• Perform advanced T-SQL development (CTEs, Window Functions) and expert-level performance tuning (execution plans, indexing strategies).
• Implement enterprise-grade database security including TDE, Row-Level Security (RLS), and RBAC governance controls.
• Design and build scalable data pipelines using Python 3, Pandas, SQLAlchemy, and orchestration tools (e.g., Airflow).
• Manage AWS-based data infrastructure using Infrastructure as Code (Terraform/CDK) and serverless architectures.
• Develop and integrate RESTful APIs (FastAPI/Flask) to support microservices and enterprise data exchange.
• Leverage AI-assisted development tools (e.g., GitHub Copilot, Cursor) to accelerate engineering workflows.
• Architect data pipelines optimized for AI/ML workloads, including feature engineering and model deployment support.
About Clarity AI 🪴
Clarity AI is a global tech company founded in 2017 with a unique mission: bringing societal impact to markets. We leverage AI and machine learning technologies to provide top international investors, governments, companies, and consumers with the right data, methodologies, and tools to make more informed decisions. We are now a team of more than 300 highly passionate and curious individuals from all over the world, with offices in New York, Madrid, London, Paris, and Abu Dhabi. Together, we have established Clarity AI as a leading sustainability tech AI company backed by investors and strategic partners such as BlackRock, SoftBank, and Deutsche Börse, who believe in us and share our goals.
We are dedicated to cultivating an exceptional workplace environment, and we take pride in our culture, defined by our commitment to being fact-based, diverse, transparent, meritocratic, and flexible. We have plans to continue growing our teams globally, so if you would like to join us on this rocket ship, keep reading! Your work will shape and guide the sustainable decisions of investors, companies and consumers worldwide.

About The Role 💻
We are looking for a Senior GenAI Platform Staff Engineer who is an expert in the deployment and scaling of LLMs and Agentic systems. In this role, you will bridge the gap between machine learning experimentation and production at scale by building the robust, highly efficient platform that powers our AI initiatives.
While our Data Science teams focus on developing and tuning state-of-the-art models, you will be the owner of the platform that enables them. You will define best practices, build automated pipelines, and ensure that our infrastructure can handle complex agentic workflows with high reliability and performance. This is a role for a visionary who stays ahead of the daily shifts in the AI landscape and can rapidly adapt our stack to leverage emerging trends.
For more insight into the technologies used by the engineering team at Clarity AI, please explore our Tech Stack.

What You’ll Be Doing 🚀
As a Senior GenAI Platform Staff Engineer, you will be responsible for:
• GenAI Platform Engineering: Designing and developing the core platform that enables the efficient deployment, scaling, and management of LLMs and multi-agent systems.
• Infrastructure for Agents: Building specialized infrastructure to support long-running agentic workflows, including state management, tool-calling interfaces, and complex reasoning loops.
• High-Scale Productionization & Model Serving: Scaling inference for LLMs to handle global demand while optimizing for latency, throughput, and cost. Implement standard batch and online serving with controlled rollback.
• Build & Delivery: Establishing the "Golden Path" for model deployment through a self-service path to move code, data, and models to production safely and reproducibly, including automated evaluation frameworks, safety guardrails, and CI/CD/CT pipelines.
• Strategic Vision & Product Management: Continuously monitoring the AI ecosystem and proactively evolving our platform to maintain a competitive edge. This includes adopting best practices in Platform Product Management and driving the adoption of golden-path solutions.
• End-to-End Observability: Implementing deep observability for LLMs, tracking not just system health but providing unified visibility into health, impact, and root cause across data, ML, and GenAI (including model hallucinations, token usage, and RAG performance).
• Collaborative Foundation: Providing the tools and abstractions that allow Data Scientists and stakeholders to move from a "tuned model" to a "production service" with zero friction.

Location 🌍
The role is based in Madrid/Spain (Remote / Hybrid).

What You’ll Need 👀
• LLM & Agent Expertise: Deep, hands-on experience deploying Large Language Models and complex agentic architectures at scale.
• GenAI Platform Specifics: Proven experience in implementing Prompt Lifecycle Management (versioning, testing, and deploying prompts as code), an LLM Abstraction Layer (provider-agnostic access), and systems for Cost & Usage Control (visibility and limits on GenAI spend per use case).
• Evaluation & Benchmarking Mastery: Expert-level experience building automated evaluation pipelines and frameworks (e.g., Ragas, DeepEval, G-Eval) and implementing LLM-as-a-judge patterns to validate model quality, grounding, and safety in CI/CD.
• Platform & MLOps Mindset: A proven track record of building platforms or shared infrastructure. Deep understanding of MLOps concepts like Model Registry (versioning, state management, and lineage) and Model Monitoring & Drift Detection. 3+ years of experience in MLOps or high-scale Software Engineering with a focus on AI production environments.
• Technical Stack Mastery: Expert-level Python and deep experience with container orchestration (Kubernetes, Docker) and cloud infrastructure (AWS/GCP).
• AI Tooling & Frameworks: Proficiency with orchestration libraries (e.g., LangChain, LlamaIndex, CrewAI), vector databases (e.g., Pinecone, Weaviate), and inference engines (e.g., vLLM, TGI).
• Agility & Adaptability: The ability to learn and implement new technologies in a field that changes weekly. You should be a "fast mover" who enjoys constant evolution.
• Software Excellence & Governance: Strong fundamentals in API design, microservices, and "GitOps" methodologies, including the implementation of automated security and compliance by default.
• English Proficiency: Excellent communication skills (minimum C1 level), with the ability to articulate technical vision to both engineers and leadership.

What We Offer 🥁
• Competitive compensation, both in terms of base salary as well as equity plans that enable you to share in our success
• Flexibility in ways of working both in terms of your schedule as well as your location, whether you prefer to work from home, the office, or abroad with access to a global network of co-working spaces
• Generous paid time off schemes, including vacation, sabbatical, religious observance and compensation days
• Meaningful benefits including private healthcare coverage, fitness and wellness programs covered through Wellhub, working-from-home allowances to help you set up your home office and cover monthly expenses
• Professional development with annual training budget for conferences, courses, certifications and access to top market e-learning platforms
• Collaborative environment with multiple offices around the globe, regular team activities and events as well as employee-led resource groups

More About Clarity AI ⭐
Clarity AI’s Founder and CEO, Rebeca Minguela, is a successful entrepreneur who has been recognised by prestigious institutions like the World Economic Forum as one of the most distinguished leaders under 40. The leadership team has an international presence and is composed of professionals from leading tech, consulting, and banking firms, entrepreneurs, PhDs from top research institutions, and MBA graduates from top business schools.
Clarity AI has received several awards:
• The Forrester New Wave - ESG Ratings, Data, and Analytics - Leaders for 2022-2024
• Investment Week - Best Sustainable Investment Research & Ratings Provider 2023
• Fast Company - Most Innovative Companies 2023
• European Commission | EU Seal of Excellence 2020
• World Economic Forum - Technology Pioneer 2020
• World Economic Forum, Young Global Leader - Rebeca Minguela
Clarity AI believes diversity, inclusion, and belonging are essential for creating an innovative and successful workplace. By actively promoting and engaging in sustainability efforts, we can help create a more equitable and resilient future for our planet and all its inhabitants.
Founding Platform Engineer
Hexa
Startup studio specializing in the future of work (eFounders), web3, fintech, AI, and health.
• Own the multi-tenant platform. Architect and maintain the infrastructure that hosts hundreds of MCP codebases. Each isolated, each configurable, all observable from a single pane of glass (or MCP).
• Build the data backbone. Design the pipelines that aggregate statistics, analytics, and performance metrics across every deployed instance. Make the data real-time, reliable, and actionable.
• Ship the core app. Work hands-on in the main Waniwani application: a Next.js/TypeScript codebase with Drizzle ORM, React Query, and a modern component architecture. You'll build features that customers and internal teams use daily.
• Design for scale. Make architectural decisions that hold at 10x. Multi-codebase deployment strategies, database partitioning, caching layers, job queues. Whatever the problem demands.
• Automate everything. CI/CD pipelines, codebase provisioning, health monitoring, rollback systems. If a human has to do it twice, you build a system to do it forever.
• Instrument relentlessly. Logging, tracing, alerting. When something breaks at 2am across 300 instances, the system should tell you exactly where and why, before anyone notices.
• You'll work directly with the CTO. The architecture is yours to shape. The stack is TypeScript end-to-end.



