We are a Haskell, Rust, Blockchain and AI consultancy.
DevOps / Infrastructure Engineer
Location
EST (UTC-5)
Posted
3 days ago
Salary
$100K - $130K / year
Seniority
Mid Level
Job Description
MLabs
Role Description
We are hiring on behalf of our client, who is engineering a sophisticated infrastructure designed to launch a highly distributed fleet of managed, single-tenant personal artificial intelligence (AI) trading agents. Operating non-stop, these isolated processes execute high-frequency, complex financial workflows natively on blockchain infrastructure, dedicated exclusively to individual user portfolios. The client is seeking an exceptional, production-proven Infrastructure & DevOps Engineer to take absolute ownership of the deployment, secure networking, architectural lifecycle, and overall reliability of this distributed agent fleet from day one.

Key Responsibilities
- Fleet Orchestration & Scaling: Architect, provision, and scale the core user agent fleet across a hybrid Railway and AWS ecosystem, ensuring each user retains an isolated, secure, and predictable containerized process with optimized cost tracking and precise lifecycle hooks.
- Secure Network Engineering: Establish, manage, and continuously harden private overlay networks using Tailscale in production, linking disparate user agents securely with core Model Context Protocol (MCP) servers and the underlying live trading runtimes.
- Automated User Provisioning: Design and construct an end-to-end, zero-touch deployment pipeline utilizing advanced infrastructure-as-code and CI/CD best practices, enabling seamless, single-click automated provisioning of containers, secrets management, and environmental configurations for new users.
- Operational Resilience & SRE: Define, build, and maintain comprehensive monitoring, telemetry, alerting, and automated incident response frameworks to guarantee graceful state retention, preserving live in-flight transaction states across sudden host restarts, scheduled key rotations, or regional cloud outages.
- Incident Management: Oversee system health and participate in direct real-incident response and on-call rotations to maintain strict operational continuity for the live global fleet.

- Container PaaS Orchestration: Proven professional experience deploying, monitoring, and scaling complex architectures in production utilizing Railway, or equivalent containerized platform-as-a-service frameworks (such as Fly.io, Render, or Northflank).
- Advanced AWS Proficiency: In-depth technical mastery of Amazon Web Services (AWS), with practical expertise spanning Virtual Private Clouds (VPC), Identity & Access Management (IAM), Secrets Manager, and elastic scaling frameworks (ECS / AWS Lambda).
- Production-Grade Tailscale Networking: Demonstrated experience implementing Tailscale within a high-security production environment, with distinct competence configuring Access Control Lists (ACLs), complex subnet routing, and ephemeral node lifecycles.
- Modern Infrastructure & CI/CD: Mastery of Docker containerization, comprehensive CI/CD deployment pipelines, and modern Infrastructure-as-Code (IaC) paradigms.
- Blockchain & Onchain Context: Technical familiarity with blockchain mechanics, smart contract interactions, or web3 infrastructure paradigms to support decentralized application layers.
- High-Availability / Financial SRE Background: A proven professional history managing environments where system stability impacts critical financial outcomes, paired with total comfort managing on-call duties and live incident response.

Qualifications
- Direct experience deploying, managing, and monitoring Large Language Model (LLM) or autonomous AI agent fleets at multi-tenant scale.
- Prior exposure to quantitative trading systems, high-frequency execution runtimes, or deep integrations with platforms such as Hyperliquid.

Benefits
- Highly competitive compensation package.
- The flexibility of a fully remote operating environment with an immediate start timeline.
- The opportunity to shape the architectural foundation of a cutting-edge technical ecosystem intersecting Artificial Intelligence and decentralized financial infrastructure.
- Access to top-tier modern tooling, modern infrastructure frameworks, and a highly streamlined, zero-red-tape development culture.

Commitment to Equality and Accessibility
At MLabs, we are committed to offering equal opportunities to all candidates. We ensure no discrimination, accessible job adverts, and information provided in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing human-resources@mlabs.city.

MLabs Ltd collects and processes the personal information you provide, such as your contact details, work history, resume, and other relevant data, for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting legal@mlabs.city.
More DevOps Engineer Jobs
Principal DevOps Engineer
CESIT
CES has 26+ years of experience in delivering Software Product Development, Quality Engineering, and Digital Transformation Consulting Services to Global SMEs & Large Enterprises. CES has been delivering services to some of the leading Fortune 500 Companies including Automotive, AgTech, Bio Science, EdTech, FinTech, Manufacturing, Online Retailers, and Investment Banks. These are long-term relationships of more than 10 years and are nurtured by not only our commitment to timely delivery of quality services but also due to our investments and innovations in their technology roadmap. As an organization, we are in an exponential growth phase with a consistent focus on continuous improvement, process-oriented culture, and a true partnership mindset with our customers.
Role Description
We are seeking a highly skilled and self-driven Senior Software Engineer – Azure DevOps Engineer with deep expertise in Azure DevOps, Infrastructure as Code, and cloud automation. The ideal candidate will take full ownership of tasks from requirement gathering to development, handover, and troubleshooting, ensuring seamless delivery and minimal downtime across environments. Strong troubleshooting skills in Azure Cloud, excellent knowledge of Azure administration, and a solid understanding of networking concepts are essential. The candidate will leverage GitHub Copilot, including MCP-based context integrations, to accelerate specification-driven development, improve code quality, and automate test and deployment workflows. The candidate should have excellent communication and interpersonal skills, ownership, and commitment. Experience in the healthcare domain is a plus.

Job Responsibilities
- Design, develop, and maintain advanced Azure DevOps YAML pipelines for CI/CD automation.
- Write and maintain robust PowerShell scripts for automation, monitoring, and deployment tasks.
- Develop and manage infrastructure as code (IaC) using Bicep for Azure resource provisioning.
- Operationalize GitHub Copilot to ground code generation, improving speed and accuracy of feature delivery.
- Take complete ownership of assigned tasks, including requirement analysis, implementation, documentation, and support.
- Collaborate with cross-functional teams to understand deployment needs and deliver scalable DevOps solutions.
- Proactively monitor and support development, testing, and production environments, ensuring high availability and minimal downtime.
- Troubleshoot and resolve issues across the DevOps lifecycle, including build failures, deployment errors, and infrastructure problems.
- Continuously improve DevOps practices, tools, and processes to enhance team productivity and system reliability.
- Monitor and optimize infrastructure performance, cost, and security.
- Mentor junior engineers and contribute to knowledge sharing within the team.

Qualifications
- 6–9 years of hands-on experience in Azure DevOps.
- Proven expertise in Infrastructure as Code (IaC) using Bicep and ARM templates.
- Strong experience in building and maintaining Azure DevOps (ADO) YAML pipelines for CI/CD.
- Advanced scripting skills in PowerShell and YAML; familiarity with Unix Shell scripting is a plus.
- In-depth knowledge of Azure Cloud services, including compute, storage, networking, and security.
- Proven experience using GitHub Copilot in day-to-day development and demonstrable expertise configuring and using Copilot with MCP.
- Strong troubleshooting and diagnostic skills for Azure deployments and infrastructure issues.
- Solid understanding of Application Gateway and Azure networking, including Hub-Spoke architecture, NSGs, VNets, and routing.
- Proficiency in Git and source control workflows.
- Experience with .NET and RDBMS is a plus.

Educational Requirements
- UG: B.Tech / B.E. - Any Specialization, Computers, Electronics/Telecommunication
- PG: MS/M.Sc. (Science) – Any Specialization, M.Tech – Any Specialization, MCA – Computers

Benefits
- Flexible working hours to create a work-life balance.
- Opportunity to work on advanced tools and technologies.
- Global exposure, not only to collaborate with the team but also to connect with the client portfolio and build professional relationships.
- Innovative ideas and thoughts are highly encouraged, and we support you in executing them.
- Periodic and on-the-spot rewards and recognition for your performance.
- A better platform for enhancing skills through many different L&D programs.
- An enabling and empowering atmosphere to work in.
Staff Site Reliability Engineer – Project Volcano
Kong Inc.
The cloud connectivity company. Powering connections to build a reliable digital world.
• Own reliability for Volcano end-to-end: Define and drive SLOs, error budgets, and incident response practices for all Volcano services
• Architect the platform's infrastructure: Design and build the multi-region Kubernetes infrastructure
• Build the GitOps and CI/CD backbone: Establish deployment automation, canary pipelines, and preview environment provisioning
• Scale managed data services: Design, operate, and harden multi-tenant PostgreSQL clusters
• Drive observability from day one: Instrument every Volcano service with meaningful SLIs
• Lead cross-functional reliability work: Collaborate with the OCTO team, product engineering, and security to bake reliability into Volcano's architecture
• Set SRE culture and standards: Mentor engineers on reliability principles; lead postmortems
• Evaluate and adopt emerging technologies: Given Volcano's greenfield nature, evaluate edge runtimes, serverless compute, vector databases, and AI-native infrastructure components.
• As a Senior DevOps Engineer at Incode, you’ll be a senior individual contributor on the platform team responsible for operating, scaling, and securing the production systems behind Incode’s identity infrastructure.
• You’ll own outcomes, not tickets, across our Kubernetes and AWS environments, partnering closely with software engineers, security, and leadership to make our platform faster, safer, and quieter.
• You won’t just keep the lights on; you’ll improve reliability, reduce operational friction through automation, and lead durable fixes when things break.
Partner AI Deployment Engineer – AWS
OpenAI
A privately-held artificial intelligence (AI) research company, OpenAI discovers, builds, and enacts paths to secure artificial general intelligence (AGI). Foun
• Serve as the senior technical counterpart to AWS field leadership, building trust and credibility across regions and teams.
• Influence joint account strategy and technical direction for high-priority opportunities.
• Shape how OpenAI engages with AWS by defining engagement models, prioritization frameworks, and best practices.
• Proactively identify and drive net-new opportunities and high-impact use cases across the AWS ecosystem.
• Lead technical strategy for large, ambiguous, and high-stakes enterprise engagements.
• Guide customers from early ideation through architecture design, prototyping, and production deployment.
• Act as a technical decision-maker and escalation point, de-risking complex implementations.
• Design and communicate end-to-end AI architectures leveraging OpenAI and AWS services.
• Build and guide development of prototypes, POCs, and reference implementations to accelerate adoption.
• Establish best practices for scalable, secure, and production-ready GenAI systems.
• Enable AWS and partners through scalable technical motions (workshops, playbooks, reference architectures, demos).
• Develop reusable solution patterns and assets that can be deployed independently by AWS teams and SIs.



