MLabs

We are a Haskell, Rust, Blockchain and AI consultancy.

Senior DevOps / SRE Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

United States

Posted

75 days ago

Salary

$120K - $150K / year

Seniority

Senior

English, Ansible, AWS, Docker, Grafana, JavaScript, Kafka, Kubernetes, Node.js, PostgreSQL, Prometheus, Python, Redis, Terraform, Go

Job Description

• Build and maintain the infrastructure for concurrent AI trading agents, managing complex cron schedules, state files, and trailing stop processes.
• Deploy and manage agent environments, including workspace persistence, isolated session management, and Model Context Protocol (MCP) server connectivity.
• Design and operate pipelines for shipping trading skills and plugins to production without interrupting live trading activity.
• Execute deployment strategies (blue/green, canary) that keep active financial positions protected during every infrastructure change.
• Build comprehensive alerting across the full stack, using metrics, logs, and traces to detect agent failures, state file corruption, or infrastructure regressions before financial loss occurs.
• Operate and scale core platform infrastructure, including Kubernetes (EKS) clusters, Redis, Postgres, ClickHouse, and Kafka.
• Maintain blockchain node infrastructure and ensure stable connectivity to exchange APIs and on-chain transaction systems.
• Lead incident response and on-call practices, including debugging, mitigation, and post-mortems, to improve long-term platform reliability.

Job Requirements

Extensive experience in DevOps, SRE, or Infrastructure Engineering, preferably within a startup environment where systems were built from the ground up.
Proven track record of deploying, scaling, and debugging production workloads, specifically within AWS EKS.
Proficiency with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or equivalents.
Hands-on experience with Docker and Helm for packaging production services.
Experience operating production-grade data and messaging systems (Redis, Postgres/RDS, ClickHouse, Kafka).
Strong experience with Prometheus, Grafana, Datadog, Loki, or OpenTelemetry to build proactive operational visibility.
Ability to debug across multiple languages, including Python, Node.js, and Go.
Understanding of systems where latency and reliability have direct financial consequences.
Familiarity with node infrastructure, exchange APIs, wallet operations, and on-chain monitoring.
Experience managing secrets, access controls, and production hardening for sensitive financial environments.
Experience defining SLOs and building mature on-call practices.

Benefits

Opportunity to build infrastructure for a new category of software (Autonomous AI Agents).
High-autonomy environment with a focus on engineering excellence and technical ownership.
Competitive compensation package commensurate with senior-level experience.
Remote-first or flexible working arrangements (as specified by the client).


More DevOps Engineer Jobs

Senior DevOps Engineer

CI&T

Navigate Change

DevOps Engineer | 75 days ago

Full Time | Remote | Team 5,001-10,000 | Since 1995 | H1B No Sponsor


• Design, implement, and maintain CI/CD pipelines using GitHub Actions.
• Automate infrastructure provisioning and management using Terraform.
• Leverage AWS services to build and maintain scalable and reliable cloud solutions.
• Monitor and optimize cloud infrastructure for performance and cost.
• Develop scripts and tools using Python for automation and troubleshooting.
• Collaborate with cross-functional teams to ensure seamless integration and delivery.
• Stay up to date with the latest DevOps practices, tools, and technologies.
• Troubleshoot and resolve infrastructure-related issues promptly.

AWS, Cloud, EC2, Python, Terraform


Brazil


Job Closed

Senior DevOps Specialist

Black Piano

Building remote teams.

DevOps Engineer | 75 days ago

Full Time | Remote | Team 51-200 | Since 2023 | H1B No Sponsor


• Work with other engineers to design, implement, and operate our cloud infrastructure on AWS
• Ensure the engineering team has the tools it needs to maintain and operate a business-critical SaaS application (for example, telemetry, alerting, and tracing)
• Build and maintain CI and CD pipelines for reliable deployments and, when necessary, rollbacks
• Ensure our data and our customers’ data are safe and secure, including in disaster recovery situations
• Help improve engineering productivity and efficiency
• Contribute to the long-term product and technology roadmap

AWS, Cloud


India


DevOps Engineer

Cosmote Global Solutions

DevOps Engineer | 75 days ago

Contract | Remote | Team 11-50 | H1B No Sponsor


• Understand the current policies on three Entra ID tenants
• Develop Terraform modules and Azure DevOps pipelines to ensure secure management of policies
• Prepare the transition of Conditional Access Policy operations from the current team to the Cyber Security team
• Maintain the Conditional Access Policies (troubleshooting, implementing new policies, and improving existing ones)

AWS, Azure, Cloud, Cyber Security, Python, Splunk, SQL, Terraform


Luxembourg


Job Closed

Senior Site Reliability Engineer, SRE

Christian Care Ministry

A Christ-centered community wellness experience based on faith, prayer, and personal responsibility.

DevOps Engineer | 75 days ago

Full Time | Remote | Team 501-1,000 | Since 1993 | H1B Sponsor


• Collectively work on the design, evolution, and operational health of CCM’s AWS environment, including architectural decisions, standards, and best practices
• Design, implement, and optimize AWS-based infrastructure using services such as EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, IAM, and VPC
• Design and manage cloud infrastructure using Infrastructure as Code (e.g., Terraform, CloudFormation, or equivalent)
• Lead new implementations and major reliability initiatives, serving as a subject matter expert for AWS and SRE best practices
• Actively monitor, analyze, and optimize AWS spend, providing regular cost insights and recommendations that balance reliability, performance, and fiscal stewardship
• Apply and mature site reliability principles to improve system availability, scalability, performance, security, and observability
• Design, analyze, and implement automation to eliminate operational toil and improve system efficiency
• Provide advanced operations and systems administration for cloud-hosted and hybrid platforms supporting CCM’s IT systems and services
• Define and improve monitoring, alerting, logging, and incident response practices to proactively identify risks and minimize customer impact
• Lead complex production incidents, perform root cause analysis, and drive corrective and preventive actions
• Mentor and provide technical guidance to junior and mid-level engineers without direct people-management responsibilities
• Collaborate with engineering, QA, security, and business teams to embed reliability throughout the SDLC
• Ensure systems and data are handled in compliance with legal, regulatory, and organizational requirements
• Develop and continuously improve production engineering processes, including:
  • Change and configuration management
  • Monitoring and observability
  • Incident and emergency response
  • Disaster recovery and business continuity
  • Capacity planning and performance tuning
  • Infrastructure-as-code and deployment automation
• Partner with leadership to establish and enforce consistent IT Production policies, standards, and tooling
• Act as a change agent for long-term technical strategy, identifying risks, dependencies, and opportunities across systems and teams
• Participate in a sustainable on-call rotation and contribute to ongoing improvements that reduce alert fatigue and operational overhead
• Build strong cross-functional relationships to align reliability initiatives with business and ministry outcomes
• Contribute to the exercise and expression of Christian Care Ministry’s Christian beliefs
• Perform all other duties as assigned

AWS, Cloud, EC2, SDLC, Terraform


Alabama + 16 more

$101K - $146K / year


Job Closed
