Nick AI

We are building an AI Agent Trading Platform. Create your Agent, customize strategy & trade on your favorite exchanges.

Backend/DevOps Engineer

DevOps Engineer · Other Remote · Senior · Team 1-10 · Since 2024

Location

United States

Posted

155 days ago

Salary

Seniority

Senior

5 yrs exp · English

AWS Docker GCP Grafana Kubernetes Prometheus Python Blockchain / Web3

Job Description

• Design and manage infrastructure deployments using Docker, Kubernetes, and AWS/GCP.
• Develop secure key management systems for API keys and wallet abstraction.
• Implement monitoring, logging, and incident handling for execution nodes.
• Set up CI/CD pipelines and streamlined developer workflows (GitHub Actions).
• Optimize infrastructure for reliability, security, and scalability.
• Collaborate with backend engineers to support execution and receipts systems.

Job Requirements

5+ years in DevOps or backend infrastructure roles.
Expertise with Docker, Kubernetes, and cloud platforms (AWS/GCP).
Security-first mindset with experience in key management and access control.
Experience with monitoring and observability tools (Prometheus, Grafana, ELK).
Strong scripting skills (Python, Bash, or similar).
Proven track record scaling systems to production usage.
Background in fintech, trading, or Web3 infrastructure preferred.
Strong understanding of deployment best practices and incident response protocols.

Benefits

Competitive salary commensurate with experience
Flexible PTO and sick leave policies
Fully remote-friendly, with flexible working hours
Access to cutting-edge AI and trading technologies
Opportunity to design core infra for a fast-growing product
Direct impact on product reliability and security

More DevOps Engineer Jobs

Database Reliability Engineer

WorkOS

Your app, Enterprise Ready.

DevOps Engineer · 155 days ago

Other Remote · Team 51-200 · Since 2019 · H1B Sponsor

• Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.
• Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.
• Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.
• Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.
• Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.
• Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.
• Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.
• Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.
• Document operational procedures, runbooks, and architectural decisions so learnings become repeatable actions and eventually automation.
• Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.

Ansible AWS Chef DynamoDB Grafana PostgreSQL Prometheus Python Ruby SQL Terraform

United States

$175K - $275K / year

Job Closed

Site Reliability Engineer

WorkOS

Your app, Enterprise Ready.

DevOps Engineer · 155 days ago

Other Remote · Team 51-200 · Since 2019 · H1B Sponsor

• Design and evolve the systems, tooling, and processes that improve the reliability and performance of WorkOS
• Collaborate with product and infrastructure teams to ensure services are production-ready, observable, and resilient to failure
• Define and measure SLIs/SLOs to guide reliability improvements
• Write and optimize backend systems (in TypeScript) with a focus on performance, maintainability, and graceful degradation
• Improve our incident response process, lead postmortems, and drive follow-through on reliability risks
• Develop internal tools and automations that make it easier to operate and scale our systems
• Participate in our on-call rotation: responding to, resolving, and learning from production incidents
• Contribute to design and architecture discussions with a focus on operability and long-term sustainability
• Document systems, share learnings, and help grow a reliability-minded engineering culture

AWS Grafana Kubernetes Prometheus TypeScript

United States

$175K - $275K / year

Network DevOps Engineer, RDMA Fabric Automation

Vultr

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.

DevOps Engineer · 155 days ago

Other Remote · Team 201-500 · Since 2014 · H1B No Sponsor

• Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
• Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks.
• Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation.
• Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics.
• Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics.
• Implement CI/CD workflows for network configuration changes: validation, pre-checks, and rollbacks.
• Investigate complex network behaviors across layers: flow hashing, congestion domains, ECMP, and overlay interactions.
• Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture.

Ansible Grafana Jenkins Apache Kafka Linux PHP Prometheus Python Rust

United States

$90K - $130K / year

Senior Site Reliability Engineer, Core Cloud Engineering

Vultr

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.

DevOps Engineer · 155 days ago

Other Remote · Team 201-500 · Since 2014 · H1B No Sponsor

• Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
• Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
• Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
• Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
• Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
• Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
• Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
• Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
• Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

Distributed Systems Grafana Linux MySQL PHP Puppet

United States

$120K - $130K / year
