Nick AI

We are building an AI Agent Trading Platform. Create your Agent, customize strategy & trade on your favorite exchanges.

Backend/DevOps Engineer

Location

United States

Posted

100 days ago

Salary

0

Seniority

Senior

Job Description

  • Design and manage infrastructure deployments using Docker, Kubernetes, and AWS/GCP.
  • Develop secure key management systems for API keys and wallet abstraction.
  • Implement monitoring, logging, and incident handling for execution nodes.
  • Set up CI/CD pipelines and streamlined developer workflows (GitHub Actions).
  • Optimize infrastructure for reliability, security, and scalability.
  • Collaborate with backend engineers to support execution and receipts systems.

Job Requirements

  • 5+ years in DevOps or backend infrastructure roles.
  • Expertise with Docker, Kubernetes, and cloud platforms (AWS/GCP).
  • Security-first mindset with experience in key management and access control.
  • Experience with monitoring and observability tools (Prometheus, Grafana, ELK).
  • Strong scripting skills (Python, Bash, or similar).
  • Proven track record scaling systems to production usage.
  • Background in fintech, trading, or Web3 infrastructure preferred.
  • Strong understanding of deployment best practices and incident response protocols.

Benefits

  • Competitive salary commensurate with experience
  • Flexible PTO and sick leave policies
  • Fully remote-friendly, with flexible working hours
  • Access to cutting-edge AI and trading technologies
  • Opportunity to design core infra for a fast-growing product
  • Direct impact on product reliability and security

More DevOps Engineer Jobs

Database Reliability Engineer

WorkOS

WorkOS is an internet company providing a developer platform that helps app-builders sell their apps to enterprise customers with only a few lines of code. Founded in 2019, the com

DevOps Engineer | 100 days ago

  • Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.
  • Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.
  • Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.
  • Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.
  • Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.
  • Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.
  • Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.
  • Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.
  • Document operational procedures, runbooks, and architectural decisions so learnings become repeatable actions and eventually automation.
  • Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.

United States
$175K - $275K / year
Job Closed
Site Reliability Engineer

WorkOS

WorkOS is an internet company providing a developer platform that helps app-builders sell their apps to enterprise customers with only a few lines of code. Founded in 2019, the com

DevOps Engineer | 100 days ago

  • Design and evolve the systems, tooling, and processes that improve the reliability and performance of WorkOS
  • Collaborate with product and infrastructure teams to ensure services are production-ready, observable, and resilient to failure
  • Define and measure SLIs/SLOs to guide reliability improvements
  • Write and optimize backend systems (in TypeScript) with a focus on performance, maintainability, and graceful degradation
  • Improve our incident response process, lead postmortems, and drive follow-through on reliability risks
  • Develop internal tools and automations that make it easier to operate and scale our systems
  • Participate in our on-call rotation: responding to, resolving, and learning from production incidents
  • Contribute to design and architecture discussions with a focus on operability and long-term sustainability
  • Document systems, share learnings, and help grow a reliability-minded engineering culture

United States
$175K - $275K / year
Network DevOps Engineer, RDMA Fabric Automation

Vultr

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.

DevOps Engineer | 100 days ago
Other | Remote | Team 201-500 | Since 2014 | H1B No Sponsor

  • Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
  • Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks.
  • Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation.
  • Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics.
  • Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics.
  • Implement CI/CD workflows for network configuration changes: validation, pre-checks, and rollbacks.
  • Investigate complex network behaviors across layers: flow hashing, congestion domains, ECMP, and overlay interactions.
  • Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture.

United States
$90K - $130K / year
Senior Site Reliability Engineer, Core Cloud Engineering

Vultr

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.

DevOps Engineer | 100 days ago
Other | Remote | Team 201-500 | Since 2014 | H1B No Sponsor

  • Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
  • Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
  • Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
  • Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
  • Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
  • Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
  • Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
  • Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
  • Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

United States
$120K - $130K / year