Job Closed
This listing is no longer active.
Full commerce is the future — we get you there now.
DevOps Infrastructure Engineer – Systems Administrator
Location
United States
Posted
102 days ago
Salary
$75K - $87K / year
Seniority
Mid Level
Job Description
NMI
- Supporting the day-to-day operations of our production infrastructure: resolving incidents, investigating alerts, and keeping systems healthy
- Helping maintain a regular patching cycle across Linux-based systems to meet security and compliance requirements
- Writing and maintaining Ansible playbooks or similar configuration management code to automate provisioning and configuration tasks
- Assisting with the deployment and management of cloud load balancers, WAFs, and other network infrastructure
- Contributing to hardware upgrades and replacements at colocation facilities
- Participating in incident reviews and post-mortems, helping identify root causes and preventative measures
- Collaborating with team members across time zones using clear written and verbal communication
Job Requirements
- 2–4 years of hands-on experience in a Linux systems or DevOps role
- Solid Linux fundamentals — comfortable administering servers, writing basic shell scripts, and configuring web servers (Red Hat/CentOS experience a bonus)
- Exposure to configuration management tools such as Ansible or Puppet
- Familiarity with DevOps workflows and tools — Git, GitLab, VS Code
- Good working knowledge of TCP/IP networking and how web traffic flows
- Understanding of TLS certificates and basic PKI concepts
- Strong attention to detail and a methodical approach to problem-solving
- Clear communicator who can document work and collaborate across teams and time zones
- Experience with cloud platforms (AWS, GCP, Azure) is preferred
- Exposure to load balancers (F5 BIG-IP or similar) is preferred
- Familiarity with observability tools such as Grafana, Prometheus, or the ELK stack is preferred
- Any exposure to WAF/DDoS platforms (Cloudflare, Akamai, F5 XC) is preferred
- Experience with Redis, RabbitMQ, or HashiCorp Vault is preferred
- Familiarity with line-of-business tools such as Confluence, NetBox, and Jira is preferred
Benefits
- Annual salary of $75,000 - $87,000 + bonus
- A remote-first culture
- Flex PTO
- Health, Dental and Vision Insurance
- 13 Paid Holidays
- Company volunteer days
More DevOps Engineer Jobs
Senior Site Reliability Engineer
Zocdoc
Zocdoc is the beginning of a better healthcare experience for millions of patients every month.
- Monitoring and maintaining complex cloud-based infrastructure, systems, and services and ensuring their uptime to help millions of patients get the care they need
- Automating and developing our tooling, processes, and infrastructure to speed up development and make them repeatable and error-proof
- Supporting our large product engineering org with their scaling, performance, and uptime needs as well as helping diagnose and debug production-related issues
- Analyzing and performance tuning systems, code, and networking for scaling and optimal operation
- Working with cutting-edge GenAI tools and technology
Senior DevOps Engineer
Reveleer
Reveleer is an AI-powered healthcare data and analytics company that delivers a unified value-based care platform integrating clinical intelligence, risk adjustment, quality improv
- Architect, build, and maintain scalable and secure cloud infrastructure across AWS, Azure, and GCP.
- Design and implement multi-region, fault-tolerant architectures that support 24/7 SaaS healthcare operations.
- Lead Infrastructure as Code (IaC) development using Terraform, CloudFormation, Pulumi, or equivalent.
- Build, optimize, and maintain CI/CD pipelines using tools such as Bitbucket, GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
- Automate repeatable processes, deployments, and operational tasks to increase reliability and reduce human error.
- Implement end-to-end automated testing frameworks integrated into deployment workflows.
- Drive SRE principles, including SLIs/SLOs/SLA management, observability, and proactive reliability improvements.
- Implement and maintain logging, monitoring, alerting, and distributed tracing (e.g., New Relic, Datadog, Prometheus, Grafana, ELK).
- Lead major incident response, root cause analysis, and post-mortem processes.
- Implement DevSecOps best practices, embedding security into CI/CD and infrastructure workflows.
- Collaborate with the Security team to ensure controls meet HIPAA, HITRUST, SOC2, NIST, and CIS requirements.
- Manage secret stores, identity/access controls, certificate management, and vulnerability remediation.
- Architect and maintain cloud networking, including VPCs, Firewalls, WAFs, VPNs, load balancers, service meshes, and hybrid networking.
- Support secure integrations between platforms, SaaS systems, and 3rd-party vendors.
- Partner with Software Engineering to enable rapid development while maintaining operational excellence.
- Work with SRE, Security, QA, and Data teams to optimize performance, automation, and compliance.
- Mentor junior engineers and contribute to team standards, design reviews, and architecture discussions.
DevOps Engineer – IST Timezone
testRigor
testRigor is the #1 generative AI-based codeless test automation tool for manual testers and product managers.
- The DevOps Engineer (IST Timezone) will be responsible for designing, automating, and maintaining cloud infrastructure and deployment pipelines for a global SaaS platform.
- You will collaborate closely with developers and technical leadership in an agile, cross-functional team, driving best practices in automation, security, and reliability.
- As a key contributor, you will address challenging infrastructure problems and deliver scalable solutions in fast-paced, start-up environments.
- Estimation and planning for infrastructure and automation tasks.
- Analysis of requirements to develop robust, maintainable systems.
- Designing and implementing Infrastructure as Code (IaC) solutions using Terraform and related toolchains.
- Managing cloud infrastructure across Azure, AWS, Google Cloud (GC), and Cloudflare (CF).
- Deploying, maintaining, and optimizing Kubernetes (k8s) clusters and containerized workloads.
- Developing scripts and automation using Python, Bash, and PowerShell.
- Administering MongoDB databases and ensuring high availability and backups.
- Maintaining systems across Linux, macOS, and Windows environments.
- Supporting development teams with CI/CD pipelines, automated testing, and continuous delivery.
- Configuring networks, VPNs, and HTTP reverse proxies for secure and efficient communication.
- Implementing best practices for source control with Git and automating workflows.
- Monitoring systems for reliability, performance, and security.
Customer Site Reliability Engineer – OpenShift Managed Cloud Services, Spoken Japanese, Kubernetes/AWS/Azure, Linux
Red Hat
The leading provider of enterprise open source solutions.
- Manage large-scale, distributed systems, focusing on minimizing downtime and improving system resilience.
- Maintain customer trust and confidence by ensuring stability and functionality of services.
- Drive continuous enhancement of processes, tools, and methodologies to support the evolving needs of the service.
- Lead the development of code and automation scripts to optimize the scalability, reliability, and performance of services.
- Lead and participate in high-priority customer escalations, adopting a customer-first mindset.
- Coordinate and execute complex incident response procedures, ensuring timely resolution and thorough postmortems.
- Collaborate with cross-functional teams to enhance system robustness.
- Demonstrate a proactive mindset to help preempt escalations and ensure reliable operations.
- Document resolutions, root causes, and best practices to enrich the knowledge base and promote self-service solutions.
- Mentor and coach team members, fostering a culture of continuous learning, knowledge sharing and collaboration.
- Participate in on-call rotation and provide leadership during critical incidents.
- Collaborate on strategic AI and automation projects designed to increase the efficiency of fleet operations and troubleshooting, ultimately delivering a better product experience for customers.



