Made4net is a global leader in supply chain execution and warehouse management systems (WMS), delivering agile, scalable, and unified solutions that help companies streamline opera

Cloud Operations Engineer

DevOps Engineer · Full Time · Remote · Mid Level

Location

United States

Posted

2 days ago

Salary

$120K / year

Seniority

Mid Level


Job Description

Role Description

We're looking for a hands-on Cloud Operations Engineer to join our global operations team. This is a technical role at the intersection of cloud infrastructure, production support, and security/ops monitoring. We operate a follow-the-sun support model across global regions, with shift coverage expected on weekends and holidays. You'll support mission-critical systems across AWS, Windows, and Linux, ensuring reliability, performance, and rapid incident response.

Key Responsibilities

- Provide production support as part of a global follow-the-sun operations team: monitor systems, triage alerts, respond to incidents, and ensure service continuity across all environments.
- Monitor infrastructure and application health across security and operations dashboards; detect, triage, and respond to performance, availability, and security alerts; produce and maintain SLA reporting.
- Deploy, manage, and maintain AWS infrastructure, with a primary focus on EC2-based workloads across Windows and Linux environments.
- Configure and manage load balancers (ALB/NLB), including URL rewrite rules, routing policies, and SSL termination for web applications.
- Design and maintain high-availability architectures on AWS, ensuring redundancy, multi-AZ deployments, and tested failover procedures.
- Own backup and disaster recovery operations: scheduling, retention policies, monitoring, and regular restoration testing.
- Automate configuration management and application update deployments using Ansible, ensuring consistency and minimal downtime across all environments.
- Manage database infrastructure across environments: provisioning, patching, performance tuning, and backup/recovery operations.
- Troubleshoot and configure networking components, including VPCs, subnets, route tables, DNS, security groups, and firewall rules.
- Handle incident escalation, maintain shift handover documentation, and contribute to post-mortems and continuous improvement of support processes.
- Maintain infrastructure runbooks and operational documentation, and contribute to automation and tooling improvements.
- Support the company's ongoing transition toward microservices and containerized architectures, contributing operational knowledge and helping ensure smooth adoption in production environments.

Qualifications

- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience through certifications, vocational training, or hands-on work.
- 3-5 years of hands-on experience in a cloud infrastructure, systems engineering, or production support role.
- Comfortable with a follow-the-sun support model; availability for occasional weekend coverage is expected as part of the global team rotation.
- Strong hands-on experience with AWS, specifically EC2, ALB/NLB, VPC, S3, and AWS Backup.
- Solid experience administering Windows and Linux servers in an enterprise IaaS environment.
- Hands-on experience with Ansible for configuration management and automated application deployments, including managing update pipelines.
- Proven experience designing and maintaining high-availability architectures on AWS (multi-AZ, auto-scaling, failover).
- Proficiency in configuring ALB/NLB, including URL rewrite rules, path-based and host-based routing, and listener rules for web applications.
- Solid understanding of AWS networking: VPC design, subnets, route tables, security groups, DNS, and VPN connectivity.
- Hands-on experience managing relational database infrastructure (provisioning, patching, monitoring, and performance tuning) for MSSQL Server, Postgres, and Oracle.
- Experience with security and ops monitoring: interpreting alerts, identifying anomalies, triaging incidents, and escalating appropriately.
- Experience with infrastructure and application performance tracking, including the ability to interpret metrics and respond to anomalies.
- Scripting proficiency in Bash, Python, or PowerShell for automation and operational tasks.

Preferred Qualifications

- AWS certifications (Solutions Architect, SysOps Administrator, or equivalent).
- Experience with infrastructure-as-code tools such as Terraform or CloudFormation.
- Experience with ITSM/ticketing systems in a production support context (Zendesk preferred).
- Experience with Grafana (preferred) or similar monitoring and observability platforms.
- Familiarity with CI/CD pipelines and DevOps practices.

Benefits

- Health insurance (medical, dental, vision) with a robust wellness program to support your physical and mental well-being.
- Generous paid time off policy.
- Company-matched 401(k) retirement plan to help you secure your future.
- Tuition reimbursement program to support your continued education and career advancement.
- Employee assistance program providing confidential counseling and support services for personal challenges.
- Discretionary employee bonus program.
- Employee discounts and perks through our PEO.
- Pay range: starting from $120,000 per year.

More DevOps Engineer Jobs

Principal Site Reliability Engineer, SRE

SoluStaff

People Powering Technology

DevOps Engineer · 2 days ago

Full Time · Remote · Team 51-200 · H1B No Sponsor


• Serve as the primary technical owner for production reliability across U.S. customer environments.
• Investigate and resolve complex issues spanning web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.
• Lead production incident response efforts, coordinating cross-functional teams to restore service and minimize customer impact.
• Perform root cause analysis and drive corrective actions that improve long-term system stability and resilience.
• Partner with software engineering and platform teams to identify recurring reliability risks and implement sustainable solutions.
• Design, configure, and validate secure customer connectivity solutions, including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.
• Support customer onboarding initiatives by troubleshooting connectivity challenges and ensuring consistent implementation processes.
• Enhance platform observability through improvements in monitoring, logging, alerting, tracing, and operational dashboards.
• Contribute to CI/CD, infrastructure automation, and deployment processes that improve release safety and operational consistency.
• Develop operational tooling that supports incident response, troubleshooting, onboarding, and system monitoring activities.
• Collaborate with engineering leadership to improve cloud architecture, scalability, security, and operational readiness.
• Partner with customer-facing teams to communicate technical issues, remediation plans, and reliability improvements clearly and effectively.
• Support compliance, security, and risk management initiatives within highly regulated healthcare environments.

AWS Cloud Django Grafana Kubernetes Python Terraform


United States


DevOps Internship

Viasoft Korp | Industry ERP

The management system born in industry that lives and breathes industrial processes and distribution 💙

DevOps Engineer · 2 days ago

Internship · Remote · Team 51-200 · Since 1999 · H1B No Sponsor


• Assist with routines involving microservices, containers, observability, high availability, continuous integration, continuous delivery, monitoring, and continuous deployment.
• Assist with infrastructure automation routines in cloud and on-premises environments.
• Assist with deploying and evolving Kubernetes environments.
• Assist with automating processes using Ansible.
• Assist with implementing and evolving monitoring and observability with Grafana.
• Help resolve incidents and troubleshoot across services and environments.
• Help ensure the stability, availability, and performance of the environments.
• Assist with evolving CI/CD pipelines and tooling.
• Document procedures, workflows, and configurations.
• Actively participate in the technological evolution of the Korp platform.

Ansible Cloud Grafana Jenkins Kubernetes Linux


Brazil

R$1.3K / month


Senior Site Reliability Engineer

Akamai Technologies

DevOps Engineer · 2 days ago

Full Time · Remote · Team 5,001-10,000 · H1B Sponsor


• Owning the SRE infrastructure lifecycle, from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
• Designing and implementing frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
• Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
• Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
• Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
• Reviewing design documents and product requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
• Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

Ansible Distributed Systems Kubernetes Linux Python SaltStack Terraform Go


Canada

$120.4K - $216.6K / year


Senior Site Reliability Engineer

Akamai Technologies

DevOps Engineer · 2 days ago

Full Time · Remote · Team 5,001-10,000 · H1B Sponsor


• Designing, developing, testing, and operating critical services that support the reliability, scalability, and performance of our infrastructure
• Designing and implementing observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues
• Driving reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes
• Developing deep technical expertise in IaC systems and serving as a trusted technical resource, mentoring engineers and sharing best practices
• Collaborating with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions
• Participating in an on-call rotation and providing leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts

Ansible Chef Distributed Systems Jenkins Puppet Python SaltStack Terraform Go


Massachusetts

$121.4K - $218.6K / year
