Made4net

Made4net is a global leader in supply chain execution and warehouse management systems (WMS), delivering agile, scalable, and unified solutions that help companies streamline opera

Cloud Operations Engineer

Location: United States

Posted: 2 days ago

Salary: $120K / year

Seniority: Mid Level

Job Description

Role Description

We're looking for a hands-on Cloud Operations Engineer to join our global operations team. This is a technical role at the intersection of cloud infrastructure, production support, and security/operations monitoring. We operate a follow-the-sun support model across global regions, with shift coverage expected on weekends and holidays. You'll support mission-critical systems across AWS, Windows, and Linux, ensuring reliability, performance, and rapid incident response.

Key Responsibilities

- Provide production support as part of a global follow-the-sun operations team: monitor systems, triage alerts, respond to incidents, and ensure service continuity across all environments.
- Monitor infrastructure and application health across security and operations dashboards; detect, triage, and respond to performance, availability, and security alerts; produce and maintain SLA reporting.
- Deploy, manage, and maintain AWS infrastructure, with a primary focus on EC2-based workloads across Windows and Linux environments.
- Configure and manage load balancers (ALB/NLB), including URL rewrite rules, routing policies, and SSL termination for web applications.
- Design and maintain high-availability architectures on AWS, ensuring redundancy, multi-AZ deployments, and tested failover procedures.
- Own backup and disaster recovery operations: scheduling, retention policies, monitoring, and regular restoration testing.
- Automate configuration management and application update deployments using Ansible, ensuring consistency and minimal downtime across all environments.
- Manage database infrastructure across environments: provisioning, patching, performance tuning, and backup/recovery operations.
- Troubleshoot and configure networking components, including VPC, subnets, routing tables, DNS, security groups, and firewall rules.
- Handle incident escalation, maintain shift handover documentation, and contribute to post-mortems and continuous improvement of support processes.
- Maintain infrastructure runbooks and operational documentation, and contribute to automation and tooling improvements.
- Support the company's ongoing transition toward microservices and containerized architectures, contributing operational knowledge and helping ensure smooth adoption in production environments.

Qualifications

- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience through certifications, vocational training, or hands-on work.
- 3-5 years of hands-on experience in a cloud infrastructure, systems engineering, or production support role.
- Comfortable with a follow-the-sun support model; availability for occasional weekend coverage is expected as part of the global team rotation.
- Strong hands-on experience with AWS, specifically EC2, ALB/NLB, VPC, S3, and AWS Backup.
- Solid experience administering Windows and Linux servers in an enterprise IaaS environment.
- Hands-on experience with Ansible for configuration management and automated application deployments, including managing update pipelines.
- Proven experience designing and maintaining high-availability architectures on AWS (multi-AZ, auto-scaling, failover).
- Proficiency in configuring ALB/NLB, including URL rewrite rules, path-based and host-based routing, and listener rules for web applications.
- Solid understanding of AWS networking: VPC design, subnets, routing tables, security groups, DNS, and VPN connectivity.
- Hands-on experience managing relational database infrastructure (provisioning, patching, monitoring, and performance tuning) for MSSQL Server, Postgres, and Oracle.
- Experience with security and operations monitoring: interpreting alerts, identifying anomalies, triaging incidents, and escalating appropriately.
- Experience with infrastructure and application performance tracking; able to interpret metrics and respond to anomalies.
- Scripting proficiency in Bash, Python, or PowerShell for automation and operational tasks.

Preferred Qualifications

- AWS certifications (Solutions Architect, SysOps Administrator, or equivalent).
- Experience with infrastructure-as-code tools such as Terraform or CloudFormation.
- Experience with ITSM/ticketing systems in a production support context (Zendesk preferred).
- Experience with Grafana or similar monitoring and observability platforms (Grafana preferred).
- Familiarity with CI/CD pipelines and DevOps practices.

Benefits

- Health insurance (medical, dental, vision) with a robust wellness program to support your physical and mental well-being.
- Generous paid time off policy.
- Company-matched 401(k) retirement plan to help you secure your future.
- Tuition reimbursement program to support your continued education and career advancement.
- Employee assistance program providing confidential counseling and support services for personal challenges.
- Discretionary employee bonus program.
- Employee discounts and perks through our PEO.
- Pay range: starting from $120,000 per year.

More DevOps Engineer Jobs

Full Time · Remote · Team 51-200 · H1B No Sponsor

• Serve as the primary technical owner for production reliability across U.S. customer environments.
• Investigate and resolve complex issues spanning web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.
• Lead production incident response efforts, coordinating cross-functional teams to restore service and minimize customer impact.
• Perform root cause analysis and drive corrective actions that improve long-term system stability and resilience.
• Partner with software engineering and platform teams to identify recurring reliability risks and implement sustainable solutions.
• Design, configure, and validate secure customer connectivity solutions, including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.
• Support customer onboarding initiatives by troubleshooting connectivity challenges and ensuring consistent implementation processes.
• Enhance platform observability through improvements in monitoring, logging, alerting, tracing, and operational dashboards.
• Contribute to CI/CD, infrastructure automation, and deployment processes that improve release safety and operational consistency.
• Develop operational tooling that supports incident response, troubleshooting, onboarding, and system monitoring activities.
• Collaborate with engineering leadership to improve cloud architecture, scalability, security, and operational readiness.
• Partner with customer-facing teams to communicate technical issues, remediation plans, and reliability improvements clearly and effectively.
• Support compliance, security, and risk management initiatives within highly regulated healthcare environments.

United States

DevOps Internship

Viasoft Korp | Industry ERP

The management system born in industry, one that lives and breathes industrial processes and distribution 💙

DevOps Engineer · 2 days ago
Internship · Remote · Team 51-200 · Since 1999 · H1B No Sponsor

• Assist with routines involving: microservices; containers; observability; high availability; continuous integration; continuous delivery; monitoring; continuous deployment.
• Assist with infrastructure automation routines in cloud and on-premise environments.
• Assist in deploying and evolving Kubernetes environments.
• Assist in automating processes using Ansible.
• Assist in implementing and evolving monitoring and observability with Grafana.
• Help resolve incidents and troubleshoot across services and environments.
• Help ensure the stability, availability, and performance of the environments.
• Assist in evolving CI/CD pipelines and tools.
• Document procedures, workflows, and configurations.
• Participate actively in the technological evolution of the Korp platform.

Brazil
R$1.3K / month
Full Time · Remote · Team 5,001-10,000 · H1B Sponsor

• Owning the SRE infrastructure lifecycle, from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
• Designing and implementing frameworks that reflect customer experience for load-balancing services, and driving action when error budgets are at risk
• Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
• Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
• Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
• Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
• Building automation and tooling in Python or Go that reduces operational toil and improves team-wide operational capability

Canada
$120.4K - $216.6K / year
Full Time · Remote · Team 5,001-10,000 · H1B Sponsor

• designing, developing, testing, and operating critical services that support the reliability, scalability, and performance of our infrastructure
• designing and implementing observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues
• driving reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes
• developing deep technical expertise in infrastructure-as-code (IaC) systems and serving as a trusted technical resource, mentoring engineers and sharing best practices
• collaborating with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions
• participating in an on-call rotation and providing leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts

Massachusetts
$121.4K - $218.6K / year