This opportunity is available through a leading AI-driven work platform.
Cloud Infrastructure Evaluation Engineer
Location
Northern America | Europe | Central America | Australia and New Zealand
Posted
10 days ago
Salary
$90 / hour
Seniority
Mid Level
Job Description
24-MAG
Role Description
We are sharing a specialised part-time consulting opportunity for experienced DevOps, SRE, and cloud engineering professionals with strong backgrounds in infrastructure engineering, Kubernetes, CI/CD systems, observability, automation tooling, and AI coding agent workflows.
This role supports current and upcoming remote consulting opportunities focused on evaluating complex infrastructure engineering tasks, reviewing coding-agent-generated implementations, assessing reliability and cloud architecture decisions, and applying practical engineering judgment to realistic DevOps, SRE, and cloud scenarios.
Selected professionals may work with modern coding agents such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or comparable tools to complete, review, and evaluate technical infrastructure workflows.

Key Responsibilities
- Infrastructure Engineering Evaluation
  - Complete and evaluate complex infrastructure engineering tasks using modern coding agent tools.
  - Review technical implementations involving cloud platforms, Kubernetes, CI/CD systems, infrastructure automation, and observability tooling.
  - Assess whether proposed solutions reflect realistic DevOps, SRE, and cloud engineering practices.
  - Apply professional engineering judgment to identify quality gaps, reliability concerns, and improvement areas.
- Cloud, Kubernetes & CI/CD Review
  - Evaluate implementations involving AWS, Azure, GCP, Kubernetes, Terraform, CI/CD pipelines, and related infrastructure tooling.
  - Review cloud architecture decisions for scalability, maintainability, reliability, security awareness, and production-readiness.
  - Identify bugs, edge cases, misconfigurations, failure modes, and weak assumptions in infrastructure-related deliverables.
  - Provide structured feedback on deployment workflows, service reliability, monitoring coverage, and automation quality.
- Coding Agent Output Assessment
  - Review coding-agent-generated infrastructure and reliability engineering solutions.
  - Compare outputs from multiple coding agents and assess strengths, weaknesses, accuracy, and practical usefulness.
  - Identify where generated solutions succeed, where they fail, and where human engineering judgment is required.
  - Document technical findings clearly for project teams and quality review workflows.
- Technical Documentation & Scenario Feedback
  - Produce clear, structured evaluations of infrastructure engineering tasks and model-generated outputs.
  - Explain reasoning around cloud architecture, reliability engineering, CI/CD workflows, observability, and automation choices.
  - Support technical assessment workflows by documenting accepted work, improvement areas, and practical engineering conclusions.
  - Help ensure outputs reflect real-world infrastructure engineering standards and production-scale expectations.

Qualifications
- 2+ years of professional experience in DevOps, Site Reliability Engineering, Cloud Engineering, Infrastructure Engineering, or related technical roles.
- Hands-on experience with AWS, Azure, GCP, Kubernetes, Terraform, CI/CD pipelines, observability tools, or infrastructure automation.
- Regular use of AI coding agents such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or similar tools.
- Ability to evaluate coding-agent-generated infrastructure solutions for correctness, reliability, maintainability, and production fit.
- Experience supporting production-scale systems is strongly preferred.
- Strong ability to identify bugs, edge cases, reliability issues, and failure modes.
- Clear written communication skills and comfort documenting technical reasoning in a remote, project-based environment.

Educational Background
- A degree in Computer Science, Software Engineering, Computer Engineering, Information Systems, Cloud Computing, Cybersecurity, or a related technical field is helpful.
- Equivalent professional experience in DevOps, SRE, cloud infrastructure, platform engineering, or production systems is also highly relevant.

Nice to Have
- Experience with Terraform, Helm, GitHub Actions, GitLab CI/CD, Jenkins, ArgoCD, Prometheus, Grafana, Datadog, ELK, or comparable infrastructure tools.
- Background in production incident response, reliability engineering, distributed systems, cloud security, or platform engineering.
- Experience evaluating technical outputs, reviewing infrastructure code, or comparing implementation approaches.
- Familiarity with multi-cloud environments, microservices architecture, container networking, service deployment, or observability design.
- Strong comfort working in fast-moving sprint-based project environments.

Why This Opportunity
- Flexible, remote consulting work aligned with your DevOps, SRE, cloud infrastructure, and coding agent expertise.
- Opportunity to evaluate realistic infrastructure engineering workflows involving cloud systems, Kubernetes, CI/CD, observability, and automation.
- Suitable for engineers who enjoy practical technical assessment, tool-assisted coding workflows, and reliability-focused problem-solving.
- Sprint-based project work that can align with part-time availability and remote schedules.

Contract Details
- Independent contractor engagement.
- Fully remote and flexible scheduling.
- Sprint-based, project-based availability.
- Some project work may run in focused 12–24 hour sprint windows depending on project requirements.
- Compensation may reach up to $90/hour, depending on project scope, experience, and accepted work structure.
- Some projects may use accepted-task compensation depending on the specific workflow.
- Payments are made weekly via Stripe or Wise based on services rendered.
- Projects may be extended, shortened, adjusted, or concluded based on project needs and performance.
- Eligible locations include various countries across Europe, North America, and Australia.
- Candidates requiring H-1B or STEM OPT sponsorship support are not eligible at this time.
- Work must not involve sharing confidential or proprietary information from any employer, client, or institution.

About the Platform
This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.
By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy
More Infrastructure Engineer Jobs
• Respond to incoming requests made via web, email, or phone within agreed-upon levels in customer service level agreements.
• Identify and organize tickets and internal initiatives according to priority.
• Update and maintain internal process documentation.
• Configure, maintain and troubleshoot corporate environments and applications to support Verndale's employees and clients.
• Provide technical support and training for colleagues.
• Collaborate with engineering, leadership, and cross-functional team members to meet business goals.
• Participate in on-call rotation and ensure uptime of services.
• Support the VP of Technology Engineering & Innovation in evaluating emerging AI infrastructure technologies and future-ready data center strategies.
• Analyze AI workload characteristics including training vs. inference workloads, GPU utilization patterns, dynamic workload fluctuations, rack-level power variability, and networking and latency requirements.
• Assess implications of AI workload behavior on infrastructure resiliency, scalability, efficiency, and operational design.
• Develop technical recommendations and infrastructure strategies supporting future AI deployments.
• Analyze current and future AI compute platforms including NVIDIA GPU architectures, ARM-based platforms, custom AI accelerators and ASICs, optical networking and switching technologies, and emerging hyperscaler-designed AI chips.
• Evaluate implications of evolving chip architectures on rack density, power consumption, cooling requirements, electrical distribution, mechanical infrastructure, space planning, and future development standards.
• Model current and future AI rack power density trends including existing high-density deployments (50–120 kW), near-term AI deployments (150–300+ kW), and future ultra-dense AI cluster scenarios.
• Assess long-term impacts of emerging chip architectures on energy efficiency and future data center design and development standards.
• Support conceptual and detailed design efforts for AI-ready data center infrastructure.
• Assist in developing long-term infrastructure roadmaps for high-density AI deployments, liquid cooling adoption, modular infrastructure strategies, utility coordination, grid-parallel and microgrid solutions, and future AI campus development.
• Evaluate implications of AI infrastructure evolution on greenfield developments, existing facility retrofits, construction methodologies, scalability, and future campus master planning.
• Collaborate with engineering, development, and construction teams to develop scalable AI-ready infrastructure standards and deployment models.
• Collaborate closely with the Energy Strategy Team to evaluate utility constraints, interconnection requirements, grid limitations, dynamic load fluctuation impacts, power quality and resiliency considerations, and onsite generation and distributed energy solutions.
• Support analysis of grid-parallel and islanded microgrid architectures, fuel cells, Battery Energy Storage Systems (BESS), bridge power solutions, natural gas generation, and renewable integration opportunities.
• Evaluate implications of AI workloads on substation development, transmission planning, utility coordination, and energy efficiency and PUE optimization.
• Assess how future AI compute growth will influence utility planning and power infrastructure strategies.
• Analyze current and emerging thermal management solutions including air cooling, direct-to-chip liquid cooling, immersion cooling, rear-door heat exchangers, and hybrid cooling architectures.
• Assess implications of ultra-high-density AI deployments on mechanical system design, water usage, cooling scalability, heat rejection strategies, thermal resiliency, and future cooling infrastructure standards.
• Evaluate cooling technologies and infrastructure requirements as AI rack densities continue to increase.
• Interface directly with technology vendors, OEMs, utilities, and strategic partners across power generation, UPS systems, electrical infrastructure, cooling technologies, liquid cooling platforms, AI compute infrastructure, and networking and optical interconnect technologies.
• Lead technical assessments of next-generation technologies with respect to reliability, scalability, energy efficiency, sustainability, AI workload performance, construction complexity, and operational resiliency.
• Support proof-of-concept initiatives, pilot deployments, and technology benchmarking efforts.
• Develop executive-level recommendations regarding adoption of emerging AI infrastructure technologies and strategic engineering standards.
• Collaborate with design engineering, construction, operations, energy strategy, procurement, utilities, technology partners, and external engineering firms and consultants.
• Support strategic planning initiatives and executive-level technical presentations.
• Assist in developing future infrastructure standards and innovation roadmaps for AI-enabled data center platforms.
• The IT Infrastructure Engineer is responsible for the deployment, configuration and ongoing support of enterprise infrastructure across both virtual and physical environments.
• The role focuses on Windows Server deployments, clustering technologies, storage integration (MPIO), identity and access solutions (PingFederate, UserLock MFA), and application onboarding within a secure, highly available environment.
• Deploy and configure Windows Server environments across VMware vCenter (virtual builds) and physical servers.
• Perform system provisioning including OS installation, patching and baseline security hardening.
• Manage virtual machine lifecycle (provisioning, resizing, decommissioning).
• Monitor and optimise resource utilisation across clusters.
• Document onboarding processes and standards.
• Maintain technical documentation including SOPs and runbooks.
• Administer and maintain VMware vSphere / vCenter environments.
• Design and implement Windows Failover Clustering solutions.
• Configure and support Multipath I/O (MPIO) for SAN storage.
• Ensure resilience and high availability of critical systems.
• Lead/support application onboarding including authentication integration and infrastructure provisioning.
• Collaborate with application owners to meet security and connectivity requirements.
• Provide support for infrastructure incidents and requests.
• Troubleshoot complex issues across Windows, VMware and identity platforms.
• Participate in on-call support where required.
• Ensure systems meet organisational security and compliance standards.
• Support patching cycles and vulnerability remediation.
• Work with cyber and compliance teams as required.
• Contribute to continuous service improvement initiatives.
Principal Site Reliability Engineer - AI Infrastructure Operations
Nscale
Nscale is the Hyperscaler engineered for AI.
Role Description
At Nscale, our AI Infrastructure Operations team is responsible for the reliability and scalability of one of the most demanding AI platforms in the industry. We value engineers who think in systems, lead through influence, and raise the bar for operational excellence across the organisation.
We’re looking for a Principal Site Reliability Engineer (SRE) to provide technical leadership across our AI Infrastructure Operations domain. This is a senior, highly impactful role focused on setting reliability strategy, designing foundational systems, and driving cross-team improvements at scale. You will operate as a technical authority for reliability, automation, and operational architecture across Nscale’s GPU, network, and control-plane platforms.
- Owning and evolving the long-term reliability strategy for Nscale’s AI and HPC infrastructure
- Designing and leading the development of large-scale control-plane systems, automation frameworks, and operational tooling
- Defining reliability standards, SLO frameworks, and operational best practices used across multiple teams
- Acting as a senior technical escalation point during critical incidents, guiding resolution and ensuring systemic fixes
- Identifying structural reliability risks and driving cross-functional initiatives to address them at the architectural level
- Partnering with Engineering, Network Operations, and Fleet Operations leadership to influence platform design and operational maturity
- Mentoring senior and mid-level engineers, raising the overall quality and effectiveness of SRE practices
- Driving measurable improvements in availability, MTTR, cost efficiency, and operational scalability

Qualifications
- 10+ years of experience in Site Reliability Engineering, Systems Engineering, or Software Engineering roles operating complex, large-scale infrastructure
- Expert-level software engineering skills, with a strong track record of building production-grade automation and systems
- Deep expertise in Linux, networking, and distributed systems design at scale
- Extensive experience debugging and resolving failures across hardware, OS, networking, and application layers
- Proven ability to lead technical initiatives across teams without direct authority
- Strong systems-thinking mindset, with the ability to balance reliability, velocity, and cost

Requirements
- Deep hands-on experience with AI or HPC platforms, including GPUs, high-speed interconnects (InfiniBand/RDMA), and workload schedulers (e.g. SLURM)
- Experience designing observability systems for high-cardinality, high-throughput environments
- Familiarity with Kubernetes at scale and hybrid or bare-metal cloud architectures
- A history of driving step-change improvements in reliability, scalability, or operational efficiency

Benefits
- Collaborative, supportive, and innovative environment where your contributions spark real impact
- Highly competitive package (base + equity) with reviews every 12 months
- Opportunity to join the fastest-growing tech startup, pushing boundaries and collaborating with brilliant minds
- Dynamic progression plan tailored to your ambitions
- Human-First Flexibility: autonomy to shape your day around life's moments
- Thriving remote-first team with seamless virtual collaboration



