Helping Visionaries Change the World

AWS Infrastructure Engineer

Infrastructure Engineer | Full Time | Remote | Mid Level | Team 501-1,000 | Since 1989 | H1B No Sponsor

Location

Worldwide

Posted

2 days ago

Salary

Seniority

Mid Level

AWS, Infrastructure as Code, Terraform, CI/CD, Amazon Lambda, Amazon S3, Jenkins, GitHub Actions, Observability/Monitoring

Job Description

Role Description

We are looking for an AWS Infrastructure Engineer with strong expertise in Amazon Connect and Infrastructure as Code (IaC). The ideal candidate will be responsible for designing, deploying, and managing scalable Amazon Connect environments while building automated infrastructure deployment pipelines using Terraform and CI/CD tools.

- Design and implement AWS infrastructure for Amazon Connect environments
- Deploy and manage Amazon Connect environments across Dev, Staging, and Production
- Utilize AWS services such as Lambda, S3, EventBridge, Data Bridges, and related services to support deployments
- Develop and maintain Infrastructure as Code (IaC) modules using Terraform Enterprise
- Ensure Terraform code is maintainable, reusable, and aligned with best practices
- Build and manage CI/CD pipelines for automated infrastructure deployment and updates
- Integrate Terraform Enterprise with CI/CD pipelines for seamless deployments
- Monitor infrastructure performance and troubleshoot deployment-related issues
- Collaborate with cross-functional teams to ensure secure, scalable, and reliable cloud solutions

Qualifications

- 4+ years of experience in AWS services, especially Amazon Connect, Lambda, S3, EventBridge, and Data Bridges
- Hands-on experience with Infrastructure as Code (IaC) using Terraform Enterprise
- Experience building and managing CI/CD pipelines using Jenkins, GitHub Actions, or similar tools
- Good understanding of cloud infrastructure design and deployment best practices
- Experience managing multi-environment deployments (Dev, Staging, Production)
- Strong troubleshooting and problem-solving skills
- Knowledge of automation, monitoring, and cloud security practices
- Experience working in Agile and collaborative environments
- Good communication and documentation skills

Benefits

- Culture of Relentless Performance: join an unstoppable technology development team with a 99% project success rate and more than 30% year-over-year revenue growth.
- Competitive Pay and Benefits: enjoy a comprehensive compensation and benefits package, including health insurance, language courses, and a relocation program.
- Work From Anywhere Culture: make the most of the flexibility that comes with remote work.
- Growth Mindset: reap the benefits of a range of professional development opportunities, including certification programs, mentorship and talent investment programs, internal mobility and internship opportunities.
- Global Impact: collaborate on impactful projects for top global clients and shape the future of industries.
- Welcoming Multicultural Environment: be a part of a dynamic, global team and thrive in an inclusive and supportive work environment with open communication and regular team-building company social events.
- Social Sustainability Values: join our sustainable business practices focused on five pillars, including IT education, community empowerment, fair operating practices, environmental sustainability, and gender equality.

More Infrastructure Engineer Jobs

AI Infrastructure Engineer

Bright Vision Technologies

Infrastructure Engineer | 2 days ago

Full Time | Remote

Role Description

We are seeking an AI Infrastructure Engineer to design, build, and operate the platform layer that powers large-scale AI training and inference workloads. The role focuses on:

- GPU clusters
- Distributed training frameworks
- Scheduling
- Storage performance
- Developer experience for ML engineers and researchers

The ideal candidate has built or operated production AI infrastructure at scale, understands the interaction between hardware, kernel, scheduler, and ML framework, and brings strong software engineering discipline to platform work.

Qualifications

- Bachelor’s or Master’s degree in Computer Science or a related field.
- Six or more years of experience in infrastructure, platform, or HPC engineering.
- Hands-on experience operating GPU clusters or large-scale ML training infrastructure.
- Strong proficiency in Python and at least one systems language such as Go or C++.
- Deep understanding of distributed training, accelerator architectures, and collective communication.
- Experience with Kubernetes, Slurm, Ray, or similar scheduling systems for ML workloads.
- Strong understanding of Linux internals, networking, and high-performance storage.
- Experience with at least one major cloud provider’s ML infrastructure offerings.
- Strong software engineering practices including testing, CI/CD, and code review.
- Excellent communication and cross-functional collaboration skills.

Requirements

- Design and operate GPU and accelerator infrastructure for training and inference, spanning on-prem clusters, cloud-managed services, and hybrid configurations.
- Build scheduling, queueing, and resource-sharing systems that maximize accelerator utilization across many teams.
- Integrate frameworks such as PyTorch, JAX, DeepSpeed, FSDP, Megatron-LM, and Ray Train into a unified platform offering.
- Operate high-performance storage systems and data pipelines that keep accelerators fed with training data at near-line-rate.
- Design networking architectures supporting RDMA, InfiniBand, NCCL, and high-bandwidth collective communication.
- Build observability for AI workloads including utilization, throughput, training stability, and failure-mode analytics.
- Implement checkpointing, restart, and fault-tolerance patterns for long-running training jobs at scale.
- Drive cost optimization across compute, storage, and networking through scheduling, spot capacity, and right-sizing.
- Develop developer tooling and paved-road workflows that let researchers launch experiments safely and efficiently.
- Partner with research and applied ML teams to plan capacity for upcoming training runs.
- Implement security controls, isolation, and access management for multi-tenant AI infrastructure.
- Drive automation across cluster provisioning, lifecycle management, and configuration enforcement.
- Maintain runbooks, capacity dashboards, and operational documentation for the AI platform.
- Stay current with AI infrastructure research, accelerator hardware, and emerging open-source AI tooling.

Benefits

- Competitive base salary commensurate with experience, plus benefits.

AI, AI/ML, Python, C++, Kubernetes, Ray, Linux, CI/CD, PyTorch, JAX, Observability/Monitoring, Mode

United States

$100K - $150K / year

Cloud Platform Infrastructure Architect

Guidehouse

Solving big problems, building trust in society, and empowering our clients to shape the future.

Infrastructure Engineer | 2 days ago

Full Time | Remote | Team 10,001+ | Since 2018 | H1B Sponsor

• Design end-to-end AWS cloud infrastructure solutions to support enterprise applications, data platforms, and business critical services.
• Create high-availability, disaster recovery, and multi-region strategies.
• Develop migration strategies for transitioning on-premises workloads to AWS.
• Contribute to cloud strategy and technology roadmaps.
• Oversee deployment of Infrastructure as code (IaC) using tools like Terraform, AWS CloudFormation or CDK.
• Establish and enhance continuous integration and continuous delivery pipelines to streamline software deployment and infrastructure updates.
• Ensure observability, monitoring and logging using AWS CloudWatch, X-Ray or third-party tools.
• Design architecture that meets security, governance, and regulatory compliance requirements (HIPAA, FedRAMP, SOC 2 etc.) as applicable.
• Implement IAM best practices, encryption strategies, and secure networking.
• Partner with developers, DevOps engineers, and business stakeholders to ensure solutions meet mission-critical needs.
• Diagnose and resolve complex technical issues related to cloud infrastructure, ensuring high performance and reliability.
• Provide technical leadership, mentoring, and guidance to engineering teams.
• Stay current with emerging cloud technologies and trends, evaluating and recommending new solutions to enhance our capabilities.

Ansible, AWS, Cloud, Docker, EC2, Kubernetes, Puppet, Python, Ray, SQL, Terraform

Virginia + 1 more

$102K - $170K / year

Director, Infrastructure Engineering – Program Management

NVIDIA

Infrastructure Engineer | 2 days ago

Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

• Lead how NVIDIA responds to and adapts to growth and rapid change.
• Drive the organization’s program design and execution.
• Collect requirements, define priorities, coordinate scheduling, and address challenges throughout the implementation lifecycle.
• Optimize workflow using objective measures to improve engineering efficiency.
• Establish a Program Management charter based on accountability, outcomes, leadership, and delivery.
• Set and clearly define high standards and help the team achieve them.
• Hire, retain, and grow outstanding people.
• Drive high performance, clarity, positive culture, and collaboration.

Colorado + 3 more

$272K - $425.5K / year

Senior Backend & Infrastructure Engineer

Very Good Ventures

The Flutter Development Experts

Infrastructure Engineer | 2 days ago

Full Time | Remote | Team 51-200 | H1B No Sponsor

Role Description

We are seeking a hands-on Backend & Infrastructure Engineer (Infrastructure & Scalability) to design, build, and maintain scalable cloud infrastructure and backend services. This role requires deep expertise in backend services development, DevOps, programmatic infrastructure / Infrastructure as Code (IaC), and Site Reliability Engineering to ensure the stability and performance of our internal and client product initiatives. You will be instrumental in standing up, scaling, and supporting complete backends as well as individual services, including deployed AI agents and agentic infrastructure.

As a consultancy, the needs and technology stacks of our clients can range widely across projects; as such, we are looking for candidates with breadth of expertise and a flexible mindset. Our product development focus leans heavily towards cloud infrastructure and, more recently, towards deployment of AI agents and agentic infrastructure.

Responsibilities

- Platform & Infrastructure
  - Design, implement, and manage cloud infrastructure using programmatic tools such as Terraform, OpenTofu, or similar.
  - Build and manage staged development environments and corresponding CI/CD pipelines to ensure rapid, reliable, and automated deployments.
  - Oversee observability, monitoring, logging, and alerting systems for performance, usage metrics, and security.
  - Establish best practices for security, compliance, and cost optimization within cloud deployments.
  - Build, deploy, and manage individual services, including autonomous agents, ensuring they are scalable and performant.
- Backend Development & Integration
  - Develop, deploy, and scale robust backend services and microservices, with a focus on high availability and resilience.
  - Develop and maintain robust backend APIs and integration services. Strong working knowledge of API design and expertise with GraphQL deployment.
  - Working knowledge of backend architectural patterns and their application.
  - Collaboration with front-end teams on service integrations and performance.

Qualifications

- 6+ years of experience as a Backend or Full Stack Engineer with a strong emphasis on DevOps, programmatic infrastructure and/or Site Reliability Engineering (SRE).
- Expertise in programmatic infrastructure and cloud resource management using infrastructure-as-code tooling.
- Strong experience with one or more of the major cloud platforms (e.g., GCP, AWS, or Azure).
- Deep proficiency in at least one modern backend language (e.g., Python, TypeScript, Go, Java/Kotlin, C#, or Rust).
- Solid experience designing and scaling production backends, building and deploying custom services, and utilizing cloud managed services.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Familiarity with feature-flagging and staged product rollouts.
- Experience implementing and managing observability, logging, and alerting systems.
- Comfortable working in fast-paced environments with evolving requirements.
- All resumes must be in English.
- Must be fluent in English (level 3).

Nice to haves

- Certification from one or more cloud platforms (GCP, AWS, or Azure).
- Experience building and scaling AI-powered services or agents.
- Anthropic or other AI certification (e.g., Claude Certified Architect Foundations).

Benefits

- Passion and enthusiasm for what we create.
- REMOTE first and global company.
- Subsidized health insurance, dental, and vision coverage.
- Generous parental leave.
- Flexible PTO and company holidays.
- End of year company shut down (December 25 - January 1), in addition to observing company holidays.
- 12-16 weeks universal fully paid family leave.
- Other benefits available based on location.