Job Closed

This listing is no longer active.

Truelogic Software

Premium boutique software development company that helps brands with big ideas to make a difference in people’s lives.

Senior Infrastructure Engineer – AWS

Infrastructure Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 2004 | H1B No Sponsor

Location

Mexico

Posted

98 days ago

Salary

0

Seniority

Senior

Job Description

  • Own and scale AWS and Kubernetes infrastructure, ensuring reliability, security, performance, and cost optimization.
  • Build and maintain CI/CD pipelines and infrastructure-as-code (Terraform/CDK) to enable safe, automated deployments.
  • Lead observability and monitoring initiatives (Datadog), driving performance visibility and operational excellence.
  • Manage data infrastructure (Redshift, Airflow, DBT) supporting real-time and analytics workloads.
  • Implement security best practices and collaborate with engineering teams on scalability, incident response, and developer self-service tooling.

Job Requirements

  • 8+ years of experience in Infrastructure, DevOps, or SRE roles within production environments.
  • Strong hands-on experience with AWS and deep expertise managing Kubernetes clusters at scale.
  • Proven experience building and maintaining CI/CD pipelines and implementing infrastructure-as-code (Terraform, CloudFormation, or CDK).
  • Solid scripting skills (Python, Bash, or similar) and experience working with distributed systems in production.
  • Demonstrated ability to improve system reliability, observability, security, and overall developer productivity.

Benefits

  • 100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
  • Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
  • Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
  • Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
  • Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.

More Infrastructure Engineer Jobs

Engineering Lead, Core Experiences & Server Infrastructure

Cash App

Initially built to take the pain out of peer-to-peer payments, Cash App has gone from a simple product with a single purpose to a dynamic app, bringing a better way to send, spend, invest, borrow and save to our millions of monthly active users. Our mission is to redefine the world's relationship with money by making it more relatable, instantly available and universally accessible.

Other | Remote | Team 3,500 | Since 2013

It all started with an idea at Block in 2013. Initially built to take the pain out of peer-to-peer payments, Cash App has gone from a simple product with a single purpose to a dynamic ecosystem, developing unique financial products, including Afterpay/Clearpay, to provide a better way to send, spend, invest, borrow and save to our 50+ million monthly active customers. We want to redefine the world's relationship with money to make it more relatable, instantly available, and universally accessible.

Today, Cash App has thousands of employees working globally across office and remote locations, with a culture geared toward innovation, collaboration and impact. We've been a distributed team since day one, and many of our roles can be done remotely from the countries where Cash App operates. No matter the location, we tailor our experience to ensure our employees are creative, productive, and happy.

The Role

As the Engineering Lead for Core Experiences and Server Infrastructure, you will lead the central pillars of the Cash App ecosystem. You are responsible for the foundational products (including P2P, activity feed, and social connectivity) as well as the critical product-platform systems like data services, experimentation, and messaging that power every customer experience. Your goal is to drive strategy and execution across all primary user journeys while building the underlying systems that enable product velocity, quality, and scale for thousands of engineers.

You Will

  • Oversee the evolution of high-concurrency server infrastructure supporting 60M monthly active users, ensuring 99.99% uptime for mission-critical financial services during periods of hyper-growth.
  • Bridge the gap between core infrastructure and product engineering, leading cross-functional teams to launch flagship features (e.g., Peer-to-Peer payments) from ideation through global rollout.
  • Champion a "Product-First" engineering culture, aligning backend roadmap priorities with user-facing KPIs to reduce friction in the money-movement lifecycle.
  • Manage a multi-million dollar annual cloud budget, optimizing server-side resource allocation and infrastructure efficiency without compromising system performance or developer velocity.
  • Build and mentor a high-performing team of 50+ engineers, fostering a culture of engineering excellence for one of the world's most visible fintech platforms.

You Have

  • 10+ years of experience leading and scaling engineering teams, including managing engineering managers.
  • A strong track record of architecting, building and scaling distributed systems capable of processing billions of dollars in transaction volume, prioritizing low-latency execution and transactional integrity across microservices.
  • Demonstrated ability to drive strategy, execution, and operational excellence in a complex and regulated environment.

Block is committed to building an inclusive and diverse workplace. We encourage applications from candidates of all backgrounds. We're working to build a more inclusive economy where our customers have equal access to opportunity, and we strive to live by these same values in building our workplace. Block is an equal opportunity employer evaluating all employees and job applicants without regard to identity or any legally protected class. We will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and "fair chance" ordinances.

We believe in being fair, and are committed to an inclusive interview experience, including providing reasonable accommodations to disabled applicants throughout the recruitment process. We encourage applicants to share any needed accommodations with their recruiter, who will treat these requests as confidentially as possible. Want to learn more about what we're doing to build a workplace that is fair and square? Check out our I+D page.

Block takes a market-based approach to pay, and pay may vary depending on your location. U.S. locations are categorized into one of four zones based on a cost of labor index for that geographic area. The successful candidate's starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future.

California
Job Closed

Performance Engineer - AI Infrastructure

Andromeda Cluster

Andromeda Cluster was founded by Nat Friedman and Daniel Gross to give early-stage startups access to the kind of scaled AI infrastructure once reserved only for hyperscalers. We began with a single managed cluster — but it filled almost instantly. Today, Andromeda works with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most. Our long-term vision is to build the liquidity layer for global AI compute. We are expanding to new frontiers to find the brightest that work in AI infrastructure, research and engineering.

Other | Remote | Team 11-50

This description is a summary of our understanding of the job description.

Role Description

We are hiring a Performance Engineer to join our Growth team. In this role, your "product" is the efficiency and throughput of our massive-scale AI clusters. As we scale our network, the difference between a "working" cluster and an "optimized" one represents millions of dollars in value and weeks of saved research time for our customers.

  • Conduct end-to-end profiling of training workloads to identify bottlenecks across GPU kernels, NCCL communication, and storage I/O.
  • Collaborate with systems engineers to improve scheduling efficiency, collective communication performance, and kernel execution.
  • Build and maintain high-fidelity tooling to monitor and visualize MFU, throughput, and cluster uptime.
  • Design technical processes (e.g., postmortem reviews, incident response) that help the team operate effectively and avoid repeating performance regressions.

Qualifications

  • You love optimizing performance and digging into systems to understand how every layer interacts, from the training loop to the hardware.
  • Proven experience running distributed training jobs on multi-GPU systems or HPC clusters.
  • Strong programming skills in Python and C++ (Rust or CUDA experience is a major plus).
  • Solid understanding of PyTorch, JAX, or TensorFlow, and how large-scale training loops are built.
  • Familiarity with modern cloud infrastructure, including Kubernetes and Infrastructure as Code.
  • A passion for measuring efficiency rigorously and translating raw profiling data into practical engineering improvements.

Requirements

  • Experience with Linux kernel tuning, eBPF, and understanding systems design tradeoffs at the hardware level.
  • Hands-on experience with GPUs, TPUs, or Trainium, and the networking libraries that power them (NCCL, MPI, UCX).
  • Expertise in security best practices for high-scale infrastructure.
  • Familiarity with monitoring tools like Prometheus and Grafana.

Benefits

This is a builder's role. You'll have ownership and autonomy to shape how our systems run, working directly with customers and providers while building the foundation for reliable, scalable AI infrastructure.

United States
Job Closed

Open-Source Infrastructure Specialist

becon GmbH

Full-service provider of information and telecommunications technology solutions and services. #DataCenterLove

Full Time | Remote | Team 51-200 | Since 1993 | H1B No Sponsor

  • Build and configure Linux-based infrastructures.
  • Integrate and adapt open-source components.
  • Perform system updates, hardening, and performance optimizations.
  • Monitor systems and ensure their availability.
  • Support technical concepts and architecture decisions.
  • Automate recurring administrative processes.

Germany
Job Closed
Other | Remote | Team 51-200 | Since 2017 | H1B No Sponsor

  • Build, maintain, and optimize the physical network and compute layer of on‑premises environments.
  • Ensure reliability, performance, and scalability of networks, firewalls, server hardware, racks, power, cabling, and related physical systems.
  • Deploy, install, rack, cable, and power physical servers, storage systems, and supporting hardware.
  • Perform hardware diagnostics, break/fix tasks, component replacement, and lifecycle upgrades.
  • Manage firmware, BIOS, and device‑level updates.
  • Maintain accurate inventory of physical compute assets, spares, and components.
  • Design, deploy, and manage enterprise network infrastructure using Cisco technologies.
  • Architect and implement AWS networking solutions.

United States
Job Closed