Job Closed

This listing is no longer active.

DraftKings

DraftKings is a sports-technology and media entertainment platform founded in 2012 to change the way consumers engage with their favorite athletes, teams, and s

Lead Infrastructure Engineer

Location

Bulgaria

Posted

49 days ago

Salary

0

Seniority

Senior

Job Description

At DraftKings, AI is becoming an integral part of both our present and future, powering how work gets done today, guiding smarter decisions, and sparking bold ideas. It's transforming how we enhance customer experiences, streamline operations, and unlock new possibilities. Our teams are energized by innovation and readily embrace emerging technology. We're not waiting for the future to arrive. We're shaping it, one bold step at a time. To those who see AI as a driver of progress, come build the future together.

The Crown Is Yours

As a Lead Infrastructure Engineer, you'll design, build, and operate the machine learning platform and Databricks infrastructure that powers scalable, reliable data science at DraftKings. You'll own the backbone that makes model development, training, and deployment fast, repeatable, and cost-aware, so teams can move from ideas to impact without friction. Working alongside Data Science, Machine Learning, Data Engineering, and Infrastructure partners, you'll turn evolving use cases into durable platform capabilities. You'll lead projects end to end, strengthening reliability, automation, and developer experience across the stack.

What you'll do as a Lead Infrastructure Engineer

- Own and operate Databricks infrastructure with a focus on reliability, scalability, performance, and cost optimization.
- Build and manage cloud infrastructure on Amazon Web Services using infrastructure-as-code tools like Terraform.
- Author and review technical designs that enable scalable, automated, and reproducible machine learning workflows.
- Lead and mentor engineers, helping grow team capability and strengthen day-to-day execution.
- Partner closely with data scientists, data engineers, machine learning engineers, and Infrastructure teams to align platform capabilities to real-world needs.
- Drive engineering initiatives from technical planning through delivery, production rollout, and long-term maintenance.
- Stay current on data platform and machine learning platform trends, applying best practices that improve platform efficiency, usability, and governance.
- Coach and mentor teammates, raising the bar through strong technical feedback, thoughtful enablement, and shared ownership.

What you'll bring

- At least 5 years of experience in machine learning platform, data platform, data engineering, or infrastructure engineering roles.
- Hands-on experience administering and operating Databricks in production environments.
- Deep familiarity with infrastructure as code, including Terraform or Pulumi, and proven ability to manage change safely at scale.
- Experience with AWS, Docker, Kubernetes, and continuous integration and continuous delivery pipelines.
- People management experience is a plus.
- Strong Python skills and familiarity with machine learning tooling such as MLflow, pandas, and scikit-learn.
- A track record of owning complex systems end to end, including reliability improvements, incident follow-up, and performance tuning.
- Clear, confident communication skills, including strong technical documentation and the ability to align cross-functional partners.

Join Our Team

We're a publicly traded (NASDAQ: DKNG) technology company headquartered in Boston. As a regulated gaming company, you may be required to obtain a gaming license issued by the appropriate state agency as a condition of employment. Don't worry, we'll guide you through the process if this is relevant to your role.

Benefits

  • 401(K), 401(K) matching, Adoption Assistance, Childcare benefits, Commuter benefits, Company equity, Company-sponsored outings, Continuing education stipend, Customized development tracks, Dedicated diversity and inclusion staff, Dental insurance, Disability insurance, Volunteer in local community, Employee stock purchase plan, Family medical leave, Fitness stipend, Flexible Spending Account (FSA), Flexible work schedule, Generous parental leave, Company-sponsored happy hours, Health insurance, Job training & conferences, Open door policy, Life insurance, Charitable contribution matching, Mentorship program, Online course subscriptions available, Open office floor plan, Paid holidays, Onsite office parking, Partners with nonprofits, Performance bonus, Pet insurance, Promote from within, Recreational clubs, Lunch and learns, Relocation assistance, Remote work program, Free snacks and drinks, Team based strategic planning, OKR operational model, Tuition reimbursement, Unlimited vacation policy, Vision insurance, Wellness programs, Some meals provided, Mental health benefits, Home-office stipend for remote employees, Diversity employee resource groups, Fertility benefits, Employee resource groups, Employee-led culture committees, Quarterly engagement surveys, Hybrid work model, In-person all-hands meetings, Employee awards, Pay transparency, Transgender health care benefits, Abortion travel benefits, Meditation space, Mother's room, Personal development training, Virtual coaching services, Flexible time off, Bereavement leave benefits

More Infrastructure Engineer Jobs

Full Time | Remote | Team 11-50 | Since 2025

Deeter Analytics

At Deeter Analytics, we’re building something that doesn’t get built twice in a generation. Our goal is to create a fundamental trading model as capable as today’s most advanced AI systems, but applied to global markets. Not incremental signals or isolated strategies, but a system that can continuously interpret, learn from, and act on the evolving state of the world.

We train on large-scale, real-time social data, capturing how narratives form, how sentiment propagates, and how collective behavior drives markets. This requires operating at the frontier of data infrastructure, model design, and compute, all tightly integrated into a single system.

You’ll work alongside a small group of elite engineers, AI researchers, and traders, in an environment defined by speed and ownership. We run experiments continuously. Ideas move from concept to production in hours. And the feedback loop is immediate, measured directly in live performance.

About the role

You will build and optimize the systems that turn data and compute into model capability. This role sits at the intersection of distributed systems, GPU infrastructure, and model training, ensuring that both our in-house models and state-of-the-art external models can be trained efficiently at scale. We prefer systems that maximize learning per unit of compute, not just systems that run.

What you’ll work on

● Designing and operating distributed training systems on GPU infrastructure
● Optimizing GPU utilization, throughput, and training efficiency
● Translating model requirements into efficient system configurations
● Improving training speed, cost efficiency, and reliability
● Debugging failures in high-cost, high-pressure training environments

What we’re looking for

We’re looking for people who understand how systems behave under real constraints and know how to push them to perform.

Strong signals:

● You have run or significantly contributed to large-scale training workloads or compute-intensive systems
● You have a strong understanding of distributed systems in practice, including public cloud environments like AWS
● You understand how infrastructure behaves beneath the abstraction:
  ○ networking constraints
  ○ GPU/CPU utilization
  ○ memory and I/O bottlenecks
  ○ hardware limits at scale
● You can reason about how systems can be tuned for more efficient training and resource usage
● You have debugged systems where failures were non-trivial and costly
● You move quickly, identify bottlenecks, and eliminate them without being asked

Bonus signals:

● Experience optimizing systems where small efficiency gains had large downstream impact
● Experience working under strict compute or cost constraints
● Experience debugging distributed or asynchronous systems with non-obvious failure modes
● You use AI tools to accelerate debugging, development, and iteration
● You care about building systems that are measurably efficient, not just functional

Philippines

Senior Infrastructure Engineer

NexGen Cloud

The AI Factory. Accelerating the Future.

Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

• Own the design, deployment, and operation of OpenStack and Kubernetes environments, ensuring platform performance, scalability, and resilience for GPU workloads
• Build and improve infrastructure using infrastructure-as-code and GitOps practices, driving automation across provisioning, deployment, and operational workflows
• Optimise GPU workload scheduling using Kubernetes and NVIDIA tooling, and implement monitoring, logging, and alerting to ensure platform stability
• Lead incident response and drive continuous improvement of reliability across the platform
• Maintain strong security controls across infrastructure and container layers: RBAC, network policies, and tenant isolation
• Work closely with Platform, DevOps, AI, Product, and Support teams to align infrastructure capabilities with customer and platform requirements

Australia

Infrastructure Systems Engineer

Peraton Corporation

Peraton Corporation, a national security company headquartered in Herndon, Virginia, supplies solutions for mission-critical programs and systems. Founded in 2017, Peraton's missio

Responsibilities

The Office of Space Weather Observations (SWO) under NESDIS is responsible for advancing space weather observational capabilities to meet NOAA programmatic needs. NOAA’s Space Weather Next (SWX) program maintains and extends space weather observations from various vantage points, selected to most efficiently provide comprehensive knowledge of the Sun and the near-Earth space environment needed to protect our technological infrastructure.

The Space Weather Ground Services (SWGS) is responsible for comprehensive ground services for all SWX projects, ensuring successful implementation and operation of observing assets and ensuring the continuity of space weather measurements made by SWFO-L1 and the GOES-R series satellites.

The SWGS Mission Operations Services (MOS) program must provide a full mission satellite command and control solution to support the L1 Series with two new independently launched observatories.

Overview:

Peraton is seeking a Systems Engineer to support the infrastructure design, configuration and deployment for a new satellite ground system development program supporting the National Oceanic and Atmospheric Administration (NOAA). This position will support all Infrastructure activities throughout the full system lifecycle, from architecture and design through integration, assessment, authorization, and operational deployment. The selected candidate will be responsible for supporting the program’s Infrastructure functions. This role requires close collaboration with other program functional elements including Cyber, Software Engineering, Networks, Systems Engineering, Architecture, Operations, Quality and program leadership.

- Provide support for system design, deployment, and implementation of Information Technology (IT) systems within Linux, Windows and cloud infrastructures.
- Develop and design hardware systems architecture based on project requirements.
- Collaborate with cross-functional teams to define system specifications and requirements.
- Research, evaluate, and recommend new hardware and software solutions.
- Conduct comparative analyses of hardware and software options for specific projects.
- Collaborate with stakeholders to understand business drivers and requirements for cloud migration.
- Identify systems and applications suitable for migration to the cloud.
- Create and maintain detailed technical drawings, diagrams, and schematics.
- Create and maintain detailed design documentation.
- Ensure accuracy and completeness of program technical documentation.
- Monitor and analyze physical and virtualized environments to track resource allocation.
- Support integration efforts for hardware and software components into larger systems.
- Support procurement efforts for new material and maintenance renewals.
- Support ongoing obsolescence analysis of existing hardware and software.

**This position is contingent on contract award.**

Qualifications

- 5 years with BS/BA, 3 years with MS/MA or 0 years with PhD. 9 years of experience with no degree.
- Demonstrated proficiency in system design, design principles, implementation, and troubleshooting across various platforms.
- Ability to obtain and hold a Public Trust clearance; US Citizenship is required.
- Hands-on experience with cloud computing platforms (e.g., AWS, Azure, Google Cloud), including designing, deploying, and managing cloud-based infrastructures.
- Proficiency in designing, deploying, and managing virtualized environments to optimize resource utilization and performance.
- Expertise in VMware virtualization technologies, including vSphere, vCenter Server, and Horizon View.
- Strong understanding of Windows and Linux operating systems, including installation, configuration, and administration.
- Hands-on experience with hardware including servers, networking equipment, and storage solutions. Knowledge of product specifications, configurations, and best practices for deployment and maintenance.
- Experience in managing changes to hardware designs, requirements, and project plans. Ability to assess the impact of changes, obtain approval, and implement changes effectively while minimizing disruption.
- Experience in troubleshooting system issues and optimizing performance for mission-critical applications.
- Proficient in creating formal drawings and schematics for system architecture, system diagrams, network topologies, and other visual representations of hardware and infrastructure designs. (Visio experience required, AutoCAD recommended).
- Knowledge of electronic components and their specifications. Ability to select appropriate hardware components based on performance requirements, cost considerations, and availability.
- Experience in collaborating with hardware vendors, software providers, and service providers to evaluate products and services.

Desired Qualifications:

- Experience working within a distributed virtual team environment, with proficiency in remote collaboration tools and practices.
- Experience with Infrastructure as Code (IaC) to include tools like Terraform, CloudFormation and Ansible.
- AWS Certified Cloud Practitioner or similar cloud certifications.
- Familiar with the Atlassian tool suite, including Jira, Asset Manager, Confluence, Risk Register, Crucible, Bitbucket, Git, etc.
- Strong interpersonal skills with a willingness to foster strong relationships with coworkers and vendors.
- Highly organized with strong attention to detail.
- Outstanding verbal and written communication skills.
- Experience leading projects and process improvement activities to completion with successful outcomes and delivery of desired results.
- Active Public Trust clearance.

Peraton Overview

Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.

Target Salary Range

$104,000 - $166,000. This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to, the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

EEO

EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.

United States
$104K - $166K / year

Senior Cloud Infrastructure Architect

Axway

Axway helps companies move forward faster using our Amplify API Management Platform and proven MFT and B2B/EDI solutions

Full Time | Remote | Team 1,001-5,000 | Since 2001 | H1B Sponsor

• Design and implement scalable cloud and hybrid infrastructure solutions leveraging Microsoft Azure and on-premises technologies.
• Architect and support highly available, resilient, and disaster recovery-enabled infrastructure environments.
• Design and implement secure Azure networking solutions including: VNets, ExpressRoute, VPN Gateway, Azure Firewall, NSGs, Load Balancers, Application Gateway / WAF.
• Deploy and support mission-critical applications using Azure infrastructure services.
• Act as technical lead for customer onboarding, migration, and cloud infrastructure projects.
• Design, deploy, administer, and troubleshoot enterprise Linux and Windows server environments.
• Manage Active Directory, Entra ID (Azure AD), and hybrid identity integrations.
• Deploy, administer, and support Kubernetes environments, including Azure Kubernetes Service (AKS).
• Support containerized environments using Docker and Kubernetes orchestration technologies.
• Manage Kubernetes cluster operations including scaling, upgrades, patching, monitoring, ingress, storage, and security.
• Partner with DevOps and engineering teams to support CI/CD pipelines and cloud-native application deployments.
• Automate infrastructure provisioning and operational activities using Terraform, scripting, and Infrastructure-as-Code methodologies.
• Administer infrastructure monitoring and observability platforms such as Azure Monitor, Prometheus, Grafana, and Nagios.
• Support enterprise hosting technologies including backup, replication, storage, proxy, and load balancing solutions.
• Provide advanced L3 support across cloud infrastructure, Kubernetes, Linux, networking, and storage platforms.
• Perform infrastructure lifecycle management, patching, capacity planning, and operational optimization.
• Participate in audits, disaster recovery testing, operational governance, and security compliance initiatives.
• Conduct root cause analysis and drive long-term operational improvements.
• Participate in after-hours support rotations and scheduled maintenance activities as required.

United States