Job Closed

This listing is no longer active.

Thermo Fisher Scientific

The World Leader In Serving Science

ML Ops Infrastructure Engineer

Infrastructure Engineer | Full Time | Remote | Senior | Team 10,001+ | H1B Sponsor

Location

United States

Posted

55 days ago

Salary

$150K / year

Seniority

Senior

Bachelor Degree | 6 yrs exp | English | Ansible | Kubernetes | Linux | Node.js | Terraform

Job Description

  • Deploy, configure, and maintain on-premises GPU servers, primarily NVIDIA H200 and A100 nodes.
  • Implement and tune NVIDIA-specific tooling: DCGM (Data Center GPU Manager), MIG (Multi-Instance GPU) partitioning, and NVIDIA Container Toolkit.
  • Manage bare-metal provisioning workflows (iPXE, PXE, MAAS/Foreman) for repeatable server builds.
  • Monitor hardware health, capacity utilization, and thermal/power envelopes.
  • Build, upgrade, and maintain production-grade Kubernetes clusters on bare-metal infrastructure.
  • Design and operate cluster networking using CNI plugins for high-throughput AI workloads.
  • Implement resource quotas, LimitRanges, PriorityClasses, and node affinity/taints.
  • Deploy and operate MLOps platforms (MLflow, Kubeflow) for experiment tracking, model versioning, and pipeline orchestration.
  • Design the high-bandwidth network fabric required for GPU cluster interconnects.
  • Maintain hardened OS baselines across all infrastructure nodes; automate compliance scanning.

Job Requirements

  • 6+ years of infrastructure engineering experience, with at least 3 years managing GPU compute clusters or HPC environments in production.
  • Deep hands-on expertise with NVIDIA GPU infrastructure: driver lifecycle management, CUDA, DCGM, MIG, NVLink topologies, and the NVIDIA GPU Operator for Kubernetes.
  • Production-level Kubernetes administration experience on bare-metal: cluster provisioning, upgrades, CNI/CSI configuration, RBAC, and day-2 operations.
  • Strong networking fundamentals: BGP, VLAN segmentation, RDMA/RoCE or InfiniBand configuration, load balancing, and firewall policy management.
  • Hands-on experience with software-defined storage (Ceph, Rook-Ceph, or MinIO) in AI/HPC workload contexts — performance tuning, capacity planning, and failure recovery.
  • Practical MLOps experience: model serving infrastructure (Triton or equivalent), experiment tracking (MLflow or Kubeflow), and GitOps-based model deployment pipelines.
  • Working knowledge of NIST SP 800-171 controls and the ability to translate them into concrete infrastructure configurations and audit evidence.
  • Proficiency with infrastructure-as-code tooling: Terraform or Ansible for reproducible, auditable infrastructure builds.
  • Strong Linux systems administration skills (RHEL/Rocky Linux or Ubuntu) including kernel tuning, storage I/O optimization, and systemd service management.
  • Excellent written communication for producing infrastructure runbooks, network diagrams, and compliance documentation in a remote-first environment.

Benefits

  • Competitive, globally benchmarked compensation including base salary, equity, and performance bonus.
  • Fully remote with async-first culture; periodic travel to client facilities and team on-sites for cluster deployments and planning.
  • Access to cutting-edge NVIDIA hardware, early access to new GPU generations, and budget for relevant certifications (NVIDIA, CKA/CKS, RHCSA, etc.).
  • Collaboration with a Lead Architect and engineering team who understand infrastructure as a product — not just a cost center.

More Infrastructure Engineer Jobs

Full Time | Remote | Team 10,001

Responsibilities

Artificial Intelligence; Advanced Technology; The very best in patient care. With decades of expertise, RadNet is Leading Radiology Forward. With dynamic cross-training and advancement opportunities in a team-focused environment, the core of RadNet’s success is its people with the commitment to a better healthcare experience. When you join RadNet as an Infrastructure Engineer, you will be joining a dedicated team of professionals who deliver quality, value, and access in the 21st century and align all stakeholders - patients, providers, payors, and regulators to achieve the best clinical outcomes.

You will:

  • Design, implement, support, and improve enterprise infrastructure platforms across on-premises and hybrid cloud environments.
  • Engineer and maintain virtualization, private cloud, enterprise storage, SAN/NAS, backup and recovery, and related infrastructure platforms.
  • Contribute to infrastructure design, reference architectures, and technical standards for virtualization, storage, hybrid cloud, and platform services.
  • Support hardware and infrastructure lifecycle management, including refresh planning, upgrades, decommissioning, firmware coordination, and capacity planning.
  • Support virtual desktop, secure workspace, remote application delivery, and containerized platform services where applicable.
  • Partner with cloud, networking, security, and application teams to support hybrid connectivity, cloud operations, and shared infrastructure services.
  • Partner with application and platform teams to understand application workflows, dependencies, and operational requirements in support of infrastructure design, troubleshooting, and service delivery.
  • Partner with security teams to support infrastructure hardening, patching, remediation activities, vulnerability response, and security bulletin review across supported platforms.
  • Develop and maintain infrastructure automation using scripting, orchestration, and infrastructure-as-code tools.
  • Maintain infrastructure documentation, standards, diagrams, operational procedures, and source-of-truth records in configuration and asset management platforms.
  • Support monitoring, alerting, performance analysis, service improvement, and advanced troubleshooting across infrastructure platforms.
  • Participate in scheduled maintenance, incident response, and critical troubleshooting as needed.
  • Support project-based infrastructure delivery for strategic initiatives, including new site deployments, acquisitions, platform rollouts, and production go-lives with defined timelines and operational readiness requirements.
  • Coordinate with vendors and support partners for advanced troubleshooting, remediation planning, lifecycle guidance, and issue resolution across supported infrastructure platforms.

You Are:

  • A hands-on infrastructure engineer who is comfortable working across traditional datacenter systems and modern hybrid cloud environments.
  • A strong problem-solver who can troubleshoot complex issues across virtualization, storage, backup, automation, and core infrastructure services.
  • An effective communicator who works well across technical teams and contributes to secure, scalable, and well-supported platforms.
  • Comfortable making informed decisions, identifying risks early, and driving issues toward resolution.
  • Calm and effective in fast-paced environments, with the ability to manage competing priorities and respond to critical issues with urgency and professionalism.

To Ensure Success in This Role, You Must Have:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent combination of education, certifications, and relevant experience.
  • 5+ years of experience in infrastructure engineering, systems engineering, virtualization, enterprise storage, datacenter, cloud engineering or related roles.
  • Strong hands-on experience with enterprise virtualization, storage, SAN/NAS, backup and recovery, and core datacenter infrastructure.
  • Strong understanding of core networking concepts and infrastructure dependencies, with the ability to work effectively with network teams on connectivity, routing, firewall, load balancing, and related service needs.
  • Experience designing and implementing scalable, resilient infrastructure solutions in enterprise or hybrid cloud environments.
  • Experience supporting platform lifecycle management, including upgrades, standards, refresh planning, and operational support.
  • Experience with infrastructure automation, scripting, and technical documentation, including runbooks, diagrams, and asset or configuration records.
  • Strong troubleshooting, communication, collaboration, and organizational skills.
  • Experience with VMware technologies such as vSphere, VMware Cloud Foundation, Aria Operations, Aria Automation, Horizon, or similar platforms is preferred.
  • Experience with enterprise storage, Fibre Channel switching, and platforms such as HPE storage, Brocade, Qumulo, or similar technologies is preferred.
  • Experience with cloud platforms such as Google Cloud Platform, AWS, or Microsoft Azure is preferred.
  • Experience with Kubernetes, Docker, and other containerized infrastructure or application services is preferred.
  • Experience with secure workspace, browser isolation, VDI, or remote application delivery technologies such as Citrix, Island Enterprise Browser, or similar platforms is preferred.
  • Familiarity with Active Directory, Azure AD, or Microsoft Entra ID is a plus.
  • Experience with automation and infrastructure-as-code tools such as Terraform, Ansible, PowerShell, Python, or similar technologies is preferred.
  • Experience with CMDB, DCIM, IPAM, or source-of-truth platforms such as NetBox is preferred.
  • Relevant certifications in VMware, cloud, Kubernetes, storage, backup, or infrastructure technologies are a plus.
  • Experience in healthcare or other regulated enterprise environments is a plus.

We Offer:

  • Comprehensive Medical, Dental and Vision coverages.
  • Health Savings Accounts with employer funding.
  • Wellness dollars
  • 401(k) Employer Match
  • Free services at any of our imaging centers for you and your immediate family.

Pay Range: $100,000.00 - $130,000.00 per year

United States
$100K - $130K / year

Lead Infrastructure Engineer

Wells Fargo

Wells Fargo & Company (NYSE: WFC) is a leading financial services company that has approximately $2.1 trillion in assets. We provide a diversified set of banking, investment and mortgage products and services, as well as consumer and commercial finance, through our four reportable operating segments: Consumer Banking and Lending, Commercial Banking, Corporate and Investment Banking, and Wealth & Investment Management. Wells Fargo ranked No. 33 on Fortune’s 2025 rankings of America’s largest corporations. Our technology professionals drive innovation, information security, and big data analytics while maintaining a network that handles more than 12 billion customer interactions a year. Join us! Are you looking for more? Find it here. At Wells Fargo, we're more than a financial services leader – we’re a global trailblazer committed to driving innovation, empowering communities, and helping our customers succeed. We believe that a meaningful career is much more than just a job – it’s about finding all of the elements to help you thrive, in one place. Living the Well Life means you’re supported in life, not just work. It means having robust benefits, competitive compensation, and programs designed to help you find work-life balance and well-being. You’ll be rewarded for investing in your community, celebrated for being your authentic self, and empowered to grow. And we’re recognized for it – Wells Fargo once again ranked in the top three – making us the #1 financial services employer – on the 2025 LinkedIn Top Companies list of best workplaces “to grow your career” in the U.S. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic. © 2026 Wells Fargo Bank, N.A. All rights reserved. Member FDIC.

Full Time | Remote | Team 10,001+ | Since 1852 | H1B Sponsor

About this role:

Wells Fargo is seeking a Lead Infrastructure Engineer.

In this role, you will:

  • Lead complex initiatives to develop infrastructure to provide solutions for business applications
  • Participate in various projects intended to continually improve or upgrade the infrastructure
  • Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
  • Review and analyze high impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
  • Design, build, deploy and maintain infrastructure solutions through collaborative efforts with the team and third party vendors
  • Design, code, test, debug and document programs using Agile development practices
  • Make decisions in technical designs, implementation plans and identify project risks and resource requirements
  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
  • Recommend courses of action to maintain cost effectiveness and achieve results
  • Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals
  • Interact with customer and vendor.

Required Qualifications:

  • 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of Broadcom CA7, CA11, CA1 installation and administration experience
  • 5+ years of z/OS experience
  • 5+ years of JCL (Job Control Language) experience

Desired Qualifications:

  • Broadcom Autosys experience
  • 5+ years RACF experience
  • 5+ years SMP/E experience
  • 5+ years of TSO (Time Sharing Options) experience
  • 5+ years of experience installing third party tools for IBM, Broadcom, or BMC
  • zOS Unix System Services
  • REXX programming language

Job Expectations:

  • Support all aspects of Broadcom's CA7 workload automation tool suite
  • This position is not eligible for visa sponsorship
  • This position can be 100% remote

This position will focus on the Broadcom suite of products regarding workload automation. Candidate will install the products, consult with the scheduling team, research problems and provide performance recommendations.

Pay Range

Reflected is the base pay range offered for this position. Pay may vary depending on factors including but not limited to demonstrated examples of prior performance, skills, experience, or work location. Employees may also be eligible for incentive opportunities.

$119,000.00 - $206,000.00

Benefits

Wells Fargo provides eligible employees with a comprehensive set of benefits, many of which are listed below. Visit Benefits - Wells Fargo Jobs for an overview of the following benefit plans and programs offered to employees.

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Posting End Date: 7 Apr 2026
*Job posting may come down early due to volume of applicants.

We Value Equal Opportunity

Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.

Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.

Applicants with Disabilities

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

Drug and Alcohol Policy

Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy to learn more.

Wells Fargo Recruitment and Hiring Requirements:

  a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
  b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.

United States
$119K - $206K / year
Job Closed
Full Time | Remote | Team 10,001+ | Since 1966 | H1B Sponsor

At Jabil (NYSE: JBL), we are proud to be a trusted partner for the world's top brands, offering comprehensive engineering, supply chain, and manufacturing solutions. With 60 years of experience across industries and a vast network of over 100 sites worldwide, Jabil combines global reach with local expertise to deliver both scalable and customized solutions. Our commitment extends beyond business success as we strive to build sustainable processes that minimize environmental impact and foster vibrant and diverse communities around the globe.

JOB SUMMARY

Provides technical leadership and is responsible for determining the direction for the IT architecture, standards, design and implementation approaches for the company’s application systems, infrastructure and/or network-based cloud product systems. Creates, evaluates and implements plans and design proposals for high impact IT solutions and their use involving leading edge technologies and methods considering key factors such as their long-term effectiveness (service delivery and cost), practicality, technical limitations and criticality. This is an expert-level role requiring independent action to establish methods and procedures on new and/or special assignments.

ESSENTIAL DUTIES AND RESPONSIBILITIES

IT Architect

  • Know and understand Jabil business strategy
  • Know and understand Jabil IT strategy & objectives
  • Define the overall solution architecture consistent with Jabil’s methodology
  • Be responsible for the technical solution by providing leadership for the customer, project manager, domain architects, domain specialists and application engineers to advance and deliver solutions
  • Consult and Inform Enterprise Architects and Senior IT Architects to design and deliver solutions
  • Earn trust of clients and management
  • Assess merits of alternative technical approaches and gain consensus for best approach
  • Learn, follow, promote, and improve recognized methodologies to design and deliver solutions
  • Ensure that the non-functional requirements are satisfied including, but not limited to, security, disaster recovery, availability, and performance
  • Researches technology and industry trends to hone both personal and Jabil’s competitive edge
  • Through modeling or prototyping, validate solution prior to full implementation
  • Develop expertise in one of the following disciplines: Enterprise Architecture, Business Architecture, Information Architecture, Application Architecture, Technology Infrastructure Architecture
  • Mentor IT professionals

Management Practices

  • Develop project plans and influence project organization
  • Apply recognized system sizing methodology
  • Vet change(s) with respect to scope, schedule, cost, risk, etc.
  • Cross train staff to reduce delivery risk
  • Define processes & methods necessary to support delivery/deployment
  • Define management tools to support production environment

Continuous Improvement

  • Utilize Lean Six Sigma or other methods to identify & provide guidance on organizational improvement opportunities
  • Perform root cause analysis and remediation actions
  • Contribute to Jabil IP through development and submission of patents

Policy & Procedures

  • Comply with IT policy, procedure, and process
  • Adhere to all safety and health rules and regulations associated with this position and as directed by supervisor
  • Comply and follow all procedures within the company security policy

Training & Development

  • Define technical job content & qualifications of key roles required to support technical infrastructure
  • Work closely with management to assess and aid the development of staff skill sets
  • Assist management to assess and help resolve staffing knowledge gaps

Communication

  • Publish and present to customers, IT leaders and business executives
  • Engage with vendors and third parties as needed
  • Organize verbal and written ideas clearly and use an appropriate business style
  • Ask questions; encourage input from staff
  • Develop peer relationships with Senior IT Architects

MANAGEMENT & SUPERVISORY RESPONSIBILITIES

  • Typically reports to management.
  • The purpose of this role is not primarily managerial, and the job is typically NOT directly responsible for managing employees (e.g., hiring/termination and/or pay decisions, performance management).

JOB QUALIFICATIONS

KNOWLEDGE REQUIREMENTS

  • Understanding of all architectural components and their interrelationships
  • Knowledge of Software Engineering and Architectural Principles and methods
  • Solid presentation and written communication skills
  • Good judgment and the ability to handle stressful situations
  • Team lead experience in application development
  • Knowledge and experience of one or more languages e.g., Java, C#, etc.
  • Knowledge and experience with server-side technologies
  • Knowledge and experience with client-side technologies e.g., Node, Angular
  • Knowledge and experience working in an Agile methodology
  • Knowledge of SOA, including REST, SOAP, API Management, and other integration patterns e.g. ESB, EIP, etc.
  • Knowledge of relational databases and SQL
  • Knowledge of UML or ArchiMate
  • Knowledge of cloud technologies
  • Ability to define problems, collect data, establish facts, and draw valid conclusions

EDUCATION & EXPERIENCE REQUIREMENTS

  • Bachelor’s degree required
  • Post-graduate degree in Computer Science or Management Information Systems expected
  • Minimum 12 years of experience in a related discipline
  • Or, equivalent combination of education, training, or experience

Preferred Certifications:

  • Open CA Level 1: Certified
  • TOGAF 9 Foundation
  • ArchiMate 3 Foundation

The pay range for this role is $126,100 - $227,000. Job-related, non-discriminatory factors used to determine the actual offered rate include qualifications and experience, geographic location, education, external market data, and consideration of internal equity. The anticipated close date of this job requisition is: May 22, 2026.

As part of the total rewards package, this position is eligible for a short-term incentive based on performance. In addition, Jabil offers benefits to enhance your health, wealth, and resilient self. These include medical, dental, and vision insurance plans; paid time off accruing at a rate of 3.07 hours during your first year of employment; 4 weeks of paid parental leave; in 2026, 11 company-paid holidays (9 fixed holidays and 2 optional floating holidays), subject to change yearly; 401(k) retirement plan; and employee stock purchase plan.

BE AWARE OF FRAUD: When applying for a job at Jabil you will be contacted via correspondence through our official job portal with a jabil.com e-mail address; direct phone call from a member of the Jabil team; or direct e-mail with a jabil.com e-mail address. Jabil does not request payments for interviews or at any other point during the hiring process. Jabil will not ask for your personal identifying information such as a social security number, birth certificate, financial institution, driver’s license number or passport information over the phone or via e-mail. If you believe you are a victim of identity theft, contact the Federal Bureau of Investigation internet crime hotline (www.ic3.gov), the Federal Trade Commission identity theft hotline (www.identitytheft.gov) and/or your local police department. Any scam job listings should be reported to whatever website they were posted on.

Jabil, including its subsidiaries, is an equal opportunity employer and considers qualified applicants for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, age, disability, genetic information, veteran status, or any other characteristic protected by law.

Accessibility Accommodation

If you are a qualified individual with a disability, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access Jabil.com/Careers site as a result of your disability. You can request a reasonable accommodation by sending an e-mail to Always_Accessible@Jabil.com or calling 727-803-7988 with the nature of your request and contact information. Please do not direct any other general employment related questions to this e-mail or phone number. Please note that only those inquiries concerning a request for reasonable accommodation will be responded to.

#whereyoubelong #AWorldofPossibilities

United States
$126K - $227K / year

Senior Cloud Infrastructure Architect

DMI (Digital Management, LLC)

At the Intersection of Public and Private Sectors

Full Time | Remote | Team 1,001-5,000 | Since 2002 | H1B No Sponsor

  • Designs, engineers, and implements cloud and hybrid infrastructure solutions for TSA
  • Translates enterprise architecture guidance and TSA mission requirements into detailed infrastructure designs
  • Produces comprehensive implementation documentation
  • Executes cloud migration, modernization, and optimization projects
  • Designs cloud landing zones, network architectures, and identity and access management configurations aligned with DHS Zero Trust principles
  • Supports integration of cloud platforms with TSA’s on-premises systems and third-party services
  • Performs infrastructure-as-code development using tools such as Terraform or CloudFormation
  • Leads pre-production validation of cloud environments
  • Supports 14-day operational testing periods post-transition
  • Contributes to cloud cost optimization and rightsizing assessments
  • Provides technical input to lifecycle replacement strategies for infrastructure components

United States
Job Closed