ML Infrastructure Engineer – Early Career/Internship

Infrastructure Engineer | Internship | Remote | Entry Level | Team 5,001-10,000 | Since 2005 | H1B Sponsor

Location

Washington

Posted

4 days ago

Salary

$112.7K - $169K / year

Seniority

Entry Level

Job Description

Unity

  • Build and maintain data pipelines that generate training datasets for machine learning models and experimentation
  • Contribute to infrastructure that supports distributed training workflows (e.g., PyTorch, Ray)
  • Work with workflow orchestration tools (e.g., Airflow, Flyte, or similar) to support multi-stage ML pipelines
  • Improve reproducibility and reliability through dataset validation, monitoring, and testing
  • Partner with ML engineers to support experimentation and model iteration
  • Help optimize performance and efficiency across data processing and training systems
  • Contribute to the evolution of our offline ML platform architecture as it scales

Job Requirements

  • Bachelor’s degree in Computer Science, Machine Learning, Systems, or a related field
  • Strong foundation in machine learning systems, distributed systems, or large-scale data processing (through research or projects)
  • Experience with Python and working with data-intensive workloads
  • Familiarity with ML frameworks (e.g., PyTorch, TensorFlow) and/or distributed systems (e.g., Ray, Spark)
  • Experience (academic or applied) with data pipelines, model training workflows, or large datasets
  • Strong problem-solving skills and ability to translate research ideas into practical systems
  • Interest in building scalable, reliable infrastructure for machine learning

Nice to Have

  • Experience with workflow orchestration systems (Airflow, Flyte, etc.)
  • Exposure to large-scale data platforms (data lakes, warehouses, streaming systems)
  • Publications or research in ML systems, distributed systems, or related areas

Benefits

  • Comprehensive health, life, and disability insurance
  • Commute subsidy
  • Employee stock ownership
  • Competitive retirement/pension plans
  • Generous vacation and personal days
  • Support for new parents through leave and family-care programs
  • Office food and snacks
  • Mental Health and Wellbeing programs and support
  • Employee Resource Groups
  • Global Employee Assistance Program
  • Training and development programs
  • Volunteering and donation matching program

More Infrastructure Engineer Jobs

Infrastructure Engineer

Hadrian

Digital security insights from a hacker’s perspective

Full Time | Remote | Team 51-200 | Since 2021 | H1B No Sponsor

  • Design, build, and evolve the cloud-native platform that powers offensive security operations
  • Enable engineering teams to move faster, deploy safely, and operate reliably
  • Create scalable self-service infrastructure and resilient platform capabilities
  • Help implement platform standards, security controls, and engineering best practices
  • Collaborate closely with software engineers to improve deployment workflows, developer experience, and platform adoption
  • Manage and improve the Kubernetes environment, ensuring reliability, performance, and security

Netherlands

Role Description

We are seeking an AI Data Infrastructure Engineer to build and operate the large-scale data systems that power modern AI training and evaluation pipelines. The role combines deep data engineering expertise with a strong understanding of AI workloads, focusing on ingestion, transformation, quality assurance, lineage, and high-throughput delivery of data to training jobs across diverse modalities. The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency.

Key Responsibilities

  • Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
  • Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
  • Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
  • Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
  • Build high-throughput data loading systems that maximize GPU utilization during training.
  • Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
  • Design storage architectures balancing cost, throughput, and latency across data tiers.
  • Build evaluation dataset construction pipelines with strict integrity and contamination controls.
  • Implement data privacy, redaction, and consent enforcement throughout the pipeline.
  • Collaborate with ML researchers and engineers to align data systems with model development needs.
  • Drive observability of data quality, drift, and pipeline health across the AI data estate.
  • Optimize cost and performance through compression, format selection, and caching strategies.
  • Document data systems, schemas, and operational procedures for broad internal use.
  • Stay current with AI data infrastructure research and emerging open-source tools.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • Six or more years of data engineering experience, with significant work supporting ML or AI workloads.
  • Strong proficiency in Python and at least one JVM or systems language.
  • Deep experience with modern data processing frameworks such as Spark, Ray, or Beam.
  • Hands-on experience operating petabyte-scale storage and pipeline systems.
  • Strong understanding of distributed systems, data modeling, and storage formats.
  • Experience with dataset versioning, lineage, and reproducibility for ML workflows.
  • Familiarity with high-throughput data loading for accelerator-based training.
  • Strong software engineering practices including testing, CI/CD, and code review.
  • Excellent communication and cross-functional collaboration skills.

Preferred Qualifications

  • Experience with multimodal datasets at large scale.
  • Familiarity with data quality tooling and dataset evaluation methodology.
  • Exposure to privacy-preserving data systems and regulated data handling.
  • Open-source contributions to data infrastructure projects.
  • Experience supporting frontier model training pipelines.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected] or contact us at (908) 505-3545. Learn more about Bright Vision Technologies at www.bvteck.com.

United States
$100K - $150K / year
Job Closed

Role Description

We are seeking an experienced Oracle Cloud Infrastructure (OCI) Engineer to architect, deploy, and operate large-scale, secure, and resilient cloud environments on Oracle’s next-generation public cloud. In this role you will own the full OCI engineering lifecycle, from landing-zone design and infrastructure-as-code, through workload migration and ongoing operations, including cost optimization and observability. The ideal candidate will combine deep technical knowledge of OCI services with strong DevOps and automation fundamentals, and will partner with database, application, and security teams to deliver well-engineered cloud platforms that support mission-critical Oracle and non-Oracle workloads.

Key Responsibilities

  • Architect, deploy, and operate enterprise-grade OCI landing zones, compartments, and tenancy structures aligned with internal governance and compliance standards.
  • Design and implement OCI networking topologies, including VCNs, subnets, gateways, FastConnect, IPSec VPN, and DRG-based hub-and-spoke architectures.
  • Build and maintain OCI infrastructure-as-code using Terraform and OCI Resource Manager, with version control, peer review, and automated validation.
  • Implement IAM policies, federation with corporate identity providers, and strong RBAC patterns using OCI Identity Domains.
  • Design and operate OCI compute (VMs, bare-metal, OKE for Kubernetes) and storage services (Object, Block, File, Archive) optimized for workload patterns.
  • Operate OCI database services, including Autonomous Database, Exadata Cloud, MySQL HeatWave, and Base Database, in close partnership with DBA teams.
  • Build CI/CD pipelines on OCI DevOps service or external platforms (Jenkins, GitHub Actions) for both infrastructure and applications.
  • Implement observability using OCI Monitoring, Logging, APM, and integrations with third-party observability stacks.
  • Drive ongoing cost optimization through right-sizing, lifecycle management, and architectural simplification.
  • Design disaster-recovery and business-continuity solutions across OCI regions and availability domains.
  • Strengthen security posture using OCI Cloud Guard, Security Zones, Vault, and Data Safe, and lead remediation of findings.
  • Advise application development teams on OCI-native design patterns and best practices.
  • Develop automation tooling in Python, Bash, and Terraform to reduce operational toil.
  • Mentor junior engineers and contribute to internal cloud communities of practice.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
  • Five or more years of cloud engineering experience, with strong hands-on time on OCI.
  • Production-level experience with infrastructure-as-code using Terraform.
  • Solid experience with OCI compute, networking, storage, identity, and database services.
  • Hands-on experience with Oracle Container Engine for Kubernetes (OKE).
  • Experience with CI/CD pipelines for cloud workloads.
  • Strong scripting skills in Python and Bash.
  • Deep understanding of cloud security, IAM, and compliance requirements.
  • Experience with observability tooling and incident response.
  • Excellent troubleshooting, communication, and documentation skills.

Preferred Qualifications

  • Oracle Cloud Infrastructure Architect Professional certification.
  • Experience operating Oracle Exadata Cloud or Autonomous Database at scale.
  • Exposure to multi-cloud architectures spanning OCI and AWS/Azure/GCP.
  • Familiarity with FinOps practices on OCI.
  • Experience operating regulated workloads on OCI.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected] or contact us at (908) 505-3545. Learn more about Bright Vision Technologies at www.bvteck.com.

United States
$100K - $150K / year
Job Closed

Cloud Infrastructure Architect, AWS

Booker DiMaio

Engineering Innovation and Transformation

Full Time | Remote | Team 11-50 | H1B No Sponsor

  • Design and maintain AWS cloud infrastructure supporting enterprise-scale data platform operations.
  • Develop architecture solutions utilizing VPCs, subnets, routing, security groups, PrivateLink, and other AWS networking services.
  • Support high availability, disaster recovery, backup, and business continuity strategies.
  • Implement Infrastructure as Code solutions utilizing Terraform and related automation tools.
  • Collaborate with cloud engineers, data engineers, and cybersecurity specialists to deliver secure and scalable solutions.
  • Support platform modernization, performance optimization, capacity planning, and infrastructure lifecycle management.
  • Develop and maintain technical architecture documentation, standards, and operational procedures.
  • Participate in architecture reviews, deployment planning, and operational readiness activities.
  • Support monitoring, observability, and operational support initiatives across cloud environments.

Maryland