Senior Infrastructure Engineer – Bazel Remote Execution
Location
California
Posted
6 days ago
Salary
$184K - $356.5K / year
Seniority
Senior
Job Description
NVIDIA
• Craft creative scalable cloud solutions for running millions of jobs, thousands of systems, and petabytes of storage.
• Address exciting challenges in infrastructure such as Kubernetes, job scheduling, multi-region services, resource management, and automated recovery.
• Create agentic workflows for infrastructure.
• Collaborate with customers to understand their needs and develop innovative solutions that cater to their requirements.
Job Requirements
- Proven experience in developing scalable cloud infrastructure solutions from concept to production.
- Background in AI/ML, Data Analytics, and their application in infrastructure.
- Strong background in object-oriented programming, with a preference for Java or Go.
- Ability to collaborate effectively across multiple teams and different time zones.
- Bachelor's degree or equivalent experience.
- 8+ years of experience in infrastructure development.
Benefits
- Equity
- Benefits
More Infrastructure Engineer Jobs
Senior Infrastructure Engineer
Definitive Healthcare, US
Definitive Healthcare (NASDAQ: DH) is passionate about turning data, analytics, and expertise into meaningful intelligence that helps our customers achieve success and shape the future of healthcare. We empower them to uncover the right markets, opportunities, and people, paving the way for smarter decisions and greater impact.
- Headquartered just outside of Boston, Massachusetts.
- Operates across North America, Europe, and India.
- Supports a growing global client base of more than 2,400 customers since our founding in 2011.
- Earned multiple workplace honors, including Built In’s 100 Best Places to Work in Boston (2024 and 2025), a Stevie Bronze Award for Great Employers, and recognition as a Great Place to Work in India.
- Fosters a collaborative, inclusive culture where diverse perspectives drive innovation.
Role Description
We are looking for an experienced and versatile Infrastructure Engineer to join our team. This is a broad, hands-on role for someone who is comfortable operating across the modern infrastructure stack, spanning cloud platforms, virtualization, systems administration, automation, and network engineering. This role replaces a traditional network-only function and reflects how we think about infrastructure today: as an interconnected discipline where networking, computing, security, and automation are inseparable. You will be a key contributor to the reliability, scalability, and security of our platforms, and a go-to escalation point for complex infrastructure challenges.
What You'll Do
Cloud & Hybrid Infrastructure
- Support the maintenance of both on-prem and cloud infrastructure across AWS, Azure, or GCP including compute, storage, networking, and identity services.
- Manage and optimize hybrid connectivity between on-premises environments and cloud platforms (VPN, ExpressRoute/Direct Connect, Transit Gateway).
- Govern infrastructure as code (IaC) using tools such as Terraform or Pulumi, ensuring environments are reproducible and version-controlled.
Networking
- Participate in ongoing management of network infrastructure including routers, switches, firewalls, and load balancers.
- Manage and troubleshoot LAN/WAN, SD-WAN, VPN, BGP/OSPF, and VLAN environments.
- Administer DNS, DHCP, and IP address management (IPAM) across hybrid environments.
- Review and enforce network security policies, firewall rules, and segmentation strategies in collaboration with the security team.
Systems & Virtualization
- Administer virtualized environments (VMware vSphere, Hyper-V, or equivalent) and container platforms (Kubernetes, Docker).
- Manage server operating systems at scale across Linux (RHEL/Ubuntu) and Windows Server.
Monitoring, Reliability & Security
- Maintain observability tooling including infrastructure monitoring, alerting, and log aggregation (e.g. Datadog, Prometheus/Grafana, Splunk, or similar).
- Participate in an on-call rotation for infrastructure-level incidents, driving timely resolution and thorough post-incident reviews.
- Contribute to backup, disaster recovery, and business continuity planning and testing.
- Partner with the security team on vulnerability management, patching cadences, and hardening standards.
Automation & Continuous Improvement
- Identify and drive automation opportunities to reduce toil and improve consistency across the infrastructure estate.
- Contribute to capacity planning and infrastructure roadmap discussions.
- Mentor P1/P2 engineers and share knowledge through internal documentation and team sessions.
Qualifications
- 4–7 years of experience in infrastructure, network engineering, or a related discipline.
- Strong networking fundamentals and hands-on experience with enterprise network platforms (Cisco, Juniper, Palo Alto, Fortinet, or similar).
- Proven experience with at least one major cloud platform (AWS, Azure, or GCP), ideally with a cloud associate-level certification or equivalent practical experience.
- Experience managing Linux and Windows Server environments in production.
- Familiarity with containerization concepts and platforms (Docker, Kubernetes).
- Experience with monitoring and observability tooling in a production environment.
- Strong troubleshooting skills across network, compute, and storage layers.
- Comfortable working in an on-call capacity for infrastructure incidents.
Nice to Have
- Working knowledge of infrastructure as code and configuration management tools (Terraform, Ansible, or similar).
- Experience with SD-WAN platforms (e.g. Meraki, Viptela/Cisco SDWAN, VMware VeloCloud).
- Scripting proficiency in Python, Bash, or PowerShell for automation.
- Exposure to DevOps practices and CI/CD pipelines.
- Experience in a zero-trust network architecture project.
- Relevant certifications: CCNA/CCNP, AWS Solutions Architect, Azure Administrator (AZ-104), CKA, or equivalent.
Compensation and Benefits
- The salary range for this position is $115,000 – $173,000 per year, which represents the base pay the company reasonably and in good faith expects to pay for this role.
- Actual pay within this range will be determined based on factors such as relevant experience, skills, and qualifications.
- Depending on the position, employees may also be eligible to participate in a company bonus or commission plan.
- All employees are eligible for a comprehensive benefits package, including medical, dental, and vision coverage, unlimited paid time off, and participation in the company’s 401(k) plan with employer contribution.
Senior Cloud & Infrastructure Engineer
maxRTE
Industry-leading software that helps healthcare systems accelerate their revenue cycle & recuperate uncompensated care.
• Help own the full breadth of AWS cloud environment, network infrastructure, and internal IT operations.
• Collaborate closely with platform engineering team to align infrastructure decisions with product goals.
• Own and improve Site-to-Site VPN setup, including VPC architecture, route tables, subnets, and security groups for client connectivity.
• Design and implement automated client onboarding experiences using templating and Infrastructure as Code.
• Harden existing network configurations to improve security posture and reduce manual intervention for each new client connection.
• Manage interface infrastructure supporting healthcare data integrations, ensuring availability, performance, and observability.
• Continuously monitor and remediate security vulnerabilities across AWS resources (Lambdas, ECR images, EC2 instances, and beyond).
• Patch and resolve critical and high-severity vulnerabilities in Python, C#, and Node.js codebases.
• Maintain and improve HIPAA-compliant configurations across all cloud resources and services.
• Manage device security, enrollment, and lifecycle including remote wipe capabilities and MDM tooling.
• Own identity and access management across AWS (IAM), internal applications, and SaaS tools.
• Support onboarding and offboarding workflows including provisioning and deprovisioning accounts, devices, and access.
Cloud Infrastructure Engineer
RefinedScience
Advance care by bringing together the best science, data and minds to discover pathways to life beyond disease.
• Design and implement cloud infrastructure on GCP using infrastructure as code
• Manage cloud networking components including VPCs, load balancers, DNS, Cloud Router, NAT, and firewall rules
• Manage and optimize cloud compute resources including GCE instances, Cloud Run, and related services
• Build and maintain CI/CD pipelines using tools such as GitHub Actions or Google Cloud Build to support reliable, repeatable deployments
• Design and maintain observability infrastructure including metrics collection, log aggregation, and dashboards to surface actionable insights for engineering and research teams
• Implement and maintain security and compliance controls appropriate to a regulated healthcare research environment
• Manage cloud services including backup and disaster recovery
• Support deployment and maintenance of internal applications and their underlying infrastructure
• Champion containerization best practices and support containerized workload deployments
• Collaborate with cross-functional teams to ensure seamless integration with existing systems and workflows
• Troubleshoot and resolve cloud infrastructure and deployment issues
• Create and maintain clear technical documentation, runbooks, and architecture diagrams
• Stay current with cloud technology trends and evaluate new tools and approaches relevant to the organization
Senior Cloud & Infrastructure Engineer
maxRTE
Industry-leading software that helps healthcare systems accelerate their revenue cycle & recuperate uncompensated care.
Role Description
As a Senior Cloud & Infrastructure Engineer at maxRTE, you will help own the full breadth of our AWS cloud environment, network infrastructure, and internal IT operations. This is a high-impact, variety-filled role at a strong health tech company where your work will directly shape the security, reliability, and scalability of systems that healthcare providers and patients depend on every day. You will make a significant positive impact across networking, security, cost optimization, and developer tooling while collaborating closely with our platform engineering team to align infrastructure decisions with product goals. If you thrive in environments where no two days look the same and you take genuine pride in making systems cleaner, tighter, and more automated than you found them, we'd love to hear from you.
Networking & Client Onboarding
- Own and improve Site-to-Site VPN setup, including VPC architecture, route tables, subnets, and security groups for client connectivity.
- Design and implement automated client onboarding experiences using templating and Infrastructure as Code.
- Harden existing network configurations to improve security posture and reduce manual intervention for each new client connection.
- Help manage interface infrastructure supporting healthcare data integrations, ensuring availability, performance, and observability.
Cloud Security & Vulnerability Management
- Continuously monitor and remediate security vulnerabilities across AWS resources (Lambdas, ECR images, EC2 instances, and beyond) using AWS Inspector, Security Hub, and related tooling.
- Patch and resolve critical and high-severity vulnerabilities in Python, C#, and Node.js codebases; escalate larger code changes to platform or product engineers as needed.
- Drive meaningful, measurable reductions in our vulnerability count over time through proactive hygiene, dependency management, and tooling improvements.
- Maintain and improve HIPAA-compliant configurations across all cloud resources and services.
IT Administration
- Manage device security, enrollment, and lifecycle including remote wipe capabilities and MDM tooling (experience with Rippling a plus).
- Own identity and access management across AWS (IAM), internal applications, and SaaS tools, enforcing least-privilege and RBAC principles.
- Administer device vulnerability scanning and ensure endpoint compliance for a distributed remote workforce.
- Support onboarding and offboarding workflows including provisioning and deprovisioning accounts, devices, and access across all platforms.
Developer Tooling & Cloud Operations
- Monitor, analyze, and optimize AWS spend across all services, identifying cost reduction opportunities without sacrificing reliability.
- Evaluate and improve existing cloud integrations and identify new tooling that meaningfully improves developer or operational efficiency.
- Streamline repetitive infrastructure processes through automation, scripting, and templating.
- Serve as a go-to resource for the engineering team on AWS resource questions, environment access issues, and infrastructure debugging.
Qualifications
- 5+ years of experience in cloud infrastructure, DevOps, or a related engineering role.
- Deep hands-on experience with AWS: VPC, Site-to-Site VPN, EC2, Lambda, ECR, IAM, CloudFormation, CloudWatch, Security Hub, and related services.
- Proven ability to design and implement Infrastructure as Code (CloudFormation, Terraform, or equivalent).
- Experience identifying, triaging, and remediating security vulnerabilities across cloud resources and application code.
- Ability to read, understand, and make targeted fixes in Python, C#/.NET, and/or Node.js codebases.
- Experience managing IT administration for a distributed team: MDM/device management, identity providers, and RBAC.
- Strong written and verbal communication skills, able to translate technical findings into clear action items for engineering and leadership.
- Comfort working autonomously in a small-team environment with broad, varied ownership.
- Understanding of HIPAA compliance requirements and how they apply to cloud infrastructure.
Requirements
Nice to have:
- Experience with Rippling for device management, identity, and HR/IT workflows.
- Familiarity with healthcare revenue cycle management or clinical data infrastructure.
- Experience with RabbitMQ, ECS/EKS, or containerized workload management.
- Background working in a SaaS or health tech environment.
Benefits
- Competitive salary and Performance-based bonus.
- Team bonding and off-site events 2x per year.
- Unlimited paid time off.
- Dental, Vision, Health, and Life Insurance.
- 401(k) plan.


