Lambda

Designing the world's most advanced GPU systems for Deep Learning.

Data Center Facility Telemetry & Controls Engineer

Engineer | Full Time | Remote | Lead | Team 51-200 | H1B Sponsor

Location

California

Posted

2 days ago

Salary

$185K - $290K / year

Seniority

Lead

Bachelor Degree | 7 yrs exp | English | Grafana | Prometheus

Job Description

• Architect and manage BMS integration across colocation and Lambda-owned facilities, covering chillers, CRAHs, CDUs (Coolant Distribution Units), cooling towers, UPS systems, PDUs, and automatic transfer switches.
• Define standards for BMS point lists, naming conventions, control sequences, and integration protocols (BACnet, Modbus, SNMP, OPC-UA, RESTful APIs).
• Oversee commissioning and acceptance testing of new BMS deployments and CDU/TCS loop integrations for next-generation liquid-cooled GPU rack systems.
• Collaborate with colocation partners (Equinix, Digital Realty, and others) to ensure telemetry data flows from provider BMS/EPMS into Lambda's monitoring stack.
• Own the DCIM platform strategy and roadmap: evaluating, selecting, and implementing tooling for asset management, capacity planning, environmental monitoring, and power chain visibility.
• Develop and maintain real-time dashboards for PUE, thermal performance, stranded capacity, and cooling system efficiency across all Lambda sites.
• Build and maintain telemetry pipelines ingesting data from BMS, PDUs, in-rack sensors, CDUs, and network devices into centralized monitoring and alerting platforms (e.g., Prometheus, Grafana, InfluxDB, or equivalent).
• Define alarm thresholds and escalation workflows for critical facility events including high coolant temperatures, CDU inlet/outlet anomalies, leak detection, and power exceedances.
• Develop control strategies and setpoint frameworks for TCS (Thermal Control System) loops supporting direct liquid cooling at densities of 220–380 kW per rack.
• Evaluate and qualify CDU vendors on controls integration capabilities, telemetry exposure, and remote management interfaces.
• Define and enforce operational procedures for CDU commissioning, setpoint changes, loop pressure management, and fluid quality monitoring.
• Support design and construction coordination for liquid cooling infrastructure in new data center buildouts, ensuring BMS and controls readiness at Day 1.
• Establish and maintain facility event management processes, including on-call response protocols for facility telemetry anomalies.
• Lead root cause analysis for facility system failures and implement corrective actions to prevent recurrence.
• Partner with the data center operations team to maintain and refine emergency response runbooks tied to BMS alerts and automated controls.
• Drive continuous improvement in MTTR for facility-related events through better telemetry coverage and automated remediation.
• Manage BMS integrators, DCIM vendors, and control subcontractors, from RFP through design, installation, commissioning, and ongoing support.
• Serve as the primary technical interface with colocation providers on all BMS/EPMS integration topics.
• Collaborate with Lambda's infrastructure engineering, construction, and procurement teams to align controls requirements with facility buildout timelines.
• Support due diligence and technical evaluation for new colocation sites and modular data center deployments from a telemetry and controls readiness perspective.

Job Requirements

7+ years of experience in data center infrastructure engineering, with at least 4 years focused on BMS, DCIM, or controls systems in a hyperscale, colocation, or AI/HPC environment.
Hands-on experience designing and integrating BMS for mission-critical facilities including UPS, PDU, CRAH/CRAC, chiller plant, cooling tower, and liquid cooling (CDU/in-row) systems.
Strong working knowledge of industrial control protocols: BACnet IP/MS-TP, Modbus TCP/RTU, SNMP, DNP3, and modern API-based integrations.
Demonstrated experience with DCIM platforms (Nlyte, Sunbird, Vertiv TRELLIS, or equivalent) including deployment, configuration, and ongoing administration.
Experience with real-time telemetry stacks (Prometheus, InfluxDB, Grafana, or similar) applied to infrastructure monitoring use cases.
Strong understanding of data center power and cooling systems, including PUE optimization, thermal management, and redundancy architectures (2N, N+1).

Benefits

Opportunity to shape the telemetry and controls architecture for one of the fastest-growing AI infrastructure platforms in the industry.
Work with cutting-edge GPU infrastructure at rack densities at the frontier of what the industry has deployed.
Collaborative environment with experienced infrastructure, construction, and vendor teams across a rapidly scaling global portfolio.
Competitive compensation including salary, equity, and comprehensive benefits.
Flexibility in work location with hybrid/remote options depending on facility portfolio needs.

Related Categories

Engineer

Related Job Pages

Engineer Jobs in California | Remote Full-time Jobs (US) | More Remote Jobs

More Engineer Jobs

Forward Deployed Engineer

Alteryx

Engineer | 2 days ago

Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

Role Description

The Forward Deployed Engineer (FDE) is a senior, hands-on technical role embedded with Alteryx’s most strategic enterprise customers and partners. Your mission is to make Alteryx One a core part of modern data and AI architectures by turning existing analytics workflows into AI-ready, cloud-native solution patterns that scale through partners. This is not a traditional professional services or slideware architect role. As an FDE, you will design, build, and validate real production systems, then package what works into repeatable patterns that partners can sell and deliver independently.

What You’ll Do

- Design and build AI-ready data architectures using Alteryx One, anchored to cloud data platforms such as Snowflake, BigQuery, Databricks, and AWS
- Convert existing analytics workflows into deployable solution patterns that can be reused across customers
- Deliver lighthouse customer deployments that move from pilot to production in ~90 days
- Partner closely with systems integrators and consulting partners to enable partner-led delivery
- Work alongside Sales Engineers and Product teams to validate solutions in production and provide structured product feedback
- Create documentation, reference architectures, and playbooks so solutions can scale without ongoing FDE involvement

What You’ll Own

- Technical leadership from architecture → pilot → production → partner handoff
- Architecture decisions for cloud data platform and analytics modernization
- Creation of repeatable patterns that drive faster time-to-value, expansion, and consumption
- Clear exit criteria for every engagement; success is partner-led delivery, not long-term dependency

What Success Looks Like

Success in this role is measured by what reaches production, what scales, and what continues without your direct involvement.

- Ship production systems
  - Move enterprise customers from existing analytics workflows to AI-ready, cloud-native architectures built on Alteryx One
  - Take solutions from pilot to production in ~90 days
- Create scalable solution patterns
  - Design architectures, workflows, and deployment models that are reused across customers
  - Establish repeatable patterns that become reference implementations
- Enable partner-led delivery
  - Hand off solutions to systems integrators and consulting partners with clear ownership and exit criteria
  - Ensure partners can sell, implement, and extend solutions independently
- Drive faster customer outcomes
  - Reduce time-to-production for analytics and AI initiatives
  - Deliver measurable impact, including cost reduction, faster insights, and improved operational efficiency
- Shape the product with real-world evidence
  - Provide clear, structured feedback based on live deployments
  - Influence roadmap and product direction by validating solutions in production

What You Bring

- 7+ years building data, analytics, or platform systems in production
- Hands-on experience with at least one major cloud data platform (Snowflake, BigQuery, Databricks, Redshift, or AWS)
- Experience working directly with enterprise customers in a technical, field-facing role

Technical Skills

- Strong background in analytics workflows, data pipelines, or analytics engineering
- Comfortable writing production workflows and lightweight code (primarily Python and SQL; APIs as needed)
- Solid understanding of cloud architecture, governance, and cost tradeoffs

Ways of Working

- Builder mindset: you prefer shipping real systems over creating slides
- Comfortable operating in ambiguity and giving candid technical guidance
- Motivated by scale, reuse, and partner leverage rather than one-off wins

Compensation

- Base salary range for this role in the United States is $202,500-$225,000 with On-Target-Earnings range of $270,000-$300,000.
- A monthly Connectivity Plus stipend of $150 to support remote work-related expenses
- An annual $200 home office reimbursement

Benefits

- Medical, dental, and vision coverage
- 401(k) with company match
- Paid parental leave, caregiver leave, and flexible time off
- Mental health support and wellness reimbursement
- Career development and education assistance

Alteryx AI Snowflake BigQuery Databricks AWS Amazon Redshift Python SQL

View details: Forward Deployed Engineer

United States

$202.5K - $225K / year

Agent Quality / Evals Engineer

SOFTGIC

Engineer | 2 days ago

Full Time Remote

Role Description

This is a remote position. Owns the eval harness and quality gate from the beginning. This role replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

- Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
- Wire evals into CI so quality regressions fail builds and releases.
- Define and maintain release-gate thresholds with Product and the Tech Lead.
- Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.

Qualifications

- Experience evaluating ML, LLM, or non-deterministic systems.
- Strong test and benchmark design capability.
- Comfort working with noisy metrics, thresholds, and probabilistic behavior.
- Good scripting and automation skills.

Requirements

- Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.
- Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

- The first reference agent has a published scorecard and gated eval path.
- Golden and exception tests run automatically.
- The team can explain what “good enough to ship” means in measurable terms.

AI/ML LLM AI

View details: Agent Quality / Evals Engineer

Worldwide

$2K / month

Full Stack Engineer

Mercor

Cincinnatus is an enterprise staffing company that partners with leading technology companies to source and employ highly skilled professionals for full-time and long-term contingent roles. Cincinnatus serves as the employer of record for these engagements, providing W-2 employment, payroll, benefits, and compliance, while placing employees directly within client teams to work on high-impact initiatives. Roles hired through Cincinnatus are not project-based or freelance engagements. They are structured, role-based positions that typically involve full-time or fixed-term commitments, close collaboration with a client's internal teams, and integration into standard enterprise workflows. Cincinnatus is a legal entity separate from Mercor. While opportunities may be discovered through Mercor's platform, employment, onboarding, payroll, and benefits for these roles are administered by Cincinnatus. Equal Employment Opportunity Cincinnatus is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or any other legally protected characteristic. Cincinnatus is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans throughout the job application process.

Engineer | 2 days ago

Full Time | Remote | H1B No Sponsor

Role Description

Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Senior Software Engineer Expert
Type: Contract
Compensation: $80–$130/hour
Location: Remote

Role Responsibilities

- Design, develop, and maintain full-stack applications using Python.
- Build and integrate APIs and MCP-based systems.
- Collaborate closely with engineering and product teams to enhance system scalability.
- Write clean, scalable, and production-ready code for robust application performance.
- Troubleshoot, optimize, and improve application performance.
- Participate in architecture and technical design discussions to drive innovation.

Qualifications

- Multiple years of experience as a Full Stack Developer.
- Strong proficiency in Python.
- Experience with MCP and API development.
- Ability to work at least 8 hours per day.
- Experience building and maintaining production-grade applications.
- Strong problem-solving and communication skills.

Application Process

- Upload resume
- AI interview based on your resume
- Submit form

Resources & Support

- For details about the interview process and platform information, please check: Interview Process
- For any help or support, reach out to: support@mercor.com
- Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.

AI Python

View details: Full Stack Engineer

United Kingdom

$80 - $130 / hour

Chemical Safety Expert

Mercor

Engineer | 2 days ago

Part Time | Remote | H1B No Sponsor

Role Description

Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: Chemistry Expert (PhD), AI Safety
Type: Contract
Compensation: $65–$70/hour
Location: Remote
Commitment: 15–40 hours/week

Role Responsibilities

- Write expert-level prompts across specialized chemistry topics.
- Evaluate and annotate model responses for scientific accuracy, helpfulness, and appropriate handling of sensitive content.
- Apply structured guidelines to classify prompts and conversations.
- Collaborate with AI research teams to improve model training and safety.
- Work independently and asynchronously to meet deadlines and enhance AI model performance.

Qualifications

- Must-Have: PhD in chemistry or a closely related field (ongoing or completed).
- Deep familiarity with modern laboratory and computational techniques in your subfield.
- Strong scientific reasoning and writing in English.
- Sound judgment around chemical safety and the responsible handling of dual-use information.
- Preferred: Research or coursework on chemical safety/security, synthesis, or hazardous-materials handling.
- Experience reviewing, grading, or red-teaming technical content.

Requirements

- Start Date: Immediate

Application Process

- Upload resume
- AI interview based on your resume
- Submit form

Resources & Support

- For details about the interview process and platform information, please check: Interview Process
- For any help or support, reach out to: support@mercor.com
- PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.

View details: Chemical Safety Expert

United States

$65 - $70 / hour
