Senior/Principal Performance Engineer
Location: United States
Posted: 29 days ago
Salary: 0
Seniority: Senior
Job Description
TMSfirst
• Design, develop, and maintain comprehensive benchmarking frameworks spanning OS, kernel, and application layers.
• Profile workloads across CPU, memory, I/O, network, and accelerator (GPU/NPU) subsystems to identify bottlenecks and optimization opportunities.
• Establish and own performance baselines across CIQ's product and solutions portfolio.
• Leverage AI-assisted tooling and agentic workflows to accelerate profiling, analysis, and root cause identification.
• Build and maintain automated performance regression-detection pipelines integrated into CI/CD workflows using Fuzzball.
• Identify, triage, and resolve regressions across user space, kernel space, and application layers with urgency and rigor.
• Collaborate across engineering teams to root-cause regressions introduced by upstream kernel changes, compiler updates, or library modifications.
• Drive proactive performance improvements - not just reactive fixes - to keep CIQ solutions ahead of the competition across every layer of the stack.
• Own core operating system performance: kernel subsystem tuning (scheduler, memory management, I/O, networking), system call overhead reduction, and user space library and runtime optimizations.
• Identify and implement kernel-level enhancements, including patches, configuration changes, and upstream contributions that yield measurable performance gains for CIQ's customer workloads.
• Optimize for AI inference and training workloads, including LLM serving, model parallelism, and accelerator utilization.
• Tune performance for HPC workloads, including modeling, simulation, and tightly coupled parallel applications (MPI, OpenMP, etc.).
• Optimize general computing and service workloads - web services, databases, messaging systems, and other production software that runs on CIQ's OS platform.
• Work at all levels of the stack: compiler flags, kernel parameters, scheduler tuning, NUMA topology, memory allocation, and application-level algorithmic improvements.
• Champion an AI-first engineering philosophy - use AI tools, agents, and automation to accelerate your own productivity and improve the quality of performance insights.
• Identify and prioritize optimization opportunities that directly impact AI training throughput and inference latency/cost.
• Stay current on state-of-the-art techniques in ML system performance, including quantization, batching strategies, kernel fusion, and hardware-software co-design.
• Develop deep expertise in CIQ's Fuzzball platform - its architecture, scheduling, and workload execution model.
• Integrate performance benchmarks, regression tests, and user-facing workloads into Fuzzball-based pipelines.
• Contribute to the performance characterization of Fuzzball itself, ensuring the platform adds minimal overhead and scales efficiently.
• Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer (formerly Singularity), and Warewulf, and understand how performance considerations span and interconnect across each.
• Collaborate deeply with the engineering teams behind each product line to surface, prioritize, and deliver performance improvements that benefit customers across the entire CIQ ecosystem.
• Partner with product and customer success teams to translate real-world performance pain points into engineering priorities and measurable outcomes.
• Document and communicate findings clearly - from low-level profiling data to executive-level summaries.
• Contribute to technical publications, conference presentations, and thought leadership that reinforce CIQ's reputation for performance excellence.
Job Requirements
- A deep, principled understanding of operating system internals - Linux kernel scheduler, memory subsystem, I/O stack, and networking.
- Proven experience identifying and resolving performance regressions across kernel and user space in production environments.
- Hands-on expertise with profiling and tracing tools: perf, eBPF/bpftrace, flame graphs, VTune, Nsight, strace, ftrace, and similar.
- Strong background in AI/ML workload performance - including inference optimization (TensorRT, ONNX, vLLM, or similar), training efficiency, and GPU/accelerator utilization.
- Experience with HPC workloads: MPI, OpenMP, parallel filesystems, RDMA/InfiniBand, and job schedulers (Slurm, PBS, etc.).
- Familiarity with modern AI-first development workflows and comfort using LLM-based tools to accelerate engineering work.
- Experience building automated performance testing and regression detection pipelines in CI/CD environments.
- Excellent analytical skills - able to form hypotheses, design experiments, and draw actionable conclusions from complex data.
- Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
- A collaborative, humble, and always-learning mindset - combined with the confidence to champion performance as a first-class engineering concern.
Benefits
- Medical, dental, and vision insurance.
- Flexible paid time off.
- Employee stock options.
- Remote work; no travel required for most positions.
More Engineer Jobs
Sr. Controls Engineer
Terabase Energy
A solar technology company whose mission is to reduce the cost and increase the scalability of large-scale solar.
Role Description
The Sr. Controls Engineer – OT SCADA Projects leads the design, configuration, commissioning, and support of plant control systems for utility-scale solar, storage, and hybrid renewable energy projects. This engineer works with minimal oversight, applies expert knowledge of grid functionality and Utility/ISO standards, mentors junior engineers, and contributes to product standards. Approximately 80% of this role is project execution, while up to 20% is continuous improvement and product development.
Responsibilities
Project Execution & Technical Delivery (~80%)
- Lead end-to-end controls design for utility-scale solar, BESS, and hybrid projects, from initiation through commissioning and closeout.
- Program and commission SEL controllers using AcSELerator; develop control logic in Codesys using IEC 61131-3 structured text.
- Identify project-specific deviations from standard product scope during contracting; forecast and scope project-specific development work.
- Implement and validate closed-loop Active Power, Reactive Power, AVR, and PFR algorithms.
- Maintain version control for all code artifacts according to the established version control procedure.
- Troubleshoot complex SCADA and controls issues using Wireshark, breakpoints, cross-reference, and watch list tools.
- Produce project deliverables: System Architecture Diagrams, Control Narratives, Logic Diagrams, commissioning documents, and operator manuals.
Product & Process Improvement (~20%)
- Contribute to new feature development and bug fixes in collaboration with the Product Engineering team.
- Review documentation (technical and non-technical) prepared by junior controls engineers.
- Lead the Continuous Improvement and Lessons Learned program, feeding field insights back into product standards and templates.
- Coach junior engineers through FAT preparation and customer-facing presentations.
Stakeholder Communication & Collaboration
- Serve as the primary controls technical contact for EPCs, asset owners, and grid operators through project execution, FAT, and commissioning.
- Lead FATs as formal presentations; communicate with non-technical audiences using supporting materials prepared in advance.
- Flag technical risks and schedule pressures to management with context and proposed solutions.
- Maintain Jira tickets daily with thorough detail; enforce Jira best practices with junior engineers.
Expectations & Success Indicators
- Deliver high-quality work independently across multiple concurrent projects with ownership and urgency.
- Leverage standardized platforms and tools; avoid project-specific one-off engineering approaches.
- Ensure 100% adherence to Terabase quality processes; enforce standards with junior engineers.
- Mentor junior controls engineers through technical guidance, code review, and FAT coaching.
- Project deliverables completed on time, within scope, and meeting Utility/ISO regulatory standards.
- Jira, version control, and documentation consistently maintained without follow-up from management.
- Recognized internally and externally as the go-to technical authority on Terabase SCADA and OT controls.
- Travel up to 10% for on-site commissioning, FAT, and customer engagements.
Qualifications
- Bachelor’s degree in Engineering, Computer Science, Technology, or a related field.
- 3-5+ years of IEC 61131-3/PLC programming experience, preferably in a Codesys or AcSELerator environment.
- 3+ years of utility-scale power plant controls experience (solar, BESS, or hybrid).
Requirements
- Expert IEC 61131-3 programming; primary tooling is AcSELerator (SEL controllers) and Codesys, including Diagram Builder, traces, breakpoints, and watch lists.
- Extensive knowledge of industrial protocols: Modbus-TCP, DNP3, OPC-UA, etc.
- Expert knowledge of grid functionality: PFR, AVR, Reactive Power, Voltage Regulation, and Capacitor Banks, including the underlying grid rationale, not just controller behavior.
- Proficiency with Utility/ISO testing and interconnection requirements (ERCOT, PJM, BPA, IEEE 2800, NERC, etc.).
- Experience with Power Plant Controller (PPC) design, configuration, and commissioning.
- Familiarity with PSCAD, PSSE, and/or TSAT modeling processes for utility-scale sites.
Benefits
- Generous time off and holiday policy.
- Remote flexibility.
- Flexible time off.
- Comprehensive benefits package.
- Career progression.
- 401k match.
- Stock options.
- Home office setup allowance.
- And much more!
• Analyze and migrate pipelines and notebooks (Spark/Databricks).
• Refactor or rewrite processes for SQL / Dataform and Dataflow.
• Build transformations in BigQuery + Dataform.
• Build the Silver (Trusted) and Gold layers.
• Ensure data quality, deduplication, and standardization.
• Implement ingestion with Dataflow (Apache Beam) for events (Kafka/Event Hubs) and Datastream (CDC).
• Work with data persistence in the Raw layer using Iceberg tables managed by BigLake.
• Provision resources with Terraform (IaC).
• Manage pipelines with CI/CD (GitHub Actions).
• Follow the Ingestion Factory model and per-domain repositories.
• Implement tests in Dataform.
• Ensure cataloging and lineage (Dataplex) and secure sharing (Analytics Hub).
Senior Manufacturing Engineer – Mass Production Infrastructure
NVIDIA
NVIDIA is widely considered one of the world's most desirable employers in technology. We have some of the world's most forward-thinking and passionate people working for us. If you're creative and autonomous, we want to hear from you!
• End-to-end management of the P-Rel production infrastructure supply chain, from NPI infrastructure review to Mass Production delivery and full implementation at production sites.
• Coordinating ongoing production capacity management and support, risk assessment, infrastructure establishment, maintenance activities, budgeting, and process improvements.
• Serving as the primary contact for multiple production sites to ensure full infrastructure support for NVIDIA Mass Production.
Senior Scientist / Engineer – Advanced Oncology, Medical Devices, Diagnostics
Intertwine Associates
Operational efficiency and trusted teams across science, tech, and government.
• Serve as a senior scientific and technical advisor to the Program Manager, supporting the development and execution of new and existing ARPA-H R&D programs.
• Conduct rigorous evaluations of the current scientific and technical landscape across oncology, diagnostics, and medical device development.
• Identify gaps, risks, and opportunities within research portfolios aligned with ARPA-H mission objectives.
• Monitor and assess performer progress against defined technical milestones and quantitative performance metrics.
• Critically review complex scientific, engineering, and technical data using strong logical and analytical reasoning.
• Read, synthesize, and distill large volumes of information into concise, executive-ready briefings for technical and non-technical audiences.
• Support program execution through documentation, internal reporting, and coordination across multidisciplinary teams.
• Perform programmatic support tasks including data collection, analysis, and preparation of internal reports and briefings.
• Operate effectively both independently and as part of an integrated team supporting ongoing ARPA-H efforts.
• Travel within CONUS (~10–35%) to support program reviews, meetings, and stakeholder engagements.




