Semtech

Technology for a Better World

Staff Engineer, Integration Software

Solutions Engineer | Full Time | Remote | Lead | Team 1,001-5,000 | Since 1960 | H1B Sponsor

Location

Minnesota

Posted

6 days ago

Salary

$105K - $115K / year

Seniority

Lead

Bachelor Degree | 5 yrs exp | English | Cloud | ERP | Groovy | Java | SOAP

Job Description

  • Analyze and document existing on-premise middleware interfaces (integration flows, message mappings, adapter configurations, and orchestration processes) for cloud migration readiness
  • Design, develop, and deploy integration flows on a cloud-based iPaaS (Integration Platform as a Service) environment
  • Convert legacy message mappings (including graphical, XSLT, and Java-based mappings) to cloud-compatible formats
  • Migrate adapter configurations (IDoc, SOAP, REST, SFTP, JDBC, RFC, AS2, etc.) from the on-premise middleware layer to the cloud integration platform
  • Perform unit testing, integration testing, and support UAT for all migrated interfaces
  • Monitor, operate, and troubleshoot integration flows using cloud operations and observability tooling
  • Collaborate with functional business teams across Finance, Supply Chain, Sales, and Manufacturing, as well as middleware architects, to ensure business continuity throughout the migration lifecycle
  • Produce and maintain migration artifacts including mapping specifications, technical design documents, and runbooks

Job Requirements

  • 5+ years of hands-on experience with an enterprise on-premise middleware platform (process integration or process orchestration)
  • 2+ years of experience developing and operating cloud-based integration flows on a modern iPaaS platform
  • Proven ability to operate with a high degree of autonomy and minimal management oversight
  • Proficiency in Groovy scripting and XSLT for message transformation
  • Experience with standard enterprise adapters: IDoc, SOAP, REST, RFC, SFTP, JDBC
  • Solid understanding of integration patterns including point-to-point, publish-subscribe, and content-based routing
  • Familiarity with cloud business technology platforms and their associated services
  • Relevant cloud integration platform certification
  • Experience with API Management or event-driven messaging (e.g., Event Mesh or equivalent broker services)
  • Knowledge of ERP-to-cloud integration scenarios, particularly involving SAP environments
  • Exposure to migration assessment and readiness tooling for middleware modernization projects

More Solutions Engineer Jobs

Solution Engineer

Zuzeum Art Centre

Home of the Zuzāns Collection. The largest private collection of Latvian art in the world.

Full Time | Remote | Team 11-50 | Since 2020 | H1B No Sponsor

  • Join early-stage calls with Strategic Account Executives to lead technical discovery and demos.
  • Deliver compelling live demos tailored to customer pain points and use cases.
  • Translate customer pain points into relevant AI-powered solution examples.
  • Communicate the value and architecture of Unframe clearly and confidently.
  • Build credibility and trust with prospects while becoming a strategic partner in shaping their AI vision.
  • Share insights with Product, Marketing, and R&D teams based on real-world conversations.

Germany
Full Time | Remote | Team 1,001-5,000

Role Description

We are operating large-scale AI training and inference data centers, and we need an expert who can see the entire stack at once, from the chiller plant and switchgear to the GPU fabric and the Kubernetes scheduler. This role spans facilities/OT telemetry (cooling, power) and IT/AI infrastructure observability (compute, network, accelerators), unified by a single goal: complete, real-time, predictive visibility into how AI infrastructure consumes power, generates heat, moves data, and delivers compute.

You will design the observability platform that ingests signals from building and electrical systems, server and network fabrics, Kubernetes, and GPU/accelerator clusters, then apply AI/ML models on top of that telemetry to optimize utilization, predict failures, reduce energy cost, and surface insights operators can act on. You are equally comfortable reading a BACnet point list and a GPU NVLink topology, and you can explain to both facilities and platform teams why their data belongs in the same system.

Qualifications

  • 8+ years in infrastructure, SRE, observability, or data center engineering, with 3+ years in an architect or principal-level role.
  • Demonstrated experience designing and operating observability platforms at scale (metrics, logs, traces).
  • Expertise in Datadog, Dynatrace, Grafana, and Prometheus.
  • Hands-on experience integrating BMS and EPMS data, and a working understanding of data center mechanical and electrical systems (cooling topologies, power distribution, redundancy, capacity).
  • Strong systems monitoring background: Linux/server fleets, hardware health, baseboard management (IPMI/Redfish).
  • Strong network monitoring background, including high-performance / low-latency fabrics relevant to AI workloads. Expertise in SNMP and WMI.
  • Production experience with Kubernetes and observability of containerized workloads.
  • Experience operating or monitoring GPU / AI-accelerator clusters and understanding of distributed training/inference behavior.
  • Practical experience applying AI/ML models to operational data (anomaly detection, forecasting, or AIOps), and comfort using LLMs to derive insights and automate analysis.
  • Proficiency in at least one language for data/automation work (Python preferred), and infrastructure-as-code practices.

Responsibilities

  • Define and own the end-to-end observability architecture covering metrics, logs, traces, and events across facilities and IT domains.
  • Establish standards for instrumentation, telemetry pipelines, data retention, cardinality management, and a unified data model that lets power, thermal, network, and compute signals be correlated in one place.
  • Design for scale: hundreds of thousands of time series per site, high-frequency power and thermal sampling, and GPU-cluster-level granularity.
  • Integrate Building Management System (BMS) telemetry (CRAC/CRAH units, chillers, cooling loops, airflow, temperature/humidity, leak detection) into the central observability platform (BACnet, Modbus, MQTT, OPC-UA).
  • Integrate Electrical Power Monitoring System (EPMS) data (switchgear, UPS, PDUs, busways, branch-circuit metering, generators) for real-time power draw, capacity, and quality monitoring (Modbus, DNP3, IEC 61850).
  • Build correlated views of power and thermal behavior against compute workload so operators understand cause and effect (e.g., a training job's effect on rack power and inlet temperatures).
  • Partner with facilities engineering on PUE, capacity planning, stranded-power recovery, and thermal optimization.
  • Architect observability for AI/GPU clusters: accelerator utilization, memory pressure, thermals, ECC/Xid errors, power capping, and job-level efficiency (e.g., via NVIDIA DCGM, accelerator telemetry exporters).
  • Instrument Kubernetes environments running AI/ML workloads: cluster, node, pod, and workload metrics, scheduler behavior, GPU/accelerator allocation, and operator health.
  • Provide visibility into training and inference pipelines: throughput, queue depth, checkpoint behavior, straggler detection, and cost-per-token / cost-per-training-step metrics.
  • Surface noisy-neighbor, fragmentation, and underutilization patterns across multi-tenant clusters.
  • Design monitoring for high-performance data center fabrics, including the AI back-end network (RDMA, InfiniBand and/or RoCE Ethernet) and front-end/management networks.
  • Capture fabric health, congestion, link errors, latency, and bandwidth utilization using streaming telemetry, SNMP, gNMI/gRPC, NetFlow/sFlow, and fabric managers (e.g., InfiniBand UFM).
  • Correlate network behavior with distributed training performance to diagnose collective-communication bottlenecks.
  • Apply ML and AI models to the telemetry estate for anomaly detection, predictive maintenance, capacity forecasting, and automated root-cause analysis.
  • Build models and pipelines that recommend (or automate) actions: dynamic cooling and power optimization, workload placement, power capping under thermal/electrical constraints, and failure pre-emption.
  • Leverage LLMs and modern AI techniques to summarize incidents, accelerate root-cause investigation, query telemetry in natural language, and generate operator-facing insights from large volumes of logs and metrics.
  • Establish the feedback loop where observability data trains the models that, in turn, optimize the infrastructure being observed.
  • Act as the technical authority connecting facilities, network, platform, SRE, and AI/ML teams around a shared observability practice.
  • Define SLOs, alerting strategy, and on-call signal quality; drive down alert noise and mean-time-to-resolution.
  • Mentor engineers and set the technical direction for the observability roadmap.

Preferred Qualifications

  • Experience with tooling such as OpenTelemetry, VictoriaMetrics/Thanos, Loki, Tempo, Elastic, Splunk.
  • Familiarity with OT/industrial protocols: BACnet, Modbus, OPC-UA, DNP3, IEC 61850, MQTT.
  • Familiarity with GPU/accelerator telemetry (NVIDIA DCGM and exporters) and InfiniBand/RDMA monitoring (e.g., UFM).
  • Experience with network telemetry: gNMI/OpenConfig streaming, SNMP, NetFlow/sFlow.
  • Experience with time-series data at high cardinality, stream processing, and data lake/warehouse patterns for telemetry.
  • Background in MLOps, model deployment, or building data/feature pipelines for operational ML.
  • Exposure to power and cooling optimization, PUE improvement, or sustainability/energy-efficiency initiatives.
  • Relevant certifications (e.g., data center facilities, Kubernetes/CKA, cloud or networking) are a plus.

France

AI Platform Engineer – Integration

Nimble Gravity

Data Science, Digital Transformation and eCommerce Strategy from experienced eCommerce and AI/ML experts

Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Design and build integrations between internal and external systems such as Jira, Bitbucket, and AWS services using APIs and webhooks.
  • Develop automation workflows that connect project management tools, repositories, cloud services, and AI platforms.
  • Configure and manage AWS services, including Amazon SageMaker, Amazon Redshift, Amazon Bedrock, and related cloud infrastructure.
  • Set up and configure AI agents and agent-based workflows using AWS Bedrock as the LLM provider.
  • Write, maintain, and improve infrastructure as code using Pulumi and/or Terraform.
  • Support CI/CD workflows and repository automation in Bitbucket.
  • Work with Jira automation, webhooks, APIs, and event-driven workflows to streamline engineering and operational processes.
  • Collaborate with data, engineering, AI, and platform teams to understand requirements and deliver reliable technical solutions.
  • Document integrations, infrastructure components, runbooks, and operational procedures.
  • Troubleshoot issues across cloud infrastructure, integrations, AI services, and automated workflows.

Latin America

Integration Health Engineer

Zocdoc

Zocdoc is the beginning of a better healthcare experience for millions of patients every month.

Full Time | Remote | Team 501-1,000 | Since 2007 | H1B Sponsor

  • Full ownership of the continuous monitoring and health checks of system integrations.
  • Troubleshooting and resolving real-time integration errors, API failures, and data flow disruptions.
  • Building and fostering relationships with Account, DevOps, and IT teams to coordinate fixes and improvements.
  • Analyzing logs and alerts to determine the root cause of system failures and documenting resolutions.
  • Optimizing existing integrations for efficiency and enhanced performance.
  • Working with cutting-edge GenAI tools and technology.

New York
$30 - $48 / hour