Mirantis

Strategic open source infrastructure for containers and virtual machines.

Senior AI Infrastructure, Platform Operations Engineer

Platform Engineer | Full Time | Remote | Senior | Team 501-1,000 | H1B Sponsor

Location

Poland

Posted

4 days ago

Salary

0

Seniority

Senior

Bachelor Degree | 7 yrs exp | English | Cloud | Distributed Systems | Kubernetes | Linux

Job Description

  • Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
  • Act as a senior escalation point for operational teams during critical service-impacting events.
  • Support large-scale NVIDIA GPU infrastructure and high-performance networking environments.
  • Troubleshoot complex Linux, Kubernetes, networking, storage, and hardware-related issues.
  • Analyze platform performance, capacity, stability, and reliability trends to proactively identify risks.
  • Lead root cause analysis activities and drive long-term corrective actions.
  • Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve complex technical challenges.
  • Participate in major incident management and service restoration activities.
  • Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
  • Drive improvements in platform reliability, observability, monitoring, and operational processes.
  • Identify opportunities to automate repetitive operational activities and improve operational efficiency.
  • Contribute to operational readiness reviews, infrastructure changes, upgrades, and service introductions.
  • Support the adoption and operation of AI-powered infrastructure services and operational capabilities through k0rdent AI.
  • Evaluate emerging technologies and operational practices to improve service delivery and platform resilience.
  • Mentor and support AI Infrastructure & Platform Operations Engineers.
  • Share technical knowledge through documentation, training sessions, and operational reviews.
  • Develop and maintain operational standards, runbooks, troubleshooting guides, and best practices.
  • Help define operational processes, escalation paths, and service reliability standards.
  • Act as a trusted technical advisor during operational planning and service improvement initiatives.

Job Requirements

  • 7+ years of experience in infrastructure operations, platform operations, site reliability engineering, network operations, cloud operations, datacenter operations, or related technical roles.
  • Expert-level Linux administration and troubleshooting skills.
  • Strong networking expertise, including experience diagnosing complex performance, connectivity, and reliability issues.
  • Strong experience operating Kubernetes in production environments.
  • Experience supporting large-scale production infrastructure and distributed systems.
  • Proven experience leading technical investigations and managing complex incidents.
  • Experience performing root cause analysis and driving long-term operational improvements.
  • Strong understanding of observability, monitoring, and service reliability practices.
  • Excellent troubleshooting and analytical skills across multiple infrastructure domains.
  • Strong communication, collaboration, and stakeholder management skills.

Benefits

  • Operate some of the most advanced AI infrastructure environments in production today.
  • Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
  • Help define operational standards and reliability practices for next-generation AI infrastructure services.
  • Influence the adoption of AI-powered operational capabilities through k0rdent AI.
  • Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
  • Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.

More Platform Engineer Jobs

Senior Quadient Inspire Platform Integration Engineer

UnitedHealth Group

UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of

Role Description

Provide dedicated, long-term engineering ownership of core Quadient and Groovy-based processing, onboarding workflows, and shared communication components. Act as a persistent platform capability enabling scalable onboarding and consistent communications delivery across the enterprise. You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities

  • Develop, enhance, and maintain Groovy scripts, Quadient business logic, templates, and shared data-alignment components used across CCM compositions.
  • Support onboarding workflows by configuring communication setups, data mappings, template builds, and application of standardized composition patterns.
  • Partner with TPOs, business stakeholders, and implementation teams to translate requirements into reusable, scalable technical solutions.
  • Support production readiness by assisting with communication reviews, approvals, and readiness validation.
  • Troubleshoot communication build issues, routing and orchestration logic defects, variance detection failures, and batch/job-processing problems.
  • Serve as a floating technical resource to support and train new onboarding teams as they come online (tools, processes, template operations).
  • Contribute to ongoing documentation, playbook refinement, and continuous improvement initiatives to strengthen platform stability and reuse.
  • Design, develop, and deploy AI-powered solutions to address complex business challenges with emphasis on responsible use of AI.

Qualifications

  • 3+ years of experience in developing and maintaining Groovy scripts, Quadient business rules, templates, and shared components.

Requirements

  • Undergraduate degree or equivalent experience.
  • 5+ years of hands-on experience using the Quadient Suite of tools (e.g., Quadient Inspire, Interactive, Scaler) in an enterprise CCM environment.
  • Proven experience supporting end-to-end communication onboarding, including data mappings, template configuration, and application of standardized composition patterns.
  • Experience troubleshooting CCM build issues, routing/orchestration logic, variance detection, and batch or job-processing failures.
  • Demonstrated ability to translate business and technical requirements into reusable, scalable platform solutions.
  • Experience working closely with Product Owners/TPOs, business stakeholders, and implementation teams in an Agile or hybrid delivery model.
  • Familiarity with production readiness processes, communication reviews, approvals, and release validation.
  • Experience supporting or mentoring onboarding teams and contributing to platform documentation, playbooks, and continuous improvement initiatives.

Benefits

  • Comprehensive benefits package.
  • Incentive and recognition programs.
  • Equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements).
  • Salary range from $91,700 to $163,700 annually based on full-time employment.

United States
$91.7K - $163.7K / year

Database Platform Distinguished Engineer

Dayforce

Dayforce is a global HCM platform offering a comprehensive array of services encompassing payroll, HR, benefits, workforce management, talent, and analytics. With the mission of "m

Role Description

We are redefining the data foundation that powers Dayforce. As a Distinguished Database Platform Engineer, you will set the technical direction for how we design, scale, and operate our data platforms, spanning transactional systems and modern lakehouse architectures. This is a hands-on architectural role for a technical leader who thrives on solving complex data challenges, influencing engineering at scale, and modernizing data platform engineering through cloud, automation, and AI.

  • Define and drive the architecture of highly scalable, resilient data platforms across OLTP and analytical systems
  • Lead the evolution of Lakehouse architecture, supporting unified data and analytics
  • Own performance, reliability, and scalability across critical database platforms through deep, hands-on expertise
  • Guide adoption of modern platforms and tooling (e.g., Databricks, cloud-native services, automation frameworks)
  • Partner with engineering and data teams to shape data models, access patterns, and platform capabilities
  • Influence technical strategy and raise engineering standards across the organization
  • Lead hands-on investigation and resolution of complex customer database and data lake performance, reliability, and data consistency issues, including direct analysis of production systems when needed

Qualifications

  • Expert-level experience with large-scale database platforms (e.g., SQL Server, PostgreSQL) in production environments
  • Strong background in modern data architecture and distributed data systems
  • Experience with Databricks and cloud data platforms (AWS, Azure, or GCP)
  • Familiarity with both relational and NoSQL systems (e.g., MongoDB)
  • Deep expertise in performance tuning, scalability, and high-availability design
  • Proven ability to operate as a technical leader, setting direction while remaining hands-on
  • Comfort leveraging AI and automation to improve database platform operations and engineering productivity
  • 10+ years of experience designing, operating, and optimizing large-scale production database platforms
  • Deep hands-on expertise with enterprise relational database technologies such as Microsoft SQL Server and PostgreSQL
  • Strong experience with distributed data systems and modern data architectures, including Data Lake, Delta Lake, or Lakehouse platforms
  • Experience troubleshooting complex database performance, scalability, reliability, and data consistency issues in production environments
  • Experience with cloud platforms and cloud-native data services (AWS, Azure, or GCP)
  • Strong understanding of high availability, disaster recovery, backup/recovery, indexing, query optimization, performance tuning, Change Data Capture, and Event Driven systems
  • Experience working with automation, observability, and infrastructure-as-code approaches for platform operations
  • Proven ability to influence technical direction and collaborate effectively across engineering, infrastructure, and data teams
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience

Requirements

  • Experience with Databricks and modern Lakehouse implementations supporting unified analytics and operational workloads
  • Experience with NoSQL technologies such as MongoDB or other distributed document/key-value databases
  • Experience designing and operating platforms that support both transactional and analytical workloads at enterprise scale
  • Experience modernizing legacy database environments into cloud-native or hybrid architectures
  • Demonstrated technical leadership in architecting shared platform capabilities, standards, and engineering best practices
  • Experience leveraging AI-assisted tooling or automation to improve operational efficiency, troubleshooting, and engineering productivity
  • Experience mentoring senior engineers and leading cross-functional technical initiatives
  • Experience designing secure data platforms with strong governance and access-control models
  • Familiarity with compliance frameworks and data residency requirements (SOC2, GDPR, HIPAA, etc.)
  • Expertise with encryption, secrets management, auditing, and zero-trust approaches
  • Advanced cloud, database, or data engineering certifications are considered an asset
  • Experience in large-scale SaaS, HCM, fintech, or other highly regulated enterprise software environments is an asset

Benefits

  • Medical, dental, vision, and life insurance
  • 401k plan (plus match)
  • Global Employee Stock Purchase Plan
  • Unlimited Time Away From Work (in lieu of accrued vacation time)
  • 10 paid US holidays
  • Up to 80 hours of paid sick time
  • 17 weeks of paid parental leave, subject to the terms of the applicable policy or program
  • Opportunities for community impact, including volunteer days and charity initiatives

United States
$140.9K - $251.6K / year

Data Platform Engineer

Xcellent Technology Solutions (XTS)

A Leader in Geospatial, IT and Program Management Services

Full Time | Remote | Team 51-200 | Since 2005 | H1B No Sponsor

  • Help design, support, and evolve the data infrastructure for environmental research.
  • Work closely with federal stakeholders and technical teams to translate data orchestration needs.
  • Manage PostgreSQL databases, optimize SQL-based workflows, and support Python-driven automation and CI/CD pipelines.
  • Contribute to improving observability, reliability, and operational performance across data pipelines.
  • Help engineer the platform that keeps high-value environmental data moving, accessible, and trusted.

United States
$70K - $80K / year
Full Time | Remote | H1B No Sponsor

  • Execute and extend the engineering patterns established by the Snowflake Platform Lead and Senior Data Platform Engineer.
  • Take responsibility for the day-to-day build, deployment, and operation of Snowflake objects, pipelines, and tooling.
  • Own implementation of Snowflake objects and roles in Terraform per established patterns.
  • Operate CI/CD pipelines for database changes day to day.
  • Configure and maintain assigned ingestion pipelines (Snowpipe, Streams/Tasks, managed connectors).
  • Provide first-line response to platform alerts on assigned components.
  • Update documentation and runbooks for work this role delivers.
  • Contribute to architecture and pattern design decisions.

California | Florida
$110K - $130K / year
Job Closed