Strategic open source infrastructure for containers and virtual machines.
AI Infrastructure & Platform Operations Engineer
Location
Poland
Posted
6 days ago
Salary
0
Seniority
Senior
Job Description
Mirantis
- Monitor, operate, and support production AI infrastructure platforms.
- Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
- Support NVIDIA GPU infrastructure and associated platform services.
- Monitor and troubleshoot Kubernetes-based environments.
- Investigate performance, availability, and reliability issues across infrastructure and platform components.
- Collaborate with engineering teams, hardware vendors, datacenter personnel, and service delivery teams to resolve technical issues.
- Participate in incident response, root cause analysis, and operational improvement activities.
- Contribute to improvements in monitoring, observability, automation, and operational processes.
- Maintain operational documentation, runbooks, and knowledge articles.
Job Requirements
- 3+ years of experience in infrastructure operations, platform operations, network operations, site reliability engineering, cloud operations, datacenter operations, or related technical roles.
- Strong Linux administration and troubleshooting skills.
- Good understanding of networking concepts and experience diagnosing infrastructure-related issues.
- Working knowledge of Kubernetes in production environments.
- Experience supporting production infrastructure and services.
- Strong analytical and problem-solving skills.
- Experience working within structured operational and incident management processes.
- Excellent communication and collaboration skills.
- Ability to work within a shift-based operational environment.
Benefits
- Work with some of the most advanced AI infrastructure environments in production today.
- Gain exposure to NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
- Help define how next-generation AI infrastructure is operated and supported.
- Be part of a team shaping the future of AI-powered operations through k0rdent AI.
- Join a growing organisation investing heavily in AI infrastructure and platform services.
More Platform Engineer Jobs
Software Engineer - SaaS Platform & Asset Management - Türkiye
JumpCloud
An open directory platform for secure, frictionless access from any device to any resource, anywhere
All roles at JumpCloud® are Remote unless otherwise specified in the Job Description.
About JumpCloud®
JumpCloud® is the AI-powered unified IT management platform designed to secure the modern workforce. By consolidating identity, device, and access management, JumpCloud provides intelligent, secure IT that scales from human users to autonomous AI agents. We help organizations around the globe eliminate complexity and turn AI risk into an optimized advantage, ensuring the right people and agents have secure access to the right resources at all times. JumpCloud is Intelligent, Secure IT.
Key Responsibilities
- Design, develop, and implement highly scalable backend services for our Asset & SaaS Management platform using Go (Golang).
- Architect and build high-throughput microservices using gRPC and Protobuf, ensuring low-latency communication in a high-traffic distributed system.
- Take ownership of the database layer using PostgreSQL; design efficient schemas, write complex optimized queries, and perform deep performance analysis to prevent bottlenecks.
- Implement and maintain comprehensive observability pipelines using tools like Datadog, Prometheus, or Grafana to monitor system health, trace requests, and proactively identify performance degradation.
- Design and implement asynchronous processing workflows for asset data synchronization using message brokers (e.g., Kafka, RabbitMQ, or AWS SQS).
- Manage and deploy containerized applications using Kubernetes, ensuring high availability and zero-downtime deployments.
- Collaborate with product managers to translate complex asset & SaaS management logic (lifecycle tracking, depreciation, audit trails) into robust technical solutions.
- Write clean, maintainable, and well-documented code, adhering to Go best practices and effective error handling.
- Participate in code reviews, providing constructive feedback and ensuring code quality, specifically looking for race conditions and memory leaks.
- Troubleshoot and debug production issues in a complex microservices architecture, utilizing distributed tracing and log analysis.
Qualifications
- 4-6+ years of professional experience in software development, with a strong focus on backend systems and infrastructure engineering.
- Strong proficiency in Go (Golang), with a deep understanding of goroutines, channels, and interface-based design patterns.
- Proven experience working on high-traffic, large-scale SaaS applications where performance and concurrency are critical.
- Deep expertise in PostgreSQL, including the ability to diagnose slow queries, optimize execution plans, and manage connection pools effectively.
- Strong experience implementing gRPC services and defining rigid contracts using Protocol Buffers.
- Hands-on experience with observability and APM tools (specifically Datadog, New Relic, or OpenTelemetry) to set up dashboards, alerts, and conduct root cause analysis.
- Extensive experience with the complete DevOps lifecycle, including Git version control, CI/CD pipelines (e.g., GitHub, GitLab CI, Jenkins), and infrastructure-as-code.
- Strong hands-on experience with Docker and Kubernetes for orchestrating services in a production environment.
- Familiarity with distributed caching strategies (e.g., Redis) to offload database pressure.
- Excellent problem-solving skills and the ability to work independently and as part of a team.
- Strong communication and interpersonal skills.
Preferred Skills
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Experience in API development and API integrations.
- Experience with cloud platforms (AWS or GCP).
- Experience with CDC (Kafka, Debezium, etc.).
- Experience with AI-assisted development tools (e.g., GitHub Copilot, Cursor).
- Contributions to open-source Go projects or libraries.
Where you’ll be working/Location: JumpCloud is committed to being Remote First, meaning that you are able to work remotely within the country noted in the Job Description. You must be located in and authorized to work in the country noted in the job description to be considered for this role.
Please note: There is an expectation that our engineers participate in on-call shifts. You will be expected to commit to being ready and able to respond during your assigned shift, so that alerts don't go unaddressed.
Language: JumpCloud has teams in 15+ countries around the world and conducts our internal business in English. The interview and any additional screening process will take place primarily in English. To be considered for a role at JumpCloud, you will be required to speak and write in English fluently. Any additional language requirements will be included in the details of the job description.
Why JumpCloud?
If you thrive working in a fast, SaaS-based environment and you are passionate about solving challenging technical problems, we look forward to hearing from you! JumpCloud is an incredible place to share and grow your expertise! You’ll work with amazing talent across each department who are passionate about our mission. We’re out-of-the-box thinkers, so your unique ideas and approaches for conceiving a product and/or feature will be welcome. You’ll have a voice in the organization as you work with a seasoned executive team, a supportive board and in a proven market that our customers are excited about.
One of JumpCloud's three core values is to “Build Connections.” To us that means creating "human connection with each other regardless of our backgrounds, orientations, geographies, religions, languages, gender, race, etc. We care deeply about the people that we work with and want to see everyone succeed." - Rajat Bhargava, CEO
Please submit your résumé and a brief explanation about yourself and why you would be a good fit for JumpCloud.
Please note JumpCloud is not accepting third party resumes at this time.
JumpCloud is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
Scam Notice: Please be aware that there are individuals and organizations that may attempt to scam job seekers by offering fraudulent employment opportunities in the name of JumpCloud. These scams may involve fake job postings, unsolicited emails, or messages claiming to be from our recruiters or hiring managers. Please note that JumpCloud will never ask for any personal account information, such as credit card details or bank account numbers, during the recruitment process. Additionally, JumpCloud will never send you a check for any equipment prior to employment. All communication related to interviews and offers from our recruiters and hiring managers will come from official company email addresses (@jumpcloud.com) and will never ask for any payment, fee to be paid or purchases to be made by the job seeker. If you are contacted by anyone claiming to represent JumpCloud and you are unsure of their authenticity, please do not provide any personal/financial information and contact us immediately at recruiting@jumpcloud.com with the subject line "Scam Notice".
#LI-Remote #BI-Remote
Platform Engineer
DTCC - Depository Trust and Clearing Corporation
DTCC, which stands for Depository Trust and Clearing Corporation, is a leading financial services company providing secure, efficient, and transparent post-trad
Design and maintain scalable AI/ML platforms using AWS services, manage AWS EMR clusters for data processing, and develop MLOps pipelines for model deployment, ensuring system reliability and compliance across environments.
Senior Power Platform Developer-Remote
BAE Systems, Inc.
Improving the future and protecting lives is an ambitious mission, but it’s what we do. As a leading aerospace, defense, and security company, we work together to deliver a full range of products and services for air, land, space, and naval forces, as well as advanced electronics, security, information technology solutions and customer support services. How we work is rooted in purpose – a purpose to protect those who protect us, to unite our community of colleagues and customers, and to drive forward the growth and development of our exceptional team members. It's where purpose connects.
Job Description
Seeking a Senior Power Platform Developer to serve as a technical authority and hands-on builder within our enterprise Microsoft 365 environment. This role requires someone who can own the full development lifecycle of complex Power Apps and Power Automate solutions, architect REST API integrations, and help lead and govern our Power Platform Center of Excellence (CoE). The right candidate is as comfortable writing intricate flow logic as advising stakeholders, mentoring peers, and enforcing platform governance at scale. Experience operating in classified or government environments is expected, and an active Top Secret clearance is required before start.
Required Education, Experience, & Skills
KEY RESPONSIBILITIES
- Architect, design, and deliver complex Power Apps solutions, including canvas apps, multi-screen enterprise applications, and reusable components supporting mission-critical operations.
- Build and maintain advanced Power Automate workflows using complex conditional logic, parallel execution paths, error handling, retry logic, and reusable child flows.
- Design, develop, and consume REST APIs and custom connectors to integrate Power Platform solutions with external systems, Azure services, third-party data sources, and enterprise applications.
- Support complex integrations involving SharePoint, Dataverse, SQL/Azure DB, Microsoft Graph API, and other Microsoft 365 connectors.
- Support the Power Platform Center of Excellence, including DLP policies, environment strategy, ALM practices, solution packaging, and governance across Dev/Test/Prod environments.
- Conduct solution reviews, code walkthroughs, and architecture assessments to ensure solutions meet security, compliance, and production-readiness standards.
- Drive adoption of reusable solution patterns, shared component libraries, and documented development standards.
- Monitor platform health, automation reliability, and capacity utilization, and proactively identify and resolve performance issues.
- Partner with stakeholders, business analysts, and project managers to translate mission requirements into scalable, production-ready Power Platform solutions.
- Mentor junior and mid-level developers through code reviews, pair development, and knowledge-sharing.
- Create and maintain technical documentation, SOPs, solution design records, and governance artifacts.
Required Qualifications:
- Active Top Secret clearance.
- 6+ years of hands-on Power Apps and Power Automate development experience in an enterprise government environment.
- Proven experience building and consuming REST APIs and custom connectors within the Power Platform.
- Demonstrated experience architecting complex, multi-system workflows with advanced logic and error handling.
- Experience contributing to or supporting a Power Platform Center of Excellence.
- Strong experience with Microsoft 365, SharePoint, Dataverse, and enterprise application integrations.
- Ability to partner with technical and non-technical stakeholders to gather requirements and deliver production-ready solutions.
Preferred Education, Experience, & Skills
Preferred Qualifications
- Experience with PowerShell scripting in an M365 enterprise environment, including SharePoint Online administration using PnP PowerShell.
- Experience with Power Apps and Power Automate admin modules.
- Experience with Exchange Online management through the ExchangeOnlineManagement module.
- Experience developing and supporting Power Pages portals for external-facing web experiences or citizen engagement use cases.
- Experience designing and developing Model-Driven Apps using Dataverse tables, business rules, relationships, and security roles.
Pay Information
Full-Time Salary Range: $97,008 - $164,914
Please note: This range is based on our market pay structures. However, individual salaries are determined by a variety of factors including, but not limited to: business considerations, local market conditions, and internal equity, as well as candidate qualifications, such as skills, education, and experience.
Employee Benefits: At BAE Systems, we support our employees in all aspects of their life, including their health and financial well-being. Regular employees scheduled to work 20+ hours per week are offered: health, dental, and vision insurance; health savings accounts; a 401(k) savings plan; disability coverage; and life and accident insurance. We also have an employee assistance program, a legal plan, and other perks including discounts on things like home, auto, and pet insurance. Our leave programs include paid time off, paid holidays, as well as other types of leave, including paid parental, military, bereavement, and any applicable federal and state sick leave. Employees may participate in the company recognition program to receive monetary or non-monetary recognition awards. Other incentives may be available based on position level and/or job specifics.
About BAE Systems Intelligence & Security
BAE Systems, Inc. is the U.S. subsidiary of BAE Systems plc, an international defense, aerospace and security company which delivers a full range of products and services for air, land and naval forces, as well as advanced electronics, security, information technology solutions and customer support services. Improving the future and protecting lives is an ambitious mission, but it's what we do at BAE Systems. Working here means using your passion and ingenuity where it counts - defending national security with breakthrough technology, superior products, and intelligence solutions. As you develop the latest technology and defend national security, you will continually hone your skills on a team making a big impact on a global scale. At BAE Systems, you'll find a rewarding career that truly makes a difference.
Intelligence & Security (I&S), based in McLean, Virginia, designs and delivers advanced defense, intelligence, and security solutions that support the important missions of our customers. Our pride and dedication show in everything we do, from intelligence analysis, cyber operations and IT expertise to systems development, systems integration, and operations and maintenance services. Knowing that our work enables the U.S. military and government to recognize, manage and defeat threats inspires us to push ourselves and our technologies to new levels.
This position will be posted for at least 5 calendar days. The posting will remain active until the position is filled, or a qualified pool of candidates is identified.
Senior Director of Platform Engineering
NRG
NRG Energy is committed to a drug and alcohol-free workplace. To the extent permitted by law and any applicable collective bargaining agreement, employees are subject to periodic random drug testing, and post-accident and reasonable suspicion drug and alcohol testing. EOE AA M/F/Protected Veteran Status/Disability. Level, Title and/or Salary may be adjusted based on the applicant's experience or skills. EEO is the Law Poster (The poster can be found at http://www.eeoc.gov/employers/upload/poster_screen_reader_optimized.pdf). Official description on file with Talent.
Role Description
Vivint is seeking a Senior Director of Platform Engineering to lead the backend engineering teams that power our smart home and energy management platforms. Reporting to the SVP of Engineering, this leader owns Vivint's backend cloud services and APIs, core infrastructure, data and streaming pipelines, and the integrations that connect Vivint to other parts of the business and to strategic third-party partners. The Platform organization delivers the services that millions of Vivint customers depend on every day.
Primary Responsibilities
- Leadership & Team Development
  - Lead, mentor, and grow a high-performing organization of software engineers and SREs.
  - Foster a culture of reliability, operational excellence, and engineering rigor.
  - Define career paths, set performance expectations, and invest in the growth of every engineer.
  - Partner with recruiting to attract top platform engineering talent.
- Technical Strategy & Architecture
  - Set the multi-year technical direction and roadmap for the Platform organization, balancing near-term delivery with long-term technical health.
  - Define the architectural vision for Vivint's cloud platform, ensuring it is resilient, secure, cost-efficient, and scalable for future growth.
  - Drive modern platform practices: CI/CD, infrastructure-as-code, observability, and automation that improve backend delivery velocity and quality.
- Backend Services & Reliability
  - Own the reliability, availability, and performance of the backend services that power the Vivint smart home experience.
  - Establish and enforce SLOs, error budgets, and incident management practices that reflect Vivint's commitment to customer uptime.
  - Manage infrastructure cost-effectively as the platform grows.
Qualifications
- 10+ years of software engineering experience, with 5+ years in engineering leadership roles overseeing backend or SRE teams.
- Deep expertise in cloud and data center architecture, distributed systems, and large-scale backend service design.
- Strong background in container orchestration and modern CI/CD practices.
- Experience establishing SLOs, on-call practices, and incident response programs in a 24/7 production environment.
- Proven ability to define and execute multi-year platform roadmaps in a cloud or IoT product company.
- B.S. in Computer Science, Software Engineering, or related field.
- Some travel is required.
Preferred Qualifications
- Experience building backends for IoT or connected-device platforms.
- Familiarity with major cloud providers.
- Experience with Kubernetes and container orchestration.
- Familiarity with streaming and messaging technologies (Kafka, RabbitMQ, MQTT).
- Experience operating MongoDB at scale.
- Background in platform security including API security, secrets management, and zero-trust networking.
Benefits
- Paid holidays and flexible paid time away
- Employee/Friends/Family Discounts
- Medical/dental/vision/life coverage
- 401(k) + Employer Match
- Employee Resource Groups
- Annual Bonus
- Employee Stock Purchase Plan
- Quarterly Innovation Weeks
Working Conditions
This job operates in a professional office environment. This role routinely uses standard office equipment.
Safety
Vivint enforces a safety culture whereby all employees have the responsibility for continuously developing and maintaining a safe working environment. Each new employee is responsible for completing all training requirements. Additionally, the employee must accept they have responsibility for maintaining the safety of themselves, their co-workers, and the public. Employees must adhere to all written and verbal instructions, promptly report and correct all hazards or unsafe conditions, question non-standard operations or unmitigated hazards, and provide feedback to management on all safety issues.
NRG Energy is committed to a drug and alcohol-free workplace. To the extent permitted by law and any applicable collective bargaining agreement, employees are subject to periodic random drug testing, and post-accident and reasonable suspicion drug and alcohol testing. EOE AA M/F/Vet/Disability. Level, Title and/or Salary may be adjusted based on the applicant's experience or skills.
The base salary range for this position is: $228,880 - $377,640. The base salary range above represents the low and high end of the salary range for this position. Actual salaries will vary based on several factors including but not limited to location, experience, and performance. The range listed is just one component of the total compensation package for employees. Other rewards may include annual bonus, short- and long-term incentives, and program-specific awards. In addition, the position may be eligible to participate in the benefits program, which includes, but is not limited to, medical, vision, dental, 401K, and flexible spending accounts.


