Job Closed
This listing is no longer active.
Principal Kubernetes Platform Engineer
Location
United States
Posted
112 days ago
Salary
$0
Seniority
Senior
Job Description
TensorWave
Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are seeking a Principal Platform Engineer to lead the design, development, and deployment of our next-generation Kubernetes platform. In this role, you will define what production excellence looks like at scale: a global, self-healing, autoscaling Kubernetes platform with strong observability, security, and cost efficiency, capable of supporting millions of users.

As a technical leader and hands-on architect, you will build and evolve cloud-native and serverless systems on Kubernetes, writing complex manifests, operators, and controllers from scratch. You will set standards and best practices across the company, ensuring platform tooling is well-documented, reliable, and continuously improved, while enabling developer teams to deploy applications with speed, confidence, and minimal friction.
Responsibilities

• Architect and implement end-to-end Kubernetes infrastructure for large-scale, cloud-native applications
• Design and build serverless platforms on top of Kubernetes using technologies such as Knative, OpenFaaS, or KEDA
• Develop and maintain Kubernetes custom resources (CRDs), controllers, operators, and admission controllers in Go or Python
• Define multi-tenant, multi-region architecture supporting millions of users with high availability and low latency
• Lead Kubernetes cluster lifecycle management - provisioning, upgrades, scaling, monitoring, troubleshooting
• Collaborate closely with engineering teams to containerize applications, write Helm charts or Kustomize overlays, and standardize deployment practices
• Implement infrastructure as code using tools like Terraform, Pulumi, or Crossplane
• Lead efforts around observability, policy enforcement, cost optimization, and RBAC/security hardening within the cluster
• Evaluate and integrate Kubernetes ecosystem tools - Istio/Linkerd, ArgoCD, Flux, Prometheus, Grafana, OPA
• Mentor and upskill DevOps engineers and SREs in Kubernetes best practices

Required Experience

• Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
• 8+ years of experience in cloud infrastructure, DevOps, or platform engineering roles
• 8+ years of hands-on Kubernetes experience, including deep knowledge of the Kubernetes API, internals, networking, and storage
• Proficiency in writing Kubernetes manifests, Helm charts, and custom Kubernetes controllers/operators
• Proven experience designing cloud-native systems that scale globally - multi-region, multi-cloud or hybrid setups
• Experience with serverless technologies in production - Knative, OpenFaaS, AWS Lambda
• Strong knowledge of cloud platforms such as AWS, GCP, or Azure
• Experience with GitOps tools - ArgoCD, Flux
• Deep understanding of security, compliance, and resilience in containerized workloads

Preferred Experience
• Contributions to Kubernetes open-source projects or CNCF-related tooling
• Experience with service mesh design (Istio, Linkerd)
• Familiarity with eBPF, Cilium, or network-level observability
• Background in building PaaS or developer platforms on top of Kubernetes

What We Bring

• Mission driven company
• Competitive Salary
• Stock Options
• 100% paid Medical, Dental, and Vision insurance
• Flexible PTO
• Paid Holidays
• 401(k)
• Parental Leave
• Flexible Spending Account
• Short Term Disability Insurance
• Life and Voluntary Supplemental Insurance
• Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at Exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.