Job Closed
This listing is no longer active.
We put the power in your hands to buy, sell, and trade digital currency 🌏
Senior AI Compute Infrastructure Engineer
Location
United States
Posted
38 days ago
Salary
$127.2K - $254.4K / year
Seniority
Senior
Job Description
Kraken Digital Asset Exchange
- Own and operate GPU and accelerator clusters
- Design infrastructure for local model execution
- Build and improve scheduling and orchestration systems
- Optimize inference pipelines
- Partner with ML engineers to remove bottlenecks
Job Requirements
- 5+ years of infrastructure engineering experience
- Hands-on experience operating GPU clusters
- Strong systems engineering fundamentals
- Experience with ML serving frameworks
- Proficiency in Python for infrastructure automation
Benefits
- Bonus program
- Equity program
- Wellness allowance
- Health insurance



