Together AI

The future of AI is open-source. Let's build together.

Infrastructure Vendor Ops Manager

DevOps Engineer | Full Time | Remote | Lead | Team 11-50 | H1B No Sponsor

Location

United States

Posted

16 days ago

Salary

$170K - $200K / year

Seniority

Lead

Job Description

Role Description

Together AI is scaling its GPU infrastructure rapidly, working with a growing network of compute suppliers. As we expand, we need someone who owns the operational and financial accountability layer of our vendor relationships:

- Tracking SLA compliance
- Managing credits
- Auditing invoices
- Ensuring every dollar spent on compute is accurate and accounted for

This role sits within the Infrastructure Strategy team and is highly cross-functional, working with infrastructure engineering, finance, and go-to-market teams. When incidents happen, our engineering team produces root-cause analyses; your job is to take that technical detail, build an airtight case for credit claims, and negotiate directly with providers until credits are recovered. You will also partner with GTM and finance to assess the downstream impact of service disruptions and inform how we handle customer-facing commitments. This requires someone with sharp attention to detail, comfort navigating technical documentation, and the persistence to hold vendors accountable.

Responsibilities

- SLA tracking and credit recovery across all GPU compute and data center suppliers, including monitoring uptime and performance commitments, documenting violations, and driving credit claims to resolution
- Invoice review and validation for compute infrastructure contracts, flagging discrepancies and resolving billing issues directly with vendors
- Regular audits of vendor contracts and SLA performance to verify accuracy of charges and identify cost recovery opportunities
- Using root-cause analyses prepared by the infrastructure engineering team to build the case for SLA credits, then negotiating directly with providers to recover them
- Partnering with GTM and finance to assess the downstream impact of supplier service disruptions and provide the data needed to inform customer-facing remediation decisions
- Building tracking systems and dashboards for vendor financial data, SLA metrics, and credit status across the supplier portfolio, using modern tooling and AI-assisted workflows where possible
- Cross-functional coordination with procurement, legal, and finance to ensure contract terms are properly reflected in billing and that SLA remedies are enforced
- Historical spend analysis and cost forecasting to support operating plan development and infrastructure budget planning
- Process development for invoice review, SLA monitoring, and vendor financial operations as the function scales

Qualifications

- 4+ years of experience in vendor operations, technical program management, or contract compliance in a technology infrastructure, cloud, or data center environment
- Direct experience managing SLA credit processes, invoice reconciliation, and vendor performance tracking with infrastructure or cloud providers
- Extreme attention to detail. You catch discrepancies others miss, whether in an invoice, a vendor SLA report, or a contract clause
- Enough technical fluency to read postmortems and incident reports, understand the engineering context, and translate that into a compelling case for credit recovery
- Strong negotiation skills and persistence in vendor-facing conversations, especially when disputing charges or arguing for SLA credits
- Proficiency with project management and financial tracking tools (e.g., Linear, JIRA, NetSuite, or similar). Comfort using AI tools to accelerate workflows

Nice to Have

- Experience with GPU compute or cloud infrastructure vendors specifically (colocation providers, cloud service providers, or hardware OEMs)
- Background in building vendor operations processes from scratch at a fast-growing company
- Familiarity with data center contract structures, including power and cooling pass-throughs, metered billing, and committed-use pricing

Benefits

- Competitive compensation
- Startup equity
- Health insurance
- Flexibility in terms of remote work
- US base salary range for this full-time position: $170-200K + equity + benefits

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

More DevOps Engineer Jobs

Talent Attraction Ops | Mid Analyst

EBANX

Leading payment partner for global companies in rising markets.

DevOps Engineer | 16 days ago
Full Time | Remote | Team 501-1,000 | Since 2012 | H1B No Sponsor

Role Description

We are looking for a Talent Attraction Ops Mid Analyst to support and optimize our Talent Attraction operations while delivering an exceptional candidate experience throughout the hiring journey. This person will play a key role in ensuring operational excellence across recruitment processes, acting as a bridge between candidates, recruiters, hiring managers, and external partners. The role combines candidate experience, recruitment operations, analytics, and vendor management in a fast-paced and global environment. The ideal candidate is organized, proactive, detail-oriented, data-driven, and comfortable interacting with stakeholders from different countries and cultures.

What you will do

- Candidate Experience & Recruitment Coordination
  - Manage interview scheduling across multiple time zones and stakeholders
  - Support candidates throughout the recruitment process, answering questions and ensuring clear communication
  - Ensure an excellent end-to-end candidate experience aligned with EBANX employer branding standards
  - Partner closely with recruiters and hiring managers to guarantee smooth hiring processes
- Talent Attraction Operations
  - Support operational routines related to Talent Attraction processes and tools
  - Manage recruitment platforms/tools, process governance within ATS (Greenhouse)
  - Handle processes related to vendor payments and operational controls
- Analytics & Reporting
  - Track and analyze Talent Attraction metrics such as:
    - Time to Fill
    - Time to Hire
    - Diversity metrics
    - Funnel conversion rates
    - SLA and operational KPIs
  - Build reports and dashboards to support decision-making
  - Identify process improvement opportunities through data analysis
  - Support strategic projects focused on operational efficiency and scalability

Qualifications

- Advanced to fluent English skills (written and spoken)
- Previous experience in Talent Attraction, Recruitment Operations or Employer Branding
- Analytical mindset with proven experience handling metrics and reports
- Attention to detail and ability to manage multiple priorities simultaneously
- Experience working with ATS platforms and recruiting tools (Greenhouse is a plus)
- Experience managing vendors, invoices, and operational routines is a plus

Benefits

- Meal Allowance: Monthly allowance to support your meals.
- EBANX Education: Financial assistance for undergraduate, graduate, and MBA programs to support your professional growth.
- EBANX Skills: Dedicated budget for courses, certifications, and workshops to encourage continuous learning.
- Language Classes: Language classes to support your personal and professional development.
- Health & Well-being: Medical and dental plans with extensive coverage, including support for dependents and wellness programs.
- Flexible Work Culture: Semi-flexible hours, additional day off on your birthday, and year-end break to support work-life balance.
- Well-being Program: Access to activities and resources that promote physical and mental health.

United States
Job Closed

DevOps Engineer

Pennylane

The Financial OS for accounting firms and business owners

DevOps Engineer | 16 days ago
Full Time | Remote | Team 501-1,000 | Since 2020 | H1B No Sponsor

• Promote and simplify Site Reliability Engineering (SRE) practices across our engineering organization (300+ engineers).
• Proactively identify opportunities to improve reliability and performance, not just react to incidents.
• Define and evolve performance, metrics, and observability standards (SLIs/SLOs, monitoring frameworks, alerting strategies).
• Scale production resources efficiently while ensuring resilience and cost awareness.
• Design, implement, and continuously improve tooling and monitoring systems.
• Contribute to engineering guidelines and best practices for reliability and production excellence.
• Collaborate closely with Product, Backend, Security, and Platform stakeholders to embed reliability by design.

France
Job Closed

Senior Site Reliability Engineer, Infrastructure Foundations

Wikimedia Foundation

Imagine a world in which every single human being can freely share in the sum of all knowledge.

DevOps Engineer | 16 days ago
Full Time | Remote | Team 501-1,000 | Since 2003 | H1B Sponsor

• Performing day-to-day operational/DevOps tasks on Wikimedia’s public-facing infrastructure (deployment, maintenance, configuration, troubleshooting)
• Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
• Leading continuous improvement by automating the installation, configuration, and maintenance of services on our platform
• Working closely with product teams to help them bring scalable functionality to our users, by assisting in the architectural design of new services and making them operate at scale
• Participating in a 24/7 on-call rotation shared across the broader SRE team. This includes taking part in incident response, diagnosis, and follow-up on system outages or alerts across Wikimedia’s production infrastructure.
• Collaborating with a global, cross-functional team in an asynchronous communication environment
• Mentoring peers in your areas of technical and operational strength
• Ability and willingness to travel 1-2 times a year for in-person events and team meetings

All locations: Arizona | California | Colorado | Connecticut | District Of Columbia | Florida | Idaho | Illinois | Iowa | New Jersey | New Mexico | New York | North Carolina | Ohio | Oklahoma | Oregon | Maryland | Massachusetts | Michigan | Minnesota | Missouri | Pennsylvania | Rhode Island | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming
$113.1K - $175.7K / year
Full Time | Remote | Team 201-500 | H1B No Sponsor

• Demonstrating and implementing modern security solutions in customer projects
• Implementing security processes and tools in the cloud, especially in Kubernetes environments
• Developing and maintaining custom API scripts in Python to automate and manage security tools
• Supporting customers in short-term implementation projects and providing long-term support for follow-up questions
• Working closely within a specialized five-person security team

Germany
Job Closed