The future of AI is open-source. Let's build together.
Forward Deployed Engineer
Location
United States
Posted
34 days ago
Salary
$270K - $300K / year
Seniority
Mid Level
Job Description
Together AI
Role Description
As a Forward Deployed Engineer (FDE) focused on large-scale GPU clusters, you will be a hands-on technical partner to our strategic customers – the world’s leading AI model builders. You will partner with our SAs as a deep-domain specialist in large-scale infrastructure, storage, high-performance networking, and cluster orchestration. As key contributors to the CX, Engineering, and Sales organizations, FDEs add tremendous value by ensuring we can meet the requirements of our most complex POCs, facilitate successful platform adoption for our strategic customers, and guide tailored optimization efforts, directly impacting company growth and the hardening of our core platform.
Responsibilities
- Cluster Hardening & Validation: Design and execute rigorous pre-handover test suites (NCCL, DCGM, GPU Burn) to ensure clusters are stable under the extreme stress of multi-node training.
- Technical Partnership: Act as the primary technical point of contact for model labs, helping them tune their orchestration layer (Kubernetes or SLURM) for maximum throughput.
- Infrastructure Optimization: Profile and debug low-level bottlenecks in InfiniBand (IB) fabrics, NVLink topologies, and high-performance storage systems.
- Opinionated Onboarding: Build reference designs and "out-of-the-box" configurations for training frameworks to reduce customer time-to-train.
- Benchmarking & Migration: Lead complex benchmarking exercises to demonstrate the performance impact of migrating to new hardware families or Together AI’s optimized infrastructure.
- Product Feedback Loop: Directly influence our hardware and software roadmap by surfacing edge cases and performance gaps found during customer deployments.
Qualifications
- Experience: 5+ years in a technical role, with a strong focus on large-scale GPU infrastructure.
- Orchestration Mastery: Deep, hands-on experience with Kubernetes (specifically GPU-operator and device plugins) and/or SLURM for workload scheduling.
- Networking & Interconnects: Expert knowledge of InfiniBand, RoCE, and NVLink; ability to diagnose network failures that degrade collective communication (NCCL).
- Storage Knowledge: Familiarity with parallel file systems (VAST or Weka preferred) and object storage, specifically in the context of large-scale checkpointing.
- Benchmarking Skills: Ability to run and interpret training benchmarks and communication tests to validate cluster health and performance.
- Coding & Automation: Proficiency in Python and shell scripting; experience with Ansible or similar tools for automated cluster configuration.
- Willingness to dive into the customer's stack to solve hard problems, and comfort with the high-stakes, fast-paced environment of frontier model labs.
Benefits
- Competitive compensation
- Startup equity
- Health insurance
- Flexibility in terms of remote work
Company Description
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey in building the next generation of AI infrastructure.
More Engineer Jobs
Senior Consulting Engineer
MongoDB
MongoDB, originally called 10gen, is a software development company. Since 2007, MongoDB has created an open-source, document-oriented database to help clients
Role Description
MongoDB is expanding our global team of consulting engineers to further our ongoing plans for worldwide growth! MongoDB Professional Services works with customers of all shapes and sizes in all verticals, from tier-1 banks to small web startups, on a variety of interesting use cases:
- E-commerce platforms
- Trading systems
- Social media applications
Be one of the recognized experts in this rapidly growing field in a high-growth software company successfully challenging the status quo of the database industry. You will have abundant opportunities to meaningfully impact the growth of our business in your region. This role will be based remotely in Sydney.
Qualifications
- Deep, demonstrable expertise in data platform design and operational excellence, including data modeling, performance tuning, and scaling for at least one major database (e.g., MongoDB, PostgreSQL, SQL Server, Oracle).
- Proven ability to function as a trusted technical advisor by combining strong verbal and written communication skills with hands-on problem-solving ability.
- Confidence speaking and presenting in a collaborative customer setting.
- Strong, hands-on proficiency in Linux system use and diagnostics.
- Practical experience deploying and managing databases using Kubernetes (k8s)/Containers.
- 10+ years of software development/consulting experience, preferably in a number of distinct industries/verticals.
- Familiarity with enterprise-scale software architectures, application development methodologies, and software deployment and operations.
- Competence in at least one of the following languages: Java, C#, Python, Node.js (JavaScript), Ruby, Go.
- Capable of and comfortable with frequent travel for short trips to customer sites during the working week.
Requirements
- Contribute to customer implementations at any or all phases of the application lifecycle: portfolio assessment, application planning and design, deployment architectures, development and build, integration and release configuration, system testing, production operations, application optimization, and best practices adoption.
- Deliver customer classroom-based training to architect, developer, and operations roles.
- Work as part of a wider delivery team comprising fellow consulting engineers, project managers, and account teams.
- Cultivate your individual MongoDB reputation as a trusted advisor to guide MongoDB and our customers into the future.
- Partner with Product and Engineering teams to influence the direction of the MongoDB product roadmap.
Benefits
- Employee affinity groups
- Fertility assistance
- Generous parental leave policy
Role Description
We are looking for a Quality Assurance Engineer with expertise in automated testing to ensure our backend systems meet the highest quality standards.
- Automated Testing: Design, develop, and execute automated test scripts for software using industry-standard tools and frameworks (e.g., Selenium, TestNG, JUnit, JMeter, Python).
- Test Strategy: Define and implement automated testing strategies and frameworks to enhance testing efficiency and coverage.
- Integration: Collaborate with development teams and integrate automated tests into the CI/CD pipeline. Document progress and issues in Jira and communicate via Microsoft Teams.
- Performance Testing: Conduct performance and load testing to assess system scalability and robustness.
- Troubleshooting: Identify, analyze, and troubleshoot complex issues in both automated and manual testing environments.
- Mentorship: Provide guidance and mentorship to junior QA engineers, fostering a culture of quality and continuous improvement.
- Documentation: Maintain detailed documentation of test cases, test scripts, and testing processes in Jira. Use Excel for analyzing and reporting on testing metrics, applying advanced functions as needed.
- Data Analysis: Utilize SQL Server Management Studio for validating backend data and understanding data relationships.
- Feedback: Provide critical feedback to stakeholders, including clients, project managers, and developers, regarding software quality and areas for improvement.
Qualifications
- 5+ years of experience in software testing with a strong focus on automated testing of backend systems.
- Proficiency in programming languages (e.g., Java, Python, C#) and test automation tools and frameworks (e.g., Selenium, RestAssured, TestNG, JMeter).
- Deep understanding of SDLC, testing methodologies, and best practices in automated testing.
- Experience with Jira for documentation and issue tracking, Microsoft Teams for collaboration, Excel for reporting (including advanced functions), and SQL Server Management Studio for backend data validation.
- Strong ability to understand data relationships and perform effective data manipulation using advanced spreadsheet functions.
- Strong analytical and problem-solving skills with the ability to troubleshoot complex issues.
- Bachelor’s degree in Computer Science, Engineering, or a related field.
Requirements
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with cloud platforms (e.g., AWS, Azure) and their testing-related services.
- Familiarity with manual testing.
- ISTQB certification.
Benefits
- 40 hours per week
- Life insurance 100% covered
- 50/50 Health insurance (optional)
- 50/50 Half scholarships (optional)
- Annual salary appraisal
- Internal savings and credits association
- Christmas bonus above the law
- Internet bonus
- Cellphone bonus
- Other benefits
• Architect and deploy AWS AppStream, WorkSpaces Core w/ Citrix
• Architect and deploy Omnissa Horizon VDI environments, including Instant Clones, RDSH, App Volumes, Dynamic Environment Manager, TrueSSO, Workspace ONE Access, and ThinApp
• Implement and optimize Workspace ONE UEM for device management, application delivery, and compliance enforcement
• Integrate Omnissa solutions for identity, access management, and zero-trust security frameworks
• Installation and handover of AV/IT projects, including carrying out configuration and/or software changes
• Responsible for the full delivery of AV/IT projects, including installation, commissioning, operation, assembly, and dismantling of AV/IT systems of varying size and complexity
• Configuring AV/IT systems and implementing changes where needed
• Delivering projects within the agreed schedule, budget, and established quality standards
• Resolving faults and technical issues




