Job Closed
This listing is no longer active.
The future of AI is open-source. Let's build together.
Software Engineer - Storage & Observability (Early Career)
Location
United States
Posted
65 days ago
Salary
$165K - $200K / year
Seniority
Mid Level
Job Description
Together AI
About the Role
Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle. Our AI Infrastructure team is at the forefront of scaling the foundational systems that power this platform. We are looking for an Early Career Software Engineer to join our Storage and Observability team, where you will help design and maintain robust distributed storage solutions and develop comprehensive observability platforms. In this role, you will work on the systems that provide critical insights into GPU utilization and system performance, ensuring seamless data access for the world's largest AI training and inference workloads.

Responsibilities
- Build and deploy scalable observability tools (metrics, logs, traces) using state-of-the-art open-source distributed telemetry, log search, and tracing systems
- Develop and implement infrastructure-as-code for stack deployment using Terraform, Ansible, and Helm
- Write clean, production-grade code in Go or Python to create custom Kubernetes operators, tools, and automation
- Support the operation of high-performance distributed storage systems (such as Ceph, Weka/Vast) and Kubernetes-native storage operators
- Optimize storage systems for GPU clusters (10-50 GB/s per-node throughput) and scale storage infrastructure to support thousands of nodes
- Partner with senior engineers to enhance distributed tracing and optimize data paths for AI workloads

Minimum Qualifications
- Experience: 1–3 years of professional experience in Software Engineering or Cloud Operations with hyperscalers
- Cloud & Containers: Solid understanding of Docker and Kubernetes orchestration, as well as experience with cloud platforms like AWS, GCP, or Azure
- Tooling: Familiarity with infrastructure-as-code (Terraform or Helm) and version control (Git)
- Observability Fundamentals: Experience using Prometheus and Grafana for system monitoring
- Storage Systems: Experience with distributed storage systems such as WekaFS, Vast, Ceph, MinIO, GPFS, or Lustre
- Problem Solving: Strong debugging skills and a passion for automation and operational excellence

Preferred Qualifications
- Experience monitoring AI/ML infrastructure, GPU clusters, and custom metrics for model performance and training pipelines
- Background in high-frequency, low-latency systems monitoring, chaos engineering, and reliability testing
- Contributions to open-source projects, preferably in the space of observability or storage
- Familiarity with security monitoring and compliance frameworks

About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build next-generation AI infrastructure.

Compensation
We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $165,000 - $200,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy
More Software Engineer Jobs
• Design, code, and test services that impact iFood's core areas • Help Feed the Future of the World through cutting-edge technology, Education, Environment, and Inclusion • Directly impact the lives of millions of people every day • Learn and share knowledge • Explore new possibilities to take your next career step
• Develop, optimize, and roll out workflows and applications (40%) • Create and deploy end-to-end analytics applications (30%) • Collaborate with stakeholders to understand business needs (20%) • Support cross-functional integration and application performance (10%)
Software Development Engineer III, Navigation Incidents
Mapbox
Mapbox powers navigation for people, packages, and vehicles everywhere.
Role Description
As a Software Development Engineer III (Staff) on the Incidents team, you can expect to:
- Collaborate with your team to identify and scope out well-defined tasks.
- Execute on the scope and be accountable for delivering on time with quality.
- Design systems and make decisions that will keep pace with the rapid growth of Mapbox’s customer base.
- Promote a culture of operational excellence by meticulously testing and monitoring our systems and code, writing documentation, and being on-call to support the health of our services.
- Reduce technical debt, share your knowledge, and invest in your teammates’ health and happiness, while optimizing application performance and accelerating feature velocity.
- Uphold a culture of collaboration, transparency, creativity, inclusion, and data-driven decisions.

Qualifications
- 8+ years of experience building scalable, high-volume, low-latency backend services.
- 3+ years of experience building streaming pipelines capable of handling petabytes of data.
- Experience applying GenAI and LLMs to process and extract insights from unstructured data such as text, documents, and logs.
- Experience building search, retrieval, and ranking systems using semantic search, embeddings, and relevance optimization techniques.
- Eagerness to learn and become proficient with many different test stacks and languages: Python, NodeJS, TypeScript, C++, and AWS (CDK, ECS, Fargate, Step Functions, Lambda, S3, etc.).
- Knowledge of and experience with global data security standards.
- Familiarity with code versioning tools, such as GitHub.
- Ability to engage, learn, and contribute quickly to the initiatives.
- Ability to perform all development tasks independently, based on designs and specs.
- Self-starter who is communication- and outcomes-oriented.
- An empirical, analytical approach. You develop strong hypotheses, conduct spikes, and clearly communicate your findings.
- A desire to share your expertise through documentation, mentorship, pairing, and both written and verbal discussion.
- A desire to work with individuals with diverse backgrounds, perspectives, and experiences.
- High-quality mindset: write unit tests, proactively remedy defects, and follow through to production.

Requirements
- Experience in integrating AI into design, development, and decision-making.

Benefits
- Supportive health care and parental leave.
- Flexibility for personal life events.
- Environment of teaching and learning.
- Commitment to growing a diverse team.

Company Description
Mapbox is the leading real-time location platform for a new generation of location-aware businesses. More than 4 million registered developers have chosen Mapbox because of the platform’s flexibility, security, and privacy compliance.
Lead .NET Developer
AgileEngine
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
Role Description
As a Lead .NET Developer, you’ll drive the full development lifecycle of a test automation suite, working directly with stakeholders to shape product direction and technical strategy. Leading a focused squad within an Agile environment, you’ll build meaningful features in C#, WPF, and WinForms while deepening your expertise in a globally recognized platform trusted by over 80,000 users. This role offers a rare blend of hands-on engineering and technical leadership for someone ready to own quality from the inside out.

Qualifications
- Expertise in C#
- Experience as a team leader
- 4+ years of experience with .NET development
- Strong knowledge of Microsoft Windows operating systems
- Strong knowledge of WPF and WinForms
- Experience with Azure DevOps integrations
- Experience with performance and memory optimization
- Experience with scripting languages and network protocols
- Understanding of C++ / Qt
- Understanding of databases
- Proficiency with JIRA for issue tracking
- Fluent English level

Requirements
- Lead a development team consisting of developers and QA
- Communicate with stakeholders and provide updates on progress
- Prepare reports on completed work and project status
- Participate in Scrum ceremonies and support agile processes
- Develop new functionality and resolve issues in automation tools
- Troubleshoot installation and testing issues across environments
- Maintain and expand internal knowledge base documentation

Benefits
- Remote work & Local connection: Work where you feel most productive and connect with your team in periodic meet-ups to strengthen your network and connect with other top experts.
- Legal presence in India: We ensure full local compliance with a structured, secure work environment tailored to Indian regulations.
- Competitive Compensation in INR: Fair compensation in INR with dedicated budgets for your personal growth, education, and wellness.
- Innovative Projects: Leverage the latest tech and create cutting-edge solutions for world-recognized clients and the hottest startups.



