Together AI

The future of AI is open-source. Let's build together.

Platform Engineer, Model Shaping

Platform Engineer | Full Time | Remote | Mid Level | Team 11-50 | H1B No Sponsor

Location

United States

Posted

4 days ago

Salary

$200K - $290K / year

Seniority

Mid Level

Job Description

Role Description

The Model Shaping team at Together AI works on products and research for tailoring open foundation models to downstream applications. We build services that allow machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. We also develop new methods for more efficient model training and evaluation, drawing inspiration from a broad spectrum of ideas across machine learning, natural language processing, and ML systems.

As a Platform Engineer in Model Shaping, you will work at the intersection of backend engineering and infrastructure, building the foundational layers of Together's platform for model customization and evaluation. You will design, develop, and operate both the backend services and the underlying systems that enable us to sustainably and reliably scale production workflows launched by our users, as well as internal research experiments.

You will operate in a cross-functional environment, collaborating with other engineers and researchers on the team to improve the infrastructure based on the needs of their projects. You will also work with other engineering teams at Together (such as Commerce, Data Engineering, and Cloud Infrastructure) to integrate the services developed by Model Shaping with the systems those teams build.

Responsibilities

- Design and build Together's systems and infrastructure for model customization, including user-facing features and internal improvements
- Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
- Create and improve internal tooling for deployment, continuous integration, and observability
- Build a job orchestration platform spanning multiple datacenters, supporting a highly heterogeneous hardware landscape
- Partner with teams developing internal services, co-designing these services and incorporating them in systems built within Together

Qualifications

- 3+ years of experience in building infrastructure or backend components of production services
- Extensive experience designing, operating, and troubleshooting production Linux environments and Kubernetes-based platforms
- Strong software engineering background in Python or Go
- Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
- Cloud environment (e.g., AWS/GCP/Azure) administration experience, preferably with a hybrid bare-metal/cloud environment
- Strong communication skills, with a willingness to document systems and processes and to collaborate with peers of varying technical expertise
- Comfortable operating across the stack, from cluster operations and infrastructure automation to backend service development

Requirements

Experience in any of the following will make you stand out:

- Developing large-scale production systems with high reliability requirements
- Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
- Managing GPU workloads on HPC clusters, ideally with hands-on experience in operating NVIDIA's networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
- Deployment of services for AI training or inference
- Networking fundamentals, including TCP/IP, DNS, routing, load balancing, TLS, and network debugging tools
- Maintaining or contributing to open-source projects

Benefits

- Competitive compensation
- Startup equity
- Health insurance
- Flexibility in terms of remote work
- The US base salary range for this full-time position is $200,000 - $290,000
- Individual compensation will be determined by experience, skills, and job-related knowledge

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

More Platform Engineer Jobs

Junior Platform Engineer – Observability

RD Station

To empower the heroes and scale-ups that grow the economy

Full Time | Remote | Team 1,001-5,000 | Since 2011 | H1B Sponsor

Role Description

You will join the Observability team, which is responsible for advancing the visibility and monitoring of RD Station's systems. Your challenge will be to help engineering teams identify problems quickly, prevent incidents, and ensure the stability, security, and reliability of the products, working collaboratively and proactively to raise the maturity of observability across the company.

- Create and maintain monitors that ensure the health and availability of systems.
- Track alerts and incidents, supporting the initial identification of failures and routing them for fast handling.
- Build and evolve dashboards for tracking the performance and stability of services.
- Run synthetic tests and validate monitors to ensure the reliability of automated detection.
- Document processes, monitors, and technical learnings to spread knowledge.

Qualifications

- Bachelor's degree, completed or in progress, in a technology-related field (Computer Science, Software Engineering, Information Systems, etc.).
- Basic knowledge of infrastructure, networking, operating systems, and cloud environments.
- Familiarity with application monitoring, logs, metrics, and observability tools.
- Experience with Linux environments and the command line.
- Basic understanding of troubleshooting and technical problem analysis.

Requirements

- Previous experience on infrastructure, SRE, DevOps, Platform Engineering, or Observability teams.
- Knowledge of tools such as Datadog, Grafana, Prometheus, Kibana, or similar.
- Basic understanding of Kubernetes, containers, and distributed environments.
- Knowledge of automation or scripting languages (Python, Bash, SQL).
- Entry-level certifications in cloud or agile methodologies.

Benefits

- Holistic well-being: We take care of the people who make progress happen, seeking the holistic well-being of every employee.
- Plurality and belonging: We actively promote inclusion and belonging.
- Valuing women and mothers as professionals: We are one of the best companies in the country for women to work at.

Company Description

We are an ecosystem of more than 12,000 restless people who, together, offer solutions and products and educate the market. We operate as One TOTVS (Única TOTVS), integrating different solutions and areas of expertise to educate the market and simplify the world of business.

Brazil

PLM Platform Engineer

Bright Vision Technologies

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. We recognize that our people are our strength. We are an equal opportunity employer and place a high value on diversity and inclusion. We do not discriminate on the basis of any protected attribute. We make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans.

Role Description

We are seeking a PLM Platform Engineer with deep experience operating either PTC Windchill or Siemens Teamcenter (preferably both) in large enterprise environments. In this role you will own the technical operation of the PLM platform (installation, configuration, performance tuning, upgrades, integrations, and high availability) and partner with functional, engineering, and manufacturing teams to deliver a reliable, performant, and secure PLM ecosystem. The ideal candidate will bring strong PLM administration fundamentals, hands-on experience with PLM upgrades and migrations, and a measurement-driven approach to platform reliability.

Key Responsibilities

- Install, configure, and operate Windchill or Teamcenter environments across development, test, and production.
- Lead PLM upgrades, patches, and platform migrations with minimal disruption.
- Manage PLM application servers, web servers, database connectivity, and method servers.
- Operate file vaults, replication services, and CAD data management subsystems.
- Implement and tune HA/DR strategies for PLM environments, applying disciplined engineering practices and partnering closely with stakeholders to ensure outcomes are durable, well-documented, and aligned with broader team and platform standards.
- Optimize PLM performance through query tuning, caching, indexing, and JVM tuning.
- Manage user provisioning, security configurations, and audit support.
- Operate PLM integration brokers and middleware connectors.
- Develop automation scripts using shell, Python, or Ansible to reduce operational toil.
- Monitor PLM health using native tooling and integrated observability platforms.
- Provide hands-on post-go-live and hypercare support, working closely with operations teams to triage incidents quickly, identify root causes, and drive durable fixes that improve long-term system stability.
- Maintain comprehensive, current technical documentation, including architecture diagrams, design decisions, configuration references, runbooks, and operational procedures, so that the system remains supportable, auditable, and easy to onboard new engineers onto over time.
- Mentor and coach junior and mid-level engineers through code review, design review, pair programming, and structured knowledge sharing, helping the broader team grow in technical maturity and confidence over time.
- Drive continuous improvement of the PLM platform.

Qualifications

- Bachelor's degree in Computer Science, Engineering, or a related technical discipline.
- Five or more years of PLM platform administration experience.
- Hands-on experience with either PTC Windchill or Siemens Teamcenter in production.
- Strong experience with PLM upgrades and migrations.
- Working knowledge of Oracle and SQL Server database administration.
- Strong Linux/Unix administration skills.
- Experience operating HA/DR for PLM environments.
- Familiarity with PLM integration brokers and middleware.
- Scripting skills in shell, Python, or Ansible.
- Excellent troubleshooting and documentation skills.

Preferred Qualifications

- Experience operating PLM on cloud platforms (AWS, Azure, OCI).
- Exposure to infrastructure-as-code for PLM environments.
- Familiarity with CI/CD patterns for PLM change management.
- PTC or Siemens PLM certifications.
- Experience with CAD integration patterns at scale.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected]. Learn more about Bright Vision Technologies at www.bvteck.com.

United States
100K - 150K / year
Job Closed
Full Time | Remote | Team 51-200 | Since 1900 | H1B No Sponsor

• Full-stack application development
• Design, build, and maintain enterprise applications using Python on the backend and React + TypeScript on the frontend
• Translate business and product needs into scalable, maintainable application architectures
• Establish strong foundations for application structure, security, testing, and developer experience
• Build production-ready application skeletons and core workflows that teams can extend over time
• Define and document clear API contracts, data flows, and integration patterns
• Build and optimize backend services using FastAPI and modern Python frameworks
• Design RESTful APIs with strong validation, versioning, and documentation
• Implement authentication and authorization using OAuth 2.0 / OIDC, RBAC, and enterprise identity providers
• Develop reliable, high-performance services with clear error handling and observability
• Design database schemas and data access layers (e.g., PostgreSQL)
• Build accessible, responsive React applications using TypeScript
• Create reusable component architectures and scalable UI patterns
• Implement secure authentication flows and protected routes
• Develop intuitive interfaces for dashboards, forms, admin tools, and data-driven experiences
• Optimize performance through modern frontend best practices
• Containerize services using Docker with secure, maintainable builds
• Set up Docker Compose local environments for multi-service applications
• Partner with DevOps to support CI/CD pipelines and cloud deployments
• Help ensure applications are ready for cloud-native environments
• Partner with engineers, designers, product managers, and business teams

North Carolina | Texas
$119K - $143.5K / year
Job Closed

Power Platform Developer

UNIFY Dots

Connecting the Dots in your Organization. Putting People and Clients before Profit. Microsoft Dynamics Specialist.

Full Time | Remote | Team 201-500 | H1B Sponsor

• Design and develop solutions using: Power Apps (Model-Driven and Canvas Apps), Power Automate (flows and workflows), Dataverse
• Build scalable applications following best practices and design patterns
• Develop reusable components and optimize performance
• Develop and customize Power Pages portals
• Work with HTML, CSS, JavaScript for UI enhancements
• Implement custom functionalities and integrations
• Integrate Power Platform with: Microsoft Dynamics 365, Azure services, External applications and APIs
• Support data integration, workflows, and automation use cases
• Use Azure DevOps for: Source control (Repos), Work tracking (Boards), CI/CD pipelines
• Follow Git-based deployment processes
• Work with functional consultants and stakeholders to gather requirements
• Translate business needs into technical solutions
• Support testing, deployment, and post-go-live activities

India