Job Closed

This listing is no longer active.

Anthropic

Anthropic is an AI safety and research company working to build reliable, interpretable, and steerable AI systems.

Staff+ Software Engineer, Claude App Infrastructure

Infrastructure Engineer | Other | Hybrid | Senior | Team 11-50 | Since 2020 | H1B Sponsor

Location

All locations: California | New York | Washington

Posted

66 days ago

Salary

$320K - $485K / year

Seniority

Senior

Job Description

Title: Staff+ Software Engineer, Claude App Infrastructure
Locations: San Francisco, CA | New York City, NY | Seattle, WA

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

Claude App Infrastructure owns the reliability, experience, and agentic capabilities of claude.ai. We are the platform team for Anthropic's flagship consumer product, responsible for the serving architecture that keeps Claude available at scale, the conversation experience, and the infrastructure that lets every product team at Anthropic ship into claude.ai safely and fast. When the product grows at hyperscale, we are the team that makes sure it feels the same, or better.

Beyond the foundation, we are building the agentic layer that turns claude.ai from a chat surface into an execution surface: task execution, personalization, browser use, and server-side tools. This is where the next generation of the product gets built: the primitives that let Claude act in the world, not just respond. We believe reliability and velocity are the same problem, and our job is to build the systems that make both inevitable.

What you'll do:
- Design and build sandboxed compute environments where Claude can safely execute code, access tools, and interact with external services
- Build state management systems for long-running agent tasks, handling checkpoints, recovery, and resumption across failures
- Develop authentication and authorization frameworks for delegated access, enabling Claude to act on behalf of users securely
- Create observability and debugging tools for agent execution: understanding what Claude did, why, and how to make it better
- Partner closely with product and research teams to define what's possible and ship it

You may be a good fit if you:
- Have 10+ years of experience building distributed systems, infrastructure, or platform services at hyperscale
- Are comfortable building cloud-native infrastructure on GCP, AWS, or Azure
- Care deeply about security, isolation, and building systems that fail safely
- Have experience with containers, sandboxing, or secure execution environments (e.g., gVisor, Firecracker, V8 isolates)
- Are comfortable with ambiguity: this is greenfield work, and you'll help define the architecture
- Write clean, maintainable code in Python, Go, Rust, or similar
- Want to work on problems that don't have existing playbooks

Strong candidates may have:
- Experience building multi-tenant execution platforms or serverless infrastructure
- Background in security engineering, sandboxing, or isolation technologies
- Familiarity with workflow orchestration systems (Temporal, Airflow, Step Functions)
- Experience with state machines, checkpointing, or durable execution patterns
- Low-level systems experience (Linux internals, eBPF, container runtimes)

The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.

Annual Salary: $320,000 - $485,000 USD

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience

Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience

Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

How we're different

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts, and we value impact (advancing our long-term goals of steerable, trustworthy AI) over work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science.

We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.

The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us!

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process

More Infrastructure Engineer Jobs

Infrastructure Engineer

Sigma Software Group

We support enterprises, product houses, and startups with custom software solutions development and IT consulting.

Full Time | Remote | Team 1,001-5,000 | Since 2002 | H1B No Sponsor

• Administer and support hybrid IT infrastructure (primarily on-premises, with selected workloads in Microsoft Azure)
• Install, configure, harden, and maintain Windows Server environments (2016–2022)
• Manage on-premises Active Directory, Group Policy, DNS, DHCP, and other core Windows services
• Perform basic administration of Azure infrastructure components (VMs, Virtual Networks, routing, Load Balancer, VPN connectivity, DNS, RBAC, monitoring)
• Administer and troubleshoot Microsoft SQL Server, including backup strategies, restore procedures, and performance optimization
• Manage Debian-based Linux servers hosting infrastructure services (e.g., Redis/Kafka)
• Support and troubleshoot containerized services on Linux hosts using Docker Compose
• Design, implement, and maintain infrastructure monitoring and alerting
• Administer and troubleshoot network services (routing, DNS, DHCP, SMTP, VPN)
• Manage source control systems (TFS / Azure DevOps Server, Git)
• Automate infrastructure operations using PowerShell
• Investigate incidents, perform root cause analysis, and implement preventive measures
• Maintain technical documentation and operational standards
• Collaborate with internal teams and customers to resolve issues and improve systems

Ukraine
Job Closed
Full Time | Remote | Team 1,001-5,000 | Since 2010 | H1B Sponsor

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies, from the world’s largest enterprises to the most ambitious startups, use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

As a platform company powering businesses all over the world, Stripe processes payments, runs marketplaces, detects fraud, and helps entrepreneurs start an internet business from anywhere in the world. Stripe's AppSec Engineers build scanning platforms and tooling and alert and remediation pipelines, ensure reliable data, and transform data from various inputs and applications so that it can ultimately represent security posture across all of Stripe.

At Stripe, we are building security scanning and posture infrastructure using data science tooling and big data systems that will help us scale while making onboarding and analysis of new data easy and transparent. Rather than relying on traditional commercial tooling, you’ll help drive codified processes, data analytics, and automation. This is a unique challenge for a cyber professional interested in non-traditional security monitoring and response designed to function within a development operations framework. You’ll maintain strong partnerships with the security assessment and security discovery teams, as well as other security teams, to understand their capabilities and the interfaces to those systems that are useful for monitoring and response throughout Stripe.

What you’ll do

Responsibilities
- Understand data tooling available at Stripe and determine how best to leverage, modify, or fork it for use by security
- Create the libraries, tooling, and platform needed to operationalize continuous security testing tools at scale
- Enable holistic data integration to support advanced data analytics
- Maintain libraries that enable interaction with the various internal and external data sources and systems used to correlate security posture logic
- Create a reliability layer for data pipeline metrics, both for easy debugging and for continuously eliminating bottlenecks
- Create APIs to help security teams access underlying data

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements
- A strong engineering background with an interest in data
- Experience writing production Python and Go code
- Experience developing and maintaining distributed systems built with open source tools
- Experience building libraries and tooling that provide beautiful abstractions to users
- Experience integrating with CI/CD developer flows
- Experience with tools such as Kafka, Airflow, and various notebook technologies
- 4+ years of relevant experience in security
- Experience as a consumer of data science tooling and infrastructure
- Experience with security technologies, including endpoint detection, network technologies, and AWS cloud services
- Strong understanding of the technical capabilities needed for effective appsec and vulnerability management
- Ability to build strong relationships and drive cross-functional projects with engineering partners

Preferred qualifications
- Ability to drive concurrent projects and initiatives while managing operational responsibilities
- An exemplary, user-focused communication style, emphasizing clarity, empathy, and accuracy
- Demonstrated success working remotely
- Ability to deliver capabilities to teams iteratively while building toward a larger vision
- Demonstrated success overseeing internal tool development and automation at scale
- Experience with collection of compliance artifacts, security incidents, and risk awareness

India
Job Closed
Lead Infrastructure Engineer

DraftKings Inc.

Defining what it means to build and deliver the most extraordinary sports & entertainment experiences. The Crown is Yours

Full Time | Remote | Team 1,001-5,000 | Since 2012

At DraftKings, AI is becoming an integral part of both our present and future, powering how work gets done today, guiding smarter decisions, and sparking bold ideas. It's transforming how we enhance customer experiences, streamline operations, and unlock new possibilities. Our teams are energized by innovation and readily embrace emerging technology. We're not waiting for the future to arrive. We're shaping it, one bold step at a time. To those who see AI as a driver of progress, come build the future together.

The Crown Is Yours

As a Lead Infrastructure Engineer, you'll design, build, and operate the machine learning platform and Databricks infrastructure that power scalable, reliable data science at DraftKings. You'll own the backbone that makes model development, training, and deployment fast, repeatable, and cost-aware, so teams can move from ideas to impact without friction. Working alongside Data Science, Machine Learning, Data Engineering, and Infrastructure partners, you'll turn evolving use cases into durable platform capabilities. You'll lead projects end to end, strengthening reliability, automation, and developer experience across the stack.

What you'll do as a Lead Infrastructure Engineer
- Own and operate Databricks infrastructure with a focus on reliability, scalability, performance, and cost optimization.
- Build and manage cloud infrastructure on Amazon Web Services using infrastructure-as-code tools like Terraform.
- Author and review technical designs that enable scalable, automated, and reproducible machine learning workflows.
- Lead and mentor engineers, helping grow team capability and strengthen day-to-day execution.
- Partner closely with data scientists, data engineers, machine learning engineers, and Infrastructure teams to align platform capabilities to real-world needs.
- Drive engineering initiatives from technical planning through delivery, production rollout, and long-term maintenance.
- Stay current on data platform and machine learning platform trends, applying best practices that improve platform efficiency, usability, and governance.
- Coach and mentor teammates, raising the bar through strong technical feedback, thoughtful enablement, and shared ownership.

What you'll bring
- At least 5 years of experience in machine learning platform, data platform, data engineering, or infrastructure engineering roles.
- Hands-on experience administering and operating Databricks in production environments.
- Deep familiarity with infrastructure as code, including Terraform or Pulumi, and proven ability to manage change safely at scale.
- Experience with AWS, Docker, Kubernetes, and continuous integration and continuous delivery pipelines.
- People management experience is a plus.
- Strong Python skills and familiarity with machine learning tooling such as MLflow, pandas, and scikit-learn.
- A track record of owning complex systems end to end, including reliability improvements, incident follow-up, and performance tuning.
- Clear, confident communication skills, including strong technical documentation and the ability to align cross-functional partners.

Join Our Team

We're a publicly traded (NASDAQ: DKNG) technology company headquartered in Boston. Because DraftKings is a regulated gaming company, you may be required to obtain a gaming license issued by the appropriate state agency as a condition of employment. Don't worry; we'll guide you through the process if this is relevant to your role.

Bulgaria
Job Closed
Full Time | Remote | Team 11-50 | H1B No Sponsor

Deeter Analytics

At Deeter Analytics, we’re building something that doesn’t get built twice in a generation. Our goal is to create a fundamental trading model as capable as today’s most advanced AI systems, but applied to global markets. Not incremental signals or isolated strategies, but a system that can continuously interpret, learn from, and act on the evolving state of the world.

We train on large-scale, real-time social data, capturing how narratives form, how sentiment propagates, and how collective behavior drives markets. This requires operating at the frontier of data infrastructure, model design, and compute, all tightly integrated into a single system.

You’ll work alongside a small group of elite engineers, AI researchers, and traders, in an environment defined by speed and ownership. We run experiments continuously. Ideas move from concept to production in hours. And the feedback loop is immediate, measured directly in live performance.

About the role

You will build and optimize the systems that turn data and compute into model capability. This role sits at the intersection of distributed systems, GPU infrastructure, and model training, ensuring that both our in-house models and state-of-the-art external models can be trained efficiently at scale. We prefer systems that maximize learning per unit of compute, not just systems that run.

What you’ll work on
● Designing and operating distributed training systems on GPU infrastructure
● Optimizing GPU utilization, throughput, and training efficiency
● Translating model requirements into efficient system configurations
● Improving training speed, cost efficiency, and reliability
● Debugging failures in high-cost, high-pressure training environments

What we’re looking for

We’re looking for people who understand how systems behave under real constraints and know how to push them to perform.

Strong signals:
● You have run or significantly contributed to large-scale training workloads or compute-intensive systems
● You have a strong understanding of distributed systems in practice, including public cloud environments like AWS
● You understand how infrastructure behaves beneath the abstraction:
  ○ networking constraints
  ○ GPU/CPU utilization
  ○ memory and I/O bottlenecks
  ○ hardware limits at scale
● You can reason about how systems can be tuned for more efficient training and resource usage
● You have debugged systems where failures were non-trivial and costly
● You move quickly, identify bottlenecks, and eliminate them without being asked

Bonus signals:
● Experience optimizing systems where small efficiency gains had large downstream impact
● Experience working under strict compute or cost constraints
● Experience debugging distributed or asynchronous systems with non-obvious failure modes
● You use AI tools to accelerate debugging, development, and iteration
● You care about building systems that are measurably efficient, not just functional

Philippines