Site Reliability Engineer (SRE)

Location

New York | California

Posted

108 days ago

Salary

$150K - $250K / year

Seniority

Mid Level

Job Description

Baseten

ABOUT BASETEN

Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to when shipping AI products.

THE ROLE

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents. We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

EXAMPLE INITIATIVES

You'll get to work on these types of projects as part of our Infrastructure team:

  • Multi-cloud capacity management
  • Inference on B200 GPUs
  • Multi-node inference
  • Fractional H100 GPUs for efficient model serving

RESPONSIBILITIES

  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes when relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
  • Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.

REQUIREMENTS

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
  • 5+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
  • Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience required, but you should be open to learning about it.

BENEFITS

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents.
  • Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

Job Requirements

  • Mentor junior team members and contribute to knowledge sharing within the organization.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
  • Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.

More DevOps Engineer Jobs

Deployment Engineer

Supabase

Build in a weekend. Scale to millions.

DevOps Engineer | 108 days ago
Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

  • Build and maintain the Multigres Operator - Maintain our Go-based Kubernetes operator that orchestrates distributed Postgres deployments
  • Architect cloud deployment infrastructure - Design and implement robust deployment patterns for EKS and other Kubernetes platforms
  • Manage storage and networking layers - Work with CSI drivers, persistent volumes, and cross-cloud networking to ensure data reliability and connectivity
  • Develop deployment tooling - Create internal tools and automation for provisioning, scaling, and managing Multigres clusters
  • Ensure operational excellence - Build monitoring, alerting, and diagnostic capabilities into the deployment layer
  • Collaborate across teams - Work with database engineers, SRE, and product teams to deliver seamless deployment experiences

Worldwide
Other | Remote | Team 34 | Since 2013

About Andesite:

After decades defending the nation's most sensitive networks, we founded Andesite with a clear mission: to build security products that transform how humans and AI collaborate to defend against increasingly sophisticated cyber threats. We’re a diverse team of cyber and security experts, passionate technologists, and experienced product builders. We come from some of the largest national security, tech, cybersecurity, and data organizations on the planet. We've raised more than $38 million from investors like General Catalyst and Red Cell Partners. The future of cybersecurity isn't about better technology alone; it's about reimagining how humans and machines work together. Come build with us.

The Role:

We are looking for a Senior Release Engineer to own the bridge between our engineering "factory" and our diverse customer environments. You will be responsible for the "definition of done" for our software, ensuring that our weekly SaaS updates are seamless and our self-managed bundles are robust, compliant, and audit-ready. This is a high-impact, hybrid role. You will spend 50% of your time on technical automation (CI/CD, packaging, artifact signing) and 50% on release orchestration (coordinating with Customer Support, Field Engineering, and government compliance officers). You will have the support of a number of others across departments to deliver regular product updates to our customers!

What You'll Do:

Technical Delivery & Packaging

  • Design and maintain the pipelines that produce our Single-Tenant SaaS updates and our Self-Managed customer bundles.
  • Ensure "Build Once, Deploy Anywhere" consistency across standard cloud and restricted GovCloud environments.
  • Manage artifact lifecycle, including versioning, container registries, and software signing to meet federal security standards.

Release Orchestration & Compliance

  • Act as the primary technical point of contact for ISSM (Information System Security Manager) approvals for GovCloud deployments.
  • Maintain the "Version Map": track which customers are on which versions and manage the complexities of "version lag" for those who opt out of weekly updates.
  • Coordinate across teams to validate bundles before they are shipped to customer-managed environments.

Automation and Metrics

  • Continually improve our release operations and processes through automation.
  • Develop and track metrics for release operations, and recommend and develop improvements alongside the engineering team.

Communication & Documentation

  • Lead "Go/No-Go" decisions, synthesizing input from QA, Support, and Product.
  • Empower Customer Support and Sales Engineering by providing them with clear "Known Issues" lists and migration paths for each release, with input from the engineering and product teams.

What You Have:

  • 5+ years in DevOps, Release Engineering, or SRE, specifically in a company that ships both SaaS and On-Prem/Self-Managed software.
  • Strong communication and coordination skills: you have high agency and can own work end to end through ambiguous situations.
  • Deep experience with Docker and Helm.
  • Deep experience with AWS.
  • Familiarity with SOC2, FedRAMP, and/or IL4/IL5/IL6 environments. You understand that "compliance" isn't a hurdle; it's a requirement of the build.

Benefits:

  • A competitive salary, bonus, and equity package
  • 100% employer-paid, comprehensive health insurance, including medical, dental, and vision, for you and your family
  • Unlimited PTO, with your manager’s approval
  • Flexible work environment where you manage your work day
  • A remote-first environment, with occasional travel to collaborate in person with customers, your team, and teammates from across the company
  • 14 weeks of fully paid parental leave

Salary range: $170,000 - $210,000. This represents the typical salary range for this position based on experience, skills, and other factors.

Andesite is an equal opportunity employer, and qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. We encourage candidates from all backgrounds to apply, even if you don't feel like you're a perfect fit. If you're passionate about contributing to our mission, we'd love to hear from you!

Washington
$170K - $210K / year
Job Closed

Senior Associate, Site Reliability Engineer

SitusAMC

We're helping our clients identify and capture opportunities across the entire lifecycle of their real estate activity.

DevOps Engineer | 108 days ago
Full Time | Remote | Team 5,001-10,000 | H1B Sponsor

  • Support products that have recently been transitioned from an on-prem data center into AWS Cloud.
  • Help strategize and implement cloud best practices for products newly transitioned into the cloud.
  • Maintain operational coverage of environments and continuously look for optimization, reengineering, and efficiency.
  • Enhance automation capabilities, scaling, process improvement, metric collection, security, and visibility into the product environments.
  • Leverage various DevOps approaches, including but not limited to CI/CD processes.
  • Work closely with development teams to ensure a manageable and secure migration of change into the production environment.
  • Get embedded within product teams to enable the enterprise PaaS & SaaS offerings created by the Platform teams.

United States
$110K - $130K / year
Job Closed

Senior Infrastructure Engineer – DevOps / Production / AWS

25madison

25madison is a leading global venture platform specializing in both building and investing.

DevOps Engineer | 108 days ago
Other | Remote | Team 11-50 | H1B No Sponsor

  • Own uptime (99.9%+), observability, incident response, and root cause analysis
  • Deep AWS stack: EC2 (including GPU), ECS/Fargate, SQS, Lambda, S3, CloudFront, API Gateway, RDS/DynamoDB, plus VPC design, IAM, autoscaling, and monitoring
  • Build the plumbing: retry logic, idempotency, checkpointing, parallel orchestration
  • Chase down performance problems: queue bottlenecks, cold starts, LLM latency, runaway costs
  • Help the team ship faster: CI/CD, infrastructure-as-code (Terraform/CDK/Pulumi), clean containerization, and proper staging environments

New York
$145K - $190K / year
Job Closed