Yelp

Looking for a #FiveStarCareer? We know just the place!

Site Reliability Engineer, Core Streaming

DevOps Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 2004 | H1B Sponsor

Location

California | Illinois | New York | Washington

Posted

57 days ago

Salary

$141K - $216K / year

Seniority

Senior

Bachelor Degree | English | Apache | Cloud | Java | Kafka | Linux | Python

Job Description

  • Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments.
  • Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing.
  • Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact to critical services.
  • Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery.
  • Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses.
  • Participate in on-call rotations.

Job Requirements

  • Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, across hybrid or multi-cloud and Linux environments, including upgrades and migrations between platforms or versions.
  • In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances.
  • Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation.
  • Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters.
  • Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink.
  • Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related).
  • Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment.
  • A Bachelor’s degree or equivalent work experience is required.

Benefits

  • There may be flexibility with the range included in this posting should a candidate be leveled higher or lower than the posted range.
  • This role can be fully remote from any location in the US.
