Staff SRE, Ads
Location: Netherlands
Posted: 1 day ago
Salary: Not listed
Seniority: Lead
Job Description
Reddit, Inc.
• Lead reliability initiatives across multiple Ads domains, including ad serving, auctions, targeting, reporting, measurement, and billing.
• Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
• Drive architecture reviews and influence technical decisions affecting critical revenue-generating systems.
• Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
• Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
• Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
• Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
• Mentor engineers and provide technical leadership across multiple teams.
• Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Job Requirements
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems, including metrics, logging, tracing, and alerting.
- Solid programming skills in Go, Python, or similar languages.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
Benefits
- Global benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Private pension plan with employer matching
- 100% employer-sponsored group medical plan
- Income Replacement Programs
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
More DevOps Engineer Jobs
Site Reliability Engineer
Dropbox
Dropbox is the one place to keep life organized and keep work moving.
• Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
• Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response.
• Build, implement, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions, as well as custom code platforms.
• Use container orchestration platforms such as Kubernetes, Amazon ECS, and Red Hat OpenShift to manage containers at scale.
• Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream.
• Drive improvement projects related to service health and visibility for our stakeholders, ranging from developers to business service owners to C-level executives.
• Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
DevOps & Security Engineer II
HSI
Role Description
At HSI, we are committed to delivering innovative solutions that enhance our customers' digital experiences. We are looking for a talented DevOps & Security Engineer II to join our team and help us achieve our goals. As a DevOps & Security Engineer II, you will play a crucial role in deploying, managing, and maintaining the systems that support HSI’s proprietary, customer-facing SaaS applications. You will develop deep product knowledge, including technical system capabilities, infrastructure details, and security processes. This role requires a blend of technical expertise, strategic thinking, and collaboration with various teams.
Key Responsibilities:
- Infrastructure Health & Security: Ensure the stability and security of our infrastructure by administering resources in cloud-hosted environments, managing resource provisioning, and conducting security audits.
- Application Reliability: Work closely with DevOps and Product Development leadership to build and maintain a secure and reliable infrastructure, manage automated code deployment pipelines, and diagnose and resolve technical issues.
- Technical Support & Collaboration: Provide direct support to the Product, Technical Support, and Development teams, facilitate access control, handle service requests, and assist with customer integrations.
- Advanced System Design & Development: Expand on existing design patterns, modify templates, and extend program oversight to new products.
- Mentorship & Knowledge Sharing: Support junior team members, share best practices, and contribute to documentation and training materials.
Essential Functions:
- Debugging & Triage: Provide systems architecture guidance to development teams and debug technical issues.
- Security Management: Implement and maintain security tools, support ISO 27001 and SOC 2 compliance efforts, and respond to security incidents.
- DevOps & Automation: Manage source control systems, administer build and release pipelines, and improve Infrastructure-as-Code.
Level II Responsibilities:
- Operate with greater independence, owning end-to-end initiatives and complex technical projects.
- Drive strategic improvements, from pipeline optimization to automation and scalability.
- Provide mentorship and knowledge sharing to junior engineers and peers.
- Collaborate with leadership on infrastructure design, tooling decisions, and security enhancements.
The DevOps Team is accountable for five core domains: Infrastructure Health, Infrastructure Security, Application Security, Application Reliability, and DevOps Efficacy.
Qualifications
- 3–5 years of hands-on experience in DevOps, Systems Administration, or Software Engineering.
- Proven expertise in Amazon Web Services (AWS), including building, managing, and optimizing cloud environments, is a plus.
- Strong command of Infrastructure-as-Code, especially Terraform.
- Proficiency in Python and PowerShell or other scripting languages for automation and tooling.
- Solid understanding of source control systems (e.g., Git), modern software development workflows, and CI/CD tooling, including Azure DevOps Pipelines and GitHub Actions.
- Experience with relational databases; experience with PostgreSQL or MySQL is a plus.
- Knowledge of API integrations and service-oriented architecture.
- Familiarity with system and information security best practices, including incident response and vulnerability remediation.
- Working knowledge of the SOC 2 and ISO 27001 compliance frameworks is a plus.
- Comfort working across both Windows and Linux environments is a plus.
- Experience with CircleCI is a plus.
- Experience with one or more of the following security tools is a plus: Tenable, Azure Sentinel, AWS GuardDuty, Azure Key Vault, Mend, or Rootly.
- Availability for occasional after-hours incident response.
Benefits
- Fully Remote Work: Work where you're most productive; we trust you.
- Unlimited PTO: Recharge when you need to; we support work-life balance.
- Career Growth: Unlimited access to our Learning & Development library, including our own HSI LMS content.
- Well-Being First: Full medical, dental, vision, and mental health coverage.
- Secure Your Future: Generous $1:$1 401(k) company match, because your future is important to us.
• As a consultant, you bring hands-on expertise directly into projects and make a strong impression.
• As a DevOps expert, you support the entire development process at our clients with passion and dedication.
• You are passionate about automation and automate build, test, and deployment processes.
• You have mastered Infrastructure-as-Code (IaC) tools and use them to automate infrastructure management seamlessly.
• You keep a watchful eye on clients' on-premise and cloud infrastructure and continuously optimize it.
• You are dedicated to alerting, monitoring, and tracing to ensure smooth operation.
• You collaborate closely with developers and testers in agile, cross-functional teams working with Scrum or Kanban.
• Your committed, independent approach ensures reliable operation of the applications you help deliver.
• You confidently and effectively represent the operations perspective to development teams, product owners, and other project stakeholders, making a meaningful difference.
Senior SRE Engineer – Observability Focus
Capital.com
We are making the world of finance more accessible, engaging, and useful with an award-winning trading platform and app.
• Own the full observability stack: metrics (VictoriaMetrics), logs (OpenSearch), and traces (OpenTelemetry), from pipeline design to day-2 operations.
• Architect and run the VictoriaMetrics cluster topology (vmstorage/vminsert/vmselect), including vmagent scraping, remote write configuration, vmalert rules, and cardinality control.
• Operate OpenSearch clusters: Index State Management (ISM), hot-warm-cold architecture, shard tuning, and ingest pipelines via Data Prepper.
• Build and maintain OTEL Collector pipelines (receivers, processors, exporters) and instrument services across Java, Python, and JS/TS stacks using both automatic and manual instrumentation.
• Run Kafka as the telemetry transport layer (OTEL Collector → Kafka → backends), including topic design, partition strategy, consumer group lag monitoring, and throughput tuning for high-volume telemetry.
• Manage log shipping infrastructure using Fluent Bit, Vector, or Fluentd; define structured logging standards and field normalization across services.
• Build Grafana dashboards and alerting that engineers actually use: clear, actionable, and built with well-structured variables and thresholds.
• Work with platform and application teams to improve sampling strategies (head/tail), batching, and context propagation across distributed services.
• Contribute to incident response, post-mortems, and reliability improvements driven by observability signals.
• Mentor engineers on observability practices, tooling, and structured logging standards.