Bloomreach

Bloomreach is a computer software company that is on a mission to empower its clients to seamlessly personalize their customer experience and, in turn, successf

Senior Site Reliability Engineer for Datacraft team

Location

Czechia

Posted

54 days ago

Salary

0

Seniority

Senior

Job Description

Bloomreach is building the world’s premier agentic platform for personalization. We’re revolutionizing how businesses connect with their customers, building and deploying AI agents to personalize the entire customer journey.

- We're taking autonomous search mainstream, making product discovery more intuitive and conversational for customers, and more profitable for businesses.
- We’re making conversational shopping a reality, connecting every shopper with tailored guidance and product expertise — available on demand, at every touchpoint in their journey.
- We're designing the future of autonomous marketing, taking the work out of workflows, and reclaiming the creative, strategic, and customer-first work marketers were always meant to do.

And we're building all of that on the intelligence of a single AI engine — Loomi AI — so that personalization isn't only autonomous…it's also consistent.

From retail to financial services, hospitality to gaming, businesses use Bloomreach to drive higher growth and lasting loyalty. We power personalization for more than 1,400 global brands, including American Eagle, Sonepar, and Pandora.

Become a Senior SRE for Bloomreach! Join the newly formed Datacraft team — the team building the next-generation data platform for Bloomreach Engagement. Datacraft owns three interconnected domains:

- Data Warehouses (~60%) — making Bloomreach data first-class in customer DWHs (Snowflake, BigQuery, Databricks). The strategic goal for 2026–27 is to use DWHs to exponentially accelerate data adoption.
- Loomi Analytics Agent (~20%) — evolving Loomi Analytics into an agentic analytics assistant that can explore data across systems, explain insights, and act on them.
- Dashboards & Analytics Stack (~20%) — moving Engagement reporting onto DWH-backed, modern analytics stacks (semantic layers, headless BI tools).

As a Senior SRE, you will be the reliability backbone of this AI-first data team.
Your work will directly impact the deployments, reliability, and observability of the pipelines and services that hundreds of enterprise customers depend on — from data exports into Databricks and BigQuery, to the AI agent Loomi uses to surface insights.

Datacraft is an AI-first team. We believe code is a commodity and expect every engineer to fluently use coding agents (e.g., Cursor, Claude Code, Copilot, Gemini CLI) as a core part of their daily workflow. The ability to leverage AI tooling to accelerate development, prototyping, and problem-solving is not optional — it's foundational.

Working in one of our Central European offices (Bratislava, Praha, Brno) or from home on a full-time basis, you'll become a core part of the Engineering team.

What challenge awaits you?

As a P3 (Senior) SRE at Bloomreach, you are an independent professional — expert in reliability engineering, able to decompose objectives into actionable infrastructure improvements and to lead initiatives end-to-end with minimal day-to-day guidance.

We need you to build and operate an ecosystem where data engineers can safely and efficiently develop, debug, and operate data-intensive jobs and services — spanning Kafka ingest pipelines, Iceberg data lakes, multi-DWH exports, Databricks deployment and orchestration (Airflow / Cloud Composer), and agentic AI workloads.

Your responsibilities

a. Platform reliability & observability
- Build and maintain the reliability ecosystem where engineers can safely develop, debug, and operate DataCraft services running on GCP and Kubernetes (DataProc, Cloud Composer, BigQuery, Snowflake/Databricks connectors).
- Ensure end-to-end observability across the full data platform — from Kafka ingest through GCS/Iceberg staging, Airflow orchestration, to Databricks and BigQuery destinations — enabling the team to catch missing loads, SLA breaches, and data drift before customers notice or costs drift.
- Drive scalability so services can scale vertically and horizontally based on operational and telemetry data (OpenTelemetry, Prometheus, Victoria Metrics).
- Maintain team health dashboards and alerting (Grafana, PagerDuty, Sentry).

b. Infrastructure as Code & deployments
- Own and evolve Terraform-based infrastructure for DataCraft services.
- Automate deployments, instance setup, and operational runbooks to eliminate manual/semi-manual steps.
- Maintain CI/CD pipelines (GitLab) with linters, security scans, code quality checks, and AI code reviews, enabling engineers to produce high-quality MRs.

c. Security & compliance
- Help the team fulfill security requirements for ISO and SOC2 audits by enforcing security principles: key distribution, key rotation, authorization & authentication at the service level, data encryption in transit, data isolation, resource limitations, and audit logs.
- Ensure data access controls are properly enforced across multi-DWH environments (BigQuery, Snowflake, Databricks).

d. Incident management & L3 support
- Participate in and drive L3 on-call rotation and incident resolution for DataCraft services.
- Contribute tooling for debugging, troubleshooting, and performance testing of data pipelines and orchestration layers.
- Use telemetry data and distributed tracing to navigate complex, distributed service architectures.

e. Agentic platform reliability
- Ensure reliability and observability of the Loomi Analytics Agent data infrastructure — LLM API gateway performance, MCP server health, and evaluation pipeline availability.
- Monitor and alert on data quality issues that could introduce inconsistencies or hallucinations in Loomi's responses — making the agent's data access patterns reliable and debuggable.
Our tech stack

Languages: Python (primary), Go, SQL
Messaging & streaming: Apache Kafka
Storage & databases: Databricks, BigQuery, Apache Iceberg, GCS, Mongo, Redis
Data processing & orchestration: Apache Spark, DataFlow, Airflow / Cloud Composer
Infrastructure: GCP, Kubernetes, Terraform
AI / Agentic: LLM APIs, MCP, agent orchestration frameworks
Observability: Grafana, Prometheus, Victoria Metrics, PagerDuty, Sentry, OpenTelemetry
CI/CD & tooling: GitLab, Jira, Confluence
AI coding agents: Cursor, Claude Code

Your qualifications

Professional experience

Impact
- You can articulate how your contributions transformed the way engineers work and fostered a strong SRE/DevOps culture.
- You can demonstrate how impactful reliability work connects to business success and customer outcomes.

Ownership
- You embrace the "you build it, you run it" principle — you love owning what you ship.
- You are cost-aware: effective vertical and horizontal autoscaling and detailed telemetry insights are how you demonstrate mindfulness of cloud spend.

Systematic approach
- Infrastructure as Code is the only thing that brings stability into chaos.
- You design for failure: SLOs, error budgets, and runbooks are first-class artifacts, not afterthoughts.

Data-driven
- You use telemetry and metrics to give engineers actionable feedback on how applications and services behave.
- You can navigate complex data platform architectures using distributed tracing and debugging.

Technical skills
- Solid hands-on experience with GCP (BigQuery, DataProc, Cloud Composer, GCS) and Kubernetes.
- Experience with Python; Go is a strong advantage.
- Familiarity with data pipeline technologies (Kafka, Airflow/Cloud Composer, Spark, Iceberg) — you don't need to write ETL code, but you need to operate it reliably and know when something is wrong.
- Fluent use of AI coding agents (Cursor, Claude Code, Copilot, Gemini CLI, or similar) — you already use these tools daily to accelerate work.
- Comfortable with on-call rotation and 24/7 incident response.
- Remote-first mindset — you know how to be effective in distributed teams.
- You are able to learn and adapt — essential when exploring new tech or navigating our growing codebase.

Strongly preferred
- Experience operating multi-DWH environments (Snowflake, Databricks, or BigQuery).
- Familiarity with agentic/LLM workloads — API reliability, latency SLOs, trace observability for AI systems.
- Experience with open table formats (Iceberg, Delta Lake) in production environments.
- Exposure to data security and compliance in the context of customer-facing DWH integrations (consent, data retention, PII handling).

Personal qualities
- Ownership & accountability — you take issues from detection through to resolution and follow-up prevention.
- Systematic thinking — you identify root causes, not symptoms, and document your findings so the team learns.
- Collaboration & communication — you explain trade-offs and constraints clearly to both engineers and non-engineers.
- Bias for reliability — operational excellence (SLOs, on-call friendliness, proactive alerting) is not a chore, it's your craft.
- Continuous improvement mindset — you are comfortable iterating, revisiting assumptions, and improving incrementally.
- Comfortable operating remote-first in a distributed team across Central Europe.

Your success story

In 30 days:
- Get to know the DataCraft team, the company, and the most important processes.
- Set up your local and GCP development environment and complete the Engagement engineering onboarding.
- Understand the current state of DataCraft services: pipelines, orchestration, observability gaps, and on-call runbooks.

In 90 days:
- Start contributing to the L3 on-call rotation, handling incidents, troubleshooting, and debugging — which will sharpen your understanding of the platform and surface fresh improvement ideas.
- Deliver your first meaningful reliability improvement: an observability enhancement, a deployment automation, or an SLO definition for a key DataCraft service.

In 180 days:
- Own the reliability posture of at least one DataCraft domain end-to-end — able to independently design, operate, and continuously improve it.
- Drive measurable improvements in MTTR, alert signal-to-noise ratio, or deployment confidence across the team.
- Be a trusted reliability partner in architecture discussions — your input shapes how new DataCraft services are designed for operability from day one.

More things you'll like about Bloomreach:

Culture:
- A great deal of freedom and trust. At Bloomreach we don’t clock in and out, and we have neither corporate rules nor long approval processes. This freedom goes hand in hand with responsibility. We are interested in results from day one.
- We have defined our 5 values and the 10 underlying key behaviors that we strongly believe in. We can only succeed if everyone lives these behaviors day to day. We've embedded them in our processes like recruitment, onboarding, feedback, personal development, performance review and internal communication.
- We believe in flexible working hours to accommodate your working style.
- We work virtual-first with several Bloomreach Hubs available across three continents.
- We organize company events to experience the global spirit of the company and get excited about what's ahead.
- We encourage and support our employees to engage in volunteering activities: every Bloomreacher can take 5 paid days off to volunteer.*
- The Bloomreach Glassdoor page elaborates on our stellar 4.4/5 rating. The Bloomreach Comparably page Culture score is even higher at 4.9/5.

Personal Development:
- We have a People Development Program, with personal development workshops on various topics run by experts from inside the company. We are continuously developing & updating competency maps for select functions.
- Our resident communication coach Ivo Večeřa is available to help navigate work-related communications & decision-making challenges.*
- Our managers are strongly encouraged to participate in the Leader Development Program to develop in the areas we consider essential for any leader. The program includes regular comprehensive feedback, consultations with a coach and follow-up check-ins.
- Bloomreachers utilize the $1,500 professional education budget on an annual basis to purchase education products (books, courses, certifications, etc.).*

Well-being:
- The Employee Assistance Program, with counselors, is available for non-work-related challenges.*
- Subscription to Calm, a sleep and meditation app.*
- We organize ‘DisConnect’ days where Bloomreachers globally enjoy one additional day off each quarter, allowing us to unwind together and focus on activities away from the screen with our loved ones.
- We facilitate sports, yoga, and meditation opportunities for each other.
- Extended parental leave up to 26 calendar weeks for Primary Caregivers.*

Compensation:
- Restricted Stock Units or Stock Options are granted depending on a team member’s role, seniority, and location.*
- Everyone gets to participate in the company's success through the company performance bonus.*
- We offer an employee referral bonus of up to $3,000 paid out immediately after the new hire starts.
- We reward & celebrate work anniversaries: Bloomversaries!*

(*Subject to employment type. Interns are exempt from marked benefits, usually for the first 6 months.)

Excited? Join us and transform the future of commerce experiences!

If this position doesn't suit you, but you know someone who might be a great fit, share it - we will be very grateful!

Any unsolicited resumes/candidate profiles submitted through our website or to personal email accounts of employees of Bloomreach are considered property of Bloomreach and are not subject to payment of agency fees.

More DevOps Engineer Jobs

Full Time, Remote, Team 201-500, H1B No Sponsor

• Manage infrastructure and architect systems behind a $1B+ e-commerce platform
• Own cloud infrastructure and DevOps/DataOps strategy
• Architect and scale AWS infrastructure (S3, Redshift, Glue, EMR, Lambda, ECS)
• Lead Infrastructure as Code using Terraform/Terragrunt
• Build and optimize CI/CD pipelines for high-velocity deployments
• Design and evolve ETL pipelines and data platform architecture
• Drive containerization and orchestration (Docker, Kubernetes)
• Establish observability: monitoring, logging, alerting
• Ensure security, compliance, and cost-efficient cloud operations
• Lead incident management and improve system reliability
• Define DevOps/DataOps standards across teams
• Mentor engineers and elevate technical excellence

Uruguay

DevSecOps/DevOps Engineer (all identities)

Caspar Health

Effective, recognized digital rehabilitation combined with personal therapeutic care - independent of time & location!

DevOps Engineer, 54 days ago
Full Time, Remote, Team 51-200, H1B No Sponsor

What to Expect

At Caspar Health, we run a digital rehabilitation clinic where medical expertise meets high-end engineering. We don't just ship code; we provide patients with personal care from real therapists, powered by our platform. This means your work directly enables medical professionals to deliver life-changing therapy to people who otherwise wouldn't have access to it.

We are looking for a DevSecOps/DevOps Engineer (all identities) with strong application and infrastructure security skills. We welcome experienced DevOps Engineers who bring solid security knowledge — understanding of common vulnerabilities, application security issues, and infrastructure hardening — and are ready to grow into a full DevSecOps role.

You are a fit for this role if...
- You live and breathe Cloud Security: You have a solid foundation in AWS and believe that infrastructure is only as good as its security-first design
- You are an automation enthusiast: You prefer writing Terraform or Python scripts over manual configurations any day of the week
- You are a bridge-builder: You enjoy working at the intersection of Development and Operations, helping teams "shift-left" without slowing them down

We know you have choices. Here’s why you should choose us:
- Growth over Stagnation: We don't expect you to be a finished DevSecOps guru on day one. If you are a solid DevOps Engineer with a security-focused mindset, we will provide the environment and support for you to become a true specialist
- Modern Stack & Real Ownership: Work with AWS, Kubernetes, and Terraform/Terragrunt. You won't just follow tickets; you’ll help define our security architecture
- Purpose-Driven Tech: Every line of code and every IAM policy you write directly contributes to someone’s recovery and health
- Flexibility & Balance: We live the health-tech mission. Expect flexible working hours, a remote-friendly setup within Germany, and a culture that respects your "deep work" time

Your Challenges
- Master the Alert Lifecycle: Take the lead on triaging security alerts and vulnerabilities. You won't just "fix bugs"; you will coordinate smart remediations and build the systems that prevent them from reappearing
- Champion "Shift-Left": Integrate automated security testing, vulnerability scanning, and compliance checks directly into our CI/CD pipelines
- Fortify the Cloud: Use Terraform and Terragrunt to evolve our AWS infrastructure into a gold standard of "Security as Code"
- Automate Compliance: Work within an empowered Platform Squad to turn regulatory requirements into automated guardrails, ensuring compliance is a byproduct of our engineering rather than a manual chore
- Secure the Core: Manage and harden our data layers (PostgreSQL, Redis) and orchestrate our K8s environment, including the applications running on it, with a zero-trust mindset
- Be the Security Mentor: Collaborate with development squads to identify and remediate vulnerabilities early in the software lifecycle

Your Profile
- Strong knowledge of application security: common vulnerabilities (OWASP Top 10), secure coding practices, dependency scanning, and remediation
- A solid understanding of infrastructure security: secure configurations, network segmentation, encryption at rest and in transit, access controls
- The DevOps Foundation: You have a proven track record in AWS environments, managing Infrastructure as Code (Terraform) and containers (Docker/K8s)
- The Security Mindset: You don’t just build pipelines; you wonder how someone might break them. You’re familiar with encryption, network segmentation, and secure access protocols
- The Problem Solver: You enjoy Linux administration and can automate tasks using Python, Go, or Node.js
- The Communicator: You can explain complex security risks to a developer in a way that inspires them to fix it. (English is our working language)

Note to DevOps Engineers: If your background is primarily in infrastructure but you have a strong desire to deepen your application security skills, we encourage you to apply. We value your curiosity and will support your growth into a full DevSecOps role.

Why Caspar Health?
- Remote-first with flexible working hours – office optional in Berlin-Mitte or up to 90 days a year outside of Germany
- To match this, we support you with a monthly home office allowance and an additional meal subsidy
- Plenty of time to recharge – with 30 vacation days per year
- Budget for learning & development, conferences, and coaching – tailored to your potential and growth opportunities.
- High level of ownership and decision-making freedom – no micromanagement. We hire experts who know what they’re doing. #MakeAnImpact!
- Genuine collaboration – no silos, no egos. After all, we’re all working toward the same vision. #ValueFocus!
- Access to all Caspar offerings for mental and physical well-being #HealthyTogether!
- And yes – all the snacks your heart desires, group sports sessions, a never-empty drinks fridge, and a healthy dose of humor are included too.

What Defines Us as a Team
- We live diversity – it's not a marketing slogan, it's part of our everyday reality
- Feedback isn’t a tool – it’s part of our culture
- We believe that technology only makes sense when it truly helps people
- Our drive is purpose – but our standard is professionalism

At Caspar Health, we strive to provide a friendly, safe and welcoming environment for all Casparians* regardless of ethnic origin, gender, gender identity and expression, sexual orientation, limitations of any kind, physical appearance, social background, age or religion (or lack thereof). When you apply, we purely focus on your experience and motivation. You decide what additional information you want to disclose (picture, marital status, religion, gender, nationality, etc). We value and treat all applications equally. We are looking forward to getting to know you!

Your Contact

Dana Kussatz, Talent Acquisition Expert and HR Tech Business Partner. Please use our Career Page to apply to this position; we will not consider applications via email. We are looking forward to receiving your CV, salary expectation and your earliest starting date.

Diversity and Inclusion: At Caspar Health, we strive to provide a friendly, safe, and welcoming environment for all Casparians – regardless of gender, gender identity and expression, sexual orientation, disabilities of any kind, physical appearance, social background, age, or religion (or lack thereof). In your application, we focus on your experience and motivation. It’s entirely up to you which additional personal information you'd like to share (photo, marital status, religion, gender, nationality, etc.). We value and treat all applications equally.

Our product CASPAR Health is used in the environment of medical institutions, patients, doctors and therapists. Working with (sensitive) personal data (e.g. health data) is part of our daily work. Therefore, we require a high level of commitment to the protection of personal data from our employees in order to protect the rights of the data subjects as best as possible.

Note on data protection: Please apply exclusively via our career site, as this is the only way we can guarantee the protection of your data. You can find our privacy policy here.

Germany

DevSecOps/DevOps Engineer

Caspar Health

Effective, recognized digital rehabilitation combined with personal therapeutic care - independent of time & location!

DevOps Engineer, 54 days ago
Full Time, Remote, Team 51-200, H1B No Sponsor

• Lead the alert lifecycle: take the initiative in triaging security alerts and vulnerabilities.
• Integrate automated security testing: security tests, vulnerability scanning, and compliance checks directly into our CI/CD pipelines.
• Fortify the cloud: use Terraform and Terragrunt to evolve our AWS infrastructure.
• Automate compliance: turn regulatory requirements into automated guardrails.
• Manage and harden our data layers (PostgreSQL, Redis) and orchestrate our K8s environment.
• Collaborate with development squads to identify and remediate vulnerabilities early in the software lifecycle.

Germany
Job Closed

Expert DevOps Engineer - T Cloud Public (REF5412J)

Deutsche Telekom IT Solutions

Ranked as Hungary’s most attractive employer in 2025 (according to Randstad’s representative survey), Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees. We have hundreds of large customers, corporations in Germany and in other European countries. DT-ITS received the Best in Educational Cooperation award from HIPA in 2019 and was acknowledged as the Most Ethical Multinational Company in the same year. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.

DevOps Engineer, 54 days ago
Full Time, Remote, Team 5,001-10,000

Company Description

Ranked as Hungary’s most attractive employer in 2025 (according to Randstad’s representative survey), Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees. We have hundreds of large customers, corporations in Germany and in other European countries. DT-ITS received the Best in Educational Cooperation award from HIPA in 2019 and was acknowledged as the Most Ethical Multinational Company in the same year. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.

Job Description

Your Department

Step into the engine room of Europe’s public cloud! At T Cloud Public, we don’t just operate a platform — we shape the future of secure, scalable, open-source cloud technology. Powered by agile ways of working, lean structures, and a passionate, high-performance team, we thrive in one of the most dynamic environments in the industry. If you’re a cloud enthusiast and a hands-on professional ready to tackle complex challenges, you’ll feel right at home with us.

Why Join?
• Innovative Product: Work with a cutting-edge cloud platform that’s truly making an impact as the leading European alternative to the hyperscalers.
• Growth Opportunity: Be part of a rapidly expanding organization with significant career advancement potential.
• Competitive Compensation: Attractive package, based on experience, supplemented with comprehensive benefits.
• Dynamic Culture: Collaborative, supportive, and inclusive environment with a focus on innovation and continuous learning.

The challenges you’ll tackle:
- Solve complex problems in the daily operation of a hyperscaler's cloud backend.
- Build tools to reduce the occurrence of errors and improve customer experience.
- Develop software to integrate with internal back-end systems.
- Perform root cause analysis of production errors and resolve technical issues.
- Consistently automate with common automation frameworks.
- Work in a team of specialists where everyone helps each other in an open and trusting manner.

Qualifications

Your Profile
- Completed studies in a technical, engineering or scientific subject or comparable professional training.
- Minimum of 4 years of professional experience in IT with a focus on modern cloud technologies.
- Experience as a DevOps engineer or in a similar software engineering role.
- Strong experience in Linux and container technologies.
- Proficiency with Git and GitHub workflows.
- Good knowledge of Python and Go.
- Extensive knowledge in infrastructure automation.
- Knowledge of agile development processes.
- High level of customer focus.
- Problem-solving attitude.
- Collaborative team spirit.
- Fluency in written and spoken English.
- You will be working in the European Union to meet our customers' data security and privacy requirements.

Additional Information

* Please be informed that our remote working possibility is only available within Hungary due to European taxation regulation.
- Company: Deutsche Telekom TSI Hungary Kft.

Hungary