Build software collaboratively from anywhere in the world, on any device, without spending a second on setup.
Senior Site Reliability Engineer
Location
France
Posted
14 days ago
Seniority
Senior
Job Description
Senior Site Reliability Engineer
Replit
- Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools.
- Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi.
- Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Incident Management and Response: Lead incident response efforts, conduct thorough post-mortems, and implement improvements to prevent future occurrences.
- Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure.
Job Requirements
- 4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)
- Strong programming skills in languages commonly used for automation (Python, Go, or similar)
- Deep understanding of distributed systems
- Experience with container orchestration platforms (Kubernetes) and cloud-native technologies
- Proven track record of implementing and maintaining monitoring/observability solutions
- Strong incident management skills with experience leading incident response
- Experience with infrastructure as code and configuration management tools
Benefits
- Competitive Salary & Equity
- 401(k) Program with a 4% match (*US Only*)
- Health, Dental, Vision and Life Insurance
- Short Term and Long Term Disability
- Paid Parental, Medical, Caregiver Leave
- Flexible Time Off (FTO) + Holidays
- Commuter Benefits (*In-Office Only*)
- Monthly Wellness Stipend
- Autonomous Work Environment
- In Office Set-Up Reimbursement (*In-Office Only*)
- Quarterly Team Gatherings
- In Office Amenities (*In-Office Only*)
More DevOps Engineer Jobs
Role Description
As the Forward Deployment Engineer, you will sit at the intersection of client delivery, pre-sales, and internal AI tooling. You embed with a client (or an internal Netguru team), translate ambiguous problems into a defensible technical scope, and ship working AI systems to production - fast. Half of your time goes back into Netguru: every reusable pattern you spot in the field becomes a skill, agent, or MCP inside NetguruOS.
This is a builder role, not an advisory one. Proof of work is a running system - not a deck.
What you'll do:
- Client engagements
  - Embed temporarily into the team that owns the problem - client engineering org or internal Netguru team
  - Run discovery with stakeholders from CTO down to end-user
  - Translate ambiguous pain into a written technical scope you can defend
  - Lead pre-sales on AI opportunities - scope, price, propose, close
- Build
  - Prototype in days, not weeks - working software, not slides
  - Write production-grade Python or TypeScript (or whatever the client runs)
  - Ship AI-enabled workflows: copilots, agents, RAG systems, automation, internal tooling
  - Integrate with the client's existing stack
  - Stand up evals, observability, and guardrails before going live
  - Own the path to production: security review, procurement, change management
- Build internally (≈50% of the role)
  - Extend Netguru's internal AI platform with new agents, MCPs, and skills based on field learnings
  - Build enablement tooling for internal AI delivery teams - templates, runbooks, reusable skills
  - Codify reusable client patterns back into NetguruOS
Qualifications
- 5+ years total, including 3+ years hands-on building and shipping AI-assisted systems
- 2+ years in a client-facing consulting or pre-sales engineering role
- Proven full-cycle ownership: discovery → scoping → build → production → handoff
- Comfortable with both technical and business stakeholders in the same engagement
- Communicative (B2+) Polish and English
Requirements
- Production-grade Python and/or TypeScript - shipped systems, not scripts
- LLM stack fluency (Claude, GPT, open-weight) - knows when to use which and why
- AI system design: agents, RAG pipelines, copilots, MCP servers, sub-agents
- Eval-driven development - defines success metrics before shipping, measures after
- Integration experience: REST APIs, iPaaS, common SaaS (CRM, PM, BI)
- Cloud deployment, security reviews, change management
Mindset
- Builder by default - running system over recommendation deck
- Stack-agnostic - picks the right tool, not the favorite one
- Walks into chaos, imposes structure, delivers clarity
- Ships in days, not weeks
Nice to have:
- Domain experience in Finance, Retail, or Supply Chain
- Public presence - conference talks, open-source, published writing on AI
- SDLC orchestration and multi-agent pipelines
What success looks like (first 6–12 months)
- First AI engagement scoped, built, and delivered to production
- At least one reusable asset (agent skill, MCP, eval suite) shipped into NetguruOS
- Measurable time savings from internal tooling contributions
Benefits
- Access to the WorkSmile platform offering benefits adapted to your preferences:
  - Multisport card
  - Private health insurance package
  - Life insurance
  - Hundreds of other options to choose from across 15 categories (shopping, leisure, travel, food, etc.)
- Discounts on Apple products
- Support for your growth - a head/manager's budget available to every employee
- PLN 175 monthly lump sum (ryczałt) for remote employees (Contract of Employment)
- Various internal initiatives: webinars, knowledge-sharing sessions, internal conferences
Don't hesitate and apply right away!
At Netguru, we're committed to creating an inclusive environment for everyone. If you require any disability-related accommodations during the recruitment process, please let us know. We're here to help!
- Install, configure, administer, and support Core Banking systems and services across different environments (DEV/UAT/PROD).
- Automate operational and deployment activities using Rundeck and Ansible (job orchestration, playbooks/roles, inventories, and secure credential management).
- Create and maintain runbooks, operational procedures, and standard documentation to enable repeatable and auditable executions.
- Provide production support and participate in incident response, including triage, remediation, and post-incident analysis (RCA/problem management).
- Monitor system health and performance; implement proactive improvements to availability, resilience, and capacity.
- Manage configuration, patching, and lifecycle of operating system, middleware, and application components, ensuring compliance with internal controls.
- Coordinate management, renewal, and deployment of digital certificates (e.g., TLS/SSL), including inventory, expiry monitoring, change planning, validation, and documentation to prevent service disruption.
- Work with application, infrastructure teams, and vendors to troubleshoot complex end-to-end issues and ensure timely resolution.
- Support change management by planning, testing, documenting, and executing changes with risk assessment and appropriate approvals.
- Contribute to the improvement of CI/CD and release practices (version control, artifact management, environment consistency), where applicable.
- Apply security best practices, including hardening, access control, secrets management, and auditability.
About Air Space Intelligence
ASI's mission-critical technology powers decision-making across aviation, defense, energy, and other critical infrastructure domains. Backed by top-tier investors including Andreessen Horowitz, Spark Capital, and Renegade Partners, ASI delivers operational decision superiority by compressing days of analysis into seconds of action. ASI is leading the way and pushing the boundaries of what's possible.
What You Will Do:
You will wear multiple hats, from technical deployment and execution to product strategy. You will work closely with existing customers, engineering, and growth leads to ensure ASI's products are continuously delivered to the end user successfully. Your focus will be ensuring the success of ASI's delivery and execution with customers and end users.
What We Value:
- A bias for action and distinct aptitude for problem solving in ambiguous environments.
- Technical fluency, competency, and curiosity.
- Experience with or desire to learn hands-on technical deployment skills (e.g., basic Kubernetes-based deployments, application performance monitoring).
- Familiarity with Department of War organizations, culture, and end user needs.
- Ability to embed in technical customer workflows, understand customer data, and drive product scale.
- Comfort operating in classified network environments (e.g., SIPR, JWICS).
- Recent or current U.S. Security Clearance or eligibility to obtain a U.S. Security Clearance.
How We Hire:
We look at the interview process not as a screening or test, but rather as an opportunity to simulate what it would look like working together. We build the interview process around you.
ASI works with export controlled technology and restricted U.S. Government data, including on contracts mandating U.S. immigration status and location restrictions for performing personnel. Employment offers are contingent on ability to timely obtain all required authorizations for contemplated job duties.



