Senior Network Site Reliability Engineer – NetSRE
Location
Netherlands
Posted
111 days ago
Salary
Not specified
Seniority
Senior
Job Description
Nebius Group
- Define and own reliability goals for network services and critical paths (SLIs/SLOs, availability targets, error budgets where it makes sense)
- Drive reliability improvements across the whole network: not only services, but also site readiness, inter-site connectivity (DCI), and operational standards
- Own incident response for your areas, lead investigations/postmortems, and turn failures into durable fixes (not repeated firefighting)
- Build and evolve observability: actionable metrics/logs/traces, alerting, and faster debug loops during and after incidents
- Design safer change workflows: automation, CI/CD, test/staging environments, canarying, rollbacks, and auditability for network changes
- Work closely with network engineers and platform teams to embed operability into designs and keep operations practical and fast
Job Requirements
- Strong production Linux fundamentals and a structured approach to debugging complex systems
- Solid understanding of networking basics and how real networks fail (control plane vs data plane, latency/loss, failure domains, etc.)
- Hands-on experience operating high-availability systems and improving them over time (not just “keeping lights on”)
- Ability to write and maintain software/automation (Go is common for us; Python is also welcome)
- Experience with modern infrastructure tooling (e.g., IaC, CI/CD, container platforms) and comfort automating operational workflows
Benefits
- Competitive compensation
- Career growth and learning opportunities
- Flexibility and work-life balance
- Collaborative and innovative culture
- Opportunity to work on impactful AI projects
- International environment and talented teams