CoreWeave is a specialized cloud provider, delivering a massive range of GPU compute resources on demand and at scale.
Senior Engineer, Network Observability
Location
United Kingdom
Posted
4 days ago
Salary
0
Seniority
Senior
Job Description
We’re seeking a talented and experienced Senior Engineer for Network Observability to join our Network Observability team. In this role, you will be a key player in designing, developing, and maintaining the monitoring, telemetry, and observability systems that keep CoreWeave’s GPU cloud network operating reliably and at scale. You’ll focus on building solutions that provide real-time insights into network performance, ensuring that issues are detected proactively and resolved quickly.
- Develop, optimize, and maintain network observability platforms. Use your skills in Python and Golang to create and automate collectors, exporters, and dashboards that provide deep visibility into network health and performance.
- Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from a variety of platforms (Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, etc.) into a single observability pipeline.
- Design and implement scalable telemetry solutions using protocols like gNMI, SNMP, and streaming analytics. Ensure advanced alerting and anomaly detection with tools such as Prometheus, Grafana, and Alertmanager.
- Work closely with network developers, site reliability engineers, and security teams to integrate observability solutions across the broader infrastructure.
- Participate in design discussions, RFCs, and architectural decisions.
- Join a rotating on-call schedule to troubleshoot and resolve observability-related issues. Provide timely support to operations teams, quickly isolating and fixing problems when they arise.
- Guide junior team members, share best practices, and foster a culture of continuous learning and improvement within the observability domain.
Job Requirements
- Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP. Experience writing or extending custom metric collectors/exporters is a plus.
- Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large-scale environments. A track record of building and operating robust telemetry and monitoring solutions is a plus.
- Passion for automating tasks and processes. You find satisfaction in creating workflows that handle repetitive tasks and reduce human error to near zero.
- Comfortable containerizing solutions and designing, building, and deploying container-based workloads efficiently on Kubernetes.
- Proficient with Python, Go, and Bash, plus familiarity with configuration management and templating tools (e.g., Ansible, Jinja2).
- Strong knowledge of Linux systems and IP networking concepts, with hands-on experience in routing, switching, and network troubleshooting.
- Practical experience with a variety of platforms, including Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and SR Linux.
- Collaborative, humble, and always ready to help others while staying open to learning from more senior colleagues.
Benefits
- Family-level Medical Insurance
- Family-level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
More Full-stack Engineer Jobs
Senior Software Developer
Zensurance
Zensurance makes business insurance easy for Canadian entrepreneurs.
- Take ownership of the development of custom features and drive their technical implementation.
- Act as the Subject Matter Expert for the team’s domain and drive its technical direction.
- Suggest, design, implement, test and monitor features and functionalities.
- Facilitate cross-team collaboration in accordance with established best practices and Agile methodology.
- Evangelize proper software architecture and development paradigms.
- Collaborate with project stakeholders and the development team to design and build scalable, user-friendly systems for our customers, and in-house tooling.
- Discuss strategy and outline tradeoffs of potential software solutions.
- Develop, test, and maintain codebase within the team’s domain.
- Write clean, maintainable, and scalable code.
- Contribute to knowledge sharing of new technologies and solutions which fall within the team’s area of expertise.
- Offer guidance and mentorship to junior and intermediate team members.
- Develop and maintain documentation for new and existing features and integrations.
Senior Software Engineer
Cority
Global enterprise EHS software provider empowering those who transform the way the world works.
- Drive technical and architectural decisions to meet product requirements while also anticipating and designing for future needs.
- Lead teams technically to drive production-ready code.
- Design and develop new software and enhance existing software for clients’ systems, and for Cority’s base software.
- Communicate directly with Product Owners to ensure that requirements and specifications are understood.
- Develop high-quality software and advocate for automation frameworks for testing, integration, and deployment.
- Review completed software designs or prototypes with the team and participate in code reviews.
- Track sprint work and provide proper transparency/visibility to the team.
- Provide support and maintenance.
- Own one or more functional areas or projects and help break down tasks into manageable stories.
- Mentor junior developers.
- Be open to learning and working with modern technologies as required in the project.
Senior Full Stack Software Engineer
Proof Holdings, Inc.
We’re creators and builders that are deeply passionate about art and community.
- Collaborate with Product and Network and Agent Experience team engineers to design, create, and maintain features for Proof's customers and users
- Write quality code with a high degree of autonomy, meeting standards for performance and reliability
- Use AI tools in your day-to-day coding, review, and deployment processes to work more effectively and deliver a quality experience for our customers
- Drive project scoping activities and discussions around requirements and trade-offs
- Troubleshoot complex technical issues in production, pertaining to team's areas of ownership
- Drive improvements, collaboration, and best practices through code reviews and mentoring
- Proactively write and maintain technical documentation
- Participate in production on-call rotation several times a year, after receiving in-depth training
Software Engineer
System Inc.
Relate everything, to help the world see and solve anything, as a system. System is a Public Benefit Corporation.
- Collaborate with cross-functional teams to design, implement, and maintain scalable software systems
- Build resilient infrastructure for serving data at scale and with high availability
- Drive technical planning and contribute to the overall engineering strategy
- Develop and maintain continuous integration and deployment processes
- Work on innovative solutions for advanced and challenging data and systems problems
- Contribute to code reviews, unit testing, and development strategies in an agile environment
- Influence culture, recruit new engineers, and help shape a rapidly growing startup




