Senior Site Reliability Engineer, SRE
Location
Czechia
Posted
63 days ago
Salary
0
Seniority
Senior
Job Description
Nebius Group
- Ensure fault tolerance, scalability, and uninterrupted operation of the service.
- Use cutting-edge cloud technology to solve a variety of infrastructure problems.
- Implement and improve CI/CD processes.
Job Requirements
- Solid experience with programming languages such as Go, Python, or C++;
- Solid understanding of classic algorithms and data structures;
- Commercial experience with, and deep understanding of, Unix systems and network technologies;
- Experience with containerization and configuration management systems (Ansible, Salt, Terraform, Docker, K8s, Helm).
Benefits
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.
More DevOps Engineer Jobs
DevSecOps Engineer, Entry Level – Clearance Required
ICF
Founded in 1969, ICF is a global advisory and technology services company headquartered in Reston, Virginia. It delivers data-driven solutions across energy, environment, infrastru…
- Assist in the development, maintenance, and monitoring of CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or Azure DevOps.
- Support infrastructure as code (IaC) efforts using tools like Terraform, CloudFormation, or ARM templates.
- Help integrate security scanning and compliance checks (SAST, DAST, dependency scanning, container scanning) into build and deployment pipelines.
- Support cloud infrastructure in AWS, Azure, or GCP, with an emphasis on security best practices.
- Assist with containerization efforts using Docker and orchestration platforms such as Kubernetes.
- Monitor environments, logs, and alerts; assist with troubleshooting and incident response.
- Document configurations, processes, and security controls to support audits and compliance requirements.
- Collaborate with development, operations, and security teams in an Agile/Scrum environment.
- Learn and apply federal security frameworks such as NIST, FISMA, and FedRAMP.
Java Developer
Trimetis Services
Our client is an international supplier of communication and information systems for control centers with safety-critical tasks. They specialize in developing and distributing 'Control Centre Solutions' for the Air Traffic Management and Public Safety & Transport sectors.
Role Description
As a Software Developer, you will join a dedicated team of professionals and actively contribute to the development of the next-generation platform for safety-critical domains. This platform is designed to meet the highest quality requirements in air traffic management, where reliability, real-time performance, and security are critical success factors. The digital platform enables seamless integration of voice and data applications, breaks down operational silos, and prioritizes cybersecurity throughout the solution lifecycle. By streamlining control center tasks and enabling integrated controller operations on a single screen, the platform helps organizations raise service levels without increasing headcount.

Project methodology: Scrum (SAFe)
Location: Poland
Technologies used in the project:
- Java 17/21
- Golang
- Python
- RedHat 8/10
- Docker
- RabbitMQ
- PostgreSQL, CQL (Cassandra)
- React, WebSockets
- Jenkins, Ansible

You will work on a microservices-based framework used across the company to run and integrate multiple products. You will develop reusable services, software packages, and libraries for use by other teams in the company. The product runs on RedHat, handles the deployment and configuration of the OS, and supports the configuration and orchestration of services packaged as Docker images. We develop for a safety-critical environment and follow industry standards; high availability and redundancy are basic requirements.
Your tasks:
- Take part in the daily work of the Scrum Team
- Contribute to the design and architecture of the product
- Write code according to company processes and standards
- Write unit and component tests
- Support QA in testing and in developing automated system tests
- Write documentation and requirements
- Practice Clean Code and code review

Qualifications
- 7+ years of software development experience in Java
- Knowledge of object-oriented concepts, design patterns, and SOLID principles
- Passion for clean, unit-tested, and maintainable code
- Experience with microservice architecture
- Experience with SOA and microservices
- Experience with REST, messaging/AMQP, and WebSockets
- Experience with distributed systems and an understanding of their concepts (CAP theorem, redundancy, high availability, consensus protocols, etc.)
- Know-how in continuous integration and delivery
- Good English skills (written and spoken)

Requirements
- Experience with Go/Python is a plus
- Knowledge of Linux and Bash
- IP networking know-how
- Domain-Driven Design experience is a plus
- Experience in configuring CI/CD pipelines is a plus
- Experience with GitHub Copilot

Benefits
- Flexible working hours
- Medical insurance
- International clients
- Remote work
- Annual bonuses
- Life insurance
- Non-corporate work atmosphere
- Integration events
- Additional days off
- Training and development budget
Senior Site Reliability Engineer – FinOps
Jusbrasil
💻 We make access to legal information simple through technology
- Ensure the reliability, availability, and scalability of systems and services.
- Develop and implement monitoring and observability solutions focused on performance and cost.
- Create and maintain infrastructure as code using tools such as Terraform.
- Work closely with the Engineering Platform, SRE Partner, FP&A, and product teams.
- Help build and evolve a culture oriented toward financial efficiency (FinOps) and reliability (SRE).
- Actively contribute to the evolution of the Agentic Engineering Platform.
- Support continuous improvement initiatives for infrastructure, automation, and performance.
- Contribute to migrations of critical systems.
- Execute configuration, administration, and troubleshooting activities on the physical and virtualized networks in the NFVi environment, including L2/L3 switches, virtual switches, SDN controllers, and overlay/underlay networks.
- Perform acceptance tests and functional validation of new elements and systems.
- Work with monitoring and inventory teams to validate integrations, adjust metrics, and ensure environment visibility.
- Document operational routines and troubleshooting procedures, and propose continuous improvements to increase operational efficiency.
- Monitor network KPIs and metrics, applying tuning, load balancing, and preventive adjustments to avoid congestion, packet loss, and bottlenecks.
- Perform connectivity troubleshooting involving VLANs, VXLAN, BGP/EVPN, OSPF, port-channels, VLT/MLAG, STP/ERPS, and overlay/underlay networks.
- Provide technical support for NFVi infrastructure evolution projects such as upgrades, migrations, and integrations.
- Collaborate with technical teams to resolve issues and continuously improve environments.