At Akamai, we make life better for billions of people, billions of times a day. Every moment, billions of people, all over the world, are using the internet to shop, play games, look after finances, learn remotely, share videos, connect across the world, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences. Thanks to the world’s most distributed platform for cloud computing, security, and content delivery, Akamai keeps applications and experiences closer and threats farther away. As devoted, determined problem-solvers who share a passion for technology, we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!
Senior Site Reliability Engineer
Location
Poland
Posted
67 days ago
Salary
0
Seniority
Senior
Job Description
Senior Site Reliability Engineer
Akamai Technologies
- Taking responsibility for observability strategy, designing telemetry, dashboards, alerts, defining SLO/SLI frameworks, and implementing improvements when targets are missed
- Building production-grade automation and tooling that reduces operational toil, improves incident response, and sets patterns that other SREs adopt
- Owning incident management integration for inference workloads, designing frameworks, leading incident response during on-call rotations, and driving systemic improvements from post-mortems
- Defining and implementing deployment safety practices including progressive rollouts, canary analysis, and rollback automation, establishing standards for the team
- Partnering with product engineering teams to influence architecture decisions, ensure operational readiness, and represent the SRE perspective in design reviews
- Mentoring Senior and mid-level SREs through code reviews, design discussions, and hands-on problem-solving
Job Requirements
- Have extensive experience in SRE, platform engineering, or infrastructure engineering, working with large-scale distributed systems
- Have a track record of defining SLO/SLI frameworks, building observability platforms, and running incident management processes at scale
- Demonstrate expertise in Kubernetes and containerization, including autoscaling, resource scheduling, and orchestration for compute-intensive workloads at scale
- Build automation and tooling using Python or Go, while leveraging CI/CD pipelines, deployment safety practices, and infrastructure-as-code expertise
- Lead technical initiatives across teams, guide engineers through mentorship, and resolve complex reliability challenges independently
- Have interest in or experience with AI/ML infrastructure, model deployment, or GPU workloads
- Demonstrate ownership of intricate reliability issues, deliver solutions collaboratively, and enhance the technical expertise of surrounding SRE team members
Benefits
- Your health
- Your finances
- Your family
- Your time at work
- Your time pursuing other endeavors
More DevOps Engineer Jobs
DevOps Engineer
Virtuos
Founded in 2004, Virtuos is one of the largest independent video game development companies. We are headquartered in Singapore with offices in Asia, Europe, and North America. Specializing in full-cycle game development and art production, we have delivered high-quality content for more than 1,500 console, PC, and mobile games. Our clients include 23 of the top 25 gaming companies worldwide.
Volmi - A Virtuos Studio specializes in game development and game content creation. Over the past 9 years, we have successfully completed over a thousand different projects. We have gained considerable experience in creating 2D and 3D art, as well as developing turnkey games. Since 2022, we have been a part of Virtuos, the world's leading game developer. We have contributed to the development of popular games such as Diablo 2: Resurrected, Metro Exodus, Sniper: Ghost Warrior Contracts 2, Smite, Paladins, Gwent: The Witcher Card Game, and Marvel Snap.
- Design, implement, and maintain CI/CD pipelines supporting cross-platform builds (Android, iOS, and PC) using Jenkins, GitHub Actions, and Unreal Automation Tool
- Automate build, packaging, and artifact delivery for mobile clients, PC builds, and headless multiplayer servers
- Develop and maintain deployment pipelines for multiplayer infrastructure using Docker and Kubernetes
- Support orchestration of dedicated game servers and matchmaking services via the Agones game server platform
- Manage multiple runtime environments for development, testing, demos, and release candidates
- Maintain and improve infrastructure reliability and deployment reproducibility across environments
- Implement monitoring and observability systems using Prometheus and Grafana to track infrastructure health, build reliability, and server performance
- Collaborate with backend engineers and game teams to ensure smooth integration of multiplayer services, telemetry pipelines, and backend systems
- Support infrastructure running in cloud environments (AWS, GCP, or Azure)
- Participate in infrastructure security practices, including management of secrets, credentials, and access permissions in collaboration with the client infrastructure team
- Optimize build pipelines and infrastructure to support large-scale multiplayer environments (100k+ CCU)
- Document DevOps processes and infrastructure configurations to support long-term maintainability
Senior Site Reliability Engineer
Akamai Technologies
Do you enjoy solving complex reliability challenges for cutting-edge technology?
Do you have a passion for automation and building systems that scale?

Join the Akamai Inference Cloud Team
The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications with unmatched performance, compliance, and economics.

Partner with the best
As a Senior SRE, responsibilities include owning reliability workstreams for Akamai's serverless inference platform, building automation and tooling, and contributing to architecture and operational decisions. Opportunities exist to take ownership of critical reliability problems end-to-end, partner with product engineering teams, and develop expertise in GPU infrastructure, Kubernetes at scale, and AI inference workloads.

As a Site Reliability Engineer, you will be responsible for:
- Building and maintaining observability for AI workloads, including telemetry, dashboards, alerts, SLO/SLI tracking, and driving improvements when targets are missed
- Writing automation and tooling to reduce operational toil, improve deployment safety, and accelerate incident response
- Integrating AI workloads into Akamai's existing incident management processes, building runbooks, participating in on-call rotations, and conducting blameless post-mortems
- Building and maintaining CI/CD integrations, deployment safety checks, and rollback automation
- Collaborating with product engineering teams to improve reliability, contribute to architecture decisions, and ensure operational readiness for product releases
- Contributing to capacity planning, autoscaling configuration, and workload scheduling for AI compute infrastructure

Do what you love
To be successful in this role you will:
- Demonstrate expertise in SRE, infrastructure, or platform engineering, managing large-scale distributed systems with extensive operational experience.
- Demonstrate expertise in Kubernetes and large-scale containerization systems.
- Define SLOs and work with observability tools like Prometheus, Grafana, and distributed tracing to enhance system monitoring.
- Demonstrate proficiency in Python or Go for automation, CI/CD pipelines, deployment safety, and infrastructure-as-code like Terraform.
- Have interest in or experience with AI/ML infrastructure, model serving, or GPU workloads.
- Resolve issues independently while maintaining accountability throughout the process.
- Demonstrate accountability for reliability, develop automation and monitoring, and collaborate effectively with an engineering team unfamiliar with SRE practices.

Work in a way that works for you
FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai.
FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.

Learn what makes Akamai a great place to work
Connect with us on social and see what life at Akamai is like!
We power and protect life online, by solving the toughest challenges, together. At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you
At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:
- Your health
- Your finances
- Your family
- Your time at work
- Your time pursuing other endeavors
Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us
Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us
Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!
#LI-Remote
Senior GCP Data Specialist
Avenga
A global IT engineering and consulting company specializing in custom software development.
This is us
At Avenga, we believe that human creativity empowers technology that matters. Operating globally, our 6000+ specialists provide a full spectrum of services, including business and tech advisory, enterprise solutions, CX, UX and UI design, managed services, product development, and software development.

This is the job
In Bulgaria, we are actively seeking an experienced GCP Data Specialist to strengthen our team dedicated to building and further developing a scalable cloud data platform. The goal is to implement high-performance, maintainable, and cost-efficient data architectures for analytics and reporting use cases in an international environment.

This is you
- At least 4 years of practical experience with GCP in a data environment
- BigQuery
- Dataflow
- Pub/Sub
- Cloud Storage
- Cloud Composer (or comparable orchestration)
- Very good SQL skills
- Very good Python skills (alternatively Java/Scala)
- Experience in data modeling (e.g., Star Schema, Data Vault)
- Experience with Infrastructure as Code (e.g., Terraform) is desirable
- Independent, structured, and proactive working methods
- Experience in international projects is an advantage

This is your role
- Design and implementation of modern data architectures on Google Cloud Platform
- Building and optimizing scalable data pipelines (batch & streaming)
- Developing data warehouse/lakehouse structures in BigQuery
- Implementing and optimizing ETL/ELT processes
- Performance and cost optimization within the GCP environment
- Implementing data governance, security, and access concepts
- Technical coordination with international stakeholders (English-speaking)

What awaits you at Avenga?
- Through our values, Better Minds, Bolder Ideas, and Bigger Hearts, we strive to provide you with the tools, autonomy, trust, and assistance you need to excel.
- Enjoy benefits like private health insurance, well-being programs, flexible and hybrid work models, laptops and gear, training, language classes, social events, great offices, and more.

At Avenga, everyone matters. We provide equal opportunities in recruitment, career development, and leadership, regardless of race, ethnicity, gender identity, sexual orientation, disability, age, religion, or any other characteristic. We are committed to fostering a work environment where our diverse community of employees, candidates, and business partners actively shapes our growth. By bringing together people from different backgrounds and experiences, we build a workplace where everyone feels free to be themselves while honoring the boundaries of others.
DevOps Engineer, 3+ Years
Codvo.ai
Building Advance AI & Cloud Native Software Using The "Virtual Silicon Valley" Model. Let’s Talk AI, Cloud and Outcomes.
- Lead a team of DevOps engineers responsible for ensuring the seamless operation of platforms and applications.
- Implement and maintain infrastructure, automate deployments, and optimize systems for performance and reliability.
- Champion DevOps best practices and drive continuous improvement within the team.



