Akamai Technologies

At Akamai, we make life better for billions of people, billions of times a day. Every moment, people all over the world are using the internet to shop, play games, look after finances, learn remotely, share videos, connect with each other, and so much more. These life-shaping digital experiences wouldn’t be possible without Akamai. We power and protect life online. It’s an extraordinary mission, and our global teams achieve it by solving the toughest challenges and turning the impossible into the possible. With the world’s most distributed compute platform, from cloud to edge, we make it easy for businesses to develop and run applications, while we keep experiences closer to users and threats farther away. That’s why innovative companies worldwide choose Akamai to build, deliver, and secure their digital experiences on the world’s most distributed platform for cloud computing, security, and content delivery. Devoted, determined problem-solvers who share a passion for technology, we’re always pushing ground-breaking ideas and driving innovation. Do you want to power and protect life online by solving the toughest challenges with us? Be part of an amazing team!

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 5,001-10,000 | Since 1998 | H1B Sponsor

Location

India

Posted

9 days ago

Seniority

Senior

Job Description

  • Investigating and troubleshooting networking problems within the Linux-based networking stack.
  • Monitoring the functioning and performance of the networking infrastructure via Prometheus metrics and Grafana dashboards.
  • Solving complex problems in a timely and accurate manner and avoiding recurrence through proactive troubleshooting, automation, and systems programming.
  • Building software tools and systems to automate analytical tasks and workflows to increase efficiency and reliability.
  • Leveraging skills in data analysis, network diagnostics, and debugging tools to characterize performance and recommend improvements.

Job Requirements

  • Have 5+ years' experience in a Site Reliability or Systems Engineering role and a bachelor's degree in computer science or a related field.
  • Have expertise in L7 traffic management (Envoy, HAProxy, NGINX) in large-scale distributed systems.
  • Be proficient in coding with Python, Perl, R, Java, or SQL, and have networking knowledge including routing, firewalls, and DNS.
  • Have experience with Linux systems and tools such as netstat, traceroute, and tcpdump.
  • Be proficient in configuration management and container technologies including Ansible, Salt Stack, Chef, Puppet, Terraform, Docker, Podman, Kubernetes, and Nomad.

Benefits

  • We support your health, well-being, finances, and life beyond work. See our benefits.

More DevOps Engineer Jobs

Senior Consultant, DevOps

Ensono

Ensono delivers complete Hybrid IT solutions, from mainframe to cloud, tailored to each client’s journey.

DevOps Engineer | 9 days ago
Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

  • You’ll be a key technical contributor in delivering complex projects, combining deep engineering skills with the confidence to work independently within your area of responsibility.
  • You’ll take end-to-end ownership of your deliverables, ensuring they meet high standards of quality, security, and performance, while collaborating closely with colleagues and client teams to achieve project goals.
  • You’ll apply modern DevOps practices to design, build, and optimise solutions that help clients get the most from their cloud platform.
  • You’ll be actively involved throughout the full project lifecycle, from refining requirements and influencing solutions to delivering working systems and communicating outcomes to technical and non-technical audiences.
  • You’ll also provide guidance and informal mentoring to less experienced engineers, helping to raise capability within the team.
  • When not delivering on a client engagement, you may contribute to pre-sales activities such as analysing technical requirements, working with our bid team, and helping develop compelling proposals.
  • You’ll also have opportunities to engage with Ensono Digital’s internal competencies: sharing knowledge, refining our delivery approaches, and expanding your own expertise.

United Kingdom

Staff Site Reliability Engineer I

Remote

The easier way to employ globally. Remote builds belonging for your team with payroll, benefits, & compliance solutions.

DevOps Engineer | 9 days ago
Full Time | Remote | Team 501-1,000 | H1B Sponsor

Role Description

As a Staff SRE at Remote, you will own the technical direction of our SRE platform, shaping its architecture, reliability strategy, and long-term evolution. This is a leadership role as much as a technical one:

  • Drive platform-wide initiatives.
  • Set the reliability bar for engineering teams across the organization.
  • Be a force multiplier for the engineers around you.

A key part of this role is identifying and leading opportunities to leverage AI:

  • Reduce operational toil.
  • Enable engineering teams to build, ship, and operate software more effectively.

You will work with a high degree of autonomy, translating technical risks into business impact and aligning with Engineering Managers, Team Leads, and Product teams to ensure reliability and engineering efficiency are built into everything we do.

Qualifications

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Deep expertise in Kubernetes: operating, designing, and scaling production clusters.
  • Proven experience designing and managing cloud infrastructure on AWS (or other cloud providers) at scale.
  • Strong infrastructure-as-code practice with Terraform.
  • Experience defining and operating reliability frameworks: SLOs, SLIs, error budgets, alerting strategies.
  • Solid observability background: Datadog, Grafana/Prometheus, or similar.
  • Proficiency with CI/CD platforms (GitLab CI, GitHub Actions, or similar) and deployment automation.
  • Comfortable with Bash and scripting for automation; broader programming skills are a plus.
  • Experience with container tooling (Docker) and the broader ecosystem around it.
  • Curiosity and practical experience applying AI tools to infrastructure, operations, or developer tooling.

Requirements

  • Proven track record of driving platform-wide technical initiatives and influencing engineering direction without formal authority.
  • Strong communicator: able to tailor messaging to technical and non-technical audiences, write clearly, and align stakeholders across teams.
  • Self-directed: able to identify what needs attention, define the path forward, and execute with minimal supervision.
  • Experience mentoring senior engineers and creating space for others to lead and grow.
  • Comfortable navigating ambiguity, translating vague requirements into concrete solutions.
  • Approaches technical problems with a business lens, understands the cost and value of engineering decisions.

Key Responsibilities

  • Own the technical direction of Remote's SRE/Platform domain, its architecture, tooling, and long-term roadmap.
  • Define and drive the reliability strategy across the platform: SLOs/SLIs, error budgets, observability, and incident management maturity.
  • Lead complex, cross-team infrastructure initiatives from discovery through delivery, delegating effectively and keeping projects aligned with business goals.
  • Identify and lead AI enablement initiatives across the engineering organization.
  • Drive AI-powered automation for platform operations: intelligent alerting, automated incident triage, self-healing infrastructure, and AI-assisted runbooks.
  • Contribute to capacity planning and cost-efficiency of Remote's infrastructure.
  • Mentor senior engineers, raising the technical bar through code reviews, design feedback, and hands-on guidance.
  • Collaborate with the Security team on platform hardening, threat mitigation, and compliance.
  • Be a steward of engineering quality across the SRE team, championing best practices, managing technical debt deliberately, and raising standards over time.
  • Contribute to hiring, onboarding, and continuously improving how the SRE team operates.

Benefits

  • Work from anywhere.
  • Flexible paid time off.
  • Flexible working hours (we are async).
  • 16 weeks paid parental leave.
  • Mental health support services.
  • Stock options.
  • Learning budget.
  • Home office budget & IT equipment.
  • Budget for local in-person social events or co-working spaces.

Worldwide
$188.6K - $212.2K / year

(Senior) Cloud Site Reliability Engineer (Platform)

Scalable GmbH

Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances. Scalable Broker makes it easy and affordable for clients to invest professionally in stocks, ETFs, cryptocurrencies, and derivatives, as well as set up savings plans. Scalable Wealth, the digital wealth management service, offers clients professional investment in ETF portfolios, and is also adopted as a white-label solution by banks and other B2B partners. The company’s offerings are rounded off by attractive interest rates, loans, and private equity. With the European Investor Exchange, Scalable Capital offers an exchange specifically for retail investors. Over one million clients have already entrusted more than €30 billion to the platform. Founded in 2014, Scalable Capital now employs over 700 people across Munich, Berlin, Vienna, Milan, and London. Together with the founding and management team, including Erik Podzuweit and Florian Prucker, they are working on a new generation of financial services.

DevOps Engineer | 9 days ago
Full Time | Remote | Team 501-1,000

Role Description

Our team's mission is to provide secure, compliant, and scalable building blocks and automations that enable developers to build workflows that ship software rapidly but reliably. Scalable Capital was built in the cloud from day one. Our services currently run on various AWS services like ECS, Fargate, and Lambda and are distributed across multiple accounts. We embrace a DevOps culture where the development teams manage their CI/CD pipelines and cloud infrastructure for their services themselves. Our Platform Engineering Team focuses on providing everything necessary to build and ship code into these systems in a fast, secure, and developer-friendly way.

  • Shape how Scalable builds microservices in the most performant, secure, and cost-efficient way
  • Collaborate with cross-functional teams to identify and understand build and development requirements for our platform
  • Design and roll out CI/CD-related improvements paired with internal automation tooling running in ECS, Lambda, and EKS
  • Develop AI-based supporting tools and libraries
  • Mentor and enable our software development teams to further foster our DevOps culture by educating them and providing reusable and unified building blocks that can be used to improve development speed, security, testing, and releasing
  • Stay up to date with the latest industry trends, tools, and techniques related to platform engineering
  • Design and implement best practices around building our infrastructure
  • Keep our internal development tooling up to date and support teams with migrating to new technologies

Qualifications

  • Multiple years of experience with AWS and infrastructure as code (mainly Terraform)
  • Solid experience with GitHub Actions and Jenkins or similar CI/CD tools
  • Good working knowledge of Python and at least one additional general-purpose programming language (preferably Java/Kotlin or JavaScript/Node.js) and build automation tools
  • Solid understanding of scalable system design principles, distributed systems, and cloud technologies
  • A passion for automating, improving processes, and working together with other developer teams
  • A degree in a relevant field of study (e.g. computer science, engineering, sciences) or work experience in a role that typically requires a university degree
  • Full professional proficiency in English and the ability to communicate concisely in an international English-speaking environment
  • Excellent communication and collaboration skills, with the ability to work effectively in a cross-functional team environment

Benefits

  • Be part of one of the fastest-growing and most visible Fintech startups in Europe, creating innovative services that have a substantial impact on the lives of our customers
  • Work with an international, diverse, inclusive, and ever-growing team that loves creating the best products for our clients
  • Be productive with the latest hardware and tools
  • Learn and grow by joining our in-house knowledge sharing or career development sessions and spending your individual Education Budget
  • Learn and experience German culture first hand by joining our free German language classes
  • International relocation support is provided if required
  • Opportunity to work from abroad
  • Benefit from an attractive compensation package and from the company pension scheme
  • Monthly contribution of 50% for the ‘Deutschland Jobticket’
  • Say goodbye to order commissions and say hello to your complimentary subscription of Scalable Capital's PRIME+ Broker
  • Enjoy flexible and discounted sports activities with Urban Sports Club

Europe

DevOps Engineer

LifeMD

LifeMD is a rapidly growing telehealth company that delivers virtual primary care and treatment services nationwide. Founded in 1987 and headquartered in New York, New York, LifeMD

DevOps Engineer | 9 days ago

  • Design, implement, and manage scalable, secure, and cost-effective cloud infrastructure primarily on AWS using Terraform
  • Develop and version control Terraform modules for automated provisioning, updating, and de-provisioning of cloud resources (e.g., EC2, S3, RDS, VPC, Lambda in AWS)
  • Design, build, and optimize automated CI/CD pipelines using GitHub Actions for various applications and microservices
  • Integrate automated testing, static code analysis, security scanning, and deployment steps into CI/CD workflows for high quality and secure releases
  • Implement, configure, and maintain comprehensive monitoring, logging, and alerting solutions (e.g., AWS CloudWatch, Datadog) for all environments
  • Develop custom dashboards, metrics, and alerts for real-time visibility into system health, performance, and security events
  • Proactively analyze logs and metrics to identify potential bottlenecks and issues
  • Participate in on-call rotations to swiftly respond to and resolve critical incidents, ensuring high service availability
  • Automate repetitive operational tasks, system configurations, and deployment processes using Python and Bash to enhance efficiency

All locations: California | New York
$150K - $160K / year
Job Closed