Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable. You can see more about us here: https://tyk.io

Senior Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior

Location

United States + 171 more

Posted

76 days ago

Salary

Seniority

Senior

Kubernetes Python AWS Amazon EKS Linux Terraform Infrastructure as Code Helm MongoDB Redis Prometheus Grafana Thanos DNS TCP/IP TLS

Job Description

Role Description

At Tyk, we’re obsessed with building software that solves problems. We count on our Site Reliability Engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance to pursue their missions. Our customer base is growing, so we’re seeking an experienced Senior SRE to optimise, automate, and improve our performance, using insights from massive-scale data in real time. We want an original thinker, a challenger, a technical legend, an opinionated collaborator who wants to make things better.

- Lead hands-on maintenance and optimization of our global Cloud platform within SLAs, SLIs, and SLOs you'll help define
- Collaborate to shape SRE strategy, then translate it into actionable technical plans coordinated through Scrum
- Identify reliability issues, drive root cause analysis, and implement solutions alongside your squad
- Lead performance tuning and fault finding through analysis of OS and application metrics
- Design and implement automation for common operational tasks and cloud-operations workflows
- Develop proactive alerting, a monitoring roadmap, and relevant dashboards; define and track KPIs
- Participate in the on-call rotation, ensuring effective incident response and resolution within SLAs
- Conduct blame-free postmortems, document findings, and maintain operational runbooks
- Drive multi-region and multi-cloud platform expansion with a focus on scalability and automation
- Optimize infrastructure performance and cost efficiency without impacting service delivery
- Engage with commercial teams on growth plans and translate them into technical SRE strategies
- Coordinate penetration testing through provider liaison, technical setup, and environment configuration
- Champion continuous improvement across processes, communication, and team practices
- Model excellence in software design and knowledge sharing
- Plan and execute software upgrades to enhance cloud services

Qualifications

- Experience in an SRE role
- Strong knowledge of cloud technologies and SLA, SLO, and SLI management
- Excellent communication and leadership skills
- Ability to analyze and improve operational processes and performance metrics
- Experience in software design, automation, and root cause analysis
- On-call support experience and customer-focused mindset
- Collaborative attitude with commercial and technical teams
- Launching and operating production Kubernetes clusters
- Designing and operating infrastructure on AWS and other providers
- Operating MongoDB (or other document database) clusters
- Operating Redis (or other key-value storage) clusters
- Administering Linux servers
- Operating Prometheus and Grafana
- Operating a logging collection and analysis system
- Participating in the on-call rotation (04:00 - 16:00 UTC)

Requirements

- Kubernetes (administrator)
- Go and/or Python (advanced)
- AWS/EKS (advanced)
- Linux (advanced)
- Terraform and IaC in general (proficient)
- Helm (proficient)
- MongoDB (or similar)
- Redis (or similar)
- Monitoring: Prometheus, Grafana, Thanos (familiar)
- Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
- Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
- Proactive, energetic, innovative and change oriented
- A desire to lead/mentor a team

Benefits

- Everyone has unlimited paid holidays.
- Total flexibility in hours, as we believe creativity flows better when our people are given freedom to decide when they are most productive.
- Employee share scheme
- Generous maternity and paternity leave
- Volunteering Days
- Employee Wellbeing platform

More DevOps Engineer Jobs

Senior DevOps Engineer/ Consultant

Jobgether

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

DevOps Engineer, 76 days ago

Other Remote, H1B No Sponsor

This description is a summary of our understanding of the job description.

Role Description

This role is a hands-on, senior-level position focused on enabling clients to accelerate software delivery through DevOps best practices, cloud-native technologies, and automation. You will work across client environments to implement scalable CI/CD pipelines, infrastructure-as-code solutions, and observability systems, while mentoring and upskilling teams. The position combines technical execution with strategic advisory responsibilities, helping organizations adopt modern software delivery practices efficiently and safely. You will have the opportunity to lead change, solve complex technical and organizational challenges, and contribute to internal and client-facing innovation. Ideal candidates are adaptable, collaborative, and excited by both building and teaching technical solutions in dynamic environments.

- Lead the design, implementation, and optimization of CI/CD pipelines, build systems, and cloud-native architectures.
- Develop, manage, and test Infrastructure as Code using tools like Terraform, Ansible, Chef, or Puppet.
- Implement and maintain observability and monitoring systems such as Prometheus, Grafana, Elasticsearch, or Zipkin.
- Guide and coach client and internal teams in modern software delivery practices, DevOps culture, and automation.
- Provide technical leadership for complex challenges, contributing to both technical solutions and organizational change.
- Mentor engineers and share knowledge through workshops, training, and collaborative problem-solving.
- Contribute to internal innovation and thought leadership, including proofs-of-concept, open-source contributions, and community engagement.

Qualifications

- 5+ years of experience in DevOps, cloud-native engineering, infrastructure automation, or related roles.
- Strong experience with CI/CD pipelines and build systems such as Jenkins, GitHub Actions, GitLab, or Azure DevOps.
- Expertise in cloud platforms (AWS, Azure, GCP) and container orchestration tools like Kubernetes and Istio.
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation) and modern source control strategies.
- Proficiency in programming/scripting languages such as Python, Go, Java, or similar.
- Experience with observability and monitoring tools (Prometheus, Grafana, Elasticsearch, Zipkin).
- Exceptional communication, coaching, and mentoring skills, with the ability to guide technical and non-technical stakeholders.
- Comfortable working in a dynamic client-facing environment, including remote work and periodic travel across the U.S.
- Must be authorized to work in the U.S. or Canada without sponsorship.

Benefits

- Annual base salary: $110,000–$183,000, depending on experience.
- Unlimited PTO, quarterly company bonuses, and funded HSA options.
- 401(k) plan with company match and monthly LiveWell stipend.
- Opportunities to work remotely with occasional travel to client sites.
- Collaborative environment emphasizing professional growth, mentorship, and knowledge sharing.
- Access to advanced tools, technologies, and client projects across industries.

AWS Azure GCP Kubernetes Terraform Jenkins GitHub Actions GitLab CI Python Prometheus Grafana Elasticsearch Ansible Chef Puppet Istio CI/CD Infrastructure as Code Docker

United States

Job Closed

Site Reliability Engineer – SRE

ProArch

Consulting and technology - enabled by cloud, guided by data, fueled by apps, and secured by design.

DevOps Engineer, 76 days ago

Full Time Remote, Team 201-500, H1B Sponsor

• Monitor system performance and reliability, ensuring uptime meets organizational SLAs.
• Implement and maintain observability tools to gather metrics and logs for proactive issue detection.
• Troubleshoot and resolve complex production issues across various components of our infrastructure.
• Collaborate with software engineering teams to design and implement scalable, fault-tolerant architectures.
• Develop and maintain automation scripts for deployment, monitoring, and system management.
• Participate in on-call rotation to respond to production incidents and perform root cause analysis.
• Contribute to capacity planning and performance tuning to ensure optimal resource utilization.
• Document infrastructure, processes, and incident responses to promote knowledge sharing.

AWS Azure GCP Grafana Jenkins Kubernetes Microservices Prometheus Python Terraform

India

Senior Site Reliability Engineer, SRE

Degica Co, Ltd.

Making Japan simple

DevOps Engineer, 77 days ago

Full Time Remote, Team 11-50, H1B No Sponsor

• Actively participate in improving and maintaining our AWS infrastructure.
• Continuously improve system performance, reliability, and security.
• Design, implement, and maintain our observability stack (metrics, logging, tracing, dashboards).
• Correspond with engineering teams to instrument applications for better observability.
• Improve developer productivity with tooling.
• Secure the system and adhere to compliance requirements.
• Be part of the team's on-call rotation.

AWS Jenkins Python Ruby Ruby on Rails TCP/IP Terraform

Japan

Job Closed

Senior Development Operations Engineer

Cotiviti

Enabling a high-quality and viable healthcare system

DevOps Engineer, 77 days ago

Other Remote, Team 5,001-10,000, H1B Sponsor

Overview

Opening available for Senior DevOps Engineer - Hadoop at Cotiviti, Inc. in South Jordan, UT.

Responsibilities

- Responsible for the availability and reliability of the infrastructure and business services within Cotiviti’s Research and Development (R&D) group.
- Design systems based upon best practices with repeatable standard configurations.
- Design processes and services with a focus on long-term and operational maintainability.
- Configure robust solutions based on business requirements to ensure reliability and availability of business services.
- Develop and maintain automation capabilities across on-premises and cloud environments.
- Document system configurations.
- Perform administration activities including security, OS tuning, log management, and networking.
- Create proactive monitoring services to capture and report availability and performance characteristics.
- Performance tuning across on-premises and cloud based on Big Data solutions.
- Automate system monitoring and notification processes.
- Maintain configuration and release internal software to production systems.
- Deploy code via CI/CD pipelines.
- Apply software and security patches.
- Provide technical expertise for on-premises systems, Hadoop, Cloudera, and AI/ML (artificial intelligence/machine learning).
- Estimate level of effort and create action plans for project deliverables.
- Meet project deliverables and document solutions based upon project needs.
- Communicate and transfer knowledge to peers and project team members.
- Support R&D in project tasks and administration of development environments.
- Work within an Agile environment.
- Perform Linux-based administration including security, networking, performance management/tuning, and system monitoring.
- Configure and manage enterprise Hadoop implementations.
- Communicate effectively with both IT and business users.
- Interface with peer groups such as production control, database administration, and service desk/help desk for incident and problem management.
- Provide on-call troubleshooting and support during off-hours.

Qualifications

Bachelor’s Degree or foreign equivalent degree in Computer Science, Information Technology, Technical Studies or related. Five (5) years of experience managing enterprise Hadoop ecosphere and administrating enterprise class Unix-based systems production/operational environments. Work experience to include:

- Five (5) years of experience working with project or development teams to support infrastructure needs
- Five (5) years of experience with open source big-data methodologies including Hadoop, Cloudera, and Oozie
- Five (5) years of experience with Linux-based systems
- Four (4) years of experience with all aspects of system administration including security, networking, backup and recovery, performance management/tuning, and system monitoring
- Four (4) years of experience with enterprise monitoring solutions
- Four (4) years of experience with health plan operations
- Four (4) years of experience with DevOps
- Four (4) years of experience with Spark
- Four (4) years of experience with StreamSets
- Four (4) years of experience with Zeppelin
- One (1) year of experience with JupyterHub
- One (1) year of experience with automation tools and Datameer

Any and all experience may be gained concurrently.

Telecommuting available anywhere in the U.S. Company headquarters located at 10701 South River Front Pkwy, Suite 200, South Jordan, UT 84095. Base compensation ranges from at least $152,402. Specific offers are determined by various factors, such as experience, education, skills, certifications, and other business needs. Cotiviti offers team members a competitive benefits package to address a wide range of personal and family needs, including medical, dental, vision, disability, and life insurance coverage, 401(k) savings plans, paid family leave, 9 paid holidays per year, and 17-27 days of Paid Time Off (PTO) per year, depending on specific level and length of service with Cotiviti.

Java Python Linux Hadoop Apache Spark CI/CD Observability / Monitoring Unix

United States

Job Closed
