Health & well-being tech company led by entrepreneurs on a mission to create a positive impact globally.
Senior Site Reliability Engineer
Location
Cyprus
Posted
8 days ago
Seniority
Senior
Job Description
Palta
- You will be working day to day on AWS, our infrastructure-as-code, our CI/CD setup, observability, and the on-call rotation.
- A meaningful part of the work is automation: when we find ourselves doing the same thing twice, we usually invest in tooling rather than writing another runbook.
- Most of that tooling is written in Go.
- Infrastructure is defined with Terraform and Terramate, with Atlantis running plan and apply on pull requests.
- Workloads run on EKS with Karpenter and Fargate, deployed through ArgoCD.
- Observability is built on Grafana, Loki, Tempo, and Prometheus-compatible metrics.
Job Requirements
- Several years of hands-on experience operating production systems on AWS and Kubernetes, including genuine on-call ownership.
- A solid working knowledge of AWS fundamentals, including VPC, IAM, EKS, and RDS.
- Practical experience with Terraform and a GitOps-style delivery workflow (ArgoCD, Atlantis, Flux, or similar).
- Comfort writing code, with some prior experience in Go or a willingness to pick it up (writing small services and tools is a regular part of the work).
- Strong written and spoken English, and the communication skills to drive design discussions across engineering, product and security.
Benefits
- Open-minded teams, a welcoming and inclusive company culture, plus the opportunity to make a real difference with a game-changing health tech product.
- A competitive salary package based on your unique expertise, skillset, and impact on the product, plus stock options.
- In-office, remote and hybrid work opportunities.
- Whatever equipment you need to be happy and productive.
- A premium SIMPLE subscription.
- 21 days annual leave, plus bank holidays (those observed where you live).
- Flexible hours. We focus on your results, not how long you spend at your desk.
More DevOps Engineer Jobs
Principal Site Reliability Engineer - Networking
Elastic
Self-described as the leading platform for search-powered solutions, Elastic helps organizations, their customers, and their employees find what they need faster while protecting a
Title: Principal SRE (Networking) - Platform Control Plane
Location: Remote - United States
Job Description:
Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data, securing and protecting private information more effectively, Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.
As part of the Platform Engineering department, the Network Infrastructure team is crafting, building, and improving the multi-cloud platform at scale for Elastic Cloud Hosted and Serverless. We grow and mature our distributed large-scale network infrastructure that spans across multiple cloud service providers to support our cloud services. We are built on Kubernetes, Go, and custom orchestration architectures. In your daily life with us, you will participate in coding, innovating technical designs, crafting solutions, improving resilience, and prioritizing security, bug fixes, and features. For example, Debugging Azure Networking for Elastic Cloud Serverless is part of our efforts, and we want your experience to contribute to a truly exceptional customer experience!
- Taking an engineering approach in leading technical initiatives for designing, building and automating network infrastructure and services to guarantee the reliability of the global Elastic network infrastructure. Focusing on Layer 2/3/4 of the TCP/IP stack (Ethernet and/or IP encapsulation, routing, firewalling, load balancing).
- Growing our global Platform network infrastructure to meet the increasing scaling demands by developing and maintaining software, codebases, tooling and automations to serve our Network Infrastructure as Code principle.
- Collaborating in an environment with an inclusive approach, and focusing on operational excellence which uplifts others.
- Preventing repeated customer impact in response to major incidents and prioritised problem management. Our on-call rotation is spread well, and we address complex customer concerns too.
- Excellent networking skills, with knowledge of protocols such as IP/IPv6, TCP/UDP, BGP, DNS.
- Strong technical depth for building and automating networks (Terraform, Ansible) in collaboration with other engineers as an authority in identifying, implementing and delivering solutions.
- Good knowledge of public CSP network components (Load balancers, VPC peering/Transit gateways, VPN connectivity, Direct Connects).
- Successes and lessons from your experience of striving for 'progress not perfection' in the name of Platform reliability. We want to hear about your customer-first approach in solving operational problems for both today and the future.
- Passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships. Examples of working in distributed teams or working remotely are desirable.
- Site-Reliability Engineering experience. We tackle problems with code, but fundamentally we keep things working and have proven success in operational excellence. Responding to and preventing repeated customer impact in response to major incidents and prioritized problem management. Our on-call rotation uses a follow-the-sun model where everyone participates (mostly) during their working hours.
- You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- You have designed and/or operated large network topologies in which dynamic routing is based on BGP.
- You have operated network topologies based on software routers.
- You have experience in IP address management (IPAM) and you have used relevant tools for automated IP allocations.
- You have designed and/or operated overlay networks with use of encapsulation protocols such as IPSec, GRE and VXLAN.
- You have built or operated a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, with knowledge of the Cilium CNI.
- You have written non-trivial programs in Golang or other programming languages.
- You have worked with containerized services (such as Docker).
- You have proven experience in leading and improving alerting and major incident management standard processes, and in using metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impacts to present to others at varying levels of the organization.
- You have experience in system and network administration with professional skills in Linux on distributed systems at scale.
- You have diagnosed or designed, implemented and created solutions with the Elastic Stack.
- You are experienced in thriving and sharing in a self-organizing, globally distributed team environment.
- You strengthen team members in bringing out the best of each other by uplifting others with coaching and mentoring.
As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do. We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.
- Competitive pay based on the work you do here and not your previous salary
- Health coverage for you and your family in many locations
- Ability to craft your calendar with flexible locations and schedules for many roles
- Generous number of vacation days each year
- Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service
- Up to 40 hours each year to use toward volunteer projects you love
- Embracing parenthood with a minimum of 16 weeks of parental leave
Security & Privacy Responsibilities: Take ownership of protecting the confidentiality, integrity, and availability of organizational data and systems by following applicable privacy and security policies, standards, and procedures. Ensure that all individual contributions follow Elastic’s Secure Software Development Framework (SSDF). Proactively participate in mandatory role-based training to ensure personal technical execution consistently aligns with the highest standards of data protection, data privacy, and system resilience.
Different people approach problems differently. We need that. Elastic is an equal opportunity employer and is committed to creating an inclusive culture that celebrates different perspectives, experiences, and backgrounds. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation. We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals.
Elasticsearch develops and distributes technology and information that is subject to U.S. and other countries’ export controls and licensing requirements for individuals who are located in or are nationals of the following sanctioned countries and regions: Belarus, Cuba, Iran, North Korea, Syria, or Russia, including the Ukrainian territories annexed by Russia (The Crimea region of Ukraine, The Donetsk People's Republic (DNR), The Luhansk People's Republic (LNR), Kherson or Zaporizhzhia). If you are located in or are a national of one of the listed countries or regions, an export license may be required as a condition of your employment in this role. Please note that national origin and/or nationality do not affect eligibility for employment with Elastic. Please see here for our Privacy Statement.
Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is listed below. In select locations (including Seattle WA, Los Angeles CA, the San Francisco Bay Area CA, and the New York City Metro Area), an alternate range may apply as specified below. These ranges represent the lowest to highest salary we reasonably and in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the ranges may be modified in the future. An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs.
Elastic believes that employees should have the opportunity to share in the value that we create together for our shareholders. Therefore, in addition to cash compensation, this role is currently eligible to participate in Elastic's stock program. Our total rewards package also includes a company-matched 401k with dollar-for-dollar matching up to 6% of eligible earnings, along with a range of other benefits offered with a holistic emphasis on employee well-being.
The typical starting salary range for this role is: $179,800-$232,900 USD
The typical starting salary range for this role in the select locations listed above is: $179,800-$232,900 USD
- Collaborate with software architects and development teams to define infrastructure requirements and design comprehensive platform solutions.
- Lead the design, implementation, and optimization of CI/CD pipelines using Bitbucket to streamline software development, testing, and deployment processes.
- Architect and manage Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation, enabling scalable and reproducible AWS infrastructure management.
- Conduct PoCs to evaluate new tools, technologies, and methodologies, assessing their potential impact on the platform and operations.
- Monitor and enhance the performance, reliability, and scalability of systems, ensuring high availability across production and development environments.
- Troubleshoot and resolve complex issues across infrastructure, deployments, and applications, implementing robust solutions to improve system stability.
- Integrate security best practices into the architecture and deployment processes, ensuring compliance with industry standards and regulations.
- Mentor team members on advanced DevOps practices and contribute to establishing a culture of continuous improvement and operational excellence.
- Engage directly with clients to understand their needs, prioritize and plan solutions, and manage expectations regarding project timelines and deliverables.

- Design and maintain scalable, multi-tenant Kubernetes platforms using reusable Helm templates to enable developer self-service.
- Own pipeline reliability, cost, and speed.
- Embed automated release gates, secure secret architectures, and software vulnerability scanning (SCA).
- Build custom tools to extend platform capabilities and standardize shared environments.
- Maintain distributed tracing, logging, and metrics systems.
- Guide development teams on service instrumentation and establishing SLIs/SLOs/SLAs.
- Consult development teams on system architecture, resource optimization, and modern traffic routing patterns.

- Work with a team of DevOps and DBA professionals.
- Improve existing infrastructure and processes in the 6 countries we’re currently deployed in, and streamline the processes for deploying to new countries in the future.
- Holistically improve all aspects of our DevOps infrastructure, including reducing costs, streamlining environment provisioning, lowering response times, and incorporating the latest techniques and technologies.
- Monitor and maintain the existing cloud infrastructure via autoscaling, automated alerts, and OpsWorks and Grafana dashboards.
- Take ownership and responsibility for our cloud operation activities.
- Liaise with external security agencies for annual audits as well as perform our own internal security sweeps.
- Aid in reconfiguring existing architecture to allow for rapid deployments to new countries.
- Mentor less experienced team members.




