Unparalleled Visibility Into Issue Detection, Diagnosis, and Remediation
Senior Site Reliability Engineer
Location
Spain
Posted
4 days ago
Salary
€60,000 - €85,500 gross per year (base)
Seniority
Senior
Job Description
Senior Site Reliability Engineer
Nexthink
Company Description
Nexthink is the leader in digital employee experience management software. The company gives IT leaders unprecedented insight, allowing them to see, diagnose, and fix issues at scale that affect employees anywhere, with any application or network, before employees notice them. As the first solution to let IT progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,300 customers to provide better digital experiences to more than 18 million employees. Dual-headquartered in Lausanne, Switzerland and Boston, Massachusetts, Nexthink has 9 offices worldwide.
Job Description
At Nexthink, we empower our customers with industry-leading solutions that enable continuous improvement of employee experience. We deliver unmatched visibility across all environments, so IT teams can consistently see, diagnose, and fix digital workplace issues. As a SaaS provider, we are committed to delivering a seamless, resilient, and scalable platform around the clock.
We are looking for an experienced, proactive, and innovative professional who is keen to join us as a Senior Site Reliability Engineer!
The mission of Nexthink's SRE team is to strengthen our infrastructure and enhance our ability to deploy, monitor, and scale systems effectively and reliably. The team works closely with more than 50 Product Engineering teams that develop our products and services, as well as with the Technical Platform Engineering, Security, and Architecture teams, to understand reliability requirements, design and implement solutions, and promote their adoption and usage.
Join our vibrant team of diverse and experienced engineers, where cutting-edge technology meets innovation. Be part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Apply now and become a key player in our dynamic SRE organisation.
As a Senior Site Reliability Engineer, you will:
- Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
- Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support rapid delivery cycles.
- Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
- Define and maintain SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
- Develop infrastructure-as-code (Terraform or similar) for repeatable and auditable provisioning.
- Build internal platform tools and automation to support provisioning, monitoring, and operational efficiency.
- Monitor infrastructure and applications to ensure high-quality user experiences.
- Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
- Act as Incident Commander while on call, coordinating cross-team responses effectively to meet SLAs.
- Drive and refine incident response processes, reducing Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
- Diagnose and resolve complex issues independently, minimizing the need for external escalation.
- Work closely with software engineers to embed observability, fault tolerance, and reliability principles into service design.
- Automate runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
- Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
- Contribute to security best practices, compliance automation, and cost optimization.
Qualifications
- At minimum, a Bachelor's degree in Computer Science or equivalent practical experience.
- 5+ years of experience as a Site Reliability Engineer or Platform Engineer with strong knowledge of software development best practices.
- Strong hands-on experience with public cloud services (AWS, GCP, Azure) and supporting SaaS products.
- Strong programming or scripting skills (e.g., Python, Go, Bash) and experience with infrastructure-as-code (e.g., Terraform).
- Proficiency with Kubernetes, container-based deployment (e.g., Docker), and related ecosystems (e.g., Helm).
- Experience supporting multi-tenant microservices architectures.
- Experience with CI/CD pipelines and tools (e.g., Jenkins, GitHub Actions, GitLab CI, FluxCD, Crossplane).
- Experience managing monitoring solutions (e.g., Datadog).
- Comfortable participating in a rotating on-call schedule, managing critical incidents, and leading post-incident reviews.
- At ease operating and managing production systems, striking the right balance between urgency and methodology.
- Strong system-level troubleshooting skills and a proactive mindset toward incident prevention.
- Deep understanding of Linux systems, networking, and common troubleshooting practices.
- Solid understanding of the network stack (e.g., TCP/IP, VPN), cloud architectures (VPC, subnets, firewalls, load balancers), service meshes (e.g., Istio), and storage (e.g., S3, EBS).
- Knowledge of zero-downtime deployment strategies, including blue/green and canary releases.
- Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA. FedRAMP experience is a big plus.
- Experience with chaos engineering or resilience testing practices.
- Excellent problem-solving skills, a collaborative mindset, and a strong grasp of agile, iterative development.
- Self-driven, highly organised, and capable of independently managing priorities.
- Curiosity to learn new things and discover new technologies.
- Strong communication, presentation, and team collaboration skills.
- Excellent written and verbal skills in English.
Prior experience with any of the tools mentioned above is a bonus, not a must! We encourage you to apply even if you do not meet every single requirement.
We welcome candidates with different levels of background and experience. If you are excited about this role, please apply and our recruiters will assess your application.
Additional Information
We are the pioneers and trailblazers of a global IT market category (DEX) that is shaping the future of how the world works, giving our customers' IT teams total digital visibility across their enterprise. Our innovative solutions integrate real-time analytics, automation, and employee feedback across all endpoints. This enables IT teams to solve complex technical challenges, create ever more productive workplaces, and deliver happy, satisfied employees in the digital workplace.
With over 1,000 employees across 5 continents, Nexthink operates as One Team, connecting, collaborating, and innovating to continuously grow. We call our employees 'Nexthinkers', and our commitment to diversity, inclusion, and equity is second to none. We currently have over 75 nationalities working with us, from all cultures and backgrounds, speaking many different languages.
If you are looking for a change and like a nice atmosphere, lots of challenges, and having fun while working, this is a great opportunity for you!
Check what we offer:
- Permanent contract and a competitive compensation package.
- Health insurance through our partnership with ACKO, including OPD coverage for dental, vision, health check-ups, consultations, and pharmacy expenses.
- Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
- Flexible hours and unlimited vacation (employees have unlimited paid time off on top of the 22 days of holidays we offer), plus company-paid bank holidays (12), sick days (10-30), bereavement leave (5), and 3 days per year for volunteering.
- Free access to professional training platforms to explore your interests and enhance your skills.
- Stay covered against accidents, bodily injuries, and disabilities with our personal accident insurance policy, providing coverage of up to three times your annual CTC.
- New mothers are entitled to up to 26 weeks of maternity leave, with the flexibility to use up to 8 weeks before the expected delivery and the remaining 18 weeks after. Birth fathers can take 6 weeks of paternity leave, while adoptive parents are eligible for 26 weeks of leave for mothers and 6 weeks for fathers.
- Under the Payment of Gratuity Act, receive gratuity at the rate of 15 days of basic pay for every completed year of service, provided you have been employed by the company for at least 5 years. Gratuity is payable at retirement or resignation, based on your last drawn basic pay.
- Bonuses for referring successful hires, after three months of continuous employment.
Please note that not all the benefits listed above are available for temporary, contract, and internship roles. For the most up-to-date information, we recommend checking with your Recruitment Partner.
The base salary for this role is €60,000 - €85,500 gross per year, with a total on-target earnings (OTE) range of €66,000 - €93,000, including an annual performance bonus. You'll also be part of our broader total rewards package, including benefits tailored to where you live and how you work best.
We set our pay ranges using objective criteria: the scope and level of the role, the skills it takes to do it well, and relevant market data. Ranges are reviewed every year to remain competitive and fair. We're transparent about this because we think you deserve to know what you're working towards from day one.
In accordance with the EU Pay Transparency Directive (2023/970), we publish salary ranges on every Nexthink role. We won't ask what you currently earn or what you earned previously. What matters to us is what this role is worth and whether it works for you.
Nexthinkers come from all kinds of backgrounds, and that's what makes us stronger. We welcome applications from everyone.
More DevOps Engineer Jobs
Lead DevSecOps Systems Engineer
General Dynamics
A business unit of General Dynamics, General Dynamics Information Technology (GDIT) supports some of the United States' most complex government, defense, and in
• Architect and scale robust, secure CI/CD pipelines, data integrations, and Infrastructure as Code (IaC) across project-based deployments.
• Partner with development teams to seamlessly integrate, automate, and monitor security tool components within automated workflows.
• Define guidelines and standards for AWS Cloud and Kubernetes environments, implementing advanced solutions for system security, backups, and redundancy.
• Champion DevSecOps culture by mentoring junior/mid-level engineers, educating teams on modern tooling, and resolving complex configuration or performance issues.
• Leverage Generative AI engineering tools (such as Claude, Gemini, Copilot) to accelerate the development of Infrastructure as Code (IaC), pipeline scripts, and automation workflows.
• Optimize cloud infrastructure and container ecosystems to ensure cost-efficiency, scalability, and strict adherence to governance standards.
• Drive engineering excellence by guiding the preparation of comprehensive technical documentation, processes, and procedures.
Senior DevOps Engineer
Lean Solutions Group
Lean Tech is a rapidly expanding organization based in Medellín, Colombia. We pride ourselves on having one of the most influential networks within software development and IT services for the entertainment, financial, and logistics sectors. Our corporate projections offer many opportunities for professionals to elevate their careers and experience substantial growth. Joining our team means engaging with expansive engineering teams across Latin America and the United States, contributing to cutting-edge developments in multiple industries.
Role Description
The DevOps Engineer is responsible for designing, implementing, and maintaining AWS cloud infrastructure, deployment automation, CI/CD pipelines, and operational reliability capabilities supporting the NexusNow multi-tenant SaaS hosting platform and its expanding product portfolio (such as Sentinel, DRIFT, Line Boss / xPlorate, and VELMA / Legal). This role will work closely with engineering, architecture, and operations teams to improve multi-tenant deployment velocity, infrastructure scalability, platform observability, and strict tenant-isolation safety boundaries.
Key Responsibilities
- Design, configure, and manage high-availability AWS cloud infrastructure (VPC layouts spanning 3 AZs with segregated public, application, and data tiers) using Infrastructure-as-Code (IaC) best practices.
- Build and maintain scalable CI/CD pipelines supporting automated multi-tenant deployment workflows and immutable release paths.
- Configure and manage AWS networking, egress-only NAT gateways, private EKS API endpoints, identity/access management (CASL architecture), and environment configuration.
- Support AWS deployment automation, logging, alerting, and operational readiness, explicitly monitoring threat-related events emitted to Amazon CloudWatch and CloudTrail.
- Partner with engineering teams to optimize the velocity and stability of application onboarding cycles (e.g., standardizing patterns learned during the VELMA / Legal launch).
- Troubleshoot complex infrastructure, deployment, cross-tier networking, and multi-tenant isolation configuration anomalies.
- Support automated data protection operations, verifying 30-day credential rotations via AWS Secrets Manager and automated snapshots within an isolated AWS Backup vault.
- Contribute to core platform standards, compiling detailed operational runbooks that convert tribal knowledge into repeatable assets.
- Participate in agile ceremonies, including standups, backlog refinement, and retrospectives.
- Enforce security and compliance requirements (e.g., SOC2 metrics) across all staging, performance, and production environments.
Qualifications
- Extensive (5+ years) hands-on experience engineering, scaling, and debugging enterprise AWS infrastructure platforms.
- Deep expertise with AWS CDK, Terraform, or CloudFormation templates, with a mandatory emphasis on using CDK to maintain modular cloud architectures.
- Advanced experience building and maintaining automated deployment workflows (specifically with GitHub Actions).
- Mastery of AWS services powering multi-tenant frameworks, explicitly including Amazon Cognito (multi-pool structures), Amazon Aurora PostgreSQL, Redis/ElastiCache, Amazon S3, CloudFront/AWS WAF, CloudWatch, EventBridge/SQS/Step Functions, and AWS Secrets Manager.
- Solid understanding of multi-tenant security strategies, policy-based access layers (RBAC/ABAC), and network boundary definitions.
- Experience tracing and debugging errors across heavily decoupled cloud systems and microservices.
Nice to Have Skills
- Experience engineering enterprise-scale multi-tenant SaaS cloud platform hosting models.
- Experience supporting containerized and serverless AWS workloads, with emphasis on Amazon EKS on Fargate running inside isolated private subnets.
- Prior exposure to developer enablement practices and building automated onboarding templates that abstract platform complexity.
- Experience maintaining infrastructure inside tightly audited, regulated, or security-sensitive environments (SOC2 alignment).
Soft Skills
- The ability to understand the daily frustrations of application development teams and to approach infrastructure as a service that enables, rather than blocks, their engineering velocity.
- Maintains extreme clarity, calm focus, and structured communication during high-pressure platform outages or deployment rollbacks before escalating to Slalom’s Incident Command.
- Possesses the open, friction-free communication style needed to pair directly with the Brazil team (OE) to absorb complex cloud infrastructure topologies and code patterns.
Senior Security Engineering & Compliance Lead
Cisco
We securely connect everything to make anything possible.
Role Description
This role is remote and can be worked from any US location, though the eastern time zone is preferred. The Cisco Secure Workload team is at the forefront of data center and cloud security. Our platform provides comprehensive, automated, policy-based security for multi-cloud environments, delivering deep visibility, micro-segmentation, and advanced threat detection. By leveraging sophisticated analytics and machine learning, we empower organizations to protect their workloads, ensuring compliance and operational resilience within increasingly complex, distributed infrastructures.
Your Impact:
- As a Senior Security Engineering & Compliance Lead, you will bridge the gap between technical infrastructure and regulatory rigor.
- You will manage the implementation of security frameworks (SOC, ISO, NIST, etc.) with automated compliance pipelines, hardened identity systems, and risk-mitigation strategies.
- This role is for a hands-on engineer who views compliance as a technical problem to be solved through automation, robust system design, and proactive threat engineering.
Engineering Compliance & Audit Automation
- Architect Compliance-as-Code: Design and implement automated controls to satisfy security compliance requirements, reducing manual evidence collection through system integration.
- Audit Readiness Engineering: Conduct technical gap assessments of infrastructure and applications; design remediation plans that integrate directly into the CI/CD pipeline.
- Evidence Orchestration: Build and maintain automated data pipelines that provide real-time visibility into control effectiveness for auditors and stakeholders.
Security Operations & Incident Engineering
- Detection Engineering: Oversee security alert queues, prioritizing high-severity risks and engineering automated response playbooks to resolve incidents.
- Incident Simulation: Facilitate and document technical incident response tabletop exercises, using the findings to engineer more resilient system architectures and automated recovery processes.
- Documentation as Code: Maintain technical documentation and incident logs that serve as the "source of truth" for audit requirements.
Identity & Access Engineering (IAM)
- IAM Hardening: Engineer and enforce automated user access reviews and segregation-of-duties (SoD) testing.
- Privileged Access Management (PAM): Audit and optimize privileged account controls, implementing technical guardrails to minimize the blast radius of unauthorized access.
Risk & Vulnerability Engineering
- Vulnerability Lifecycle Management: Perform deep-dive vulnerability analyses on enterprise infrastructure; engineer automated patch management and configuration hardening workflows.
- Risk-Based Prioritization: Quantify business impact through technical risk assessments, collaborating with engineering teams to implement corrective technical controls rather than just policy-based fixes.
Qualifications
- Bachelor’s degree in Computer Science, Cybersecurity, or a related technical field with 8+ years of experience, or a Master’s degree with 6+ years.
- Experience with security engineering in cloud-native environments (AWS/Azure/GCP) and infrastructure-as-code (Terraform/Ansible).
- Experience working with technical security controls and regulated compliance frameworks such as SOC, ISO, etc.
Requirements
- Python, Go, or Bash scripting/programming for security automation or log analysis.
- Hands-on experience building "Compliance-as-Code" solutions.
- Certifications such as CCSP, CCSK, OSCP, or other cloud- or security-specific certifications.
- Experience with SIEM/SOAR engineering and automated incident response orchestration.
- Strong understanding of zero trust architecture and micro-segmentation engineering.
Benefits
- Medical, dental, and vision insurance.
- 401(k) plan with a Cisco matching contribution.
- Paid parental leave.
- Short- and long-term disability coverage.
- Basic life insurance.
- 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees.
- 1 paid day off for the employee’s birthday, a paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco.
- 16 days of paid vacation time per full calendar year for non-exempt employees.
- Flexible vacation time off program for exempt employees.
- 80 hours of sick time off provided on the hire date and each January 1st thereafter.
- Optional 10 paid days per full calendar year to volunteer.
• Collaborate with Cyber teams to maintain and monitor system availability, performance, and logs using enterprise tools.
• Leverage industry standards to build out automation workflows.
• Travel up to 10% (domestic) to other RTX locations.



