The Cloud Native Developer and Security Operations Company.
Principal Site Reliability Engineer
Location
South Africa
Posted
93 days ago
Salary
0
Seniority
Lead
Job Description
Principal Site Reliability Engineer
Deimos
- Design and build advanced cloud-native infrastructure
- Guide technical discussions with clients and build technical roadmaps
- Collaborate with the Engineering Director(s) to (re)design architecture
- Assist the Site Reliability Manager with resource planning
- Assist engineering managers with building career paths for individuals wishing to be promoted to Principal Engineers
- Teach, mentor, grow, and provide advice to other domain experts, individual contributors, and across several teams.
- Document processes and monitor performance metrics
- Guide conversations to remove blockers and encourage collaboration across teams.
- Constantly improve the stability, scalability, security, cost-effectiveness, and operational excellence of our clients' systems.
- Continuously discover, evaluate, and implement new technologies to maximize development efficiency and security.
- Conduct infrastructure planning, testing, and development
- Provide technical leadership on multiple projects.
Job Requirements
- At least 7 years' experience working in a DevOps/SRE team
- Extensive experience in DevOps/SRE, team management and collaboration
- Advanced knowledge of best practices related to data encryption and cybersecurity
- Advanced knowledge of the general DevOps/SRE landscape, architectures, and emerging technologies
- Cloud experience, preferably GCP, Azure and AWS
- Experience in Observability Practices and Incident Management
- Extensive experience with Prometheus, Grafana, the Elastic Stack and all versions of Beats, especially within Kubernetes
- Experience with Infrastructure as Code, preferably Terraform
- Experience with general automation and config management, preferably Ansible
- Extensive experience building and maintaining Kubernetes clusters and workloads
- Strong foundation of basic network and security concepts
- Ability to build robust CI/CD pipelines
- Familiarity with relational and non-relational databases
- Solid understanding of Linux operating systems
Benefits
- Flexibility and the freedom to work remotely.
- Work-life balance where you are not expected to work over weekends or after hours.
- A forward-thinking remote company that knows how important it is to stay connected as one team and provides virtual social platforms for employee engagement.
- A monthly work-from-home allowance, which you can use to set yourself up to work comfortably from home.
- A MacBook or Windows laptop for you to do your best work on.
- Become part of a team of exceptionally clever and talented people who like to share their knowledge and learnings.
- We support your career growth and love to celebrate your successes and advancement!
More DevOps Engineer Jobs
- Lead the design, implementation, and evolution of cloud infrastructure using Infrastructure as Code.
- Drive automation strategies for deployment and infrastructure to ensure high availability, reliability, and efficiency.
- Own the architecture and management of AWS networking environments.
- Lead the design and operation of containerized platforms and orchestration.
- Architect, maintain, and continuously improve CI/CD pipelines at scale.
- Define and implement monitoring, observability, and alerting strategies.
- Ensure end-to-end security, performance, scalability, and cost optimization of the infrastructure.
- Act as a technical partner for engineering and operations teams to enable continuous delivery.
- Evaluate, propose, and lead the adoption of new tools, technologies, and best practices.
- Provide technical leadership, mentorship, and guidance to engineers.
Role Description
Our mission at HubSpot is to help millions of organizations grow better. On the Marketing Web Team, you’ll play a critical role in ensuring the systems behind HubSpot’s digital experiences are reliable, scalable, and built to grow with our customers. This is a highly visible team responsible for delivering innovative web platforms and applications that power HubSpot’s global brand and are experienced by millions every day.
As a Senior DevOps Engineer, you’ll design and operate the infrastructure that supports high-traffic, data-intensive marketing platforms and systems. You’ll partner closely with developers and marketers to improve delivery velocity while upholding strong standards for security, performance, and operational excellence. Together, we’re on a mission to build the best web experience in the world, and the reliability of our platforms is foundational to that goal.
What You’ll Do
- Design, build, and maintain scalable, secure, and reliable cloud infrastructure to support high-traffic, business-critical digital applications.
- Proactively improve system reliability, availability, and performance through automation, observability, and continuous optimization.
- Collaborate cross-functionally with Web Development, QA, and Product teams to align infrastructure decisions with business and delivery goals.
- Lead incident response, root cause analysis, and post-incident reviews, ensuring learnings are translated into systemic improvements.
- Build and maintain CI/CD pipelines that reduce friction and increase deployment confidence.
- Establish and evolve infrastructure-as-code standards to ensure consistency, scalability, and long-term maintainability.
- Identify opportunities to incorporate AI-assisted tooling into DevOps workflows and partner with leadership to measure its impact on reliability, delivery velocity, and cost efficiency.
- Drive security, resiliency, and cost-awareness across environments.
- Own medium-to-large infrastructure initiatives from design through production, including documentation and long-term support.
- Contribute to sprint planning and backlog refinement within an Agile environment, ensuring work is prioritized and delivered effectively.
- Contribute to roadmap planning by identifying infrastructure investments that improve reliability, scalability, and developer velocity.
- Evaluate technical trade-offs and clearly communicate risks, impact, and recommendations to stakeholders.
- Develop and maintain runbooks, architectural diagrams, and system documentation to support operational excellence.
- Mentor engineers by sharing best practices in DevOps and cloud architecture.
Qualifications
- 5+ years of experience in DevOps, platform engineering, or infrastructure-focused software engineering roles.
- Significant experience operating production systems in at least one major public cloud environment (AWS, Azure, or GCP).
- Practical experience with infrastructure as code and configuration management (e.g., Terraform, CloudFormation, Ansible, Chef).
- Hands-on experience building and maintaining CI/CD pipelines using tools such as Jenkins, GitHub Actions, or similar platforms.
- Experience working with CDN, DNS, and edge security platforms (e.g., Cloudflare).
- Solid understanding of Linux systems, networking, and distributed systems.
- Hands-on experience with containers and orchestration tools (e.g., Docker, Kubernetes).
- Ability to troubleshoot complex system issues and drive them to resolution.
- Ability to prioritize competing initiatives and manage work with minimal oversight.
- Strong written and verbal communication skills, with the ability to explain complex technical concepts clearly.
Requirements
- Experience supporting large-scale, customer-facing SaaS platforms.
- Experience operating highly distributed, scalable systems in production environments.
- Exposure to observability tooling such as Prometheus, Grafana, or OpenTelemetry.
- Familiarity with web technologies such as TypeScript, JavaScript, and Node.js.
- Experience leveraging AI-assisted engineering or operations tools to improve productivity or system reliability.
- Experience with security best practices in cloud environments.
- Exposure to cost optimization strategies in high-growth systems.
- Prior experience mentoring or leading technical initiatives.
Benefits
- Annual Cash Compensation Range: $108,000 to $162,000 USD.
- Base salary, on-target commission for eligible roles, and annual bonus targets under HubSpot’s bonus plan.
- Participation in HubSpot’s equity plan to receive restricted stock units (RSUs) for eligible roles.
- Flexible work arrangements, including remote options and in-person onboarding.
- Support for candidates needing accommodations during the hiring process.
Company Description
HubSpot (NYSE: HUBS) is an AI-powered customer platform with all the software, integrations, and resources customers need to connect marketing, sales, and service. HubSpot's connected platform enables businesses to grow faster by focusing on what matters most: customers.
At HubSpot, bold is our baseline. Our employees around the globe move fast, stay customer-obsessed, and win together. Our culture is grounded in four commitments: Solve for the Customer; Be Bold, Learn Fast; Align, Adapt & Go!; and Deliver with HEART. These commitments shape how we work, lead, and grow.
We’re building a company where people can do their best work. We focus on brilliant work, not badge swipes. By combining clarity, ownership, and trust, we create space for big thinking and meaningful progress. And we know that when our employees grow, our customers do too.
Recognized globally for our award-winning culture by Comparably, Glassdoor, Fortune, and more, HubSpot is headquartered in Cambridge, MA, with employees and offices around the world.
Senior Director of Engineering, SRE
AlphaSense
The market intelligence and search platform trusted by over 3,500 leading organizations
- Lead reliability and operational excellence across AlphaSense’s platforms and products
- Scale SRE practices in a “you build it, you run it” engineering organization
- Lead and grow a follow-the-sun SRE team across multiple time zones
- Build, mentor, and develop high-performing SRE engineers
- Own incident management, on-call operations, and post-incident learning
- Cultivate an awareness and culture of reliability throughout the engineering organization
- Set direction for observability and operational tooling
- Enable teams to operate production systems safely and confidently
- Embed reliability into the whole software delivery lifecycle in collaboration with Product, Platform, Cloud, and Security
- Reduce systemic risk through toil reduction and continuous improvement
- Lead the design, implementation, and evolution of scalable and reliable cloud infrastructure across multiple projects.
- Own and continuously improve CI/CD platforms and infrastructure-as-code practices using tools such as Terraform, Helm, and cloud-native solutions.
- Define and enforce DevOps best practices, standards, and operational guidelines across engineering teams.
- Mentor and support DevOps and engineering team members, promoting knowledge sharing and technical growth.
- Work closely with Data Engineering and Machine Learning teams to design infrastructure and deployment strategies for data and ML workloads.
- Ensure high levels of reliability, observability, and performance by implementing monitoring, logging, and incident response practices.
- Lead troubleshooting efforts for complex production incidents and coordinate root cause analysis and preventive improvements.
- Collaborate with leadership and engineering teams to plan infrastructure strategy, scalability, and long-term platform evolution.
- Promote security best practices and ensure infrastructure complies with security and compliance standards.
- Drive cost-efficiency initiatives and implement FinOps practices to optimize cloud usage.




