Job Closed
This listing is no longer active.
Menlo Security protects productivity online with a one-of-a-kind, isolation-powered cloud security platform.
Platform Infrastructure Engineer – SRE Core
Location
Canada
Posted
107 days ago
Salary
0
Seniority
Senior
Job Description
Menlo Security Inc.
• Design, deploy, and maintain VM and Kubernetes infrastructure on GCP and AWS across dozens of clusters spanning development, staging, and production environments in multiple regions.
• Coordinate with your peers in your direct team as well as across teams to ensure that the tasks you’re working on are going to solve the problems that we need them to solve.
• Build and maintain Infrastructure as Code (IaC) using Terraform modules, managing resources through Spacelift or equivalent Terraform Automation and Collaboration Software (TACOS). Provision cloud infrastructure including networking, compute, storage, and security components primarily on GCP, with secondary AWS support.
• Implement and manage workflows with sophisticated multi-layer configuration management.
• Build and maintain comprehensive observability solutions using Grafana Cloud, Prometheus/Mimir, and OTel collectors. Design Grafana dashboards, configure alerting rules, and ensure visibility across all platform components.
• Manage certificate lifecycle, DNS automation, ingress controllers, and service mesh networking with Cilium.
• Partner with Engineering, Product, Compliance, and Security teams to design resilient, scalable systems. Consult on capacity planning, disaster recovery, and architectural decisions for cloud-native applications.
• Identify and eliminate toil through automation. Write scripts, develop tools, and build CI/CD pipelines to improve operational efficiency and reduce manual work.
• Participate in a 24x7 on-call rotation as part of a globally distributed team, responding to incidents and driving post-incident reviews.
Job Requirements
- Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience.
- Proficiency in common programming and scripting languages. We use a lot of Python, Bash, and Go.
- Understanding of network topologies, communication protocols (e.g., TCP/IP, HTTP/S, UDP, TLS), and enterprise-grade connectivity solutions.
- Kubernetes expertise including cluster administration, RBAC, networking, workload management, and troubleshooting across production environments.
- Proven experience with Terraform for infrastructure provisioning and management.
- Knowledge of Google Cloud Platform services including GKE, VPC networking, Cloud DNS, Artifact Registry, Secret Manager, IAM, Gemini Code Assist, and Workload Identity.
- Experience with GitOps methodologies and tools.
- Clear understanding of how to use LLM code assist tools to effectively build software.
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development opportunities
More DevOps Engineer Jobs
• Lead two high-performing teams: a Dev Tooling team focused on CI/CD pipelines, developer experience, and internal platforms, and a Site Reliability Engineering (SRE) team focused on system health, performance, secops, and incident response.
• Improve communication and collaboration, including reuse and ownership, across DevOps, Product, and Platform teams.
• Improve our processes for deployment, monitoring, incident management, and release engineering, ensuring speed and reliability.
• Drive the adoption of newer GitOps practices, with an emphasis on MLOps and AI-related infrastructure and tooling.
• Ensure productive and collaborative communications within, and across, your teams and stakeholders.
• Own design and implementation of infrastructure and tooling roadmap capabilities by working with internal, data, and product teams as necessary.
• Work in partnership with Product and Engineering to ensure technical health, work through infrastructure challenges, ensure products ship reliably, and manage operational risks.
• Participate in technical and architecture reviews across your teams and the wider engineering organization.
DevOps Manager
Hashgraph
Hashgraph, formerly Swirlds Labs, is a software company home to some of the brightest minds in web3.
• Lead and mentor a team of DevOps engineers, providing technical guidance and career development
• Manage day-to-day operations of Hedera production and preproduction infrastructure
• Coordinate with Hedera Governing Council members on operational matters and infrastructure requirements
• Design and implement automation solutions to reduce operational toil and improve efficiency
• Own and evolve infrastructure as code practices using Terraform and Ansible
• Establish and maintain incident management processes, including on-call rotations and post-mortem reviews
• Drive continuous improvement initiatives for monitoring, observability, and alerting systems
• Manage capacity planning and scaling strategies for cloud and bare metal infrastructure
• Ensure 24/7 operational readiness and lead response to critical incidents
• Lead hiring efforts to grow the DevOps team, including defining role requirements, interviewing candidates, and making hiring decisions
• Collaborate with development teams to improve CI/CD pipelines and deployment processes
• Define and track team KPIs, SLOs, and operational metrics
• Manage team budget and resource allocation
• Interface with senior leadership on strategic planning and technical roadmap
Role Description
As a Cloud & DevOps Engineer, you will be responsible for designing, building, and operating secure, scalable, and cost-efficient cloud infrastructure, while improving reliability, automation, and developer experience.
- Cloud infrastructure & architecture
  - Design and maintain AWS multi-account, multi-region environments.
  - Build and evolve VPC networking (subnets, routing, peering, security groups, private endpoints).
  - Operate container platforms (ECS/EKS clusters, services, capacity management, scaling strategies).
  - Manage data platforms (RDS PostgreSQL, ElastiCache, S3, OpenSearch).
- Infrastructure as Code
  - Develop and maintain CloudFormation and Terraform stacks.
  - Standardize reusable modules and enforce best practices.
  - Review and manage infrastructure changes with CI/CD workflows.
- CI/CD & delivery
  - Design and operate deployment pipelines (GitHub Actions, self-hosted runners).
  - Improve developer experience and release velocity.
- Observability & reliability
  - Implement monitoring, logging, and alerting (Sentry, OpenSearch, CloudWatch, metrics, alarms, log pipelines).
  - Troubleshoot production incidents and perform root cause analysis.
  - Define SLOs, improve system resilience.
- Cost optimization
  - Analyze AWS billing and usage patterns.
  - Optimize compute, storage, and data transfer costs.
  - Evaluate instance types, storage classes, and architectural trade-offs.
- Security & compliance
  - Manage Google, VPN and SSO access, IAM roles, policies, and least-privilege access.
  - Secure network boundaries and service communications.
  - Support security reviews and third-party assessments.
- Operations & incident response
  - Participate in on-call rotations and incident management.
  - Automate remediation and operational workflows.
  - Coordinate with developer teams during outages or performance issues.
- Automation & tooling
  - Build scripts and internal tools (Python, CLI automation, Airflow).
  - Automate repetitive operational tasks.
  - Integrate systems across monitoring, alerting, and ticketing.
- Collaboration & leadership
  - Work closely with engineering teams to design solutions.
  - Provide guidance on architecture, scalability, and reliability.
  - Contribute to technical documentation and knowledge sharing.
Qualifications
- 2-5 years of experience as a DevOps / Infrastructure engineer in a fast-growing environment.
- Strong experience operating AWS production environments across multiple accounts, with knowledge of networking, security, and cloud architecture.
- Strong experience with Infrastructure as Code (CloudFormation, Terraform) and automated delivery pipelines (GitHub Actions).
- Scripting skills (Python, Bash), an automation mindset, and experience with GNU/Linux systems administration.
- Hands-on experience running containerized workloads on ECS and managing deployments, scaling, and troubleshooting.
- Good experience with observability, incident response, and improving reliability in production systems.
- Knowledge of data services (PostgreSQL, OpenSearch, S3) and of optimizing performance and costs.
- Familiarity with VPN implementations, networking and common protocols (TCP/IP, DNS, HTTP, FTP, SSH, ...), and load balancing and proxy solutions.
- Embrace modern tooling and continuously look for ways to leverage automation and AI to improve productivity and decision-making.
- Fluent in English; another language is a plus.
Requirements
- Communicative.
- Autonomous, proactive, and not afraid to take initiative.
- Problem solver.
- Ready to join a complex project with more than 60 engineers.
Benefits
- Thrive in an international and inclusive environment with over 46 different nationalities.
- Compensation plan for Subscription Warrants for Company Creators (BSPCE) and a Pluxee card for managing tax levels.
- Attractive deals for home insurance and green electricity and gas.
- Medical insurance through Alan or Sanitas with up to 50% coverage by papernest (after 6 months in the company).
- Access to ongoing training tailored to your goals (technical, language, or managerial skills).
- Numerous opportunities for career development and growth.
DevOps Lead
Newfire Global Partners
Software Development, Staff Augmentation, and Advisory Services company operating in 8 countries across 4 continents.
• Define and evolve the **DevOps architecture and cloud strategy** for the TDRx platform
• Provide technical leadership on **CI/CD, infrastructure automation, and deployment strategies**
• Establish and guide best practices for **security, access control, and secrets management**
• Support engineering teams in improving **build, release, and environment reliability**
• Ensure infrastructure is **scalable, observable, and maintainable**
• Advise on and review infrastructure-as-code implementations
• Help teams troubleshoot complex environment or deployment issues
• Participate in key technical discussions and architecture decisions




