Self-described as the leading platform for search-powered solutions, Elastic helps organizations, their customers, and their employees find what they need faster while protecting a
Senior SRE - Platform
Location: Canada
Posted: 4 days ago
Salary: C$148.3K - C$185.6K / year
Seniority: Senior
Job Description
Senior SRE - Platform
Elastic
Role Description
As part of the Platform Engineering department, the SRE team designs, builds, scales and matures the multi-cloud platform that hosts internal and external services such as Elastic Cloud Hosted and Serverless. We develop new software and tools, and extend existing ones, to support the rest of the infrastructure so that we can rapidly deploy products from all corners of Elastic. We want your experience and recommendations to help us offer a truly exceptional customer experience!

What you will be doing
- Taking an engineering approach to leading technical initiatives that automate system engineering work and guarantee the reliability of Elastic's global infrastructure.
- Growing our global Platform infrastructure to meet increasing scaling demands by developing and maintaining software, tooling and automation.
- Collaborating inclusively, focusing on operational excellence and uplifting others.
- Responding to major incidents and preventing repeated customer impact through prioritised problem management.

Our on-call rotation uses a follow-the-sun model in which everyone takes part (mostly) during their own working hours.

Qualifications
- Successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability.
- A customer-first approach to solving operational problems from an SRE perspective.
- A software engineering background that lets you collaborate with engineers to identify, implement and deliver solutions, ideally in Golang.
- Production experience with public cloud service providers and with managing Kubernetes infrastructure at scale.
- A passion for developing solutions through inclusive communication that grows and strengthens partner and team relationships.
- Experience working in distributed teams or working remotely is desirable.
Bonus Points
- You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- You have built or operated Kubernetes infrastructure at scale, ideally across multiple cloud providers, along with the automation needed to support it.
- You have worked with containerized services (such as Docker).
- You have proven experience leading and improving alerting and major incident management processes, and using metrics systems (e.g. Elastic Stack, Prometheus, Influx) to diagnose issues and quantify impact for audiences at varying levels of the organization.
- You have Linux system administration experience on distributed systems at scale.
- You have diagnosed issues in, or designed and implemented solutions with, the Elastic Stack.
- You thrive in a self-organizing, knowledge-sharing, globally distributed team.
- You help team members bring out the best in each other through coaching and mentoring.

Compensation
Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is:
- $148,300 to $185,600 CAD

An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs. Elastic believes that employees should have the opportunity to share in the value that we create together for our shareholders. Therefore, in addition to cash compensation, this role is currently eligible to participate in Elastic's stock program.
Our total rewards package also includes a company-matched Registered Retirement Savings Plan (RRSP) with dollar-for-dollar matching up to 6% of eligible earnings, along with a range of other benefits offered with a holistic emphasis on employee well-being.

Benefits
- Competitive pay based on the work you do here and not your previous salary.
- Health coverage for you and your family in many locations.
- Ability to craft your calendar with flexible locations and schedules for many roles.
- Generous number of vacation days each year.
- Increase your impact: we match up to $2000 (or local currency equivalent) for financial donations and service.
- Up to 40 hours each year to use toward volunteer projects you love.
- Embracing parenthood with a minimum of 16 weeks of parental leave.

Equal Opportunity Employer
Elastic is an equal opportunity employer and is committed to creating an inclusive culture that celebrates different perspectives, experiences, and backgrounds. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation. We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.
More Platform Engineer Jobs
Senior Site Reliability Engineer - Platform
Referral Board
Remote's Total Rewards philosophy is to ensure fair, unbiased compensation and fair equity pay, along with competitive benefits, in all locations in which we operate. We do not engage in or encourage cheap-labor practices, and therefore we pay above in-location rates. At Remote, we foster internal mobility as a key element of our culture of employee growth and development, supported by a compensation philosophy that guarantees pay equity and fairness.
Role Description
As part of the Platform Engineering department, the SRE team designs, builds, scales and matures the multi-cloud platform that hosts internal and external services such as Elastic Cloud Hosted and Serverless. We develop new software and tools, and extend existing ones, to support the rest of the infrastructure so that we can rapidly deploy products from all corners of Elastic. We want your experience and recommendations to help us offer a truly exceptional customer experience!

What you will be doing
- Taking an engineering approach to leading technical initiatives that automate system engineering work and guarantee the reliability of Elastic's global infrastructure.
- Growing our global Platform infrastructure to meet increasing scaling demands by developing and maintaining software, tooling and automation.
- Collaborating inclusively, focusing on operational excellence and uplifting others.
- Responding to major incidents and preventing repeated customer impact through prioritised problem management.
- Participating in an on-call rotation that uses a follow-the-sun model, in which everyone takes part (mostly) during their own working hours.

Qualifications
- Successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability.
- A customer-first approach to solving operational problems from an SRE perspective.
- A software engineering background that lets you collaborate with engineers to identify, implement and deliver solutions, ideally in Golang.
- Production experience with public cloud service providers and with managing Kubernetes infrastructure at scale.
- A passion for developing solutions through inclusive communication that grows and strengthens partner and team relationships.
- Experience working in distributed teams or working remotely is desirable.
Bonus Points
- Experience operating a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- Experience building or operating Kubernetes infrastructure at scale, ideally across multiple cloud providers, along with the automation needed to support it.
- Experience with containerized services (such as Docker).
- Proven experience leading and improving alerting and major incident management processes, and using metrics systems (e.g. Elastic Stack, Prometheus, Influx) to diagnose issues and quantify impact.
- Linux system administration experience on distributed systems at scale.
- Experience diagnosing issues in, or designing and implementing solutions with, the Elastic Stack.
- Experience thriving in a self-organizing, knowledge-sharing, globally distributed team.
- Ability to strengthen and uplift team members through coaching and mentoring.

Compensation
Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is:
- $148,300 to $185,600 CAD

An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs. Elastic believes that employees should have the opportunity to share in the value that we create together for our shareholders. Therefore, in addition to cash compensation, this role is currently eligible to participate in Elastic's stock program. Our total rewards package also includes a company-matched Registered Retirement Savings Plan (RRSP) with dollar-for-dollar matching up to 6% of eligible earnings, along with a range of other benefits offered with a holistic emphasis on employee well-being.
Benefits
- Competitive pay based on the work you do here and not your previous salary.
- Health coverage for you and your family in many locations.
- Ability to craft your calendar with flexible locations and schedules for many roles.
- Generous number of vacation days each year.
- Increase your impact: we match up to $2000 (or local currency equivalent) for financial donations and service.
- Up to 40 hours each year to use toward volunteer projects you love.
- Embracing parenthood with a minimum of 16 weeks of parental leave.

Equal Opportunity Statement
Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation. We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.
AI Infrastructure & Platform Operations Engineer
Mirantis
Strategic open source infrastructure for containers and virtual machines.
• Monitor, operate, and support production AI infrastructure platforms.
• Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
• Support NVIDIA GPU infrastructure and associated platform services.
• Monitor and troubleshoot Kubernetes-based environments.
• Investigate performance, availability, and reliability issues across infrastructure and platform components.
• Collaborate with engineering teams, hardware vendors, datacenter personnel, and service delivery teams to resolve technical issues.
• Participate in incident response, root cause analysis, and operational improvement activities.
• Contribute to improvements in monitoring, observability, automation, and operational processes.
• Maintain operational documentation, runbooks, and knowledge articles.
Software Engineer - SaaS Platform & Asset Management - Türkiye
JumpCloud
An open directory platform for secure, frictionless access from any device to any resource, anywhere
All roles at JumpCloud® are Remote unless otherwise specified in the Job Description.

About JumpCloud®
JumpCloud® is the AI-powered unified IT management platform designed to secure the modern workforce. By consolidating identity, device, and access management, JumpCloud provides intelligent, secure IT that scales from human users to autonomous AI agents. We help organizations around the globe eliminate complexity and turn AI risk into an optimized advantage, ensuring the right people and agents have secure access to the right resources at all times. JumpCloud is Intelligent, Secure IT.

Key Responsibilities
- Design, develop, and implement highly scalable backend services for our Asset & SaaS Management platform using Go (Golang).
- Architect and build high-throughput microservices using gRPC and Protobuf, ensuring low-latency communication in a high-traffic distributed system.
- Take ownership of the database layer using PostgreSQL: design efficient schemas, write complex optimized queries, and perform deep performance analysis to prevent bottlenecks.
- Implement and maintain comprehensive observability pipelines using tools like Datadog, Prometheus, or Grafana to monitor system health, trace requests, and proactively identify performance degradation.
- Design and implement asynchronous processing workflows for asset data synchronization using message brokers (e.g., Kafka, RabbitMQ, or AWS SQS).
- Manage and deploy containerized applications using Kubernetes, ensuring high availability and zero-downtime deployments.
- Collaborate with product managers to translate complex asset and SaaS management logic (lifecycle tracking, depreciation, audit trails) into robust technical solutions.
- Write clean, maintainable, and well-documented code, adhering to Go best practices, including effective error handling.
- Participate in code reviews, providing constructive feedback and ensuring code quality, specifically looking for race conditions and memory leaks.
- Troubleshoot and debug production issues in a complex microservices architecture, using distributed tracing and log analysis.

Qualifications
- 4-6+ years of professional experience in software development, with a strong focus on backend systems and infrastructure engineering.
- Strong proficiency in Go (Golang), with a deep understanding of goroutines, channels, and interface-based design patterns.
- Proven experience working on high-traffic, large-scale SaaS applications where performance and concurrency are critical.
- Deep expertise in PostgreSQL, including the ability to diagnose slow queries, optimize execution plans, and manage connection pools effectively.
- Strong experience implementing gRPC services and defining rigid contracts using Protocol Buffers.
- Hands-on experience with observability and APM tools (specifically Datadog, New Relic, or OpenTelemetry) to set up dashboards, alerts, and conduct root cause analysis.
- Extensive experience with the complete DevOps lifecycle, including Git version control, CI/CD pipelines (e.g., GitHub, GitLab CI, Jenkins), and infrastructure-as-code.
- Strong hands-on experience with Docker and Kubernetes for orchestrating services in a production environment.
- Familiarity with distributed caching strategies (e.g., Redis) to offload database pressure.
- Excellent problem-solving skills and the ability to work independently and as part of a team.
- Strong communication and interpersonal skills.

Preferred Skills
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Experience in API development and API integrations.
- Experience with cloud platforms (AWS or GCP).
- Experience with change data capture (CDC) pipelines (e.g., Kafka, Debezium).
- Experience with AI-assisted development tools (e.g., GitHub Copilot, Cursor).
- Contributions to open-source Go projects or libraries.
Where you’ll be working/Location: JumpCloud is committed to being Remote First, meaning that you are able to work remotely within the country noted in the Job Description. You must be located in and authorized to work in the country noted in the job description to be considered for this role.

Please note: There is an expectation that our engineers participate in on-call shifts. You will be expected to commit to being ready and able to respond during your assigned shift, so that alerts don't go unaddressed.

Language: JumpCloud has teams in 15+ countries around the world and conducts our internal business in English. The interview and any additional screening process will take place primarily in English. To be considered for a role at JumpCloud, you will be required to speak and write in English fluently. Any additional language requirements will be included in the details of the job description.

Why JumpCloud?
If you thrive working in a fast, SaaS-based environment and you are passionate about solving challenging technical problems, we look forward to hearing from you! JumpCloud is an incredible place to share and grow your expertise! You’ll work with amazing talent across each department who are passionate about our mission. We’re out-of-the-box thinkers, so your unique ideas and approaches for conceiving a product and/or feature will be welcome. You’ll have a voice in the organization as you work with a seasoned executive team, a supportive board and in a proven market that our customers are excited about.

One of JumpCloud's three core values is to “Build Connections.” “To us, that means creating human connection with each other regardless of our backgrounds, orientations, geographies, religions, languages, gender, race, etc. We care deeply about the people that we work with and want to see everyone succeed.” - Rajat Bhargava, CEO

Please submit your résumé and a brief explanation about yourself and why you would be a good fit for JumpCloud.
Please note JumpCloud is not accepting third-party resumes at this time.

JumpCloud is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

Scam Notice: Please be aware that there are individuals and organizations that may attempt to scam job seekers by offering fraudulent employment opportunities in the name of JumpCloud. These scams may involve fake job postings, unsolicited emails, or messages claiming to be from our recruiters or hiring managers. Please note that JumpCloud will never ask for any personal account information, such as credit card details or bank account numbers, during the recruitment process. Additionally, JumpCloud will never send you a check for any equipment prior to employment. All communication related to interviews and offers from our recruiters and hiring managers will come from official company email addresses (@jumpcloud.com) and will never ask the job seeker to pay any fee or make any purchase. If you are contacted by anyone claiming to represent JumpCloud and you are unsure of their authenticity, please do not provide any personal/financial information and contact us immediately at recruiting@jumpcloud.com with the subject line "Scam Notice".
Platform Engineer
DTCC - Depository Trust and Clearing Corporation
DTCC, which stands for Depository Trust and Clearing Corporation, is a leading financial services company providing secure, efficient, and transparent post-trad
Design and maintain scalable AI/ML platforms using AWS services, manage AWS EMR clusters for data processing, and develop MLOps pipelines for model deployment, ensuring system reliability and compliance across environments.


