MongoDB

MongoDB, originally called 10gen, is a software development company. Since 2007, MongoDB has created an open-source, document-oriented database to help clients

Senior Site Reliability Engineer

Location

Canada

Posted

54 days ago

Salary

C$144K - C$200K / year

Seniority

Senior

Job Description

Role Description

MongoDB’s Storage Layer Services (SLS) team is re-architecting the MongoDB cloud storage layer and sits at the heart of our next-generation cloud storage architecture. This relatively new team is building performant, multi-tenant distributed storage services that both enhance today’s Atlas storage stack and enable more customer workloads to run more efficiently. You will partner with the teams building these storage services to define SLOs, shape capacity plans, and ensure the reliability, durability, and operational safety of the storage layer that underpins Atlas. You’ll join a small, senior team of SREs as founding members of this organization, playing a crucial role in executing on a multi-year roadmap for MongoDB’s cloud storage architecture.

This role can be based out of our Toronto or Montreal office or remotely in Canada while physically based in an Eastern or Central time zone location.

Qualifications

- 6+ years of experience working on software development and operating distributed systems
- Proficiency in Python, Go, or a similar language
- Experience operating or supporting stateful storage or database systems at scale
- Comfortable with durability, consistency, and recovery trade-offs
- Possess a customer-focused mindset
- Value efficiency in processes and operations
- Prefer automation over manual processes
- Experience using and extending containerization technologies, particularly Kubernetes
- Expertise in cloud infrastructure platforms, including AWS, Google Cloud Platform (GCP), or Azure
- Understanding of Linux operating system internals and networking concepts (e.g., TCP/IP, DNS, TLS, routing)

Requirements

- Work on multi-tenant distributed storage systems, balancing long-term strategic infrastructure goals with immediate engineering needs
- Build for reliability, making services and infrastructure available, resilient, fault-tolerant, and self-healing
- Identify and configure key metrics to detect incidents and quantify service health, availability, and performance
- Participate in a 24/7 on-call rotation to resolve issues involving the storage infrastructure
- Become an expert in infrastructure performance, helping optimize from the application level all the way to the kernel

Benefits

- Equity
- Participation in the employee stock purchase program
- Flexible paid time off
- 20 weeks fully-paid gender-neutral parental leave
- Fertility and adoption assistance
- Registered Retirement Savings Plan (RRSP) with employer match
- Mental health counseling
- Backup child and elder care
- Health, dental, and vision benefits offerings

Company Description

MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform, the most widely available, globally distributed database on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. With offices worldwide and over 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we’re powering the next era of software.

Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It’s what makes us MongoDB. To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone.

More Engineer Jobs

Senior Emulation Engineer

EnCharge AI

Where the future of AI compute is being defined and built, to unlock new levels of machine intelligence.

Engineer · 54 days ago
Full Time · Remote · Team 11-50 · Since 2022 · H1B Sponsor

Role Description

At EnCharge AI, we are building the next generation of AI compute silicon, purpose-built for high-performance, low-power, and scalable AI inference. As an Emulation Engineer, you will play a critical role in validating complex AI accelerator architectures on emulation platforms before tape-out. This position is ideal for someone passionate about bridging the gap between hardware and software in fast-paced, deep tech environments.

- Set up and maintain Siemens Veloce emulation and prototyping platforms.
- Adapt SoC designs for emulation and prototyping.
- Develop and debug emulation testbenches and system-level environments.
- Support pre-silicon validation, power/performance analysis, and early software bring-up. Participate in silicon bring-up and validation.
- Collaborate with design and verification teams to isolate design issues and accelerate debug.
- Optimize performance of the emulation workloads and reduce turnaround time.
- Work with firmware/software teams to enable use of emulators for OS and driver testing.

Qualifications

- BS/MS/Ph.D. in EE, CS, or related field with 7+ years of SoC design experience.
- Experience with emulation platforms (Veloce, Palladium, or ZeBu) and FPGA-based prototyping systems (proFPGA, HAPS, or Protium).
- Experience with emulating high-speed I/O interfaces such as PCIe and UCIe and memory technologies such as LPDDR and HBM.
- Solid understanding of digital design, RTL (Verilog/SystemVerilog), and SoC architecture.
- Proficiency in hardware debug tools, waveform viewers, and logic analyzers.
- Scripting skills (e.g., Python, Tcl, Perl) for automation and infrastructure development.
- Familiarity with UVM, simulation, and testbench environments. SystemVerilog and UVM-based verification experience is a plus.
- Hands-on software development experience with C/C++ is a plus.

Worldwide

Staff Site Reliability Engineer

MongoDB

Engineer · 54 days ago

Role Description

We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to join the Fabric team. This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services. As an SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and reliable.

Qualifications

- 10+ years of experience working on software and operating distributed systems.
- Deep expertise in networking fundamentals and a good understanding of how the internet works (e.g., TCP/IP, DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles).
- Possess a customer-focused mindset, driving improvements that benefit end-users.
- Value efficiency in processes and operations, with a strong preference for automation over manual processes.
- Familiar with modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP (e.g., VPCs, subnetting, routing, VPNs, peering, private link/private service connect, and CDNs).
- Strong knowledge of service mesh and load-balancing concepts, eager to implement these in a multi-cloud environment.

Requirements

- Participate in the development of a reliable and resilient multi-cloud globally-connected network that is crucial for MongoDB’s services.
- Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity.
- Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability.

Benefits

- Equity participation.
- Participation in the employee stock purchase program.
- Flexible paid time off.
- 20 weeks fully-paid gender-neutral parental leave.
- Fertility and adoption assistance.
- 401(k) plan.
- Mental health counseling.
- Access to transgender-inclusive health insurance coverage.
- Health benefits offerings.

United States
$127K - $249K / year

FBS MLOps Engineer Manager

Capgemini

Founded in 1967, Capgemini is revered as one of the world's leading consulting, technology, and outsourcing agencies. In 2016 alone, the company reported global

Engineer · 54 days ago

FBS – Farmer Business Services is part of Farmers operations with the purpose of building a global approach to identifying, recruiting, hiring, and retaining top talent. By combining international reach with US expertise, we build diverse and high-performing teams that are equipped to thrive in today’s competitive marketplace. We believe that the foundation of every successful business lies in having the right people with the right skills. That is where we come in: helping Farmers build a winning team that delivers consistent and sustainable results.

Since we don’t have a local legal entity, we’ve partnered with Capgemini, which acts as the Employer of Record. Capgemini is responsible for managing local payroll and benefits.

What to expect on your journey with us:

- A solid and innovative company with a strong market presence
- A dynamic, diverse, and multicultural work environment
- Leaders with deep market knowledge and strategic vision
- Continuous learning and development

This position leads the deployment, implementation, and optimization of machine learning pipelines to solve complex business challenges. The role involves both hands-on work and supervising a team to deliver effective machine learning engineering solutions for a line of business. The position applies in-depth knowledge of policies, procedures, and business objectives to make decisions and guide the team. Performs work independently while receiving limited guidance.

Key Responsibilities

- Delivers machine learning ops engineering tasks such as deployment, implementation, optimization, and maintenance of machine learning pipelines and models.
- Ensures pipelines support efficient data ingestion, preprocessing, model training, validation, deployment, and monitoring.
- Implements scalable and robust machine learning solutions that can handle large volumes of data and complex models.
- Implements real-time inference with high availability and low latency.
- Creates strategic plans within span of control and implements them across one to two business domains.
- Ensures seamless integration of pipelines with continuous integration and continuous deployment (CI/CD) tools and workflows.
- Supports and maintains solutions in production (fixing bugs, making changes as required, maintaining models).
- Collaborates with cross-functional teams to integrate machine learning and business logic-based solutions into production systems.
- Effectively communicates and applies machine learning engineering value, concepts, and strategies in various scenarios with stakeholders.
- Recruits, hires, and mentors top talent to build a high-performing MLOps team. Supervises, coaches, and guides direct reports.
- Uses advanced knowledge of code management principles to follow architectural and governance guidelines.

Brazil

FBS MLOps Engineer Manager

Capgemini

Engineer · 54 days ago

Mexico