Job Closed
This listing is no longer active.
Established in 1986, SS&C Technologies is a leading global provider of services and software for the global financial services industry. Committed to helping cl
Senior Site Reliability Engineer
Location
Florida
Posted
101 days ago
Salary
0
Seniority
Senior
Job Description
Senior Site Reliability Engineer
SS&C Technologies
- Collaborate with Technology Infrastructure teams to build and operate reusable, cloud-native platforms that abstract complexity and accelerate delivery while incorporating reliability from design through operations.
- Work with business units and technical teams to improve application availability, observability, and reliability as our business applications are migrated to the Private Cloud.
- Enhance platform reliability through automatic problem detection, self-healing systems, and well-architected notification and escalation protocols.
- Use SLOs, SLIs, and KPIs to guide prioritization, measure impact, and drive continuous improvement.
- Eliminate toil using intelligent automation and agentic workflows.
- Conduct blameless retrospectives and share learnings across the organization.
- Foster a culture of ownership, positive thinking, and continuous learning while remaining grounded in practicality, experimentation, and engineering excellence.
- Integrate DevSecOps, zero-trust principles, and policy-as-code into every pipeline.
- Produce and promote Architecture Decision Records (ADRs) and Cloud Well-Architected Frameworks that our business units can adopt to improve our technology standardization.
- Maintain 24x5 active coverage with seamless regional handoffs and weekend escalation protocols.
Job Requirements
- 5+ years of professional experience in an SRE role, with 3+ years in financial services or other regulated industries preferred.
- Minimum Bachelor’s degree in Computer Science, Engineering, or a related field.
- Proven expertise in architecting, designing and operating private cloud environments (e.g., VMware, OpenStack, OpenShift Virtualization) and Kubernetes clusters from a micro to a global scale.
- Hands-on experience with building, deploying, and operating infrastructure as code platforms, CI/CD pipelines, and observability platforms (e.g., Prometheus, Splunk).
- Strong understanding of modern systems reliability standards and practices, including establishing KPIs, monitoring and reporting on SLAs and SLOs, and sorting through the noise to establish actionable insights.
- Familiarity with various financial services regulatory frameworks and their impact on infrastructure design and operations.
- Familiarity with structured naming conventions and asset management for global infrastructure.
- Experience with financial-grade network segmentation, micro-segmentation, and zero-trust architecture.
- Certifications such as TOGAF, AWS Certified Solutions Architect, VMware VCP, or Red Hat Certified Architect are a plus.
- Familiarity with ISO 27001, NIST 800-53, and other security frameworks is a plus.
Benefits
- Flexibility: Hybrid Work Model & a Business Casual Dress Code, including jeans
- 401k Matching Program, Professional Development Reimbursement
- Flexible Personal/Vacation Time Off, Sick Leave, Paid Holidays
- Medical, Dental, Vision, Employee Assistance Program, Parental Leave
- Discounts on fitness clubs, travel and more!
More DevOps Engineer Jobs
Senior DevOps Engineer
FetLife
Our mission is to help everyone feel comfortable with who they are sexually by connecting and educating kinksters in a safe, open, and supportive environment.
At FetLife we're looking for a Senior DevOps Engineer to help us better serve the community.
The Job
As a DevOps Engineer, you'll be responsible for:
- Upgrading and improving our infrastructure
- Reducing noise through smarter alerting and root-cause fixes
- Handling daily infra tasks (K8s issues, deployment health, etc.)
- Joining the on-call rotation and making it smoother through automation
- Strengthening security and disaster recovery
- Tuning and maintaining our databases
Currently, our platform is built as a majestic Rails monolith, using Vue.js with TypeScript on the front-end, and enhanced by Rust for select services and gems. More details about our tech stack:
- Testing is done with RSpec & Capybara
- Continuous integration and deployment are done with GitHub Actions
- PostgreSQL for our main databases
- ScyllaDB for our activity feeds
- Redis for session storage, queue management, and caching
- Elasticsearch for full-text search
- DevOps using containers orchestrated with Kubernetes and Helm
- Monitoring and alerting are done with Datadog, New Relic, and Sentry
- Infrastructure managed with Terraform
- Hosting on Google Cloud
- CDN and endpoint protection using Cloudflare and Fastly
About You
We're looking for someone who has proven experience maintaining large production-level Ruby on Rails applications. Ideally, you have experience:
- Running infrastructure on Linux, Kubernetes, and Google Cloud/AWS/Azure
- Managing databases like PostgreSQL and Redis
- Supporting high-traffic, consumer-facing applications
Bonus points if you have experience with:
- ScyllaDB and Elasticsearch
- Ruby on Rails and/or Rust
- Managing CDN and WAF setups
Additionally, since we're 100% remote, we:
- Highly value strong written communicators
- Require three hours overlap any time between 10 AM and 6 PM CET
About Us
FetLife is the largest kinky social network on this side of the Milky Way. We:
- Have over 10 million members and growing
- Grew 100% by word-of-mouth
- Serve over 3 billion requests a day
You can find our team and core values at: https://fetlife.com/team.
Pay & Benefits
We use a standardised salary calculator for each position to ensure we are competitive, fair, and consistent. For this specific role, the rate is between $115k - $180k USD / year. Additionally, we offer:
- Paid time off: 2 weeks vacation, 5 statutory holidays (e.g. Easter & Thanksgiving), 2 weeks during Christmas
- 4-day workweeks during the summer months (July & August)
- Annual company retreat (e.g. Malaga, Miami, Vancouver, and Montreal)
- Annual anniversary gift ($200 USD for every year with us, e.g. 4th anniversary is $800)
- Monthly streaming music subscription reimbursement
- Fully paid maternity and paternity leave
How to Apply
Send an email to jointheteam+do+builtin@fetlife.com. In the email, please include:
- Brief introduction of yourself
- Tell us about 2-3 of your favourite projects you worked on
- Link to your GitHub or GitLab account
- Link to your LinkedIn profile -and/or- a copy of your resume in PDF format
If you have any questions or concerns, please don't hesitate to email us at jointheteam+do+builtin@fetlife.com!
Hiring Process
We review every application carefully and if we believe that you might be a good fit for this role, we'll get back to you within 1-2 weeks of you applying. The interview process includes an initial screening interview, technical interview, take home paid project, and a project presentation call.
Site Reliability Engineer (Senior or Staff), Fabric
MongoDB
MongoDB, originally called 10gen, is a software development company. Since 2007, MongoDB has created an open-source, document-oriented database to help clients
The Team
Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization. Among these are our multi-cloud-provider Kubernetes infrastructure, deployment machinery, and observability and alerting systems.
The Fabric team manages the infrastructure that enables secure communication between systems and from the public internet. Their responsibilities encompass network architecture, service mesh, and edge load balancing, ensuring customer data remains safe in transit. The team plays a crucial role in developing and maintaining the reliable and globally connected multi-cloud network that supports MongoDB products.
This role can be based out of our Austin, Boston, Los Angeles, New York City, Raleigh, or San Francisco offices or remotely in the United States region.
Role Overview
We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to join the Fabric team. This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services. As an SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and reliable.
The ideal candidate should
- Have 6+ years of experience working on software and operating distributed systems, with deep expertise in networking fundamentals and a good understanding of how the internet works, e.g. TCP/IP (including IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles
- Possess a customer-focused mindset, driving improvements that benefit end-users
- Value efficiency in processes and operations, and display a strong preference for automation over manual processes (“allergic to ops work”)
- Be intimately familiar with modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP, e.g. VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs
- Have a strong knowledge of service mesh and load-balancing concepts, and be eager to implement these in a multi-cloud environment
Expectations
- Participate in the development of a reliable and resilient multi-cloud globally-connected network that is crucial for MongoDB’s services
- Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
- Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability
About MongoDB
MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform, the most widely available, globally distributed database on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Our cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available across AWS, Google Cloud, and Microsoft Azure. With offices worldwide and nearly 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we’re powering the next era of software.
Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It’s what makes us MongoDB. To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
REQ ID: 2263172127
MongoDB’s base salary range for this role is posted below. Compensation at the time of offer is unique to each candidate and based on a variety of factors such as skill set, experience, qualifications, and work location. Salary is one part of MongoDB’s total compensation and benefits package. Other benefits for eligible employees may include: equity, participation in the employee stock purchase program, flexible paid time off, 20 weeks fully-paid gender-neutral parental leave, fertility and adoption assistance, 401(k) plan, mental health counseling, access to transgender-inclusive health insurance coverage, and health benefits offerings. Please note, the base salary range listed below and the benefits in this paragraph are only applicable to U.S.-based candidates.
MongoDB’s base salary range for this role in the U.S. is: $127,000 - $249,000 USD
As a DevOps leader, you will be responsible for:
- Building and optimizing CI/CD pipelines
- Strengthening cloud infrastructure
- Ensuring seamless collaboration between development and operations teams
- Defining DevOps best practices
- Improving deployment velocity
- Enhancing system reliability and reducing operational overhead through automation
- Troubleshooting production issues and leading root cause analysis efforts
- Continuously improving DevOps processes, tools, and governance standards
Site Reliability Engineer
Coalition
Coalition is a cybersecurity company dedicated to partnering with clients to help them prevent and mitigate losses. Coalition helps small and medium-sized businesses around the wor
We are looking for a Site Reliability Engineer to join our Platform SRE team. In this role, you will build and operate the infrastructure, tools, and "paved roads" that empower our developers to deliver scalable, secure, and reliable software with speed and confidence. You’ll work across the entire stack, from infrastructure automation and observability to developer enablement and system reliability.
You will be a key collaborator with software engineering and security teams, helping to evolve our Infrastructure as Code (IaC), enhance CI/CD pipelines, and scale our internal developer platform. We value pragmatism and engineering excellence, primarily using Python, Go, and AWS to reduce toil and build self-service capabilities.
- Infrastructure Automation: Design, build, and scale production environments using AWS and Terraform.
- System Reliability: Improve the resilience and operability of our platform through failure-based testing and automated recovery strategies.
- Developer Enablement: Design and implement reusable platform components and self-service tools to streamline the developer experience.
- Observability: Implement and maintain robust observability practices, including system metrics, distributed tracing, and SLO management.
- Mentorship & Standards: Guide junior engineers, uphold high infrastructure quality, and contribute to the team’s evolving best practices.
- Collaboration: Participate in technical design discussions, sharing feedback and adapting strategies based on team input and evolving requirements.



