Job Closed

This listing is no longer active.

Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated in a single platform bolstered by AI, skill-based routing, and other contact center capabilities.

Eltropy Values:
- Customers are our North Star
- No Fear - Tell the truth
- Team of Owners

Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

Senior Manager, Site Reliability Engineer

DevOps Engineer | Other | Remote | Team 51-200

Location

United States

Posted

104 days ago

Salary

$200,000-$220,000 (Base)

Job Description

About the Role

We are seeking a Senior Manager of Site Reliability Engineering to lead and scale our SRE function, ensuring the reliability, availability, performance, and efficiency of our critical systems. This role blends deep technical expertise with strategic leadership, partnering closely with Engineering, Product, Security, and Infrastructure teams to build resilient, scalable platforms that support business growth. As a Senior Manager of SRE, you will define reliability standards, establish operational excellence, and foster a culture of automation, observability, and continuous improvement.

Key Responsibilities

Leadership & Strategy
- Define and execute the SRE vision, strategy, and roadmap aligned with business objectives
- Build, mentor, and lead a high-performing team of SRE managers and engineers
- Establish best practices for reliability, incident management, change management, and capacity planning
- Serve as a senior technical leader and trusted advisor across the organization

Reliability & Operations
- Own system reliability metrics, including SLIs, SLOs, and error budgets
- Lead major incident response, post-incident reviews, and long-term remediation efforts
- Drive improvements in uptime, latency, scalability, and fault tolerance across

Architecture & Engineering Excellence
- Influence system architecture to improve resilience, scalability, and operability
- Champion automation, Infrastructure as Code, and self-service platforms
- Oversee observability strategy (monitoring, logging, tracing, alerting)
- Ensure systems are designed for high availability, disaster recovery, and business continuity

Collaboration & Governance
- Partner with Product, Platform, Security, and Compliance teams to meet operational and regulatory requirements
- Define operational standards, runbooks, and on-call practices
- Communicate reliability risks, tradeoffs, and performance to executive leadership

Required Qualifications
- 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering
- 3+ years in engineering leadership roles
- Strong background in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes)
- Hands-on experience with CI/CD, Infrastructure as Code (e.g., Terraform, CloudFormation), and automation
- Proven experience defining and operating SLOs, SLIs, and error budgets
- Excellent incident management and root cause analysis skills
- Strong communication skills with the ability to influence technical and non-technical stakeholders

Preferred Qualifications
- Experience supporting large-scale, high-traffic, or mission-critical systems
- Background in software engineering or systems engineering
- Experience scaling SRE practices in a fast-growing organization
- Familiarity with security, compliance, and regulatory requirements
- Bachelor’s or Master’s degree in Computer Science or a related field (or equivalent experience)

Location: Remote

Compensation: $200,000-$220,000 (Base)

About Eltropy (www.eltropy.com)

Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing and chatbot technology, all integrated in a single platform bolstered by AI, skill-based routing and other contact center capabilities.

Eltropy Values:
- Customers are our North Star
- No Fear - Tell the truth
- Team of Owners

Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

More DevOps Engineer Jobs

Site Reliability Engineer

Veriff

Veriff is an industry leader in online identity verification, helping businesses achieve greater levels of trust.

DevOps Engineer | 104 days ago

Full Time | Remote | Team 501-1,000 | Since 2015 | H1B No Sponsor

• Advocating for SRE principles, fostering a culture of reliability, observability, and operational excellence across teams.
• Collaborating with developers, product managers, and service teams to streamline processes and resolve blockers.
• Driving improvements in observability, including logs, traces, and metrics, ensuring proactive issue detection and resolution.
• Defining and implementing SLIs and SLOs to measure and maintain system reliability aligned with business objectives.
• Promoting the use of Error Budgets and leading post-incident reviews to foster a learning culture and continuous improvement.
• Assessing systems and processes to identify opportunities for scalability and reliability enhancements.
• Inspiring teams to take ownership of service reliability and driving alignment on shared goals.

JavaScript Node.js Python

Estonia

Job Closed

Site Reliability Engineer

Sprout Social, Inc.

See social differently.

DevOps Engineer | 104 days ago

Other | Remote | Team 1,001-5,000 | Since 2010 | H1B No Sponsor

• Design and build reliable, scalable, and performant systems that support Sprout’s global 30,000+ customer base across 100+ countries
• Drive infrastructure initiatives that enable product teams to deliver value quickly and safely through shared, production-ready tools and platforms (“Paved Roads”)
• Work to improve Sprout’s security posture through automation, auditability, and clear processes in order to build sustainable and secure solutions
• Collaborate cross-functionally with product, site reliability engineering, data platform, and GRC teams to deliver scalable, secure-by-default infrastructure
• Investigate and learn from system failures and incidents to improve overall system resilience
• Contribute to security tooling deployments and maintenance to improve overall security posture.

Ansible AWS Chef Java Jenkins Linux Python Ruby SaltStack Terraform Unix

United States

$125.6K - $172.7K / year

Job Closed

DevOps Team Lead

Cisive

We are a comprehensive global background screening firm offering onboarding, drug testing, & risk mitigation solutions.

DevOps Engineer | 104 days ago

Other | Remote | Team 1,001-5,000 | Since 1977 | H1B Sponsor

• Lead the design, implementation, and maintenance of CI/CD pipelines using Azure DevOps.
• Own and manage cloud infrastructure in production environments, primarily within Microsoft Azure.
• Design, write, and maintain Infrastructure as Code using Bicep and ARM templates.
• Apply cloud-agnostic DevOps principles and best practices across infrastructure and deployment pipelines.
• Establish DevOps best practices for build, release, deployment, monitoring, and operational excellence.
• Ensure high availability, scalability, security, and performance of systems.
• Collaborate with development teams to streamline deployment workflows and improve developer productivity.
• Monitor and optimize system reliability, performance, and cost efficiency.
• Lead incident response activities, including root cause analysis and continuous improvement processes.
• Mentor, coach, and provide technical guidance to DevOps engineers.
• Document architecture, operational procedures, and technical processes.
• Partner with security teams to ensure compliance and implement secure deployment methodologies.
• Perform other duties as assigned.

AWS Azure Firewalls Jenkins Terraform

Maryland + 4 more

Job Closed

System Administrator

ECS Tech Inc

All candidates must meet the following criteria: Must be a US Citizen, no dual Citizenships. Must be able to secure a Public trust clearance. Must be able to work across multiple programs across the Federal and DOD space. The core values that ECS looks for in an engagement manager include: Teamwork, Respect, Accountability, Integrity, and Leadership.

DevOps Engineer | 104 days ago

Other | Remote | H1B No Sponsor

The Application Administrator for the DISA Storefront platform is responsible for the technical support, maintenance, patching, and optimization of the application system. This role ensures the high availability and efficiency of the Storefront, which serves as a key, Amazon-like single point of entry for DoW service provisioning. The administrator will manage the application across Windows and Linux environments, apply critical Oracle patches, and provide proactive support to users, including on-call support for emergency, after-hours maintenance. Perform routine, timely updates, and patches to application systems, specifically including Windows, Linux, and Oracle databases/middleware (WebLogic, etc.). Monitor system performance and proactively maintain high availability, ensuring the Storefront remains operational with minimal, authorized downtime. Analyze server resource consumption and tune applications/databases to maintain peak efficiency. Respond to customer inquiries regarding system status, functionality, and performance in a professional manner. Provide emergency response if the system is unavailable (on-call rotation). Must be available for occasional nights/weekends for authorized maintenance, security patching, and system upgrades. Apply DISA Security Technical Implementation Guides (STIGs) and Information Assurance Vulnerability Alerts (IAVAs) to maintain system security. Maintain comprehensive technical documentation of configurations, procedures, and system changes.

United States

Job Closed