InvestorFlow

InvestorFlow is a leading provider of integrated CRM and portals for asset and investment managers.

Site Reliability Engineer II

DevOps Engineer · Full Time · Remote · Senior · Team 51-200 · H1B No Sponsor

Location

Dominican Republic

Posted

98 days ago

Seniority

Senior

Bachelor Degree · 5 yrs exp · English · Azure, Grafana, Prometheus, Terraform

Job Description

• Design and implement comprehensive monitoring strategies rather than own observability platforms outright.
• Collaborate with DevOps and Engineering on shared observability platforms (Grafana, Prometheus/Loki, Azure Monitor/Application Insights).
• Define golden-signal dashboards, measure SLOs/SLIs/error budgets, and help implement actionable alerting.
• Drive structured logging standards, distributed tracing patterns, and OpenTelemetry implementation standards for teams to deploy and SRE to validate.
• Monitor and audit production systems to ensure instrumentation completeness.
• Take ownership of production incident response, lead incident handling, and drive remediation.
• Conduct blameless post-incident reviews and ensure follow-through on action items.
• Continuously improve operational processes, reliability practices, and team readiness.
• Monitor system resource utilization and forecast future needs.
• Tune autoscaling configurations in partnership with Engineering teams.
• Evaluate capacity efficiency and support cost optimization strategies.
• Validate DR environments and test failover processes (rather than build them).
• Ensure DR capabilities function as designed, with clear documentation.
• Define and lead regular DR drills in partnership with Engineering/Platform teams.
• Work with the Non-Functional Testing team on resilience and DR scenario simulations.
• Support chaos experiment planning and validation (a nice-to-have capability).

Job Requirements

5+ years in Site Reliability Engineering, Production Engineering, or related operations roles.
Strong knowledge of cloud-native systems, preferably Microsoft Azure.
Experience with observability tooling (Grafana ecosystem, Prometheus/Loki, Azure Monitor, Application Insights).
Understanding of DR concepts, failover validation, and operational readiness.
Familiarity with chaos engineering practices (nice-to-have).
Ability to read Terraform/HCL is a plus but not required.
Strong grasp of SRE principles (SLOs/SLIs, error budgets, toil reduction, postmortems).
Strong collaboration and communication skills.
Mindset We Value
Treat observability as a foundational product feature, not an afterthought.
Proactively break systems to strengthen them.
Automate away repetitive pain and convert incidents into lasting defenses.
Clearly articulate complex risks, trade-offs, and recovery approaches to both technical and non-technical stakeholders.
Remain composed during incidents while relentlessly focused on prevention.

Related Categories

DevOps Engineer

More DevOps Engineer Jobs

DevOps Engineer

S&P Global

DevOps Engineer · 98 days ago

Full Time · Remote · Team 10,001+ · Since 1860 · H1B No Sponsor

• Creating infrastructure and environments to support our platforms and applications using Terraform and related technologies, ensuring all our environments are controlled and consistent.
• Implementing DevOps technologies and processes, e.g., containerisation, CI/CD, infrastructure as code, metrics, and monitoring.
• Automating always.
• Supporting, monitoring, maintaining and improving our infrastructure and the live running of our applications.
• Maintaining the health of cloud accounts for security, cost and best practices.
• Providing assistance to other functional areas such as development, test and client services.

AWS Chef Cloud DynamoDB EC2 Java JavaScript Linux MySQL NoSQL PHP PostgreSQL Puppet Python SQL Terraform Unix

Canada

Senior Site Reliability Engineer – SRE

ZORA

Imagine. Mint. Enjoy.

DevOps Engineer · 98 days ago

Full Time · Remote · Team 11-50 · Since 2020 · H1B Sponsor

• Design, build, and deliver software to enhance the availability, scalability, latency, and efficiency of Zora’s infrastructure platform.
• Provide technical and strategic input to shape the direction of the infrastructure platform.
• Operate and maintain core infrastructure systems in service of enhancing the developer experience.
• Automate key infrastructure workflows, including service lifecycle management and critical operational processes.
• Participate in the team’s on-call rotation and respond to production incidents as needed.

Docker IPFS Kubernetes MongoDB PostgreSQL Python

Worldwide

$170K - $215K / year

Job Closed

Software Architect, Reliability Engineering

Twilio

Build the future of communications.

DevOps Engineer · 99 days ago

Other · Remote · Team 5,001-10,000 · H1B Sponsor

• Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
• Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
• Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
• Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
• Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
• Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
• Establish and champion reliability practices and drive systemic improvements.
• Mentor and grow engineers and technical leaders.
• Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

AWS Distributed Systems Grafana Java Kubernetes Microservices Prometheus Python Terraform

California + 9 more

$227.8K - $335K / year

DevOps Engineer

The VOID

The VOID is the most immersive virtual reality experience ever.

DevOps Engineer · 99 days ago

Contract · Remote · Team 51-200 · H1B No Sponsor

• Standardise and automate artefact generation across multiple platforms.
• Develop, manage, and continuously improve end-to-end release processes.
• Optimise source control workflows and CI/CD pipelines.
• Manage and assist in microservice development and deployment life cycles.
• Maintain and improve build systems and infrastructure reliability.
• Implement and manage configuration management solutions.
• Apply and enforce basic security best practices across pipelines and infrastructure.
• Debug, troubleshoot, and resolve pipeline and infrastructure issues efficiently.
• Collaborate cross-functionally with engineering, QA, and production teams.
• Document processes and contribute to operational best practices.

Microservices

Arkansas + 29 more

$22+ / hour

Job Closed