InvestorFlow

InvestorFlow is a leading provider of integrated CRM and portals for asset and investment managers.

Site Reliability Engineer II

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

Dominican Republic

Posted

98 days ago

Salary

Not listed

Seniority

Senior

Bachelor Degree | 5 yrs exp | English | Azure | Grafana | Prometheus | Terraform

Job Description

  • Design and implement comprehensive monitoring strategies rather than owning observability platforms outright.
  • Collaborate with DevOps and Engineering on shared observability platforms (Grafana, Prometheus/Loki, Azure Monitor/Application Insights).
  • Define golden signals dashboards, measure SLOs/SLIs/error budgets, and help implement actionable alerting.
  • Drive structured logging standards, distributed tracing patterns, and OpenTelemetry implementation standards for teams to deploy and SRE to validate.
  • Conduct monitoring/auditing of production systems to ensure instrumentation completeness.
  • Take ownership of production incident response, lead incident handling, and drive remediation.
  • Conduct blameless post-incident reviews and ensure follow-through on action items.
  • Continuously improve operational processes, reliability practices, and team readiness.
  • Monitor system resource utilization and forecast future needs.
  • Tune autoscaling configurations in partnership with Engineering teams.
  • Evaluate capacity efficiency and support cost optimization strategies.
  • Validate DR environments and test failover processes rather than build them.
  • Ensure DR capabilities are functioning as designed, with clear documentation.
  • Define and lead regular DR drills in partnership with Engineering/Platform teams.
  • Work with the Non-Functional Testing team on resilience and DR scenario simulations.
  • Support chaos experiment planning and validation as a nice-to-have capability.

Job Requirements

  • 5+ years in Site Reliability Engineering, Production Engineering, or related operations roles.
  • Strong knowledge of cloud-native systems, preferably Microsoft Azure.
  • Experience with observability tooling (Grafana ecosystem, Prometheus/Loki, Azure Monitor, Application Insights).
  • Understanding of DR concepts, failover validation, and operational readiness.
  • Familiarity with chaos engineering practices (nice-to-have).
  • Ability to read Terraform/HCL is a plus but not required.
  • Strong grasp of SRE principles (SLOs/SLIs, error budgets, toil reduction, postmortems).
  • Strong collaboration and communication skills.

Mindset We Value

  • Treat observability as a foundational product feature, not an afterthought.
  • Proactively break systems to strengthen them.
  • Automate away repetitive pain and convert incidents into lasting defenses.
  • Clearly articulate complex risks, trade-offs, and recovery approaches to both technical and non-technical stakeholders.
  • Remain composed during incidents while relentlessly focused on prevention.

More DevOps Engineer Jobs

Full Time | Remote | Team 10,001+ | Since 1860 | H1B No Sponsor

  • Creating infrastructure and environments to support our platforms and applications, using Terraform and related technologies to ensure all our environments are controlled and consistent.
  • Implementing DevOps technologies and processes, e.g. containerisation, CI/CD, infrastructure as code, metrics, and monitoring.
  • Automating always.
  • Supporting, monitoring, maintaining and improving our infrastructure and the live running of our applications.
  • Maintaining the health of cloud accounts for security, cost and best practices.
  • Providing assistance to other functional areas such as development, test and client services.

Canada
Full Time | Remote | Team 11-50 | Since 2020 | H1B Sponsor

  • Design, build, and deliver software to enhance the availability, scalability, latency, and efficiency of Zora’s infrastructure platform.
  • Provide technical and strategic input to shape the direction of the infrastructure platform.
  • Operate and maintain core infrastructure systems in service of enhancing the developer experience.
  • Automate key infrastructure workflows, including service lifecycle management and critical operational processes.
  • Participate in the team’s on-call rotation and respond to production incidents as needed.

Worldwide
$170K - $215K / year
Job Closed

Software Architect, Reliability Engineering

Twilio

Build the future of communications.

DevOps Engineer | 99 days ago
Other | Remote | Team 5,001-10,000 | H1B Sponsor

  • Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
  • Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
  • Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
  • Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
  • Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
  • Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
  • Establish and champion reliability practices and drive systemic improvements.
  • Mentor and grow engineers and technical leaders.
  • Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

California + 9 more
All locations: California | Colorado | Illinois | New Jersey | New York | Maryland | Massachusetts | Minnesota | Vermont | Washington
$227.8K - $335K / year

DevOps Engineer

The VOID

The VOID is the most immersive virtual reality experience ever.

DevOps Engineer | 99 days ago
Contract | Remote | Team 51-200 | H1B No Sponsor

  • Standardise and automate artefact generation across multiple platforms.
  • Develop, manage, and continuously improve end-to-end release processes.
  • Optimise source control workflows and CI/CD pipelines.
  • Manage and assist in microservice development and deployment life cycles.
  • Maintain and improve build systems and infrastructure reliability.
  • Implement and manage configuration management solutions.
  • Apply and enforce basic security best practices across pipelines and infrastructure.
  • Debug, troubleshoot, and resolve pipeline and infrastructure issues efficiently.
  • Collaborate cross-functionally with engineering, QA, and production teams.
  • Document processes and contribute to operational best practices.

Arkansas + 29 more
All locations: Arkansas | Arizona | Colorado | Florida | Georgia | Idaho | Indiana | Illinois | IO | Kansas | Maryland | Minnesota | Missouri | North Carolina | New Mexico | Nevada | Ohio | Oklahoma | Oregon | Rhode Island | South Carolina | Tennessee | Texas | Utah | Virginia | Washington | Wisconsin | West Virginia | Wyoming | Ireland
$22+ / hour
Job Closed