Sigma Software Group

We support enterprises, product houses, and startups with custom software solutions development and IT consulting.

Principal Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Lead | Team 1,001-5,000 | Since 2002 | H1B No Sponsor

Location

Brazil

Posted

108 days ago

Seniority

Lead

Bachelor Degree | 8 yrs exp | English | AWS Python Terraform

Job Description

• Define and lead infrastructure and reliability strategy across the platform
• Design scalable, resilient systems in collaboration with engineering teams
• Optimize build, testing, and deployment processes for speed and stability
• Establish and uphold best practices for CI/CD, monitoring, and observability
• Lead incident response and drive continuous improvement post‑incident
• Automate workflows to reduce operational toil and risk
• Mentor engineers and foster a culture of operational excellence
• Make strategic build‑vs‑buy decisions balancing speed, quality, and sustainability

Job Requirements

At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
Proven experience in infrastructure modernization and scaling initiatives for high‑growth environments
Strong proficiency in Python
Deep expertise in cloud platforms and container orchestration tools such as AWS ECS and EKS
Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
Proficiency in infrastructure‑as‑code tools such as Terraform
Strong knowledge of monitoring, observability, and performance optimization practices
Upper-Intermediate level of spoken and written English

Benefits

Health insurance
Retirement plans
Paid time off
Flexible work arrangements
Professional development

More DevOps Engineer Jobs

DevSecOps Engineer

Valer

One Platform, Built Around You™.

DevOps Engineer | 108 days ago

Other | Remote | Team 1-10 | H1B No Sponsor

At Voluware, we strive to revolutionize healthcare through technology, making a tangible impact on the lives of providers and patients alike. Our flagship product, VALER®, is a cloud-based application designed to automate health care revenue operations and enhance connectivity between health insurers and providers. We are on the lookout for a skilled DevSecOps Engineer to fortify our infrastructure and security posture, ensuring our applications remain reliable and secure. Joining our innovative team means engaging with challenging problems, collaborating with other unconventional engineers, and working directly with users to continuously improve the software that is integral to their success.

California

Job Closed

Data Platform Reliability Engineer, Postgres

Supabase

Build in a weekend. Scale to millions.

DevOps Engineer | 108 days ago

Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

• Manage the lifecycle of Postgres databases: platform RDS clusters and customer project databases
• Design and execute strategies for low-downtime major version upgrades and database migrations
• Proactively identify and resolve database performance issues before they impact users
• Build and maintain comprehensive monitoring, alerting, and observability for database systems
• Write detailed run books, technical documentation, and operational guides
• Identify reliability risks and implement preventative measures
• Participate in on-call rotation to support our global platform
• Work with development teams to optimize database schema and query patterns
• Analyze and optimize slow queries, connection pooling, and resource utilization
• Tune Postgres configurations for different workload patterns
• Monitor and address database bloat, vacuum strategies, and WAL management
• Partner with platform engineers, product teams, and SREs to deliver reliable database services
• Communicate database changes and maintenance windows clearly to stakeholders
• Share knowledge and mentor team members on Postgres best practices

AWS Azure GCP PostgreSQL SQL TypeScript

Worldwide

Job Closed

DevOps Engineer

Jetbro

Hiring now

DevOps Engineer | 108 days ago

Contract | Remote | Team 11-50 | Since 2011 | H1B No Sponsor

• Audit Prometheus scrape targets, exporters, and metric endpoints
• Review Grafana dashboards, alert rules, and data sources
• Assess log coverage across Kibana and Loki
• Map monitoring coverage across application, infrastructure, database, ingress, and platform layers
• Identify missing exporters, stale dashboards, broken panels, and alert gaps
• Analyze historical metrics to establish performance baselines
• Define SLOs, KPIs, warning thresholds, and breach thresholds
• Suggest Prometheus alert rules and Alertmanager routing strategies
• Implement KPI and SLO alerts within Grafana alert management
• Evaluate Kubernetes cluster topology and infrastructure usage patterns
• Recommend architecture optimizations based on observed load and behavior
• Document findings in structured audit and advisory reports
• Participate in weekly syncs and structured handover sessions

Grafana Kubernetes Prometheus

India

Job Closed

DevOps Manager

BlastPoint

A.I.-driven customer intelligence tools that give companies the power to discover & engage the humans in their data.

DevOps Engineer | 108 days ago

Other | Remote | Team 11-50 | H1B No Sponsor

• Ensure high availability, fault tolerance, and scalability of cloud services
• Optimize performance and cost efficiency across AWS environments
• Lead and mentor a small team of DevOps engineers, fostering a culture of innovation, collaboration, and accountability
• Balance hands-on contributions with strategic leadership, leading by example to ensure smooth execution of DevOps initiatives
• Design, deploy, and maintain BlastPoint’s AWS-based infrastructure using Terraform
• Own the SOC 2 certification and compliance monitoring process
• Implement security best practices, including IAM policies, encryption, vulnerability management, and incident response
• Enhance and maintain CI/CD pipelines using GitHub Actions to improve developer productivity and deployment speed
• Collaborate with software engineers to streamline build, testing, and release processes
• Implement observability, logging, and monitoring solutions to proactively detect and resolve issues
• Establish best practices for disaster recovery, data backup, and infrastructure resilience

AWS Amazon EC2 MapReduce Terraform

United States

$140K - $170K / year

Job Closed
