Job Closed

This listing is no longer active.

The Leaflet

An independent platform for cutting-edge, progressive, legal, and political opinion.

Senior Site Reliability Engineer

DevOps Engineer · Other · Remote · Senior · Team 11-50 · H1B No Sponsor

Location

Florida

Posted

166 days ago

Seniority

Senior

Bachelor's Degree · 5 yrs exp · English
Ansible AWS Azure GCP Grafana Java Kubernetes Prometheus Python Terraform

Job Description

• Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment.
• Troubleshoot and resolve complex issues in production and non-production environments.
• Participate in both pre- and post-deployment performance testing and monitoring efforts to improve application performance.
• Optimize Java application performance, ensuring efficient resource utilization and scaling.
• Deploy and manage the Grafana stack (Grafana, Prometheus, Loki) to provide real-time monitoring, logging, and alerting.
• Implement and refine observability strategies to enhance application and infrastructure visibility.
• Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance.
• Support the operations team's incident response efforts, conduct post-mortems, and identify root causes of issues to prevent recurrence.
• Document and share lessons learned from incidents, contributing to a culture of continuous improvement.
• Work closely with developers, architects, and other engineers to design and implement solutions that improve application reliability.
• Collaborate closely with DevOps and NOC teams to support the application platform.
• Communicate SRE practices and principles to technical and non-technical stakeholders.
• Provide feedback and insights on application performance, potential improvements, and observability metrics.

Job Requirements

Degree in computer science or a related field, or equivalent work experience
5+ years in SRE, DevOps, or similar infrastructure roles
Experience managing large-scale, high-availability production systems
Track record of incident response and post-mortem processes
Experience with capacity planning and performance optimization
3+ years hands-on experience managing production Kubernetes clusters
Deep understanding of Kubernetes architecture, networking, storage, and security
Experience with cluster scaling (Karpenter), upgrades, and multi-cluster management
Proficiency with kubectl, Helm, and Kubernetes operators
Container orchestration and troubleshooting expertise
Advanced expertise with the Grafana stack for dashboards, alerting, and visualization
Hands-on experience with Grafana Alloy for telemetry data collection
Proficiency in PromQL
Experience with Loki for log aggregation and analysis
Experience building comprehensive monitoring and alerting strategies
Hands-on experience managing Java-based applications in large-scale, distributed environments, with a focus on JVM tuning and application optimization.
Cloud Platform expertise (AWS, GCP, or Azure)
Familiarity with infrastructure as code (IaC) tools such as Terraform/Terragrunt or Ansible
ArgoCD proficiency for GitOps workflows and continuous deployment
Strong scripting abilities in Bash, Python, or Go
Experience with CI/CD pipelines and automation tools
Configuration Management and deployment automation
Strong troubleshooting skills, with a proactive approach to diagnosing and resolving performance bottlenecks.
Proven experience managing on-call rotations, incident response, and root cause analysis.
Ability to mentor junior team members
Strong communication skills (both written and verbal), positive attitude, and ability to receive constructive feedback.

Benefits

Competitive pay and benefits
Flexible vacation allowance
A hybrid / remote working environment
Startup culture backed by a secure, global brand

More DevOps Engineer Jobs

DevOps Engineer

StarCompliance

We are Reputation Guardians, on a mission to make compliance simple and easy.

DevOps Engineer · 166 days ago

Full Time · Remote · Team 201-500 · H1B No Sponsor

• Design, build, and own shared CI/CD processes for cloud-hosted applications, data, and infrastructure
• Assist with the automation and maintenance of cloud-hosted infrastructure via Infrastructure as Code
• Implement platform tooling and automation that supports our tenancy onboarding process
• Support deployment, configuration, and environment management across the software platform
• Collaborate with Development and QA teams to improve build, test, and release workflows
• Create and evangelise automation initiatives that improve consistency in our release processes and reduce manual effort where possible
• Assist with cloud platform reliability and deployment troubleshooting
• Support colleagues in Product Development and SRE, sharing joint responsibility for the successful operation of our software platform
• Maintain documentation for pipelines and infrastructure components

Azure Cloud SQL .NET

United Kingdom

Job Closed

Staff DevOps Engineer – Malware Research

Bitsight

DevOps Engineer · 167 days ago

Full Time · Remote · Team 501-1,000 · Since 2013 · H1B Sponsor

• Design, deploy, and manage scalable, high-volume malware tracking infrastructure
• Develop tools to automate provisioning of security research infrastructure
• Autonomously identify and develop implementation plans for infrastructure improvement opportunities
• Collaborate with engineering and product teams on the design of new production data feeds from threat research capabilities
• Increase infrastructure robustness through thoughtful design of data monitoring and alerting practices
• Provision, configure, and deploy cloud infrastructure resources to support threat research operations
• Document infrastructure capabilities and functionality to support troubleshooting and response to data-related escalations

Ansible AWS DNS GCP Python Terraform

Portugal

Job Closed

Senior Site Reliability Engineer

Adobe

Changing the world through digital experiences.

DevOps Engineer · 167 days ago

Other · Remote · Team 10,001+ · Since 1982 · H1B Sponsor

• Identify, triage, prioritize, and facilitate the mitigation of application availability concerns
• Propose and implement projects for time-saving automation and vital reliability measures
• Partner with cross-functional experts to define customer-centric solutions
• Define and refine operational practices, tooling, and runbooks
• Evolve our CI/CD pipeline and application infrastructure

Linux

California + 3 more

$139K - $257.6K / year

Job Closed

DevOps Engineer – Defense Industry

AFRY

DevOps Engineer · 167 days ago

Full Time · Remote · Team 10,001+ · H1B Sponsor

• Design, implement, and maintain CI/CD pipelines (e.g., Jenkins, GitLab CI).
• Work with containerization and orchestration (Docker, Kubernetes).
• Automate infrastructure with tools such as Ansible, Terraform, or similar.
• Ensure that systems are secure, scalable, and monitored.
• Collaborate closely with development teams to improve delivery flows and reduce time-to-market.
• Take part in troubleshooting, performance analysis, and improvement of existing systems.
• Work in a Linux environment and manage version control with Git.

Ansible Docker Jenkins Kubernetes Linux Terraform

Sweden
