Webflow

Webflow is the way to design, build, and launch powerful websites visually — without coding.

Senior Site Reliability Engineer, Observability

DevOps Engineer · Full Time · Remote · Senior · Team 501-1,000 · Since 2013 · H1B Sponsor

Location

Argentina

Posted

26 days ago

Salary

0

Seniority

Senior

Job Description

  • Improve reliability and stability of Webflow’s customer-facing, production infrastructure.
  • Ensure platform security and scalability for users as projects are launched.
  • Help define and implement observability practices, enabling engineers to confidently ship and operate services in production.
  • Build and maintain AI-powered agents and automation that help engineers surface insights faster, reduce alert fatigue, and accelerate incident resolution.
  • Participate in and improve on-call and incident response processes.

Job Requirements

  • BS / BA college degree or relevant experience.
  • Business-level fluency to read, write and speak in English.
  • 5+ years of experience building, maintaining, and debugging distributed systems in a customer-facing environment that allows for little to no downtime.
  • Hands-on experience with observability platforms and tooling such as Datadog, Grafana, Prometheus, ElasticSearch or similar.
  • Experience with OpenTelemetry or similar instrumentation frameworks for collecting metrics, traces, profiles and logs across distributed services.
  • Experience defining and operationalizing SLOs/SLIs at scale.
  • Experience navigating and scaling multi-tier cloud environments on either AWS or GCP.
  • Experience with container-centric architectures built with tools like Docker and Kubernetes (EKS, GKE, AKS, etc.), or ECS.
  • Experience with infrastructure-as-code tools like Terraform or Pulumi.
  • Experience contributing to full-stack applications built using software like React, Node.js, and MongoDB or PostgreSQL.

Benefits

  • Ownership in what you help build.
  • Health coverage that actually covers you.
  • Support for every stage of family life.
  • Time off that’s actually off.
  • Wellness for the whole you.
  • Invest in your future.
  • Monthly stipends that flex with your life.
  • Bonus for building together.

More DevOps Engineer Jobs

DevOps Engineer

Level Access

A leading provider of digital accessibility solutions, Level Access endeavors to create a world in which individuals with disabilities can readily access digital systems. Founded b

DevOps Engineer · 26 days ago

  • Maintain and improve a multi-cloud infrastructure that supports the company’s accessibility solutions.
  • Implement and manage scalable, reliable, and secure infrastructure on AWS, with an awareness of cost optimization best practices.
  • Support Kubernetes clusters, including deploying applications and managing configurations.
  • Build, maintain, and enhance CI/CD pipelines using best practices to improve reliability and speed of software delivery.
  • Automate infrastructure provisioning and configuration management with tools such as Terraform and Ansible.
  • Collaborate with development teams to ensure deployment and architectural best practices.
  • Implement observability and monitoring solutions using tools like Datadog and CloudWatch to ensure system reliability.
  • Apply GitOps workflows using tools like ArgoCD or FluxCD for deployment automation.
  • Contribute to infrastructure and process documentation for consistency and knowledge sharing.
  • Assist with compliance efforts including security and frameworks like ISO 27001 and SOC 2.
  • Participate in incident response and on-call rotations as required.

United States
Job Closed

Salesforce Release Engineer

Volvo Cars

For a better future. We want to provide you with the freedom to move in a personal, sustainable, and safe way.

DevOps Engineer · 26 days ago
Full Time · Remote · Team 10,001+ · Since 1927 · H1B No Sponsor

  • Manage end-to-end Salesforce release lifecycle across Dev, QA, UAT, and Production environments.
  • Configure and maintain CI/CD pipelines using Gearset.
  • Perform metadata comparisons, validations, and deployments using Gearset.
  • Troubleshoot deployment failures and resolve metadata dependencies, test failures, and conflicts.
  • Integrate and manage source control using GitHub (branching, pull requests, merges).
  • Collaborate with developers, QA, and business stakeholders to coordinate releases.
  • Maintain deployment best practices, governance, and audit compliance.
  • Monitor deployments, backups, and rollback strategies using Gearset.
  • Drive improvements in release automation, quality, and speed.

India

Senior DevOps Engineer – AWS, Azure

Jalasoft

We provide the best software engineering solutions by investing in our people first.

DevOps Engineer · 26 days ago
Full Time · Remote · Team 1,001-5,000 · Since 2003 · H1B No Sponsor

  • Design, implement, and manage scalable cloud infrastructure.
  • Automate deployment processes and ensure system reliability and security.
  • Collaborate with development and operations teams to streamline CI/CD pipelines and enhance operational efficiency.

Colombia

Staff SRE Engineer

Stellar Cyber

Empowering lean security operations teams of any skill to successfully secure their environments. WE ARE HIRING!

DevOps Engineer · 26 days ago
Full Time · Remote · Team 51-200 · H1B Sponsor

Role Description

We are seeking a highly skilled Staff Site Reliability Engineer (SRE) to join our team and drive reliability, scalability, and efficiency across our production systems. The ideal candidate will have deep expertise in cloud infrastructure, Kubernetes administration, observability, and incident management, with a proven track record of building and maintaining highly available and resilient platforms. As a senior member of the SRE team, you will not only operate complex distributed systems but also influence architecture, tooling, and best practices to ensure operational excellence.

  • Administer and maintain container orchestration platforms and containerized workloads.
  • Monitor and troubleshoot production systems, participating in on-call rotations to ensure reliability.
  • Drive observability improvements by enhancing monitoring, logging, and alerting capabilities across systems and data platforms.
  • Administer and optimize cloud-based environments across multiple providers.
  • Manage and support distributed data platforms and real-time processing systems.
  • Develop and maintain continuous integration and delivery pipelines for efficient and reliable deployments.
  • Own and implement Infrastructure as Code (IaC) practices to ensure consistency and scalability.
  • Automate and orchestrate infrastructure using programming and scripting languages.
  • Perform system administration and networking tasks to support internal and external environments.
  • Collaborate effectively with engineers and stakeholders across different time zones.

Qualifications

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
  • Proven success leading large-scale production systems in cloud environments (AWS, GCP, Azure, or OCI).
  • Demonstrated leadership in driving incident response, on-call best practices, and a reliability-focused culture.
  • Strong experience with production on-call operations and incident management.
  • Advanced proficiency in Kubernetes administration and troubleshooting.
  • Hands-on experience with observability tools: Prometheus, Grafana, Loki, and Alertmanager.
  • Knowledge of chat-based operations interfaces and/or auto-remediation controllers built on AI agentic frameworks.
  • Understanding of AI agents that auto-triage alerts, correlate signals, and suggest root-cause hypotheses.
  • Expertise in operating data platforms (Elasticsearch, MongoDB, Spark, Kafka, Redis).
  • Proficiency with public cloud services (AWS, Azure, GCP, or OCI).
  • Strong programming and automation skills in Python and Bash.
  • Deep understanding of Infrastructure as Code (Terraform, Helm).
  • Experience with CI/CD pipelines (GitHub Actions, Bitbucket, ArgoCD).
  • Strong technical background in distributed systems, databases, networking, and Linux administration.
  • Excellent problem-solving, communication, and leadership abilities.
  • Bachelor's degree in Computer Science, Engineering, or a related technical field.
  • Certifications in AWS, GCP, Observability, Linux or Kubernetes are a plus.

Hungary