Dropbox is the one place to keep life organized and keep work moving.
Site Reliability Engineer
Location
Mexico
Posted
130 days ago
Salary
0
Seniority
Senior
Job Description
Site Reliability Engineer
Dropbox
- Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services
- Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response
- Build, implement, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions, as well as custom code platforms
- Utilize container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale
- Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream
- Drive improvement projects related to service health and visibility for our stakeholders, ranging from developers to business service owners to C-level executives
- Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages
Job Requirements
- 5+ years of experience in site reliability engineering or a similar engineering role with hands-on coding experience
- Strong knowledge of AWS services, including EC2, S3, RDS, R53, Lambda, and others
- Strong knowledge of Linux administration, internals, filesystems, and volume management, of specific distros such as Ubuntu and RHEL, and of DNS and DHCP
- Experience with monitoring and logging tools such as Datadog, and with logging pipeline tools such as Vector or Cribl LogStream
- Experience driving one or more transformational programs related to metrics and observability
- Experience with scripting in a higher-level language (Python preferred)
- Experience developing automation to solve infrastructure-related tasks with tools such as Chef/Ansible/Terraform
- Experience with log analysis and building metrics, alerts and visuals from log data
- Strong proficiency in infrastructure-as-code tools, such as Terraform
- Strong proficiency in configuration management tools, specifically Ansible Automation Platform and Chef
- Experience with containerization technologies, such as Docker, and container orchestration platforms like Kubernetes or Amazon ECS
- Knowledge of LDAP, REST APIs, and current authentication methods
- Familiarity with GitHub and Git-based workflows
- Understanding of RDS databases and network security technologies, such as WAF
- Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment
- Excellent written and verbal communication skills
Benefits
- Competitive salary
- Flexible work hours
- Professional development budget
- Home office setup allowance
- Global team events




