Job Closed
This listing is no longer active.
Unleash the power of your e-commerce
Site Reliability Engineer
Location
Mexico
Posted
59 days ago
Salary
0
Seniority
Senior
Job Description
Site Reliability Engineer
DEUNA
- Design, define, and maintain observability and monitoring for our AWS infrastructure.
- Define and track SLIs, SLOs, and SLAs for critical systems.
- Improve system uptime, latency, and fault tolerance across the platform.
- Provide internal libraries and toolsets to developers for diagnostics and debugging.
- Manage scaling, performance, and resilience efforts related to system reliability.
- Collaborate with technical teams on capacity planning, load testing, and scaling policies.
- Improve production operations by defining and evolving deployment strategies and conducting disaster recovery (DR) testing.
Job Requirements
- Excellent communication and collaboration skills.
- Adaptability to thrive in dynamic, fast-paced environments.
- Strong time management and task prioritization.
- Proficiency in English.
- Expertise with Prometheus, Grafana, OpenTelemetry, AWS CloudWatch, or other observability tools.
- Experience designing dashboards, alerts, and log aggregation pipelines.
- Deep understanding of AWS services: ECS, Lambda, RDS, CodePipeline.
- Strong proficiency in Go programming language.
- Skilled at defining SLIs, SLOs, error budgets, and improving Mean Time to Recovery (MTTR).
- Experience conducting failure drills (e.g., Chaos Monkey, Gremlin) to ensure system resilience.
Benefits
- Vacation and additional PTO 🏝️
- Remote work from anywhere 💻
- Financial support for health insurance, internet, and cell phone line 📱🌐
- We all own DEUNA: we offer stock options 💸
- Learning and development platform 📚
- Multidisciplinary, diverse and dynamic team 🧡
- Growth and career path 🚀



