Job Closed
This listing is no longer active.
Senior Platform SRE – Platform Operations
Location: Peru
Posted: 113 days ago
Salary: 0
Seniority: Senior
Job Description
TD SYNNEX
• Ensure the reliability, operability, and continuous improvement of TD SYNNEX enterprise platforms across hybrid cloud and on‑prem environments.
• Take an engineering‑driven approach to operations, focused on automation, Infrastructure‑as‑Code (IaC), observability, and toil reduction.
• Serve as the L3 escalation point for complex incidents, and continuously improve the platform's run posture and its readiness for L1/L2 execution.
• Own the L3 reliability posture: define SLOs/KPIs, lead operability gates and production readiness, and maintain runbooks/SOPs.
• Design and build operational automation (health checks, remediation workflows); develop Terraform/Ansible configurations; script in Python (preferred), PowerShell, and/or Bash; and integrate with ITSM for auditable self‑service and controlled remediation.
• Lead diagnosis, stabilization, and recovery for major incidents; drive problem management, root cause analysis (RCA), and preventive actions; and reduce MTTR/MTTD through better signals, runbooks, and automation.
• Define actionable signals, alert quality, dashboards, and logging; tune alerting to reduce noise; and run data‑driven operational reviews.
• Advance predictive and proactive operations (anomaly detection, trend and capacity analysis); support Python‑based analytics and ML/DL where applicable; and industrialize operational intelligence safely.
• Equip the provider with clear runbooks, training, standard changes, and escalation criteria; govern provider performance and ITSM alignment; and drive continuous improvement.
• Partner with Platform Engineering to ensure capabilities are operable by design; feed operational insights into the roadmap; mentor peers; and promote an engineering‑led operations culture.
Job Requirements
- 5+ years in platform/SRE/operations/platform engineering with production ownership in large‑scale environments.
- Hands‑on hybrid operations (cloud + on‑prem) with strong enterprise cloud fundamentals (compute, networking, storage, identity).
- Production IaC and automation (Terraform, Ansible); scripting with Python/PowerShell/Bash (Python strongly preferred).
- Proven L3 incident troubleshooting and major incident leadership.
- Strong infrastructure fundamentals: networking (including DNS/DHCP concepts), virtualization, storage, Windows Server and/or Linux.
- ITSM experience (incident, problem, change) and ticket‑based operations.
- Azure platform knowledge.
Benefits
- Grow Your Career: Accelerate your path to success (and keep up with the future) with formal programs on leadership and professional development, and many more on-demand courses.
- Elevate Your Personal Well-Being: Boost your financial, physical, and mental well-being through seminars, events, and our global Life Empowerment Assistance Program.
- Diversity, Equity & Inclusion: It’s not just a phrase to us; valuing every voice is how we succeed. Join us in celebrating our global diversity through inclusive education, meaningful peer-to-peer conversations, and equitable growth and development opportunities.
- Make the Most of Our Global Organization: Network with other new co-workers within your first 30 days through our onboarding program.
- Connect with Your Community: Participate in internal, peer-led inclusive communities and activities, including business resource groups, local volunteering events, and more environmental and social initiatives.