IPSY is the beauty industry’s most powerful platform, uniting brands, creators, and hyper-engaged consumers with unprecedented access to each other through the ultimate beauty membership. We're proud to be a remote-first company. Our fully remote team members have the chance to live and work where they want, because we believe work should fit into your life—not the other way around.

Senior Site Reliability Engineer

Engineer | Contract | Remote | Senior | Team 201-500

Location

Argentina

Posted

22 hours ago

Seniority

Senior

Observability/Monitoring Datadog AI Slack AWS Netlify Auth0 Contentful CI/CD Microservices Python Shell Terraform

Job Description

Role Description

We are looking for a Senior Site Reliability Engineer Contractor to join our Foundation / SREIQ team and take a leadership role in keeping IPSY and BoxyCharm fast, available, and resilient for millions of beauty community members. As a Senior SRE, you will be a primary driver of how we detect, respond to, and learn from incidents, partnering closely with Engineering, Product, and the broader Tech org to protect the critical flows our members rely on every day.

This is a senior, hands-on reliability role. You will own observability strategy and alerting in Datadog, lead our on-call rotation, act as primary SRE Partner and war room lead during major incidents, drive root cause analyses end to end, mentor peers, and set the direction for automation and tooling across the team. We are an automation-forward, AI-embracing team: we expect you to model best practices and push the frontier of what modern AI tools can do for reliability.

The Senior SRE Contractor will report to the Infrastructure Engineering Manager and will be fully remote from Argentina.

What You'll Be Doing

- Own and evolve observability strategy across our platform in Datadog: dashboards, monitors, APM, log pipelines, and meaningful, low-noise alerting tied to user impact.
- Define, track, and drive SLIs, SLOs, and error budgets for critical services, and use them to lead reliability and prioritization conversations with service owners and leadership.
- Act as primary SRE Partner on major incidents: lead war rooms, triage, classify priority (P1-P6), validate hypotheses, drive resolution, and keep clear real-time documentation throughout.
- Drive incident response per our framework, meeting target response times (P1 within ~15 minutes), creating and running war rooms when severity requires, and maintaining structured communications across incident Slack channels.
- Lead and own blameless post-incident reviews (RCAs) end to end, identifying root causes, systemic fixes, and action items that prevent recurrence.
- Set the automation roadmap: architect scripts, tooling, and self-healing / automated remediation to reduce manual operational work and speed up recovery across the team.
- Leverage and champion AI tools (e.g., Claude, Cursor) to accelerate debugging, generate and maintain runbooks, draft RCAs, and build automation, helping the broader team do the same.
- Set reliability strategy for our cloud and third-party stack (e.g., AWS, Netlify, CommerceTools, Auth0, Contentful), including capacity, performance, and dependency-failure readiness.
- Lead contributions to CI/CD reliability, deployment safety, code-freeze readiness around peak commerce events, and infrastructure-as-code practices.
- Own and evolve SRE runbooks, triage workflows, and the incident priority framework so the whole org responds consistently.
- Mentor mid-level SREs, drive a culture of blameless learning, and embed reliability earlier in the development lifecycle.
- Partner with Engineering and Product leadership to influence architecture and design decisions that improve reliability at the source.

Qualifications

- A great attitude, strong ownership, and a genuine eagerness to drive change, mentor others, and raise the bar.
- Deep hunger for automation: you instinctively look for systemic ways to remove toil and build tooling that scales the whole team.
- Strong comfort with AI tools and a desire to push how they can be applied to reliability work, and to help the team do the same.
- Significant hands-on experience with observability and monitoring tooling, ideally Datadog (dashboards, monitors, APM, logs); deep experience with other platforms also translates well.
- Extensive experience in on-call and incident response, including leading war rooms, driving prioritization, managing escalation, and owning post-incident reviews end to end.
- Deep mastery of SRE fundamentals: SLIs / SLOs / error budgets, reducing toil, and driving reliability strategy across a platform.
- Strong knowledge of cloud infrastructure (AWS preferred) and modern distributed / microservice and API-gateway architectures.
- Strong scripting and automation skills (e.g., Python, Bash, or similar); ability to reason about and improve code across services.
- Solid experience with CI/CD pipelines and infrastructure-as-code (e.g., Terraform).
- Excellent communication: able to run an incident channel, brief leadership during a major outage, and write a polished, actionable RCA.
- Collaborative leader who works effectively across a distributed, multi-time-zone team and influences without authority.

Bonus If You Have

- Experience operating high-traffic consumer or e-commerce platforms, especially through peak events (flash sales, product drops).
- Experience with Opsgenie (or PagerDuty / similar) for alerting and escalation.
- Experience with e-commerce / subscription platforms such as CommerceTools, and identity providers such as Auth0.
- Experience building automated remediation, chaos / resilience testing, or self-healing systems.
- Experience with product analytics and monitoring (e.g., Amplitude) to connect reliability to user behavior.
- Experience defining or maturing an incident management framework and driving a blameless post-mortem culture.

What We Offer

- Competitive salary (USD)
- Paid time off and work-from-home flexibility

EEO Statement

We celebrate diversity and are an equal-opportunity employer. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or any other protected characteristic. If you need reasonable accommodation in the application or employment process, please contact us.

IPSY is based in the US. To ensure compliance with country-specific employment laws, we partner with DEEL, which assists us with employee recruiting and payroll. Please submit a resume/CV in English.

