Senior Associate, Site Reliability Engineering

DevOps Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1833 | H1B Sponsor

Location

United States

Posted

1 day ago

Salary

$125K - $144.6K / year

Seniority

Senior

Job Description

Senior Associate, Site Reliability Engineering

McKesson

Role Description

What We Need:
- Multiple Positions Available
- Position: Sr. Associate, Site Reliability Engineering
- Location: 6555 State Hwy 161, Irving, TX 75039

Job Duties:
- Developing, deploying, and maintaining cloud-based infrastructure and data platforms hosted within AWS.
- Designing and maintaining scalable, secure, and highly available cloud environments that support our production workloads.
- Ensuring the reliability and performance of our Databricks-based data infrastructure, which is central to our business intelligence and data science operations.
- Supporting rapid deployment cycles and maintaining consistency across development, staging, and production environments.
- Diagnosing complex system failures and implementing preventive measures to minimize downtime.
- Managing access controls, encryption, and vulnerability remediation.
- Collaborating with software engineers, data scientists, and IT operations teams.
- 100% telecommuting allowed from anywhere in the U.S.

Qualifications
- Master’s degree, or a foreign equivalent, in Computer Science or a related field of study.
- Two (2) years of experience in an SRE or DevOps role on any cloud platform, in the job offered or a related occupation.

Requirements
- Experience must include two (2) years in the following skills:
  - Amazon Web Services (AWS), including EC2, S3, Lambda, CloudFormation, IAM, DMS, RDS Proxy, Event Bus, Athena, State Machines, API Gateway, and DynamoDB.
  - Databricks for managing large-scale data pipelines, real-time analytics, and machine learning workflows.
  - Infrastructure automation using Terraform, Ansible, GitLab, and CI/CD pipelines.
  - Incident management practices and high-availability system design to ensure 24/7 uptime of mission-critical systems.
  - Security best practices and compliance standards including SOC 2 and ISO 27001.
  - Linux system administration.
  - Programming in Python, Ruby, or Bash.
  - Cloud resources and concepts such as networking, load balancing, DNS, and security.
  - Identifying performance bottlenecks and anomalous system behavior, and resolving the root cause of service issues.
  - Deploying and maintaining Docker applications and container orchestration environments in production using ECS or EKS.
  - Working with relational databases MySQL and PostgreSQL.
  - Infrastructure monitoring using Datadog, including setting up synthetic monitoring, on-call alerts, and pager alerts.
  - Participation in Incident Management teams.
  - Leading and managing technical projects, including costing and time management projections.
  - Identity management within Okta and Azure AD.
- Experience must include one (1) year in the following skills:
  - Salesforce technical support and knowledge.
  - GoAnywhere MFT.
  - Tableau.
  - PHP.

Benefits
- Offered Wage: $125,000 – $144,600/year
- Competitive compensation package as part of Total Rewards.
- Additional compensation may include an annual bonus or long-term incentive opportunities.

Contact

To apply, please send resumes to JobPostings@McKesson.com. Reference #: 002121.

More DevOps Engineer Jobs

Junior DevSecOps Engineer

Multiplica Talent

We connect extraordinary talent with forward thinking companies.

Full Time | Remote | Team 201-500 | Since 2003 | H1B No Sponsor

• Design, implement, and optimize continuous integration and continuous deployment (CI/CD) processes, cloud infrastructure, and security practices.
• Promote a DevSecOps culture within development teams.

Mexico

Senior Site Reliability Engineer

Autodesk

How the world gets designed and made. #MakeAnything

Full Time | Remote | Team 10,001+ | Since 1982 | H1B No Sponsor

• Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
• Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
• Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
• Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
• Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
• Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
• Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
• Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
• Develop and maintain operational documentation, runbooks, and recovery procedures
• Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
• Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
• Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
• Participate in a 24x7 on-call rotation for production services

All locations: Idaho | Texas
$117K - $209.3K / year

Senior Site Reliability Engineer

Autodesk

How the world gets designed and made. #MakeAnything

Full Time | Remote | Team 10,001+ | Since 1982 | H1B No Sponsor

• Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
• Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
• Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
• Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
• Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
• Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
• Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
• Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
• Develop and maintain operational documentation, runbooks, and recovery procedures
• Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
• Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
• Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
• Participate in a 24x7 on-call rotation for production services
• Function effectively in a fast-paced environment while helping establish and mature operational excellence practices for Autodesk GovCloud

Idaho
$117K - $209.3K / year

AI Deployment Engineer

Gladly

Radically personal customer service software.

Full Time | Remote | Team 51-200 | H1B Sponsor

• Partner with customers to decompose ambiguous goals into concrete, buildable AI use cases, uncovering hidden complexity and edge cases along the way.
• Determine whether the data a use case needs is available, identify the right APIs or MCP sources, and secure access.
• Use Gladly’s CLI to register APIs on the App Platform, making customer data accessible to Gladly AI and agents.
• Write app actions in JavaScript to condense large API payloads down to the fields the AI actually needs.
• Build the workflows and guides that tell Gladly’s AI how to use that information and respond to the customer.
• Own use cases end to end after launch: monitor performance, optimize, and build new use cases that lift assist and resolution rates.
• Give proactive status updates to customers and the internal team, and partner with SAMs and Implementation Managers to keep goals and timelines aligned.
• Participate in QBRs and EBRs to show progress and ensure customers are getting measurable value.
• Partner with Solutions Engineering on pre-sales demos, and pull in Professional Services Engineering for the most complex custom work.

Colombia
$40K - $54K / year