UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of
Associate Site Reliability Engineer
Location
United States
Posted
5 days ago
Salary
$60.2K - $107.4K / year
Seniority
Mid Level
Job Description
Associate Site Reliability Engineer
UnitedHealth Group
Role Description
As a member of our team, you will:
- Design, develop, and deploy AI-powered solutions using no-code, low-code, and advanced platforms, translating business needs into scalable applications that enhance products, workflows, and decision-making.
- Design, deploy, and maintain Kubernetes-based infrastructure to ensure high availability and scalability of applications.
- Build and manage CI/CD pipelines using GitHub Actions to enable fast and reliable deployments.
- Use Terraform to provision and manage infrastructure in Google Cloud Platform (GCP).
- Manage and optimize Apache Kafka-based systems to ensure reliable message streaming and data processing.
- Monitor and improve system performance and reliability using Prometheus and Grafana.
- Collaborate with developers to automate workflows and implement best practices for infrastructure-as-code (IaC).
- Write Python scripts for automation and tooling to enhance operational efficiency.
- Troubleshoot and resolve system issues to minimize downtime and impact on users.
- Participate in on-call rotations and incident response to ensure high service reliability.
Qualifications
- 1+ years of experience with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, and Cloud Storage.
- 1+ years of hands-on experience with Kubernetes for deploying and managing containerized applications.
- 1+ years of experience using GitHub Actions to create and maintain CI/CD pipelines.
- 1+ years of experience using Python for scripting, automation, and tooling.
- 1+ years of experience with Apache Kafka for building, maintaining, and troubleshooting message-driven systems.
- 1+ years of experience using Prometheus and Grafana for monitoring and observability.
- Basic knowledge of Terraform for infrastructure provisioning and management.
Requirements
- Familiarity with other cloud providers (e.g., AWS or Azure).
- Knowledge of Helm for Kubernetes package management.
- Experience with debugging and optimizing distributed systems.
- Exposure to security best practices for cloud infrastructure.
- Knowledge of Java for developing and troubleshooting backend systems.
- Familiarity with DataHub or similar data cataloging and metadata management platforms.
- Understanding of Artificial Intelligence (AI) concepts and tools, such as building or managing machine learning pipelines, integrating AI models, or working with ML platforms like TensorFlow, PyTorch, or Vertex AI.
- Experience with Golang for developing infrastructure tools or cloud-native applications.
Benefits
- Comprehensive benefits package.
- Incentive and recognition programs.
- Equity stock purchase.
- 401k contribution (subject to eligibility requirements).

