Spend is the fuel to help your company deliver performance, profitability, and purpose!
Lead Database Reliability Engineer
Location
California
Posted
2 days ago
Salary
$142K - $198.7K / year
Seniority
Senior
Job Description
Lead Database Reliability Engineer
Coupa Software
• Lead database architecture and design initiatives, delivering scalable, high-performance, reliable solutions aligned with business needs and long-term strategic goals.
• Leverage AI/ML technologies to improve database operations through performance optimization, capacity planning, anomaly detection, predictive maintenance, and automation.
• Develop, maintain, and enhance database monitoring, alerting, backup, and disaster recovery systems to ensure system health, data integrity, and high availability.
• Troubleshoot and resolve complex database issues, providing technical leadership, guidance, and operational support, including participation in on-call rotations.
• Ensure compliance with regulatory requirements and internal policies by conducting regular database audits, assessments, and governance reviews.
• Collaborate effectively across cross-functional teams, mentor junior database engineers, stay current on emerging database technologies and best practices, and remain flexible to support global teams across multiple time zones.
Job Requirements
- Bachelor's or Master's degree in Engineering, Science, or a related field (or equivalent practical experience) with 8+ years of hands-on database administration, management, and performance optimization experience.
- Deep expertise in MySQL, including database design, implementation, configuration, security, troubleshooting, maintenance, backup/recovery, replication, and performance tuning.
- Strong automation and cloud experience, including scripting with Bash, Python, or Ruby and managing large-scale AWS environments, with knowledge of cloud-native database services such as RDS and Aurora; Azure or GCP experience is a plus.
- Experience building and maintaining database observability solutions, including monitoring, dashboards, and alerting using tools such as PMM, New Relic, VividCortex, or similar database management platforms.
- Expertise in high-availability and disaster recovery solutions, including failover clustering, Orchestrator, and strategies that ensure database reliability, resilience, and business continuity.
- Preferred qualifications include experience with PostgreSQL or MongoDB, configuration management tools (Chef/Puppet), Terraform, GitHub-based workflows, and relevant database administration or automation certifications.
More DevOps Engineer Jobs
DevSecOps / Platform Engineer
RightMove Health
Your on-demand resource for movement health. Powered by the world's #1 ortho.
Role Description
We run a serverless-first stack on AWS, and we operate as a true DevOps org: engineers build and own their own infrastructure, but we're looking to add a DevOps/Platform engineer to the team to help us grow. This is a sole-platform-engineer role today, with real autonomy and real scope. You'll work across infrastructure, developer experience, and security/compliance. You'll build tooling so that we don't have to keep reinventing the wheel; you'll standardize and update our infrastructure; and you'll own and update our CI/CD pipeline so we can deploy code safely and quickly.
What you'll own:
- Application infrastructure:
  - Build and maintain reusable infrastructure components, so application engineers can safely stand up new components without shooting themselves in the foot.
  - Take the lead on improving observability (monitoring, alerting, etc.), so it's easy for application engineers to know that their code is running, and to learn about issues before users have to report them.
  - Solve concrete infra needs as they arise, from setting up SFTP sites for customer file-sharing to creating a pathway so that outbound API requests are sent from a fixed IP.
  - Standardize and harden our AWS footprint, with security, budget, and HIPAA considerations front of mind.
- Developer experience:
  - Make our CI/CD pipelines faster and more effective.
  - Solve developer pain points like shared dev environments and locally running code.
  - Help us move to the next stage of maturity with improved monitoring and alerting tools.
- Security Engineering:
  - You'll own security tooling integration across our SDLC, embedding automated scanning and policy enforcement so that security is a feature of our delivery pipeline, not a final gate.
  - Run and tune SAST, DAST, SCA, and container scanning tools so the signal-to-noise ratio is actually usable.
  - Implement guardrails and controls using AWS-native services such as AWS Security Hub, GuardDuty, and Config; conduct regular vulnerability scans, configuration reviews, and remediation tracking.
  - Threat model new services and architecture changes before they ship, ideally during design review rather than after launch.
- Other duties:
  - Act as the technical interface to our outsourced IT department and recognize when our users need additional support.
Qualifications
- Strong AWS experience, especially serverless (currently AppSync and Lambdas, but we're considering a move to API Gateway).
- Solid infrastructure-as-code expertise (Terraform, CloudFormation, CDK, or similar). We're currently transitioning from the Serverless Framework to CDK (TypeScript); CDK experience is a plus, but strong IaC fundamentals matter more.
- A platform mindset: you measure your success by how productive you make other engineers, not by how many tickets you close.
- Comfortable in code: not afraid to edit application code to achieve infrastructure or tooling goals.
- Solid grasp of cloud security fundamentals (IAM, network boundaries, secrets, least privilege).
- CI/CD pipeline experience and a bias toward automation.
- Comfort operating with autonomy in a small team where you'll likely wear many hats.
Nice to have
- Experience supporting SOC 2, HIPAA, or similar audits/compliance regimes.
- SSO / identity tooling (Okta, AWS IAM Identity Center, etc.).
Principal Site Reliability Engineer - Remote
Optum
Optum, part of the UnitedHealth Group family of businesses, is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
At Optum, we support your well-being with an understanding team, extensive benefits and rewarding opportunities. By joining us, you’ll have the resources to drive system transformation while we help you take care of your future. We recognize the power of connection to drive change, improve efficiency and make a difference in health care. Join a team where your skills and ideas can make an impact and where collaboration is key to creating technology that produces healthier outcomes.
Requisition Number: 2371904
Optum Tech is a global leader in health care innovation. Our teams develop cutting-edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care's most complex challenges. Your contributions here have the potential to change lives. Ready to build the next breakthrough? Join us to start Caring. Connecting. Growing together.
We are seeking a Principal Site Reliability Engineer (SRE) to define and scale reliability practices across large-scale cloud platforms. This is a senior individual contributor role focused on setting SRE standards, influencing engineering teams, and driving reliability through automation and AI-enabled operations.
This is a remote role with preference for candidates located in MN. You'll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.
What Makes This Role Unique:
- Define and influence SRE best practices across multiple platforms and teams
- Drive adoption of AI-enabled reliability and operational innovation (AIOps)
- Work on mission-critical healthcare systems at enterprise scale
- Blend hands-on technical depth with strategic influence
- Partner across engineering, platform, and security teams to elevate reliability standards
Primary Responsibilities:
- Define and drive SRE standards across teams
- Lead implementation of:
  - SLOs, SLIs, error budgets
  - Observability (metrics, logs, tracing)
  - Resiliency patterns (failover, self-healing)
- Improve reliability through automation and proactive risk mitigation
- Drive reliability practices in Azure environments
- Apply AIOps (anomaly detection, intelligent alerting, automation)
- Influence engineering teams without direct authority
What Success Looks Like:
- Established consistent SRE practices and standards across teams
- Improved system reliability, observability, and incident response maturity
- Delivered measurable gains in uptime, performance, and operational efficiency
- Enabled AI-driven improvements in reliability and operations
Why Join Optum?
You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.
Required Qualifications:
- Bachelor's Degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- 10+ years of experience in Site Reliability Engineering, Software Engineering, or Cloud Engineering
- Experience influencing multiple teams or platforms without direct ownership
- Demonstrated experience improving reliability through automation, tooling, or AI-enabled approaches
- Proven hands-on expertise in:
  - Reliability engineering (SLOs, SLIs, incident management, observability)
  - Distributed systems in cloud environments (Azure preferred)
- Solid understanding of system design, performance, scalability, and failure modes
Preferred Qualifications:
- Experience implementing AI/ML or AIOps solutions in production environments (e.g., anomaly detection, alert optimization, automation)
- Experience standardizing observability frameworks (e.g., OpenTelemetry or similar)
- Experience working in complex enterprise or regulated environments
- Background supporting large-scale, mission-critical systems
- Proven ability to influence senior technical stakeholders
- Location: MN
*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy.
Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $xx,xxx to $xx,xxx annually based on full-time employment. We comply with all minimum wage laws as applicable.
Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.
UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.
UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.
• Develop and maintain automated and manual test scripts for Salesforce applications (Sales Cloud, Service Cloud, etc.). Conduct functional, regression, integration, user acceptance testing, SI, and Smoke testing.
• Collaborate with development and operations teams to streamline CI/CD processes using tools like Gearset, Git, Jenkins, and Salesforce DX.
• Monitor and manage Salesforce environments and releases across sandboxes and production. Implement DevOps best practices to improve deployment efficiency and reduce errors.
• Analyze test results, identify bugs, and provide feedback for continuous improvement. Ensure compliance with company policies and data security standards during deployment and testing.
Role Description
- Management and governance of cloud environments on the AWS and OCI platforms
- Management of the Kubernetes environment (OpenShift)
- Automation of server provisioning with Terraform
- Support for test automation and continuous integration
- Linux server administration
Qualifications
- Ability to lead strategic, mission-critical projects
- Advanced knowledge of the Linux operating system
- Knowledge of API gateways
- Knowledge of message queues
- Knowledge of OCI cloud
- Cloudflare cloud WAF
- APM
- FinOps best practices
- Script development in shell script/Python
- Experience with AWS cloud infrastructure following best practices (Well-Architected, landing zone)
- Hands-on experience with infrastructure resources (EC2, VPC, S3, EKS, Route 53, SNS, SQS, API Gateway, and Lambda)
- Experience with automation (Terraform)
- Agile methodologies
- A sense of ownership and a collaborative attitude are mandatory
- Work with containers and their main platforms (Docker, Kubernetes, EKS, OpenShift)
- Desirable: AWS Certified Solutions Architect – Associate; CKA – Certified Kubernetes Administrator; Red Hat Certified Specialist in OpenShift Administration; Oracle Cloud Infrastructure 2024 Certified Cloud Operations Professional



