Founded in 2003, First Advantage provides comprehensive background-check insights and solutions, enabling employers and housing providers to make confident choices, diminish risks,
SRE Lead (US Remote)
Location
United States
Posted
85 days ago
Salary
$120K - $150K / year
Seniority
Lead
Job Description
First Advantage
At First Advantage (Nasdaq: FA), people are at the heart of everything we do. From our customers and partners to our greatest advantage: our team members. Operating with empathy and compassion, First Advantage fosters a global inclusive workforce devoted to the diverse voices that make up our talent and products. Our team members empower each other to be their authentic selves and treat all with respect, integrity, and fairness. Say hello to a rewarding career, and come join a leading provider of mission-critical background screening solutions to some of the most recognized Fortune 100 and Global 500 brands.
First Advantage is a global leader in background screening, identity, and verification solutions. As we continue to scale our digital platforms and modern cloud-native infrastructure, we are seeking a highly skilled and forward-thinking Lead Site Reliability Engineer (SRE) to drive reliability, resilience, and operational excellence across our systems.
The Lead SRE will be responsible for guiding reliability strategy, overseeing complex incident response, improving observability, strengthening automation and CI/CD practices, and partnering closely with engineering teams to embed SRE principles throughout the organization. This role requires a deep understanding of modern cloud architecture, including both Azure and AWS, as well as expertise in Linux systems, monitoring technologies, and root-cause analysis.
This is a senior hands-on engineering role, ideal for someone who enjoys solving difficult problems at scale and mentoring others while driving meaningful improvements to uptime, performance, and customer experience.
What You'll Do:
Site Reliability & Platform Stability
- Lead reliability initiatives across multiple high-availability, large-scale SaaS systems, ensuring platform uptime, performance, and resilience.
- Build and maintain distributed systems, infrastructure components, and automation tooling to ensure consistent, reliable delivery of production services.
- Champion proactive reliability engineering, holistic system monitoring, and continuous operational improvements.
- Partner with architecture, engineering, and operations teams to define SLAs, SLOs, and SLIs.
Cloud Engineering (Azure & AWS)
- Architect, build, and maintain cloud infrastructure using best practices.
- Guide cloud migrations, cost optimization, and resilience engineering across multi-cloud environments.
- Implement and enforce cloud security, compliance, and governance standards.
DevOps, CI/CD, and Automation
- Create and maintain CI/CD pipelines using GitHub Actions, Azure DevOps, Jenkins, or equivalent.
- Automate deployments using IaC tools (Terraform, Bicep, CloudFormation).
- Reduce manual operational burden through automation and self-service tooling.
Monitoring, Observability & Performance
- Implement observability stacks covering metrics, logs, traces, and synthetic checks.
- Standardize monitoring practices using industry tooling.
- Perform performance analysis, load testing, and optimization.
Incident Response & Management
- Serve as Incident Commander for major production incidents.
- Define and improve incident management processes.
- Ensure clear communication during outages and lead technical bridges.
- Deliver high-quality RCAs with actionable follow-ups.
Root-Cause Analysis (RCA) & Continuous Improvement
- Drive deep, data-driven RCAs and long-term reliability improvements.
- Identify and eliminate systemic issues and operational toil.
Leadership, Collaboration & Mentorship
- Provide technical leadership across teams.
- Mentor engineers and promote SRE best practices.
- Foster strong cross-functional partnerships.
What You'll Need to be Successful:
- 7+ years in SRE, DevOps, Platform Engineering, or Cloud Engineering.
- Strong expertise in Azure and AWS.
- Proficiency in CI/CD, automation, and release engineering.
- Deep monitoring, logging, and observability experience.
- Incident response leadership experience.
- Proven RCA experience.
- Strong Linux skills.
- Scripting skills (Python, Bash, PowerShell, Go).
- IaC experience.
- Strong systems and networking fundamentals.
Additional Preferred Qualifications
- Experience with large-scale distributed systems.
- Message queues or event streaming knowledge.
- Familiarity with incident management frameworks.
- Multi-cloud enterprise experience.
- Kubernetes, ECS, AKS, or EKS exposure.
Why First Advantage is Your Next Big Career Move
First Advantage is going through a technology transformation! We are looking for experts who are excited to work with advanced technologies and provide best-in-class user experiences, drive the development and deployment of scalable solutions, and smoothly guide our agile teams and clients through meaningful changes as we continue to expand our impact.
What Are You Waiting For? Apply Today!
You have learned a little about us today, and we want to learn about you! If you think this position and our company are a great fit for your areas of interest and expertise, tell us about you by applying now!
The salary range for this position is approximately $120,000 - $150,000 base annually. This range reflects our good faith estimate to pay fairly as to what our ideal candidates are likely to expect, and we tailor our offers within the range based on the selected candidate’s experience, industry knowledge, technical and communication skills, and other factors that may prove relevant during the interview process.
United States Equal Opportunity Employment:
First Advantage is proud to be a global leader in removing barriers and supporting our community members to ensure the changing demographics of the workforce are reflected in our hiring and employment practices. We value all of our candidates, employees, and clients, and place great emphasis on hiring and supporting qualified individuals in each role. We are an equal opportunity employer. We do not discriminate on the basis of race, color, ethnicity, ancestry, religion, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, veteran status, genetic information, or any other area protected by applicable law.
Job Requirements
- 7+ years in SRE, DevOps, Platform Engineering, or Cloud Engineering.
- Strong expertise in Azure and AWS.
- Proficiency in CI/CD, automation, and release engineering.
- Deep monitoring, logging, and observability experience.
- Incident response leadership experience.
- Proven RCA experience.
- Strong Linux skills.
- Scripting skills (Python, Bash, PowerShell, Go).
- IaC experience.
- Strong systems and networking fundamentals.
Additional Preferred Qualifications
- Experience with large-scale distributed systems.
- Message queues or event streaming knowledge.
- Familiarity with incident management frameworks.
- Multi-cloud enterprise experience.
- Kubernetes, ECS, AKS, or EKS exposure.
Benefits
- Competitive salary range of approximately $120,000 - $150,000 base annually.
- Opportunities for professional growth and development.
- Inclusive and supportive work environment.
More DevOps Engineer Jobs
Staff Site Reliability Engineer
Coalition, Inc.
Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. Founded in 2017, Coalition combines comprehensive insurance coverage and innovative cybersecurity tools to help businesses manage and mitigate potential cyberattacks. Work at Coalition is centered on the joint mission to Protect the Unprotected. We have built a remote-first, highly inclusive culture that welcomes people from diverse backgrounds. We trust each other to take responsibility, share ownership of outcomes, and put in the work together to protect businesses from digital risk. Coalition’s exceptional growth stems from its ability to address real-world problems for organizations of all sizes while remaining true to our founding values of character, humility, responsibility, purpose, authenticity, and inclusion.
About us
Coalition is the world's first Active Insurance provider designed to help prevent digital risk before it strikes. Founded in 2017, Coalition combines comprehensive insurance coverage and innovative cybersecurity tools to help businesses manage and mitigate potential cyberattacks. Opportunities to make an impact with bold thinking are real and happening daily at Coalition.
About the role
We are looking for a Staff Site Reliability Engineer to lead AI enablement across our engineering organization. As AI-assisted development reshapes how software gets built, a new platform layer is emerging underneath, one that requires guardrails, quality gates, security standards, and tooling infrastructure to ensure AI-generated output is reliable, secure, and production-worthy. This role owns that layer.
This role blends building and buying: you'll design and develop custom tools and frameworks where the market doesn't meet our needs, while continuously evaluating the evolving landscape to ensure we're leveraging the best solutions available. We aim to be on the cutting edge, not the bleeding edge, investing deliberately in what delivers real value and staying ready to pivot when the market shifts meaningfully.
You will define and drive the strategy for embedding AI-native tools and practices into the software development lifecycle, from AI-assisted code review and developer workflow automation to establishing security standards for emerging frameworks like MCP. You'll own AI tooling standards for the engineering org, evaluate and adopt the best platforms, use data to measure impact and prioritize where to invest next, and partner with teams to automate repetitive workflows using agentic tools. This is a visible, high-influence role: you'll run lunch-and-learns, shape best practices, and be the go-to voice for how we leverage AI to multiply engineering output while keeping the foundations trustworthy.
This role sits within our Platform SRE team, and you'll participate in the team's ad-hoc support rotation, providing infrastructure guidance and troubleshooting for engineering teams. This means you bring deep SRE fundamentals (AWS, Terraform, production operations) alongside your AI enablement focus.
Skills and Qualifications
- 8–10+ years of experience in SRE, DevOps, Cloud Engineering, Platform Engineering, or Software Development roles
- Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, or similar
- Experience building AI/LLM-powered developer tools or integrations
- Demonstrated ability to drive org-wide tooling adoption, including change management, training, and measuring outcomes
- Proficiency in prompt engineering techniques
- Proficiency in Go or Python, with experience building production-grade automation, tooling, or libraries
- Hands-on experience operating production environments in AWS
- Strong experience with Terraform
- Experience with container orchestration platforms like ECS or Kubernetes
- Familiarity with CI/CD tools such as GitHub Actions
- Solid understanding of observability practices including system metrics, distributed tracing, and SLOs. Datadog is a plus.
- Exceptional communication and presentation skills, both written and verbal
Responsibilities
- AI Enablement Strategy: Define and own the standards and best practices for AI-assisted development across the engineering organization, from tool selection to workflow integration.
- Tooling Development: Evaluate, build, or adopt AI-powered tools that improve code quality, catch vulnerabilities earlier in the development process, and reduce review cycle times, whether that means evolving internal solutions or identifying and integrating third-party platforms.
- Adoption & Advocacy: Partner with engineering teams to understand what's impacting their AI tool adoption, guide them through improvements, and lead org-wide enablement efforts such as lunch-and-learns, workshops, and documentation.
- Measuring Impact: Establish metrics and feedback loops to quantify the impact of AI tooling on developer productivity, code quality, and delivery speed.
- Infrastructure Automation: Contribute to the design and scaling of production environments using AWS and Terraform when on rotation or as needs arise.
- Mentorship & Standards: Mentor engineers across the team, uphold high infrastructure quality, and actively shape the best practices and standards used by the organization.
- On-Call: Participate in a low-volume on-call rotation.
Bonus Points
- Experience troubleshooting complex distributed systems in a high-traffic production environment.
- Exposure to event streaming systems such as Kafka or Kinesis.
- Experience building Internal Developer Platforms (IDP) or designing self-service infrastructure workflows.
- Familiarity with systems security, compliance requirements, or infrastructure hardening.
- Experience with agentic AI workflows, MCP frameworks, or AI-powered automation beyond code generation.
- Track record of leading incident response or driving post-incident review processes.
Compensation
Our compensation reflects the cost of labor across several US geographic markets. The US base salary for this position ranges from $150,000/year in our lowest geographic market up to $200,000/year in our highest geographic market. Consistent with applicable laws, an employee's pay within this range is based on a number of factors, which include but are not limited to relevant education, skills, job-related knowledge, qualifications, work experience, credentials, and/or geographic location. Your recruiter can share more on target salary for your location during the interview process. Coalition, Inc. reserves the right to modify this range as needed.
Perks
- 100% medical, dental and vision coverage
- Flexible PTO policy
- Annual home office stipend and WeWork access
- Mental & physical health wellness programs (One Medical, Headspace, Wellhub, and more)!
- Competitive compensation and opportunity for advancement
Why Coalition?
Work at Coalition is centered on the joint mission to Protect the Unprotected. We have built a remote-first, highly inclusive culture that welcomes people from diverse backgrounds. We trust each other to take responsibility, share ownership of outcomes, and put in the work together to protect businesses from digital risk. Coalition’s exceptional growth stems from its ability to address real-world problems for organizations of all sizes while remaining true to our founding values of character, humility, responsibility, purpose, authenticity, and inclusion. We’re always looking for collaborative, inquisitive individuals to join #OurCoalition.
Privacy Notice
Coalition is committed to protecting your privacy and handling your personal information responsibly. We collect, use, and store personal information as necessary for the recruitment process and in compliance with applicable privacy laws and regulations in all regions where we operate. We want you to understand what personal information we collect, how we use it, and your rights regarding access, correction, and deletion of your data where applicable. Information submitted, collected, and processed as part of your application is subject to Coalition's Privacy Policy. For further details, please review our full Privacy Policy or contact us with any questions regarding how your information is handled.
Safe Hiring Notice
All legitimate communication from Coalition comes from @coalitioninc.com emails, and open roles are listed only on our Careers page. We never ask for payment, banking details, or personal identification before an offer is accepted through our secure systems. If you believe you’ve been a victim of fraudulent recruiting, follow guidance from the Federal Trade Commission (FTC).
Anti-Discrimination Notice
Coalition is proud to be an Equal Opportunity employer. Our policy is to provide equal employment opportunities to all individuals, without discrimination or harassment on the basis of any characteristic protected by applicable laws in each country where we operate. This commitment includes, but is not limited to, ensuring equal treatment in recruitment, selection, training, promotion, transfer, compensation, and all other aspects of employment. Coalition does not tolerate discrimination or harassment of any kind, and we are dedicated to fostering an inclusive and supportive workplace.
Accommodations
Coalition is committed to providing reasonable accommodations to qualified individuals with disabilities, including applicants and employees, in accordance with applicable laws and regulations in each country where we operate. Our policy is to support equal opportunity in the hiring process by considering qualified applicants regardless of disability or other protected characteristics, unless providing accommodation would impose an undue hardship or disproportionate burden. If you require accommodation to complete an application, interview, pre-employment testing, or participate in the selection process, please contact us at candidateaccommodations@coalitioninc.com. We also consider all qualified applicants, including those with criminal histories, in line with applicable laws and regulations in each jurisdiction.
To all recruitment agencies: Coalition does not accept unsolicited agency resumes. Do not forward resumes to our email alias, employees, or other physical or virtual organization locations. Coalition is not responsible for any fees related to unsolicited resumes.
- Continue implementing and evolving the Kubernetes cluster.
- Execute the migration of virtualized environments on EC2 to Kubernetes.
- Maintain and improve existing Dockerfiles.
- Manage CI/CD pipelines.
- Provide support for the Development environment.
- Ensure observability, reliability, and best practices for infrastructure as code.
- Contribute to the continuous improvement of architecture and automation processes.

- Build and maintain CI/CD pipelines and GitOps workflows across a diverse set of engineering teams (Blockchain, Frontend, Backend, iOS, Android, QA).
- Own observability (monitoring, alerting, logging) and support development teams in instrumenting their services.
- Participate in incident response and drive post-incident reviews and actions.
- Optimise infrastructure for security, cost, performance and reliability.
- Partner with engineering teams on new product launches and innovative DeFi solutions.
Senior DevOps Engineer – Financial
Truelogic Software
Premium boutique software development company that helps brands with big ideas to make a difference in people’s lives.
- Design, build, and maintain cloud environments within Microsoft Azure using best practices for scalability, reliability, and cost efficiency.
- Implement and manage Infrastructure as Code (IaC) using Terraform to automate resource provisioning and environment configuration.
- Build and maintain CI/CD pipelines using Azure DevOps for application deployments, infrastructure, and automated testing.
- Collaborate with development teams using C#, .NET, Visual Studio, and Angular to optimize the build, test, and release processes.
- Develop container strategies using Docker and manage workloads running on Kubernetes (AKS).
- Monitor system performance, troubleshoot issues, and implement improvements to enhance reliability and uptime.
- Implement secure configurations for Azure resources and Kubernetes clusters.




