Spend is the fuel to help your company deliver performance, profitability, and purpose!
Senior Site Reliability Engineer
Location
Michigan
Posted
40 days ago
Salary
0
Seniority
Senior
Job Description
Coupa Software
• Own end-to-end availability and performance of critical services, including building automation to prevent recurring issues
• Administer Linux and Windows systems across web, application, and database servers
• Develop and automate solutions using various programming languages
• Provide application and infrastructure support, including participating in on-call rotations for emergencies
• Enhance monitoring, alerting, and observability to ensure reliability and performance
• Collaborate with cross-functional teams on releases, infrastructure, and troubleshooting, and maintain documentation such as RCAs
Job Requirements
- Bachelor’s degree in Computer Science, Information Systems, or related field, with 5+ years of experience in system administration and large-scale web operations
- Strong programming skills (PowerShell, Python, Bash, or OOP languages) and experience with automation and configuration management tools (Chef, Puppet, Ansible, etc.)
- Hands-on experience managing cloud infrastructure (AWS, GCP) and container platforms (EKS, GKE), plus Infrastructure as Code tools like Terraform
- Proficiency in CI/CD pipelines, source control (Git with complex branching), and deployment/automation tools (Jenkins, Octopus, Rundeck)
- Solid understanding of networking and operations concepts (DNS, load balancing), monitoring tools (Datadog, Splunk, New Relic), and database administration (MS SQL Server)
- Strong Agile/Scrum experience (JIRA), ITIL practices (incident/change management, RCA), and excellent communication, problem-solving, and ownership skills
Benefits
- 401(K), 401(K) matching, Customized development tracks, Dental insurance, Volunteer in local community, Family medical leave, Flexible Spending Account (FSA), Free daily meals, Generous parental leave, Generous PTO, Health insurance, Highly diverse management team, Life insurance, Charitable contribution matching, Mentorship program, Paid volunteer time, Open office floor plan, Paid holidays, Paid sick days, Partners with nonprofits, Performance bonus, Pet insurance, Remote work program, Free snacks and drinks, Mandated unconscious bias training, Vision insurance, Wellness programs, Some meals provided, Mental health benefits, Diversity employee resource groups, Hiring practices that promote diversity, Fertility benefits, Employee resource groups, Employee-led culture committees, Day off for your birthday, Pension, Wellness days, Mother's room, Personal development training
More DevOps Engineer Jobs
Azure Cloud Management:
• Design, implement, and manage Azure infrastructure using Infrastructure as Code (IaC) tools like Terraform, ARM templates, or Bicep.
• Automate cloud deployments and manage resources using Azure DevOps pipelines, scripts, and other automation tools.
• Monitor and optimize the performance, scalability, and cost of Azure services.
CI/CD Pipeline Development:
• Create, maintain, and enhance CI/CD pipelines for applications and services hosted on Azure.
• Collaborate with development teams to integrate code repositories, build processes, and deployment automation.
• Ensure proper version control and release management practices are followed.
Linux Administration:
• Manage and maintain Linux servers, ensuring high availability, security, and performance.
• Perform regular system updates, patches, and configuration management using tools like Ansible, Puppet, or Chef.
• Troubleshoot and resolve system-related issues, including networking, security, and application performance.
Security and Compliance:
• Implement and enforce security best practices for both Azure cloud services and Linux servers.
• Conduct regular security assessments and audits, applying patches and updates as necessary.
• Ensure compliance with relevant regulations and industry standards.
Collaboration and Support:
• Work closely with development, QA, and operations teams to streamline processes and improve efficiency.
• Provide support for production systems, including on-call rotations as needed.
• Document processes, configurations, and best practices for both Azure and Linux environments.
Senior DevOps Engineer (Active Secret Clearance)
Striveworks
Striveworks is a software development company that has created a platform to rework “the data analytic process as high-level code.” As an employer, the company desires to creat…
Build, Deploy, and Maintain AI for an Unpredictable World
Striveworks helps organizations harness the power of artificial intelligence to solve real-world national security and business challenges by serving as the command center between data, models, and business outcomes. Founded by data scientists and engineers, Striveworks set out to make the journey from deployment to ongoing optimization simple and effective. With Striveworks, organizations aren’t just deploying AI; they’re building systems that remain reliable, adaptable, and ready to scale in an unpredictable world. Mission-critical operations require models that perform where they’re deployed, scale as workloads grow, and adapt rapidly as AI capabilities advance. Striveworks meets these demands, increasing reliability and performance while lowering costs and enabling confident, data-driven decision-making in dynamic environments.
The Role
As a Senior DevOps Engineer at Striveworks, you will be challenged, and trusted, on day one to take ownership of specific product deployments by maintaining, optimizing, and enhancing our on-premises and cloud computing environments. You will play a crucial role in the successful deployment of our software solutions to clients. You will be responsible for executing the technical aspects of implementation projects and for ensuring the seamless integration, customization, and configuration of our software. Your expertise will be critical as we deploy new instances of Striveworks’ AI operations (AIOps) capabilities to customer infrastructure. You are right for this opportunity if you value and possess technical expertise and enjoy pushing the boundaries of your capabilities. You will be responsible for maintaining Striveworks’ software deployments using Infrastructure-as-Code (IaC) methodologies.
Your day-to-day will include:
- Using IaC automation to manage virtual machines and deploy containers, services, and other infrastructure, drawing on your expertise to deploy custom Kubernetes clusters in AWS, Azure, GCP, on-premises, or hybrid cloud environments
- Working with platform developers, other DevOps teammates, and customer-facing teams to define requirements and build solutions for customer use cases of the platform
- Deploying software to commercial and, later, unclassified, CUI, and classified Department of Defense (DOD) networks
- Responding to incidents and performing initial triage of critical system faults
The Senior DevOps Engineer works on the DevOps team. You will be responsible for monitoring, automating, and improving software reliability, performance, and availability for various projects. You will also act as a liaison between platform developers and customer-facing teams, taking on operational tasks to ensure the efficient functioning of Striveworks’ solutions. You will work alongside a team of software engineers and data scientists to help them deploy and operate their work as functional products, learning from them so that building effective AI solutions becomes second nature. You may provide guidance and leadership to junior DevOps team members. You will directly contribute to the success of mission-critical systems for national security and commercial clients. You will be expected to wear multiple hats and to step into gaps where improvements are needed, and you will be given the latitude to explore new technologies and solutions. This position can be fully remote, or you can work hybrid or on site at our office in northwest Austin, TX. You will be expected to travel up to 20% of the time.
The Right Fit
In addition to the specific skills and expertise detailed below, we are looking for individuals who share our values. Sharing a set of values allows us to move at the speed of trust.
Collectively, we value a high-trust work environment where people respect each other and use candor kindly and constructively. We value work at the intersection of passion and perseverance, we geek out about the potential of our contributions, and we find joy in working hard on things that matter. Finally, we value taking ownership, having agency, and feeling individual responsibility for collective results.
Here’s what we’re looking for:
- 6+ years of direct, hands-on experience in:
  - Python and/or Golang programming, or other general-purpose programming languages
  - Microservice deployment in Kubernetes
  - Diagnosing and resolving issues within containerized environments
  - Helm chart and Kustomization development/deployment
  - Automation and IaC (e.g., Terraform, Ansible)
  - Cloud infrastructure (e.g., AWS, Azure, GCP, or OpenStack)
  - Managing and troubleshooting Linux systems (e.g., RHEL, Ubuntu, CentOS)
- The ability to work cross-functionally to define requirements and build solutions for customer use cases of the platform
- The ability to respond professionally and competently to incident reports and triage critical system faults
- Active Secret (or above) US security clearance
- US citizenship, which is required due to the nature of this role
The Wish List
We are very interested in candidates who possess the above qualifications, and we also appreciate and consider:
- Experience with US federal information system security policies, including Security Technical Implementation Guides (STIGs), NIST 800-171, NIST 800-53, CMMC, and ICD 503
- Experience with software deployments to on-premises and cloud-based unclassified, CUI, and classified networks within the DOD
- Experience with DevSecOps/DevOps and CI/CD for the administration and deployment of GPU-enabled servers
- Experience deploying or maintaining Cloud Native Computing Foundation (CNCF) projects
- Experience with network-attached storage (NAS) and storage area network (SAN) technologies
- Experience with Kubernetes and cloud-native applications and services in denied, disrupted, intermittent, and limited impact (DDIL) environments
The anticipated base pay range for this position is $160,000 to $200,000/year. Striveworks’ total compensation package includes a competitive base salary, equity grants, and cash bonuses.
The Benefits
- Medical/dental/vision insurance
- Voluntary life, long-term disability, accident, and hospital indemnity insurance
- HSA and FSA (including dependent care FSA) plans
- 401(k) plan
- Unlimited PTO
- Paid parental leave
Check us out on Built In!
Striveworks is an Equal Opportunity Employer and does not discriminate in employment on the basis of race, color, religion, belief, sex (including pregnancy and gender identity or expression), national origin, social or ethnic origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factors. Striveworks will not tolerate discrimination or harassment of any kind. If you require assistance or a reasonable accommodation in the application process, please contact Operations at hr@striveworks.us. In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete an employment eligibility verification form upon hire. Striveworks is a participating employer in the E-Verify program.
Senior Technical Manager – Site Reliability Engineering
Coalfire
Cyber solutions that move you forward, faster.
• Allocate approximately 70% of time to hands-on engineering tasks, such as developing new deployments, tooling, and automation scripts to address client needs
• Dedicate around 30% of time to leadership duties, including mentoring engineers, ensuring quality deliverables, and managing escalations
• Act as the primary escalation contact for complex technical issues, resolving them promptly to maintain high levels of client satisfaction
• Monitor and uphold quality standards for engineering work, confirming alignment with internal protocols, compliance regulations, and project milestones
• Identify and mitigate risks in partnership with consulting and solutions architecture teams, ensuring regulatory requirements and client expectations are fully addressed
• Coordinate day-to-day engineering activities, tracking progress and adjusting resources to meet project goals on schedule using Agile methodologies
• Help create and implement solutions that improve the practice



