Contact center solution and conversational AI driving more conversations and more revenue for sales and lead gen teams
Director of DevOps
Location
United States
Posted
6 days ago
Salary
$220K - $260K / year
Seniority
Lead
Job Description
Director of DevOps
Convoso
- Own service reliability across Convoso’s platform, champion quality and resilience, and drive adherence to internal and external SLAs, SLOs, and SLIs
- Collaborate with other teams to establish key metrics and dashboards that drive quality in the SDLC
- Own the incident management process, including monitoring and observability systems and the major incident response process, continuously tracking and reducing MTTR
- Leverage chaos engineering principles and “game days” to proactively test the resilience of Convoso’s platform
- Establish and maintain a CI/CD center of excellence
- Drive automation for CI/CD processes across different tech stacks and cloud providers
- Recruit and lead DevOps teams across multiple product lines that excel at applying industry best practices
- Develop and support automated, scalable solutions to deploy and manage our global infrastructure
- Maintain and improve the use of automation tools for infrastructure provisioning, configuration, and deployment
- Work with Development and Operations personnel to define CloudOps processes and introduce new insights and technologies so that we stay on the cutting edge
- Provide mentorship and expertise on system options, risk and impact management, and cost vs. benefit analysis
- Ensure that security requirements for tooling, systems, and environments are met, and protect the assets of the company and our customers
- Lead troubleshooting of system and performance problems in Prod/QA/Dev environments
- Identify improvements to reduce technical debt
- Lead infrastructure work focused on optimization and performance
- Lead the DevOps team
- Author a dynamic training and skills improvement plan
- Acquire team-level certifications applicable to AWS or Google Cloud
- Constantly evaluate our services to ensure high quality
- Conduct reviews to identify optimization opportunities in processes or systems
- Contribute to the creation of departmental procedure documents and working instructions
- Work with consultants and team members to design the solutions to be implemented
- Delegate tasks related to solutions and monitor team members’ progress
- Inspire and motivate teamwork for achieving goals
- Provide mentoring and identify training opportunities for team members, staying current in the latest technologies and best practices
Job Requirements
- BS in Computer Science, Computer Engineering, or related technical field
- 5+ years of experience as a Site Reliability Engineering leader managing multiple teams, with a passion for automation and continuous improvement
- 5+ years of experience in operational management of SaaS production applications
- 5+ years of experience working with Agile/DevOps development teams
- 5+ years of related experience in infrastructure management in a Linux environment
- Deep knowledge of cloud platforms like AWS or GCP and management of on-prem and hybrid environments
- Track record as a player-coach who is comfortable both managing and doing
- Experience deploying and operating containerized web applications
- Experience with cloud automation/provisioning and configuration management tools such as Ansible, Chef, Puppet, Docker, or Salt
- Experience with Containerization Tools such as Docker and Kubernetes
- Experience in planning, creating, implementing and maintaining a scalable software development infrastructure
- Knowledge of development, build, and CI/CD tools such as Git, GitHub and Jenkins
- A security background that will come in handy as we navigate through the process of SOC2 certification
- Good people and line management skills and the ability to recruit and develop a high performing team
- Process-oriented with great documentation skills
- Excellent oral and written communication skills.
Benefits
- Competitive compensation package
- Stock options
- 100% covered premiums for employees: Medical, Dental, Basic life insurance, Long-term disability
- Affordable Vision plan and optional FSA
- PTO, Paid Sick Time, Holidays, Bereavement time, Parental Leave
- Your birthday off
- 401k program with generous company match
- No cost Employee Assistance Program and Travel Assistance
- Monthly Gym membership reimbursement
- Monthly credits toward food & beverage
- Company Outings
- On and offsite team building events
- Paid training for departments
- Apple laptop (most roles)
- And a team of highly experienced and kind colleagues!
More DevOps Engineer Jobs
Senior SRE Manager
Dynatrace
Dynatrace is a global application performance management software firm and a former member of Compuware. As an employer, the company is in support of helping it
Your role at Dynatrace
Lead the APAC Site Reliability Engineering team located in Sydney, responsible for the reliability, availability, and performance of the Dynatrace SaaS platform. You will be the senior technical, operational, and people leader in APAC, working directly alongside your Site Reliability Engineers and Incident Commanders. You will be expected to work on incidents, lead customer escalations, and contribute technically, while also owning team health, people's growth, operational maturity, and regional SRE outcomes. You will represent APAC in global SRE initiatives and bring regional context into decisions that shape how we run SRE globally. You will report directly to the Senior Director, SRE based in EMEA.
The APAC SRE team operates as part of a broader SRE Observability team spanning EMEA and APAC. Your leadership focus is on the APAC team, and planned engineering work is shared across the full team. Your success depends as much on strong async collaboration and global alignment as it does on regional execution.
What You'll Do
- Lead, mentor, and grow a team of fewer than 10 SREs and Incident Commanders. Set the bar for technical quality and operational discipline.
- Be hands-on during high-severity incidents: help orchestrate the response, drive resolution, and derive learnings in the post-incident process.
- Act as the primary interface for APAC customer escalations that require SRE involvement, working closely with Customer Success and Support.
- Contribute actively to global SRE strategy, tooling, and platform reliability practices, not just regional operations.
- Drive continuous improvement: reduce toil, improve observability, and push the team toward engineering-led reliability solutions.
- Champion AI-native practices across incident response, root cause analysis, toil reduction, and everyday engineering workflows, using them to take load off the team and setting the standard for how we work with AI.
What will help you succeed
- 5+ years of experience managing or leading high-performing SRE teams, preferably in distributed, global teams.
- Comfortable owning high-severity incidents end-to-end: declaring, coordinating, communicating, and closing.
- Proven ability to manage customer escalations at a technical level: you can translate operational reality into clear, credible communication with customers and account teams.
- Hands-on experience with AI-native engineering workflows: using AI tooling to accelerate incident analysis, automate toil, or improve observability. You are not waiting for AI to mature; you are already working this way and want to lead others through the same shift.
- Strong cloud-native fundamentals and hands-on experience with AWS, GCP, or Azure in a production SaaS context.
- Experience with observability practices: SLIs, SLOs, alerting philosophy, and incident review culture.
- A strong bias for action and a habit of making knowledge shared, not siloed. You document, automate, and build for scale even when the team is small.
Why you will love being a Dynatracer
- Dynatrace is a leader in unified observability and security.
- We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance.
- Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, and other leading partners worldwide to create strategic alliances.
- You'll get to work at the forefront of innovation with Dynatrace Intelligence, the industry's first agentic operations system. Bringing together deterministic and agentic AI, it helps teams understand what's happening, why it matters, and what to do next, automatically.
- Over 50% of the Fortune 100 companies are current customers of Dynatrace.

- Integrate automated security checks into the build and deployment processes
- Design secrets as well as identity and access management
- Implement protections at the network and application level
- Harden the container infrastructure
- Implement the stringent compliance and data-protection requirements of the healthcare sector

- Act as the link between development and IT operations
- Design, automate, and optimize the build, test, and deployment processes
- Develop a container platform
- Maintain the IT infrastructure with clear documentation
- Automate recurring tasks

- Drive infrastructure standardization and operational excellence by designing and developing scalable automation frameworks in Python that enable consistent and repeatable deployments across cloud and on-premises environments.
- Accelerate infrastructure provisioning by building and enhancing Terraform code-generation platforms using Python and Jinja2, enabling teams to produce validated, environment-specific infrastructure code from reusable templates.
- Improve system reliability and compliance by developing and maintaining Puppet modules, manifests, and Hiera configurations that manage Linux and Windows environments at scale.
- Increase operational efficiency across Windows platforms by creating robust PowerShell automation solutions for server management, Active Directory administration, and hybrid cloud integrations.
- Simplify complex infrastructure workflows by developing internal automation tools, command-line utilities, and APIs that empower engineering teams to self-service common operational tasks.
- Enhance the speed and safety of infrastructure delivery by integrating automation frameworks with CI/CD pipelines, enabling automated validation, testing, and deployment of infrastructure changes.
- Improve software quality and reduce deployment risk by implementing comprehensive testing strategies for infrastructure code, including unit testing, linting, and integration testing.
- Partner closely with cloud, platform, and application teams to identify manual processes, eliminate operational toil, and drive automation-first solutions across the organization.
- Enable long-term scalability and maintainability by creating clear documentation, standards, and runbooks for automation frameworks and infrastructure templates.
- Contribute to a strong engineering culture by participating in code reviews, sharing best practices, and continuously improving the quality, security, and maintainability of automation solutions.



