Atlassian is a publicly traded computer software business specializing in collaboration, development, and issue-tracking software for teams. As an employer, Atlassian maintains a t
Senior Engineering Manager, SRE
Location: Worldwide
Posted: 33 days ago
Salary: Not listed
Seniority: Lead
Job Description
Atlassian
Role Description
We're looking for a Senior Engineering Manager to lead a team of Site Reliability Engineers who are supporting the build of an exciting new infrastructure platform.
- Your team will be responsible for applying software engineering principles to reliably scale the cloud infrastructure that underpins some of our products, as well as the products themselves.
- You're an experienced manager who coaches the engineers and technical leaders reporting to you, supports their professional development to unlock their potential, and encourages them to step outside their comfort zone to grow and excel.
- You roll up your sleeves and aren't afraid to get hands-on to help your team when the right opportunity arises.
- You'll also play an important role on the organization's leadership team, working with other engineering managers, architects, and technical program managers to steer the organization by contributing to its strategy and helping determine the right problems for the teams to invest in solving.
Qualifications
- Experience managing and growing technical leaders and teams.
- A drive for operational excellence and experience with teams responsible for running mission-critical production services.
- A passion for driving cultural change toward technical excellence, quality, and efficiency.
- Familiarity with agile software development methodologies.
Requirements
- 5+ years of experience implementing reliability and scale principles and practices.
- 3+ years of experience influencing teams outside your own organization with data and insights.
- A demonstrated ability to foster a culture of innovation in your teams.
- 5+ years of experience with large-scale distributed systems and microservices.
- 3+ years of experience developing and implementing a long-term strategy for a team.
More DevOps Engineer Jobs
SRE Partner – Affirmative Action Position for Persons with Disabilities
Jusbrasil
💻 We simplify access to legal information through technology
• Ensure the reliability, availability, and scalability of systems and services in the Product Areas (PAs) you are assigned to.
• Develop and implement monitoring, observability, and alerting solutions integrated with the Agentic Engineering Platform.
• Support teams in defining and tracking SLIs, SLOs, and error budgets.
• Structure and evolve on-call management in the PAs: rotation, escalation, alerting tools, and incident management.
• Work closely with the Engineering Platform to ensure its capabilities reach product teams and are adopted by them.
• Actively contribute to the evolution of the Agentic Engineering Platform by bringing real feedback from PAs about friction points, gaps, and opportunities for improvement.
• Participate in and influence the building of a reliability-oriented (SRE) engineering culture across the company.
• Support migrations of critical systems, environment segregation, and the deprecation of legacy technologies.
Operations Reliability Engineer - Automations
AlpacaDB
AlpacaDB, Inc., also known as Alpaca and Alpaca Securities, is an API stock and crypto brokerage platform that enables services to embed investing and developer
Role Description
As an Operations Reliability Engineer, you will embed directly within brokerage operations functions to systematically eliminate manual work and replace it with durable, auditable software systems. You start by immersing yourself in operational workflows: observing, documenting, and deeply understanding processes end-to-end before designing solutions. Every recurring manual process is treated as a system defect, and every fix you ship is measured by its real-world impact on efficiency and reliability. You will work closely with licensed brokerage staff, domain experts, and platform engineers to build automations and tooling that allow Alpaca's operations to scale globally without scaling headcount linearly. The ideal candidate is equally comfortable shadowing an operational process and architecting the backend service that replaces it.
Things You Get To Do
- Design, build, test, deploy, and monitor production automations and UIs that remove manual steps and reduce operation time.
- Partner with frontend engineers to productize ops tooling so global teams can run functions with predictable staffing.
- Execute operational procedures to surface painful manual processes before automating them.
- Instrument and report baseline and outcome metrics (MTTC, manual steps removed, queue sizes, ops satisfaction) and iterate based on measured impact.
- Produce Platform Opportunity Briefs / RFCs for higher-level platform tooling and automations.
- Collaborate with licensed BD leadership, Compliance, and Security to build auditable, safe automations with role-based access and clear runbooks.
- Own the full lifecycle of the systems you build, including automated deployment (CI/CD with tools like ArgoCD and Terraform), proactive monitoring, on-call rotations, and incident response, following a "you build it, you run it" philosophy.
- Build systems with auditability, traceability, and data lineage as first-class concerns to ensure transparency for our auditors and regulators.
Qualifications
- 5+ years of professional software engineering experience, with a proven track record of shipping and operating complex, large-scale systems in production.
- Strong business sense and understanding of operations.
- Deep, hands-on expertise in Golang, including a strong command of its concurrency model (goroutines, channels), memory management, and standard library.
- Proven track record of building user-facing features end-to-end with TypeScript/React.
- Proficiency with SQL and relational databases, preferably PostgreSQL.
- Demonstrated ability to reason about human workflows as systems, not just software services.
- Experience with observability, tracing, and continuous profiling.
- Exceptional analytical and problem-solving skills, with the ability to break complex requirements down into clear technical components.
- Excellent communication skills for working in a cross-functional environment.
- High ownership mindset, with a bias toward durable, structural fixes over tactical patches.
Requirements
- Knowledge of service-oriented architectures.
- Experience with major cloud platforms (we primarily use GCP).
- Knowledge of financial markets (exchanges, broker-dealers, clearing, etc.).
- Experience with Docker and Kubernetes.
- A passion for financial markets or the desire to learn about them.
- Knowledge of Agile/Scrum methodologies.
- Demonstrable experience designing, building, and reasoning about distributed systems, including a strong understanding of microservices architecture and API design patterns (e.g., REST, gRPC).
- Experience with capacity planning and benchmarking.
Benefits
- Competitive salary and stock options.
- Health benefits.
- New-hire home-office setup: one-time USD 500.
- Monthly stipend: USD 150 per month via a Brex card.
• Collaborate closely with the development team to deploy and maintain application infrastructure.
• Assist in developing and supporting tooling that streamlines the deployment and maintenance of our products.
• Work with Kubernetes, Docker, Helm, and ArgoCD to deploy applications from development through to production environments.
• Support both in-house and third-party applications, including deployments, upgrades, and troubleshooting.
• Write and manage automation pipelines for application deployment and maintenance.
• Provision and manage infrastructure using Terraform.
• Document processes and best practices clearly and concisely.
Senior Site Reliability Engineer
PlayOn! Sports
The nation's leading high school media company, providing live streaming and digital ticketing services.
• Contribute to system observability by implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.
• Develop automation, tooling, and monitoring solutions to support high service availability.
• Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.
• Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
• Participate in on-call rotations to support critical services and ensure rapid response to incidents.