Stack AV

Revolutionizing the Transportation of Goods

Senior Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 51-200, H1B No Sponsor

Location

United States

Posted

5 days ago

Salary

0

Seniority

Senior


Job Description

Role Description

Stack AV Site Reliability Engineers are responsible for enabling and ensuring our production systems meet their service-level objectives. Through the implementation of centralized observability and automation, the SRE team constantly ensures the health, reliability, scalability, and performance of Stack AV’s infrastructure. Members of the team are expected to contribute to a culture of continuous learning, provide consultation on architecting for high-availability, and ultimately drive the uptime and performance of our systems.

Responsibilities

- Monitor and maintain mission-critical production services to ensure maximum uptime.
- Design and implement scalable distributed systems to facilitate the development of self-driving vehicles.
- Design and implement an incident management framework and build a culture of blameless postmortems and continuous learning.
- Scale the reliability and velocity of our systems and processes through increased automation.
- Document actions to build a comprehensive library of runbooks, which will act as a knowledge base and foundation for automation.
- Participate in an on-call rotation to uphold the SLOs and SLAs of production services.

Qualifications

- Expertise in at least one scripting language (e.g. Bash, Python).
- Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems.
- Experience scaling and securing services in the cloud (AWS, GCP) or cloud native environments.
- Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation).
- Understanding of engineering design limitations and ability to provide guidance to teams to scale their services to achieve desired performance within budget.
- Strong experience implementing and debugging cloud native and open source tools such as Kubernetes, etcd, Prometheus, OpenTelemetry, and Istio.
- Strong communication skills and the ability to work effectively in a diverse and distributed team.

Company Description

Stack is developing revolutionary AI and advanced autonomous systems designed to enhance safety, reliability, and efficiency of modern operations. Stack's autonomous technology incorporates cutting-edge advancements in artificial intelligence, robotics, machine learning, and cloud technologies, empowering us to create innovative solutions that address the needs and challenges of the dynamic trucking transportation industry. With decades of experience creating and deploying real world systems for demanding environments, the Stack team is dedicated to developing an autonomous solution ecosystem tailored to the trucking industry's unique demands.

More DevOps Engineer Jobs

Staff Site Reliability Engineer – Volcano

Kong Inc.

Kong Inc. is a cloud connectivity company founded in 2017 to create software products that power connections. Well-known as the creator of Kong, a widely adopte

DevOps Engineer, 5 days ago

• Own reliability for Volcano end-to-end: Define and drive SLOs, error budgets, and incident response practices for all Volcano services: edge deployments, managed Postgres, auth, realtime, storage, and the control plane.
• Architect the platform's infrastructure: Design and build the multi-region Kubernetes infrastructure, networking, and data plane that powers Volcano's edge deployment pipeline and backend-as-a-service capabilities.
• Build the GitOps and CI/CD backbone: Establish deployment automation, canary pipelines, and preview environment provisioning using ArgoCD, Helm, and Terraform/Terragrunt, setting patterns the broader team will follow.
• Scale managed data services: Design, operate, and harden multi-tenant PostgreSQL clusters, Redis caching layers, and object storage, with a focus on data isolation, performance, and disaster recovery.
• Drive observability from day one: Instrument every Volcano service with meaningful SLIs; build dashboards, alerts, and runbooks using Datadog, Prometheus, and Grafana before services go live, not after incidents.
• Lead cross-functional reliability work: Collaborate with the OCTO team, product engineering, and security to bake reliability and compliance into Volcano's architecture rather than bolt it on later.
• Set SRE culture and standards: Mentor engineers across Volcano's contributing teams on reliability principles; lead postmortems, define on-call practices, and build a blameless engineering culture.
• Evaluate and adopt emerging technologies: Given Volcano's greenfield nature, evaluate and make architectural decisions on edge runtimes, serverless compute, vector databases, and AI-native infrastructure components.

United States
$150K - $210K / year

DevOps Engineer

CivicActions

CivicActions is a leading development, design, and strategy organization founded in 2004. It serves clients from nonprofit organizations to government agencies

DevOps Engineer, 5 days ago

Role Description

This position will join our cross-functional and highly collaborative team developing the next generation of digital services, using modern technologies and practices. This position is remote (work from home) and requires a federal background investigation and US residence for 3 of the last 5 years.

- Break down complex problems into understandable and iterative solutions
- Infrastructure-as-code development and operations on Kubernetes environments using Docker and Helm
- Familiarity with AWS services including EKS, RDS, S3, CloudWatch and managing infrastructure using Terraform
- Continuous integration & continuous deployment with tools such as GitLab CI, GitHub Actions or Jenkins
- Create and maintain documentation, timely and detailed ticket updates and communications around work
- Planning and implementing migration of systems and applications between hosts with minimal downtime
- Can work both collaboratively and solo, with experience navigating complex troubleshooting scenarios

Qualifications

- At least six years of DevOps, SRE, IT, sysadmin, security, developer or other relevant experience
- Site reliability engineering (SRE) and on-call rotation: must be able to respond nights and/or weekends, as necessary
- Experience with Infrastructure-as-code development and operations on Kubernetes environments using Docker and Helm
- Familiarity with AWS services including EKS, RDS, S3, CloudWatch and managing infrastructure using Terraform
- Experience with continuous integration & continuous deployment
- Experience working in Agile and cross-functional teams (with users, developers, product managers, security and compliance)

Requirements

- Nice to have: Team leadership and/or cloud architecture experience
- Experience working with distributed teams
- Experience with Lagoon, Ansible, GNU/Linux, Apache, PHP and/or Drupal configuration
- Previous federal background investigation

Benefits

- Fully remote work (always)
- Comprehensive medical, dental, vision, life, and disability coverage for employees, with company contributions toward dependent coverage
- 401(k) with a 3% company contribution
- Flexible time off policy
- 12 weeks paid parental leave
- Annual professional development stipend, $1,200
- Annual technology stipend, $820
- Employee growth plans, appreciation programs, and company summits to support connection and career development

United States
$125K - $145K / year
Job Closed

SRE Engineer

Deutsche Telekom IT Solutions

As Hungary’s most attractive employer in 2025 (according to Randstad’s representative survey), Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees. We have hundreds of large customers, corporations in Germany and in other European countries. DT-ITS received the Best in Educational Cooperation award from HIPA in 2019 and was acknowledged as the Most Ethical Multinational Company in 2019. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.

DevOps Engineer, 5 days ago
Full Time, Remote, Team 5,001-10,000

Role Description

We are seeking a cloud engineer with a strong learning mindset to help build and operate our Google Cloud Platform (GCP) environment. You will work with senior engineers and architects to operate and implement infrastructure-as-code, networking, security guardrails, automation, and monitoring. This role suits someone with foundational, possibly theoretical, cloud knowledge who is eager to learn quickly and grow into a cloud engineering/architecture path. It is implemented as a Site Reliability Engineering (SRE) team.

- Work with an agile mindset and according to agile methodology, collaborating closely within the team, with other teams in the Domain and Deutsche Telekom, and with software engineers using the cloud platform.
- Approach work with a DevOps and continuous improvement mindset.
- Contribute to the implementation of a reliable, scalable GCP platform under guidance from senior engineers.
- Help design and maintain the landing zone in line with security and privacy guidelines.
- Implement infrastructure-as-code (e.g., Terraform) for repeatable environments with code reviews and mentorship.
- Assist with configuring networking components (load balancing, troubleshooting, DNS config, interconnect, VPCs, routing, VPN, distributed networks and how to integrate networks with cloud services) following established patterns.
- Set up monitoring, logging, dashboards, and assist in alert tuning and runbook updates.
- Automate routine platform tasks (CI/CD pipelines, scripts, tooling).
- Maintain clear documentation and contribute to policies, standards, and guidelines.
- Maintain current technical knowledge and make recommendations to help the team, hub, and company excel.

Qualifications

- English language proficiency (team and hub language is English).
- 2 years of experience in cloud/platform/DevOps/SRE roles, or equivalent.
- Familiarity with at least one major cloud (GCP preferred): core services like compute, storage, IAM, and basic networking concepts.
- Proficiency in one programming language (Python, Go, Java, or Node.js) and shell scripting.
- Understanding of version control with Git and CI/CD fundamentals.
- Exposure to infrastructure-as-code concepts (Terraform or similar).
- Knowledge of containers (Docker) and microservices concepts.
- You are curious and interested in the latest technologies, especially cloud, agile methods, and DevOps.
- You are willing to enter unknown territory, make mistakes, and learn from them together.
- You enjoy varied topics in an interdisciplinary team and have high intrinsic motivation.
- You can present and communicate ideas (e.g., about architecture) in a visual form.

Requirements

- Hands-on labs or projects on GCP, AWS, or Azure.
- Intro-level certifications or in progress (e.g., Google Associate Cloud Engineer, AWS Solutions Architect – Associate, Azure Fundamentals).
- Basics of networking (HTTP, TLS, load balancing), databases (SQL/NoSQL), and platform security/IAM.
- Exposure to monitoring/logging tools (Cloud Monitoring/Logging, Prometheus, Grafana).

Benefits

- Dedicated mentorship from senior cloud architects/platform engineers.
- Learning time and budget for certifications (target: Google Associate Cloud Engineer within 6–12 months).
- Opportunities to present mini-architectures/POCs and contribute to platform standards.

Additional Information

- First months success indicators:
  - Ship reviewed Terraform code and CI/CD pipelines for production environments.
  - Implement monitored, documented configurations aligned with security guidelines.
  - Complete agreed GCP learning paths and attain at least one entry-level certification.
- Please be informed that our remote working possibility is only available within Hungary due to European taxation regulation.

Hungary

Senior Manager Engineering - DevOps

Reltio

Reltio is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.

DevOps Engineer, 5 days ago
Full Time, Remote, Team 345, Since 2011

Role Description

Work at a dynamic pace with cutting-edge technologies like Kubernetes (K8S), CI/CD, and infrastructure-as-code tools in AWS, GCP, and Azure. Play a crucial role in implementing tools and strategies to enhance Reltio’s internal infrastructure platform. Address challenging problems by combining MDM and Big Data with an emphasis on cost efficiency, reliability, and availability. Innovate with automated approaches to scalability, capacity management, cost optimization, and elasticity.

As the Senior Manager of DevOps Engineering, you will be responsible for managing and optimizing the infrastructure and services for Reltio’s cloud offerings, including MDM, RIQ, RDM, and other foundational components. You will focus on ensuring service reliability, security, and efficiency while fostering a culture of continuous improvement within the team.

Team Leadership and Collaboration

- Lead a team of DevOps engineers in implementing best practices and innovative solutions.
- Support a collaborative, high-performance culture and provide mentorship to junior team members.
- Collaborate with product managers, developers, InfoSec, and other stakeholders to align platform requirements and priorities.

Infrastructure Management

- Implement and manage infrastructure-as-code (IaC) solutions using tools like Terraform and Helm Charts.
- Ensure the efficient provisioning and management of cloud resources across AWS, GCP, and Azure.

Continuous Improvement

- Monitor, troubleshoot, and optimize the performance of DevOps tools and platforms.
- Implement and enhance CI/CD pipelines to streamline software delivery processes.

Innovation and Tools Enhancement

- Stay updated with industry trends and evaluate new tools, technologies, and best practices to enhance DevOps processes.
- Drive process improvements and the adoption of modern DevOps methodologies.

Security and Compliance

- Collaborate with security teams to integrate security controls into DevOps processes and infrastructure.
- Ensure documentation of technical designs, procedures, and configurations to maintain system integrity.

Qualifications

- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
- 8-10 years of experience in DevOps Engineering or a similar role with a solid background in software development and infrastructure operations.
- 3-5 years of experience in leading teams in a DevOps or related technical environment.
- Strong experience with cloud platforms like AWS, Azure, or Google Cloud Platform.
- Understanding of containerization technologies (e.g., Docker, Kubernetes) and orchestration tools.
- Experience with infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
- Proficiency with CI/CD tools (e.g., Jenkins, ArgoCD) and practices.
- Solid troubleshooting and problem-solving skills.
- Strong communication and collaboration skills.
- Proven ability to drive process improvements and implement DevOps best practices.
- Knowledge of security best practices in cloud environments.

Skills That Are Nice to Have

- Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or OpenTelemetry.
- Experience with agile development methodologies and DevOps practices.

Benefits

- Flexible work arrangements to help our people manage their personal and professional lives.

Company Description

Reltio is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Reltio is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.

India