Job Closed

This listing is no longer active.

Devsu

Devsu is a technology agency that provides software development services, IT augmentation and staffing.

Site Reliability Engineer

DevOps Engineer | Other | Remote | Team 51-200 | H1B No Sponsor

Location

United States | Dominican Republic

Posted

90 days ago

Job Description

We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP). This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments. As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities

Monitoring & Observability (Core Focus)
- Own and operate the monitoring and observability stack across on-prem and GCP environments
- Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
- Define, tune, and maintain alerts to ensure high signal-to-noise ratio
- Establish observability standards and best practices across teams
- Improve visibility into system health, performance, and reliability

Site Reliability Engineering
- Apply SRE principles to improve availability, performance, and resilience
- Define and track SLIs, SLOs, and error budgets
- Participate in on-call rotations and SEV incident response
- Lead or contribute to incident investigations and root cause analysis (RCA)
- Drive preventative actions to reduce repeat incidents

Kubernetes & Platform Reliability
- Support and monitor Kubernetes environments (GKE and on-prem clusters)
- Monitor cluster health, capacity, and resource utilization
- Troubleshoot platform-level issues impacting application reliability
- Collaborate with Platform and Engineering teams on reliability improvements

Secondary Responsibilities (Backup Application Support)
These responsibilities are activated as needed, not part of day-to-day operations.
- Provide L2/L3 application support coverage during:
  - Support team resource shortages
  - High-severity incidents (SEVs)
  - Peak support periods or escalations
- Triage and troubleshoot application issues using existing runbooks and dashboards
- Collaborate with Application Support and Engineering teams during incidents
- Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Requirements
- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in PST time zone
- Ability to participate in an on-call rotation that includes coverage for one weekend day. Time worked during the weekend is compensated with one day off during the week, in accordance with the established work schedule.

Technology Stack:
- Observability: Grafana, Prometheus, logging platforms
- Containers: Kubernetes (GKE and on-prem)
- Cloud: Google Cloud Platform (GCP)
- Operations: Linux, networking, infrastructure monitoring
- Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)

Nice to have:
- Experience supporting application teams during SEV incidents
- Knowledge of capacity planning and performance tuning
- Scripting skills (Python, Bash, etc.)
- Experience with hybrid infrastructure environments

At Devsu, we believe in creating an environment where you can thrive both personally and professionally. By joining our team, you’ll enjoy:
- A stable, long-term contract with opportunities for career growth
- Private health insurance
- A remote-friendly culture that promotes work-life balance
- Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
- Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
- A flexible Paid Time Off (PTO) policy as well as paid holiday days
- Challenging, world-class software projects for clients in the US and LatAm
- Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

Join Devsu and discover a workplace that values your growth, supports your well-being, and empowers you to make a global impact.

More DevOps Engineer Jobs

Senior DevOps Engineer

ChowNow

The only fair-for-all food ordering marketplace — no commissions for restaurants and no hidden fees for diners.

DevOps Engineer | 90 days ago
Other | Remote | Team 201-500 | Since 2011 | H1B Sponsor

• As a Senior DevOps Engineer at ChowNow, you will be specifically responsible for building, improving, and growing our technology infrastructure.
• You will help design and implement reproducible processes in the enterprise environment as well as support the application production environment.
• You will own and support engineering user-facing technology as well as share responsibility for supporting the production operations.

United States
$169.7K - $200.5K / year
Job Closed

Senior DevOps Engineer (Exol)

Exol

Symbotic is an automation technology leader reimagining the supply chain with its end-to-end, AI-powered robotic and software platform. Symbotic reinvents the warehouse as a strategic asset for the world’s largest retail, wholesale, and food & beverage companies. Applying next-gen technology, high-density storage and machine learning to solve today's complex distribution challenges, Symbotic enables companies to move goods with unmatched speed, agility, accuracy and efficiency. As the backbone of commerce, the Symbotic platform transforms the flow of goods and the economics of supply chain for its customers.

DevOps Engineer | 90 days ago
Other | Remote | Team 501-1,000

Who we are

With its A.I.-powered robotic technology platform, Symbotic is changing the way consumer goods move through the supply chain. Intelligent software orchestrates advanced robots in a high-density, end-to-end system – reinventing warehouse automation for increased efficiency, speed and flexibility.

What we need

Exol is seeking an experienced and security-focused Senior DevOps Engineer to join our growing team. In this role, you will bridge the gap between software development and IT operations, ensuring our infrastructure and software delivery pipelines are efficient, scalable, and secure. You will be the subject matter expert of our cloud infrastructure, heavily focused on Google Cloud Platform. You will work directly with software developers to automate software deployment workflows, infrastructure as code (IaC) pipelines, and ensure high availability for our software services. This is a fast-paced environment where agility is key. We need someone who can not only write clean, modular Terraform code but also strategize on cloud architectures and operational excellence.

What we do

Exol* is pioneering fulfillment as-a-service, offering outsourced warehousing operations and specializing in automated warehousing solutions. Our focus is on the efficient movement of goods in cases and pallets across all sectors, such as CPG, food and beverage, wholesale, and retail.

*Exol is an independently managed joint venture between Symbotic and Softbank

What you’ll do
- Infrastructure as Code (IaC): Design, build, and maintain production-grade cloud infrastructure using Terraform. You will be responsible for state management, module development, and ensuring our delivery pipelines are efficient, repeatable, and scalable.
- Cloud Architecture: Architect and deploy secure, scalable solutions on GCP (GKE, Cloud Run, Compute Engine, Cloud SQL, VPCs, etc).
- CI/CD Implementation: Build and optimize CI/CD pipelines (e.g., GitHub Actions, GitLab CI, or Jenkins) to enable seamless code deployment from development to production.
- Multi-Cloud Strategy: Leverage your experience with other cloud providers (AWS or Azure) to assist with integrations, migrations, or disaster recovery strategies.
- Reliability & Monitoring: Implement robust monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Google Cloud Operations Suite) to ensure system health.
- Collaboration: Act as an embedded consultant for the software development team, helping them containerize applications (Docker/Kubernetes) and troubleshoot issues.

What we need
- Bachelor’s degree in computer science or a related field preferred.
- Minimum 8 years of DevOps or Cloud Engineering experience, with multiple years working in GCP.
- Terraform Expertise: Deep proficiency in Terraform is non-negotiable. You must have experience writing custom modules, managing remote state, and preventing infrastructure drift.
- Cloud Versatility: Demonstrated experience with multiple cloud providers is required.
- Containerization: Strong experience with Docker or Kubernetes or GKE specifically.
- Scripting: Proficiency in Python, Go, or Bash for automation tasks.
- Environment: Proven track record working in a fast-paced software start-up environment; ability to context-switch and manage competing priorities effectively.

Preferred Qualifications
- GCP Professional Cloud Architect or DevOps Engineer certification.
- Experience with "GitOps" workflows (e.g., ArgoCD, Helm Charts).
- Proven knowledge of security compliance frameworks (SOC 2, ISO 27001) and deploying secure infrastructure.
- Experience deploying and managing database platforms and storage lifecycles.

Our Environment
- Travel could be up to 10% of the time. Employee must have a valid driver’s license and the ability to drive and/or fly to client and other customer locations.
- The employee is responsible for owning a credit card and managing expenses personally to be reimbursed on a bi-weekly basis.
- This is an in-warehouse role; you’ll spend time on the floor as well as in the office.
- Flexibility to work multiple shifts (day, swing, night) or be on call depending on operational demands.
- Ability to walk/stand for extended periods, climb stairs/ladders, and tolerate warehouse environmental conditions (temperature variations, noise, etc).

About Symbotic

Symbotic is an automation technology leader reimagining the supply chain with its end-to-end, AI-powered robotic and software platform. Symbotic reinvents the warehouse as a strategic asset for the world’s largest retail, wholesale, and food & beverage companies. Applying next-gen technology, high-density storage and machine learning to solve today's complex distribution challenges, Symbotic enables companies to move goods with unmatched speed, agility, accuracy and efficiency. As the backbone of commerce, the Symbotic platform transforms the flow of goods and the economics of supply chain for its customers. For more information, visit www.symbotic.com.

We are a community of innovators, collaborators and pioneers who embrace our differences, because we know unique perspectives make us stronger and smarter. Every perspective matters. We depend on the collective voices of our employees, customers and community to help guide us as we build a better place to work – for you and the world. That’s why we’re proud to be an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, religion, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, veteran status, or genetic information.

The base range for this position in the posted location is $147,000.00 - $202,400.00; however, base pay offered may vary depending on job-related knowledge, skills, and experience. The compensation package includes medical, dental, vision, disability, 401K, PTO and/or other benefits.

United States
$147K - $202K / year
Job Closed

Junior Dev Ops Engineer

BlueVoyant

BlueVoyant is a cloud-native cyber defense platform that delivers positive security outcomes that drive business results. The company converges external and int

DevOps Engineer | 90 days ago

Position: Junior Dev Ops Engineer
Location: Remote, US or Canada
Work Authorization: U.S. Citizenship required for all applicants (regardless of location)

About the Position:

BlueVoyant is seeking a Junior DevOps Engineer to join our Infrastructure Engineering team, responsible for building, operating, and scaling our multi-cloud, multi-region SaaS platform. In this role, you’ll support the systems and infrastructure that enable our services, from development through production, while growing your skills across the stack. You’ll gain hands-on experience with cloud infrastructure, CI/CD pipelines, observability tooling, Kubernetes, and Software Engineering. This role is ideal for someone early in their career who brings strong fundamentals, curiosity, and a willingness to learn. You’ll work alongside engineers from both software and systems backgrounds, receiving training, mentorship, and support as you develop.

About You:

You are an early-career engineer with a passion for automation, infrastructure, and improving operational reliability. You enjoy solving technical problems, learning new tools, and working collaboratively with experienced engineers. You ask good questions, take initiative, and thrive in environments where you can learn by doing. You don’t need to be an expert in everything; curiosity, strong fundamentals, and willingness to learn are most important. You are comfortable working across multiple domains: cloud services, CI/CD, container technologies, performance monitoring, and basic software engineering. You bring a customer-minded approach, strong fundamentals, and a desire to contribute to reliable, scalable systems.

Responsibilities:
- Reduce operational workload by automating repeatable tasks.
- Assist with deploying, supporting, and troubleshooting services in production.
- Improve CI/CD pipelines using GitLab and Helm.
- Contribute to cloud infrastructure using Terraform.
- Support Kubernetes clusters and containerized workloads.
- Create and maintain alerts and runbooks.
- Contribute to observability across logs, metrics, and traces.
- Read and write code to support internal tooling, applications, and services.
- Participate in an on-call rotation with full training and team support.

Qualifications:
- Bachelor’s degree in Computer Science or equivalent practical experience. (Candidates without degrees encouraged to apply.)
- 1+ year of experience working with production or production-like systems (projects, internships, co-ops accepted).
- Familiarity with at least one programming or scripting language (Python, Go, Java preferred; others acceptable).
- Working knowledge of Linux/Unix fundamentals and networking basics (DNS, TLS/SSL, HTTP).
- Some exposure to Kubernetes, Docker, and/or at least one major cloud provider (AWS, GCP, or Azure).
- Basic familiarity with Infrastructure-as-Code concepts and tools (Terraform).
- Ability to learn quickly, follow runbooks, collaborate effectively, and ask the right questions.
- Customer-focused approach to delivering reliable, scalable systems.

Preferred Qualifications:
- Experience with SQL databases such as PostgreSQL, Redis, Elasticsearch, or RabbitMQ.
- Working knowledge of AWS, Azure, or GCP networking (PrivateLink, transit gateways, VPC peering, firewalls).
- Familiarity with OpenTelemetry or other application performance monitoring tools.
- Understanding of Unix system internals.
- Relevant certifications (AWS/Azure/GCP, Kubernetes, Linux, networking, security) are beneficial but not required.

About BlueVoyant

At BlueVoyant, we recognize that effective cyber security requires active prevention and defense across both your organization and supply chain. Our proprietary data, analytics, and technology, coupled with deep expertise, works as a force multiplier to secure your full ecosystem. Accuracy! Actionability! Timeliness! Scalability!

Led by CEO, Jim Rosenthal, BlueVoyant’s highly skilled team includes former government cyber officials with extensive frontline experience in responding to advanced cyber threats on behalf of the National Security Agency, Federal Bureau of Investigation, Unit 8200, and GCHQ, together with private sector experts. BlueVoyant services utilize large real-time datasets with industry leading analytics and technologies. Founded in 2017 by Fortune 500 executives, including Executive Chairman, Tom Glocer, and former Government cyber officials, BlueVoyant is headquartered in New York City and has offices in Maryland, Tel Aviv, San Francisco, London, Budapest, and Latin America.

BlueVoyant uses AI-assisted tools within our applicant tracking system to help identify candidates whose experience and skills best match the requirements of a role. This technology provides hiring teams with additional insights to support fair and efficient hiring decisions. Please note that all applications are reviewed by a member of our hiring team, and final hiring decisions are made by humans, not AI. By submitting your application, you acknowledge that AI tools may assist in the evaluation of your resume as part of the recruitment process. For more information on how we process your personal data, please review our Candidate Privacy Notice available at https://www.bluevoyant.com/candidate-privacy-notice.

All employees must be authorized to work in the United States. BlueVoyant provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics. In addition to federal law requirements, BlueVoyant complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities.

Disclaimer: Please note that pursuant to contractual requirements and applicable law, in order for employees to perform work on some of the company’s federal contracts, U.S. citizenship is required. Accordingly, an employee’s ability to perform work on such contracts is contingent upon the company’s verification of the employee’s citizenship status. Furthermore, individuals may be subject to additional background checks and fingerprinting.

BlueVoyant Candidate Privacy Notice

To understand how we secure and manage your personal data upon submitting a job application, please see our Candidate Privacy Notice at https://www.bluevoyant.com/candidate-privacy-notice.

United States
Full Time | Remote | Team 5,001-10,000 | Since 1995 | H1B No Sponsor

• Maintain and optimize our AWS EC2 and EKS clusters to ensure high availability and performance
• Lead troubleshooting of production outages, providing timely resolution and root cause analysis
• Collaborate with development teams to address questions, provide support, and share best practices for deployment and operations
• Communicate effectively with directors and other stakeholders, providing updates on cluster status, incidents, and ongoing improvements
• Implement and improve CI/CD pipelines using Jenkins and GitHub Actions to streamline deployment processes
• Create and maintain comprehensive documentation for systems, processes, and troubleshooting guides

Brazil
Job Closed