High Tech Genesis

Product Engineering Services for the High-Tech Sector

Senior DevOps Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location: Canada
Posted: 75 days ago
Salary: CA$70 - CA$80 / hour
Seniority: Senior

Bachelor Degree | 10 yrs exp | English
AWS Cloud, Docker, Kafka, Kubernetes, Python, RabbitMQ, Terraform

Job Description

• Build, optimize, and manage Continuous Integration and Continuous Deployment (CI/CD) pipelines.
• Automate build, testing, and deployment processes.
• Ensure faster, reliable, and repeatable software releases.
• Troubleshoot pipeline failures and improve performance.
• Design and manage infrastructure using code instead of manual configuration.
• Automate provisioning of servers, networks, and environments.
• Ensure consistency across development, staging, and production environments.
• Implement version control for infrastructure changes.
• Architect, deploy, and manage cloud-based systems.
• Optimize scalability, availability, and cost efficiency.
• Monitor cloud performance and resource utilization.
• Implement backup, recovery, and disaster recovery strategies.
• Implement monitoring and alerting systems for infrastructure and applications.
• Ensure high system availability and performance.
• Manage incident response and root cause analysis.
• Maintain logging solutions for troubleshooting and auditing.
• Integrate security practices into development and deployment pipelines.
• Manage access control, secrets, and vulnerability scanning.
• Ensure systems comply with organizational and regulatory standards.
• Implement automated security checks.

Job Requirements

10 years of DevOps experience
Experience implementing or managing CI/CD pipelines using Harness (AI-powered DevOps platform)
Team lead experience while remaining hands-on
Strong experience working in an AWS environment
Hands-on expertise with AWS Lambda (serverless applications)
Deep understanding of event-based messaging systems such as Solace, RabbitMQ, or Kafka
Proficient with containers and orchestration tools: ECS, Docker, Kubernetes
Expert-level knowledge of Terraform for infrastructure as code (IaC)
Skilled in scripting using Bash and/or Python
Experience with traceability and monitoring tools such as Dynatrace
Familiar with Service Mesh technologies including Kong or Istio
Strong background in building and maintaining CI/CD pipelines
Ideal background: Cloud DevOps Engineer or Platform Engineer

More DevOps Engineer Jobs

Staff Engineer - SRE, Retail and Pharmacy

CVS Health

Bringing our heart to every moment of your health.

DevOps Engineer | Posted 75 days ago | Other | Remote | Team 10,001+ | Since 1963 | H1B No Sponsor

We’re building a world of health around every individual, shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger: helping to simplify health care one person, one family and one community at a time.

The Staff Engineer – SRE, Retail & Pharmacy will implement and maintain comprehensive observability solutions, providing real-time insights into the performance and overall health of systems to proactively identify and address potential issues. This role is responsible for investigating and resolving incidents quickly during critical situations and performing root cause analysis to prevent future recurrence. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

What You Will Do:

Observability Strategy & Implementation
- Design and implement comprehensive observability solutions tailored for edge computing environments, including monitoring, logging, tracing, and metrics collection, to provide deep visibility into system performance and health across distributed remote facilities
- Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability in edge and centralized infrastructure
- Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities
- Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing environments

System Reliability & Performance
- Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind, incorporating best practices for instrumentation and monitoring in resource-constrained environments
- Drive proactive identification of issues in edge facilities through advanced observability tools, reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) across distributed systems
- Lead incident postmortems, analyzing root causes specific to edge environments and implementing observability-driven improvements to prevent recurrence

Tooling & Automation
- Develop and maintain tools, scripts, and automation to enhance observability pipelines, optimizing for the unique challenges of edge computing, such as bandwidth limitations and intermittent connectivity
- Evaluate and integrate industry-standard observability tools (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry) and recommend solutions tailored for edge computing use cases
- Optimize observability data storage, retention, and querying to balance performance, cost, and scalability across a large number of remote facilities

Leadership & Collaboration
- Mentor and guide junior SREs and engineers on observability best practices for edge computing, fostering a culture of reliability and proactive monitoring
- Partner with solution, engineering, and business teams to align observability efforts with business objectives, ensuring seamless operation of edge and centralized systems
- Lead cross-functional initiatives to improve observability, reliability, and operational efficiency across distributed edge infrastructure

Continuous Improvement
- Stay current with emerging observability trends, tools, and methodologies, particularly those suited for edge computing and distributed systems, and advocate for their adoption
- Contribute to the development of observability standards, runbooks, and documentation tailored for edge environments to ensure consistency and scalability
- Drive cost optimization for observability infrastructure while maintaining high-quality monitoring and alerting capabilities across remote facilities

Minimum Qualifications:
- 8+ years of experience in SRE, DevOps, or related technology roles
- 5+ years of experience in delivering software in a large-scale environment with reliability and resilience concepts (multi-region, multi-cloud, containerization, etc.)
- 5+ years of experience with observability and monitoring tools such as Splunk, Dynatrace, Datadog, Prometheus, Grafana, etc.
- 3+ years of experience with programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments
- 3+ years of experience with cloud technologies (AWS, Microsoft Azure, Google Cloud)
- 3+ years of experience with source control and continuous integration tools like Git/Stash, Bitbucket, or Jenkins
- 2+ years of engineering team leadership or management experience
- Experience using customer feedback tools such as Quantum Metric, Medallia, and Adobe Analytics
- Deep understanding of microservices architecture and cloud-native technologies
- Experience in configuring, supporting, and managing Rancher, Kubernetes, and/or Docker
- Experience in Incident Management, Change Management, Infrastructure Support, and Problem Management concepts and processes
- Excellent interpersonal and communication skills, including the ability to engage technical and non-technical stakeholders

Preferred Qualifications:
- Expertise working in edge computing environments with a large number of remote facilities, managing observability for distributed, high-latency, or resource-constrained systems
- Familiarity with chaos engineering principles to validate observability systems in edge environments
- Experience with retail SRE organizations, including experience with store systems: Point of Sale (POS), hand-helds, etc.
- Expertise in cloud development and deployment technologies, including containerization and multi-cloud configurations
- Demonstrated understanding of various API management and related platforms like Apigee, Vordel, DataPower

Education:
- Bachelor’s degree in Computer Science, Engineering, or related field required
- Master’s degree in Computer Science, Engineering, or related field preferred

Pay Range
The typical pay range for this role is: $118,450.00 - $260,590.00. This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people
We take pride in our comprehensive and competitive mix of pay and benefits, investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:
- Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.
- No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.
- Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.

For more information, visit https://jobs.cvshealth.com/us/en/benefits

We anticipate the application window for this opening will close on: 04/24/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

Observability / Monitoring, Prometheus, Grafana, ELK Stack, OpenTelemetry, AWS, Azure, GCP, Python, Java, Splunk, Datadog, Git, Jenkins, Microservices, Kubernetes, Docker

United States + 1 more

$118K - $260K / year

Senior Site Reliability Engineer

Jobgether

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

DevOps Engineer | Posted 75 days ago | Other | Remote | H1B No Sponsor

Role Description
This role offers a unique opportunity to ensure the reliability, scalability, and performance of critical platform services in a fast-paced, technology-driven environment. The Senior Site Reliability Engineer (SRE) will combine software engineering expertise with operational excellence to automate processes, improve observability, and reduce operational risk across the platform. You will collaborate closely with development, DevOps, release engineering, and security teams to embed reliability and security best practices throughout the software lifecycle. This position emphasizes proactive problem-solving, automation, and continuous improvement while providing mentorship to peers and contributing to high-impact projects. The role is ideal for someone who thrives on solving complex technical challenges while shaping the platform’s resilience and scalability.
- Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical services.
- Lead capacity planning, performance tuning, design reviews, and disaster recovery exercises to validate platform resilience.
- Automate infrastructure provisioning, patching, and operational tasks using Terraform, Ansible, and CI/CD pipelines to eliminate manual processes.
- Partner with security teams to enforce compliance (SOC2, CIS benchmarks), implement least-privileged IAM policies, and maintain hardened, secure systems.
- Serve as Tier-2 escalation during incidents, lead root cause analysis, and continuously improve incident response playbooks and on-call processes.
- Identify repetitive operational tasks and implement automation or self-service modules to reduce toil and improve developer productivity.
- Measure system performance, track reliability metrics, and collaborate with leadership to drive iterative improvements.

Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field.
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering roles.
- Strong experience with AWS multi-account environments, Terraform, Ansible, CI/CD tools (GitHub Actions, Bitbucket, Jenkins, AWS CodeBuild/CodePipeline), and observability platforms (New Relic, CloudWatch).
- Background with containerized environments (ECS, Fargate, EKS) and resilient system architectures.
- Preferred certifications: AWS DevOps Engineer or Solutions Architect, Kubernetes, or SRE/DevOps practitioner certifications.
- Excellent analytical, troubleshooting, and problem-solving abilities.
- Strong collaboration skills to work effectively with cross-functional teams, mentor peers, and contribute to continuous improvement.

Benefits
- Competitive salary range: USD $120,000 – $125,000 per year.
- Day-one medical, dental, vision coverage with flexible spending options (HSA/FSA).
- 401(k) with company match available from day one.
- Paid sick leave, volunteer time, and parental leave options.
- Employer-paid life and disability insurance.
- Wellbeing on Demand program to support personal health and wellness.
- Flexible work environment with remote opportunities and casual dress code.

AWS, Terraform, Ansible, CI/CD, GitHub Actions, Jenkins, Amazon CloudWatch, New Relic, Amazon ECS, Amazon EKS, Kubernetes, Amazon IAM, Python, Shell, Linux, Observability / Monitoring, Infrastructure as Code

United States

$120K - $125K / year

Job Closed

Site Reliability Engineer

Empower

DevOps Engineer | Posted 75 days ago | Other | Remote | Team 10,001+ | H1B Sponsor

Our vision for the future is based on the idea that transforming financial lives starts by giving our people the freedom to transform their own. We have a flexible work environment, and fluid career paths. We not only encourage but celebrate internal mobility. We also recognize the importance of purpose, well-being, and work-life balance. Within Empower and our communities, we work hard to create a welcoming and inclusive environment, and our associates dedicate thousands of hours to volunteering for causes that matter most to them. Chart your own path and grow your career while helping more customers achieve financial freedom. Empower Yourself.

Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time, including CPT/OPT.

We are seeking a Site Reliability Engineer (SRE) to own the reliability, availability, and operational excellence of our AWS-based data platform. This role is focused on applying core SRE principles (production engineering, incident management, root cause elimination, observability, automation, and capacity planning) to large-scale data infrastructure supporting EMR, EMR Serverless, Redshift, DynamoDB, and S3. You will treat data pipelines and analytics platforms as production systems, designing and enforcing SLAs/SLOs for uptime, performance, scalability, and data freshness. You will lead incident response, perform deep root cause analysis, implement durable fixes, and eliminate toil through automation and infrastructure-as-code.

What you will do:
- Own and improve the reliability, stability, scalability, and performance of our core data platforms and services
- Provide operational support for large-scale, distributed data systems, ensuring high availability and strong SLAs
- Partner closely with full-stack, data, and platform engineering teams to deliver continuous improvements
- Operate and support EMR and EMR Serverless (Python/Spark) workloads and data pipelines
- Support and optimize Amazon Redshift and DynamoDB in high-throughput, production environments
- Design, build, and evolve monitoring, alerting, and observability frameworks with a focus on symptoms, not just outages
- Lead incident response, troubleshooting production issues across the full stack and coordinating with internal and external stakeholders
- Perform root cause analysis (RCA) and readiness reviews; turn findings into durable fixes and automation
- Create and maintain runbooks, SOPs, and operational documentation
- Collaborate with engineering teams to optimize performance, reliability, and cost
- Participate in an on-call rotation to respond to incidents impacting customer-facing systems
- Recommend and influence the use of AWS managed services and architectural patterns
- Continuously evaluate system performance, capacity, and cost to scale efficiently

What you will bring:
- 4–6 years of experience building or operating systems across multiple architecture domains: application, data, integration, infrastructure, and security
- 4+ years of hands-on AWS experience, with strong production exposure to several of the following:
  - Redshift, DynamoDB, EMR, EMR Serverless, EC2, S3
  - Lambda, Step Functions, EventBridge, RDS, IAM
- Proven experience operating data platforms such as data lakes and data warehouses in production
- Strong SQL skills and experience working with modern databases (e.g., Redshift, DynamoDB, Postgres, MySQL, Oracle)
- 4+ years of Python experience, including scripting, automation, or data workloads
- Experience with CloudWatch, infrastructure monitoring, and alerting
- Hands-on experience with incident management, uptime SLAs, and customer-impacting systems
- Strong understanding of Git-based workflows (GitHub, Git Flow, or similar)
- Experience working in Agile environments (Scrum / Kanban) using tools such as Jira and Confluence
- Bachelor’s in Computer Science, Information Systems, Data/Analytics, or related; equivalent practical experience welcomed.

What will set you apart:
- Experience with Terraform or other Infrastructure-as-Code tools
- Exposure to Snowflake or experience supporting analytics platforms beyond Redshift
- Experience in financial services or other highly regulated environments
- Knowledge of DevOps and CI/CD best practices
- Familiarity with observability tools such as Splunk, AppDynamics, or advanced CloudWatch usage
- Comfortable working across Linux/Unix environments
- Strong communication skills during incident response with both technical and non-technical stakeholders
- Security-minded approach to building secure, reliable, and durable systems
- Willingness to support occasional off-hours or weekend incidents as part of on-call responsibilities
- Streaming/event pipelines (Kafka/Kinesis), CDC patterns, and backfill strategies.
- Experience with OpenLineage/Marquez and catalog integrations (Collibra/Alation/Purview).
- Prior FinOps or capacity-planning ownership for data platforms.
- Familiarity with BI semantic layers and contract enforcement at consumption (Looker/Power BI/Tableau).

Work conditions
Participate in an on-call rotation; occasional change windows outside business hours to support safe releases and resiliency drills.

This job description is not intended to be an exhaustive list of all duties, responsibilities and qualifications of the job. The employer has the right to revise this job description at any time. You will be evaluated in part based on your performance of the responsibilities and/or tasks listed in this job description. You may be required to perform other duties that are not included on this job description. The job description is not a contract for employment, and either you or the employer may terminate employment at any time, for any reason.

What we offer you
We offer an array of diverse and inclusive benefits regardless of where you are in your career. We believe that providing our employees with the means to lead healthy balanced lives results in the best possible work performance.
- Medical, dental, vision and life insurance
- Retirement savings: 401(k) plan with generous company matching contributions (up to 6%), financial advisory services, potential company discretionary contribution, and a broad investment lineup
- Tuition reimbursement up to $5,250/year
- Business-casual environment that includes the option to wear jeans
- Generous paid time off upon hire, including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
- Paid volunteer time: 16 hours per calendar year
- Leave of absence programs, including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
- Business Resource Groups (BRGs): BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play. BRGs are open to all.

Base Salary Range
$87,400.00 - $123,400.00
The salary range above shows the typical minimum to maximum base salary range for this position in the location listed. Non-sales positions have the opportunity to participate in a bonus program. Sales positions are eligible for sales incentives, and in some instances a bonus plan, whereby total compensation may far exceed base salary depending on individual performance. Actual compensation offered may vary from posted hiring range based upon geographic location, work experience, education, licensure requirements and/or skill level and will be finalized at the time of offer.

Equal opportunity employer • Drug-free workplace
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age (40 and over), race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.

For remote and hybrid positions you will be required to provide reliable high-speed internet with a wired connection as well as a place in your home to work with limited disruption. You must have reliable connectivity from an internet service provider that is fiber, cable or DSL internet. Other necessary computer equipment will be provided. You may be required to work in the office if you do not have an adequate home work environment and the required internet connection.

Job Posting End Date at 12:01 am on: 03-20-2026

Want the latest money news and views shaping how we live, work and play? Sign up for Empower’s free newsletter and check out The Currency.

AWS, Python, Amazon Redshift, DynamoDB, Amazon S3, Amazon CloudWatch, SQL, Git, Linux

United States

$87.4K - $123K / year

Job Closed

DevOps Automation Engineer - Web Content Management (WCM)

Peraton Corporation

Peraton Corporation, a national security company headquartered in Herndon, Virginia, supplies solutions for mission-critical programs and systems. Founded in 2017, Peraton's missio

DevOps Engineer | Posted 75 days ago | Other | Remote

Responsibilities
Peraton is looking for a DevOps Automation Engineer to support a team of engineers. The Engineer is responsible for the design, implementation, and operations and maintenance (O&M) of the multi-tenant Web Content Management as a Service (WCMaaS) platform. This includes managing all aspects of the underlying infrastructure, such as servers and storage. The Engineer will work to continuously enhance the platform, leveraging secure, scalable, cost-effective, and operationally sustainable cloud solutions. Aside from technical qualifications, applicants should have effective communication skills, both written and verbal. This role requires deep expertise in cloud services, infrastructure automation, DevSecOps, open-source ecosystems, enterprise solution design, and software and infrastructure engineering practices. The ideal candidate brings strong technical leadership, the ability to engage with a diverse set of stakeholders, and the ability to adapt solutions to evolving requirements and priorities.

Location: Remote (but must reside and perform all work within the United States)
Work Hours: This position requires working online from 8:00 AM to 5:00 PM Eastern

Day to Day Roles and Responsibilities:
- DevOps Pipeline and infrastructure automation using GitLab and Ansible
- Strong Linux experience for both troubleshooting and System Administration related tasks.
- Apache web platform management
- Develop code using scripting and programming language
- Investigate and resolve issues across existing pipelines and infrastructure
- Perform root cause analysis to address underlying IaC issues and provide solutions to prevent recurrence.
- AWS Infrastructure Management
  - Ensure that all tenants’ AWS resources are secure, FedRAMP compliant, and optimized for performance
  - Collaborate with the Architecture team to implement solutions that align with best practices for AWS cloud infrastructure
  - Adhere to Change Management procedures
- Collaboration and Knowledge Sharing
  - Collaborate with other team engineers to resolve development issues/incidents and implement improvements
  - Document solution designs, process procedures, and lessons learned to enhance team knowledge

Qualifications

Basic Qualifications:
- Bachelor's degree and 5 years experience, or a Master's degree and 3 years experience, or an Associate's degree and 7 years experience, or a High School diploma and 9 years experience.
- Must be a U.S. Citizen and have the ability to obtain/maintain a DHS Public Trust clearance.
- 5+ years of experience in cloud services and infrastructure.
- 3+ years of extensive hands-on experience with automation involving a wide range of AWS services including but not limited to EC2 instances, ASGs, Lambdas, and other services
- Required Certification:
  - AWS Cloud Practitioner
- Required pipeline and infrastructure automation experience:
  - Pipeline Orchestration tool experience required, with GitLab preferred
  - Config as Code is required, with Ansible preferred
  - Python programming language required
- Extensive knowledge and understanding of AWS GovCloud and deploying in Government environments.
- Exemplary communication, analytical skills, and technical knowledge across the client environment.
- Ability to produce concise and clear technical documentation.

Preferred Qualifications:
- Preferred Certifications:
  - RHCSA/RHCE
  - Any AWS associate level certification (SysOps, Developer, Solution Architect)

Peraton Overview
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.

Target Salary Range
$80,000 - $128,000. This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to, the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

EEO
Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.

GitLab, Ansible, Linux, Apache HTTP Server, Python, AWS, Amazon EC2, Amazon Lambda, Infrastructure as Code

United States

$80K - $128K / year

Job Closed
