Job Closed

This listing is no longer active.

CookUnity

We are on a mission to unlock the world's best food creators and bring their dishes to the doorstep of the masses.

Senior Site Reliability Engineer – B2B

DevOps Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 2015 | H1B Sponsor

Location

Argentina

Posted

125 days ago

Salary

Seniority

Senior

Bachelor Degree, English, AWS, Docker, Apache Kafka, Kubernetes, MySQL, PostgreSQL, Terraform

Job Description

• As a Senior Infrastructure Engineer, you will play a crucial role in designing and architecting scalable and reliable platforms that empower our engineers to deliver features to our customers efficiently.
• Manage Kubernetes clusters and EKS environments, ensuring efficient scaling and high availability of our applications.
• Create and maintain a robust framework for Infrastructure as Code (IaC) using Terraform for automated infrastructure deployment and management.
• Implement and manage efficient and reliable CI/CD pipelines with GitLab for seamless software delivery.
• Manage our growing database fleet (including RDS Postgres and MySQL), ensuring seamless database migrations, data integrity, and reliability.
• Design, implement, and manage event streaming platforms (e.g., Kafka, Kinesis) to enable real-time data processing and asynchronous communication between services.
• Implement and maintain robust security measures to ensure the confidentiality, integrity, and availability of our systems.
• Develop and maintain a holistic approach to observability and infrastructure monitoring to proactively detect and resolve issues before they impact our services.
• Actively partner with Compliance and Security teams to plan and execute compliance workstreams for HIPAA, PCI, SOC 2, ISO 27001, and HITRUST.

Job Requirements

Experience with AWS and advanced multi-region cloud networking concepts.
Strong Infrastructure as Code (IaC) knowledge and experience with Terraform.
Extensive experience with containerization technologies such as Docker.
Experience with Kubernetes and EKS to manage containerized applications at scale.
Experience with database management and decentralized database migrations.
Proficiency in implementing and managing CI/CD pipelines to achieve efficient software delivery.
Experience partnering with Security and Compliance teams to deliver audit-ready evidence and drive remediation work.
Working knowledge of common control frameworks and audit processes, including SOC 2 and ISO 27001.
Familiarity with compliance requirements in regulated environments, such as HIPAA, PCI, and HITRUST.

Benefits

💸 Get paid in USD
🗺 Work remotely: design the life that you want
⛱ Enjoy 15 days of vacation each year from the start date
🎄 16 fully paid Argentinean holidays
🩺 Healthcare Benefit: Monthly stipend to use with your preferred healthcare provider
🗓️ 5-year Sabbatical: After 5 years with CookUnity, you get a 4-week paid sabbatical
🐣 Paid Family leave
🕯 Compassionate Leave: 3-5 days each time the need arises
🧘🏽‍♀️ Customize the benefits that suit your needs! Access a range of perks tailored to you, including learning opportunities, wellness memberships, delivery apps, and more through our comprehensive benefit platform
🧑‍🏫 Personalized English coach


More DevOps Engineer Jobs

Staff Site Reliability Engineer

Unqork

Using CaaS (Codeless-as-a-Service) to accelerate time-to-market & eliminate legacy code for the enterprise 🚀

DevOps Engineer | 125 days ago

Other | Remote | Team 201-500 | Since 2017 | H1B Sponsor

• Report to our VP of Engineering.
• Observability Architecture & Strategy: Define and own end-to-end observability architecture, establishing standards for instrumentation and telemetry while leading OpenTelemetry adoption.
• Reliability & Operational Excellence: Use telemetry to proactively manage reliability risks, define SLIs/SLOs, and shift the organization from reactive response to proactive engineering.
• Technical Leadership: Serve as a Staff-level IC providing architectural guidance and mentorship across SRE, DevOps, and application teams.
• System Insight: Improve how the organization understands system behavior and detects/resolves incidents.
• Operational Impact: Directly influence platform uptime, customer experience, engineering velocity, and operational efficiency.
• Data-Driven Decisions: Enable better decision-making regarding performance, scalability, and reliability through data.

AWS, Java, Kubernetes, Python

United States

$160K - $215K / year

Job Closed

Senior DevOps Engineer – AWS

Leega

Inteligência, Inovação e Tecnologia.

DevOps Engineer | 125 days ago

Full Time | Remote | Team 201-500 | Since 2010 | H1B No Sponsor

• Maintain infrastructure and understand application behavior running inside containers.
• Orchestrate complex environments using Kubernetes.
• Act as the technical bridge between development teams and corporate infrastructure.
• Manage and optimize Dockerfiles with a focus on security and layer management.
• Perform advanced container troubleshooting and configure container networking.
• Manage the application lifecycle within Kubernetes clusters.

DNS, Docker, Kubernetes, PostgreSQL, Terraform

Brazil

Job Closed

Senior System Reliability Engineer

Lirio

DevOps Engineer | 125 days ago

Full Time | Remote | Team 51-200

Role Description:

The Senior System Reliability Engineer (SRE) at Lirio is responsible for the reliability, scalability, and performance of our cloud-native applications and infrastructure. This role leads the design and implementation of automation, monitoring, and incident response processes, and mentors other engineers in SRE best practices. The Senior SRE partners with development teams to ensure robust, secure, and highly available systems, and drives continuous improvement in operational excellence.

This role operates as a senior, hands-on reliability engineer embedded with product and platform teams. The Senior SRE is accountable for:
- Defining and enforcing service-level objectives (SLOs)
- Reducing operational toil through automation
- Improving system reliability through proactive engineering rather than reactive support

This is not a ticket-driven operations role; it is expected to influence architecture, development practices, and incident readiness across the platform.

Essential Duties & Responsibilities:

Reliability Engineering & Automation (40%)
- Architect, implement, and maintain automated solutions for deployment, monitoring, alerting, and incident response using Lirio’s technology stack (AWS, Azure, Kubernetes, Kafka, Java, TypeScript, Groovy, Databases/SQL).
- Develop and manage infrastructure as code (e.g., Terraform, AWS CloudFormation).
- Build and optimize CI/CD pipelines for seamless, reliable delivery.
- Define, implement, and continuously refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets for critical services.
- Identify and reduce operational toil through automation, platform improvements, and architectural changes.
- Analyze and optimize the performance of Lirio systems and services.
- Ensure high availability and scalability of services through proactive engineering, load testing, and capacity planning across multi-tenant and client-specific environments.

Peer Reviews & Collaboration (10%)
- Review infrastructure changes, automation scripts, and reliability-impacting code changes to ensure production readiness.
- Collaborate with software engineers to embed reliability, security, and operational best practices into development workflows.
- Partner with software engineering teams during design and architecture discussions to identify reliability risks early.

Operational Support & Incident Management (20%)
- Monitor system health using modern observability tools (e.g., Prometheus, Grafana, Datadog).
- Participate in a defined on-call rotation supporting production systems, with clear escalation paths and expectations.
- Contribute to and maintain incident severity definitions, response procedures, and no-blame postmortem practices.
- Lead incident response, root cause analysis, and postmortems for production issues.
- Triage and resolve issues, ensuring minimal downtime and rapid recovery.
- Support client onboarding and production rollouts by ensuring reliability, observability, and operational readiness standards are met.

Mentorship & Knowledge Sharing (10%)
- Mentor and coach engineers on reliability engineering principles, operational ownership, and incident response best practices.
- Design processes to share operational knowledge and avoid single points of failure.
- Advise colleagues on architecture and reliability strategies.
- Help establish shared operational ownership across teams to reduce single points of failure and knowledge silos.

Continuous Learning & Innovation (10%)
- Stay current with industry trends in reliability engineering, cloud operations, and automation.
- Bring innovation to operational practices and system design, evaluating and introducing new tools and technologies as appropriate for Lirio.
- Evaluate new tooling with an emphasis on operational simplicity, security, and long-term maintainability.

Documentation & Process Improvement (5%)
- Define and document operational processes, incident response playbooks, and reliability standards.
- Contribute to operational planning, incident reviews, and reliability documentation.

Qualifications:
- 5-7 years related experience
- Bachelor's Degree in related field
- Linux systems and networking fundamentals (DNS, TCP/IP, TLS)
- Distributed systems debugging and failure analysis
- Load, stress, and fault-injection testing
- CI/CD tools and processes
- Version control (e.g., Git)
- Cloud platforms (e.g., AWS, Azure)
- Containers and orchestration (Kubernetes)
- Kafka (messaging/streaming)
- Scripting and programming languages (e.g., Java, TypeScript, Groovy, Python)
- Agile methodologies (e.g., Scrum, XP, SAFe)
- Databases/SQL
- Observability/monitoring tools (Datadog)

Benefits:
- Medical (HSA available)
- Dental
- Vision
- Short-term & long-term disability (company-paid)
- Life & AD&D (company-paid)
- 401K with company match
- 10 paid holidays, quarterly company closure dates, + holiday week company closure
- Flexible time off policy
- Work from home
- 6 weeks paid parental leave

Salary Range: $130k-$150k

AWS, Azure, Kubernetes, Apache Kafka, Java, TypeScript, Groovy, SQL, Terraform, Prometheus, Grafana, Datadog, Git, Python, Linux, DNS, TCP/IP, TLS

United States

$130K - $150K / year

Job Closed

Senior DevSecOps Engineer

Element 84

Accelerating and scaling impactful projects with great software and design. Geospatial, cloud, and petabyte-scale data.

DevOps Engineer | 125 days ago

Other | Remote | Team 51-200 | Since 2010 | H1B No Sponsor

• Design, implement, and maintain secure cloud solutions across AWS, Azure, and GCP to meet mission and compliance requirements.
• Assist in developing and maintaining essential security artifacts, including System Security Plans (SSPs), Security Assessment Reports (SARs), and Plans of Action and Milestones (POA&Ms).
• Analyze complex cloud and system architectures to identify security risks and recommend effective mitigation strategies.
• Apply and document security controls based on NIST 800-53 and NIST 800-171 standards.
• Collaborate with all functional areas of the team to embed security into CI/CD pipelines and automate security checks.
• Assist in cloud-based incident response and lead vulnerability remediation efforts.
• Provide expert guidance on cloud security best practices, including encryption, access controls, identity management, and data protection.
• Evaluate, recommend, and implement cloud-native and third-party security tools.
• Participate in design reviews, risk assessments, and change control processes to ensure the security of new systems and changes.
• Lead annual security assessments and ongoing monitoring activities to maintain a strong security posture.
• Advise Information System Owners (ISOs) on system security and compliance matters.
• Oversee security posture for cloud infrastructure and monitor tenant security control implementation.
• Support the development and maintenance of ISAs between tenants and Cloud Computing Services.

AWS, Azure, GCP

Arizona + 18 more

$150K - $180K / year
