Job Closed

This listing is no longer active.

Air Apps

Site Reliability Engineer – SRE

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

France

Posted

43 days ago

Salary

€55K - €68K / year

Seniority

Senior

Bachelor Degree | 4 yrs exp | English

AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform, Go

Job Description

• Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
• Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
• Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
• Optimize system performance, scalability, and incident response workflows to improve uptime.
• Work closely with development and DevOps teams to improve system design for reliability.
• Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
• Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
• Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
• Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
• Participate in on-call rotations to quickly address system failures and minimize downtime.

Job Requirements

4+ years of experience in Site Reliability Engineering (SRE), DevOps, or System Engineering.
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
Strong Linux system administration and networking fundamentals.
Experience with incident management, debugging, and root cause analysis.
Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
Knowledge of load balancing, failover strategies, and distributed systems.
Understanding of security best practices, access control, and compliance requirements.
Strong communication skills and the ability to collaborate with cross-functional teams.

Benefits

Apple hardware ecosystem for work.
Annual bonus.
Top-tier Health and Life Insurance for peace of mind.
Transportation Budget to support your commute needs.
Coverflex benefits package for meal allowances, well-being, and more.
Childcare support.
Air Conference - an opportunity to meet the team, collaborate, and grow together.
Pension Fund to support your long-term financial planning.
Urban Sports Club membership to keep you active.
Meals 100% free at the hub.

More DevOps Engineer Jobs

DevOps Engineer

Avenga

A global IT engineering and consulting company specializing in custom software development.

DevOps Engineer | 43 days ago

Full Time | Remote | Team 5,001-10,000 | H1B No Sponsor

• Design and manage end-to-end GCP infrastructure using Terraform
• Build and maintain GCP organizational structure and landing zones
• Design and manage the application build and deployment lifecycle for modern programming languages
• Develop and maintain CI/CD pipelines
• Manage production workloads on Google Kubernetes Engine (GKE)
• Implement and maintain observability and reliability solutions

AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, SQL, Terraform

Czechia

Job Closed

Java Software Engineer, DevOps

knowmad mood

growing together

DevOps Engineer | 43 days ago

Full Time | Remote | Team 1,001-5,000 | Since 1994 | H1B No Sponsor

• Develop and maintain Java applications and DevOps services.
• Collaborate with teams to improve the software architecture.
• Implement automated tests and Docker containers.
• Participate in agile meetings and contribute to sprint planning.

Ansible, Azure, Docker, Java, JUnit, Kubernetes, Mockito, OpenShift, Spring

Spain

Job Closed

AI Infrastructure & Reliability Engineer

HiBob

HiBob is a modern HR technology company focused on transforming the way organizations operate in today’s dynamic workplace. Its platform streamlines core HR processes, enhances e

DevOps Engineer | 43 days ago

Full Time | Remote | Team 1,350 | Since 2015

Job Description

About Us

HiBob helps modern, mid-size businesses transform the way they manage people, giving HR and managers all they need to connect, engage, develop, and retain top talent. Since 2015, we've achieved consecutive triple-digit year-over-year growth, all backed by our amazing team of Bobbers from across the globe, making us the HRIS of choice for roughly 5,500 midsize and multinational companies and over 1 million users. Our HR platform is intuitive, data-driven, and built for the way people work today: globally, remotely, and collaboratively.

What this role is really about

You'll join a 3-person platform team within our Business Technology group, owning the internal infrastructure that our AI platform and its users depend on. This isn't a product engineering role, and it isn't ticket work or babysitting pipelines someone else built. You're building and operating the internal foundation that the company runs on.

The work covers the full stack of platform engineering: core cloud infrastructure (AWS, Kubernetes, IaC), CI/CD pipelines, AI-driven infrastructure components, and the SRE and observability practice that keeps it all honest: metrics, alerting, incident response, and reliability standards. As our AI capabilities grow, so does the complexity underneath them, and staying ahead of that is central to the role.

If you treat infrastructure as a product (reusable, automated, observable, and built to last), this is your kind of role.

Job Requirements

- 2-4 years of hands-on DevOps, SRE, or infrastructure engineering experience in production SaaS environments.
- Strong AWS experience: multi-account architecture, cross-account IAM, serverless and event-driven services (Lambda, SQS, SNS, EventBridge), and EKS cluster management.
- Proven Kubernetes experience in production, including cross-account migrations and stateful workload management.
- Proficiency with Terraform: repository structure design, module architecture, and CI/CD pipeline implementation.
- Hands-on experience building and maintaining GitHub Actions pipelines for end-to-end CI/CD workflows.
- Working Python proficiency for scripting, internal tooling, and workflow automation.
- Practical experience implementing observability stacks from scratch: metrics, logging, distributed tracing, and alerting.
- Experience owning reliability practices: SLOs, incident response, and postmortem culture.

Nice to have

- Hands-on experience operating LLM APIs in production: rate-limit and quota management, cost attribution per team/model, latency monitoring, and resilience patterns (retries, fallbacks, circuit breakers).
- FinOps experience across cloud, AI, and observability spend.
- Experience introducing self-healing or auto-remediation patterns in production.

Job Responsibilities

- DevOps & AI-Driven Infrastructure: own CI/CD, deployment processes, and release reliability. Build and operate cloud infrastructure that is automated, intelligent, and continuously self-improving, not just managed.
  - Design and build our Terraform repository and IaC pipeline from scratch, with AI-assisted generation, drift detection, and policy enforcement built in.
  - Build AI-driven GitHub Actions pipelines: automated code review, risk assessment, and intelligent deployment decisions.
  - Manage Kubernetes workloads across AWS accounts: zero downtime, fully automated, nothing left behind.
  - Embed AI into the operational layer: proactive drift detection, automated remediation, and intelligent scaling toward a self-healing runtime.
- Reliability & SRE: improve uptime, resilience, and incident response.
  - Define and enforce SLOs/SLIs, error budgets, and on-call practices.
  - Lead incident response, postmortems, and systemic reliability improvements.
  - Own AI-specific reliability: model latency SLOs, token quota monitoring, rate limit handling, fallback and retry strategies, and cost-per-request alerting.
- Observability & Telemetry: increase visibility, reduce noise, improve troubleshooting.
  - Establish and continuously evolve the observability stack: metrics, logs, distributed tracing, and alerting tuned for both application and AI workloads.
- AI / LLM Operations: bring AI systems to production and operate them at scale, with a focus on reliability, performance, and trust.
  - Own the AI infrastructure layer: rate limits, quota management, latency SLOs, and fallback strategies (retries, circuit breakers).
  - Operate LLM APIs in production with resilience and cost attribution per team/model.
- FinOps & Cost Optimization: optimize AI, infra, and logging costs at scale.
  - Build cost visibility and guardrails across AWS, LLM usage, and observability pipelines.

Benefits

Join our village

HiBob is a village filled with amazing people and we're especially proud of that. It's a place where Bobbers can be themselves. We're about fun, dreams, hopes and ambition, just as much as we are about precision, growth, and top performance. Becoming a Bobber means you'll receive competitive compensation, benefits, and pre-IPO equity alongside all of this:

- Company share options plan
- We have a flexible hybrid working model
- Work from home allowance to get your home office set up!
- Payment for sick leave from the first day
- 2 Social Impact days per year for volunteering
- Annual Headspace subscription and wellness benefits
- Awesome employee referral program: $2,500 for each successful referral, with an additional ambassador programme
- Monthly Wolt Allowance
- Transportation allowance
- Dog-friendly
- Temporary remote work from anywhere in the world for up to 2 months (after 6 months of employment)
- Fun company and team social events (locally and virtually with our global teams)
- Bob balance days: 4 additional days within a calendar year
- Enjoy a company-wide long weekend at the beginning of each quarter

If this sounds like something you've been looking for, we'd love to have you. Come on, join our village!

AWS, GitHub Actions, Kubernetes, Python, Terraform

Israel

Job Closed

Senior DevOps Engineer

TekSynap

TekSynap, formerly known as Synaptek, is a privately held, ISO-certified IT company offering solutions and services to meet the business technology needs of local, state, and feder

DevOps Engineer | 43 days ago

Full Time | Remote

Role Description

We are seeking a DevOps Engineer (Senior).

- Facilitates the development of new software solutions and the transition of existing solutions from monolithic structures to a micro-service structure operating within hardened containers.
- Works with application development teams to refactor or create solutions that leverage the DevSecOps CI/CD pipeline and tools.
- Instructs/guides teams through their solution development.
- Deploys and sustains a microservices factory utilizing COTS and open-source solutions.

Qualifications

- Five or more (5+) years of Agile experience.
- Driving strategy and overseeing architecture of continuous integration and deployment, and monitoring across technologies.
- Demonstrated experience with defining SAFe Agile methodology for large scale clients.
- Demonstrated experience in leading DevOps methodologies.
- Demonstrated experience with implementing Test Driven Development (TDD) methodologies.
- Demonstrated experience with driving an automated software development lifecycle toolchain.
- Demonstrated experience with deployments in both on-premise and cloud environments.
- Experience serving as the engineer of complex technology implementations in a product-centric environment.
- Experience with DevOps services using an infrastructure-as-a-service provider (e.g., Amazon Web Services, Microsoft Azure, Google Compute Engine, RackSpace/OpenStack).
- Using scripting or basic programming skills to solve problems.
- Experience with configuration management tools (e.g., TFS, Puppet, Chef, Ansible, Salt, LVM).
- Familiarity with containerization technologies (e.g., LXC, Docker, Rocket, OpenShift).
- Preferred: SAFe Agile Certification or industry-recognized equivalent certification.
- Demonstrated experience supporting government agencies, customers, or contracts within federal environments.
- Certifications: Cloud Service Provider Certification: AWS; Microsoft Azure; Google Cloud Platform.
- Security+
- Bachelor’s Degree
- Clearance: Secret, IT II

Requirements

- Location: Remote with periodic support at Fort Belvoir or other places in the National Capital Region.
- Type of environment: Remote
- Noise level: Low
- Work schedule: Day shift, Monday – Friday. May be requested to work evenings and weekends to meet program and contract needs.
- Amount of travel: less than 10%
- List of Approved States: AL, AK, AZ, AR, CT, DE, FL, GA, ID, IN, IO, KS, KY, LA, ME, MI, MS, MO, MT, NE, NV, NH, NM, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, WV, WI, WY.
- U.S. Citizen
- Secret Clearance

Physical Demands

The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

- Regularly required to use hands to handle, feel, touch; reach with hands and arms; talk and hear.
- Regularly required to stand; walk; sit; climb or balance; and stoop, kneel, crouch, or crawl.
- Regularly required to lift up to 10 pounds.
- Frequently required to lift up to 25 pounds, and up to 50 pounds.
- Vision requirements include close vision, distance vision, peripheral vision, depth perception, and ability to adjust focus.

Benefits

- Competitive benefits package including health, dental, vision, 401K, life insurance, short-term and long-term disability plans, vacation time and holidays.

Equal Employment Opportunity

In order to provide equal employment and advancement opportunities to all individuals, employment decisions will be based on merit, qualifications, and abilities. TekSynap does not discriminate against any person because of race, color, creed, religion, sex, sexual orientation, gender identity, protected veteran status, national origin, disability, age, genetic information or any other characteristic protected by law.

United States
