A better way to AI
Senior DevOps Engineer
Location
United States
Posted
67 days ago
Salary
0
Seniority
Senior
Job Description
Senior DevOps Engineer
Entefy
- Create, deploy, and manage high-performing servers
- Deliver millions of requests globally with sub-second latency
- Shape technology from the core in a startup environment
Job Requirements
- 6+ years of experience in deployment automation, secure systems, and fault tolerance
- Demonstrable experience with computer networks
- Ability to quickly learn complex systems and new technologies
- Ability to collaborate well with others
- Familiarity with database concepts and SQL
- Passion for automation over repetitive manual work
- Strong communication skills
More DevOps Engineer Jobs
- Ensure the reliability, availability, and performance of systems and applications in production.
- Implement and maintain monitoring, observability, and alerting solutions with a focus on Datadog, monitoring metrics, logs, and traces.
- Define, track, and report reliability indicators such as SLOs, SLIs, and SLAs.
- Respond to critical incidents, including on-call rotations, ensuring rapid service recovery and conducting root cause analyses.
- Develop and improve automations for deployment, monitoring, scalability, and failure recovery.
- Collaborate with development teams to build resilient and scalable systems.
- Support CI/CD pipelines, infrastructure as code, and operational best practices.
- Document procedures, runbooks, and operational standards, promoting continuous improvement and reducing operational risks.
- Support platform maintenance and evolution by directly monitoring environments, responding to critical incidents, and implementing continuous improvements to reduce recurring failures.
- Strong knowledge of Datadog for monitoring, metrics, alerting, and performance analysis is essential, as is availability to respond to emergency incidents outside business hours according to the defined on-call schedule.
DevOps Engineer
Lucidya | لوسيديا
The leading Customer Experience Management platform geared towards Arab.
- Own and improve CI/CD pipelines, ensuring software is delivered efficiently, reliably, and at scale.
- Manage, deploy, and optimize containerized applications using Docker and Kubernetes.
- Maintain and evolve cloud infrastructure with Infrastructure as Code (Terraform), ensuring high availability and scalability.
- Implement monitoring, alerting, and logging solutions (e.g., Grafana), catching issues before they affect users.
- Automate repetitive operational tasks and support development teams with infrastructure needs.
- Collaborate closely with engineers to optimize system performance, reliability, and release processes.
- Apply security best practices across deployments, pipelines, and infrastructure.

**Day-to-Day You’ll:**
- Operate Kubernetes clusters, manage scaling, and ensure service reliability.
- Build and enhance CI/CD pipelines for new and existing services.
- Monitor system health, troubleshoot issues, and resolve incidents efficiently.
- Support development teams with deployment processes and infrastructure requests.
- Continuously improve automation, observability, and operational efficiency.

**Success Looks Like:**
- Highly reliable CI/CD pipelines with minimal failures.
- Kubernetes environments that scale seamlessly under load.
- Faster incident response and proactive problem detection.
- Reduced manual operational effort thanks to automation.

**First 90 Days:**
- 0–30 Days: Gain deep understanding of Lucidya’s infrastructure, architecture, and workflows.
- 30–60 Days: Contribute to CI/CD pipelines, handle smaller tasks, and support issue resolution.
- 60–90 Days: Successfully onboard 2–3 services to the Kubernetes cluster with full CI/CD, monitoring, and health checks.
Site Reliability Engineer
Lucidya | لوسيديا
The leading Customer Experience Management platform geared towards Arab.
- You’ll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
- You’ll proactively identify and eliminate single points of failure before they become incidents
- You’ll ensure our production systems remain stable, even under increasing scale and load
- You’ll manage and continuously improve workloads across AWS, GCP, or Azure
- You’ll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
- You’ll optimize resource usage to balance performance and cost
- You’ll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
- You’ll troubleshoot issues quickly and ensure smooth deployments and upgrades
- You’ll ensure our containerized workloads perform reliably at scale
- You’ll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
- You’ll define alerting that is meaningful, not noisy
- You’ll respond to incidents, lead root cause analysis, and ensure we learn from every failure
- You’ll write scripts and build tooling to eliminate repetitive operational work
- You’ll continuously improve infrastructure efficiency through automation
- You’ll promote a culture where manual work is a temporary state, not the norm
- You’ll work closely with DevOps and engineering teams to solve performance bottlenecks
- You’ll contribute to CI/CD improvements and deployment reliability
- You’ll help shape reliability best practices across the organization
Forward Deployment Engineer
Welocalize
Reach, Grow, and Engage Global Audiences with Multilingual Content
- Build and maintain automation scripts and lightweight internal tools to support operations teams.
- Develop, troubleshoot, and optimize enterprise integrations to ensure seamless data exchange.
- Implement programmatic quality checks to improve data consistency and accuracy.
- Support the rollout of technical solutions by documenting them and iterating based on feedback.
- Build and support integrations between internal systems and third-party tools.
- Maintain and improve existing workflows as the team's needs evolve.



