Facilitating efficient and sustainable trade.
Senior DevOps Engineer – Cloud, ML Infrastructure
Location
Greece
Posted
74 days ago
Salary
0
Seniority
Senior
Job Description
Kpler
• Design, operate, and improve Kpler’s cloud-native infrastructure (Kubernetes, networking, compute, storage).
• Contribute to Infrastructure as Code, CI/CD pipelines, and platform automation.
• Ensure high availability, reliability, and security of production systems.
• Improve observability, monitoring, alerting, and incident response processes.
• Reduce MTTR and failure rates through structured reliability improvements.
• Optimize infrastructure cost and performance, including compute-intensive workloads.
• Support and help standardize ML/GPU-based workloads within the existing platform model.
• Collaborate closely with ML engineers, data engineers, and backend teams to ensure production-grade deployments.
• Contribute to architectural decisions shaping the evolution of the platform.
Job Requirements
- 5+ years of experience in cloud/platform engineering in production environments.
- Strong hands-on experience with Kubernetes in production.
- Experience with Infrastructure as Code (Terraform preferred).
- Strong knowledge of AWS (or equivalent cloud provider).
- Experience operating distributed systems in 24/7 environments.
- Strong operational mindset (SLOs, monitoring, incident management).
- Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience.
- Strong programming skills (Python or Go preferred).
- Solid understanding of cloud-native architecture and reliability engineering principles.
More DevOps Engineer Jobs
DevOps Engineer
Deutsche Telekom IT Solutions
As Hungary’s most attractive employer in 2025 (according to Randstad’s representative survey), Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees. We have hundreds of large customers, corporations in Germany and in other European countries. DT-ITS received the Best in Educational Cooperation award from HIPA in 2019 and was acknowledged as the Most Ethical Multinational Company in 2019. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.
Role Description
Are you an expert in deploying, observing, and maintaining distributed fleets of devices? Do you build infrastructure that scales effortlessly and recovers automatically from mass reconnections? Join our team to oversee the operational backbone of our edge-to-cloud ecosystem. If you love automating complex deployments and diving deep into observability metrics, you are the right fit for us!
Our project, GroundOS, is not just another screen manager. It is a next-generation Universal Display System (UDS) built to power the future of global mobility. We are building an "Operating System for Reality" that orchestrates massive, data-driven signage networks across critical infrastructure, from major international airports to sprawling public transport systems. GroundOS moves beyond static displays; it uses a state-of-the-art digital twin to process and react to real-time operational data. To guarantee continuous operation, the platform features a resilient, offline-first edge architecture that ensures screens keep running smoothly even if the network fails. Join us to blend high-performance Rust edge computing with modern TypeScript cloud services and help us set a new global standard for how hundreds of millions of passengers experience their journey.
- Manage the deployment, observability, and lifecycle of thousands of remote mini-PCs alongside Cloud components.
- Execute Over-The-Air (OTA) updates reliably across a massive edge fleet.
- Configure and manage NATS JetStream, including Leaf Nodes for edge-cloud bridging, stream retention, and cluster HA.
- Set up and maintain tracing and metrics using OpenTelemetry to monitor cross-system health.
- Architect resilient systems capable of withstanding mass fleet reconnection events (thundering herd) without performance loss.
- Manage secrets, certificates, and secure mTLS communication between edge devices and the central control plane.
- Lead incident management and root-cause analysis for fleet-wide issues.
- Design scalable operations workflows to keep maintenance effort constant as the fleet grows.
Qualifications
- Extensive experience with infrastructure automation and remote fleet management.
- High proficiency in containerization (Docker), specifically optimized for edge devices (multi-arch builds, ARM/x64).
- Deep operational knowledge of NATS JetStream or similar high-throughput event brokers.
- Strong background in observability, tracing, and metric collection.
- Solid understanding of Zero-Trust security architectures and certificate management.
- Ability to remain calm and analytical during high-pressure incident response situations.
- Expert knowledge of agile development.
- Solid knowledge of Scrum.
- Experience working in agile projects and teams.
- Excellent English skills, both written and spoken (B2–C1).
- Excellent technical and analytical skills, as well as problem-solving abilities.
- Ability to handle stressful situations and work independently.
Advantages
- Experience with Google Cloud's GKE for the central cloud control plane.
- Prior experience with specific edge orchestration tools.
Additional Information
- Please be informed that our remote working possibility is only available within Hungary due to European taxation regulation.
Senior Engineer - CI/CD Change Management
GEICO
GEICO is the second-largest private auto insurer in the United States and has built a brand recognized for its memorable Gecko advertisements and its commitment
Title: Senior Engineer - CI/CD Change Management [Hybrid]
Location: Bethesda, United States
Full time
Job Description
At GEICO, we offer a rewarding career where your ambitions are met with endless possibilities. Every day we honor our iconic brand by offering quality coverage to millions of customers and being there when they need us most. We thrive through relentless innovation to exceed our customers' expectations while making a real impact for our company through our shared purpose. When you join our company, we want you to feel valued, supported and proud to work here. That's why we offer The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers.
Position Summary
GEICO is seeking an experienced Senior Engineer with a passion for building high-performance, low-maintenance, zero-downtime platforms and applications. You will help drive our insurance business transformation as we transition from a traditional IT model to a tech organization with engineering excellence as its mission, while co-creating the culture of psychological safety and continuous improvement.
Position Description
Our Senior Engineer is a key member of the engineering staff, working across the organization to provide a frictionless experience to our customers and maintain the highest standards of protection and availability. Our team thrives and succeeds in delivering high-quality technology products and services in a hyper-growth environment where priorities shift quickly. The ideal candidate has broad and deep technical knowledge, typically ranging from front-end UIs through back-end systems and all points in between.
Position Responsibilities
As a Senior Engineer, you will:
- Scope, design, and build scalable, resilient distributed systems
- Build product definition and leverage your technical skills to drive towards the right solution
- Lead in design sessions and code reviews with peers to elevate the quality of engineering across the organization
- Define, create, and support reusable application components/patterns from a business and technology perspective
- Utilize developer tooling and a variety of Azure tools and services across the software development life cycle (task management, source code, building, deployment, operations, real-time communication) to perform advanced-level Java application design, implementation, and maintenance activities under minimal direction
- Mentor other engineers
- Consistently share best practices and improve processes within and across teams
- Build and release software baselines, code merge, branch, and label creation
- Work with development teams on CI/CD and feature-flag code management procedures
- Collaborate with automated testing teams, monitoring teams, and infrastructure teams to ensure reliable deployments
- Resolve dependencies and ensure that deadlines are met
- Support a continuous integration model by streamlining the code changes, triggering an automated code build and test sequence
- Support a continuous delivery model by automating software build and package migration processes
- Create and manage automated YAML-based deployment processes for Java, .NET, or Python solutions
Qualifications
- Advanced programming experience with at least two modern languages such as Java, C++, Python or C#, including object-oriented design
- Proven understanding of microservices-oriented architecture and extensible REST APIs
- Experience building the architecture and design (architecture, design patterns, reliability, and scaling) of new and current systems
- Advanced understanding of DevOps concepts and cloud architecture
- Experience with continuous delivery and infrastructure as code
- Knowledge of developer tooling across the software development life cycle (task management, source code, building, deployment, operations, real-time communication)
- Hands-on configuration skills with code management, work item, and continuous integration tools. Microsoft DevOps experience is preferred. Skills with similar tools (Git, Jenkins) are acceptable
- Proven experience in supporting the Java, .NET, or Python development lifecycle for enterprise-level applications
- Demonstrated knowledge of Continuous Integration/Continuous Deployment (CI/CD) code deployment, branching, and merging strategies
- Knowledge of YAML scripting is a plus
- Ability to write and implement scripts in PowerShell, Ant, Maven, or similar build and deployment languages is a plus
- Experience with microservices and API services deployment
- Proven ability to work collaboratively with development teams with solid verbal and written communication skills
- Strong problem-solving ability
- Ability to excel in a fast-paced, startup-like environment
Experience
- 4+ years of developing and maintaining software deployment processes in a Java, .NET, or Python environment
- 3+ years of experience building the architecture and design of new and current systems
- 3+ years of experience with AWS, GCP, Azure, or hybrid data center
- 2+ years of experience in open-source frameworks, or one of the following: .NET Core, ASP.NET, Angular, or Express
Education
- Bachelor's degree in Computer Science, Information Systems, or equivalent education or work experience
Annual Salary
$105,000.00 - $215,000.00
The above annual salary range is a general guideline. Multiple factors are taken into consideration to arrive at the final hourly rate/annual salary to be offered to the selected candidate.
Factors include, but are not limited to, the scope and responsibilities of the role, the selected candidate's work experience, education and training, the work location as well as market and business considerations.
GEICO will consider sponsoring a new qualified applicant for employment authorization for this position.
The GEICO Pledge:
Great Company: At GEICO, we help our customers through life's twists and turns. Our mission is to protect people when they need it most and we're constantly evolving to stay ahead of their needs. We're an iconic brand that thrives on innovation, exceeding our customers' expectations and enabling our collective success. From day one, you'll take on exciting challenges that help you grow and collaborate with dynamic teams who want to make a positive impact on people's lives.
Great Careers: We offer a career where you can learn, grow, and thrive through personalized development programs, created with your career - and your potential - in mind. You'll have access to industry-leading training, certification assistance, career mentorship and coaching with supportive leaders at all levels.
Great Culture: We foster an inclusive culture of shared success, rooted in integrity, a bias for action and a winning mindset. Grounded by our core values, we have an established culture of caring, inclusion, and belonging that values different perspectives. Our teams are dynamic and multi-faceted, led by supportive leaders, driven by performance excellence and unified under a shared purpose. As part of our culture, we also offer employee engagement and recognition programs that reward the positive impact our work makes on the lives of our customers.
Great Rewards: We offer compensation and benefits built to enhance your physical well-being, mental and emotional health and financial future.
- Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family's overall well-being.
- Financial benefits including market-competitive compensation; a 401K savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance.
- Access to additional benefits like mental healthcare as well as fertility and adoption assistance.
- Supports flexibility: We provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year.
The equal employment opportunity policy of the GEICO Companies provides for a fair and equal employment opportunity for all associates and job applicants regardless of race, color, religious creed, national origin, ancestry, age, gender, pregnancy, sexual orientation, gender identity, marital status, familial status, disability or genetic information, in compliance with applicable federal, state and local law. GEICO hires and promotes individuals solely on the basis of their qualifications for the job to be filled.
Senior Site Reliability Engineer
SEON
The command center for fraud prevention and AML compliance that enriches data, provides context and directs action.
• Ensure the reliability, availability, and performance of our systems by implementing SRE best practices
• Develop and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and the ELK stack
• Manage incident response and root cause analysis for production issues
• Conduct postmortems to learn from failures and drive continuous improvement in the system’s reliability
• Continuously monitor and optimize the performance of cloud infrastructure to ensure efficient resource utilization and cost-effectiveness
• Automate routine tasks and processes to reduce manual intervention and increase efficiency
• Analyze current system capacity and plan for future growth to ensure the infrastructure can scale with increasing demands
• Define, measure, and monitor SLOs and SLIs to ensure that services meet their reliability targets
• Work closely with engineering and product teams to provide feedback and suggestions on new architectures, ensuring they meet reliability and performance standards
• Develop and maintain comprehensive documentation for architecture, infrastructure, and troubleshooting processes
• Provide on-call support to ensure the continuous availability of our applications and infrastructure
• Ensure that systems meet security and compliance requirements, performing regular audits and assessments based on the internal security team’s guidelines
• Stay current with new technologies and industry trends, evaluating their potential impact on our infrastructure and reliability practices
Deployment Engineer
Skydio
Skydio is the leading U.S. drone manufacturer and world leader in autonomous flight.
• Work closely with internal teams to become an expert on Skydio’s Deployment products, processes, specifications, and product roadmap
• Deploy our cloud-connected devices and ensure their software and hardware are functioning and providing value as agreed in the customer’s ecosystem after they have been installed, configured, tested, and modified as per requirements
• Communicate with customers to ensure that all of their needs are understood and addressed
• Collaborate with various internal departments to ensure that they fulfill all customer requests
• Resolve complaints and keep track of all processes that pertain to the client’s needs
• Act as the customer’s representative to ensure that their demands are met, with a focus on improving the customer experience
• Track and manage all implementation projects with our large enterprise customers for successful delivery of technology and services
• Develop and maintain deployment and installation documentation, and document Standard Operating Procedures for the customer to ensure proper usage and value from the deployment
• Quantify product feedback and brief executives to drive software and hardware engineering to better fit our customers’ needs
• Build customer loyalty through proactive support and account management
• Build scalable processes for installation of cloud-connected devices on enterprise-grade secure networks



