Flatgigs

Scaling Investor-Backed Startups & Growth Companies

Senior DevOps Engineer – MLOps Focus

DevOps Engineer · Full Time · Remote · Senior · Team 1-10 · Since 2023 · H1B No Sponsor

Location

United Arab Emirates

Posted

59 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Design and maintain secure, scalable cloud infrastructure across Azure and GCP
  • Build and optimise CI/CD pipelines and deployment workflows
  • Implement and manage Infrastructure as Code (IaC) (Terraform or similar)
  • Deploy and operate machine learning pipelines in production environments
  • Manage and scale multi-cluster GPU environments for AI workloads
  • Set up monitoring, logging, and alerting systems to ensure reliability
  • Apply cloud security best practices across infrastructure and access layers
  • Troubleshoot and resolve system, deployment, and performance issues
  • Provide hands-on IT support (user access, devices, internal systems) to ensure smooth day-to-day operations
  • Collaborate with engineering teams to improve scalability, performance, and automation

Job Requirements

  • 4+ years of experience in DevOps / SRE / Infrastructure Engineering
  • Strong experience with Azure and Google Cloud Platform (GCP)
  • Hands-on experience building and managing CI/CD pipelines
  • Solid expertise in Infrastructure as Code (Terraform, Pulumi, etc.)
  • Experience managing Kubernetes environments and GPU workloads
  • Proven experience deploying machine learning models or MLOps pipelines
  • Strong understanding of cloud security, networking, and system reliability
  • Comfortable handling IT support responsibilities alongside core DevOps work
  • Strong problem-solving mindset with the ability to work independently
