Senior DevOps Engineer
Location
Latin America (LATAM)
Posted
37 days ago
Salary
Not specified
Seniority
Senior
Job Description
Senior DevOps Engineer
Workana
Role Description
MedXprts.ai is looking for a Senior DevOps / Platform Engineer to support and evolve a production AI medical-legal report platform. The platform is already live and running on AWS EKS across a multi-account AWS Organization under HIPAA requirements. This is not a greenfield build. The focus is to keep production stable, improve the existing infrastructure, strengthen observability and compliance, and help the team move faster. This is a part-time remote role, around 20 hours per week, ideal for someone senior, hands-on, and adaptable.

Responsibilities
- Maintain and improve a production AWS/Kubernetes platform, including incidents, deploys, rollbacks, IAM changes, and infrastructure troubleshooting.
- Support ongoing platform evolution, including environment separation, release versioning, workflow monitoring, and production migrations.
- Improve observability, alerting, and platform reliability using tools such as Loki and Grafana.
- Support HIPAA-oriented infrastructure hardening, including encryption, access controls, audit logs, and secure operational practices.

Qualifications
- Senior-level DevOps / Platform Engineering experience in production environments.
- Strong AWS experience, including IAM, cross-account access, EKS/Fargate, RDS, DynamoDB, S3, and KMS.
- Solid production Kubernetes, Docker/container, CI/CD, Linux, and networking experience.
- Experience with automation using Python and shell scripting.
- Strong troubleshooting skills and ability to work independently without heavy micromanagement.
- Adaptable, curious, and comfortable learning new technologies as the platform evolves.
- Availability for approximately 20 hours per week, with overlap with U.S. Eastern working hours.

Requirements
Nice to have:
- HIPAA, healthcare, or regulated-environment experience.
- GCP exposure, especially Vertex AI or Cloud Logging.
- Observability experience with Loki and Grafana.
- Exposure to MLOps or AI/ML infrastructure.
- Experience supporting globally distributed or 24/7 production systems.

Benefits
- Remote role open to LATAM candidates.
- Part-time engagement of approximately 20 hours per week.
- Opportunity to work on a production AI platform in the medical-legal / healthcare space.
More DevOps Engineer Jobs
Senior Manager, Cloud & DevOps Engineering
Nomi Health
Rebuilding healthcare with services and technology solutions that deliver easy access to quality, affordable care.
• Own the day-to-day operation of our AWS and Kubernetes infrastructure across multiple business units
• Lead a team that delivers reliably against a roadmap set in partnership with senior technical leadership
• Partner closely with the VP of Technical Operations and Automation, who serves as the architecture lead for DevOps
• Review a Terraform PR, debug a production issue, and coach your engineers through hard problems
• Responsible for the platform meeting its specifications: uptime, security, throughput, access
- Build and scale the infrastructure that powers real-time AI inference across GPU fleets, bare-metal servers, serverless and containerised production systems
- Help evolve Runware’s platform toward more elastic, on-demand infrastructure that can scale quickly with customer traffic and model demand
- Make Runware faster, more reliable and more resilient by improving the critical paths behind our request entrypoints, inference services, queues, storage, load balancers and networking layer
- Automate the hard parts of infrastructure operations, from provisioning and configuration through to CI/CD, deployment safety, progressive rollouts and rapid rollback
- Build the observability backbone for a high-performance AI platform, with the signals needed to spot issues early, understand capacity and fix problems before customers feel them
- Play a leading role in production operations, incident response, debugging and post-incident improvements, helping us turn operational challenges into a stronger platform
- Strengthen the security and compliance foundations of our infrastructure through patching, secrets management, access controls, hardening, auditability, documentation and repeatable operational processes
Senior DevOps Engineer
Sagent
Sagent powers banks and lenders to make loans and homeownership simpler and safer for millions of consumers.
• Operate and improve multi-region GKE clusters hosting hundreds of microservices across multiple environments from development through production
• Manage the Kubernetes platform layer: Istio service mesh, cert-manager, external-dns, RBAC, HPA/KEDA autoscaling, HashiCorp Vault secret injection, and Helm-based deployments
• Develop and maintain Terraform modules across multiple IaC repositories covering GKE, networking (Shared VPC, Cloud NAT, Private Service Connect), Cloud SQL, Cloud Storage, Dataproc, Cloud Composer, Vault, and web hosting
• Maintain and extend Azure DevOps CI/CD pipelines using shared Terraform templates with multi-environment deployment workflows
• Support Confluent Kafka infrastructure including Connect workers with JDBC source connectors, consumer group health monitoring, and Kafka-lag-based autoscaling with KEDA
• Manage Redis Enterprise clusters on Kubernetes with operator-managed lifecycle and replication
• Operate the observability stack: Grafana Cloud (Alloy, Loki, Mimir, Tempo, Pyroscope via Private Service Connect), kube-prometheus-stack, Google Managed Prometheus, OpenTelemetry Operator/Collector, Beyla, and Kubecost
• Harden cluster security posture: NetworkPolicies, Pod Security Standards, admission policy enforcement, CrowdStrike Falcon, Lacework, kube-bench, and cert-manager with Let’s Encrypt ACME
• Support data infrastructure including Cloud SQL (PostgreSQL), Dataproc (Spark), Cloud Composer (Airflow), Matillion CDC pipelines, Snowflake, and BigQuery
• Manage DNS across multiple providers (Azure DNS, Cloudflare, GCP Cloud DNS) via external-dns, and support Azure APIM and Cloudflare CDN/WAF
• Partner directly with application development teams to troubleshoot deployment failures, tune resource limits and autoscaling, and resolve Kafka consumer lag and connectivity issues
• Contribute to the Internal Developer Portal (Backstage) and internal CLI tooling that enables self-service for product engineers.
• Design, implement, and evolve GCP-based infrastructure using Infrastructure as Code with Terraform and Google Cloud deployment automation patterns.
• Build and maintain scalable CI/CD pipelines using Cloud Build, GitHub Actions, Jenkins, or equivalent platforms for application, infrastructure, and platform workloads.
• Administer and optimize GCP delivery workflows including Cloud Build triggers, Artifact Registry, source integrations, deployment approvals, and service account access patterns.
• Partner with engineering teams to improve build, release, and deployment workflows across microservices and cloud-native applications.
• Implement robust observability across systems using Google Cloud Operations Suite, Cloud Logging, Cloud Monitoring, and related telemetry tooling.
• Strengthen platform security by integrating secrets management, policy enforcement, vulnerability scanning, and least-privilege access control.
• Manage and optimize containerized environments using Kubernetes, Helm, and Google Kubernetes Engine (GKE).
• Drive reliability engineering practices including incident response, root cause analysis, SLO thinking, and automated remediation where appropriate.
• Standardize reusable templates, modules, and platform patterns that improve developer productivity and consistency.
• Mentor engineers and provide technical leadership on GCP architecture, deployment automation, release governance, and DevSecOps practices.




