Job Closed
This listing is no longer active.
We Are Human. We Are Digital. We Are Verity!
SRE / DevOps Engineer
Location
Brazil
Posted
50 days ago
Salary
Not disclosed
Seniority
Senior
Job Description
SRE / DevOps Engineer
Verity Group
- Design, implement, and evolve CI/CD pipelines
- Provision and maintain infrastructure on GCP using Terraform and Ansible
- Operate and scale Kubernetes environments (GKE)
- Define, implement, and monitor SLIs, SLOs, and error budgets
- Build observability, alerting, and APM (Dynatrace experience is a plus)
- Work closely with squads, promoting platform engineering and reliability best practices
Job Requirements
- Hands-on experience with Google Cloud Platform (GCP)
- Strong knowledge of Kubernetes (GKE) and Docker
- Proven experience with Terraform and Ansible
- Experience with CI/CD pipelines (GitLab CI, GitHub Actions, Jenkins, or similar)
- Administration of Linux environments
- Experience working in agile environments (Scrum/Kanban)
- Ability to work autonomously in critical environments
- Clear technical communication with developers and stakeholders
- Proactive approach to identifying and mitigating operational risks
Benefits
- Meal allowance
- Food allowance
- Home office allowance
- Health insurance
- Dental insurance
- Life insurance
- Employee discount partnerships
- Partnerships with retailers and educational institutions
- Ongoing agile training
- Alura subscriptions
- Verity Break
- #VerityComVocê
- Viva Engage
More DevOps Engineer Jobs
- Partner with product and platform engineering teams to improve system reliability, scalability, and developer experience
- Build, maintain, and evolve CI/CD pipelines to support safe, fast, and reliable deployments
- Improve observability through better monitoring, alerting, logging, and telemetry
- Implement and maintain Infrastructure as Code (Terraform) to manage cloud resources safely and reproducibly
- Operate and scale containerized workloads (Kubernetes, Docker)
- Support and evolve Zipline's AWS-based cloud infrastructure (experience with GCP or Azure is a plus)
- Assist our Data team in codifying and maintaining our data warehouse and ML infrastructure on GCP
- Participate in an on-call rotation, responding to and resolving production incidents
- Contribute to incident follow-ups and postmortems by helping implement durable fixes and reducing operational toil
- Collaborate with Rails-focused product teams to improve reliability, performance, and deployment workflows
About 10a Labs:
10a Labs is the safety and threat-intelligence layer trusted by frontier AI labs, AI unicorns, Fortune 10 companies, and leading global technology platforms. Our adversarial red teaming, model evaluations, and intelligence collection enable engineering, safety, and security teams to stay ahead of evolving threats and deploy AI systems safely.
3–8 Years of Industry Experience | Remote | High-Impact
About the Role:
We’re looking for an infrastructure-focused engineer who thrives at the intersection of machine learning, systems, and product delivery. This is a hands-on role responsible for deploying, monitoring, and scaling a real-time ML-powered content moderation system used to detect and triage abuse, threats, and edge-case language. You’ll work closely with ML engineers, researchers, and clients to build infrastructure that makes high-performance models accessible and reliable in the wild.
In This Role, You Will:
- Design and maintain cloud infrastructure (GCP or AWS) to support real-time model serving, data ingestion, and evaluation workflows.
- Deploy and optimize APIs for low-latency access to ML models and embedding search systems.
- Manage and optimize the end-to-end training data flow, from sourcing and cleaning datasets to preparing them for model consumption, ensuring accuracy, scalability, and efficiency.
- Build observability tooling for production ML pipelines (monitor latency, error rates, request volumes, drift).
- Automate model deployment, retraining, and evaluation pipelines (CI/CD for ML).
- Work with ML engineers to package models for serving.
- Help manage vector databases and semantic search infrastructure (e.g., Pinecone, FAISS, Vertex Matching Engine).
- Ensure security, compliance, and uptime of infrastructure supporting safety-critical systems.
We’re Looking for Someone Who:
- Has 3–8 years of experience deploying machine learning systems or high-availability backend systems.
- Has shipped and maintained production infrastructure at scale, supporting ML workflows.
- Has experience with GCP, AWS, or similar platforms (including managed ML services).
- Is proficient in Terraform, Docker, Kubernetes, or similar infra tools.
- Understands performance tradeoffs in serving models and embedding search pipelines.
- Can work cross-functionally with ML, security, and product teams to deploy safely and iterate fast.
- Brings a builder's mindset and bias for ownership in ambiguous environments.
Nice to Have Experience With:
- Vector databases or ANN systems, preferably within GCP (or AWS).
- Serving LLMs or embedding-based models via API.
- Model monitoring, logging, and metrics platforms (e.g., Prometheus, Grafana, Sentry).
- Trust & safety infrastructure, abuse detection, or policy enforcement systems.
What Success Looks Like in the First 3 Months:
- You’ve deployed and monitored a real-time ML inference system with well-defined observability.
- You’ve implemented an API with latency under 200ms for embedding or classifier-based inference.
- You’ve partnered with ML engineers to streamline deployment and retraining workflows.
- You’ve built logging and monitoring that gives insight into system performance and classifier behavior.
Compensation & Benefits:
- Salary Range: $130K–$230K, depending on experience and location.
- Bonus: Performance-based annual bonus.
- Professional Development: Support for continuing education, conferences, or training.
- Work Environment: Fully remote, U.S.-based.
- Health Benefits: Comprehensive health, dental, and vision coverage.
- Time Off: Generous PTO and paid holiday schedule.
- Retirement: 401(k) plan.
Location: Austin, Texas; Reston, Virginia; or Fully Remote
About the Opportunity:
We are seeking a DevOps Manager with deep expertise in Kubernetes, Terraform, and Ansible to help scale Seekr’s AI platform across on-premises, cloud, and SaaS environments. You’ll be highly hands-on, juggling multiple projects, mentoring engineers, and driving complex initiatives to deliver robust, scalable, and reliable systems. On-prem experience is highly preferred.
This role demands a strong foundation in Linux, networking (both traditional and Kubernetes), container technologies, and automation. You’ll collaborate closely with software engineering teams, own critical infrastructure, and solve challenging operational and scalability problems in fast-paced, dynamic environments.
From your first day, you will make a valuable, and valued, contribution. We are a fast-growing company where no one is a bystander. We offer you the opportunity to delight millions of consumers around the world while gaining meaningful experience across a variety of disciplines.
Duties and Responsibilities:
- Lead development of solutions to complex reliability, performance, and scaling challenges.
- Design, architect, and implement systems, networks, and services powering Seekr’s platform.
- Provide hands-on leadership and mentorship to the team.
- Partner with software engineering teams to build scalable, efficient, and reliable services.
- Identify and resolve operational inefficiencies through automation.
- Troubleshoot and lead response to deployment and production incidents.
- Implement and enforce security best practices, ensuring infrastructure, deployments, and data are protected at every stage.
Skills and Qualifications:
- Technical Leadership: 12+ years of experience, proven ability to deliver results in a high-pressure, dynamic environment, strong communication skills, roadmap and long-term strategy, mentoring senior engineers.
- Kubernetes & Distributed Systems: Enterprise-scale K8s with custom operators/controllers, multi-platform clusters, hybrid fleet orchestration across cloud and edge, K8s control plane, K8s upgrades, Docker, containerd, CRI-O, ingress controllers (Istio, NGINX, Traefik), K8s databases, Helm charts.
- Database Management: Postgres, Elasticsearch/OpenSearch, Kubernetes databases, StatefulSets.
- Networking: L2/L3 protocols (BGP, OSPF, VLANs, IPSec), VPNs, firewalls, redundancy paths, bare-metal Linux networking, CoreDNS, Calico, K8s service mesh (Istio).
- Infrastructure Automation: Ansible, Terraform, CI/CD pipelines, GitLab, ArgoCD, MAAS, scripting (Python, Golang, Bash), AWS, Azure.
- Observability: Grafana, Prometheus, Loki, Tempo, ELK, OTEL.
- Security: Zero-trust architecture, PKI, mTLS, SPIFFE/SPIRE, certificate automation, CVE remediation, secrets management, IAM.
- Incident Management & RCA: End-to-end incident lifecycle, root cause analysis, corrective action ownership.
About the Company:
Seekr is a leader in explainable and trustworthy artificial intelligence designed to power mission-critical decisions in enterprises, government, and regulated industries. SeekrFlow™, our end-to-end AI platform, provides secure, auditable AI solutions tailored to sectors where transparency, accuracy, and compliance are paramount. Available across cloud, on-premises, and edge environments, SeekrFlow reduces bias, strengthens data integrity, and simplifies model oversight so organizations can rely on trusted AI decisions in high-stakes settings that impact society’s most sensitive and vital systems.
Trusted by leading enterprises and government agencies, we partner with defense, finance, telecom, and critical infrastructure leaders to enable AI solutions that drive real-world results with unmatched transparency and control. We are a team of strategic thinkers and problem-solvers tackling the toughest challenges facing critical infrastructure and global enterprises through best-in-class AI models and customer deployment. Our team operates with unwavering commitment to our core values and mission:
- We are driven by outcomes: our customers' success is what we strive for every day.
- We believe trust is earned, which is why we build explainability and transparency into the entire AI lifecycle.
- We take our responsibility to deliver secure AI seriously.
- We believe innovation drives progress: we are building the technologies that power the systems our society depends on.
Company Benefits:
- Meaningful Mission & Impact – Work with a deeply talented, collaborative team solving some of the toughest AI challenges that matter.
- Equity Ownership – RSUs that let you share directly in Seekr’s long‑term success and growth.
- Time Off That Respects Real Life – Unlimited PTO plus 14 paid company holidays to truly recharge.
- Work Your Way – A flexible hybrid work environment with offices in Reston, VA and Austin, TX, plus remote options and flexible working hours.
- Competitive Total Rewards – A role‑appropriate compensation structure that supports long‑term growth, including base salary, bonuses, or commission plans depending on role.
- 401(k) with Company Match – Build your future with a retirement plan that includes employer matching.
- Comprehensive Health & Wellness – Medical, dental, vision, and life insurance coverage starting day one, for you and your family.
- Parental Leave – Paid parental leave to support employees as they welcome a new child through birth, adoption, or foster placement.

