Job Closed

This listing is no longer active.

Thomas Talent Network

Building agents for the global entertainment industry and creator economy. Professionals at WME, UTA, Netflix, Night, and Live Nation use the product today in private beta. The company has a massive trust moat: a proprietary distribution network of 270K+ verified entertainment professionals. Founded by Vince Morales (2x founder, UTA Ventures, Elevate Ventures), Warner Bailey (2x founder, WME, Live Nation), and Ryan McCaffrey (WMG, Hebbia AI, Robinhood). Technical talent from NVIDIA, Intuit, and HubSpot. Oversubscribed its $400K round through angels from Coatue, Ramp, Plug and Play, FanFix, Temple Hill, Outshine Talent, and Undercurrent Talent. Now raising $2M with $500K+ already committed.

Senior DevOps Engineer / Site Reliability Engineer

Location

United States

Posted

37 days ago

Salary

$140K - $160K / year

Seniority

Senior

Job Description

Role Description

A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North America operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform. This is a newly created role supporting our North America team. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment.

- Design, develop, and maintain unified operations and platform management systems covering resource management, monitoring & alerting, configuration management, and automated operations & maintenance
- Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to enable intelligent O&M
- Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)
- Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades
- Promote SRE concepts and engineering practices; organize technical knowledge-sharing and training; build a reliability engineering system
- Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms

Qualifications

- Currently residing in California or North Carolina, USA
- US Green Card or US Citizenship (work authorization; no sponsorship available)
- Fluent in Mandarin Chinese (working language; close collaboration with domestic R&D required)
- Bachelor's degree or above in Computer Science or related field
- 4-6 years of hands-on experience in DevOps/SRE/Platform Engineering
- Proficient in at least one major cloud platform (AWS/Azure/GCP) with deep understanding of VPC, EC2, EKS/K8s, RDS, IAM
- Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance
- Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm
- Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.
- Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry
- Proficient in at least one development/scripting language: Python, Shell, Go
- Excellent system design, analysis, and troubleshooting skills
- Strong cross-team communication and collaboration abilities

Preferred Qualifications

- Master's degree in Computer Science or related field
- Experience with global platforms, cross-border SRE, multi-cloud O&M
- Led platform reconstruction, self-healing systems, or observability initiatives
- Go development, service mesh, chaos engineering, capacity planning experience
- Demonstrated success improving system availability, reducing incident rates, increasing automation
- Global technical vision and cross-cultural collaboration experience
- Result-oriented, self-driven, experienced in technical evangelism/sharing

Compensation

- Base Salary: $140,000 - $160,000 annually (top candidates may receive 5-10% upward adjustment)
- 401(k): Dollar-for-dollar match, up to 4% of salary
- Medical Insurance
- PTO: 12 days annually
- Social Security & Housing Fund: Contributed per US legal requirements

Work Environment

- Location: Silicon Valley, CA OR Raleigh, NC (homebase available)
- Department: Tech O&M Department
- Working Style: Remote-first
- Hours: 8 hours per day, weekends off
- Travel: No business travel required
- Expected Start: ASAP

Interview Process

- Round 1 (Online): Middle Platform Director + Technical Expert
- Round 2 (Online): Head of HR
- Round 3 (Online): CEO/Founder

More DevOps Engineer Jobs

Site Reliability Engineer

AceHack 4.0

Innovate - Elevate - Transform | 24 Hour in-person Hackathon in Jaipur

DevOps Engineer | 37 days ago
Full Time | Remote | Team 11-50 | Since 2022 | H1B No Sponsor

• Own reliability, availability, and performance of production systems running in cloud environments
• Define and monitor SLIs/SLOs and help manage error budgets across the platform
• Lead incident response efforts including detection, triage, mitigation, and postmortems
• Improve observability through logging, monitoring, alerting, and dashboards
• Automate operational workflows and reduce manual toil wherever possible
• Partner closely with engineering teams to improve system resiliency and scalability
• Assist with capacity planning, infrastructure optimization, and performance tuning
• Build internal tooling, runbooks, and operational best practices
• Support Kubernetes-based infrastructure and distributed systems at scale
• Act as an escalation point for complex production and platform issues

United States
$180K - $250K / year

Role Description

Are you motivated by working with technology, solving challenges, and making a difference in dynamic environments? As part of the team, you will play an essential role in sustaining and evolving our infrastructure, contributing directly to the stability of our solutions and the continuity of our products. Your challenge will be to bring a technical eye and a sense of responsibility to incident resolution and day-to-day operations, ensuring efficient, well-handled responses. We are looking for someone who organizes, documents, and helps keep problems from recurring. We want a person with initiative, technical breadth, and a sense of prioritization, able to assess scenarios, propose viable paths, and contribute continuous improvements. It is also important to look critically at what already exists and help the environment evolve in a consistent, structured way. If you enjoy dynamic, collaborative environments with room for hands-on work day to day, this may be the ideal opportunity for you. Let's evolve the foundation that supports our technology together! 🚀

Responsibilities

- Manage and evolve cloud environments (Azure, AWS, or GCP), ensuring availability, scalability, and efficiency.
- Maintain and evolve infrastructure as code (Terraform, Ansible) and container-based platforms (Kubernetes, Docker).
- Define, track, and champion SLIs and SLOs as real decision-making tools.
- Implement and improve observability: monitoring, logs, and tracing (Prometheus, Grafana, Elastic/Sentry).
- Respond to incidents (on-call), reduce MTTR, and run post-mortems with concrete lessons learned.
- Automate operational processes and CI/CD pipelines, and systematically eliminate toil.
- Monitor and optimize infrastructure costs (FinOps), ensuring efficient use of compute resources.
- Support architectural decisions, balancing cost, performance, security, and reliability.

Qualifications

- Cloud: Azure, AWS, or GCP (managing environments, networking, IAM, and costs)
- Kubernetes (kubectl, Helm) and Docker: operation and troubleshooting in production
- IaC: Terraform/Ansible
- CI/CD: GitHub Actions/Jenkins
- Advanced Linux
- Observability: Prometheus, Grafana, plus at least one APM tool (Elastic, Sentry)
- Networking and security: DNS, TCP/IP, load balancers, VPN, firewalls, IAM
- Scripting: Bash and Python
- Technical English (reading documentation)

Differentials

- Certifications: CKA, AWS, Azure, or Terraform Associate
- Messaging: Kafka or RabbitMQ
- GitOps: ArgoCD or FluxCD
- Experience with Platform Engineering
- Knowledge of FinOps and infrastructure cost optimization

Benefits

- Multi-benefit card (Swile)
- Health plan: Unimed
- Conexa Plus + Psicologia Viva
- Dental plan: Metlife
- Life insurance: Metlife
- TotalPass
- Day off in your birthday month
- Partnership with an English and Spanish course
- 20 business days of rest

Brazil
Job Closed
DevOps Security Contractor

Survatra

Communicate. Collaborate. Create.

DevOps Engineer | 37 days ago
Part Time | Remote | Team 1-10 | H1B No Sponsor

• Provide ongoing DevOps and security guidance to engineering and leadership
• Review current infrastructure (cloud, CI/CD, access controls) and recommend improvements
• Conduct periodic security audits and risk assessments
• Advise on and help implement best practices across cloud security, IAM, and data protection
• Support incident response for security-related events, as well as helping refine our incident response procedures
• Review and strengthen deployment pipelines and system architecture
• Assist with security tooling selection and implementation (monitoring, alerting, vulnerability scanning)
• Help ensure alignment with SOC 2 and general compliance standards
• Partner with engineering on secure system design and new builds when needed
• Document recommendations and maintain lightweight security playbooks

United States
Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

• Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
• Drive operational improvements in change management and daily operations by following procedures.
• Manage and operate large scale IP network technologies and infrastructures.
• Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.
• Monitor and support the network health of on-premises and cloud infrastructures.
• Collaborate and develop workflow enhancements while documenting best practices.

California
$136K - $264.5K / year