Workforce analytics platform that gives managers actionable insights to improve team productivity and performance.
DevOps Engineer
Location
Brazil
Posted
64 days ago
Salary
0
Seniority
Senior
Job Description
DevOps Engineer
Time Doctor
• Architect, manage and scale our cloud-native infrastructure on Google Cloud Platform and AWS
• Work hands-on with modern serverless technologies, containerized architectures and container orchestration platforms
• Leverage the full range of cloud services to ensure high availability, security and performance
• Requires deep expertise in infrastructure as code (Terraform), automated CI/CD pipelines and cloud-native best practices
Job Requirements
- Bachelor's degree in Computer Science, related technical field or equivalent practical experience
- 3-5 years of hands-on DevOps experience with production cloud environments
- Strong expertise in Google Cloud Platform (GCP), including: Cloud Run, Cloud Functions, GKE, VPC networking, Cloud Armor, Load Balancers, IAM, Secret Manager and security services
- Advanced Terraform proficiency: infrastructure as code for complex multi-environment setups; module development and state management; Terraform Cloud/Enterprise workflows
- MongoDB Atlas administration: cluster configuration, sharding and replica sets; backup/recovery strategies and performance tuning; network peering and security configuration
- Container technologies: Docker containerization and multi-stage builds
- CI/CD expertise: GitHub Actions workflows; Cloud Build pipelines; GitOps practices and automated deployments
- Strong Python and Bash scripting skills
- Experience with Sentry/Datadog or similar APM/monitoring platforms
Benefits
- 100% remote and async-first — work from anywhere
- Competitive pay + 30+ days of paid time off
More DevOps Engineer Jobs
Principal Site Reliability Engineer
Fidelity Investments
Founded in 1946 and headquartered in Boston, Massachusetts, Fidelity Investments is a financial services corporation specializing in investment management, reti
Title: Principal Site Reliability Engineer
Location: 100 New Millennium Way, Bldg 1, Durham NC

Job Description:

Position Description:
Combines Operational excellence with Development experience to deliver services at high scale, high availability with resilience. Builds reliability into the ecosystem by applying best practices in Resiliency Engineering, Automation, Observability and Chaos Testing. Streamlines and accelerates software delivery cycle by using DevOps practices and toolchain. Integrates Site Reliability Engineering (SRE) practices (Observability and Chaos) with DevOps processes and delivery pipelines to stop bad code from reaching production. Ensures business-critical enterprise systems are continuously available to internal and external customers. Implements technical standardization and process refinements within the engineering organization and for Site Reliability Engineers. Collaborates with production support teams to define and implement processes for the identification, collection, and analysis of incident data. Brings together technical, procedural, and financial data to reduce toil and increase efficiency.

Primary Responsibilities:
- Develops Chaos Testing capabilities using multiple Chaos Tools (AWS Fault Injection Service (FIS), Chaos Mesh, and Chaosd) and Chaos Toolkit.
- Develops and enhances organization’s internal Chaos Framework to streamline Chaos Executions and reporting.
- Provides specialized technical expertise in the adoption of Chaos Engineering by application teams.
- Chaos tests and observes business-critical applications to understand the weaknesses and increase application resiliency.
- Activates Observability for the critical applications with recommended Service Level Indicators and Service Level Objectives for Latency, Availability, Error Rate etc.
- Utilizes modern monitoring tools (Datadog, Splunk, Catchpoint etc.) to reduce mean time to detect an issue and improve the response times.
- Creates CI/CD pipelines with security and quality checks with Application Lifecycle management toolchain. Helps in integrating Chaos and Observability with CI/CD pipelines.
- Automates repetitive activities using scripting languages (Python, Groovy etc.).
- Implements and supports solutions based on cloud platforms AWS/Azure and container orchestration Kubernetes.
- Onboards/Evaluates New Cloud services that help to enhance the Resiliency of cloud ecosystem. Serves as a liaison for vendor engagement.
- Participates in incident management, problem management and incident postmortems.
- Takes part in peer code reviews providing qualitative feedback.
- Builds processes and capabilities to adapt and respond to risks, and disruptions, while maintaining business operations and data recovery with minimal disruptions.
- Coaches peer SREs and application teams on SRE and DevOps.
- Implements Agile methodologies in the team’s project completion using incremental and iterative steps.

Education and Experience:
Bachelor’s degree in Computer Science, Engineering, Information Technology, Information Systems, or a closely related field (or foreign education equivalent) and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) implementing resilient container and cloud-based applications and infrastructure solutions, using DevOps or SRE practices, in a financial services environment.

Or, alternatively, Master’s degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, or a closely related field (or foreign education equivalent) and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) implementing resilient container and cloud-based applications and infrastructure solutions, using DevOps or SRE practices, in a financial services environment.

Skills and Knowledge:
Candidate must also possess:
- Demonstrated Expertise (“DE”) improving application resiliency by implementing chaos engineering to build system's capability to withstand turbulent conditions in production, using Chaos Mesh, Chaosd, Azure Chaos Studio, AWS FIS, or Gremlin; and driving automation to implement scalable approaches for the planning, design, execution, and reporting of chaos testing using Jenkins pipelines, standard frameworks, data visualization, and dashboards.
- DE implementing advanced observability practices and techniques in production and pre-production environments, at scale using Datadog, Splunk, or Catchpoint; tracking the error budget, proactively identifying issues, minimizing Mean Time to Repair (MTTR); and balancing customer expectations by implementing Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs) using logs, traces, monitors and synthetic tests.
- DE migrating and maintaining cloud applications and creating cloud solutions using Amazon Web Services (AWS) or Azure cloud services; Implementing infrastructure as code for cloud; Onboarding new AWS or Azure services with required reviews and security controls in non-production and production environments; and researching evolving cloud ecosystem to adopt machine learning based tools (AWS DevOps guru) to boost AIOps abilities.
- DE implementing CI/CD pipelines in both production and non-production environments using Application Lifecycle Management (ALM) tools (JIRA, GitHub, Jenkins, SonarQube, Artifactory, or uDeploy) to enable faster code delivery, enhanced software quality, reliability, and security; and developing products, and core and common capabilities for the organization to reduce toil and drive standardization, using containerization and orchestration technologies (Docker or Kubernetes), Infrastructure as Code (IaC) tools, scripting languages (Python or Groovy), and engineering best practices.

#PE1M2 #LI-DNI
Certifications:
Category: Information Technology

Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office. This does not apply to Remote or fully Onsite roles. Some roles may have unique onsite requirements. Please consult with your recruiter for the specific expectations for this position.

Please be advised that Fidelity’s business is governed by the provisions of the Securities Exchange Act of 1934, the Investment Advisers Act of 1940, the Investment Company Act of 1940, ERISA, numerous state laws governing securities, investment and retirement-related financial activities and the rules and regulations of numerous self-regulatory organizations, including FINRA, among others. Those laws and regulations may restrict Fidelity from hiring and/or associating with individuals with certain Criminal Histories.

Role Description
The Senior Fullstack Engineer is a critical role responsible for developing core platform functionality. This includes creating tools to connect our Care Team with Members and building self-service options for Members. A key part of the role is defining the technical architecture and solutions necessary for efficient, rapid, and effective scaling. The goal is to deliver care to our Members that is easy, personalized, and highly effective. This role requires close collaboration with the engineering team, as well as with product and design, to successfully implement the defined product vision and roadmap.

Key Responsibilities:
- Deliver full stack functionality on our solution that connects patients and clinicians through our web-based portal and backend interfaces and APIs.
- Support what is built, including monitoring, performance tuning, and responding to incidents.
- Propose viable technical solutions to business needs that align with Ilant Health’s mission and values.
- Contribute to the advisement of technical strategy, primarily related to architecting and scaling of current and new products.
- Identify bottlenecks and implement improvements to processes, tools, and procedures.
- Promote a culture of collaboration and learning across engineering, product, and design teams via mentoring, documentation, presentations, or other knowledge sharing methods.

Qualifications
- Experience being on a small to medium sized engineering team (3 - 8 people) to deliver consumer or business facing features in a fast-paced environment.
- Proven ability to deliver full stack development directly delivering value to patients and providers using technologies like Python, Next.js and TypeScript.
- Effectively communicate between teams and within teams in order to drive alignment and increase effectiveness on delivery.
- Ability to deal with ambiguity, demonstrate ownership, and lend your expertise to guiding the technical product roadmap.
- Proven ability to switch domains and tech stacks then add value on Day 1.

Requirements
- Language: React, Python, Next.js, TypeScript
- Systems: AWS, Amplify, ECS, Postgres

Benefits
- Fully remote environment – work from anywhere while maintaining meaningful collaboration with a distributed team
- Comprehensive health benefits – medical, dental, and vision coverage to support you and your family
- Paid time off – 2 weeks of PTO to rest, recharge, and take the time you need
- Flexible floating holiday – one additional day each year to celebrate what matters most to you
- Paid sick leave – 5 sick days so you can prioritize your health when needed
- 11 paid company holidays throughout the year
- 401(k) retirement plan to help you invest in your future
- Healthcare and Dependent Care FSA options for additional tax-advantaged savings
Senior DevOps Engineer
General Dynamics
A business unit of General Dynamics, General Dynamics Information Technology (GDIT) supports some of the United States' most complex government, defense, and in
• Design and define system architecture for new or existing computer systems
• Set up, maintain, and develop continuous build/integration infrastructure
• Create and maintain fully automated CI build processes for multiple environments
• Develop build and deployment scripts
• Support CI/CD tools integration, operations, change management, and maintenance
• Support full automation of CI/CD Development and Testing
• Support policies, standards, guidelines, governance and related guidance for CI/CD operations and development
• Enable successful release management by moving code from Development and Testing environments to Staging and Production
Staff Site Reliability Engineer
Thrive Market
Thrive Market is a membership ecommerce platform that aims to provide every American family with high-quality, natural products at affordable prices. Potential hires seeking work-l
ABOUT THRIVE MARKET
Thrive Market was founded in 2014 with a mission to make healthy and sustainable living easy and affordable for everyone. As an online, membership-based market, we deliver the highest quality healthy and sustainable products at member-only prices, while matching every paid membership with a free one for someone in need. Every day, we leverage innovative technology and member-first thinking to help our 1,700,000+ members find better products, support better brands, and build a better world in the process. We are also a Certified B Corporation, a Public Benefit Corporation, and a Climate Neutral Certified company. Join us as we bring healthy and sustainable living to millions of Americans in the years to come.

THE ROLE
We’re looking for a Staff Site Reliability Engineer to help define and build the reliability foundation for Thrive Market’s platform. You’ll be working with a first-class group of engineers to establish our SRE practice from the ground up: defining SLOs, SLIs and Error Budgets, building observability into everything we do, and creating the frameworks that ensure our systems scale reliably during our company’s rapid growth. This is a high-impact role at an exciting inflection point. We’ve recently containerized our entire platform on Kubernetes, and we’re evaluating a potential platform migration to a next-generation ecommerce platform. You’ll be balancing hands-on reliability work with the strategic thinking needed to build systems that self-heal and get better over time. If you’ve read books like The Google SRE Handbook, The Phoenix Project, Accelerate, The DevOps Handbook, etc., this is the right place for you!

RESPONSIBILITIES
Reliability & Observability
- Define, implement, and own Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across critical platform services
- Build and maintain comprehensive monitoring, alerting, and observability systems using tools like Datadog, Prometheus, Grafana, or similar platforms
- Establish error budgets and use them to balance feature velocity with reliability investments
- Lead incident response efforts, conduct blameless postmortems, and drive systemic improvements that prevent recurrence
- Design and implement chaos engineering practices to proactively identify failure modes before they impact members
Infrastructure & Platform
- Architect and optimize our Kubernetes-based container orchestration platform for reliability, performance, and cost efficiency
- Support large infrastructure migrations, ensuring a smooth transition with minimal disruption to business operations
- Contribute to the evaluation and execution of potential platform migrations, with a focus on reliability planning and risk mitigation
- Design and implement automated deployment pipelines that enable rapid, error-free releases with feature flags and built-in rollback/roll-forward capabilities
- Develop and own disaster recovery plans, capacity planning models, and system hardening initiatives
- Collaborate closely with product engineering teams to help them scale their infrastructure in AWS and adopt SRE best practices
Culture & Process
- Help establish SRE as a practice at Thrive Market, defining the team’s charter, processes, and engagement model with product engineering teams
- Champion a culture of operational excellence, continuous improvement, and data-driven reliability decisions
- Create and maintain technical documentation covering architecture decisions, runbooks, incident response procedures, and operational playbooks
- Participate in weekly on-call rotations and help build sustainable on-call practices that avoid burnout
- Identify systemic problems and inefficiencies across the engineering organization and make strategic recommendations for improvement

QUALIFICATIONS
Required
- B.S. in Computer Science or equivalent professional experience
- 7+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering, with a proven track record of improving reliability at rapidly growing companies
- Deep expertise in Kubernetes (K8s), including cluster management, Helm charts, service meshes, and production-grade container orchestration
- Strong systems engineering background with advanced proficiency in Linux administration
- Advanced scripting and automation skills in Bash, Python, Golang, Ruby, or similar languages
- Extensive experience with core AWS services including EC2, ECS/EKS, S3, VPC, IAM, CloudWatch, Route 53, RDS, and Lambda
- Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi, or similar)
- Hands-on experience defining and implementing SLOs, SLIs, and error budgets in production environments
- Deep understanding of CI/CD pipelines and deployment strategies (blue-green, canary, rolling deployments)
- Expertise in monitoring and observability platforms (Datadog, Prometheus, Grafana, New Relic, or similar)
- Strong knowledge of web application infrastructure, networking, load balancing, and security best practices
- Excellent communication skills with the ability to lead incident response and facilitate blameless postmortems
Preferred
- Experience with e-commerce platforms (Magento, Shopify, or comparable) and the unique reliability challenges they present at scale
- Experience with ConcourseCI, GitHub Actions (GHA) or similar deployment frameworks
- Experience with chaos engineering tools and practices (Gremlin, Litmus, Chaos Monkey, or similar)
- Familiarity with GitOps workflows (ArgoCD, Flux) and service mesh technologies (Istio, Linkerd)
- Experience building and managing cost-optimization strategies for cloud infrastructure
- Background in establishing SRE practices in organizations transitioning from traditional DevOps models
- Experience with configuration management tools (Ansible, Chef, Puppet, or similar)

BELONG TO A BETTER COMPANY
- Comprehensive health benefits (medical, dental, vision, life and disability)
- Competitive salary (DOE) + equity
- 401k plan
- 9 Observed Holidays
- Flexible Paid Time Off
- Subsidized ClassPass Membership with access to fitness classes and wellness and beauty experiences
- Ability to work in our beautiful office in Playa Vista
- Free Thrive Market membership with exclusive employee discount
- Coverage for Life Coaching & Therapy Sessions on our holistic mental health and well-being platform

We're a community of more than 1 million members who are united by a singular belief: It should be easy to find better products, support better brands, make better choices, and build a better world in the process.

At Thrive Market, we believe in building a diverse, inclusive, and authentic culture. If you are excited about this role along with our mission and values, we encourage you to apply.

Thrive Market is an EEO/Veterans/Disabled/LGBTQ employer

At Thrive Market, our goal is to be a diverse and inclusive workplace that is representative, at all job levels, of the members we serve and the communities we operate in. We’re proud to be an inclusive company and an Equal Opportunity Employer and we prohibit discrimination and harassment of any kind. We believe that diversity and inclusion among our teammates is critical to our success as a company, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool. If you’re thinking about joining our team, we expect that you would agree!

Employment with Thrive Market requires that employees be based in the United States. This is a condition of employment and must be maintained throughout the duration of employment.

If you need assistance or accommodation due to a disability, please email us at eeo@thrivemarket.com and we’ll be happy to assist you.

Ensure your Thrive Market job offer is legitimate and don't fall victim to fraud. Thrive Market never seeks payment from job applicants. Thrive Market recruiters will only reach out to applicants from an @thrivemarket.com email address. For added security, where possible, apply through our company website at www.thrivemarket.com.

© Thrive Market 2026 All rights reserved.

JOB INFORMATION
Compensation Description
- The base salary range for this position is $180,000 - $225,000 per year.
- Compensation may vary outside of this range depending on several factors, including a candidate’s qualifications, skills, competencies and experience, and geographic location.
- Total Compensation includes Base Salary, Stock Options, Health & Wellness Benefits, Flexible PTO, and more!
- This position requires traveling to our HQ office in Los Angeles, California, twice a year for all-company summits: once in the summer and once in the winter.



