DroneUp

DroneUp is a leader in drone flight services that transforms organizations using drone technology and delivery solutions. The company develops SaaS platforms that have mobile app t

SRE – Platform Engineer

Location

United States

Posted

105 days ago

Salary

$125K - $150K / year

Seniority

Lead

Bachelor Degree, 8 yrs exp, Experience accepted, English, AWS, Azure, GCP, Grafana, Kubernetes, Linux, macOS, Node.js, Prometheus, Python, Terraform, Unix

Job Description

  • Serve as broad-domain architect for the internal developer platform and all cloud engineering
  • Drive architecture for tooling or in-house software
  • Mentor other platform engineers to drive strong engineering practices
  • Enable platform engineering technical capabilities in our internal client teams in software engineering
  • Peer with the senior architects and engineers in software engineering
  • Focus architecture and engineering on the GCP environment
  • Architect and oversee GKE cluster operations and workload management
  • Provide feedback to others and participate in peer reviews and pair programming
  • Drive the broad adoption of Test-Driven Development by designing, developing, and debugging unit and integration tests for new and existing infrastructure and code
  • Stay curious about existing implementations and new technologies, and share findings with the team
  • Practice continuous improvement across all job areas, both personally and professionally
  • Communicate clearly with platform engineering teams and other stakeholders, and provide technical direction while doing so
  • Stay current with platform changes and third-party libraries
  • Proactively investigate better alternatives to current solutions
  • Understand OpenTelemetry and true observability, and how it differs from monitoring and logging
  • Grow the engineering culture toward a high-performing team
  • Practice self-service, least privilege, and security by default in all solutions
  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets
  • Lead incident response, including on-call rotations, root cause analysis, and post-mortem reviews
  • Implement and optimize monitoring, alerting, and observability systems for system reliability
  • Collaborate on capacity planning and performance optimization to ensure high availability
  • Other duties as assigned

Job Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, or a related field, or 8+ years of experience as a software engineer
  • Proficiency in Kubernetes. Optional: CKA, CKAD
  • Extensive experience in Unix / Linux
  • Polyglot, with proficiency in multiple languages (ideally Golang, Node.js, Python, HCL, and more)
  • Knowledge of multi-cloud environments, including GCP, AWS, and Azure (familiarity with at least two of these environments)
  • Experience using Git in trunk-based development models
  • Experience using feature flags in infrastructure and runtime (k8s)
  • Experience with backend database technology, including support and performance tuning, is a plus
  • Advanced experience working with and creating public cloud resources in Terraform or other infrastructure as code tools
  • Experience participating in a 24/7 on-call schedule without supervision and successfully resolving issues without escalation
  • Experience using OpenTelemetry for observability, as well as other monitoring tools such as Datadog and New Relic
  • Good understanding of networking and routing principles
  • Experience dockerizing applications and orchestrating them with Kubernetes
  • Familiarity with security configuration for web/API services (SSL, access control)
  • Experience with Jira or other work tracking systems
  • Ability to resolve tickets in priority order and to collaborate with the Technical Product Manager to adjust priorities
  • Excellent, detailed documentation using Confluence or similar tooling, which could include support notes, runbooks, ADRs, etc.
  • Familiarity with creating an end-to-end CI/CD pipeline using various tools with artifact storage
  • Familiarity with macOS as a desktop and with predominantly CLI interfaces
  • Experience applying a “product mindset” by understanding stakeholder needs, priorities, and business value
  • Experience with security compliance frameworks, including FedRAMP, NIST, and SOC 2
  • Proven experience in SRE practices, including incident management and reliability engineering
  • Familiarity with monitoring tools like Prometheus, Grafana, or Honeycomb for observability
  • Experience with chaos engineering, load testing, or reliability testing frameworks

Benefits

  • Employees are expected to provide a high level of security to any personal or private information accessed as part of their work, whether at a DroneUp facility or remotely.
  • Participate in security training.
  • Remain sensitive to individual rights to personal privacy.
  • Comply with company policies.
