Job Closed

This listing is no longer active.

Work Truck Solutions

Helping dealers sell more Work Trucks.

Cloud Operations Engineer

DevOps Engineer | Other | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

All locations: California | Florida | Texas

Posted

117 days ago

Salary

$110K - $150K / year

Seniority

Senior

Bachelor Degree | English | AWS | Azure | GCP | Jenkins

Job Description

  • Oversee all cloud infrastructure and resources, including provisioning, regular patch management, and proactive capacity planning
  • Establish comprehensive system observability and maintain alerting infrastructure; serve as the escalation point for major incidents, drive resolution, and champion thorough Root Cause Analysis (RCA)
  • Define and maintain a robust security posture by enforcing Identity & Access Management (IAM), completing security audits, ensuring data encryption, and managing audit logs for regulatory compliance
  • Actively track cloud spend against budgets, direct the team in right-sizing and waste elimination, and optimize rates through reserved instances and savings plans (FinOps strategy)
  • Direct the implementation and regular testing of comprehensive disaster recovery and business continuity plans, including backup management and maintaining a High Availability (HA) architecture across multiple zones

Job Requirements

  • Proven experience managing infrastructure on major cloud platforms (AWS, Azure, or GCP)
  • Strong understanding of network security, IAM, and compliance frameworks
  • Demonstrated ability to reduce cloud costs through FinOps principles
  • Experience in designing and testing Disaster Recovery and High Availability architectures
  • Proficiency in scripting languages for operational automation
  • Familiarity with tools like CloudWatch, Datadog, Jenkins, or similar systems
  • A focus on system availability as the primary metric (target uptime: 99.99%)

Benefits

  • Competitive salary
  • Fully remote Monday-Friday work week
  • Comprehensive medical, dental, and 401(k) benefits, with complimentary life insurance
  • Paid Time Off (PTO) and holidays
  • Flexible scheduling, subject to manager’s approval
  • Opportunity to work with a supportive and innovative team
