SkyFi

SkyFi is an equal-opportunity employer that values and encourages workplace diversity.

Senior DevOps Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 11-50

Location

United States

Posted

127 days ago

Salary

$170K - $220K / year

Seniority

Senior

Job Description

Title: Senior DevOps Engineer

Our Mission

We're unlocking the secrets of our planet. SkyFi simplifies obtaining high-resolution Earth observation data and analytics, giving businesses and professionals a seamless and efficient user experience. No more complex procedures or hefty price tags. We're empowering everyone, from individuals to companies, to understand and utilize the power of space for good.

What we do has tremendous potential to solve meaningful problems in our world. This technology is a powerful tool for enterprises and individuals, enabling them to leverage satellite imagery and analytics for critical applications: assessing the structural integrity of bridges to prevent failures, monitoring crop health for optimized agricultural output, tracking endangered species for environmental conservation, and exploring a myriad of other innovative use cases yet to be discovered.

Grab the chance to be part of this. Join a team of open-minded, dynamic people solving new challenges and working on new technology in an exciting market with immense growth. SkyFi is the place for you.

The Job

As a Senior DevOps Engineer, you will design, build, and maintain the cloud infrastructure powering SkyFi's Earth Observation platform. You will work at the intersection of satellite technology and modern cloud-native systems, operating across GCP and AWS, managing production Kubernetes clusters, and championing GitOps-driven delivery. This role requires deep expertise in infrastructure-as-code, CI/CD, and site reliability practices, along with comfortable proficiency in Python for automation and operational tooling. The ideal candidate thrives with minimal supervision, excels when tackling ambiguous, high-impact problems, and is eager to learn about the fascinating Earth Observation industry.

This Role Reports To: Engineering Manager, DevOps

You Will Be Expected To

- Design, deploy, and maintain production Kubernetes clusters; own cluster lifecycle management, performance tuning, and capacity planning.
- Build and manage cloud infrastructure across GCP and AWS using Terraform and Terragrunt, following infrastructure-as-code best practices.
- Develop, optimize, and maintain CI/CD pipelines using GitHub Actions and Flux CD to enable reliable, GitOps-driven deployments of containerized applications.
- Develop Python-based tooling and automation to support infrastructure and platform operations.
- Troubleshoot and resolve operational, networking, pipeline, and infrastructure issues across multi-cloud environments.
- Identify, document, and automate repetitive or critical workflows to reduce operational burden on the engineering team.
- Implement and maintain comprehensive monitoring, alerting, and observability using tools such as Prometheus and Grafana.
- Ensure compliance with security, governance, and regulatory requirements, including those tied to classified environments.
- Collaborate with development and operations teams to gather requirements and translate them into reliable infrastructure solutions.
- Partner with fellow engineers to architect, develop, and scale the product while keeping operational reliability and cost-efficiency in mind.
- Champion cloud-native best practices, infrastructure-as-code principles, and GitOps workflows across the engineering organization.

What We Are Looking For

Must-Have Qualifications

- Active U.S. security clearance (required).
- U.S. citizenship (required).
- 6+ years of professional experience in DevOps, SRE, or Platform Engineering.
- 5+ years of hands-on experience operating and managing Kubernetes in production environments.
- Strong hands-on experience with both GCP and AWS.
- Proficiency with Terraform and Terragrunt for infrastructure provisioning and management.
- Hands-on experience with Flux CD for GitOps-based continuous delivery.
- Hands-on experience building and maintaining CI/CD pipelines with GitHub Actions.
- Strong scripting skills in Bash and/or Python.
- Solid experience with Docker and container orchestration.
- Deep understanding of modern DevOps principles, cloud-native architecture, and infrastructure-as-code practices.
- Solid understanding of and experience with observability systems such as Grafana and Prometheus.
- Strong Linux systems administration skills.
- Proactivity and the ability to work with minimal supervision.

Preferred Qualifications

- Familiarity with service mesh technologies (e.g., Istio, Linkerd).
- Previous experience supporting 24/7/365 production services.
- Experience working in early-stage or high-growth startup environments.
- Excellent organizational and documentation skills.

At SkyFi You Will

- Be well compensated, with the possibility of equity.
- Receive best-in-class benefits, including premium medical, dental, and vision coverage and 20 days of paid time off.
- Play a critical role in building a market-changing product in the exciting realm of Space.
- Thrive in a fast-paced, dynamic environment that rewards initiative, innovation, and getting things done.

SkyFi is an equal-opportunity employer that values and encourages workplace diversity.

Salary Band: $170,000-$220,000

More DevOps Engineer Jobs

Senior Site Reliability Engineer

CentralReach

CentralReach provides electronic health record and practice management solutions designed to provide superior care and outcomes for people with autism.

DevOps Engineer | 127 days ago

Other | Remote

This description is a summary of our understanding of the job description.

Role Description

As a Sr. SRE, you will work closely with the key stakeholders in Software Engineering to drive adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership.

- Take responsibility for availability, latency, performance, efficiency, monitoring/observability, emergency response, capacity planning, setting and maintaining SLOs, SLIs, and error budgets, and creating dashboards.
- Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
- Manage site stability, performance, and reliability, and maintain uptime for production environments.
- Develop a fully automated multi-environment observability stack based on the existing system and extend it to predict capacity needs based on usage patterns.
- Strive for automation to reduce toil and increase development velocity.
- Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
- Identify changes to the product architecture from the reliability, performance, and availability perspective with a data-driven approach.
- Document resolution runbooks and standard operating procedures.
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Collaborate with software development teams in the release management process, help shape the future roadmap, and establish strong operational readiness across teams.
- Implement reliability and observability tools (such as New Relic, Prometheus, and Grafana).
- Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.

Qualifications

- Strong background as an SRE supporting a 24x7 highly available production environment for a SaaS or cloud service provider.
- Solid experience with Monitoring/APM/Observability tools (Splunk, New Relic, etc.).
- Experience implementing observability plans around logs, metrics, and traces.
- Experience in an agile development team developing software.
- Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code (Terraform, CloudFormation).
- Extensive experience with Docker, Kubernetes, Helm, CI/CD, and config management tools like Ansible and Chef.
- Strong experience with containerization technology and/or Kubernetes.
- Experience with release automation, system administration, and configuration management.
- Experience with programming languages (Java, Python, Go, etc.).
- Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
- Strong interpersonal and teaming skills: the ability to set and enforce process and influence engineers who are not direct reports.
- Strong analytical and programming skills (Python, Go, Java, etc.).
- Deep understanding of best practices for modern cloud security.
- Proven experience building observability for security concerns, such as privilege escalations and bot detection.

Requirements

- Location: Hybrid capacity from Holmdel, New Jersey or Fort Lauderdale, Florida, or remote candidates located in other U.S. states for the right individual.
- In-person interview or face-to-face meeting required for fully remote roles prior to the first day of employment.

Benefits

- Competitive compensation.
- Comprehensive health benefits.
- Generous PTO.
- 401(k) matching.
- Paid parental leave for full-time employees.
- Hybrid work schedules.
- Career development support.
- Wellness programs.
- Opportunities to give back through CR Cares™, our community engagement initiative.

AWS, Terraform, Docker, Kubernetes, Helm, CI/CD, Ansible, Chef, Splunk, New Relic, Prometheus, Grafana, Python, Java, Linux, Microsoft Windows

United States

$160K - $180K / year

Job Closed

DevOps Engineer

Pythian

Love your data

DevOps Engineer | 127 days ago

Full Time | Remote | Team 201-500 | Since 1997 | H1B No Sponsor

Role Description

Do you thrive on solving tough problems, even under pressure? Are you motivated by fast-paced environments with continuous learning opportunities? Do you enjoy collaborating with a team of peers who push you to constantly up your game? At Pythian, we are building a next-generation Site Reliability Engineering team. We need motivated and talented individuals on our teams, and we want you! You’ll act as a technology leader, advisor for our clients, and mentor for other team members.

Projects would include:

- Infrastructure architecture
- Automation
- Intelligent monitoring systems from design through implementation

If you Love Your Data and want to Love Your Career, this could be the job for you! If this is you, and you wonder what it would be like to work at Pythian, reach out to us and find out! Intrigued to see what life is like at Pythian? Check out #pythianlife on LinkedIn and follow @loveyourdata on Instagram! Not the right job for you? Check out what other great jobs Pythian has open around the world!

Qualifications

- Experience working with Google and AWS clouds (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.)
- Scripting and automation of administrative tasks using Python and Scala is a must
- Solid understanding of microservices architecture and container technologies (Kubernetes is a must, Docker, LXC, etc.)
- Clear understanding of software development lifecycles and best practices from an infrastructure point of view (PRs, merge, rebase, etc.)
- Understanding the end-to-end operations of a ‘Business System’ vs components
- Comprehensive systems hardware and network troubleshooting experience
- Common Linux distribution platform installation, configuration, performance tuning, and cloud migration
- TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc.)
- Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.)
- Ability to describe IaaS, PaaS, and SaaS, the pros and cons of each, and use cases for virtualization and cloud
- Administration of web servers and supporting technologies, including network load balancers
- Experience with the design, development, and deployment of Puppet
- System and application error investigation, troubleshooting of access/availability issues, including deep multi-system root cause analysis
- Experience managing networking devices, such as switches and firewalls from a variety of vendors
- Solid understanding of DevOps tools, processes, and culture
- Ability to pick up new technologies quickly
- Ability to provide accurate work scheduling and task estimations for work delivery

Benefits

- Competitive total rewards package
- Work flexibly and remotely from your home; there’s no daily travel requirement to an office
- Collaborate with some of the best and brightest in the industry
- Hone your skills or learn new ones with our substantial training allowance
- We provide all the equipment you need to work from home, including a laptop with your choice of OS
- Annual wellness budget to prioritize your health and well-being
- Generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity

Hiring Disclaimer

The successful applicant will need to fulfill the requirements necessary to obtain a background check. Accommodations are available upon request for candidates taking part in any aspect of the selection process.

AI Disclaimer

Pythian may utilize Enterprise Generative Artificial Intelligence (AI) tools or features throughout its hiring process. These tools help us manage high volumes of applications efficiently and may be employed to review applications, analyze resumes, and assist with other recruitment steps. While Pythian uses AI in its hiring process, it does not substitute for human judgment. Our Talent Acquisition Team reviews all AI-generated recommendations, and the system is subject to regular bias audits to ensure fairness and compliance with all applicable employment and human rights laws. All final hiring decisions are made by, and remain the responsibility of, human decision-makers. By applying for this position, you consent to Pythian’s use of these AI tools in the evaluation of your application. You have the right to request a human review of any solely AI-driven decision or to request an accommodation. Should you require further details regarding the processing of your data, please reach out to us.

Worldwide

Job Closed

DevOps Engineer – Google Cloud Platform, Terraform

Smart Working

Empowering companies to work with the best engineers in the world

DevOps Engineer | 127 days ago

Full Time | Remote | Team 51-200 | H1B No Sponsor

• Design, implement, and maintain cloud-native infrastructure on Google Cloud Platform (GCP) using Terraform across multiple environments (production, staging, sandbox, and customer deployments).
• Architect and operate serverless container workloads using Cloud Run, ensuring efficient scaling, resource management, and cost optimisation.
• Design and manage event-driven systems using Pub/Sub, including message retention, acknowledgement deadlines, dead-letter queues (DLQ), and monitoring.
• Build and maintain CI/CD pipelines using GitHub Actions and Cloud Build, including automated Terraform deployments and GitOps-based workflows.
• Develop reusable Terraform modules and manage infrastructure across multiple GCP projects using best practices for remote state and environment separation.
• Manage containerized workloads and cloud networking using services such as GKE, VPC, Load Balancers, Cloud Armor, IAM, and Secret Manager.
• Collaborate with software engineers on architecture design decisions, including scaling strategies, service separation (HTTP vs WebSockets), and performance optimisation.
• Implement monitoring, alerting, and observability using Google Cloud Monitoring, Cloud Logging, Sentry, and OpenTelemetry.
• Administer and optimise data infrastructure, including MongoDB Atlas, Redis, BigQuery, and Cloud Storage.
• Perform incident response and root cause analysis, implementing long-term improvements to increase reliability and resilience.
• Own infrastructure end-to-end, including architecture decisions, performance optimisation, cost management, and operational excellence.
• Create and maintain documentation, operational runbooks, and best practices.
• Mentor engineers and promote DevOps and cloud architecture best practices across the organisation.

BigQuery, Docker, GCP, MongoDB, Python, Redis, Terraform

India

Job Closed

DevSecOps Engineer

Weekday (YC W21)

We are a Y-Combinator-backed startup building your AI-powered Recruiter Agent

DevOps Engineer | 127 days ago

Full Time | Remote | Team 11-50 | Since 2021 | H1B No Sponsor

• Responsible for integrating security practices into the DevOps lifecycle.
• Build and maintain scalable, secure, and reliable cloud infrastructure.
• Collaborate closely with software engineers, security teams, and infrastructure specialists.
• Design and manage cloud environments, automating infrastructure provisioning.
• Strengthen CI/CD pipelines and embed security controls throughout the software development lifecycle.

AWS, Distributed Systems, Docker, Amazon EC2, Jenkins, Kubernetes, Python, Terraform

India

Job Closed