An open directory platform for secure, frictionless access from any device to any resource, anywhere
Staff Software Engineer, DevOps
Location
India
Posted
26 days ago
Salary
0
Seniority
Lead
Job Description
JumpCloud
• Shape the direction of our Kubernetes-based platform
• Drive a major infrastructure initiative
• Multiply the effectiveness of the engineering team
• Own the technical strategy and delivery of critical Infra systems that underpin all of JumpCloud's services: EKS cluster lifecycle, infrastructure-as-code, GitOps delivery pipelines, observability, and multi-region networking
• Work closely with Data Platform and application engineering teams
Job Requirements
- 7+ years of professional experience in infrastructure, platform, or DevOps engineering
- Deep expertise in Kubernetes (EKS preferred), including cluster lifecycle management, the add-on ecosystem, node group operations, RBAC, Crossplane, networking, and production troubleshooting
- Strong Terraform proficiency — module design, state management, provider upgrades, and large-scale refactoring across multi-account, multi-region AWS infrastructure
- Solid AWS knowledge spanning EKS, IAM, VPC networking (NLB/ALB, Transit Gateway, Global Accelerator), Route53, CloudFront, and multi-account architectures
- Hands-on experience with production observability — Datadog monitors, SLOs/SLIs, PagerDuty alerting, and driving alert quality and noise reduction
- Experience with secrets management — HashiCorp Vault & Consul, AWS Secrets Manager, KMS, External Secrets Operator, Cert Manager
- Experience working with highly available distributed systems across multiple regions
- Demonstrated ability to lead technical architecture decisions, author design documents/RFCs with clear trade-off analysis, and drive consensus across teams
- Track record of mentoring engineers and raising the technical bar of a team
- Passion for addressing complex infrastructure challenges at scale
- Strong problem solving, communication, and collaboration skills
- A strong team player who helps the team live by our core values: building connections, thinking big, and 1% better every day
- Professional experience developing and deploying applications in a public cloud environment (AWS, GCP) with CI/CD pipelines
- Experience leveraging tools to monitor platform stability, availability, and performance (e.g., Datadog)
- Passion for addressing complex engineering problems/challenges
Benefits
- Scam Notice:
- Please be aware that there are individuals and organizations that may attempt to scam job seekers by offering fraudulent employment opportunities in the name of JumpCloud. These scams may involve fake job postings, unsolicited emails, or messages claiming to be from our recruiters or hiring managers. Please note that JumpCloud will never ask for any personal account information, such as credit card details or bank account numbers, during the recruitment process. Additionally, JumpCloud will never send you a check for any equipment prior to employment.
- All communication related to interviews and offers from our recruiters and hiring managers will come from official company email addresses (@jumpcloud.com) and will never ask the job seeker for any payment, fee, or purchase. If you are contacted by anyone claiming to represent JumpCloud and you are unsure of their authenticity, please do not provide any personal/financial information and contact us immediately at recruiting@jumpcloud.com with the subject line "Scam Notice".
More DevOps Engineer Jobs
DevSecOps Developer
Cormac Corporation
At CORMAC, we leverage the power of data management and analytics to enable our customers to achieve their strategic goals. With over 20 years of experience in health information technology (HIT), human-centered design principles, and Agile development methodologies, CORMAC delivers complex digital solutions to solve some of the most challenging problems facing public healthcare programs today.
Role Description
CORMAC is seeking a highly skilled DevOps Engineer to join our dynamic team. The ideal candidate will bring strong experience designing and managing AWS cloud infrastructure, building automated CI/CD pipelines, and supporting secure, compliant environments for healthcare applications. This role will develop automation using Python and other scripting languages, manage containerized deployments with Docker, and maintain high-availability Linux systems. The DevOps Engineer will collaborate with cross-functional teams to implement Infrastructure as Code using tools such as Terraform and Puppet, enforce configuration management best practices, and ensure adherence to healthcare industry security and compliance standards. Experience with monitoring tools, cloud security, and artifact management solutions is highly valued.
- Design, deploy, and manage cloud infrastructure on AWS to support healthcare applications with a focus on security and compliance.
- Develop and maintain automated CI/CD pipelines using tools such as Terraform, Ansible, Puppet, and Artifactory to streamline software delivery.
- Write and maintain scripts in Python and other scripting languages to automate routine tasks and improve operational efficiency.
- Manage containerized environments using Docker to ensure consistent application deployment across development, testing, and production.
- Monitor system performance and troubleshoot issues on Linux operating systems to maintain high availability and reliability.
- Collaborate with cross-functional teams to implement infrastructure as code (IaC) and enforce best practices in configuration management.
- Ensure compliance with healthcare industry standards and data privacy regulations through secure infrastructure design and monitoring.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- Must be a U.S. Citizen
- Must be able to obtain a Public Trust (Tier I) Clearance
- 5+ years of experience with DevOps.
- Proven experience working with Amazon Web Services (AWS) in a production environment.
- Strong proficiency in scripting languages such as Python, Bash, or similar for automation purposes.
- Hands-on experience with containerization technologies, specifically Docker.
- Experience managing Linux operating systems in a server environment.
- Familiarity with infrastructure as code tools such as Terraform and configuration management tools like Puppet.
Requirements
- Master’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- Experience working in the healthcare or regulated industries with knowledge of compliance requirements such as HIPAA, NIST, etc.
- Certifications such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, or similar.
- Experience with artifact repository management tools like Artifactory.
- Knowledge of monitoring and logging tools such as CloudWatch, Prometheus, or ELK stack.
- Strong understanding of network security principles and best practices in cloud environments.
Location
Leesburg, VA
Work Arrangement
100% Remote
Role Description
A Senior Site Reliability Engineer (SRE) is expected to own the operational stability and performance of Juul’s hybrid cloud infrastructure (Nutanix, AWS/GCP). This involves leading automation efforts, architecting for reliability, and acting as the final escalation point for critical incidents to ensure the platform is scalable and efficient.
Nutanix Platform Management
- Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and Prism Central for multi-cluster management.
- Expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations, troubleshooting, and automation.
- Develop automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform for infrastructure-as-code.
- Create and manage VM templates, golden images, and standardized deployment catalogs for consistent provisioning.
- Design disaster recovery solutions using Leap, Protection Domains, cross-cluster replication, and metro clustering.
- Implement network micro-segmentation using Nutanix Flow and configure RBAC, encryption, and security hardening.
- Lead L3 troubleshooting using advanced diagnostics, log analysis (CVM, Genesis), NCC health checks, and cluster service resolution.
- Configure high availability, VM affinity rules, QoS policies, and optimize performance for mission-critical workloads.
- Manage AHV networking with OVS bridges, VLANs, bonds, LACP and implement resource reservations and workload balance.
- Design, deploy, and maintain hybrid cloud infrastructure across Nutanix HCI, AWS, and GCP platforms.
- Architect and implement multi-cloud solutions ensuring high availability, scalability, and disaster recovery.
Cloud Platform Engineering
- Architect and deploy enterprise-scale, highly available multi-cloud solutions across AWS and GCP with multi-region/multi-account strategies.
- Expert-level proficiency with AWS CLI, GCP CLI, SDK, boto3, and Python for advanced automation and infrastructure orchestration.
- Design AWS Organizations and GCP Organization hierarchies with consolidated billing, IAM policies, and centralized governance.
- Configure and manage AWS Systems Manager (SSM) including Session Manager, Run Command, State Manager, and Automation for centralized fleet operations.
- Implement centralized logging using CloudWatch/CloudTrail and GCP Cloud Logging with S3/Cloud Storage aggregation.
- Integrate AWS and GCP with Splunk using HEC, CloudWatch subscriptions, Pub/Sub, Dataflow, and cloud-specific add-ons for SIEM correlation.
- Design and deploy advanced load balancing solutions with AWS ALB/NLB/ELB and GCP Cloud Load Balancing including SSL termination and auto-scaling.
- Develop infrastructure-as-code using Terraform, CloudFormation, CDK for repeatable multi-cloud deployments and CI/CD pipelines.
- Configure AWS SSO, cross-account IAM roles, GCP Workload Identity, and federated access for centralized identity management.
- Design VPC architectures with AWS Transit Gateway/PrivateLink and GCP Shared VPC/VPC peering for hybrid connectivity.
- Manage containerized workloads using EKS, GKE, ECS, Cloud Run with service mesh, observability, and security best practices.
- Implement disaster recovery using AWS Backup, Cross-Region Replication, GCP snapshots, and multi-region failover strategies.
- Lead L3 troubleshooting using CloudWatch Insights, GCP Cloud Trace, VPC Flow Logs, X-Ray, and vendor support escalation.
- Perform cost optimization through Reserved Instances, Committed Use Discounts, rightsizing, and automated resource lifecycle management.
System Administration
- Administer and support Windows Server and Unix/Linux environments in production and non-production settings.
- Perform OS-level hardening, patch management, and security compliance across heterogeneous systems.
- Automate routine administrative tasks using PowerShell, Bash, Python, or similar scripting languages.
- Manage GitHub organization settings, user permissions, repository access controls, and monitor GitHub Actions workflows and repository health across multiple teams.
- Configure Splunk forwarders, heavy forwarders and other integrations for data ingestion from cloud and on-premises sources.
Qualifications
- 8-12+ years infrastructure experience with 8+ years in Nutanix HCI and enterprise cloud (AWS/GCP).
- Expert-level skills in Python, PowerShell, Bash scripting, infrastructure-as-code (Terraform/CloudFormation), and container orchestration (Kubernetes, EKS/GKE).
- Proven experience managing enterprise-scale environments, hybrid cloud migrations, disaster recovery, and L3 critical incident management.
- Strong networking knowledge (TCP/IP, VLANs, routing, VPN), security hardening, and compliance frameworks (ITIL).
- Strategic thinker with exceptional analytical and troubleshooting abilities for complex multi-layer infrastructure issues.
- Excellent communication skills to translate technical concepts to executives and non-technical stakeholders.
- Calm under pressure during critical outages with meticulous attention to security, compliance, and configuration management.
- Self-motivated continuous learner committed to staying current with evolving cloud technologies and automation opportunities.
- Available for on-call rotations with strong documentation skills and customer service orientation.
Requirements
- Certifications (plus): Nutanix NCP/NCAP, AWS Solutions Architect Professional, AWS DevOps Professional, GCP Professional Cloud Architect, Terraform.
Benefits
- People. Work with talented, committed and supportive teammates.
- Equity and performance bonuses. Every employee is a stakeholder in our success.
- Cell phone subsidy, commuter benefits and discounts on JUUL products.
- Excellent medical, dental and vision, disability, and life insurance, plus family support, wellness, legal, and employee assistance program benefits.
- 401(k) plan with company matching.
- Plus biannual discretionary performance bonuses.
• Ensure the reliability, performance and availability of connectivity between systems by operating, diagnosing and evolving networks in distributed, hybrid and cloud environments, with a focus on traffic analysis, advanced troubleshooting and communication architecture between services.
**Strategic AWS environment management:**
• Operate and evolve complex, reproducible environments with high availability, performance and horizontal scalability, using services such as EC2, ECS, Lambda, RDS and S3.
**Reliability and observability:**
• Define, implement and evolve end-to-end observability practices (SLIs, SLOs, SLAs) with tools like New Relic, CloudWatch and custom dashboards;
• Proactively identify bottlenecks and incidents.
**Connectivity and network performance:**
• Diagnose and resolve latency, packet loss and throughput issues in distributed environments;
• Troubleshoot DNS, VPNs, firewalls and load balancers (L4/L7);
• Analyze communication flows between services (cloud, on-premises and external integrations);
• Support the definition and evolution of connectivity architecture between systems.
**Automation and CI/CD:**
• Design, maintain and optimize robust and secure pipelines (Jenkins, Bitbucket, GitOps) for continuous delivery of microservices and serverless workloads.
**SRE culture and continuous improvement:**
• Promote blameless post-mortems, chaos engineering and automation of operational tasks to reduce toil and increase team efficiency.
**Security and governance (DevSecOps):**
• Integrate best practices for IAM, secure networking, encryption, traffic control, vulnerability monitoring and compliance into the application lifecycle, including connectivity troubleshooting and communication analysis between services.
**Documentation and knowledge sharing:**
• Create and maintain clear documentation on architecture, automations, incidents and runbooks, driving autonomy and improving onboarding.
Vice President of Engineering – DevOps Engineering
GitLab
Build software faster. The One DevOps Platform enables your entire org to collaborate around your code. We're hiring.
• Define the engineering strategy for your functional area, aligning roadmap, investments, and organizational planning with GitLab's company direction.
• Lead a large, globally distributed engineering organization across multiple product domains, supporting Directors, Senior Managers, Engineering Managers, and their teams.
• Drive operational excellence through clear engineering metrics, strong incident management practices, and a disciplined approach to reliability, quality, and developer productivity.
• Champion AI-native engineering practices across workflows, platforms, and products, including developer tooling, code review, and CI/CD.
• Partner with Product Management to create a consistent, high-velocity working model between engineering and product teams.
• Collaborate with Finance and executive peers on headcount planning, budgeting, forecasting, and investment tradeoffs.
• Work with Sales, Marketing, Alliances, Customer Success, and security leaders to address enterprise needs and support strategic initiatives.
• Represent your engineering organization in executive discussions, board-level conversations, and external forums through clear communication and transparent documentation.



