
EverOps
Remote Jobs
The Embedded Service Provider
8 Jobs

• Design, implement, and manage endpoint platforms using Microsoft Intune and Iru
• Own device lifecycle management (onboarding, offboarding, compliance, and refresh)
• Implement automated provisioning with Autopilot and Apple Business Manager (DEP)
• Configure and enforce industry-standard hardening baselines for macOS and Windows via Intune/Iru
• Manage vulnerability exposure using CrowdStrike Spotlight or similar, and drive remediation SLAs
• Own CrowdStrike Falcon platform administration, including sensor deployment, policy configuration, prevention policy tuning, and exclusion management
• Lead alert triage and investigation workflows, partnering with the security team on escalations and response
• Build and maintain host groups, device policies, and containment workflows
• Administer Server Patch and Policy Management through WSUS/AWS SSM
• Integrate endpoint platforms with Okta, Entra ID, and other identity providers
• Automate endpoint configuration and application lifecycle using scripting (PowerShell, Bash, Python)
• Troubleshoot complex endpoint issues across OS, network, and identity layers
• Support secure access workflows (VPN, cert-based Wi-Fi authentication)
• Build and maintain documentation, runbooks, and standards
• Partner with Security, IAM, and Cloud teams to align endpoint strategy with broader platform architecture
• Contribute to continuous improvement within your team and across EverOps’ customer base.

• Lead implementation of Okta as the central identity platform (SSO, MFA, lifecycle management)
• Reduce manual IT operations through automation
• Build and manage identity infrastructure using Terraform (or equivalent IaC tools)
• Develop reusable modules for Okta apps, groups, policies, and integrations
• Implement version-controlled identity configurations with full auditability
• Leverage GitHub (GitOps) for:
  • Source control of identity configurations
  • Pull request-based change management
  • CI/CD pipelines (GitHub Actions) for identity deployments
• Enforce approval workflows, testing, and promotion across environments (dev → prod)
• Treat identity changes as code with full traceability and rollback capability
• Design and automate onboarding/offboarding (JML) workflows with zero manual provisioning
• Establish device trust and conditional access policies (identity + endpoint integration)
• Automate workflows across HRIS, identity, and endpoint systems (APIs, scripting, Okta Workflows, Tines, or equivalent)
• Fully automate onboarding/offboarding with clear workflow visibility
• Provide 100% SSO coverage and MFA standardization
• Administer device-based access controls (zero trust foundation)

Overview
Some of the world’s most innovative global enterprise software companies struggle to find engineering partners capable of matching their rigorous standards. These teams need a partner that can co-own complex problems from within their own development environment. Enter EverOps – the premier Embedded Service Provider. We partner directly with customer engineering teams to assess and address mission-critical delivery and infrastructure challenges.

The Challenge
EverOps is looking for a Senior DevOps Engineer with a deep mastery of enterprise cloud infrastructure and the ability to drive technical projects autonomously. You will act as a technical anchor, leveraging advanced engineering skills to migrate high-traffic workloads and restructure AWS environments to meet enterprise standards.

The Mission
As a Senior DevOps Engineer, you will join our U.S.-Based Virtual Operating Center, working within a dynamic team to manage and evolve production cloud environments. Your primary mission will involve the strategic containerization of legacy workloads and the architectural separation of accounts to improve security and scalability. You will be expected to lead by example: architecting solutions with Terraform + Atmos, implementing EKS best practices, and mentoring peers to ensure collective success.

What You’ll Do
- Workload Migration: Lead the transition of acquired ad exchange workloads from EC2 into a modern, containerized Amazon EKS architecture.
- Account Restructuring: Execute AWS account separation, moving from shared environments into distinct, isolated accounts (Dev, Staging, Prod) with robust enterprise guardrails.
- Infrastructure as Code: Design and maintain a DRY, component-driven infrastructure using Terraform and Atmos.
- EKS Operations: Architect and operate multi-tenant Kubernetes platforms, focusing on namespace isolation, RBAC models, and cluster security.
- GitOps & CI/CD: Implement and optimize deployments using ArgoCD and GitHub Actions for a seamless, automated SDLC.
- Technical Mentorship: Act as a subject matter expert (SME) within your pod, guiding engineers on complex troubleshooting and architectural best practices.
- Cloud-Native Security: Manage secrets and encryption using AWS Secrets Manager and KMS, ensuring secure cross-account access and Kubernetes integration.

You Have
- Experience: 5+ years of professional experience in DevOps, CloudOps, or SRE, specializing in high-scale Amazon EKS environments.
- Infrastructure Frameworks: Advanced proficiency with Terraform, specifically utilizing Atmos (or similar wrappers) to manage hierarchical configurations across multi-account structures.
- Migration Mastery: Proven track record of migrating complex workloads from EC2 to EKS with minimal downtime.
- Multi-Account Governance: Deep experience with AWS Organizations and Landing Zone patterns to enforce environment isolation.
- Enterprise Guardrails: Ability to implement security guardrails using IAM Permission Boundaries and SCPs.
- Containerization: Expert knowledge of Docker and Kubernetes-native networking.
- Coding: Proficiency in Golang, Python, or Bash for building custom automation and migration tooling.
- AWS Security: Production experience with AWS Secrets Manager, KMS, and integration patterns like External Secrets Operator for EKS.
- Observability: Experience implementing enterprise monitoring suites like Datadog, Prometheus, or Grafana.

Extra Awesome
- Progressive Delivery: Production experience with Argo Rollouts for Canary and Blue/Green deployments.
- FinOps & Cost Governance: Experience with cost allocation, tagging strategies, and tools like KubeCost to manage spend during account migrations.
- Policy as Code: Experience implementing OPA (Open Policy Agent) or Kyverno to enforce compliance within EKS.
- Scale & Performance: Background in high-transaction industries (AdTech, Gaming, or Fintech) where platform stability is critical.
- Platform Engineering: A mindset toward building internal developer platforms (IDPs) that allow for self-service within the new EKS accounts.
- Certifications: AWS Certified Solutions Architect (Pro) or Certified Kubernetes Administrator (CKA).

Benefits
- 100% Remote Workplace: We’ve been remote since Day 1!
- Unlimited Paid Time Off.
- Equity: Become a true owner of the company.
- 401K with company contribution and sponsored healthcare.
- Professional Growth: Access to training and certification programs to accelerate your career.

• Design, implement, and validate disaster recovery architectures for relational, NoSQL, and managed data services across AWS, Azure, or GCP
• Plan and execute database migration cutovers including blue-green database swaps, read-replica promotion, and zero-downtime schema migration workflows
• Architect replication topologies (cross-region, cross-account, active-passive, active-active) and validate RPO/RTO targets through runbook-driven DR drills
• Build and maintain Infrastructure as Code for data platform provisioning (RDS, Aurora, DynamoDB, ElastiCache, Redshift, managed Kafka/MSK, etc.) using Terraform, Atlantis, and/or CloudFormation
• Design backup, snapshot, and point-in-time recovery strategies with automated validation and alerting
• Develop automation tooling for data platform operations: failover orchestration, health checks, capacity scaling, and credential rotation
• Implement observability for data infrastructure: replication lag monitoring, connection pool health, query performance baselines, and storage growth forecasting
• Support production workload migrations including data tier cutovers with rollback plans and data integrity verification
• Contribute to multi-tenant Kubernetes platform operations where data services intersect (e.g., External Secrets Operator for DB credentials, sidecar patterns for connection pooling)
• Participate in regular customer and internal EverOps scrums, providing data architecture guidance and operational status
• Document runbooks, architecture decision records (ADRs), and operational playbooks for data platform operations

• Develop and use automation tools effectively to operate, manage, and scale production and development environments in Azure quickly
• Design, build, and maintain CI/CD pipelines using Azure DevOps Pipelines, including multi-stage YAML pipelines for infrastructure and application deployments
• Author and maintain Azure infrastructure using Bicep templates and Terraform modules, following IaC best practices
• Participate in regular customer and internal EverOps scrums
• Monitor Azure environments using native tooling and third-party platforms while focusing on constant improvement
• Implement new Azure services and technologies as customer requirements evolve
• Design and execute new solutions while working to improve existing ones
• Provide operational support and project deployments for our customer environments

• Support engineering teams by triaging and resolving infrastructure and CI/CD issues
• Build automation that eliminates repetitive operational work
• Contribute to cloud infrastructure improvements across development, staging, and production environments
• Continuously enhance deployment pipelines, monitoring, and infrastructure reliability
• Respond to infrastructure and CI/CD-related tickets from engineering teams
• Troubleshoot build failures, deployment issues, IAM/permissions problems, networking misconfigurations, and container runtime errors
• Assist with debugging issues across cloud environments (dev, staging, prod)
• Identify recurring issues and propose automation-based solutions
• Contribute to Infrastructure as Code (IaC) using tools like Terraform, Pulumi, or CloudFormation
• Improve reusable modules and promote DRY infrastructure patterns
• Write automation scripts and tooling in Python, Go, or Bash to eliminate manual processes
• Help automate environment provisioning, account setup, and deployment workflows
• Support and enhance CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or similar tools
• Improve deployment reliability and rollback processes
• Contribute to GitOps-based workflows using tools like ArgoCD or Flux
• Support containerized workloads running on Kubernetes (EKS, AKS, GKE, or self-managed clusters)
• Assist with namespace configuration, RBAC, and cluster hygiene
• Help manage cloud resources across AWS and other major cloud providers
• Participate in environment setup and account structuring projects
• Assist in implementing and tuning monitoring and alerting systems (Datadog, cloud-native monitoring tools)
• Help reduce alert fatigue through automation and improved signal quality
• Contribute to post-incident reviews and identify preventative automation opportunities
• Document processes and create internal runbooks
• Identify opportunities to automate manual support workflows
• Collaborate with senior engineers on platform enhancement projects

Role Description
This role involves owning and executing a comprehensive IT support automation strategy designed to significantly reduce ticket volume and human intervention.
- Eliminating tickets before they are created
- Automating resolution paths when tickets do occur
- Building durable automation frameworks across SaaS and internal platforms
- Removing systemic friction across the IT lifecycle

You will operate heavily within the IT support domain, addressing areas such as:
- Account lockouts and access management
- Provisioning and deprovisioning workflows
- Device and asset lifecycle management
- Standard internal IT requests
- SaaS integrations and workflow orchestration

The expectation is leadership-level ownership. You will define the automation roadmap, architect solutions, and drive initiatives from intake through deployment with measurable outcomes.

Qualifications
- 8+ years in SRE, Platform Engineering, DevOps, or Automation Engineering
- Proven experience designing enterprise-scale automation systems
- Strong exposure to IT support domains (access, provisioning, identity, device lifecycle, SaaS operations)

Requirements
- Deep experience designing and consuming REST APIs
- Strong understanding of authentication and authorization patterns
- Experience orchestrating workflows across multiple SaaS platforms
- Strong proficiency in Python or Go
- Experience building production-ready services
- Advanced scripting for orchestration and automation logic
- Strong familiarity with at least one major cloud provider (AWS, GCP, or Azure)
- Containerization and Kubernetes exposure
- Infrastructure as Code experience
- Networking fundamentals
- Identity and access concepts
- Understanding of asset lifecycle management
- Experience leading technical initiatives from idea through deployment
- Ability to mentor junior engineers
- Strong written and verbal communication skills
- Comfortable influencing cross-functional stakeholders
- Data-driven decision-making approach

Benefits
- 100% Remote Workplace
- Unlimited Paid Time Off
- Equity – Become a true owner of the company
- 401K with company contribution and sponsored healthcare
- Professional Growth – Access to training and certification programs

• Lead technical workshops to identify, refine, and prioritize high-impact AI and GenAI use cases aligned with business objectives.
• Translate business problems into system design requirements and AI workflows.
• Assess existing data platforms, pipelines, governance, and accessibility for AI workloads.
• Evaluate data quality, lineage, security, and suitability for training, RAG, and inference patterns.
• Design AI architectures that comply with enterprise security, privacy, and regulatory constraints (PII, PHI, internal policies).
• Evaluate and design integrations across APIs, event streams, and existing systems.
• Evaluate and recommend foundation models and AI services, including Amazon Bedrock, Amazon Nova, and open-source models.
• Analyze tradeoffs across cost, latency, accuracy, and scalability.
• Design GenAI patterns such as RAG, agent workflows, and inference pipelines.
• Produce high-level and detailed AWS reference architectures for prioritized AI use cases.
• Define phased implementation roadmaps that balance speed, risk, and long-term maintainability.
• Identify PoC scope that can be executed within a short engagement.
• Partner with stakeholders to develop ROI and TCO models for AI initiatives.
• Provide cost modeling for model usage, data pipelines, infrastructure, and operations.
• Deliver AI assessment findings and recommendations.
• Create target-state AI platform architecture diagrams.
• Summarize data readiness and compliance assessments.
• Provide model evaluation and selection rationale.
• Define phased implementation roadmap.
• Design and validate PoC.
• Prepare executive-ready presentations and documentation.