NetApp / ONTAP Storage Engineering: FSx for ONTAP provisioning, volume and SVM management, snapshot policies, tiering policies, ONTAP CLI/REST API operations, and performance tuning
AWS Storage Architecture: FSx for ONTAP sizing and deployment, throughput capacity planning, integration with VPCs, and cost optimization (capacity pool vs. SSD tier)
Data Migration & Replication: SnapMirror configuration for cross-region replication, NetApp XCP or robocopy for bulk data migration, cutover planning, and data validation
Cloud Network Architecture: VPC subnet design, security groups for NFS/SMB/iSCSI protocols, cross-region VPC peering for replication traffic, and DNS configuration for file system endpoints
Linux / Windows Systems Engineering: NFS mount configuration on Linux, SMB share mapping on Windows, multi-protocol access testing, and client-side performance tuning
Backup, DR & Data Protection: AWS Backup integration with FSx for ONTAP, snapshot scheduling, cross-region DR strategy, and RTO/RPO validation
Security & Compliance: encryption at rest (KMS), encryption in transit, IAM policies for FSx access, ONTAP export policies, and data governance controls
Sr TechOps & SRE Lead Engineer
Location: EST (UTC-5)
Posted: 39 days ago
Seniority: Lead
Job Description
Simple Solutions
Role Description
We are seeking a highly experienced TechOps & SRE Lead Engineer with deep cloud expertise to lead our cloud infrastructure, DevOps practices, reliability engineering, and operational excellence initiatives. This role is both strategic and hands-on: it is responsible for designing scalable architectures, improving automation, ensuring system reliability, and leading the TechOps team.
Key Responsibilities
- Architect and manage secure, scalable, and highly available infrastructure on AWS.
- Design multi-account AWS environments using AWS Organizations.
- Implement VPC architecture, IAM policies, networking, and security best practices.
- Oversee EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, and related AWS services.
- Optimize AWS cost management and resource utilization.
- Implement Site Reliability Engineering (SRE) best practices.
- Define SLIs, SLOs, and error budgets.
- Manage monitoring and alerting (CloudWatch, Datadog, Prometheus, Grafana).
- Lead incident response, root cause analysis (RCA), and postmortems.
- Ensure 24/7 uptime and operational resilience.
- Implement IAM best practices and least-privilege access controls.
- Manage secrets and key management (AWS KMS, Secrets Manager).
- Conduct vulnerability management and patching.
- Support compliance initiatives (SOC 2, ISO 27001, GDPR as applicable).
- Lead disaster recovery planning and backup strategies.
- Lead and mentor a team of DevOps/TechOps engineers.
- Establish operational KPIs and performance benchmarks.
- Manage on-call rotations and escalation processes.
- Collaborate with Engineering, Product, Security, and Data teams.
- Contribute to long-term infrastructure strategy and the cloud roadmap.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- 10+ years in DevOps, Cloud Engineering, or Infrastructure roles.
- 5+ years leading SRE technical teams.
- Strong hands-on experience with AWS services (EC2, EKS, RDS, S3, IAM, VPC, Lambda).
- Deep knowledge of networking, Linux systems, and distributed systems.
- Experience with Infrastructure-as-Code (Terraform or CloudFormation).
- Strong scripting skills (Python, Bash, or similar).
- Experience with containerization (Docker) and Kubernetes (EKS preferred).
Key Competencies
- Strong architectural thinking
- Hands-on technical leadership
- Crisis and incident management
- Strategic planning and execution
- Excellent cross-functional communication
More DevOps Engineer Jobs
Solution Deployment Engineer
Flock Safety: We are the first public safety operating system empowering over 2500 cities to eliminate crime.
• Execute technical solution design and deployment for key customer engagements across the U.S.
• Serve as a trusted technical advisor, ensuring deployments align with customer needs and product capabilities.
• Troubleshoot and resolve complex technical challenges in real time.
• Contribute to team knowledge by documenting best practices and sharing lessons learned.
• Partner cross-functionally with Sales, Product, and Customer Success to support customer goals.
• Maintain deep technical knowledge of Flock Safety’s products and competitive solutions.
• Represent the company at customer meetings, field visits, and industry events.
• Provide feedback to improve internal processes, tools, and deployment methodologies.
DevOps Release Manager
Arize AI: Arize AI is a machine learning observability platform for ML practitioners to detect and troubleshoot model issues.
• Design, build, and maintain scalable backend systems that power the deployment of Arize in customer-managed (on-prem and cloud) environments.
• Develop tooling and infrastructure to package, test, and deliver the Arize platform as reliable, production-ready self-hosted releases.
• Work across the stack using Go, Java, Python, and Bazel to create reproducible builds and deployment pipelines.
• Partner with customers to understand infrastructure constraints and translate them into robust deployment architectures.
• Build and optimize services that support high-volume analytics workloads in resource-constrained or isolated environments.
• Improve system reliability, observability, and upgradeability for distributed deployments.
WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.
Position Overview
Our BI team runs a set of GCP-based APIs and data services that a lot of internal products depend on. As we've grown, keeping things running has increasingly become a side responsibility for engineers who are primarily building features, and that's not sustainable. We're looking for an SRE to own that space: service health, incident response, infrastructure monitoring, and making sure we're not blindly burning cloud budget.
The Site Reliability Engineer will ensure the availability, performance, and security of the Business Intelligence team's GCP-hosted APIs and data infrastructure. This role is responsible for proactive monitoring, incident response, and continuous improvement of platform reliability across a cloud-native stack. The engineer will work closely with backend and data engineers to maintain service health and drive operational excellence. This position also carries responsibility for GCP cost visibility, helping the team track and optimize cloud spend through structured monitoring and alerting.
Responsibilities
- Monitor and maintain uptime of GCP-hosted APIs and services, keeping performance within agreed targets
- Lead incident response for BI platform services: triage, resolve, and follow up with post-mortems that actually prevent recurrence
- Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services
- Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems
- Review and fix security gaps: IAP configs, service account permissions, and API access controls
- Work with data and backend engineers to shore up the reliability of data pipelines and BigQuery workflows
- Contribute to infrastructure-as-code and help keep deployments documented and reproducible
Qualifications
- 2+ years in a Site Reliability, DevOps, or Cloud Infrastructure role in a production environment
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent hands-on experience
- Practical experience with GCP, particularly Cloud Run, API Gateway, and BigQuery
- Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar)
- Solid grasp of cloud security fundamentals: IAM, network controls, and access management
- Proficiency with Git and version control in a team setting
Preferred Skills
- CI/CD pipelines and deployment automation (GitHub Actions, Cloud Build, or similar)
- Terraform or other infrastructure-as-code tools
- Python for scripting or automation
- MySQL, Spanner, or BigQuery at any meaningful depth
- GCP cost management and spend optimization
- Experience with dbt or Looker
- Comfortable working across CET/EST hours in a distributed team
DevOps Engineer
RE Partners: We make the Aspirational Attainable. We Do Better Together to Deliver Real Change.
We are looking for a DevOps Engineer with basic Golang skills to support a team focused on improving development processes and the product. In this role, you will work closely with a Staff Engineer on designing, migrating, implementing, and maintaining the infrastructure and cloud environments that enable the team's systems to operate reliably and at scale. Your work will involve containerized services, AWS Lambda, Terraform-managed infrastructure, and secure data transfer between AWS accounts, with a strong emphasis on scalability, security, and operational excellence. You will collaborate with engineering, product, and operational teams, while also coordinating technical requirements with the vendor. Strong communication skills are important, as you’ll often act as the bridge between the team and other internal teams, ensuring infrastructure and systems are aligned with internal standards and best practices.
What we’re looking for:
● Strong AWS and DevOps experience (infrastructure, deployments, operations, observability)
● Solid experience with infrastructure as code (Terraform preferred)
● Experience with containerized environments (Docker, ECS/EKS, etc.)
● Hands-on experience with AWS services such as Lambda and related cloud tooling
● Golang and Bash skills for scripting, automation, and backend support
● Ability to work independently while collaborating with multiple stakeholders
● Strong communication and coordination skills
Nice to have:
● Familiarity with TypeScript or Node.js
This role is ideal for engineers who enjoy working toward engineering excellence and helping the team excel. You will collaborate across infrastructure and development teams to deliver impactful solutions.
Join Our Global Team: We invite you to apply for the position at RE Partners. Join us in shaping the future of business technology consulting and transforming the way organizations thrive in a digital world.
As a diverse, woman-owned global business, we pride ourselves on keeping talent happy: our 7% attrition rate speaks volumes. Bring your talented friends along and earn a referral bonus.
Equal Opportunity Employer: We are an equal opportunity employer and welcome applications from all qualified individuals regardless of race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, or veteran status.



