Manager, DevOps
Location
Texas | Virginia
Posted
54 days ago
Seniority
Lead
Job Description
Manager, DevOps
Seekr
Location: Austin, Texas; Reston, Virginia; or fully remote

About the Opportunity:
We are seeking a DevOps Manager with deep expertise in Kubernetes, Terraform, and Ansible to help scale Seekr’s AI platform across on-premises, cloud, and SaaS environments. You’ll be highly hands-on, juggling multiple projects, mentoring engineers, and driving complex initiatives to deliver robust, scalable, and reliable systems. On-prem experience is highly preferred. This role demands a strong foundation in Linux, networking (both traditional and Kubernetes), container technologies, and automation. You’ll collaborate closely with software engineering teams, own critical infrastructure, and solve challenging operational and scalability problems in fast-paced, dynamic environments.

From your first day, you will make a valuable, and valued, contribution. We are a fast-growing company where no one is a bystander. We offer you the opportunity to delight millions of consumers around the world while gaining meaningful experience across a variety of disciplines.

Duties and Responsibilities:
- Lead development of solutions to complex reliability, performance, and scaling challenges.
- Design, architect, and implement systems, networks, and services powering Seekr’s platform.
- Provide hands-on leadership and mentorship to the team.
- Partner with software engineering teams to build scalable, efficient, and reliable services.
- Identify and resolve operational inefficiencies through automation.
- Troubleshoot and lead response to deployment and production incidents.
- Implement and enforce security best practices, ensuring infrastructure, deployments, and data are protected at every stage.

Skills and Qualifications:
- Technical Leadership: 12+ years of experience, proven ability to deliver results in high-pressure, dynamic environments, communication skills, roadmap and long-term strategy, mentoring senior engineers.
- Kubernetes & Distributed Systems: Enterprise-scale K8s with custom operators/controllers, multi-platform clusters, hybrid fleet orchestration across cloud & edge, K8s control plane, K8s upgrades, Docker, containerd, CRI-O, Ingress Controllers (Istio, NGINX, Traefik), K8s databases, Helm charts.
- Database Management: Postgres, Elasticsearch/OpenSearch, Kubernetes databases, StatefulSets.
- Networking: L2/L3 protocols (BGP, OSPF, VLANs, IPsec), VPNs, firewalls, redundancy paths, bare-metal Linux networking, CoreDNS, Calico, K8s service mesh (Istio).
- Infrastructure Automation: Ansible, Terraform, CI/CD pipelines, GitLab, ArgoCD, MAAS, scripting (Python, Golang, Bash), AWS, Azure.
- Observability: Grafana, Prometheus, Loki, Tempo, ELK, OTEL.
- Security: Zero-trust architecture, PKI, mTLS, SPIFFE/SPIRE, certificate automation, CVE remediation, secrets management, IAM.
- Incident Management & RCA: End-to-end incident lifecycle, root cause analysis, corrective action ownership.

About the Company:
Seekr is a leader in explainable and trustworthy artificial intelligence designed to power mission-critical decisions in enterprises, government, and regulated industries. SeekrFlow™, our end-to-end AI platform, provides secure, auditable AI solutions tailored to sectors where transparency, accuracy, and compliance are paramount. Available across cloud, on-premises, and edge environments, SeekrFlow reduces bias, strengthens data integrity, and simplifies model oversight so organizations can rely on trusted AI decisions in high-stakes settings that impact society’s most sensitive and vital systems.

Trusted by leading enterprises and government agencies, we partner with defense, finance, telecom, and critical infrastructure leaders to enable AI solutions that drive real-world results with unmatched transparency and control.

We are a team of strategic thinkers and problem-solvers tackling the toughest challenges facing critical infrastructure and global enterprises through best-in-class AI models and customer deployment.

Our team operates with unwavering commitment to our core values and mission:
- We are driven by outcomes: our customers' success is what we strive for every day.
- We believe trust is earned, which is why we build explainability and transparency into the entire AI lifecycle.
- We take our responsibility to deliver secure AI seriously.
- We believe innovation drives progress: we are building the technologies that power the systems our society depends on.

Company Benefits:
- Meaningful Mission & Impact: Work with a deeply talented, collaborative team solving some of the toughest AI challenges that matter.
- Equity Ownership: RSUs that let you share directly in Seekr’s long-term success and growth.
- Time Off That Respects Real Life: Unlimited PTO plus 14 paid company holidays to truly recharge.
- Work Your Way: A flexible hybrid work environment with offices in Reston, VA and Austin, TX, plus remote options and flexible working hours.
- Competitive Total Rewards: A role-appropriate compensation structure that supports long-term growth, including base salary, bonuses, or commission plans depending on role.
- 401(k) with Company Match: Build your future with a retirement plan that includes employer matching.
- Comprehensive Health & Wellness: Medical, dental, vision, and life insurance coverage starting day one, for you and your family.
- Parental Leave: Paid parental leave to support employees as they welcome a new child through birth, adoption, or foster placement.
More DevOps Engineer Jobs
• Design and evolve Kubernetes architectures across multi-cluster environments
• Build and improve platform foundations involving networking, storage, scalability, and resilience
• Manage and standardize deployments using Helm Charts, Kustomize, and GitOps practices
• Provision and maintain infrastructure using Terraform
• Drive performance tuning and security hardening across clusters and workloads
• Support cloud-native environments on GCP with a strong architecture mindset
• Help define and improve networking and security architecture for distributed systems
• Work with AI beyond simple chat assistant usage, applying it as a practical accelerator for analysis, operations, and problem-solving
• Create and maintain development and operational scripts to improve efficiency and reliability
• Read and interpret code when needed to troubleshoot behavior, assess risks, and collaborate with engineering teams
• Communicate clearly across technical and non-technical stakeholders, building trust and strong collaboration across teams
DevOps Engineer
Inframark
Inframark's Operations and Maintenance team is an award-winning team that delivers cutting-edge water, wastewater, and public works services to municipalities, utility districts, and industries. We are dedicated to supporting our employees as well as protecting the environment and the communities we serve. You would be empowered to thrive in a dynamic, supportive, and innovative environment. Our dedication to sustainability and community impact drives us to ensure clean, safe water for future generations. Whether you're at the start of your career or looking for advancement, Inframark offers purpose-driven work and opportunities for growth.
Join Inframark: Pioneering Automation and Intelligence
Step into the future with Inframark's award-winning Automation and Intelligence team. We deliver cutting-edge solutions in instrumentation and controls, industrial cybersecurity, data analysis, and remote network operations center services for water and wastewater plants. Elevate your career and join us at Inframark. Apply today!

Why Work for Inframark?
We offer an attractive salary package, including a generous benefits package with health, dental, and life insurance, 401(k) plan, paid time off, sick leave, holidays, and wellness plan.

Job Title: DevOps Engineer
Location: Remote (Eastern Time zone preferred - AWS GovCloud requirement)
Reports To: Sr. Director of Technology and Architecture

Position Overview
We're looking for a DevOps Engineer who takes ownership of infrastructure. You'll stabilize and modernize the infrastructure supporting WaterMinds, our cloud-based platform for water and wastewater utilities: implementing proper monitoring and alerting, upgrading production environments, establishing operational discipline, and enabling our engineering teams to ship with confidence. You'll follow DevOps best practices, proactively identify and solve problems, and drive infrastructure improvements with minimal direction.

The challenge: build and maintain infrastructure that can reliably serve hundreds of utility customers at scale. Your immediate focus is moving our infrastructure from reactive firefighting to proactive maintenance mode. As the platform matures and our data science team ramps up, you'll have the opportunity to transition into MLOps, building the infrastructure that enables machine learning at scale.

Key Responsibilities
- Take ownership of production monitoring and alerting using Prometheus, Grafana, and CloudWatch; proactively identify issues before they become incidents.
- Modernize production EKS cluster with GitOps practices (ArgoCD), comprehensive monitoring, and proper deployment workflows following industry best practices.
- Streamline staging deployment process; eliminate branch-based workarounds and establish clean GitOps patterns.
- Design infrastructure patterns that scale to hundreds of customers and own AWS infrastructure operations including patching, maintenance, cost optimization, and security compliance; stay ahead of requirements.
- Expand into MLOps, building the infrastructure that enables data scientists to deploy models at scale across multiple utility customers once DevOps operations are automated.
- Manage Kubernetes clusters (EKS) including pod migrations, resource optimization, troubleshooting, and security updates, proactively rather than reactively.
- Maintain infrastructure as code using Terraform and Ansible following best practices; all changes are tested in non-production before deployment.
- Support engineering teams with infrastructure needs, unblock them quickly, and establish self-service patterns where possible; anticipate needs rather than waiting for requests.
- Manage message queue infrastructure (Kafka/Redpanda) including retention policies, storage optimization, and performance tuning.
- Document infrastructure, create runbooks, and automate operational tasks to move systems into maintenance mode.
- Clean up technical debt: proactively identify infrastructure to decommission, resources to consolidate, and costs to optimize.

Qualifications
- 5+ years of experience in DevOps, infrastructure, or site reliability engineering.
- Demonstrated ability to take ownership and initiative: you see what needs to be done and do it without waiting for direction.
- Deep knowledge of DevOps and infrastructure best practices: you know what good looks like and implement it proactively.
- Strong Kubernetes experience (EKS preferred) including cluster management, deployments, services, and troubleshooting.
- Hands-on AWS experience (EC2, EKS, ECS, RDS, VPC, IAM, CloudWatch, S3).
- Infrastructure as code proficiency (Terraform and Ansible).
- GitOps experience (ArgoCD, Flux, or similar).
- CI/CD pipeline experience (Bitbucket Pipelines, Jenkins, GitHub Actions, or similar).
- Monitoring and observability experience (Prometheus and Grafana preferred).
- Python scripting ability for automation and tooling.
- US citizenship (required for AWS GovCloud access).
- Self-starter mentality: you identify problems and opportunities, then drive solutions to completion.
- Proven track record of delivering tested, high-quality infrastructure changes on schedule.
- Excellent communication skills: proactive about sharing status, raising blockers, and documenting decisions.

Bonus Points For
- Curiosity about machine learning and interest in transitioning to MLOps as the platform matures.
- Any MLOps or ML infrastructure experience (KServe, Kubeflow, SageMaker, model serving).
- Experience with data pipelines, feature engineering, or supporting data science teams.
- AWS GovCloud experience and understanding of compliance requirements (FedRAMP).
- Experience with message queue systems (Kafka, Redpanda).
- Container security and vulnerability scanning (Snyk).
- Background in SaaS platforms, IoT, or critical infrastructure.

Inframark is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, age, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against based on disability.

Learn more about us at Automation and Intelligence - Inframark
SRE Manager
Daxko
Daxko is dedicated to pursuing and hiring a diverse workforce. We are committed to diversity in the broadest sense, including thought and perspective, age, ability, nationality, ethnicity, orientation, and gender. The skills, perspectives, ideas, and experiences of all of our team members contribute to the vitality and success of our purpose and values. We truly care for our team members, and this is reflected through our offices, benefits, and great perks. These perks are only for our full-time team members.
We’re looking for a Manager of Site Reliability Engineering (SRE) who is passionate about building resilient systems and leading teams that keep critical services running smoothly. In this role, you’ll guide a team responsible for the reliability, performance, and operational health of our production environments. You’ll partner closely with engineering leaders to ensure our systems remain secure, scalable, and available for the organizations and communities who depend on them.

What You’ll Do
As the Manager of Site Reliability Engineering, you will lead a team responsible for the operational reliability of Daxko’s production platforms. Your work will focus on creating stable, high-performing systems while empowering your team to continuously improve how we operate and support our products. You will also:
- Lead and support a team responsible for the reliability and performance of production systems, which includes:
  - Setting clear performance expectations and goals for team members
  - Providing ongoing coaching and real-time feedback
  - Ensuring team members have the training and resources they need to succeed
  - Coordinating on-call rotations and operational coverage
  - Supporting the team during critical incidents and outages
  - Managing team staffing, including hiring and headcount planning
- Prioritize and coordinate work across operational initiatives, deployments, upgrades, and infrastructure improvements
- Ensure high levels of system uptime, data integrity, and operational stability
- Partner with Engineering Leads to align platform operations with product development needs
- Maintain business continuity across all production assets
- Monitor system health, performance, and capacity to proactively identify and resolve issues
- Serve as a technical escalation point for complex infrastructure or platform challenges
- Provide regular reporting on system availability, response times, and capacity trends
- Ensure operations meet security, compliance, and regulatory requirements
- Support and coordinate the team’s on-call rotation and incident response processes
- Continuously improve operational practices through automation, tooling, and monitoring

Technologies You’ll Work With
Our platform relies on modern infrastructure and cloud technologies. Strong experience with several of the following areas is important:
- Linux-based systems
- Web server technologies (NGINX, PHP, Traefik, F5)
- Virtualization platforms such as VMware
- Cloud platforms including AWS and Azure
- Containerization and orchestration (Docker, Kubernetes, Dynos)
- Messaging and caching technologies (Redis, RabbitMQ)
A strong security mindset and experience implementing infrastructure security controls are essential.

What You Bring
You’re a thoughtful technical leader who enjoys solving complex operational challenges and helping engineers grow. We’re looking for someone who brings:
- Strong analytical and problem-solving skills
- Clear communication and collaboration skills
- Experience leading teams in fast-moving technical environments
- The ability to balance multiple priorities and make thoughtful decisions under pressure
- Strong organizational and time management skills
- A customer-focused mindset and commitment to system reliability
- Bachelor’s degree in a technical discipline or equivalent professional experience
- 3–5 years of experience leading or managing globally distributed engineering teams
- 3–5 years of experience in a Site Reliability Engineering or similar infrastructure-focused role

Preferred Experience
- Experience serving as a technical lead on infrastructure or platform teams
- Experience with modern observability and monitoring tools, such as OpenTelemetry, Instana, LogicMonitor, PagerDuty, or OpsGenie
- Experience with infrastructure and automation tooling such as GitLab CI, Jenkins, Chef, Terraform, Elasticsearch, Kubernetes, or Rancher
- Scripting experience in Ruby, Python, or Bash
- Familiarity with SOC, PCI, or GDPR compliance standards
- Experience working with issue tracking and collaboration tools such as the Atlassian suite
- Experience supporting or developing applications built with Java, PHP, or Node
- Experience automating operational processes and repetitive tasks

Some of our favorite perks for full-time team members include:
🏝 Flexible paid time off
⚕️ Affordable health, dental, and vision insurance options
💪 Monthly fitness reimbursement
🤑 401(k) matching
🍼 New-Parent Paid Leave
👖 Casual work environments
🏡 Flexible work - remote & hybrid

All your information will be kept confidential according to EEO guidelines.
Senior DevOps Engineer
Postscript
SMS marketing platform for ecommerce companies. Helping Shopify stores drive 30x ROI with text message marketing.
• Design, implement, and maintain infrastructure solutions on AWS, utilizing tools such as Terraform for Infrastructure as Code (IaC) and, ideally, Terraform Cloud.
• Deploy and manage containerized applications using ECS, with EKS experience being a bonus.
• Work closely with engineering teams to understand project requirements, ensuring seamless implementation of pre-designed infrastructure and deployment patterns.
• Set up and manage CI/CD pipelines to automate and streamline the software delivery process, with a preference for GitHub Actions or similar tools.
• Continuously monitor the health of the infrastructure, identifying and resolving issues to optimize performance, scalability, and security.
• Leverage Python for scripting, automation, and other development tasks to enhance infrastructure and deployment processes.
• Create and maintain comprehensive documentation of all infrastructure components, processes, and deployment workflows to facilitate knowledge sharing and continuity.

