AppGate Cybersecurity, Inc.

AppGate is a leading cybersecurity company and pioneer in the Zero Trust Network Access (ZTNA) market, focused on providing cutting-edge solutions that protect organizations from evolving threats. Our mission is to support the warfighter, the national security community, and critical infrastructure by providing trusted access that ensures mission success.

Senior/Staff/Principal SWE - Observability Engineering

Location

United States

Posted

20 days ago

Seniority

Lead

Job Description

Role Description

We’re looking for an Observability Engineer (Senior/Staff/Principal level) who has shipped distributed tracing systems, designed high-cardinality pipelines, and knows OpenTelemetry inside and out. You will own the end-to-end design and implementation of the AppGate observability fabric, from telemetry SDKs in our clients and gateways, to the LogForwarder pipeline, to customer-side integrations. You’ll make the foundational technical decisions (transport protocols, sampling strategies, schema design, correlation models) that determine whether our platform scales gracefully to hundreds of millions of events per day. This is a builder’s role with a strategist’s reach.

Key Responsibilities

- OpenTelemetry-Native Telemetry Fabric: Logs and distributed traces from clients, controllers, gateways, and connectors, all correlated by session, user, device, and trace ID across the full ZTNA flow.
- High-Cardinality Data Pipeline: An OTLP-based ingestion and routing layer engineered for 100M+ events per day, with attribute filtering, redaction, and tail-sampling.
- End-to-End Distributed Tracing: Span hierarchies decomposing login and session establishment across posture checks, policy decisions, TLS handshakes, and entitlement resolution, turning hours of triage into seconds.
- On-Demand Packet Capture: Admin-triggered PCAP coordinated across client and gateway, with the workflow fully observable through OTel logs and traces.
- AI-Ready Foundation: Structured, semantically rich telemetry that future LLM-based incident analysis agents can reason over. The schema you design today is the substrate for Phase 3 of the roadmap (AI-driven incident analysis).
- Architect the Observability Platform: Define the telemetry schema, correlation model, transport, and sampling strategies spanning client devices, controllers, and gateways.
- Build the Telemetry SDKs and LogForwarder: Instrument AppGate components with OpenTelemetry and implement the enrichment, redaction, batching, and tail-sampling pipeline that scales horizontally under load.
- Validate at Customer Scale: Test in lab environments matching our largest deployments (hundreds of sites, tens of thousands of concurrent sessions) and hunt down cardinality explosions and pipeline backpressure before customers see them.
- Drive Integration Standards: Own the OTLP, Prometheus, and JSON-log compatibility surface and validate ingestion into Datadog, Splunk, Nexthink, and Elastic.
- Raise the Engineering Bar: Establish patterns and review practices the Data + AI team builds on. Mentor engineers and grow the observability discipline inside AppGate.
- Collaborate Cross-Functionally: Work directly with product, R&D, and marquee customers in defense and critical infrastructure to shape requirements and deliver outcomes that matter.

Qualifications

- 8+ years of engineering experience, with at least 4 years dedicated to observability, telemetry, or large-scale data infrastructure (Datadog, Splunk, Elastic, Honeycomb, New Relic, Grafana Labs, or equivalent).
- Deep OpenTelemetry expertise: OTLP, the OTel Collector, semantic conventions, context propagation, and head/tail sampling. You can debate the trade-offs in your sleep.
- Distributed tracing in production: You’ve designed or significantly contributed to a tracing system handling real customer traffic, not just a side project.
- High-throughput pipeline experience: Hands-on with systems ingesting 100M+ events per day, including backpressure handling, batching, and storage trade-offs.
- Strong systems programming: Production Go and/or Rust preferred, with comfort across the stack, from agent code to backend services.
- Networking and security fluency: Comfortable with TLS, DNS, TCP, and identity protocols. Prior ZTNA, SASE, or SD-WAN experience is a strong plus.
- Mindset: Pragmatic, opinionated, and impact-driven. You know when to prototype and when to ship.

Our Observability Vision

AppGate secures defense agencies, federal governments, and Fortune 100 enterprises. When a connection traverses our ZTNA fabric (across clients, gateways, controllers, and protected resources), every hop carries real consequences for national security and business continuity. Yet when something breaks, the answer to “Why can’t I reach this resource?” is still buried in fragmented logs and tribal knowledge. That ends now.

We are building Observability AI, a purpose-built observability platform for the Zero Trust era. It is OpenTelemetry-native, emits high-fidelity, correlated telemetry across every AppGate component, is engineered for 100M+ events per day, and is designed to stream into Datadog, Splunk, Nexthink, Elastic, or any OTLP-compatible backend. The roadmap runs from a raw data-feed MVP, through native analytics and root-cause dashboards, to AI-driven incident analysis (LLM agents that read traces and explain failures in AppGate terms), and ultimately to autonomous remediation.

This is the nervous system for networks that protect nations, and this is your chance to build it. If you’ve shipped observability at scale and want to apply that craft where the stakes are highest, we want to hear from you.

Company Description

AppGate is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status, age, or any other federally protected class. In furtherance of AppGate's policy regarding affirmative action and equal employment opportunity, AppGate has developed a written affirmative action program. This program is available for review upon request by any applicant or employee during normal business hours by contacting the company's EEO Coordinator.

More Engineering Manager Jobs

Engineering Manager, Mission Agents

Vannevar

Vannevar is a defense technology company building AI to deter our adversaries. In the 21st century, conflict moves at algorithmic speed and foresight equals firepower. Our agentic AI is purpose-built to compete with China—from cross-Strait conflict to gray zone coercion. Trained on the most mission-relevant datasets in defense, our technology models adversary behavior, simulates campaigns, and recommends the best course of action to decision makers. Our AI systems are some of the most trusted in the industry and actively used on the front lines of the Indo-Pacific to keep the peace and save lives. Exceptional technology starts with exceptional people. Vannevar is a small agile team combining world-class engineers with veteran strategists who bring deep expertise in defense and tradecraft. We’re building a company defined by mission impact, user empathy, and disciplined growth. In just three years, we grew from $3M to $80M in ARR, achieved early profitability, and reached unicorn status—proving that disruption doesn’t require an ego, and staying power doesn’t mean standing still.

Full Time | Remote | Team 51-200 | Since 2019

Role Description

Vannevar is building the future of national security enablement with agentic AI. Our platform tightens the loop from observation to action by giving warfighters cutting-edge analysis tools to make better decisions, faster. We’re looking for a talented Software Engineering Manager to lead our Mission Agents engineering team. This team builds the agent-powered mission console, product surfaces, and workflow infrastructure that enable mission owners in the defense and national security space to leverage AI agents to drive real operational outcomes. Your team will partner closely with customers, mission owners, designers, product leaders, and engineers to build lasting products where users are meaningfully enabled by AI. You’ll be responsible for leading the team that designs, ships, and scales full-stack agentic workflows. This role is ideal for a product-minded engineering leader who can combine strong management fundamentals with deep technical judgment around AI-native product development. You’ll help the team move quickly, iterate with users, and build pragmatic, high-impact agentic systems for national security missions.

What You’ll Do

- Product engineering: Lead the Mission Agents team in designing, building, and shipping a cutting-edge agent-powered mission console using TypeScript, React, Tailwind, Node.js, Python, Postgres, and AWS.
- Agent leadership: Guide the implementation and scaling of robust agentic workflows that enable missions at scale, using modern agent frameworks and patterns.
- Customer and mission partnership: Partner with customers, mission owners, designers, and product teams to understand user workflows and drive concrete outcomes aligned with real-world operator workflows.
- AI-native product strategy: Help define and shape how users are enabled by AI and agents at Vannevar, balancing ambitious product vision with reliable execution.
- Developer and team acceleration: Reduce friction and duplication across the team by building strong technical patterns, reusable services, and clear product engineering practices.
- Team leadership: Grow and mentor a high-performing team of full-stack engineers. Foster a strong engineering culture focused on pragmatism, velocity, ownership, and mission impact.
- Cross-functional execution: Collaborate with Design, Platform, and Product teams to deliver durable, reliable, agentic software.

Qualifications

- 3+ years of experience leading and building software engineering teams, ideally in product engineering, AI/ML, or related domains.
- 8+ years of professional software development experience, with strong technical judgment across full-stack systems.
- Familiarity with agentic AI products and workflows, including frameworks such as LangGraph, Agno, Pydantic AI, or similar systems.
- Strong product instincts and prior experience working directly with users, customers, or mission owners to understand workflows, collect feedback, and ship better solutions.
- Ability to lead teams through ambiguity, balancing speed, reliability, security, and mission impact.
- Track record of data-driven decision-making: argue with metrics, not gut feelings.
- Excellent communication skills, cross-functional leadership, and project ownership.
- Proficiency with:
  - Coding in Python, TypeScript, or similar.
  - PostgreSQL or other relational databases.
  - Software development in AWS or other cloud services.
- Willingness to travel to offsites, team syncs, and customer sites.
- U.S. Citizenship, required to access U.S.-only data systems.

Nice to Have

- Active security clearance or ability/willingness to obtain one.
- Prior experience building software for defense, national security, or intelligence use cases.

Benefits

- Health, dental, and vision insurance.
- Remote friendly with WeWork access.
- Unlimited PTO, shared downtime during the federal holiday calendar, and company-wide off time at the end of each year.
- 401(k) match.
- Lifestyle & wellbeing stipends.
- Salary top-up during military reserve duty.
- Fully paid parental leave.
- Child and pet care reimbursement during travel.

New York
$200K - $260K / year

Engineering Manager

CoLab Software

Setting the standard in engineering collaboration. Simplified design review that lets teams build the future—faster.

Full Time | Remote | Team 51-200 | H1B No Sponsor

• Lead and support a team of engineers building CoLab’s AI-powered Operator platform
• Drive technical execution across applied LLMs, agentic workflows, vector search, retrieval systems, AI guardrail models, and design data processing
• Help the team rapidly test hypotheses and iterate toward product-market fit
• Contribute technically through architecture guidance, code reviews, debugging, and hands-on problem solving
• Partner closely with Product and Design to shape roadmap direction and execution strategy
• Establish lightweight but effective planning and delivery practices
• Coach and develop engineers across varying levels of experience
• Create clarity and momentum in a product area where requirements and solutions evolve quickly

Canada

Engineering Manager

Grafana Labs

Grafana Labs supports organizations’ monitoring, visualization and observability goals. 950,000+ active installations

Full Time | Remote | Team 501-1,000 | Since 2014 | H1B Sponsor

• Manage and develop metrics engineering teams
• Facilitate cross-team collaboration and remove blockers
• Engage with product owners to manage the product roadmap
• Conduct technical discussions and decision-making with teams
• Interact with customers to ensure satisfaction

United Kingdom
£103K - £129K / year
Job Closed

Engineering Manager

Grafana Labs

Grafana Labs supports organizations’ monitoring, visualization and observability goals. 950,000+ active installations

Full Time | Remote | Team 501-1,000 | Since 2014 | H1B Sponsor

• The core focus of the role is people management, but you should have enough technical skills to manage a highly technical team and product.
• Manage, hire, and develop a team of engineers, providing regular feedback and supporting each person’s growth through career conversations.
• You will act as project manager as well as work with product owners to ensure the product roadmap is defined and up to date.
• You have a strong technical background and are capable of engaging in technical conversations and challenging teams to arrive at strong technical decisions themselves.
• You will be comfortable working with engineering teams who have a strong sense of autonomy in their decision-making, be it technical or product focused.
• You will occasionally interact with customers; most of our customers are large companies using Mimir to monitor their operations.
• While you’re great with people and adept at managing relationships, you still keep up to date with the latest technical trends and shifts in order to maintain and enhance your understanding of the challenges your teams face.
• Ideally you will have a Computer Science degree, or equivalent experience from having worked as an engineer before moving into management.
• We develop in Go, we are cloud native, and we deploy on Kubernetes across all cloud service providers using Helm and Jsonnet. Conceptual familiarity with Kubernetes and Helm is valuable, and knowledge of distributed systems is appreciated.

Spain
€94K - €117K / year
Job Closed