The fastest way to visualize, understand and debug software. Find the critical issues that logs and metrics can’t see.
Field Reliability Engineer
Location
Brazil
Posted
2 days ago
Seniority
Senior
Job Description
Field Reliability Engineer
Honeycomb.io
• Own and operate customer-facing managed infrastructure including Refinery as a Service (RaaS) and Honeycomb Private Cloud (HnyPC) deployments across multiple AWS accounts and regions.
• Build and maintain Terraform modules, Helm charts, and deployment automation for provisioning and managing customer EKS clusters, collector pools, and Refinery instances.
• Design and implement monitoring, alerting, and observability for managed service infrastructure - using Honeycomb to monitor Honeycomb.
• Manage scaling, upgrades, and incident response for customer deployments, including capacity planning and cost optimization across AWS infrastructure.
• Build autonomous deployment and management tooling for field-operated managed services.
• Serve as the senior technical escalation point for our most challenging customer situations - production incidents, complex collector configurations, Refinery tuning, and architecture reviews that exceed the scope of standard technical roles.
• Diagnose and resolve deep infrastructure and observability issues spanning distributed systems, Kubernetes clusters, AWS networking (ALBs, PrivateLink, NLBs, VPCs), and polyglot service meshes.
• Partner directly with customer SRE, platform, and engineering teams to troubleshoot real-time production issues, often under time pressure and with direct revenue impact.
• Participate in an on-call rotation for managed services (Refinery as a Service, Honeycomb Private Cloud), providing Tier 2 escalation support for customer-facing infrastructure issues.
• Build and maintain SOPs, runbooks, and diagnostic frameworks that accelerate resolution for the broader field and support teams.
• Contribute to and maintain OpenTelemetry distributions, collectors, exporters, and instrumentation libraries that our customers depend on.
• Represent Honeycomb in the OpenTelemetry community - participating in SIGs, reviewing PRs, triaging issues, and driving adoption of best practices.
• Build reference architectures, sample collector configurations, and integration guides that demonstrate effective instrumentation patterns for common customer environments (Kubernetes, ECS, serverless).
Job Requirements
- Serve as the senior technical escalation point for our most challenging customer situations - production incidents, complex collector configurations, Refinery tuning, and architecture reviews that exceed the scope of standard technical roles.
- Diagnose and resolve deep infrastructure and observability issues spanning distributed systems, Kubernetes clusters, AWS networking (ALBs, PrivateLink, NLBs, VPCs), and polyglot service meshes.
- Partner directly with customer SRE, platform, and engineering teams to troubleshoot real-time production issues, often under time pressure and with direct revenue impact.
- Participate in an on-call rotation for managed services (Refinery as a Service, Honeycomb Private Cloud), providing Tier 2 escalation support for customer-facing infrastructure issues.
- Build and maintain SOPs, runbooks, and diagnostic frameworks that accelerate resolution for the broader field and support teams.
- Own and operate customer-facing managed infrastructure including Refinery as a Service (RaaS) and Honeycomb Private Cloud (HnyPC) deployments across multiple AWS accounts and regions.
- Build and maintain Terraform modules, Helm charts, and deployment automation for provisioning and managing customer EKS clusters, collector pools, and Refinery instances.
- Design and implement monitoring, alerting, and observability for managed service infrastructure - using Honeycomb to monitor Honeycomb.
- Manage scaling, upgrades, and incident response for customer deployments, including capacity planning and cost optimization across AWS infrastructure.
- Build autonomous deployment and management tooling for field-operated managed services.
- Contribute to and maintain OpenTelemetry distributions, collectors, exporters, and instrumentation libraries that our customers depend on.
- Represent Honeycomb in the OpenTelemetry community - participating in SIGs, reviewing PRs, triaging issues, and driving adoption of best practices.
- Build reference architectures, sample collector configurations, and integration guides that demonstrate effective instrumentation patterns for common customer environments (Kubernetes, ECS, serverless).
- Identify gaps in the open source ecosystem that create friction for customers and either contribute fixes upstream or build bridging solutions.
- Contribute features and improvements to Honeycomb’s own open source projects (Refinery, Honeycomb Collector Distro) to support managed service capabilities.
- Be the person Solutions Architects call when a deal goes deeper than demo and design - you join calls to troubleshoot live production environments, validate architecture decisions, and provide the infrastructure credibility that closes technical evaluations.
- Tag-team with SAs on strategic accounts, owning the infrastructure and data pipeline conversations while they own the product narrative.
- Lead architecture reviews, SLO workshops, and instrumentation deep-dives for customers evaluating or expanding Honeycomb - especially in complex environments (multi-cluster Kubernetes, hybrid cloud, high-cardinality workloads).
- Step into customer-facing POCs and pilots as the hands-on technical lead, standing up collector pools, configuring Refinery pipelines, and proving out integrations in the customer’s actual environment.
- Create feedback loops between the field and product/engineering, surfacing patterns from customer environments that inform roadmap priorities.
Benefits
- A stake in our success - generous equity with employee-friendly stock program
- It’s not about how strong of a negotiator you are - our pay is based on transparent levels relative to experience
- Time to recharge with unlimited PTO
- A distributed-first mindset and culture (really!)
- Home office, co-working, and internet stipend
- Full benefits coverage for employees, with additional coverage available for dependents
- Up to 16 weeks of paid parental leave, regardless of path to parenthood
- Annual development allowance
- And much more...