Job Closed
This listing is no longer active.
Observability Engineer
Location
United States
Posted
10 days ago
Salary
$100K - $150K / year
Seniority
Mid Level
Job Description
Observability Engineer
Bright Vision Technologies
Role Description
We are looking for an Observability Engineer to design and operate the metrics, logging, tracing, and alerting platforms that give engineering teams confidence in the systems they run. The role spans the full observability stack, from collection agents and pipelines to long-term storage, dashboards, and alerting workflows, with a strong focus on usability, signal quality, and operational ROI. The ideal candidate has built and operated observability platforms at scale, understands the trade-offs between open-source and SaaS approaches, and can translate noisy telemetry into actionable insight for both engineers and business stakeholders.

Key Responsibilities
- Design and operate enterprise-grade observability platforms covering metrics, logs, traces, events, and synthetic monitoring.
- Architect Prometheus / Thanos / Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog deployments for high availability and scale.
- Develop standards for service instrumentation, including OpenTelemetry adoption, metric naming, label cardinality, and structured logging conventions.
- Define and enforce SLOs, SLIs, and error budgets, and build the dashboards and alerts that operationalize them.
- Build alerting strategies that minimize noise, surface actionable signals, and integrate cleanly with on-call workflows in PagerDuty, Opsgenie, or similar tools.
- Operate large-scale time-series and log storage platforms, balancing retention, query performance, and cost.
- Design distributed tracing pipelines and help teams use traces to diagnose latency and reliability issues.
- Develop self-service tooling, paved-road libraries, and templates that make adoption of observability standards easy for product teams.
- Drive cost management and label-cardinality discipline across the observability estate.
- Lead incident response readiness improvements through better dashboards, alerting hygiene, and post-incident analysis tooling.
- Partner with SRE and platform teams to integrate observability into deployment pipelines, canary analysis, and progressive delivery workflows.
- Evaluate and recommend observability vendors and open-source tools based on cost, capability, and operational maturity.
- Mentor engineering teams on observability fundamentals, debugging techniques, and SLO-driven operations.
- Maintain documentation, onboarding guides, and runbooks for the observability platform.

Qualifications
- Bachelor’s degree in Computer Science or a related field.
- Five or more years of experience in SRE, platform engineering, or observability roles.
- Deep hands-on experience with Prometheus, Grafana, and at least one major commercial observability platform such as Datadog, New Relic, or Splunk.
- Strong understanding of OpenTelemetry, distributed tracing, and structured logging.
- Proficiency in at least one general-purpose language such as Go, Python, or Java.
- Experience operating high-cardinality, high-throughput metrics and log pipelines.
- Strong understanding of SLOs, error budgets, and SRE principles.
- Experience integrating observability with CI/CD and incident management tooling.
- Solid grasp of Linux internals, networking, and container platforms.
- Excellent communication and collaboration skills.

Preferred Qualifications
- Experience with Thanos, Mimir, Cortex, Loki, or Tempo at scale.
- Contributions to OpenTelemetry or observability open-source projects.
- Familiarity with eBPF-based observability tooling.
- Experience driving observability cost optimization initiatives.
- Exposure to regulated environments with audit-grade logging requirements.

How to Apply
Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected]. Learn more about Bright Vision Technologies at www.bvteck.com.
More Engineer Jobs
• Lead a two-person fire damper and grease extraction team in Reading/Slough
• Carry out fire damper inspections and drop testing
• Clean commercial kitchen extract systems to TR19 Grease standards
• Identify and report defects
• Ensure all works are completed safely and to a high professional standard
• Manage team productivity and maintain health & safety
• Complete accurate reports with photographic evidence
ESN Engineer - Isle of Skye
Ericsson
We create limitless connectivity to improve lives, redefine business and pioneer a sustainable future. #ImaginePossible
Join our Team
Applications are open to candidates from the Isle of Skye area.
Salary: £33,000-£36,000 per annum + allowances

About this opportunity:
As an ESN Engineer, you will own a designated group of sites, working as part of a four-person team covering adjacent territories. The role requires you to be self-managing, proactively maintaining your sites and collaborating closely with other ESN engineers and the wider FSO team to solve problems, with the goal of boosting network resilience and network availability.

This position requires a flexible work schedule, including participation in a 24/7 on-call rota and potential shift work to handle tasks outside of normal business hours. You must be prepared to travel and stay away from home for service recovery, mainly on the remote islands. In a major network event scenario, you should be available to lead and complete the recovery in your own and adjoining areas.

This role is based in Scotland and reports directly to the Regional Manager. We currently have vacancies in Isle of Skye! Applicants should already reside in, or be willing to relocate to, the Isle of Skye area.
What you will do:
- Conduct corrective maintenance activities and manage site facility operations.
- Take full responsibility for assigned Emergency Service Network (ESN) sites, proactively maintaining them to prevent failures and leading the coordination with other technical teams to resolve any issues that arise.
- Take a leadership role in problem management for your area, resolving major issues in your own and neighbouring areas.
- Work individually and as part of the FSO and wider Ericsson teams to achieve right-first-time resolution of work within customer SLAs without compromising safe working practices.
- Carry out reactive and proactive maintenance on multi-customer, multi-technology networks, achieving agreed performance targets (mostly EE/ESN).
- Participate in a 24 x 7 callout rota covering all equipment in the Service Area.
- Assist in the development and review of FSO processes and procedures with a view to maximising efficiency.

The skills you bring:
- Radio Frequency (RF) transmission and receiver systems employed in mobile communication equipment, including:
- AC/DC power systems (single phase and three phase)
- UPS systems and deployed ESN generators
- Satellite backhaul systems
- Rectification and power distribution systems
- GSM, UMTS and LTE fundamentals
- Cellular network fundamentals, mobility management, cell planning
- Radio Access Network and Core Network architecture
- Antenna systems, Mast Head Amplifiers, feeder systems and connectors
Full Stack Engineer
Pogo Technologies
Priorities may shift quickly. Oftentimes, we're tackling very ambiguous problems that don't have clear-cut answers. At times, you'll need to build things in a day: we live and breathe a value called "Calculated Speed". We don't have structured management (yet!). We expect more than 9 to 5 - raw hours make an impact at our current stage. We trust each team member to create a flexible work schedule that allows them to be most productive while accommodating other priorities outside of work. We also strongly encourage time off to recharge the batteries: in addition to unlimited PTO, we've implemented a minimum 20 days vacation policy.
Role Description
As a Full Stack Engineer, you’ll work on our enterprise AI SaaS product and own meaningful parts of the product end-to-end. This is a hands-on role with real responsibility. Specifically, you will:
- Build and own user-facing product features from idea to production
- Work closely with engineering, product, and design to plan, architect, and ship features
- Design, build, and implement AI agents, backend services, APIs, and web UIs
- Partner with functional leaders across the organization (e.g., GTM, finance, customer support) to design and build internal tools to streamline operations and eliminate manual workflows

Qualifications
- You likely have 2-4 years of professional full-stack software development experience
- You’re a heavy user of AI tools like Claude Code, Cursor, or Codex
- You stay up to date with the latest in AI, including new LLMs, agent frameworks, and tooling
- You have strong product and design taste and a clear sense of what “good” looks like
- You take ownership and are comfortable driving an entire part of a product
- You manage your own time and priorities well
- You’re happy moving across frontend and backend when needed
- You communicate clearly and work well with others, especially when things are ambiguous
- You’ve built and maintained production systems used by real users
- You’re comfortable learning new tools and building things from scratch

Requirements
- Experience at early-stage startups or building your own products (bonus)
- Have heard of Animations.Dev, Devouring Details, or InterfaceCraft (bonus)

Benefits
- Ship real products that get real usage, from customer-facing features to internal tools
- Work cross-functionally. Partner with product, design, ops, GTM, data, and CS
- Use the latest tech. Build with the latest coding tools and infrastructure
- Move fast. Ship quickly and get rapid feedback from real users and teammates
- Unlimited PTO with a minimum 20 days vacation policy
Role Description
UL is seeking a Functional Safety Expert with strong electronic system, hardware, and software design and safety development experience. This role will be deeply involved in the creation of safety work products for a wide range of vehicle and automation features, including complex systems such as:
- Electric and autonomous vehicle platforms
- Industrial automation equipment
- Battery and energy storage systems for both mobile and stationary applications

The role covers a broad set of safety-related standards, including but not limited to:
- ISO 26262
- IEC 61508
- IEC/EN 62061
- IEC 61511
- ISO 21448 (SOTIF)
- UL 4600
- ISO 13849
- Other applicable standards pertaining to functional safety, hardware reliability, and system engineering

Company Description
A global leader in applied safety science, UL Solutions (NYSE: ULS) transforms safety, security and sustainability challenges into opportunities for customers in more than 110 countries. UL Solutions delivers testing, inspection and certification services, together with software products and advisory offerings, that support our customers’ product innovation and business growth. The UL Mark serves as a recognized symbol of trust in our customers’ products and reflects an unwavering commitment to advancing our safety mission.

When you combine the gravitas of a 130-year-old safety science leader with a culture of innovation that pushes the boundaries in new services and technology, you get UL Solutions. As a global team of business and engineering experts, we develop software for a huge range of industries, including:
- Medical
- Sustainability
- Renewable energy
- Healthy buildings

Together, we’re dedicated to making a real-world difference, to using our knowledge for good and to pioneering positive change. Join our team and use your expertise to transform global safety, security and sustainability.

