Site Reliability Engineer

Engineer, Full Time, Remote, Senior

Location

Worldwide

Posted

56 days ago

Seniority

Senior

Bachelor Degree, Distributed Systems, Observability/Monitoring, Prometheus, Grafana, OpenTelemetry, Datadog, Python, Shell, Kubernetes, CI/CD, Mode, Java, Linux, AWS, Azure, GCP, Istio, Linkerd, Consul

Job Description

Title: Site Reliability Engineer (SRE)
Location: 100% Remote (Continental United States)

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we’re looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 5+ years
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.

Employment Terms & Visa Policy

This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies. This role is part of Bright Vision Technologies’ in-house Statement of Work (SOW) engagement. The client, end customer, and employer for this position is Bright Vision Technologies; there is no third-party client, vendor, or implementation partner involved. We do not engage in C2C, 1099, or third-party arrangements for this role: all our roles are W2, and we do not accept third-party brokering. Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies and contribute to our in-house SOW deliverables. No new H1B sponsorship is available for this role.
However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply. We will support H1B transfers for qualified candidates. For every role, a technical coding assessment is mandatory. Please apply only if you are confident in your technical abilities and hands-on experience.

Job Summary

We are seeking an experienced Site Reliability Engineer to ensure the availability, performance, and operational excellence of large-scale distributed systems in production. As an SRE you will live at the boundary between development and operations, applying strong software engineering principles to infrastructure and operations problems, and continually pushing the platform toward higher reliability with lower operational toil. The ideal candidate will combine deep systems knowledge with strong programming skills, a measurement-driven mindset, and the discipline to design, automate, and operate complex services so that reliability becomes a first-class engineering deliverable rather than a reactive concern.

Key Responsibilities

- Define, instrument, and continually refine service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services, and use those measures to drive concrete engineering and prioritization decisions.
- Lead incident response and resolution for production issues, acting as a calm and effective incident commander when needed, and ensuring high-quality post-incident reviews that drive lasting improvements.
- Design and implement comprehensive monitoring, logging, and tracing strategies using Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar tooling so that operators have rich, actionable visibility into system behavior.
- Build and maintain robust on-call processes, runbooks, and escalation paths that reduce mean time to detect and mean time to resolve while protecting the well-being of the engineers on rotation.
- Automate operational toil aggressively by writing production-grade tooling in Python, Go, Bash, or similar languages, replacing manual workflows with reliable, auditable automation.
- Architect and operate large-scale Kubernetes clusters and container-based workloads, including autoscaling, capacity planning, network policy, and integration with service meshes.
- Design CI/CD pipelines that promote safe, frequent, and observable releases, supported by automated testing, canary deployments, feature flags, and progressive rollout strategies.
- Lead capacity planning and performance engineering activities, building models that predict growth and stress, and validating those models through load testing and chaos experiments.
- Partner closely with application development teams to embed reliability practices early in design, including failure-mode analyses, graceful degradation patterns, and dependency hardening.
- Strengthen the platform’s resiliency through chaos engineering, fault injection, dependency isolation, retries, timeouts, circuit breakers, and well-tested failover paths.
- Drive continuous improvement of security posture in collaboration with security teams, including patch management, vulnerability remediation, and secure-by-default platform defaults.
- Contribute to the technical roadmap for reliability tooling, observability platforms, and developer-experience improvements that reduce friction and improve outcomes for engineering teams.
- Mentor engineers across the organization on SRE practices and foster a strong, blameless culture of operational excellence.

Required Qualifications

- Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
- Five or more years of SRE, DevOps, or production engineering experience supporting large-scale distributed systems.
- Strong programming skills in at least one of Python, Go, or Java, with the ability to build robust automation and tooling.
- Deep, hands-on experience operating Linux at scale, including networking, performance tuning, and systems-level troubleshooting.
- Production experience operating Kubernetes and container-based workloads.
- Strong working knowledge of observability tooling such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, or commercial equivalents.
- Hands-on experience designing and operating CI/CD pipelines for both infrastructure and applications.
- Solid understanding of distributed system design, including consistency models, partitioning, and failure semantics.
- Demonstrated experience leading incident response and conducting effective post-incident reviews.
- Excellent communication and documentation skills.

Preferred Qualifications

- Experience defining and operationalizing SLOs and error budgets in real production environments.
- Exposure to chaos engineering practices and tools such as Chaos Monkey, Gremlin, or Litmus.
- Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
- Background in capacity planning, performance engineering, or large-scale load testing.
- Familiarity with service mesh technologies such as Istio, Linkerd, or Consul.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected] or contact us at (908) 505-3544. Learn more about Bright Vision Technologies at www.bvteck.com.

We recognize that our people are our strength, and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law.
We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans. Position offered by “No Fee Agency.”

Equal Employment Opportunity (EEO) Statement

Bright Vision Technologies (BV Teck) is committed to equal employment opportunity (EEO) for all employees and applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other protected status as defined by applicable federal, state, or local laws. This commitment extends to all aspects of employment, including recruitment, hiring, training, compensation, promotion, transfer, leaves of absence, termination, layoffs, and recall. BV Teck expressly prohibits any form of workplace harassment or discrimination. Any improper interference with employees' ability to perform their job duties may result in disciplinary action up to and including termination of employment.

More Engineer Jobs

Storage Engineer (NetApp, Pure, Ceph)

Bright Vision Technologies

Engineer, 56 days ago

Full Time Remote

Title: Storage Engineer (NetApp / Pure / Ceph)
Location: Remote Full Time Experienced

Job Description:

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. As we continue to grow, we’re looking for a skilled Storage Engineer (NetApp / Pure / Ceph) to join our dynamic team and contribute to our mission of transforming business processes through technology. This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Job Title: Storage Engineer (NetApp / Pure / Ceph)
Location: 100% Remote (Continental United States)
Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 5+ years
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.

Employment Terms & Visa Policy

This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies. This role is part of Bright Vision Technologies’ in-house Statement of Work (SOW) engagement. The client, end customer, and employer for this position is Bright Vision Technologies; there is no third-party client, vendor, or implementation partner involved. We do not engage in C2C, 1099, or third-party arrangements for this role: all our roles are W2, and we do not accept third-party brokering.
Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies and contribute to our in-house SOW deliverables. No new H1B sponsorship is available for this role. However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply. We will support H1B transfers for qualified candidates. For every role, a technical coding assessment is mandatory. Please apply only if you are confident in your technical abilities and hands-on experience.

Job Summary

We are seeking a Storage Engineer with deep expertise across enterprise storage platforms (NetApp ONTAP, Pure Storage, and Ceph) to design, deploy, and operate the storage foundation that supports our compute, virtualization, database, and Kubernetes workloads. The role spans block, file, and object storage across data center, edge, and cloud environments, with strong attention to performance, data protection, and cost. The ideal candidate has operated heterogeneous storage estates at scale, understands the storage characteristics of diverse workloads, and brings strong automation discipline to storage engineering work.

Key Responsibilities

- Design and operate enterprise storage platforms across NetApp ONTAP, Pure Storage FlashArray/FlashBlade, and Ceph.
- Implement and manage SAN, NAS, and object storage across data center and cloud environments.
- Build storage solutions for VMware, Kubernetes (CSI drivers), and database workloads.
- Design and operate replication, snapshot, and backup strategies for critical data assets.
- Implement and operate cloud-tiered storage with FabricPool, CloudVolumes, or equivalent.
- Build storage automation using Ansible, Terraform, REST APIs, and platform-native automation.
- Operate Ceph clusters including OSD lifecycle, CRUSH maps, RGW, RBD, and CephFS.
- Design and operate storage performance management for high-throughput and latency-sensitive workloads.
- Implement data protection strategies including ransomware protection and immutable snapshots.
- Build storage observability across capacity, performance, and health dimensions, surfacing the signals operators need to spot capacity pressure, performance regressions, and failing components before users feel the impact.
- Lead storage upgrades, firmware lifecycle management, and security patching across the storage estate, planning each campaign to maintain availability for the workloads dependent on the platform.
- Drive storage cost optimization including deduplication, compression, and tiering strategies.
- Troubleshoot complex storage issues spanning array, network, and host layers.
- Stay current with storage industry developments and emerging platforms, regularly review release notes and community discussions, and translate noteworthy advances into concrete recommendations and adoption proposals for the team.

Required Qualifications

- Bachelor’s degree in Computer Science, Information Systems, or a related field.
- Five or more years of enterprise storage engineering experience.
- Deep expertise in NetApp ONTAP or Pure Storage FlashArray/FlashBlade.
- Hands-on experience with Ceph at production scale.
- Strong understanding of SAN, NAS, and object storage protocols.
- Experience with storage integration for VMware and Kubernetes (CSI).
- Strong scripting and automation skills using Ansible, Terraform, or REST APIs.
- Strong understanding of storage performance, capacity planning, and cost optimization.
- Strong troubleshooting skills across storage, network, and host layers.
- Excellent communication and collaboration skills.

Preferred Qualifications

- Vendor certifications (NCDA, NCIE, Pure Storage certifications).
- Experience with hybrid-cloud storage replication.
- Familiarity with software-defined storage platforms beyond Ceph.
- Experience with object storage at petabyte scale.
- Exposure to AI/ML workloads with extreme I/O requirements.

How to Apply

Would you like to know more about this opportunity? For immediate consideration, please send your resume or contact us. Learn more about Bright Vision Technologies at www.bvteck.com.

We recognize that our people are our strength, and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans. Position offered by “No Fee Agency.”

Equal Employment Opportunity (EEO) Statement

Bright Vision Technologies (BV Teck) is committed to equal employment opportunity (EEO) for all employees and applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other protected status as defined by applicable federal, state, or local laws. This commitment extends to all aspects of employment, including recruitment, hiring, training, compensation, promotion, transfer, leaves of absence, termination, layoffs, and recall. BV Teck expressly prohibits any form of workplace harassment or discrimination. Any improper interference with employees' ability to perform their job duties may result in disciplinary action up to and including termination of employment.

Kubernetes, VMware, Ansible, Terraform, REST API, Observability/Monitoring, AI/ML

Worldwide

Technical Services Engineer, 2nd Shift

MongoDB

MongoDB, originally called 10gen, is a software development company. Since 2007, MongoDB has created an open-source, document-oriented database to help clients

Engineer, 56 days ago

Full Time, Remote, Team 5,550, Since 2008

MongoDB Technical Services Engineers use their exceptional problem solving and customer service skills, along with their deep technical experience, to advise customers and to solve their complex MongoDB problems. Technical Service Engineers are experts in the entire MongoDB ecosystem: database server, drivers, cloud and infrastructure. This also includes services such as Atlas (database as a service), or Cloud Manager (which helps customers with automation, backup and monitoring of their MongoDB systems). Our engineers combine their MongoDB expertise with passion, initiative, teamwork and a great sense of humor to achieve exceptional results for our customers.

We are looking to speak to candidates who are based in Hawaii or near one of our West Coast offices for our hybrid or remote working models.

The Federal Risk and Authorization Management Program (FedRAMP) is a US government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services. Our FedRAMP program requires that anyone who is accessing customer data or metadata inside the Authorization Boundary be a US Citizen on US Soil. In order for us to triage and assign cases, it is necessary to be able to identify available resources at any given time. For this reason the FedRAMP team is composed of three separate shifts: first shift, second shift, and third shift.

This posting is for Second Shift, in which your working hours would be 12pm-9pm PST. We are looking to speak to candidates who are interested in working out of our West Coast offices under our flexible or remote working models, Monday to Friday for the first 6-9 months depending on ramping speed. Once considered ramped, they will transition to a permanent Wednesday to Sunday work week to provide weekend coverage alongside other peers. Saturdays and Sundays are considered fully online workdays and not an on-call shift.
Due to the 24/7 nature of our support organization, certain events throughout the year will require volunteering for coverage outside one's normal work days or work hours (e.g., regional offsites, regional holidays, etc.). These are typically announced weeks in advance with a sign-up system that considers equitability.

Cool things you’ll do

MongoDB is on a mission to change the way people think about databases. Along the way, our customers encounter questions and issues about how our approach to databases works for their use case. In Technical Services, it's our job to help these people. You'll be working alongside our largest customers, solving their complex issues: resolving questions on architecture, performance, recovery, security, and everything in between. You'll be an expert resource on standard methodologies in running MongoDB at scale, whatever that scale may be. You'll be an advocate for customers' needs, working with our product management and development teams on their behalf. And you'll contribute to internal projects, including software development of support tools for performance, benchmarking, and diagnostics. In addition, you will also be responsible for mentoring and ramping new team members and taking initiatives in building knowledge of new product lines within the MongoDB ecosystem.

What you need

We consider all candidates with an eye for those who are self taught, insatiably curious, and multi-faceted. It’s important for candidates to check off these boxes:

- Systems engineering experience, including Linux performance, memory management, I/O tuning, configuration, security, networking, clusters, and troubleshooting
- Should have a good understanding of networking concepts and protocols (DNS, TCP/IP, SSL/TLS, etc.)
- Storage engineering experience, including NAS, SAN, SSD, multi-pathing, and caching
- Experience building and maintaining complex mission-critical production database systems
- Be able to read code, and basic coding/scripting ability in one or more languages: Java, Python, Ruby, C, C++, C#, JavaScript, Node.js, Go, etc.
- Broad awareness of customer workloads and use cases, including performance, availability, and scalability
- Experience analyzing issues holistically, from the application tier through the database, down to the storage
- Genuine desire to help people
- Ability to think on your feet, remain calm under pressure, and solve problems in real-time
- Desire and ability to rapidly learn a wide variety of new technical skills
- Collaboration: willingness and ability to get help from team members when required, and the good judgment to know when to seek help

Bonus Points

- Experience managing large scale databases (e.g. RDBMS, NoSQL)
- Experience with authentication systems such as LDAP, Kerberos, AD
- Experience using or managing MongoDB
- Experience using distributed version control systems, and in particular git
- Experience using any cloud services stack such as AWS, Azure or GCP
- Experience with Kubernetes or other orchestration systems using vendor-specific Enterprise operators.

Special Requirements

- Be a US Citizen

Success Measures

- In 3 months, you’ll have gained a deep understanding of MongoDB and its ecosystem. You will complete New Hire Training.
- In 6 months, you will be comfortable working frontline with our customers. You will also complete the MongoDB Certified DBA Associate exam.
- In 12 months, you will be a technical specialist within MongoDB and will be helping your peer engineers in advanced diagnostics. Also, you will be encouraged to handle technical escalations independently.

About MongoDB

MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market.
We have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform, the most widely available, globally distributed database on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Our cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available across AWS, Google Cloud, and Microsoft Azure. With offices worldwide and over 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we’re powering the next era of software.

Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It’s what makes us MongoDB. To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. From employee affinity groups, to fertility assistance and a generous parental leave policy, we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
Req ID: 1273373875

MongoDB’s base salary range for this role is posted below. Compensation at the time of offer is unique to each candidate and based on a variety of factors such as skill set, experience, qualifications, and work location. Salary is one part of MongoDB’s total compensation and benefits package. Other benefits for eligible employees may include: equity, participation in the employee stock purchase program, flexible paid time off, 20 weeks fully-paid gender-neutral parental leave, fertility and adoption assistance, 401(k) plan, mental health counseling, access to transgender-inclusive health insurance coverage, and health benefits offerings. Please note, the base salary range listed below and the benefits in this paragraph are only applicable to U.S.-based candidates.

MongoDB’s base salary range for this role in the U.S. is: $90,000 to $176,000 USD

Hawaii + 4 more

$90K - $176K / year

Forward Deployed Engineer – Oil, Gas, Chemicals

Picarro

Empowering the world through timely, trusted and actionable data through enhanced optical spectroscopy.

Engineer, 56 days ago

Full Time, Remote, Team 201-500, Since 2003, H1B Sponsor

• The Forward Deployed Engineer – Oil, Gas & Chemicals is responsible for driving adoption, enablement, and value realization at customer sites during pilots and early-stage engagements.
• Working with one customer at a time, the FDE ensures that Picarro's Fenceline Monitoring Solution and PicarroLink platform are embedded into Environmental, HSSE, and Operations workflows as an active, real-time tool, not a passive data archive.
• Delivery, commissioning, and hardware support are handled by Picarro's Operations and Delivery function; the FDE collaborates with that team but is focused entirely on the adoption and enablement work that determines whether a customer changes how they operate.
• Success is measured by the depth and consistency of customer adoption, the quality of value realized during the engagement, and the insights fed back to Customer Success and Product.
• Participate in pre-sales conversations, technical working sessions, and pilot design, bringing operational credibility to discussions about how PicarroLink integrates into a customer's existing workflows, data environment, and compliance obligations.
• Help define what success looks like and how it will be measured.
• Ongoing engagement with HSSE and Operations workers and leaders to build early alignment on objectives, establish relationships that support access and credibility during deployment, and ensure that pilot design reflects real operational constraints.
• Design and deliver structured, hands-on onboarding for Environmental, HSSE, and Operations personnel.
• Build confidence in PicarroLink: alerts, dashboards, event interpretation, and daily data review. Develop site-specific superusers who can train and sustain adoption across shifts and personnel changes, reducing dependency on Picarro's presence over time.
• Drive this cycle into existing daily operational procedures and compliance practice, shifting customers from reactive, schedule-driven programs to proactive, signal-driven operations.
• Own the goal of consistent, meaningful daily use of PicarroLink across Environmental and Operations functions. Lead on the software side: configuring detection thresholds, alerts, and dashboards in collaboration with the customer team, and demonstrating and embedding digital ways of working as the new operational standard.
• Make the shift to real-time feel natural and inevitable.
• Work closely with the assigned Customer Success Manager to maintain a real-time view of adoption health, workflow integration status, and value realization. Contribute to Quarterly Business Reviews, success criteria tracking, and customer health reporting. Ensure Customer Success has the ground-level insight needed to manage the account effectively.
• Translate direct field observations (workflow gaps, usability friction, unmet customer needs) into structured, actionable product feedback. Serve as a primary Voice of Customer input to the Product Owner and Industry Solution Director.

California + 3 more

$100K - $150K / year

Staff Product Engineer

LawnStarter

Engineer, 56 days ago

Full Time, Remote, Team 51-200, Since 2013, H1B Sponsor

Role Description

You're the engineering anchor of one initiative at a time. The initiative is a team effort (an iron triangle of you, your PM, and your designer), and you have key participation across the full lifecycle:

- Shaping the problem
- Deciding the technical approach
- Leading the AI agents that implement most of the code
- Shipping to production
- Answering for the outcome alongside the rest of the triangle

You're accountable for the outcome, not for the volume of code merged. If an agent can ship it safely, your job is to make sure the agent does it right and the metric moves. If the initiative needs hand-written code in a sensitive area, you write it yourself.

What makes this role different:

- You lead AI agents, not humans.
- You own an outcome, not a ticket queue.
- You partner horizontally with PM and design.
- The bar is staff, not senior.

Qualifications

- AI-native. Experience with Claude Code, Cursor, Codex, or equivalent.
- Already operating at lead level. Experience making calls and shipping hard things.
- Outcome-driven, not output-driven. Focus on metrics and experience improvement.
- A strong horizontal partner. Ability to collaborate effectively with PM and design.
- Decisive and documented. Ability to write down decisions and move quickly.
- Raises the floor, not just the ceiling. Impact beyond individual initiatives.
- Cares about customers and pros. Focus on real-world outcomes.

Requirements

- Leading AI agents at staff-level quality.
- Owning an outcome without a tech lead.
- Shipping outcomes, not features.

Benefits

- Competitive salary of USD $80,000–$100,000 annual base.
- Work from anywhere.
- High ownership and autonomy.
- Fast-moving team that loves to build, learn, and grow.

AI Agents

Worldwide

$80K - $100K / year