Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications. We recognize that our people are our strength. We are an equal opportunity employer and place a high value on diversity and inclusion. We do not discriminate on the basis of any protected attribute. We make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans.
Site Reliability Engineer
Location
United States
Posted
2 days ago
Salary
$100K - $150K / year
Seniority
Mid Level
Job Description
Site Reliability Engineer
Bright Vision Technologies
Role Description
We are seeking an experienced Site Reliability Engineer to ensure the availability, performance, and operational excellence of large-scale distributed systems in production. As an SRE you will live at the boundary between development and operations, applying strong software engineering principles to infrastructure and operations problems, and continually pushing the platform toward higher reliability with lower operational toil. The ideal candidate will combine deep systems knowledge with strong programming skills, a measurement-driven mindset, and the discipline to design, automate, and operate complex services so that reliability becomes a first-class engineering deliverable rather than a reactive concern.

Key Responsibilities
- Define, instrument, and continually refine service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services.
- Lead incident response and resolution for production issues, acting as a calm and effective incident commander when needed.
- Ensure high-quality post-incident reviews that drive lasting improvements.
- Design and implement comprehensive monitoring, logging, and tracing strategies using tools like Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar.
- Build and maintain robust on-call processes, runbooks, and escalation paths.
- Automate operational toil aggressively by writing production-grade tooling in Python, Go, Bash, or similar languages.
- Architect and operate large-scale Kubernetes clusters and container-based workloads.
- Design CI/CD pipelines that promote safe, frequent, and observable releases.
- Lead capacity planning and performance engineering activities.
- Partner closely with application development teams to embed reliability practices early in design.
- Strengthen the platform’s resiliency through chaos engineering and fault injection.
- Drive continuous improvement of security posture in collaboration with security teams.
- Contribute to the technical roadmap for reliability tooling and observability platforms.
- Mentor engineers across the organization on SRE practices.

Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
- Five or more years of SRE, DevOps, or production engineering experience supporting large-scale distributed systems.
- Strong programming skills in at least one of Python, Go, or Java.
- Deep, hands-on experience operating Linux at scale.
- Production experience operating Kubernetes and container-based workloads.
- Strong working knowledge of observability tooling such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, or commercial equivalents.
- Hands-on experience designing and operating CI/CD pipelines.
- Solid understanding of distributed system design.
- Demonstrated experience leading incident response and conducting effective post-incident reviews.
- Excellent communication and documentation skills.

Preferred Qualifications
- Experience defining and operationalizing SLOs and error budgets in real production environments.
- Exposure to chaos engineering practices and tools such as Chaos Monkey, Gremlin, or Litmus.
- Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
- Background in capacity planning, performance engineering, or large-scale load testing.
- Familiarity with service mesh technologies such as Istio, Linkerd, or Consul.

How to Apply
Would you like to know more about this opportunity? For immediate consideration, please send your resume to [email protected] or contact us at (908) 505-3899. Learn more about Bright Vision Technologies at www.bvteck.com.


