Senior Site Reliability Engineer, Observability

DevOps Engineer | Other | Remote | Senior | Team 201-500 | Since 2017 | H1B No Sponsor

Location

United States

Posted

161 days ago

Salary

Not specified

Seniority

Senior

Job Description

Chainlink Labs

  • Build and orchestrate a modern OpenTelemetry (OTEL)-based observability platform.
  • Support multiple telemetry types, such as metrics, logs, and traces.
  • Define and support modern observability governance, and address observability problems at scale.
  • Ensure that reliability, security, and performance exceed our defined SLAs.
  • Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load.
  • Lead the design and deployment of monitoring and observability services that detect issues and alert the team when action is needed.
  • Ingest, aggregate, transform, and use data from many sources in our real-time data pipeline.
  • Oversee the availability, performance, and supportability of our observability infrastructure.
  • Create processes for alert response operations, and support the team in ensuring the reliable delivery of oracle data.
  • Make recommendations to ensure that sufficient metrics are collected to create alerts for every new feature release.
  • Champion reliability and security by taking the time to do your work right the first time.

Job Requirements

  • 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.
  • Ability to develop software outside the scope of typical infrastructure requirements and configurations.
  • Experience programming in C, C++, Java, Python, Go, Perl, or Ruby.
  • Expert knowledge in all aspects of designing, developing, and managing large real-time systems.
  • Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution such as the ELK Stack, Splunk, or the Grafana Stack.
  • Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them.
  • Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews.

Benefits

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options

More DevOps Engineer Jobs

SRE / DevOps Manager

Upshop

AI-Powered Total Store Operations Platform

DevOps Engineer | 161 days ago
Other | Remote | Team 51-200 | Since 1993 | H1B No Sponsor

  • Manage and mentor a team of SRE and DevOps engineers.
  • Drive hiring, onboarding, and professional development.
  • Set clear goals and performance metrics.
  • Own system uptime, performance, and reliability.
  • Lead incident response and root cause analysis.
  • Define and monitor SLAs, SLOs, and SLIs.
  • Oversee cloud infrastructure (Azure).
  • Implement Infrastructure as Code (IaC) using Terraform or similar tools.
  • Drive automation of CI/CD pipelines and operational tasks.
  • Build and manage a DevSecOps process that connects CI/CD pipelines with Azure DevOps, GitLab, and similar platforms.
  • Implement and maintain monitoring, alerting, and logging systems.
  • Use tools such as Datadog, or similar tools such as Prometheus, Grafana, and the ELK Stack.
  • Ensure infrastructure security and compliance with industry standards.
  • Collaborate with InfoSec teams on audits and vulnerability management.
  • Work closely with software engineering, product, and QA teams.
  • Advocate for DevOps and SRE best practices across the organization.

Texas
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Define and drive the technical vision for DevOps practices across the organization.
  • Lead architecture decisions for infrastructure, CI/CD pipelines, and cloud resources.
  • Serve as a technical escalation point for complex infrastructure challenges.
  • Conduct design reviews and provide guidance on reliability, security, and scalability.
  • Design, build, and maintain cloud infrastructure on Google Cloud Platform using Terraform.
  • Own and improve CI/CD pipelines to enable fast, safe deployments.
  • Implement and maintain monitoring, alerting, and observability systems.
  • Drive incident response processes and lead post-mortems to improve system resilience.
  • Partner with product engineering teams to understand their infrastructure needs and translate them into scalable solutions.
  • Work closely with Security to implement and maintain compliance and security best practices.
  • Collaborate with Product and Engineering leadership on capacity planning and technical roadmaps.
  • Mentor and coach DevOps engineers, fostering growth and technical development.
  • Establish and document DevOps standards, runbooks, and best practices.
  • Champion a culture of reliability, automation, and continuous improvement.

Brazil

Senior DevSecOps Engineer

Enterprise Horizon Consulting Group

Enterprise Horizon solves complex IT and business challenges for the DoD, Federal, and Private sectors.

DevOps Engineer | 162 days ago
Other | Remote | Team 11-50 | Since 2005 | H1B No Sponsor

  • Lead the design, implementation, and optimization of secure DevSecOps pipelines in support of DoD applications and systems.
  • Assess the landscape of DevSecOps tools available to the customer, propose best practices, suggest alternatives, and identify gaps.
  • Integrate and deploy DevOps tools and practices in accordance with NIST 800-53 and DoD DevSecOps policies.
  • Develop and manage CI/CD pipelines using AWS and Azure DevOps.
  • Configure AWS IAM roles, CodePipeline, and CodeDeploy for cross-account deployments.
  • Integrate security tools (SonarQube, OWASP ZAP, Nexus, Sonatype IQ) into DevOps pipelines.
  • Conduct cost-benefit analyses and provide tool recommendations for security and DevOps.
  • Collaborate within an Agile SAFe framework, participating in PI planning sessions and aligning DevOps efforts with strategic goals.
  • Develop Python scripts that review ZAP findings and halt the automation if critical vulnerabilities are detected in web-hosted applications.
  • Provide technical leadership and act as a point of contact between the larger team and the customer.
  • Support Authority to Operate (ATO) processes through automated compliance checks, vulnerability remediation, and continuous monitoring.

District of Columbia | Washington
Job Closed

  • Design, implement, and manage cloud infrastructure on AWS using Terraform and SAM.
  • Automate deployments and CI/CD pipelines using GitHub Actions.
  • Develop Python scripts for automation, monitoring, and system integrations.
  • Troubleshoot production issues and coordinate with development teams to streamline deployments.
  • Implement automation frameworks for security, performance, and availability.
  • Analyze infrastructure performance and suggest improvements.
  • Collaborate with team members to improve engineering tools, policies, and procedures.
  • Gather and aggregate logs and metrics into actionable insights.

New York
$100K - $140K / year
Job Closed