Kentik

The network observability company.

Staff Site Reliability Engineer, Cloud

DevOps Engineer, Other Remote, Lead, Team 201-500, H1B Sponsor

Location

United States

Posted

134 days ago

Salary

$165K - $200K / year

Seniority

Lead

Bachelor Degree, 8 yrs exp, English

Ansible AWS Azure DNS Docker Firewalls GCP Grafana Kubernetes Linux Prometheus Puppet Python TCP/IP Terraform

Job Description

• Make sure our real-time, scalable infrastructure is set up for growth and working efficiently. Our infrastructure runs on our own hardware across multiple locations, as well as on all major cloud vendors
• Work on tools and processes to better monitor our platform and ensure its stability through our rapid growth
• Deep-dive into diverse topics, from firewalls and IP routing to database replication strategies and automating build processes
• Collaborate with engineering and infrastructure teams on finding solutions from an operational perspective
• Assist with expanding our cloud deployments across the major cloud providers
• Contribute code, code reviews, and tools or patches to all kinds of existing code
• Write design documents or collaborate on colleagues’ docs to introduce new features or changes into our infrastructure
• Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team

Job Requirements

8+ years of experience in cloud-based Systems Administration, IT and/or SRE related projects
Expertise in public cloud environments such as AWS, GCP, Azure, or OCI.
Strong command of containerization and orchestration using Docker and Kubernetes.
Solid programming and automation skills using Bash, Python, or Go.
Proficiency with Infrastructure as Code (IaC) and configuration management platforms such as Terraform, Ansible, and Puppet.
Proficiency in Linux administration and command-line tools (e.g., SSH, grep, awk).
Detailed understanding of major internet protocols (TCP/IP, DNS, HTTP, TLS)
Networking administration experience: concepts such as routing, firewalls (iptables), and peering should sound familiar
A passion for documenting code, processes, and infrastructure in runbooks and wikis
Experience with metrics monitoring solutions such as Grafana, Prometheus, Telegraf, and OpenTelemetry
Experience creating and managing tickets with third party vendors and owning cloud vendor partner relationships.

Benefits

100% of premiums are paid by company for health, vision and dental coverage for you and your dependents
Additionally, an annual Health Reimbursement Account (HRA) of $3,000 for an individual or $4,500 for a family
Paid family & medical leave
Open PTO, a quarterly Wellness Day, and a minimum of 10 paid holidays
401(k) retirement account
Home office reimbursement
Stock options

More DevOps Engineer Jobs

Principal DevOps Engineer

SageSure Insurance Managers

SageSure is an insurance company and division of Insight Catastrophe Group, a New York-based company that delivers property risk management services. As an empl

DevOps Engineer, 134 days ago

Other Remote

• Drive the development and continuous improvement of platform tools, emphasizing scalability, reliability, and monitoring capabilities to effectively support engineering teams.
• Design and implement self-service tools and frameworks that empower engineering teams, promoting scalability, efficiency, and reusability across various platforms.
• Provide expert-level technical oversight and mentorship to engineering teams, ensuring platform capabilities are seamlessly integrated into workflows and aligned with organizational goals.
• Establish and maintain comprehensive technical documentation and engineering standards, ensuring platform tools remain understandable, extensible, and accessible to all teams.
• Analyze and resolve complex performance issues within platform tools, identifying root causes, and implementing robust, scalable solutions to enhance efficiency and reliability.
• Proactively research and adopt new technologies, tools, and engineering patterns that elevate developer productivity and improve self-service capabilities.
• Focus extensively on scalability, performance optimization, and sustainable software delivery, ensuring efficient resource utilization and cost effectiveness.
• Actively participate in on-call rotations, providing critical expertise and technical guidance to maintain production environment resilience and high availability.

AWS Distributed Systems Docker Kubernetes Nginx Python

United States

Job Closed

Senior DevOps Engineer

Agiloft

The global standard in no-code contract lifecycle management (CLM) software.

DevOps Engineer, 134 days ago

Full Time Remote, Team 201-500, Since 1991, H1B Sponsor

• Help design, build, and maintain a stable and efficient infrastructure to optimize service delivery across production throughout the development lifecycle
• Monitor, troubleshoot, maintain, and continuously improve building, packaging, and deployment processes
• Collaborate within the Cloud Ops team as well as with QA and development to troubleshoot performance issues

Ansible AWS Azure Chef Cloud Docker Grafana Jenkins Kubernetes Linux Prometheus Python Ruby Terraform

United Kingdom

Job Closed

Site Reliability Engineer

AutoRABIT Holding

DevOps Engineer, 134 days ago

Other Remote

This description is a summary of our understanding of the job description.

Role Description

AutoRABIT is looking for a Site Reliability/DevSecOps Engineer to help develop, scale and operate our cloud services. In this role you will be an experienced business professional able to implement and execute best practice operations and improvements across teams by providing visibility and recommendations for improved reliability and automation. You will be responsible for the security, availability, performance, efficiency, change management, monitoring, emergency response, capacity planning, back-up, and disaster recovery of our technical ecosystem, and will drive automation while building a robust and agile DevSecOps framework. Accountability, agility and strong analytical skills, paired with an obsession for learning, gathering data and executing on that data, are key to being successful in this role.

Responsibilities

• Site Reliability or DevSecOps engineer with a passion for automation, reliability, scalability, monitoring, and capacity planning.
• Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service.
• Assist both internal and customer facing teams with deployment of new software releases, VPN and other related security infrastructure interfacing.
• Assist with resolution of AutoRABIT service or customer issues as required.
• Participate in and practice sustainable incident response and blameless postmortems.
• Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments.
• Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve.
• Participate in a regular on-call or rotational schedule needed to support AutoRABIT servers, including weekends and holidays.

Qualifications

• Experience with deployment and maintenance of scalable, resilient, and secure infrastructure with AWS, GCP, and/or Azure based infrastructure cloud and services and automation.
• Knowledge of key DevSecOps tools for monitoring (ELK, AWS Azure CloudWatch etc.) and infrastructure management platforms (Kubernetes, Docker, Ansible, Jenkins, Terraform etc.).
• Experience with Shell Scripting (Bash), Python or equivalent is required.
• Knowledge of programming languages such as Python, Go, or Java.
• Experience with configuration management tools such as Ansible or Chef.
• Solid understanding of CI/CD pipelines and tools such as Jenkins, GitLab CI, or CircleCI.
• Excellent troubleshooting skills in SaaS or customer environments.
• Team player, receiving and giving feedback as well as sharing knowledge.
• Can-do attitude: challenging the status quo, leading, and contributing to key improvements and innovations, while maintaining accountability.
• Excellent written and verbal US English communication skills for working across a global team environment.
• Responsible for adhering to set internal controls.

Requirements

• Bachelor's in Computer Science, Engineering, or equivalent degree or experience.
• 2+ years of experience in Infrastructure Management, DevOps or Site Reliability, preferably in a SaaS or cloud environment.
• AWS, GCP and/or Azure Certified.
• 2+ years of Kubernetes experience.
• 2+ years of experience managing Linux-based systems in a public cloud such as AWS, GCP, or Azure.
• 2+ years of experience with systems monitoring and logging; knowledge of ELK is preferred.
• Solid understanding of standard TCP/IP networking and common protocols like DNS, load balancers, HTTP, etc.
• Must be a US citizen or permanent resident, capable of obtaining a government security clearance if required, and must live and work in the US. Green card holders qualify, but H1B or other work visa holders do not qualify for this role.

Benefits

• Salary range for the role is $150,000 to $175,000 per year, depending on experience.
• This is a 100% remote job.

AWS GCP Azure Kubernetes Docker Terraform Jenkins Ansible Python Shell GitLab CI CI/CD Linux TCP/IP DNS

United States

$150K - $175K / year

Job Closed

DevOps Engineer

TetraScience

TetraScience is a cloud-native technology company that develops software and hardware solutions for monitoring and managing research experiments, as well as clo

DevOps Engineer, 134 days ago

Other Remote

• Collaborate with product and engineering teams to drive and enhance the entire lifecycle of our products, from design and development to deployment and operation.
• Work closely with clients to deploy and troubleshoot our products in clients' AWS environments, ensuring smooth integration and optimal performance.
• Develop CloudFormation templates, Terraform modules, Python scripts, deployment frameworks, monitors, and self-healing tools to automate processes and improve efficiency.
• Assist the software engineering team in building accurate monitoring and metrics systems for applications before they go into production.
• Manage the internal AWS environments and network, ensuring stability, security, and scalability while keeping costs in check.
• Participate in meetings with potential clients, working alongside solution architects to address their questions and concerns regarding integration of our products into their network and AWS accounts.
• Maintain up-to-date documentation on deployments, processes, and standard operating procedures.

AWS Docker Java Kubernetes Linux Microservices Python Terraform

United States

Job Closed
