Job Closed
This listing is no longer active.
The world’s #1 telematics provider, committed to advancing technology, empowering businesses and making the roads safer!
Site Reliability Engineering
Location
United States
Posted
85 days ago
Salary
0
Seniority
Mid Level
Job Description
Site Reliability Engineering
GEOTAB
Who we are:
Geotab® is a global leader in IoT and connected transportation and certified “Great Place to Work™.” We are a company of diverse and talented individuals who work together to help businesses grow and succeed, and increase the safety and sustainability of our communities. Geotab is advancing security, connecting commercial vehicles to the internet and providing web-based analytics to help customers better manage their fleets. Geotab’s open platform and Geotab Marketplace®, offering hundreds of third-party solution options, allow both small and large businesses to automate operations by integrating vehicle data with their other data assets. Processing billions of data points a day, Geotab leverages data analytics and machine learning to improve productivity, optimize fleets through the reduction of fuel consumption, enhance driver safety and achieve strong compliance with regulatory changes.

Our team is growing and we’re looking for people who follow their passion, think differently and want to make an impact. Ours is a fast-paced, ever-changing environment. Geotabbers accept that challenge and are willing to take on new tasks and activities, ones that may not always be described in the initial job description. Join us for a fulfilling career with opportunities to innovate, great benefits, and our fun and inclusive work culture. Reach your full potential with Geotab. To see what it’s like to be a Geotabber, check out our blog and follow us @InsideGeotab on Instagram. Join our talent network to learn more about job opportunities and company news.

Who you are:
We are always looking for amazing talent who can contribute to our growth and deliver results! Geotab is seeking a Site Reliability Engineering professional who, with training, will be able to quickly contribute to the Site Reliability team. If you love technology, are passionate about engineering support, and are keen to join an industry leader, we would love to hear from you!
What you'll do:
As a part of the Site Reliability Engineering team, your key area of responsibility is to ensure the availability, reliability, and performance of Geotab's core products for our customers. This role acts as a primary escalation point, diagnosing and resolving complex application issues impacting service availability and performance of multiple large-scale applications that support thousands of customers globally. SRE supports production applications and infrastructure, focusing on restoring normal service operations efficiently and contributing to long-term system stability.

How you'll make an impact:
- Act as a primary escalation point for critical production application/product issues.
- Rapidly troubleshoot complex problems across the application stack, utilizing observability tools to identify root causes.
- Coordinate effectively with development, infrastructure, and other technical teams during incidents to implement fixes and restore service swiftly.
- Clearly communicate incident status, impact, and resolution steps to internal stakeholders.
- Collaborate with team members to improve monitoring tools, dashboards, and alerting mechanisms for proactive detection of issues impacting Critical User Journeys (CUJs) within the application/product and computing architecture. Our complex environment encompasses monolithic applications, microservices, and a vast ecosystem of millions of hardware units.
- Monitor application/product and system health proactively using a combination of tools to ensure high availability and adherence to Service Level Objectives (SLOs) / Service Level Agreements (SLAs).
- Identify opportunities and implement automation tools/scripts to streamline routine operational tasks, reduce manual effort (toil), and improve response times.
- Conduct system tests to validate performance, reliability, and successful remediation of issues.
- Recommend design and process enhancements based on operational experience to improve overall application reliability and maintainability.
- Participate in post major incident reviews (PMIRs) to analyze disruptions, document findings, track corrective actions to prevent recurrence, and identify areas of improvement for incident response processes.
- Contribute to building a culture of learning from incidents.
- Participate in a 24x7 on-call rotation to provide timely support for critical issues outside of business hours.

What you'll bring to the role:
- 3-5 years of experience in SRE/DevOps/Tier 3.
- Strong troubleshooting skills with a systematic problem-solving approach.
- Extensive experience resolving critical incidents in production environments.
- Strong proficiency in Linux and operational scripting (Bash, PowerShell, Python).
- Experience with database/dataset querying (GoogleSQL, PostgreSQL, BigData), automated configuration management (via tools like Ansible), and GitOps tools (Argo CD).
- Experience with data visualization platforms (e.g., Apache Superset/BigQuery Visualizations).
- Familiarity with cloud platforms (GCP/Azure/AWS), container orchestration (Kubernetes), and monitoring/alerting systems (e.g., Prometheus stack including Alertmanager/Grafana).
- Understanding of application environments (e.g., .NET/C#) for troubleshooting purposes.
- Understanding of fundamental networking concepts (TCP/IP, HTTP, DNS, Load Balancing) is considered an asset.
- Familiarity with applying AI-powered tools to enhance operational efficiency in areas such as log analysis, troubleshooting assistance, incident summarization, and automation scripting.
- Demonstrated ability to work well under pressure and manage multiple tasks and projects simultaneously.
- Experience with incident management processes.
- Experience working within a technical or engineering organization with knowledge of the high-technology industry is considered an asset.
- Excellent verbal and written communication skills.
- Strong analytical skills with the ability to problem solve and develop well-judged decisions.
- Strong team player with the ability to engage with all levels of the organization.
- Technical competence using software programs, including but not limited to, Google Suite for business (Sheets, Docs, Slides) or equivalents.
- Entrepreneurial mindset and comfortable in a flat organization.
- To be eligible, candidates must have continuously resided in the continental United States for at least three years immediately preceding their application. Successful applicants will be required to provide verifiable documentation of continuous lawful residency. Some exceptions may apply to US citizens.
- Ability to pass an enhanced background check, including a drug screening test (if applicable) and a credit check.

If you got this far, we hope you're feeling excited about this role! Even if you don't feel you meet every single requirement, we still encourage you to apply.

Please note: Geotab does not accept agency resumes and is not responsible for any fees related to unsolicited resumes. Please do not forward resumes to Geotab employees.

Why job seekers choose Geotab:
- Flex working arrangements
- Home office reimbursement program
- Baby bonus & parental leave top up program
- Online learning and networking opportunities
- Electric vehicle purchase incentive program
- Competitive medical and dental benefits
- Retirement savings program
*The above are offered to full-time permanent employees only

How we work:
At Geotab, we have adopted a flexible hybrid working model in that we have systems, functions, programs and policies in place to support both in-person and virtual work. However, you are welcomed and encouraged to come into our beautiful, safe, clean offices as often as you like. When working from home, you are required to have a reliable internet connection with at least 50 Mbps download / 10 Mbps upload.
Virtual work is supported with cloud-based applications, collaboration tools and asynchronous working. The health and safety of employees are a top priority. We encourage work-life balance and keep the Geotab culture going strong with online social events, chat rooms and gatherings. Join us and help reshape the future of technology!

Geotab verifies candidates' eligibility to work in the United States through E-Verify, an internet-based system operated by U.S. Citizenship and Immigration Services.

Other employment statements:
Geotab will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. Additionally, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the Company's legal duty to furnish information.

We are committed to accommodating people with disabilities during the recruitment and assessment processes and when people are hired. We will ensure the accessibility needs of employees with disabilities are taken into account as part of performance management, career development, training and redeployment processes. If you require accommodation at any stage of the application process or want more information about our diversity and inclusion as well as accommodation policies and practices, please contact us at careers@geotab.com.
Geotab provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics. In addition to federal law requirements, Geotab complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.

Geotab expressly prohibits any form of workplace harassment or discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status. Improper interference with the ability of Geotab's employees to perform their job duties may result in discipline up to and including discharge.

If you would like more information about our EEO program or wish to file a complaint, please contact our EEO officer, Klaus Boeckers at HRCompliance@geotab.com. For more details, view a copy of the EEOC's Know Your Rights poster.

By submitting a job application to Geotab Inc. or its affiliates and subsidiaries (collectively, “Geotab”), you acknowledge Geotab’s collection, use and disclosure of your personal data in accordance with our Privacy Policy. Click here to read our Privacy Notice.
More DevOps Engineer Jobs
Staff DevOps Engineer
Scratch Financial
Scratch Financial is the world's simplest patient financing solution.
Company Description
NBCUniversal is one of the world's leading media and entertainment companies. We create world-class content, which we distribute across our portfolio of film, television, and streaming, and bring to life through our global theme park destinations, consumer products, and experiences. We own and operate leading entertainment and news brands, including NBC, NBC News, NBC Sports, Telemundo, NBC Local Stations, Bravo, and Peacock, our premium ad-supported streaming service. We produce and distribute premier filmed entertainment and programming through our powerhouse film and television studios, including Universal Pictures, DreamWorks Animation, and Focus Features, and the four global television studios under the Universal Studio Group banner, and operate industry-leading theme parks and experiences around the world through Universal Destinations & Experiences, including Universal Orlando Resort, home to Universal Epic Universe, and Universal Studios Hollywood. NBCUniversal is a subsidiary of Comcast Corporation. Visit www.nbcuniversal.com for more information.

Our impact is rooted in improving the communities where our employees, customers, and audiences live and work. We have a rich tradition of giving back and ensuring our employees have the opportunity to serve their communities. We champion an inclusive culture and strive to attract and develop a talented workforce to create and deliver a wide range of content reflecting our world.

Job Description
As the DevOps Lead Engineer, you will be responsible for spearheading our DevOps initiatives. You will foster a culture of automation, continuous integration, observability and delivery. Your efforts will support consumer data driven advertising and marketing products, standardized consumer identity solutions, and machine learning initiatives for NBCUniversal and its brands.
You will collaborate with cross-functional teams to optimize our cloud infrastructure, ensuring high availability, scalability, and security. Your expertise in AWS services, containerization technologies, monitoring tools, and cloud architecture will be pivotal in designing and implementing robust DevOps solutions that streamline our development, testing, and deployment processes.

Responsibilities:
- Develop and lead the implementation of DevOps strategies and best practices to improve the efficiency, reliability, and scalability of our cloud-based applications.
- Design, build, and maintain robust continuous integration and continuous delivery pipelines to automate the software development and deployment lifecycle.
- Utilize your in-depth knowledge of AWS services to architect, deploy, and manage scalable and resilient cloud infrastructure solutions.
- Implement containerization technologies (e.g., Docker, Kubernetes) to orchestrate application deployment and ensure consistent environments across various stages of development.
- Implement effective monitoring and logging solutions to proactively identify performance bottlenecks, security issues, and system anomalies. Develop auto-scaling solutions to meet fluctuating demand.
- Design and optimize cloud architecture to ensure high availability, disaster recovery, and cost-effectiveness.
- Implement security measures and best practices to safeguard our cloud infrastructure and applications against potential threats and vulnerabilities.
- Lead and mentor a team of DevOps engineers, fostering a collaborative and innovative work environment.
- Promote automation in all aspects of DevOps and maintain detailed documentation of infrastructure, processes, and procedures.

Qualifications
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Proven experience of 6+ years in DevOps and cloud engineering, with at least 2 years in a leadership or senior role.
- Expertise in building and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
- Strong proficiency in AWS services, including EC2, S3, RDS, Lambda, IAM, and VPC.
- Solid understanding of containerization technologies (e.g., Docker, Kubernetes) and container orchestration.
- Experience with infrastructure-as-code tools (e.g., CloudFormation, Terraform).
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK stack, Splunk, Datadog and CloudWatch.
- Knowledge of cloud security best practices and compliance standards (e.g., CIS benchmarks, CCPA, GDPR).
- Strong problem-solving skills and the ability to troubleshoot complex issues in a cloud environment.
- Excellent communication and leadership skills to effectively collaborate with cross-functional teams.

Additional Requirements:
- Fully Remote: This position has been designated as fully remote, meaning that the position is expected to contribute from a non-NBCUniversal worksite, most commonly an employee’s residence.

This position is eligible for company sponsored benefits, including medical, dental and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks. Learn more about the benefits offered by NBCUniversal by visiting the Benefits page of the Careers website.

Salary range: $130,000 - $160,000 (bonus eligible)

We are accepting applications for this position on an ongoing basis.

Additional Information
As part of our selection process, external candidates may be required to attend an in-person interview with an NBCUniversal employee at one of our locations prior to a hiring decision.
NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law.

If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access nbcunicareers.com as a result of your disability. You can request reasonable accommodations by emailing [email protected].

For LA County and City Residents Only: NBCUniversal will consider for employment qualified applicants with criminal histories, or arrest or conviction records, in a manner consistent with relevant legal requirements, including the City of Los Angeles' Fair Chance Initiative For Hiring Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, where applicable.

- Business Segment: Operations & Technology
- Compensation: USD 130000 - USD 160000 - yearly
Principal DevOps Engineer
Sagent
Sagent powers banks and lenders to make loans and homeownership simpler and safer for millions of consumers.
• Collaborate with senior leadership to develop and refine the company’s cloud strategy, ensuring alignment with business goals.
• Stay abreast of emerging cloud technologies and assess their applicability and potential benefits to our organization.
• Design robust, scalable, and highly available cloud architectures that meet business requirements and align with industry best practices.
• Architect solutions adhering to security, compliance, and performance requirements, incorporating GCP and Azure platforms.
• Provide technical leadership and mentorship to the cloud engineering team.
• Lead architecture discussions and guide development teams in implementing cloud solutions, focusing on Kubernetes and container orchestration with Helm.
• Implement cloud solutions hands-on, including infrastructure setup, configuration, and troubleshooting.
• Develop, troubleshoot, and maintain CI/CD pipelines using Azure DevOps and integrate cloud components and services with cross-functional teams.
• Continuously monitor and optimize cloud infrastructure for performance, cost, and scalability.
• Recommend improvements to existing cloud-based systems for enhanced efficiency and effectiveness.
• Create and maintain comprehensive documentation related to cloud architecture, configurations, and processes.
• Generate regular reports on system performance and usage.
• Effectively collaborate with internal stakeholders, vendors, and partners on cloud-related initiatives.
• Communicate complex technical concepts to non-technical stakeholders clearly and concisely.
• Design, build, and operate scalable ML infrastructure on GCP (GKE), supporting both experimentation and production workloads for LLMs and NLP systems.
• Manage Kubernetes-based environments (GKE): deployment, scaling, upgrades, and reliability of training and inference workloads across GPU/TPU/CPU pools.
• Build and maintain CI/CD pipelines (GitHub Actions, Jenkins) to automate testing, training, and deployment of ML services and infrastructure.
• Implement infrastructure as code (Terraform, Ansible) to provision and manage cloud resources in a reproducible, secure, and cost-efficient way.
• Ensure observability of ML systems: monitoring, logging, and alerting for infrastructure, pipelines, and production inference workloads.
• Collaborate with ML engineers and Data Engineers to design and support reliable training and inference pipelines.
• Optimize resource utilization and cost, improving efficiency of training and serving infrastructure.
• Troubleshoot and resolve issues across the ML platform, from data pipelines to distributed training and production deployments.
• Contribute to engineering best practices: code reviews, automation, and continuous improvement of platform reliability and developer experience.
• Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management across multiple clients.
• Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs in both production and non-production environments.
• Innovate and implement using automated orchestration and configuration management techniques. Understand the design, deployment, and management of secure and compliant enterprise servers, network infrastructure, boundary protection, and cloud architectures using Infrastructure-as-Code.
• Create, maintain, and peer review automated orchestration and configuration management codebases, as well as Infrastructure-as-Code codebases. Maintain IaC tooling and versioning within Client environments.
• Implement and upgrade client environments with CI/CD infrastructure code and provide internal feedback to development teams for environment requirements and necessary alterations.
• Work across AWS, Azure and GCP, understanding and utilizing their unique native services in client environments.
• Configure, tune, and troubleshoot cloud-based tools, manage cost, security, and compliance for the Client’s environments.
• Monitor and resolve site stability and performance issues related to functionality and availability.
• Work closely with client DevOps and product teams to provide 24x7x365 support to environments through Client ticketing systems.
• Support definition, testing, and validation of incident response and disaster recovery documentation and exercises.
• Participate in on-call rotations as needed to support Client critical events and operational needs that may lie outside of business hours.
• Support testing and data reviews to collect and report on the effectiveness of current security and operational measures, in addition to remediating deviations from current security and operational measures.
• Maintain detailed diagrams representative of the Client’s cloud architecture.
• Maintain, optimize, and peer review standard operating procedures, operational runbooks, technical documents, and troubleshooting guidelines.




