Based in Lehi, Utah, DigiCert is a certificate authority company that has issued more than 80,000 digital certificates and credentials to customers around the g

Principal Site Reliability Engineer

Engineer | Full Time | Remote | Lead

Location

United States

Posted

2 days ago

Salary

$160K - $190K / year

Seniority

Lead

Job Description

Role Description

The Platform Ops team within CloudOps is responsible for the reliability, scalability, and modernization of DigiCert’s cloud infrastructure. As a Principal SRE, you will own the intersection of software engineering and operations: driving automation-first practices, reducing toil, and accelerating our cloud transformation across AWS, Azure, and GCP environments. You will be a technical force multiplier: raising reliability standards across the organization, defining SLOs that matter, and building the internal platforms and tooling that enable product teams to ship with confidence.

What you will do

Reliability Engineering
- Define, implement, and own SLIs, SLOs, and error budgets for critical platform services
- Lead blameless post-mortems and drive systemic reliability improvements across the platform
- Design and implement observability pipelines (metrics, logs, traces) using tools such as Splunk, Prometheus, Grafana, or OpenTelemetry
- Participate in on-call rotation and serve as an incident commander for P0/P1 events

Cloud Modernization
- Architect and execute migration strategies from legacy infrastructure to cloud-native patterns (containers, serverless, managed services)
- Champion adoption of Kubernetes, service mesh, and managed cloud services (EKS, GKE, AKS)
- Evaluate and introduce emerging cloud technologies that improve availability, cost efficiency, and developer experience
- Partner with architecture and security teams to embed reliability and compliance into platform design

Automation & Platform Development
- Build and maintain infrastructure-as-code using Terraform across multi-cloud environments
- Develop internal tooling, self-service platforms, and golden-path templates that reduce operational burden for development teams
- Automate operational workflows including provisioning, scaling, patching, and secret rotation
- Contribute to and maintain CI/CD pipelines (GitHub Actions) to enable safe, frequent deployments

Engineering Leadership
- Mentor mid-level engineers on SRE principles, distributed systems, and infrastructure best practices
- Collaborate cross-functionally with product, security, and compliance teams to deliver on platform roadmap commitments
- Document architectural decisions, runbooks, and platform standards; raise the engineering bar through code and design reviews

Qualifications
- 5+ years of experience in SRE, platform engineering, or infrastructure engineering roles
- Deep proficiency in at least one major cloud provider (AWS, GCP, or Azure) with working knowledge of multi-cloud environments
- Strong software engineering skills in Python, Go, or Bash; comfortable writing production-grade automation and tooling
- Hands-on Kubernetes experience: cluster operations, workload management, networking (CNI/service mesh), and security (RBAC, pod security)
- Infrastructure-as-code expertise with Terraform or equivalent; experience with GitOps workflows
- Proven experience designing and operating observability systems and responding to production incidents at scale
- Strong understanding of networking fundamentals: DNS, TLS/PKI, load balancing, and zero-trust networking concepts

Nice to have
- Experience in PKI, certificate lifecycle management, or security-adjacent infrastructure
- Familiarity with compliance frameworks such as SOC 2, FedRAMP, or ISO 27001 in cloud environments
- Prior experience driving cloud migration or modernization programs at scale
- Contributions to open-source infrastructure or platform projects
- AWS/GCP/Azure professional-level certifications (e.g., AWS Solutions Architect Professional, CKA/CKS)

What success looks like

In your first 90 days, you’ll have developed a deep understanding of our platform’s reliability posture, contributed to at least one automation or modernization initiative, and become a trusted voice in incident response. Within a year, you’ll have measurably reduced toil, improved SLO attainment across key services, and delivered at least one major platform capability that enables product teams to move faster.

Benefits
- Competitive compensation and comprehensive health, dental, and vision coverage
- Retirement savings programs with company matching (401(k) or RRSP)
- Generous paid time off, including holidays and vacation
- Paid parental leave and family support benefits
- Life and disability coverage
- Flexible spending and health savings options (where applicable)
- Health and wellness support, including gym reimbursement and wellness programs
- Employee Assistance Program with 24/7 confidential support for employees and families
- Education assistance and professional development opportunities
- Access to LinkedIn Learning and continuous learning resources
- Employee referral bonus program and additional company perks and discounts
- Internal rewards and recognition platform (Motivosity) to celebrate and acknowledge project wins, milestone achievements, and the outstanding contributions of our colleagues
- Business travel insurance and global employee support programs

More Engineer Jobs

Principal Full Stack Engineer

Tech Firefly

Illuminating Solutions

Engineer | 2 days ago

Contract | Remote | Team 1,001-5,000 | H1B Sponsor

Role Description

We are looking for a highly skilled Principal Full Stack Engineer to drive the development of our next-generation, interoperable healthcare solutions. This role requires a seasoned engineer who can seamlessly balance high-level architectural design with hands-on full-stack delivery. The ideal candidate will have deep roots in Java ecosystems, strong frontend capabilities in Angular, and a comprehensive understanding of healthcare interoperability standards (FHIR/HL7) to connect complex clinical ecosystems.

Location: Remote
Pay: $120-130/hour
Contract Length: 12+ Months

Responsibilities
- Lead the design and development of end-to-end applications, combining robust Java Microservices with a modern Angular frontend.
- Architect seamless data exchange pipelines utilizing FHIR/HL7 standards and API Gateways.
- Design and deploy scalable, cloud-native, and event-driven distributed systems on GCP.
- Ensure data integrity and performance across PostgreSQL and distributed databases like Google Cloud Spanner.
- Define engineering best practices, conduct code reviews, and guide agile teams toward successful product launches.

Qualifications
- 12–15 years of experience in full-stack software development, with significant tenure in architectural or principal roles.
- Prior experience in the Healthcare domain is highly preferred.
- Expert knowledge of Java Microservices, API Gateways, and healthcare compliance standards (FHIR / HL7).
- Strong proficiency with Angular for building scalable enterprise user interfaces.
- Proven experience architecting solutions on GCP and managing PostgreSQL or Spanner databases.
- Advanced mastery of cloud-native systems, service mesh, and event-driven patterns.

Nice to Have
- Familiarity with Electronic Health Record (EHR) systems, specifically EPIC.
- Proficiency with Python.
- Experience with GKE, GCS, and APIGEE.
- Exposure to Agentic AI and AI-assisted coding methodologies.

Java | Angular | Microservices | Distributed Systems | GCP | PostgreSQL | EHR | Python | Google Kubernetes Engine | Apigee | AI

United States

$120 - $130 / hour

Software Consulting Engineer – T&D Configuration Systems, Software Solutions

Switzerland Global Enterprise

We support Swiss SMEs in their international business and help innovative foreign companies to establish in Switzerland.

Engineer | 2 days ago

Full Time | Remote | Team 51-200 | Since 1927 | H1B No Sponsor

• Define the architecture and evolution of scalable, modular, and secure software platforms for T&D configuration systems, including tools for device configuration, substation engineering, and grid automation workflows.
• Define technical strategy and oversee the design of cloud-native platforms using modern frameworks (.NET, Java, Python, TypeScript) that support desktop, edge, and cloud deployments, with emphasis on performance, resilience, and maintainability.
• Drive the integration of IEC 61850 engineering workflows (SCL-based ICD, SCD, SSD) into intuitive, automated tooling ecosystems aligned with modern UI/UX, API design, and utility integration requirements.
• Ensure cross-functional alignment, acting as the authority between firmware, UI/UX, and power systems teams, enabling coherent system design and tight integration between engineering workflows and device behavior.
• Collaborate with cybersecurity, systems, and hardware architects to deliver secure, compliant solutions for critical infrastructure, incorporating secure development lifecycle (SDLC) and DevSecOps practices.
• Drive reuse of software components across product lines, fostering platform consistency, reducing duplication, and accelerating development.
• Oversee development and lifecycle management of configuration and commissioning tools for protection and control devices, ensuring seamless integration with SCADA/DMS/EMS and other utility systems.
• Conduct software and architecture reviews, ensure compliance with industry standards (IEC 61850, IEC 61968/70, CIM, IEC 62351), and manage the end-to-end software development lifecycle from requirements to deployment and support.
• Partner with architects, systems engineers, and utility customers to define and deliver customer-centric, technically robust solutions that improve reliability, visibility, and flexibility of grid systems.
• Monitor emerging technologies (cloud-native services, model-driven engineering, AI/ML) for relevance to the T&D domain and contribute to technology roadmaps, product strategy, and IP generation through patents, whitepapers, and technical forums.
• Mentor and guide software engineers, promote a culture of technical excellence and innovation, and represent the organization in customer engagements, RFPs, and industry events.

Cloud | Cyber Security | Java | JavaScript | Python | SDLC | TypeScript | .NET

Canada

$162.9K - $244.3K / year

Senior / Staff Documentation Engineer

TetraScience

Open | Cloud-Native | Purpose-Built for Science

Engineer | 2 days ago

Full Time | Remote | Team 51-200 | Since 2015 | H1B Sponsor

Role Description

TetraScience is the scientific data and AI company. Our documentation is how customers, from bench scientists to platform engineers, learn to build on the platform, and increasingly it is how AI agents consume the platform too.

We are looking for a Documentation Engineer to own documentation as a system:
- Manage the pipelines that build and publish documentation.
- Oversee AI-augmented workflows that generate drafts for human review and refinement.
- Handle the review and publish process.
- Ensure infrastructure makes documentation reliably consumable by AI agents.

This is primarily a documentation systems role, not only a writer who uses tools. The differentiator is building and owning the systems that produce, validate, publish, and AI-enable our documentation. Strong writing and editorial judgment are still required, but the center of gravity is tooling and systems, and a large portion of the day-to-day is building.

- Lead the growth of our existing docs-as-code foundation and AI-assisted documentation workflows into a docs-as-AI-agents capability.
- Own editorial quality and the release-notes cadence while focusing on building leverage.
- Own the documentation site and its publishing as software: the docs-as-code repo, CI/CD publishing pipelines, build performance, and automated checks.
- Build and grow AI-augmented documentation workflows: AI-assisted drafting, summarization, classification, consistency and staleness checks, and a feedback loop for quality improvement.
- Structure and transform content for AI systems to reliably chunk, index, and reason over it.
- Generate reference documentation from source (OpenAPI and related specs) and maintain alignment with platform changes.
- Lower the barrier for internal contributors to ship their own docs through the docs-as-code workflow and reduce repetitive work through automation.
- Own the release-notes and customer-communications cadence with every platform release and run the SME review for accuracy and timeliness.
- Own the documentation style guide, hold the review-and-publish gate, and keep the team runbook current.

Qualifications
- 5+ years owning documentation tooling, content engineering, or developer documentation for a developer-platform or enterprise B2B product.
- Engineering ability in a scripting or web stack (e.g., Python, TypeScript, or JavaScript) and fluency with docs-as-code: Git, pull-request review, CI/CD, and a static-site or CMS publishing pipeline.
- Hands-on experience building AI-augmented or LLM-backed workflows: integrating LLM APIs, AI-assisted authoring, and structuring content for AI consumption.
- Ability to read and reason about a real codebase and API surface well enough to document it accurately and build tooling against it.
- Strong editorial judgment: ability to clarify dense engineering changes for customer safety.
- Bachelor’s or Master’s degree in a technical field, or equivalent practical experience.

Requirements
- Experience making documentation consumable by AI agents (llms.txt, content negotiation, RAG pipelines, MCP servers).
- Experience in BioPharma or scientific software, or in regulated and validated (GxP) environments.
- Experience generating reference docs from OpenAPI or related specifications with two-way Git sync.
- Developer-relations or developer-education exposure.

Benefits
- 100% employer-paid benefits for all eligible employees and immediate family members.
- Unlimited paid time off (PTO).
- 401K.
- Flexible working arrangements - Remote work.
- Company paid Life Insurance, LTD/STD.
- A culture of continuous improvement where you can grow your career and get coaching.

AI | AI Agents | CI/CD | OpenAPI | Python | TypeScript | JavaScript | Git | CMS | LLM

United States

Senior GenAI & High Performance Computing (HPC) Delivery Engineer

Dell Technologies

Dell Technologies was formed in 2016 when Dell and EMC combined in what is considered "the largest technology merger in history." Today, the multinational technology company is bas

Engineer | 2 days ago

Full Time Remote

Role Description

Join us to do the best work of your career and make a profound social impact as a Senior GenAI & High Performance Computing (HPC) Delivery Engineer on our Service Delivery Team in Austin, Texas or Remote United States. 50-70% national travel.

We’re seeking a Senior GenAI & HPC Engineer with deep experience in GPU-accelerated systems, Linux performance tuning, and benchmarking. This role is highly hands-on and customer-facing, supporting onsite deployments across the U.S. for advanced HPC and GenAI solutions. You will work as part of a team to help build, integrate, and test some of the world’s largest multi-GPU systems, benchmark them using industry-standard tools, and deliver the next generations of AI and HPC infrastructure.

- Deploy, configure, and validate GPU-accelerated compute clusters for AI, ML, and HPC with NVIDIA Base Command Manager (Warewulf and OpenHPC knowledge are a plus)
- Perform benchmarking with HPL GPU, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks, and related tools
- Produce as-built documentation and performance reports, and share best practices with the team
- Configure and secure RHEL, Ubuntu, Rocky for GenAI or HPC workloads
- Work directly with customers onsite (travel both regionally and across the U.S.)

Qualifications
- 7+ years with HPC or GenAI clusters, GPU-based systems, AI infrastructure, or related fields
- Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
- Proficiency with benchmarking tools: HPL, STREAM, NCCL, RCCL, MxP, OSU Microbenchmarks
- Red Hat certification (RHCSA/RHCE) or 7+ years of relevant RH distros experience
- Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
- Experience working in Linux-based parallel computing environments at scale
- Experience with containers/orchestration (Docker, Singularity/Apptainer, Kubernetes, Slurm)
- Ability to travel up to 70% of the time across the U.S. as needed for projects
- Strong customer-facing and communication skills

Requirements
- Bachelor’s degree
- NVIDIA certifications (NCA, NCE, DGX)
- Experience with NVIDIA UFM, InfiniBand, and SpectrumX fabrics
- Exposure to hybrid cloud or GPU cloud environments
- Experience with GPU observability/performance profiling tools

Benefits
- Your life. Your health. Supported by your benefits. You can explore the overall benefits experience that awaits you as a Dell Technologies team member, right now at MyWellatDell.com

Compensation

Dell is committed to fair and equitable compensation practices. The salary range for this position is $145,000 to $199,100.

Company Description

We believe that each of us has the power to make an impact. That’s why we put our team members at the center of everything we do. If you’re looking for an opportunity to grow your career with some of the best minds and most advanced tech in the industry, we’re looking for you.
- Dell Technologies is a unique family of businesses that helps individuals and organizations transform how they work, live and play.
- Join us to build a future that works for everyone because Progress Takes All of Us.
- Dell Technologies is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment.

Linux | AI | AI/ML | Red Hat Enterprise Linux | Ubuntu | Docker/Containers | Docker | Kubernetes | Observability/Monitoring | Performance Optimization

United States

$145K - $199.1K / year
