Job Closed
This listing is no longer active.
AI-Powered Data Operations Center. Autonomous Self-Healing Pipelines Are Here.
Senior Site Reliability Engineer
Location
California
Posted
80 days ago
Salary
0
Seniority
Senior
Job Description
Senior Site Reliability Engineer
Pantomath
• Own the Platform: Design, build, and maintain Pantomath's cloud infrastructure on AWS (EC2, EKS, IAM, ALB, RDS, S3) using Infrastructure as Code principles (Terraform, CDK).
• Architect and evolve CI/CD pipelines (GitHub Actions, NX) that enable development teams to ship with speed, confidence, and consistency.
• Lead the incident response lifecycle: own runbooks, drive resolution, and conduct blameless postmortems that harden the platform for the future.
• Manage BAU operations including backups, credential rotation, log retention, and system administration with operational discipline.
• Engineer for Reliability and Security: Apply zero trust and least privilege design patterns to authorization, authentication, networking, and runtime threat detection across the platform.
• Partner with leadership to maintain SOC2-compliant infrastructure practices and proactively close security gaps before they become incidents.
• Implement and manage robust observability tooling (Datadog, CloudWatch, Prometheus); define standards for logging, metrics, and alerting that give every team real-time platform visibility.
• Support agent observability for connector services central to Pantomath's autonomous remediation engine.
• Drive Efficiency and Scale: Establish cost dashboards, conduct bi-weekly reviews, and implement right-sizing, idle shutdown, and shared infrastructure patterns that meaningfully reduce cloud spend.
• Lead migration to shared ALB patterns and optimize EKS autoscaling to support rapid customer and product growth.
• Contribute to multi-region readiness strategy and proactively address AWS service limits and scalability bottlenecks before they impact customers.
• Reduce friction for developers: automate manual provisioning, clean up IaC repositories, and streamline dev and staging environments so engineers can move fast.
• Shape the Engineering Culture: Champion DevOps and SRE best practices within an Agile/Scrum framework across multiple engineering pods.
• Drive the infrastructure roadmap and platform strategy in close partnership with the VP of Engineering and company leadership.
• Contribute to system architecture discussions and mentor engineers across the organization on reliability and operational excellence.
Job Requirements
- Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.
- 5+ years of experience in Site Reliability, Platform Engineering, DevOps, or Cloud Engineering — ideally in a high-growth startup environment.
- Demonstrated track record of owning platform initiatives end-to-end, from design through production operation.
- Proven experience operating within an Agile/Scrum development methodology.
- Deep AWS expertise across core services (EC2, EKS, IAM, ALB, RDS, S3) and strong hands-on experience with Terraform or comparable IaC tools.
- Solid CI/CD knowledge, preferably with GitHub Actions, and the ability to build pipelines that accelerate engineering without sacrificing safety.
- Proficiency with observability tooling (Datadog, Prometheus, CloudWatch) and the judgment to define meaningful alerting standards across a distributed platform.
- Strong command of security best practices — least privilege, secret management, zero trust networking, and runtime threat detection.
- Proficiency in at least one scripting language (Python, Bash) for automation, tooling, and infrastructure management.
- Proficiency in leveraging AI coding assistants and a commitment to evolving SDLC workflows to maximize the impact of AI-driven development.
- Excellent problem-solving, communication, and cross-functional collaboration skills.
Benefits
- Equal Opportunity Employer
- Reasonable accommodations offered
More DevOps Engineer Jobs
Senior DevOps Engineer
Ivim Services LLC
Ivím Health is a telehealth company dedicated to transforming metabolic healthcare through individualized evidence-based treatment. Our patient-first care model integrates GLP-1 therapy, hormone support, nutrition and lifestyle coaching, and ongoing clinical guidance to help patients achieve sustainable results. We support patients nationwide through a fully remote team committed to innovation, transparency, and delivering healthcare with excellence.
This description is a summary of our understanding of the job description.
Role Description
The Senior DevOps Engineer will design, build, and operate the cloud infrastructure and deployment systems that power Ivím Health’s digital platform. This role focuses on AWS infrastructure, Terraform-based infrastructure as code, CI/CD automation, and release engineering supporting a distributed WordPress and React/JavaScript environment.
- Ensure Ivím’s systems are secure, scalable, and reliable while supporting rapid product development and continuous deployment.
- Work closely with Engineering, Product, and Security teams to build the foundation for fast, low-risk releases while maintaining strict privacy and security standards required for healthcare data.
- Combine infrastructure architecture, automation, release engineering, and security-minded operations to support Ivím’s continued growth as a modern healthcare technology platform.
Key Responsibilities
- Cloud Infrastructure & Architecture
  - Design and maintain scalable AWS infrastructure supporting Ivím’s digital platform.
  - Build and manage infrastructure using Terraform and infrastructure-as-code practices.
  - Maintain core AWS services including VPCs, ECS/EKS, RDS, S3, CloudFront, IAM, and Secrets Manager.
  - Ensure infrastructure is highly available, fault-tolerant, and optimized for performance and cost.
  - Establish infrastructure standards that support long-term scalability and maintainability.
- CI/CD & Release Engineering
  - Design and maintain CI/CD pipelines supporting WordPress and React/JavaScript services.
  - Automate build, testing, artifact management, and deployment workflows.
  - Manage release processes across multiple environments including development, QA, staging, and production.
  - Implement deployment strategies such as blue/green, canary releases, and automated rollbacks.
  - Improve deployment speed and reliability while minimizing operational risk.
- Security & Compliance
  - Implement infrastructure security practices aligned with healthcare data protection standards.
  - Maintain least-privilege IAM policies, secrets management, and encrypted data storage.
  - Ensure infrastructure follows secure configuration standards for systems handling sensitive data.
  - Collaborate with engineering and security teams to maintain privacy and compliance safeguards.
  - Support internal security reviews, audits, and documentation when required.
- Infrastructure Automation & Platform Engineering
  - Automate infrastructure provisioning, environment setup, and ephemeral testing environments.
  - Improve developer workflows by standardizing infrastructure patterns and deployment processes.
  - Develop infrastructure templates and reusable modules for engineering teams.
  - Document infrastructure standards, operational procedures, and runbooks.
- Monitoring, Observability & Reliability
  - Implement monitoring, alerting, and logging systems across infrastructure and services.
  - Use tools such as CloudWatch and centralized logging systems to track system performance and health.
  - Establish observability standards for metrics, logs, and system behavior.
  - Participate in incident response and root-cause analysis when production issues occur.
  - Continuously improve system reliability and operational readiness.
- Cross-Functional Collaboration
  - Work closely with software engineers to improve build systems, containerization, and deployment pipelines.
  - Partner with product and engineering leaders to support platform scalability.
  - Help guide engineering teams on best practices for infrastructure, security, and deployment.
  - Share operational insights and recommendations that improve system stability and performance.
Qualifications
- 7+ years of experience in DevOps, infrastructure engineering, or cloud platform operations.
- Strong experience operating production infrastructure on AWS.
- Experience managing infrastructure using Terraform and infrastructure-as-code workflows.
- Experience designing and maintaining CI/CD pipelines for distributed applications.
- Experience supporting modern web application stacks including WordPress and React/JavaScript.
- Experience with release engineering, deployment workflows, and environment promotion processes.
- Experience supporting platforms that handle sensitive or regulated data (preferred).
- Healthcare or regulated environment experience preferred but not required.
Technical Skills
- Deep experience with AWS cloud infrastructure and architecture.
- Strong proficiency with Terraform and infrastructure-as-code practices.
- Experience with CI/CD platforms such as GitHub Actions, GitLab CI, Jenkins, or similar tools.
- Containerization experience with Docker and container orchestration platforms such as ECS or EKS.
- Strong understanding of Linux systems, networking, and distributed system architecture.
- Experience implementing monitoring, logging, and alerting systems.
- Familiarity with scripting languages such as Python or Bash.
- Understanding of web platform architectures including WordPress and modern JavaScript frameworks.
Attributes
- Strong systems thinker with a focus on reliability and automation.
- Highly organized and comfortable managing complex infrastructure environments.
- Strong problem solver who thrives in fast-moving engineering teams.
- Security-minded engineer with a strong understanding of data protection practices.
- Collaborative team member able to work effectively with engineering and product teams.
- Mission-aligned with interest in healthcare technology and digital innovation.
Benefits
- Competitive pay aligned to experience and impact.
- Comprehensive health, dental, and vision benefits.
- 401(k) with employer match.
- Generous PTO, paid holidays, and a flexible remote environment.
- Exclusive Ivím At Work perks and wellness offerings.
- Opportunity to build and scale infrastructure for a rapidly growing digital healthcare platform.
Important Hiring Notice
All official communication from Ivím Health will come from an @ivimhealth.com email address. We will never ask for payment, financial information, or fees at any point during the application or hiring process. If you receive any suspicious communication claiming to be from Ivím Health, please report it immediately by emailing talent@ivimhealth.com.
Equal Opportunity Employer
Ivím Health is committed to creating an inclusive and equitable workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other protected characteristic. Ivím Health provides reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs, in accordance with applicable law.
Senior Build & Release Engineer
ITRex Group
We turn AI ambition into working systems: Generative AI, data, and everything in between
Role Description
We are seeking a skilled Senior Build & Release Engineer to implement and maintain end-to-end CI/CD pipelines using GitHub, with a strong focus on test-driven deployment, automated release management, and Infrastructure as Code (IaC) practices.
- Lead the design, architecture, and management of CI/CD pipelines using GitHub Actions (and similar tools), ensuring fast, reliable, and reproducible software delivery.
- Implement and enforce test-driven deployment systems, integrating automated testing, validation, and monitoring to maintain code quality and accelerate feedback cycles.
- Containerize applications and microservices with Docker, optimize image builds, and manage deployment pipelines for distributed environments.
- Oversee the build, packaging, and publishing lifecycle for JavaScript, TypeScript, and C++ packages, including versioning, semantic tagging, and NPM or internal registry publication.
- Develop and maintain cross-platform build pipelines using CMake or equivalent tools, ensuring consistent compilation and release workflows across web, desktop, and mobile.
- Automate end-to-end release processes, including tagging, building, signing, and distributing mobile, web, and desktop applications.
- Define and manage Infrastructure as Code (IaC) to provision and maintain reliable, scalable, and secure infrastructure environments.
- Collaborate closely with development, QA, and operations teams to troubleshoot deployment issues, optimize performance, and improve release reliability.
- Continuously improve observability and feedback loops, leveraging monitoring and alerting systems to maintain operational excellence.
Qualifications
- 5+ years of hands-on experience in a DevOps, CI/CD, or Release Engineering role.
- Strong knowledge of AWS cloud, Infrastructure as Code (IaC), and shell scripting.
- Knowledge of Docker, including image creation, registry management, and basic orchestration patterns.
- Coding experience, especially in C++, JavaScript, and TypeScript.
- Hands-on experience with compilation and publishing workflows for JavaScript, TypeScript, and C++ packages.
- Proven experience in designing, architecting, and maintaining CI/CD pipelines, preferably using GitHub Actions (or similar tools).
- Experience with cross-platform build and release automation (web, desktop, and mobile).
- Experience automating tagging, versioning, signing, and artifact distribution processes.
- Experience managing release workflows across web, desktop, and mobile, for cloud and on-premises systems.
- Ability to design robust multi-language build pipelines leveraging CMake or similar build tools.
- Experience integrating automated testing, validation, and monitoring.
- English level: B2 or higher.
Requirements
- Deep understanding of C++ build systems, especially CMake.
- Experience building and supporting AI/ML pipelines (training and deploying models) using PyTorch or TensorFlow.
- Strong Linux system administration and networking skills.
- Experience setting up monitoring and alerting systems.
- Familiarity with tools like Prometheus, Grafana, or ELK for tracking CI/CD processes.
- Background in secure environments like fintech, blockchain, or distributed systems.
- Knowledge of other clouds like GCP or Azure.
Benefits
- Remote flexibility: Work where and how you work best; we trust you to deliver.
- Fair compensation: Competitive salary + benefits that matter (medical, learning).
- Ownership opportunities: See a problem worth solving? Own it. We back smart risks over bureaucratic safety.
- AI enhancement: We leverage AI to make you faster and stronger, complementing your abilities, not replacing them.
- Learning investment: English classes, professional development.
- Career progression: Real paths up, not just sideways shuffling.
- Responsive teammates: No ignored Slacks, no "not my problem" attitudes.
- Supportive culture: When you're stuck, people help. When things break, we fix them together.
- Human connections: Regular meetups, tech talks, and actual relationships beyond work.
• Design, deploy, and manage cloud infrastructure for highly available backend services and data-heavy applications.
• Build and maintain ETL pipelines that ingest, transform, and serve large volumes of on-chain and off-chain data.
• Develop and manage APIs, databases, and the middleware that ties systems together.
• Challenge the cost and complexity of how we run things, finding ways to deliver more for less without cutting corners on reliability or security.
• Collaborate closely with the broader Product team to ship new tools and applications end-to-end.
• Own reliability and performance: build the monitoring, alerting, and deployment practices that mean our tools don't let the ecosystem down.
DevOps Engineer – Microsoft Dynamics, 11 Month Contract
Robusta Studio
Your lifetime digital partner. We harness technology to deliver tangible business growth.
• Design, implement, and maintain CI/CD pipelines supporting Microsoft Dynamics environments.
• Automate build, test, and deployment processes for Dynamics 365 / Dynamics CRM solutions.
• Manage and optimize cloud and infrastructure resources supporting Dynamics applications.
• Implement infrastructure as code (IaC) and configuration management practices.
• Support environment provisioning, monitoring, and incident management.
• Collaborate with development teams to ensure smooth integration of Dynamics customizations and solutions.
• Manage release management processes across development, staging, and production environments.
• Implement security, compliance, and governance best practices within DevOps workflows.
• Monitor system performance and availability, troubleshooting issues affecting Dynamics deployments.
• Document processes, pipelines, and deployment standards.



