Senior Site Reliability Engineer

Location

United States

Posted

96 days ago

Salary

0

Job Description

Senior Site Reliability Engineer

4IR

About This Role

We deliver mission-critical IT/OT infrastructure, in cloud and on-prem, for industrial customers that can't afford downtime. Small team. Hard problems. Practical solutions. No bureaucracy. No blame. No egos. We ship it, own it, and make it better: blameless but accountable, shoulder to shoulder. We work hard. We stay human. We trust each other. We figure it out.

If you know what to do, delight in building it, and feel the ownership to support it, keep reading.

What You'll Do

Customer Delivery

- Design complex IT/OT architectures, in cloud and on-prem, that are secure, recoverable, and sized appropriately
- Work directly with customers to understand their environment and estimate effort
- Own customer solutions end-to-end: requirements, design, build, support
- Build or use reusable modules when it makes sense; build bespoke when it doesn't
- Deploy and manage Kubernetes-based infrastructure and stateful applications across diverse customer environments

Incident Response & Ownership

- Participate in the on-call rotation alongside the rest of the team; everyone here supports what we ship
- Own incidents through resolution, then drive root cause analysis that eliminates the class of problem, not just the symptom
- Build the runbooks, alerts, and automation that make the next incident less likely or less painful

Infrastructure & Automation

- Work with Infrastructure-as-Code tools to provision and manage diverse customer environments
- Implement and maintain GitOps workflows for in-cluster deployments
- Ensure all infrastructure and application changes are declarative and version-controlled
- Automate self-healing and system updates to reduce manual intervention and keep environments current

Observability & Reliability

- Build and maintain monitoring, alerting, and dashboards using Prometheus, Loki, and Grafana
- Define SLIs and SLOs that reflect what actually matters to customers
- Surface real problems, reduce noise, and continually improve reliability and team efficiency

Shape the Future

- We don't have everything figured out. You'll help build, create, and shape how we operate
- Contribute to standards, patterns, and processes that make us better, not bureaucracy for its own sake
- Bring the SRE mindset: automate toil, prefer boring/stable systems, and relentlessly improve

What We're Looking For

- 5+ years in SRE, DevOps, or Infrastructure Engineering
- Strong Kubernetes skills in production environments; you'll troubleshoot real clusters, not just tutorials
- Experience with GitOps tooling (ArgoCD, Rancher Fleet, FluxCD, or similar)
- Solid understanding of Infrastructure-as-Code concepts (Terraform, Pulumi, Crossplane, or similar)
- Real incident response experience: you've been on-call, stayed calm, and fixed things under pressure
- Comfort with heterogeneous environments; every customer site is a little different and you need to adapt
- Clear communication skills: you can write a useful runbook, gather requirements on a customer call, and document what you learned
- Ability to operate in ambiguity; we're building clarity, not waiting for it

Strong Plus

- Azure experience (our primary cloud)
- Experience with the SUSE ecosystem (SLE Micro, RKE2, Rancher, Longhorn)
- Industrial, manufacturing, or OT environment experience
- Familiarity with Inductive Automation's Ignition platform and MQTT
- Experience in a startup or small-team environment where you wore many hats

The SRE Mindset

This matters here. We need someone who:

- Sees repetitive manual work as a problem to automate, not a fact of life
- Prefers stable, predictable, "boring" production over clever and fragile
- Supports what they create, with no throwing things over the wall
- Treats incidents as opportunities for systemic improvement
- Works well on a small team where everyone carries weight
- Stays current with SRE practices, emerging technologies, and cloud/edge trends

A Few Honest Words

This is a startup. Hours can be demanding. Priorities shift. You won't have a team of 30 backing you up. What you will have: the autonomy to make real decisions, teammates who own their work, and customers who genuinely depend on what we build. We work hard because the work matters, and we have fun doing it.

If you want a structured 9-5, predictability, and a clear ladder, this probably isn't the right fit. If you want to build, learn, and be part of something that's actually going somewhere, let's talk.

What We Offer

- Comprehensive benefits (Medical, Dental, Vision, 401K)
- Fully remote: work from anywhere in the world
- A team where it's safe to be honest, learn from mistakes, and get better together

Additional Information

We are committed to the principle of equal employment opportunity for all employees and to providing a work environment free from discrimination and harassment.

More DevOps Engineer Jobs

IT DevOps Engineer II

Paradigm

Paradigm is a specialty care management organization serving people with complex injuries and diagnoses.

DevOps Engineer | 96 days ago
Other | Remote | Team 1-10 | Since 2018 | H1B Sponsor

Paradigm is an accountable specialty care management organization focused on improving the lives of people with complex injuries and diagnoses. The company has been a pioneer in value-based care since 1991 and has an exceptional track record of generating the very best outcomes for patients, payers, and providers. Deep clinical expertise is the foundation for every part of Paradigm's business: risk-based clinical solutions, case management, specialty networks, home health, shared decision support, and payment integrity programs.

We're proud to be recognized again! For the fourth year in a row, we've been certified by Great Place to Work®, and for the third consecutive year, we've earned a spot on Fortune's Best Workplaces in Health Care™ list. These honors reflect our unwavering commitment to fostering a positive, inclusive, and employee-centric culture where people thrive.

We are seeking a full-time, remote DevOps Engineer II. The DevOps Engineer II is responsible for designing, implementing, and supporting secure, scalable CI/CD pipelines across Azure and AWS environments. This role is hands-on and execution-focused, partnering closely with senior engineers, application development, and infrastructure teams to improve deployment reliability, security, and operational efficiency. The ideal candidate is a self-starter who takes ownership of delivery, proactively identifies improvement opportunities, and is comfortable working with minimal supervision in a collaborative environment. Occasional after-hours support for scheduled production releases is required.

RESPONSIBILITIES:

- Design, implement, and maintain CI/CD pipelines using Azure DevOps and AWS
- Build multi-stage pipelines supporting build, test, security scanning, and deployment
- Implement DevSecOps practices, including secure access management and credential handling
- Integrate logging, metrics, and alerting into delivery workflows
- Troubleshoot CI/CD and cloud infrastructure issues in cross-functional teams
- Identify and implement automation opportunities across DevOps and ITSM processes
- Provide Tier 2 production support during business hours and for limited off-hours releases

QUALIFICATIONS:

Education

- Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent practical experience

Experience

- 4+ years of experience supporting CI/CD pipelines and Git-based workflows, including multi-stage build and deployment pipelines, in Azure DevOps (primary) and AWS (secondary)
- 2+ years working with Infrastructure as Code (Terraform preferred)
- 2+ years of experience working with containerized workloads and Kubernetes
- 2+ years of experience supporting or developing production applications in a mid- to large-scale environment
- Experience supporting secure, repeatable SDLC processes, including release and deployment automation
- Ability to troubleshoot delivery and infrastructure issues in a cross-functional team environment

Technology and Concepts: a successful candidate will have familiarity with some or all of the technologies and concepts used in our current environment, including but not limited to:

- DevOps & CI/CD
  - Secrets management, IAM, and least-privilege access
  - Failure handling, rollback strategies, and pipeline notifications
- Cloud Platforms & Infrastructure
  - Microsoft Azure and Amazon Web Services (AWS)
  - Cloud networking fundamentals (virtual networks, routing, load balancing, DNS, etc.)
- Scripting & Automation
  - Bash or PowerShell
  - Azure CLI
  - kubectl

Desired Skills

- Azure Bicep or AWS CloudFormation
- Experience with observability and application security tooling
- GitHub repository and pull request management
- Familiarity with GitHub Copilot tooling

Professional Competencies

- Intermediate knowledge of:
  - Solution development
  - Systems analysis and design
  - Programming and testing methodologies
  - Release and deployment management
- Demonstrated ability to work effectively in a matrixed, cross-functional environment
- Strong problem-solving and decision-making skills, with the ability to manage multiple initiatives simultaneously
- Effective communication, organizational, and execution skills, both written and verbal
- Demonstrated ownership mindset and ability to work independently while collaborating across teams
- Strong commitment to continuous improvement and ongoing learning as technologies and job requirements evolve

Paradigm Benefits:

- Health and wellness: We want our people to be and stay healthy, so we offer PPO, HDHP, and HMO health insurance options with Cigna and Kaiser (CA employees only).
- Financial incentives: Paradigm's financial benefits help prepare you for the future: competitive salaries, 401(k) matching contributions, employer-paid life and disability insurance, flexible spending and commuter accounts, and employer-matched HSA contributions.
- Vacation: We believe strongly that work-life balance is good for you and for our company. Our paid time off and personal holiday programs give you the flexibility you need to live your life to the fullest.
- Volunteer time: We want our employees to engage with and give back to their communities in meaningful ways. Full and part-time employees receive one paid day per calendar year.
- Learning and development: One of Paradigm's core values is expertise, so we encourage our employees to continually learn and grow. We support this in a variety of ways, including our new Learning Excellence at Paradigm (LEAP) program.

Paradigm believes that fostering a diverse and inclusive workplace is central to our mission of helping more people and transforming lives. We're striving to build a culture that better reflects the society we live in and empowers our team to deliver the highest levels of compassion and care to those we serve. For us, achieving this goal requires a workforce that respectfully embraces differences and commits to positive change, creating an environment where everyone is able to bring their whole self to work.

Paradigm complies with federal and state disability laws and makes reasonable accommodations for applicants and employees with disabilities. If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please contact Leave Management at leave.management@paradigmcorp.com.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class. As a contractor with the State of Wisconsin, Paradigm complies with Wisconsin Contract Compliance Law (§16.765).

United States
$100K - $120K / year
Job Closed

DevOps Engineer – Systems Focus

Elligint Health

Sparking actionable change and empowering whole-person care with dynamic, data-driven health technology solutions.

DevOps Engineer | 96 days ago
Other | Remote | Team 51-200 | H1B No Sponsor

• Perform regular patching, deployments, and system updates for Production and Training environments
• Administer and maintain all VCRON-related configurations, including scheduling, monitoring, and troubleshooting
• Configure, manage, and support SFTP and HL7 over TCP/IP for ADT file exchanges with vendors
• Administer and maintain IHE servers for processing CCD files from vendor systems
• Collaborate closely with the support team to troubleshoot file processing issues and resolve root causes
• Enhance documentation, monitoring, and alerting to ensure operational continuity and compliance with HIPAA, HITRUST, and other regulatory standards

New Jersey

DevOps Engineer (Systems Focus)

Elligint Health

Sparking actionable change and empowering whole-person care with dynamic, data-driven health technology solutions.

DevOps Engineer | 96 days ago
Other | Remote | Team 51-200 | H1B No Sponsor

About Elligint Health:

Elligint Health, established in 2024, is leading the charge of innovating healthcare by aligning all stakeholders, delivering intelligent healthcare solutions, and empowering proactive, whole-person care across the healthcare continuum. Elligint Health integrates vast amounts of data from across the healthcare continuum, delivering intelligence that informs decision-making, enhances care coordination, and improves outcomes. Focused on enabling actionable intervention and whole-person care, Elligint Health helps healthcare organizations navigate complexity, turning insights into strategies that benefit providers, payers, members, and patients alike. With Elligint Health, the future of healthcare is simpler, smarter, and more effective.

Position Summary:

We are looking for a motivated and experienced DevOps/Systems Engineer to support and optimize our Windows-based web services running in the AWS cloud. The ideal candidate is a Windows specialist with strong cloud operations expertise and a solid understanding of healthcare industry security and compliance requirements. Experience with Linux systems is a valuable bonus for supporting our hybrid environment and automation initiatives.

Duties & Responsibilities:

- Perform regular patching, deployments, and system updates for Production and Training environments
- Administer and maintain all VCRON-related configurations, including scheduling, monitoring, and troubleshooting
- Configure, manage, and support SFTP and HL7 over TCP/IP for ADT file exchanges with vendors
- Administer and maintain IHE servers for processing CCD files from vendor systems
- Collaborate closely with the support team to troubleshoot file processing issues and resolve root causes
- Enhance documentation, monitoring, and alerting to ensure operational continuity and compliance with HIPAA, HITRUST, and other regulatory standards

Required Qualifications:

- 4+ years of hands-on experience managing production Windows and MS SQL-based web services in a controlled environment
- Proven production operations experience in AWS, including EC2, S3, VPC, RDS, and IAM
- Strong scripting skills in PowerShell, Bash, Python, or similar languages
- Ability to troubleshoot complex networking and system issues in production environments

Preferred Qualifications:

- Infrastructure as Code tools such as Terraform or CloudFormation
- Configuration management platforms such as Puppet or Ansible
- Monitoring and alerting systems like Datadog, New Relic, Splunk, Grafana, PagerDuty, and Nagios
- Linux administration (Debian/Ubuntu)
- AWS managed services including ECS/Fargate, OpenSearch/Elasticsearch, ElastiCache/Redis, RabbitMQ, CloudWatch, Kinesis, SNS, and Redshift
- Experience with Power BI administration, including workspace management, data gateway configuration, and tenant-level governance

An opportunity exists for the candidate to participate in efforts to rearchitect and modernize our Linux-based AWS platforms and microservice-based subsystems.

Compensation:

- Competitive salary & equity in Elligint Health
- PTO and company sick leave
- Health, dental, and vision insurance
- 401(k) participation and matching
- Opportunity to join a rapidly growing company and shape its future

Elligint Health is committed to ensuring that information security remains a top priority for everyone. All workers are responsible for protecting our information security, and we take this seriously. Details of our information security policies and procedures, along with training, will be provided during onboarding.

Each candidate will be subject to a drug screening, background check, and reference check before beginning employment. Please note that some of our positions require U.S. citizenship and submission and further approval of "Public Trust" federal clearance.

United States
Other | Remote | Team 11-50

We are seeking a highly skilled DevOps Engineer to join our remote engineering team. In this role, you will be responsible for designing, implementing, and maintaining scalable infrastructure and deployment pipelines that support the development and delivery of modern cloud-based applications.

As a DevOps Engineer, you will work closely with software developers, security teams, and IT operations to streamline development workflows, automate infrastructure management, and ensure system reliability, scalability, and performance. This position is fully remote within the United States, and candidates must possess a valid U.S. work authorization.

Key Responsibilities

Infrastructure & Cloud Management

- Design, deploy, and manage scalable cloud infrastructure across platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Maintain high availability, fault tolerance, and security across production and staging environments.
- Implement infrastructure as code (IaC) using tools such as Terraform, CloudFormation, or ARM templates.

CI/CD Pipeline Development

- Build and maintain continuous integration and continuous deployment (CI/CD) pipelines.
- Automate build, testing, and deployment processes using tools such as Jenkins, GitHub Actions, GitLab CI, or CircleCI.
- Ensure efficient and reliable release management processes.

System Monitoring & Performance Optimization

- Implement monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, Datadog, or the ELK stack.
- Analyze system performance metrics and proactively resolve infrastructure issues.
- Improve system reliability and uptime through automated monitoring and incident response.

Containerization & Orchestration

- Manage containerized applications using Docker.
- Deploy and maintain orchestration platforms such as Kubernetes.
- Optimize container performance and resource utilization.

Security & Compliance

- Implement DevSecOps best practices across development and infrastructure environments.
- Ensure infrastructure complies with organizational security standards and industry regulations.
- Collaborate with security teams to mitigate vulnerabilities and maintain secure deployment pipelines.

Collaboration & Process Improvement

- Work closely with software engineering, QA, and operations teams to support rapid and stable application releases.
- Identify opportunities for automation to improve efficiency across development and deployment processes.
- Document infrastructure architecture, processes, and operational procedures.

Required Qualifications

- Bachelor's degree in Computer Science, Information Technology, Software Engineering, or a related field (or equivalent practical experience).
- 3–6 years of experience working in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure roles.
- Strong experience with cloud platforms (AWS, Azure, or GCP).
- Hands-on experience with CI/CD tools and automation pipelines.
- Proficiency with Docker and Kubernetes.
- Experience with Infrastructure as Code (Terraform, CloudFormation, etc.).
- Strong scripting experience with Python, Bash, or similar languages.
- Knowledge of Linux system administration and networking fundamentals.
- Excellent problem-solving and troubleshooting skills.

Preferred Qualifications

- Experience with microservices architecture and distributed systems.
- Familiarity with security automation and DevSecOps practices.
- Experience with configuration management tools such as Ansible, Puppet, or Chef.
- Knowledge of serverless architectures.
- Cloud certifications such as AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, or Google Professional DevOps Engineer.

Benefits & Work Environment

- Fully remote work environment within the United States.
- Flexible work schedule.
- Opportunity to work on modern cloud-native infrastructure and cutting-edge technologies.
- Collaborative engineering culture focused on innovation, automation, and continuous improvement.
- Professional development and certification support.

Eligibility Requirement

Applicants must currently reside in the United States and possess a valid work permit authorizing employment. Sponsorship may be considered for highly qualified candidates depending on organizational requirements.

United States
Job Closed