Job Closed
This listing is no longer active.
CertifID provides identity protection services to help prevent wire fraud. Focused on securing digital financial transactions, the company strives to reduce the financial and emoti
Senior Site Reliability Engineer
Location
Texas
Posted
120 days ago
Salary
0
Seniority
Senior
Job Description
Senior Site Reliability Engineer
CertifID
- Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
- Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
- Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
- Build automated workflows to eliminate manual work and design/maintain Infrastructure-as-Code with Terraform.
- Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.
- Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.
Job Requirements
- 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
- Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
- Strong Linux, networking, and distributed systems troubleshooting skills.
- Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
- Expertise with Infrastructure-as-Code (Terraform strongly preferred).
- Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
- Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.
Benefits
- Flexible vacation
- 12 company-paid holidays
- 10 paid sick days
- No work on your birthday
- Health, dental, and vision insurance (including a $0 option)
- 401(k) with matching, and no waiting period
- Equity
- Life insurance
- Generous parental paid leave
- Wellness reimbursement of $300/year
- Remote worker reimbursement of $300/year
- Professional development reimbursement
- Competitive pay
- An award-winning culture
More DevOps Engineer Jobs
Senior Security Engineer – DevSecOps
PrizePicks
PrizePicks is the fastest-growing sports company in North America according to the 2023 Inc. 5000 rankings, two years running, and the largest independent skill-based fantasy sports operator in the country.
- Manage and maintain edge and bot protection (e.g., WAF, CDN, DDoS mitigation).
- Perform security-focused infrastructure reviews for new product releases and architectural changes.
- Implement and maintain monitoring and alerting tools to detect cloud and container-related vulnerabilities and misconfigurations.
- Collaborate with DevOps and Engineering teams to embed security into CI/CD pipelines and deployment processes without slowing down delivery.
- Partner with Application Security and Engineering to implement security controls on opportunities identified during Threat Modeling.
- Lead initiatives around infrastructure-as-code (IaC) security and runtime protection to automate security controls and hardening.
- Assist with threat modeling, risk assessments, and provide security guidance during the development lifecycle.
- Collaborate with incident response teams, offering expert advice on cloud-related security issues to help resolve incidents quickly.
- Develop tooling or automation to support proactive remediation and continuous security validation.
- Track and report DevSecOps KPIs, such as mean time to remediate, security control coverage, and vulnerability trends.
Public Trust Eligibility Required
This is a contingent position, meaning employment is dependent upon the successful award of the associated contract to Aretum and completion of any required background investigation or security clearance verification.
About Aretum
Aretum is a mission-driven organization committed to delivering innovative, technology-enabled solutions to our customers across defense, civilian, and homeland security sectors. Our teams work at the intersection of strategy, technology, and transformation, helping agencies solve their most critical challenges. We believe in investing in our people and creating a culture where collaboration, inclusion, and professional growth are at the forefront.
Job Summary
Aretum is seeking a skilled and motivated Junior DevSecOps Engineer. As a Junior DevSecOps Engineer you will assist with the client's cloud and systems operations. Due to the nature of our work as a federal consulting organization, employees may be expected to handle Controlled Unclassified Information (CUI) and must adhere to applicable safeguarding and compliance requirements.
Responsibilities
- Design, operate, and continuously improve automated CI/CD pipelines using GitLab CI to support zero-downtime deployments across multiple environments.
- Support development teams with standardized deployment tooling, automation, and operational best practices.
- Administer and support containerized workloads using Kubernetes (EKS) and Docker-based container platforms.
- Configure and manage Linux-based servers and systems.
- Implement Infrastructure as Code (IaC) using Terraform and/or AWS CDK for repeatable, auditable deployments.
- Support provisioning and configuration of AWS services including EC2, EKS, ECS, S3, RDS, VPC, Lambda, and related services.
- Coordinate infrastructure changes without performing AWS account provisioning or organizational administration.
- Integrate security scanning into CI/CD pipelines using tools such as Trivy, AWS Inspector, and AWS Security Hub.
- Perform vulnerability triage and coordinate remediation with development teams in accordance with defined timelines.
- Implement and manage IAM least-privilege policies, secrets, and encryption using AWS KMS, Secrets Manager, and SSM.
- Ensure encryption in transit and at rest across all in-scope systems.
- Configure and maintain monitoring and observability using CloudWatch, Prometheus, Grafana, and centralized logging solutions.
- Support Tier 2 and Tier 3 incident response for production systems, meeting SLA requirements.
- Participate in root-cause analysis and continuous improvement initiatives.
- Participate in Agile sprints, including backlog grooming, sprint planning, stand-ups, and retrospectives.
- Track work in JIRA, using story-point estimation and sprint metrics.
- Support reprioritization of backlog items in coordination with the COR and Product Owner.
- Produce and maintain technical documentation covering architecture, pipelines, monitoring, security, and disaster recovery.
- Support Business Continuity and Disaster Recovery (BCDR) planning, documentation, and exercises.
- Ensure all deliverables comply with ADA, Section 508, WCAG 2.2 A/AA, and digital accessibility standards.
