Cyber solutions that move you forward, faster.
Junior Site Reliability Engineer
Location
United States
Posted
73 days ago
Salary
$95K - $110K / year
Seniority
Junior
Job Description
Coalfire
• Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management across multiple clients.
• Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs in both production and non-production environments.
• Innovate and implement using automated orchestration and configuration management techniques. Understand the design, deployment, and management of secure and compliant enterprise servers, network infrastructure, boundary protection, and cloud architectures using Infrastructure-as-Code.
• Create, maintain, and peer review automated orchestration and configuration management codebases, as well as Infrastructure-as-Code codebases. Maintain IaC tooling and versioning within Client environments.
• Implement and upgrade client environments with CI/CD infrastructure code and provide internal feedback to development teams for environment requirements and necessary alterations.
• Work across AWS, Azure and GCP, understanding and utilizing their unique native services in client environments.
• Configure, tune, and troubleshoot cloud-based tools, and manage cost, security, and compliance for the Client’s environments.
• Monitor and resolve site stability and performance issues related to functionality and availability.
• Work closely with client DevOps and product teams to provide 24x7x365 support to environments through Client ticketing systems.
• Support definition, testing, and validation of incident response and disaster recovery documentation and exercises.
• Participate in on-call rotations as needed to support Client critical events and operational needs that may fall outside of business hours.
• Support testing and data reviews to collect and report on the effectiveness of current security and operational measures, in addition to remediating deviations from current security and operational measures.
• Maintain detailed diagrams representative of the Client’s cloud architecture.
• Maintain, optimize, and peer review standard operating procedures, operational runbooks, technical documents, and troubleshooting guidelines.
Job Requirements
- BS or above in related Information Technology field or equivalent combination of education and experience
- 2+ years experience in 24x7x365 production operations
- Fundamental understanding of networking and networking troubleshooting
- 2+ years experience installing, managing, and troubleshooting Linux and/or Windows Server operating systems in a production environment
- 2+ years experience supporting cloud operations and automation in AWS, Azure or GCP (and aligned certifications)
- 2+ years experience with Infrastructure-as-Code and orchestration/automation tools such as Terraform and Ansible
- Experience with IaaS platform capabilities and services (cloud certifications expected)
- Experience with ticketing tools such as Jira and ServiceNow
- Experience using environmental analytics tools such as Splunk and Elastic Stack for querying, monitoring and alerting
- Experience in at least one primary scripting language (Bash, Python, PowerShell)
- Excellent communication, organizational, and problem-solving skills in a dynamic environment
- Effective documentation skills, to include technical diagrams and written descriptions
- Ability to work as part of a team with professional attitude and demeanor
Benefits
- Paid parental leave
- Flexible time off
- Certification and training reimbursement
- Digital mental health and wellbeing support membership
- Comprehensive insurance options
More DevOps Engineer Jobs
Senior Engineer, FinOps, DevOps
Thinkahead Consultant Psychologist Pty Ltd
We get to the heart of the matter... real people... real solutions
• Develop and maintain cost allocation, budgeting, and chargeback models across cloud accounts (AWS, Azure, GCP).
• Implement and enforce tagging and resource hierarchy standards, ensuring >90% coverage for cost-critical tags (e.g., Application, Environment, Cost Center).
• Build and publish cost visibility dashboards and reports using Power BI, QuickSight, Looker, or other FinOps tooling.
• Support unified multi-cloud cost reporting and forecasting for engineering and finance teams.
• Execute rightsizing, scheduling, and lifecycle management of cloud resources across AWS, Azure, and GCP (EC2, VM, GCE, RDS, S3, Storage, Networking).
• Manage and optimize Reservations, Savings Plans, Committed Use Discounts (CUDs), and licensing benefits (BYOL, AHB).
• Implement policy-as-code and governance using tools like Terraform, AWS Config, Azure Policy, or GCP Organization Policies.
• Participate in anomaly detection, spend forecasting, and automation of remediation workflows.
• Contribute to CI/CD pipeline management, infrastructure automation, and GitOps practices using Azure DevOps, GitHub Actions, AWS CodePipeline, or Google Cloud Build.
• Provide actionable insights in monthly cost and performance reviews with engineering and product stakeholders.
• Partner with Finance and Procurement teams on budgeting, forecasting, and billing validation.
• Collaborate with SRE and Platform teams to balance cost efficiency, performance, and reliability.
• Maintain operational hygiene through scripting, compliance audits, and automation.
Senior DevOps Engineer – all genders
NFON AG
Stay connected with NFON: Corporate updates, IR insights, and key news. #NFON
• Take responsibility for building and managing Kubernetes clusters to orchestrate containerized applications.
• Implement and maintain CI/CD pipelines to automate development, testing, and deployment processes.
• Work closely with the development teams to implement and optimize scalable, highly available cloud infrastructure.
• Ensure the stability and performance of systems by proactively monitoring and optimizing system resources.
• Support the introduction and improvement of DevOps best practices across teams.
Senior DevOps Engineer
Private Identity
Privacy-preserving lightweight remote onboarding (IAL2), authentication (AAL2) and federation (FAL2).
Role Description
You'll work under the Lead DevOps Engineer as a key contributor on a collaborative team, executing on infrastructure work, responding to incidents, and helping keep our multi-cloud environments reliable and secure. You're someone who takes direction well, communicates proactively, and brings enough experience to work independently on complex tasks without needing hand-holding.
This is a hands-on, execution-focused role. You'll be deep in Kubernetes, Terraform, CI/CD pipelines, and on-call rotations day to day.
What You Will Do
Infrastructure & Cloud
- Contribute to multi-cloud infrastructure across AWS (EKS, IAM, multi-account) and GCP (GKE Autopilot, IAM, multi-project) using Terraform
- Help provision and manage per-client environments (VPC, Kubernetes cluster, DNS, SSL, container registry, secrets, and GitOps integration), following established patterns and module library
- Manage bastion hosts, networking, firewalls, and VPC peering under guidance from the lead
CI/CD & GitOps
- Maintain and improve GitHub Actions pipelines for build, test, and deployment workflows
- Support ArgoCD-based GitOps deployments across multiple GKE and EKS clusters
- Help maintain reusable workflow templates used across all product repositories
Observability & Incident Response
- Monitor application and infrastructure health using New Relic dashboards and alerts
- Actively participate in the on-call rotation via PagerDuty: acknowledge alerts promptly, triage issues, escalate when needed, and follow up thoroughly
- Contribute to runbooks and post-mortems after incidents
- Proactively flag performance issues and anomalies to the team
Security & Secrets Management
- Manage and rotate secrets across all environments using Doppler: AWS IAM keys, GCP service account keys, MongoDB Atlas API keys, GitHub tokens
- Follow and uphold least-privilege IAM practices across AWS, GCP, GitHub, and Azure AD
- Assist with employee access provisioning and offboarding
Qualifications
- 7+ years in a DevOps, SRE, or Platform Engineering role
- Strong Terraform skills: modules, remote state, multi-environment configurations
- Hands-on experience with AWS (EKS, IAM, EC2, S3, multi-account) and GCP (GKE, IAM, Workload Identity)
- Kubernetes: Helm, RBAC, namespaces, cluster troubleshooting
- GitHub Actions: building and maintaining CI/CD pipelines, reusable workflows
- ArgoCD or similar GitOps tooling
- Active on-call experience: you've been paged, you know how to triage fast and communicate clearly under pressure
- New Relic or equivalent (Datadog, Grafana): dashboards, alerts, log querying
- PagerDuty: on-call rotations, escalation policies, alert routing
- Secrets management: Doppler, Vault, or AWS Secrets Manager
- Strong scripting in Bash and/or Python
- A team-first attitude: you share knowledge, ask questions early, and don't go dark
Nice to Have
- GCP Workload Identity Federation and AWS IRSA (keyless CI/CD auth)
- MongoDB Atlas administration (clusters, VPC peering, API key management)
- Azure AD / Entra ID user and access management
- Slack app integrations for deployment approvals and alerting
- Experience managing infrastructure for multiple clients or tenants from a single codebase
Benefits
- Competitive compensation and equity
- A remote-first, collaborative culture
• Co-responsibility for system availability: You will actively contribute to the availability, reliability and efficiency of our complex system architecture.
• Maintenance and automation: You will support the maintenance and automation of our existing infrastructure.
• Monitoring and analysis: You will improve our monitoring strategies and perform comprehensive root-cause analyses.
• High availability: You must be prepared to respond during off-hours, including occasional overnight incidents.
• Software development: Several years' experience in one or more programming languages is required.



