Job Closed
This listing is no longer active.
Founded in 1969, ICF is a global advisory and technology services company headquartered in Reston, Virginia. It delivers data-driven solutions across energy, en
Site Reliability Engineer
Location
Virginia
Posted
135 days ago
Salary
$108.5K - $184.4K / year
Seniority
Senior
Job Description
- Define and maintain SLIs, SLOs, and SLAs for the Internet-based Quality Improvement and Evaluation System (iQIES) application
- Tune performance by modeling load scenarios, forecasting capacity, and optimizing scaling strategies
- Design and optimize the observability stack through New Relic, CloudWatch, and Jenkins CI/CD pipelines
- Participate in root cause analysis of operational issues and improve the incident response process
- Participate in creating, monitoring, and optimizing actionable alerts so that issues are addressed promptly
- Develop tools and scripts
- Develop and maintain Jenkins CI/CD pipelines, using declarative Jenkinsfiles and foundational Groovy for pipeline logic and enhancements
- Deploy services to Fargate, EKS, Lambda, Airflow, and databases
- Manage security groups and access controls
- Demonstrate a thorough understanding of fundamentals such as security groups, IAM, and RDS management
- Apply patch management and hardening practices
- Work with DevOps and Technical Leads to stay aligned with the overall strategy
- Actively participate in releases and product launches, with the expectation of being online during release windows
Job Requirements
- 5+ years of experience in a software development environment and a Bachelor’s degree, OR 3+ years of experience in a software development environment and a Master’s degree
- 5+ years supporting a high-availability production environment (cloud or on-prem)
- 3+ years working in an SRE role in a large-scale cloud environment, implementing high availability and scalability
- 3+ years of experience focused on SRE, DevOps, or Platform Engineering
- Must be able to obtain and maintain a public trust clearance
- Candidate must reside in the US, be authorized to work in the US, and work must be performed in the US
- Must have lived in the US for 3 full years out of the last 5 years
Benefits
- Reasonable Accommodations are available
- Health insurance
- 401(k) matching
More DevOps Engineer Jobs
Senior DevOps – Platform
Saipos | Sistema para Restaurante
Making your restaurant's day-to-day simpler, more agile, and smarter. 🐿️
- Plan, implement, and maintain scalable, reliable, and secure infrastructure on AWS (Lambda, ECS, RDS, ElastiCache, CloudWatch, S3, IAM)
- Manage and continuously improve CI/CD pipelines using tools such as Bitbucket Pipelines and Jenkins
- Automate infrastructure provisioning and management using tools such as Terraform, CloudFormation, or AWS CDK
- Ensure effective system observability and monitoring with tools like CloudWatch, Prometheus, and Grafana
- Proactively implement infrastructure security practices (DevSecOps), IAM policies, audits, and continuous vulnerability analysis
- Lead technical incident response, conduct root cause analyses, and define corrective actions and preventive measures (post-mortem)
- Mentor the team and promote a DevOps culture, automation, and continuous improvement of processes and tools
- Collaborate directly with internal teams to define development and deployment standards aligned with market best practices
- Leverage infrastructure as code (Terraform) to build and maintain complex production and analytics workflows, including networking and containerized services
- Rapidly diagnose and resolve faults in system services as part of a 24/7 on-call rotation focused on actionable alerting and eliminating toil
- Improve speed of delivery by developing and maintaining CI/CD pipelines
- Develop infrastructure automation using Terraform, Python, and TypeScript
- Improve system availability, security, compliance, cost-effectiveness, and performance
- Estimate work, prioritize tasks, track dependencies, report progress, and highlight blockers
- Participate in continuous improvement initiatives, advocate for SRE best practices, and stay current with emerging technologies and trends
- Be part of a team focused on building, measuring, and refining the systems infrastructure that runs our software
Dev Ops Engineer, Level 5
Scratch Financial
Scratch Financial is the world's simplest patient financing solution.
- Participates as a technical expert providing advanced knowledge of vendor devices and management systems
- Plans and directs development teams and troubleshoots internal application issues
- Provides technical solutions for network engineering and operational problems
- Interfaces with vendors and engineering organizations
- Provides leadership to Network Engineers and the CIEC Development team
- Build, lead, and develop a high-performing team of Site Reliability Engineers responsible for our hybrid cloud infrastructure in AWS, with an on-premises extension in Hetzner
- Design, document, and lead the implementation of reliable and secure infrastructure solutions following industry best practices
- Oversee technical analysis, cost estimation and optimization, platform and system design, architectural compliance, resource planning, and delivery milestones
- Engage in hands-on technical work alongside the team to maintain a deep understanding of the infrastructure, and lead incident response during critical issues
- Define team goals and strategy, building strong relationships with internal stakeholders across the organisation
- Manage and coordinate the on-call rotation, including escalation processes, across infrastructure and software engineering teams
- Champion engineering best practices and drive continuous improvement in production environment quality and reliability