We're helping our clients identify and capture opportunities across the entire lifecycle of their real estate activity.
Site Reliability Engineer – AWS
Location
United States
Posted
4 days ago
Salary
$110K - $140K / year
Seniority
Senior
Job Description
SitusAMC
- Support products transitioned from on-prem data center into AWS Cloud
- Implement cloud best practices for newly transitioned products
- Maintain operational coverage of environments
- Enhance automation capabilities and process improvement
- Collaborate with development teams for secure migration of changes
- Design and implement scalable and reliable solutions
- Improve observability of applications running in the cloud
- Lead strategic initiatives for seamless application integration
Job Requirements
- Bachelor’s degree or equivalent combination of education and experience
- 5+ years of industry and/or relevant experience
- 1+ years in an Associate level role or external equivalent
- Strong experience with containerization, Kubernetes, and EKS
- CI/CD pipelines using GitOps tools such as ArgoCD and FluxCD
- Experience with Git for version control; Azure DevOps preferred
- Experience with monitoring tools (especially CloudWatch)
- Strong experience with Terraform and IaC best practices
- Strong scripting and automation skills (Python, Bash, PowerShell)
- Expertise in managing EC2 instances and AMIs
- Proficiency in SQL and SQL administration
- Experience with service mesh technologies (e.g., Linkerd, Istio) is a plus
- Strong Network Management skills
Benefits
- PTO and paid holidays
- Medical, dental, vision, life, disability insurance
- 401K
More DevOps Engineer Jobs
Senior Site Reliability Engineer
Honeycomb.io
The fastest way to visualize, understand and debug software. Find the critical issues that logs and metrics can’t see.
- Help Honeycomb scale our backend systems to support our highest-volume customers.
- Build organizational trust through transparent communication, giving and receiving direct and kind feedback.
- Work with other backend teams to dive deep into our stack to make sure we’re getting the most out of our infrastructure.
- Be trained, become, and then train others as an Incident Commander.
- Help SRE and Honeycomb develop a healthy cross-Atlantic engineering culture.
- Participate in the team’s on-call rotation as the EU side of a new follow-the-sun rotation.
- Help the organization navigate tradeoffs between reliability and its other goals and priorities.
- Optional: act as an external ambassador through blog posts, conference talks, and presentations with support from our DevRel team.
Senior Site Reliability Engineer - Investments (She/ He/ They)
Capco
Capco, a Wipro company, is a management & technology consultancy dedicated to the financial services & energy industries.
CAPCO POLAND
*We are looking for a Poland-based candidate.

At Capco Poland, we’re not just another consultancy - we’re the spark behind digital transformation in the financial world. As a global leader in technology and management consulting, we thrive on helping clients tackle the toughest challenges across banking, payments, capital markets, wealth, and asset management.

ROLE OVERVIEW:
We are looking for a Site Reliability Engineer (SRE) to act as an embedded reliability engineering partner supporting critical digital platforms and business services within a leading global financial services environment. Working alongside application, platform, and infrastructure teams, you will help improve the reliability, scalability, observability, and operational maturity of systems that support investment, advisory, and customer-facing services.

You will apply Site Reliability Engineering principles to reduce operational risk, improve service availability, and enhance customer experience across a complex technology landscape. The role combines hands-on engineering with operational leadership, leveraging automation, AI-driven capabilities, and modern observability practices to accelerate incident response, reduce manual effort, and continuously improve service resilience. You will play a key role in driving reliability outcomes while collaborating with stakeholders across technology and business functions.

WHAT YOU'LL DO:
- Partner with engineering, platform, and business-aligned teams to improve the reliability and performance of critical financial services applications.
- Define, measure, and manage SLIs, SLOs, and error budgets to drive data-driven reliability improvements.
- Lead and support incident management activities, participating in a 24x7 on-call rotation and driving effective post-incident reviews.
- Build automation, self-healing capabilities, and operational tooling that reduce manual intervention and improve service recovery times.
- Analyse application, infrastructure, and platform performance to identify reliability risks and deliver continuous improvements across the technology estate.
- Partner with an India-based team of engineers.

WHAT WE'RE LOOKING FOR:
- Proven experience in Site Reliability Engineering, Production Engineering, DevOps, Platform Engineering, or a similar operationally focused role.
- Strong knowledge of observability, monitoring, incident management, reliability engineering, and service operations best practices.
- Experience supporting business-critical applications within complex enterprise environments.
- Hands-on experience with automation, scripting, infrastructure management, and cloud or hybrid technology platforms.
- Excellent communication skills with the ability to collaborate effectively with engineering teams, operational stakeholders, and business partners.

BONUS POINTS FOR:
- Experience supporting wealth management, investment management, banking, or other financial services platforms.
- Knowledge of regulatory, security, and operational resilience requirements within highly governed environments.
- Experience implementing AIOps, intelligent alerting, automated remediation, or predictive monitoring capabilities.
- Familiarity with Kubernetes, container platforms, cloud-native architectures, and distributed systems.
- Experience driving service reliability programmes using SLOs, error budgets, and operational excellence frameworks.

We offer a flexible collaboration model based on a B2B contract, with the opportunity to work on diverse projects.

Recruitment Process:
- HR Interview with the recruiter
- Technical Interview
- Client Interview
- Feedback and offer

#LI-HYBRID
Senior DevSecOps Engineer
Vanilla
Making Estate Planning Simple for Financial Advisors. Built for advisors, loved by clients.
- Own and operate security tooling, manage key vendor relationships, and drive application and cloud security programs forward
- Secure AWS infrastructure, systems, and networking
- Review infrastructure-as-code (Terraform) changes for security implications
- Monitor and triage security alerts across dedicated channels
- Manage the vCISO relationship and own the annual penetration test lifecycle
- Run tabletop exercises and maintain the incident response playbook
Associate Site Reliability Engineer
UnitedHealth Group
UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of two distinct and com
Role Description
As a member of our team, you will:
- Design, develop, and deploy AI-powered solutions using no-code, low-code, and advanced platforms, translating business needs into scalable applications that enhance products, workflows, and decision-making.
- Design, deploy, and maintain Kubernetes-based infrastructure to ensure high availability and scalability of applications.
- Build and manage CI/CD pipelines using GitHub Actions to enable fast and reliable deployments.
- Use Terraform to provision and manage infrastructure in Google Cloud Platform (GCP).
- Manage and optimize Apache Kafka-based systems to ensure reliable message streaming and data processing.
- Monitor and improve system performance and reliability using Prometheus and Grafana.
- Collaborate with developers to automate workflows and implement best practices for infrastructure-as-code (IaC).
- Write Python scripts for automation and tooling to enhance operational efficiency.
- Troubleshoot and resolve system issues to minimize downtime and impact on users.
- Participate in on-call rotations and incident response to ensure high service reliability.
Qualifications
- 1+ years of experience with Google Cloud Platform (GCP) services such as Compute Engine, Kubernetes Engine, and Cloud Storage.
- 1+ years of hands-on experience with Kubernetes for deploying and managing containerized applications.
- 1+ years of experience with GitHub Actions for creating and maintaining CI/CD pipelines.
- 1+ years of experience with Python for scripting, automation, and tooling.
- 1+ years of experience with Apache Kafka for building, maintaining, and troubleshooting message-driven systems.
- 1+ years of experience using Prometheus and Grafana for monitoring and observability.
- Basic knowledge of Terraform for infrastructure provisioning and management.
Requirements
- Familiarity with other cloud providers (e.g., AWS or Azure).
- Knowledge of Helm for Kubernetes package management.
- Experience with debugging and optimizing distributed systems.
- Exposure to security best practices for cloud infrastructure.
- Knowledge of Java for developing and troubleshooting backend systems.
- Familiarity with DataHub or similar data cataloging and metadata management platforms.
- Understanding of Artificial Intelligence (AI) concepts and tools, such as building or managing machine learning pipelines, integrating AI models, or working with ML platforms like TensorFlow, PyTorch, or Vertex AI.
- Experience with Golang for developing infrastructure tools or cloud-native applications.
Benefits
- Comprehensive benefits package.
- Incentive and recognition programs.
- Equity stock purchase.
- 401k contribution (subject to eligibility requirements).



