The market intelligence and search platform trusted by over 3,500 leading organizations
Cloud Reliability Engineer – Recovery
Location
India
Posted
30 days ago
Seniority
Senior
Job Description
AlphaSense
- Design and implement multi-region, multi-AZ AWS architectures that meet RTO/RPO targets
- Engineer active-active and active-passive failover patterns using Route 53, Global Accelerator, and CloudFront
- Build automated DR runbooks and playbooks using AWS Systems Manager Automation and Step Functions
- Implement chaos engineering practices using AWS Fault Injection Simulator (FIS) to validate resiliency
- Architect cross-region replication strategies for S3, DynamoDB Global Tables, RDS, and Aurora Global Database
- Review containerized workloads running on Kubernetes, ensuring resilience through self-healing, auto-scaling, and multi-cluster or multi-region deployments
- Administer AWS Backup across all services (EC2, EBS, RDS, EFS, FSx, DynamoDB, Aurora) with policy-based automation
- Design immutable backup vaults and cross-account/cross-region backup replication pipelines
- Develop and automate data recovery testing procedures, ensuring integrity and meeting defined SLAs
- Implement point-in-time recovery (PITR) for databases and storage; validate via regular restore drills
- Maintain Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies, including tracking RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Author and maintain Terraform/CloudFormation templates for all BCP/DR infrastructure components
- Automate DR testing pipelines through CI/CD (CodePipeline, CodeBuild, GitHub Actions)
- Write Python/Bash/PowerShell scripts to orchestrate failover, failback, and health-check workflows
- Manage infrastructure state in AWS Control Tower and implement Landing Zone DR patterns
- Build CloudWatch dashboards, alarms, and composite alarms for availability and DR-readiness indicators
- Integrate AWS Health and Personal Health Dashboard events into PagerDuty/OpsGenie alerting workflows
- Participate in on-call rotations and lead DR incident response; conduct post-incident reviews (PIRs)
- Develop and maintain runbooks for AWS service degradations, regional outages, and data corruption events
- Conduct regular BCP/DR tabletop exercises and full failover simulations to validate recovery procedures and improve organizational readiness; document results and action items
- Ensure DR controls meet SOC 2, ISO 22301, NIST 800-53, and HIPAA/PCI requirements as applicable
- Maintain current and accurate DR documentation: BIAs, BCPs, DRP runbooks, and recovery evidence
- Collaborate with audit and compliance teams to provide DR evidence and remediation tracking
Job Requirements
- 5+ years in cloud infrastructure, SRE, or IT disaster recovery engineering roles
- 3+ years of hands-on AWS experience in production environments at scale
- Proven delivery of multi-region DR architectures with defined and tested RTO/RPO targets
- Expert-level proficiency with core AWS resilience services
- Strong scripting skills: Python, Bash, or PowerShell for automation and orchestration
- Experience with Infrastructure as Code: Terraform and/or AWS CloudFormation
- Solid understanding of networking fundamentals: VPC, TGW, Direct Connect, VPN, DNS failover
- Excellent written and verbal communication; able to produce executive-level DR reports
Benefits
- Competitive salary
- Remote work options




