Lakeside Software helps IT teams monitor and optimize environments by focusing on the quantified end-user experience.
DevOps Engineer
Location
India
Posted
6 days ago
Seniority
Mid Level
Job Description
DevOps Engineer
Lakeside Software
Role Description
We are seeking a driven and technically skilled DevOps Engineer with strong Microsoft Azure experience to support, troubleshoot, and improve cloud infrastructure, CI/CD pipelines, automation, monitoring, and operational reliability across production environments. This role is highly operational and troubleshooting-focused. It requires someone who is comfortable diagnosing production issues, responding to alerts and outages, managing escalated support tickets, and serving as a key escalation point for infrastructure and application support. The ideal candidate enjoys problem solving, identifying root causes, stabilizing environments, and partnering cross-functionally to resolve complex operational issues quickly and effectively. This position operates within Agile/Scrum environments while balancing real-time operational support priorities.

Responsibilities
- Build, deploy, maintain, and troubleshoot scalable Azure cloud infrastructure
- Develop and maintain Infrastructure as Code (IaC) solutions
- Create, manage, troubleshoot, and improve CI/CD pipelines and deployment automation
- Monitor production systems and actively respond to operational alerts, incidents, outages, and performance degradation
- Own and manage escalated support tickets and serve as a technical escalation point for operational issues
- Investigate and troubleshoot infrastructure, deployment, networking, database, and application-related problems
- Perform root cause analysis and implement corrective actions to improve long-term system stability
- Support highly available environments aligned with SLA/SLO objectives
- Participate in on-call rotations and support critical production incidents as needed
- Perform application maintenance, patching, upgrades, and environment support activities
- Collaborate with development, security, infrastructure, and support teams to resolve operational issues quickly
- Work within Agile/Scrum processes while also handling ad hoc operational and troubleshooting priorities
- Implement operational best practices for reliability, security, monitoring, and performance optimization
- Maintain operational documentation, deployment standards, troubleshooting guides, and support procedures

Qualifications
- 5+ years of experience working in technology, infrastructure, cloud engineering, DevOps, or IT operations roles
- 3+ years of hands-on experience with Microsoft Azure cloud services
- Experience supporting and troubleshooting production environments with SLA/SLO requirements
- Strong experience responding to operational alerts, incidents, outages, and escalations, and troubleshooting infrastructure
- Experience diagnosing and resolving deployment, networking, application connectivity, and system performance issues
- Experience working in fast-paced Agile/Scrum and ad hoc operational support environments
- Experience acting as a ticket owner or escalation resource for infrastructure- and application-related support cases
- 3+ years of Infrastructure as Code (IaC) experience, preferably with Terraform; ARM templates and/or Bicep are acceptable
- 2+ years of experience working with SQL databases and Active Directory environments
- Experience designing, managing, and troubleshooting CI/CD pipelines using GitHub Actions, Bitbucket Pipelines, and/or Azure DevOps
- Strong experience with Git-based version control systems, primarily GitHub
- Experience with automation and scripting using PowerShell, Bash, or Python
- Hands-on experience with monitoring and observability platforms such as Azure Monitor, Grafana, Uptrends, and Application Insights
- Experience troubleshooting Azure networking components, including VNets, NSGs, Private Endpoints, peering, load balancing, and application connectivity
- Understanding of cloud security, operational reliability, and infrastructure best practices

Preferred Qualifications
- Microsoft Certified: Azure Administrator Associate (AZ-104)
- Experience with containerization and orchestration technologies such as Docker or Kubernetes
- Experience supporting or integrating AI/ML-related Azure services such as Azure OpenAI, Azure AI Foundry, or Azure AI Search
- Familiarity with GitOps or platform engineering concepts
- Strong troubleshooting, analytical thinking, and root cause analysis skills
- Strong communication and cross-team collaboration skills

Benefits
- 20 Days Annual Leave
- 45 Days Annual Leave Maximum
- 4 Festival Days Named
- 8 Festival Days Select
- 12 Days Sick Leave
- 100% Paid Medical Insurance & GPA
- Wellness Programme
- 3x CTC Group Life Insurance
- Pension
- Employee Referral Scheme
More DevOps Engineer Jobs
Senior Platform Engineer – Kubernetes, Middleware, DevOps
Trace3
Trace3 is an information technology and services company that helps businesses around the world evolve with the fast-changing climate of IT innovation. Headquartered in Irvine, Cal
• Support day-to-day operations of the following CU Boulder systems:
• Maintain, monitor, patch, and upgrade application and middleware environments
• Manage and support logging, monitoring, and reporting platforms
• Troubleshoot and resolve complex issues; perform root cause analysis
• Support system implementations, upgrades, and production rollouts
• Collaborate with cross-functional teams and stakeholders
• Document system configurations, processes, and operational procedures
• Evaluate and recommend tools to improve performance, reliability, and cost efficiency
• Coordinate and participate in maintenance windows, including after-hours activities
DevOps Engineer - GCCA Remote
TransUnion
TransUnion is a global information and insights company that makes trust possible by ensuring that each consumer is reliably and safely represented in the marketplace. We do this by having an accurate and comprehensive picture of each person. This picture is grounded in our legacy as a credit reporting agency, which enables us to tap into both credit and public record data; our data fusion methodology, which helps us link, match, and tap into the combined power of that data; and our knowledgeable and passionate team, who steward the information with expertise and in accordance with local legislation around the world. Because of our work, organizations can better understand consumers in order to make more informed decisions, and earn their trust through great, personalized experiences and the proactive extension of the right opportunities, tools, and offers. In turn, consumers can be confident that their data identities will result in the opportunities they deserve. We make trust possible, so businesses and consumers can transact with confidence and achieve great things. We call this Information for Good®. It's our purpose and what drives us every day.

What We'll Bring:
We Are TransUnion:
TransUnion is a major credit reference agency, and we offer specialist services in fraud, identity and risk management, automated decisioning, and demographics. We support organisations across a variety of sectors, including finance, retail, telecommunications, utilities, gaming, government, and insurance.

What You'll Bring:
Technical Expertise:
- Strong Linux systems administration experience, including firewalls and hardening
- Expertise in Docker and container orchestration
- Proficiency with Infrastructure as Code (IaC) tools, particularly Terraform
- Experience with network design, administration, and troubleshooting
- Knowledge of programming languages (e.g., JavaScript, Node.js, PHP)
- Experience with version control systems, ideally Git
- Web server configuration (Apache, Nginx; nice to have: MSSQL Server)
- Database management (MySQL, MongoDB), including high availability and backup solutions
- Hands-on experience managing cloud providers, with significant experience in AWS and Google Cloud Platform (GCP)
- Familiarity with GCP services such as Compute Engine, Kubernetes Engine (GKE), Cloud Storage, BigQuery, and IAM
- Familiarity with configuration management and IT automation tools
- Strong understanding of DevOps and SRE principles

Impact You'll Make:
Infrastructure & Operations:
- Participate in the design, implementation, and maintenance of our infrastructure, ensuring reliability, scalability, and security
- Support, monitor, and enhance the live infrastructure and platform solutions, ensuring high availability and performance
- Help plan and execute the integration of our current infrastructure into TransUnion's group-wide cloud platform while minimising disruptions
- Participate in the migration of infrastructure from AWS to Google Cloud Platform (GCP), ensuring a smooth transition and leveraging GCP services effectively
DevOps & Security:
- Maintain robust CI/CD pipelines, collaborating closely with development teams to streamline deployment processes
- Maintain and enhance our security posture, ensuring compliance with industry standards and frameworks (e.g., SOC 2, ISO 27001)
- Diagnose and resolve infrastructure outages and incidents, ensuring timely resolution and root cause analysis
Documentation & Best Practices:
- Ensure comprehensive documentation of infrastructure, systems, and processes to support onboarding, troubleshooting, and scalability
- Promote and implement DevOps and Site Reliability Engineering (SRE) best practices across the organisation

For positions based in South Africa, preference will be given to suitably qualified candidates from designated groups in line with the company's Employment Equity plan and targets. If you have not heard from TransUnion within 3 weeks of applying, please regard your application as unsuccessful.

Please note that the Global Capability Centre Africa requires you to reside in a home that is fibre-ready and has space for you to work comfortably and confidentially on a day-to-day basis for the purpose of your proposed employment. You can be based anywhere in South Africa that has fibre, but you will not be able to work from a location outside South Africa. A minimum 100/100 Mbps fibre line is required; if you are successful, you will need to upgrade your line or install fibre before day one. Because TransUnion is a credit bureau, some positions require a clear credit record.

At TransUnion, we encourage and are committed to creating a real, positive impact and shared sense of purpose within our Workforce for Good, which empowers our people to grow, innovate, and contribute to a better future for our communities and customers. We strive to build an environment where our associates are in the driver's seat of their professional development, while having access to help along the way. We recognize that success comes when our associates thrive both professionally and personally; that's why we prioritize work/life flexibility and offer resources for our teams across the globe to collaborate and drive excellence. Be a part of our Workforce for Good: you'll work with great people, pioneering products, and cutting-edge technology.

At GCC Africa, you'll join a purpose-driven organisation that invests deeply in its people through competitive rewards, comprehensive benefits, and meaningful career growth. We offer flexible, permanent work-from-home arrangements, strong wellbeing and support programmes for you and your family, and access to global exposure, continuous learning, and accredited development opportunities. Our inclusive culture, focus on recognition, and commitment to work–life balance ensure you can grow your career while thriving personally.

This is a hybrid position and involves regular performance of job responsibilities virtually as well as in person at an assigned TU office location for a minimum of two days a week.

TransUnion Job Title: Sr Engineer, Development Ops
• Maintain and optimize existing monitoring and automation solutions
• Collaborate with stakeholders to gather requirements
• Define monitoring strategies and engineer solutions
• Design and implement cloud automation and orchestration workflows
• Develop and maintain integrations with RESTful APIs
• Create and maintain technical documentation
• Continuously analyze and improve monitoring KPIs and incident response processes
Senior Site Reliability Engineer (SRE)
Devsu
Devsu is a technology agency that provides software development services, IT augmentation, and staffing.
Role Description
We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP). This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments. As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities
Monitoring & Observability (Core Focus)
- Own and operate the monitoring and observability stack across on-prem and GCP environments
- Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
- Define, tune, and maintain alerts to ensure a high signal-to-noise ratio
- Establish observability standards and best practices across teams
- Improve visibility into system health, performance, and reliability
Site Reliability Engineering
- Apply SRE principles to improve availability, performance, and resilience
- Define and track SLIs, SLOs, and error budgets
- Participate in on-call rotations and SEV incident response
- Lead or contribute to incident investigations and root cause analysis (RCA)
- Drive preventive actions to reduce repeat incidents
Kubernetes & Platform Reliability
- Support and monitor Kubernetes environments (GKE and on-prem clusters)
- Monitor cluster health, capacity, and resource utilization
- Troubleshoot platform-level issues impacting application reliability
- Collaborate with Platform and Engineering teams on reliability improvements
Secondary Responsibilities (Backup Application Support)
- Provide L2/L3 application support coverage during:
  - Support team resource shortages
  - High-severity incidents (SEVs)
  - Peak support periods or escalations
- Triage and troubleshoot application issues using existing runbooks and dashboards
- Collaborate with Application Support and Engineering teams during incidents
- Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Qualifications
- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments (mandatory)
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in the PST time zone
- Ability to participate in an on-call rotation that includes coverage for one weekend day

Requirements
Technology Stack:
- Observability: Grafana, Prometheus, logging platforms
- Containers: Kubernetes (GKE and on-prem)
- Cloud: Google Cloud Platform (GCP)
- Operations: Linux, networking, infrastructure monitoring
- Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)
Nice to have:
- Experience supporting application teams during SEV incidents
- Knowledge of capacity planning and performance tuning
- Scripting skills (Python, Bash, etc.)
- Experience with hybrid infrastructure environments

Benefits
- A stable, long-term contract with opportunities for career growth
- Private health insurance
- A remote-friendly culture that promotes work-life balance
- Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
- Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
- A flexible Paid Time Off (PTO) policy as well as paid holidays
- Challenging, world-class software projects for clients in the US and LatAm
- Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment




