Job Closed
This listing is no longer active.
We Believe All Possibilities Live in Technology
Senior Platform Engineer – Kubernetes, Middleware, DevOps
Location
California
Posted
9 days ago
Salary
$80K - $110K / year
Seniority
Senior
Job Description
Trace3
• Support day-to-day operations of the following CU Boulder systems:
• Maintain, monitor, patch, and upgrade application and middleware environments
• Manage and support logging, monitoring, and reporting platforms
• Troubleshoot and resolve complex issues; perform root cause analysis
• Support system implementations, upgrades, and production rollouts
• Collaborate with cross-functional teams and stakeholders
• Document system configurations, processes, and operational procedures
• Evaluate and recommend tools to improve performance, reliability, and cost efficiency
• Coordinate and participate in maintenance windows, including after-hours activities
Job Requirements
- Advanced configuration management skills with Ansible
- Advanced experience in Unix/Linux systems administration
- Shell scripting in Bash, plus Python or equivalent programming skills
- Experience working with Rancher 2 and Kubernetes
- Experience working with Neo4j, Cognos, Sunapsis, Tableau
- Experience with Graylog and/or Grafana
- Demonstrated experience logging with the Elastic Stack (aka ELK Stack) and Graylog
- Demonstrated experience monitoring with Icinga or equivalent
- Experience with disaster recovery planning and testing
- Experience with Oracle middleware stack (OSB, OIM, OUD, WebLogic, etc.)
- OS patching and upgrade processes
- AWS or similar cloud platforms
Benefits
- Comprehensive medical, dental and vision plans for you and your dependents
- 401(k) Retirement Plan with Employer Match, 529 College Savings Plan, Health Savings Account, Life Insurance, and Long-Term Disability
- Competitive Compensation
- Training and development programs
- Major offices stocked with snacks and beverages
- Collaborative and cool culture
- Work-life balance and generous paid time off
More DevOps Engineer Jobs
DevOps Engineer - GCCA Remote
TransUnion
Founded in 1968, TransUnion is a credit information management services provider for consumers, businesses, and the global credit community. An equal opportunity employer recognize
TransUnion's Job Applicant Privacy Notice
What We'll Bring:
We Are TransUnion: TransUnion is a major credit reference agency, and we offer specialist services in fraud, identity and risk management, automated decisioning and demographics. We support organisations across a variety of sectors including finance, retail, telecommunications, utilities, gaming, government and insurance.
What You'll Bring:
Technical Expertise:
- Strong Linux systems administration experience, including firewalls and hardening
- Expertise in Docker and container orchestration
- Proficiency with Infrastructure as Code (IaC) tools, particularly Terraform
- Experience with network design, administration, and troubleshooting
- Knowledge of programming languages (e.g., JavaScript, Node.js, PHP)
- Experience with version control systems, ideally Git
- Web server configuration (Apache, Nginx + nice to have: MSSQL Server)
- Database management (MySQL, MongoDB), including high availability and backup solutions
- Hands-on experience managing cloud providers, with significant experience in AWS and Google Cloud Platform (GCP)
- Familiarity with GCP services such as Compute Engine, Kubernetes Engine (GKE), Cloud Storage, BigQuery, and IAM
- Familiarity with configuration management and IT automation tools
- Strong understanding of DevOps and SRE principles
Impact You'll Make:
Infrastructure & Operations:
- Participate in the design, implementation, and maintenance of our infrastructure, ensuring reliability, scalability, and security.
- Support, monitor, and enhance the live infrastructure and platform solutions, ensuring high availability and performance.
- Help plan and execute the integration of our current infrastructure into TransUnion's group-wide cloud platform while minimising disruptions.
- Participate in the migration of infrastructure from AWS to Google Cloud Platform (GCP), ensuring a smooth transition and leveraging GCP services effectively.
DevOps & Security:
- Maintain robust CI/CD pipelines, collaborating closely with development teams to streamline deployment processes.
- Maintain and enhance our security posture, ensuring compliance with industry standards and frameworks (e.g., SOC-2, ISO 27001).
- Diagnose and resolve infrastructure outages and incidents, ensuring timely resolution and root cause analysis.
Documentation & Best Practices:
- Ensure comprehensive documentation of infrastructure, systems, and processes to support onboarding, troubleshooting, and scalability.
- Promote and implement DevOps and Site Reliability Engineering (SRE) best practices across the organisation.
For positions based in South Africa, preference will be given to suitably qualified candidates from designated groups in line with the company's Employment Equity plan and targets.
Should you not have heard from TransUnion within 3 weeks of applying, please regard your application as unsuccessful.
Please note it is a requirement of the Global Capability Centre Africa that you reside in a home that is fibre ready and has space for you to work comfortably and confidentially on a day-to-day basis for the purpose of your proposed employment. You can be based anywhere in South Africa that has fibre, but will not be able to work in a location outside of South Africa. A minimum 100/100 Mbps fibre line is required; should you be successful, you will need to upgrade your line or install fibre for day one.
Please note that being a credit bureau, some positions require a clear credit record.
At TransUnion, we encourage and are committed to creating a real, positive impact and shared sense of purpose within our Workforce for Good, which empowers our people to grow, innovate and contribute to a better future for our communities and customers. We strive to build an environment where our associates are in the driver’s seat of their professional development, while having access to help along the way.
We recognize that success comes when our associates thrive both professionally and personally; that’s why we prioritize work/life flexibility and offer resources for our teams across the globe to collaborate and drive excellence. Be a part of our Workforce for Good: you’ll work with great people, pioneering products and cutting-edge technology.
At GCC Africa, you’ll join a purpose-driven organisation that invests deeply in its people through competitive rewards, comprehensive benefits, and meaningful career growth. We offer flexible, permanent work-from-home arrangements, strong wellbeing and support programmes for you and your family, and access to global exposure, continuous learning, and accredited development opportunities. Our inclusive culture, focus on recognition, and commitment to work-life balance ensure you can grow your career while thriving personally.
This is a hybrid position and involves regular performance of job responsibilities virtually as well as in-person at an assigned TU office location for a minimum of two days a week.
TransUnion Job Title: Sr Engineer, Development Ops
• Maintain and optimize existing monitoring and automation solutions
• Collaborate with stakeholders to gather requirements
• Define monitoring strategies and engineer solutions
• Design and implement cloud automation and orchestration workflows
• Develop and maintain integrations with RESTful APIs
• Create and maintain technical documentation
• Continuously analyze and improve monitoring KPIs and incident response processes
Senior Site Reliability Engineer (SRE)
Devsu
Devsu is a technology agency that provides software development services, IT augmentation and staffing.
Role Description
We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP). This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments. As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.
Responsibilities
- Monitoring & Observability (Core Focus)
  - Own and operate the monitoring and observability stack across on-prem and GCP environments
  - Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  - Define, tune, and maintain alerts to ensure high signal-to-noise ratio
  - Establish observability standards and best practices across teams
  - Improve visibility into system health, performance, and reliability
- Site Reliability Engineering
  - Apply SRE principles to improve availability, performance, and resilience
  - Define and track SLIs, SLOs, and error budgets
  - Participate in on-call rotations and SEV incident response
  - Lead or contribute to incident investigations and root cause analysis (RCA)
  - Drive preventative actions to reduce repeat incidents
- Kubernetes & Platform Reliability
  - Support and monitor Kubernetes environments (GKE and on-prem clusters)
  - Monitor cluster health, capacity, and resource utilization
  - Troubleshoot platform-level issues impacting application reliability
  - Collaborate with Platform and Engineering teams on reliability improvements
- Secondary Responsibilities (Backup Application Support)
  - Provide L2/L3 application support coverage during:
    - Support team resource shortages
    - High-severity incidents (SEVs)
    - Peak support periods or escalations
  - Triage and troubleshoot application issues using existing runbooks and dashboards
  - Collaborate with Application Support and Engineering teams during incidents
  - Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)
Qualifications
- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments (mandatory)
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in PST time zone
- Ability to participate in an on-call rotation that includes coverage for one weekend day
Requirements
- Technology Stack:
  - Observability: Grafana, Prometheus, logging platforms
  - Containers: Kubernetes (GKE and on-prem)
  - Cloud: Google Cloud Platform (GCP)
  - Operations: Linux, networking, infrastructure monitoring
  - Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)
- Nice to have:
  - Experience supporting application teams during SEV incidents
  - Knowledge of capacity planning and performance tuning
  - Scripting skills (Python, Bash, etc.)
  - Experience with hybrid infrastructure environments
Benefits
- A stable, long-term contract with opportunities for career growth
- Private health insurance
- A remote-friendly culture that promotes work-life balance
- Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
- Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
- A flexible Paid Time Off (PTO) policy as well as paid holiday days
- Challenging, world-class software projects for clients in the US and LatAm
- Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment
• Own the technical direction of Emergent’s DevOps practice, including the IaC standards, pipeline patterns, code review expectations, and reference architectures that the team applies across client engagements.
• Author and maintain the internal documentation, templates, and modules that the team relies on day to day. This is a core part of the role, not a side task.
• Mentor a team of engineers from mixed backgrounds, including software developers growing into cloud roles and traditional cloud engineers building DevOps fluency. Pair, review pull requests, run office hours, and raise the technical bar across the group.
• Lead the most complex DevOps engagements end to end, from discovery through delivery, with a focus on environments where scale, regulatory requirements, or legacy constraints demand strong technical judgment. This includes grooming backlog work, assigning features and tasks to the project team, and influencing technical direction across practice areas.
• Represent the DevOps practice in pre-sales. Partner with sales and account teams to scope work, lead technical discovery sessions, contribute to proposals, and build client confidence in Emergent’s approach.
• Continuously evolve Emergent’s DevOps practice by evaluating new tools, patterns, and approaches, and bringing the right ones into the team’s standard playbook.
• Diagnose and improve existing client systems that may be partially implemented or inconsistently managed, balancing pragmatic delivery with long-term maintainability.
• Accurately track time and work performed to support client billing, project planning, and overall project health, as part of a professional services delivery model.
• Apply strong time management skills to prioritize workload, meet deadlines, and provide technical support and guidance for other DevOps-focused team members.
• Share your knowledge at regular talk shop and lunch & learn sessions to help build a stronger team.




