Job Closed
This listing is no longer active.
The Leading AI Platform for Real-time Information and Event Discovery
Site Reliability Engineer III, Platform Engineering
Location
United States
Posted
103 days ago
Salary
$132K - $194.5K / year
Seniority
Senior
Job Description
Dataminr
- Ensure the high-quality delivery of software by building and maintaining tools used by software engineers and data scientists
- Work on our self-service internal developer platform used by engineering teams to deploy containers, serverless functions and cloud resources
- Maintain and improve our observability stack
- Drive improvements in security, reliability, cost efficiency and performance
- Troubleshoot large-scale distributed systems
- Work closely with product engineering teams to enable efficient project delivery
- Support our production environment as part of an on-call rota, helping with triage and resolution when issues arise
Job Requirements
- Experience managing Kubernetes clusters at scale (CKA a bonus)
- Maintaining and hardening AWS infrastructure using Terraform
- Development skills in Python or Go
- Linux systems administration and TCP/IP networking
- Experience maintaining observability tooling e.g. LGTM stack, OpenSearch
Benefits
- Flexible work arrangements
- Generous PTO and sick leave
- Professional development opportunities
More Platform Engineer Jobs
- Provide timely and effective technical support to end-users regarding GenAI Platform (e.g., platform access, capabilities, models, API usage, etc.).
- Document support interactions, troubleshooting steps, and resolutions in a ticketing system.
- Triage and resolve user issues related to platform access, Generative AI model performance, API connectivity issues, and other GenAI Platform functionalities.
- Act as the initial point of contact for all support inquiries, including general questions, “How-to” and usability questions, basic troubleshooting, and incident reporting/security-related issues, providing resources and support to cybersecurity-related inquiries.
- Help maintain a tech support playbook or Standard Operating Procedure (SOP) to manage, triage and track all tech support activities and inquiries.
- Collaborate with Tier 2/3 and engineering teams to resolve complex technical challenges and provide customer feedback for platform improvements.
- Stay up-to-date on the latest Platform features and best practices.
- This position requires participation in an 8-hour shift schedule to provide coverage during core service hours from 0600 to 2200 EST on weekdays.
- This position requires participation in a rotational weekend shift schedule to provide intermittent coverage during core service hours.
- Provide timely and effective technical support to end-users regarding GenAI Platform (e.g., platform access, capabilities, models, API usage, etc.).
- Document support interactions, troubleshooting steps, and resolutions in a ticketing system.
- Perform in-depth diagnostics and root cause analysis (RCA) for novel issues, including complex integration, performance, or latency issues.
- Act as the initial point of contact for security-related issues, providing resources and support to cybersecurity-related inquiries, and escalating issues as needed to the proper authority.
- Directly collaborate with Generative AI Operations (GenAIOps) and Software Engineering teams to communicate complex bugs, test potential fixes, and validate resolution.
- Monitor system health, performance, and automated alerts to proactively identify and address emerging trends or system degradation before they impact the user base.
- Collaborate with Tier 1/3 and engineering teams to resolve complex technical challenges and provide customer feedback for platform improvements.
- Maintain oversight into user accounts, groups, licenses, and agreements.
- Maintain awareness of configuration of new GenAI Platform features and applications.
- Proficiently diagnose complex issues across various Linux OS environments, cloud networking (VPC, firewall, routing), and authentication layers critical to the GenAI platform.
- Create and maintain a tech support playbook or Standard Operating Procedure (SOP) to manage, triage and track all tech support activities and inquiries.
- Compile and deliver weekly and monthly technical support metrics, and conduct trend analysis to identify recurring issues and inform platform improvements.
- This position requires participation in an 8-hour shift schedule to provide coverage during core service hours from 0600 to 2200 EST on weekdays.
- This position requires participation in a rotational weekend shift schedule to provide intermittent coverage during core service hours.
Senior Power Platform Developer
General Dynamics
General Dynamics is a global aerospace and defense company offering products designed to provide safety and security to people around the world. In the past, Ge
- Architecting, designing, developing, and testing applications using Microsoft Power Platform
- Writing and deploying custom components using JavaScript, C#, Power Platform Plugins, PCF Controls, and Custom APIs
- Applying role-based security to Power Apps applications
- Presenting technical concepts to both technical and executive audiences
- Communicating with stakeholders at all levels
- Translating business needs into technical solutions
- Providing expert knowledge support and mentorship to junior staff
Amazon WorkSpaces Platform Engineering
- Deploy and manage Amazon WorkSpaces environments
- Configure and maintain directory integration (AD Connector / Managed AD)
- Create, maintain, and version golden images
- Manage bundle selection and optimization
- Support WorkSpaces pools (if applicable)
- Implement lifecycle management (provisioning, rebuild, restore, decommission)
- Monitor WorkSpaces performance and user experience
- Troubleshoot latency, login, and connectivity issues
- Optimize resource allocation and right-size instances
- Support patching and image updates
- Implement repeatable deployments using Terraform or CloudFormation
- Automate image builds and updates where feasible
- Document configurations and deployment patterns
- Support incident resolution related to WorkSpaces
- Collaborate with VAEC for networking or IAM escalations when required
- Maintain configuration documentation
- Support audit and compliance evidence gathering


