Job Closed
This listing is no longer active.
Site Reliability Engineer
Location: United States
Posted: 71 days ago
Salary: 0
Seniority: Mid Level
Job Description
Site Reliability Engineer
Cooper Standard
Job Description: Site Reliability Engineer (SRE)

About Liveline
Liveline enables dramatic improvements in manufacturing performance through a unique application of artificial intelligence, providing real-time process control and predictive assistants for plant personnel. Our focus is on automating complex processes, not simply providing dashboards for managers and operators.

Our team combines experts in AI with world-class process engineers who can focus on the “last mile” with customers: extracting data from the process and implementing controls on the shop floor. We speak the language of AI, but also that of industrial controllers. Our hardware and software offerings are scalable and cost-effective whether customers have one production line or hundreds, delivering an ROI that’s attractive to small and medium-sized enterprises.

We are passionate about democratizing the power of analytics and advanced automation for manufacturers of almost any size. Through our approach, producers can demystify complex processes and free up valuable technicians to focus on more advanced tasks instead of constantly monitoring and adjusting equipment parameters.

A Liveline Technologies SRE is responsible for the reliability, performance, observability, and operational excellence of Liveline’s production services, from factory-floor edge systems to AWS cloud components. You will help build and run resilient infrastructure, automate repetitive work with code (Terraform, Bash, Python), implement monitoring and alerting (Prometheus/Grafana), and participate in incident response and on-call to ensure uptime for mission-critical manufacturing systems. You’ll collaborate closely with controls engineers, data scientists, and software teams to deploy changes safely, define SLIs/SLOs, and continuously improve availability and latency for real-time process control.
Primary Responsibilities
- Operate Production Systems: Maintain high availability, performance, and security of Liveline’s production stack across AWS and plant/edge environments.
- Observability & Monitoring: Stand up, tune, and maintain Prometheus/Grafana dashboards, alerts, recording rules, and runbooks. Implement logs/traces (e.g., OpenTelemetry) and actionable alerting.
- Infrastructure as Code: Build and manage reproducible infrastructure with Terraform (VPC, IAM, EC2/EKS/ECS, RDS, S3, CloudWatch, CloudTrail). Apply version control, code reviews, and plan/apply workflows.
- Automation & Tooling: Write Bash and Python scripts and small services to automate operational tasks, health checks, failover routines, backup/restore, and environment bootstrapping.
- NOC / Incident Response: Participate in a follow-the-sun/on-call rotation; triage and resolve incidents, lead initial communications, and produce blameless postmortems with clear corrective actions.
- SLIs/SLOs/Error Budgets: Define and instrument SLIs (availability, latency, error rate, freshness), set SLOs with stakeholders, and manage error budgets to guide release velocity and reliability tradeoffs.
- Networking & Connectivity: Support secure, reliable connectivity between factory networks and the cloud (site-to-site VPNs, routing, DNS, TLS, private subnets, security groups, network ACLs).
- Databases & Storage: Operate and tune PostgreSQL/TimescaleDB, InfluxDB, or similar time-series/relational stores; manage backups, PITR, replication, partitioning, and performance baselining.
- CI/CD & Release Engineering: Contribute to build/deploy pipelines (e.g., GitHub Actions/GitLab CI), implement canary and blue-green strategies, and enforce change management and rollback plans.
- Security & Compliance: Enforce least-privilege IAM, secret management (AWS Secrets Manager/SSM), encryption, artifact signing, and basic hardening for Linux and Kubernetes workloads.
- Edge & OT Collaboration: Partner with process/controls engineers to ensure reliable data ingestion from PLCs/industrial gateways (e.g., OPC UA/Modbus) and safe deploys to plant edge nodes.
- Cost, Capacity & Performance: Right-size compute/storage, set budgets/alerts, forecast capacity, and optimize resource utilization without compromising SLOs.
- Documentation & Runbooks: Author and maintain runbooks, architecture diagrams, operational playbooks, and disaster recovery procedures.

Education and Qualifications:
- Bachelor’s Degree in IT, Computer Science, or Computer Engineering (or equivalent experience).
- 5+ years of experience in a corporate IT or startup setting.
- Familiar with containers (Docker) and orchestration (Kubernetes or ECS).
- Experience running production workloads, participating in on-call, and writing postmortems.
- Strong communication skills with the ability to explain tradeoffs to non-SRE stakeholders.
- Intellectual curiosity, ownership mindset, and bias for automation.
- Willingness and ability to travel to customer sites and plants, as necessary.

Nice to Have
- Kubernetes (EKS), Helm, Kustomize.
- Service Mesh/Ingress (Envoy, NGINX, ALB).
- Logging/Tracing: OpenSearch/ELK, Loki, OpenTelemetry.
- Config Management: Ansible.
- Secrets & PKI: HashiCorp Vault, mTLS.
- Edge/Industrial Protocols: OPC UA, Modbus, MQTT; experience with industrial gateways.
- Compliance exposure (SOC 2, ISO 27001) and change management (ITIL).

Position Type: Regular

Additional Information: Cooper Standard is proud of its diverse workforce and committed to providing equal employment opportunities to applicants and employees without regard to race, color, religion, sex, national origin, genetic information, physical or mental disability, age, veteran or military status, or any other characteristic protected by applicable law.
We are dedicated to creating an environment at work that not only values diversity but also encourages inclusion and a sense of belonging. We firmly believe that a diverse workplace fosters an environment where our employees can flourish and provide superior service to our customers. Because we recognize and value the range of ways in which people acquire experiences, whether personal, professional, or via education or volunteerism, we invite interested applicants to evaluate the key duties and requirements and apply for any opportunities that fit their experience and qualifications.

Applicants with disabilities may be entitled to reasonable accommodations under the Americans with Disabilities Act, as well as certain state and/or local laws. If you believe you require such assistance to complete our online application or to participate in an interview, you (or someone on your behalf) may request assistance by emailing recruitment@cooperstandard.com with a description of the accommodation you seek. Application materials submitted to this email address will not be considered.

Remote Status: Remote