Custom software from Jena
DevOps / Platform Engineer
Location
Germany
Posted
6 days ago
Seniority
Senior
Job Description
DevOps / Platform Engineer
zollsoft GmbH
- Act as the link between development and IT operations
- Design, automate, and optimize the build, test, and deployment processes
- Develop a container platform
- Maintain the IT infrastructure with clear documentation
- Automate recurring tasks
Job Requirements
- Solid foundation from a technical degree, relevant vocational training, or practical experience in an IT environment
- At least 3 years of professional experience in software development or IT administration
- Passion for modern DevOps technologies (Docker, Kubernetes, CI/CD, GitLab, Terraform)
- Proficiency in a scripting or programming language (Python, Bash, Java, or Go)
- Experience with public cloud (Google Cloud and/or AWS) is an advantage
- Excellent German language skills (at least C1)
Benefits
- Tax-free shopping card
- Company bike leasing
- Health courses (e.g. yoga, back training, and mental health)
- Flexible working hours
- Home office