Mitratech is a privately-held, Austin, Texas-based company providing computer software solutions to companies across the globe. The company has been in operation since 1987 and hel
Senior Infrastructure Engineer – AI/ML
Location
Germany
Posted
164 days ago
Salary
0
Seniority
Senior
Job Description
Mitratech
- Design, deploy, and maintain scalable and secure infrastructure supporting AI and ML workloads.
- Build and maintain AWS cloud environments for compute (EC2, ECS/EKS, Lambda), storage (S3, EFS, FSx), and networking (VPC, Transit Gateway, PrivateLink, Route 53, load balancers).
- Implement security best practices using IAM, KMS, Secrets Manager, GuardDuty, and Security Hub.
- Support and optimize AI/ML workloads across AWS services (SageMaker, Bedrock, Batch, Step Functions).
- Develop and maintain Infrastructure as Code (IaC) using Terraform, AWS CDK, and CloudFormation.
- Manage containerized workloads and orchestration platforms (Docker, EKS, Fargate), including GPU scheduling and scaling.
- Set up and maintain monitoring and observability frameworks using CloudWatch and OpenTelemetry.
- Build and manage CI/CD pipelines (CircleCI, GitHub Actions, GitLab CI) for infrastructure automation and ML/Gen AI deployments.
- Collaborate with ML and Generative AI teams to scale models, optimize performance, and design efficient prompt or inference pipelines.
- Develop runbooks and SOPs for AI service deployment, troubleshooting, and performance optimization.
- Ensure security, compliance, and data protection across AI datasets and environments.
Job Requirements
- Strong proficiency in Linux administration and systems-level troubleshooting.
- Deep expertise in AWS cloud services, with experience in compute, storage, networking, and security domains.
- Proficiency in container orchestration (Kubernetes/EKS) and infrastructure automation tools.
- Hands-on experience with IaC tools such as Terraform, AWS CDK, or CloudFormation.
- Familiarity with monitoring, logging, and observability stacks (Prometheus, Grafana, OpenTelemetry).
- Experience implementing CI/CD pipelines for automated deployment and testing.
- Understanding of AI/ML concepts, including model deployment, inference scaling, and LLM performance tuning.
- Working knowledge of security best practices, IAM roles, encryption, and compliance controls.
- Excellent collaboration and communication skills to partner with ML engineers, data scientists, and product teams.
Benefits
- Equal-opportunity employer that values diversity at all levels
- Professional development opportunities
More Infrastructure Engineer Jobs
Role Description
We are looking for a well-versed, passionate Engineer who wants to play a key role in site reliability engineering and cloud operations of our global cloud infrastructure. You will likely be successful in this role if you identify with the following traits: attention to detail, problem solver, customer-oriented, versatile, resilient, and confident.
What you will be doing at VGS:
- Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
- Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
- Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
- Performance tuning and capacity planning: Identify bottlenecks and optimization opportunities, and implement scaling strategies to handle traffic spikes and growing workloads efficiently.
- Collaborate with cross-functional teams: Work closely with software engineers, product teams, and DevOps to enhance system reliability and delivery pipelines.
- Improve operational processes: Champion continuous improvement initiatives in deployment, scaling, and performance testing, while advocating for the adoption of SRE best practices across the organization.
- Mentorship and leadership: Provide technical mentorship to junior engineers, contribute to strategic decisions around infrastructure, and ensure best practices are implemented at scale.
- Be proactive and innovative: We rely on your feedback to build a world-class product.
- Be a part of a team that believes in the core values of transparency, collaboration, grit, and humility; in going above and beyond what is required to do the right thing for our customers and the company; and in having fun while doing all this!
Qualifications
- Proven experience in Infrastructure/SRE roles, with a track record of managing production systems in complex, large-scale environments.
- Strong proficiency in AWS, including infrastructure-as-code (Terraform, CloudFormation, etc.).
- Solid understanding of cloud-native architecture, Linux systems, microservices, infrastructure-as-code (Terraform, CloudFormation, CDK), CI/CD (CircleCI, GitHub Actions, Argo), GitOps, authentication and authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services.
- Strong plus if you are a database wiz.
- Expertise in monitoring and observability tools like Prometheus, Grafana, OpenTelemetry, New Relic, or similar tools to measure system health and performance.
- Programming and scripting experience in languages such as Python, Go, Bash, or other relevant languages used in automating infrastructure.
- Solid understanding of networking, security, and load balancing in cloud-native environments.
- Strong communication and collaboration skills, with the ability to lead cross-functional initiatives and mentor junior team members.
- Experience with incident management and disaster recovery best practices.
- Strong written and verbal communication skills.
Requirements
- $140,000 - $190,000 a year
Benefits
- Flexible work hours and flexible PTO
- Competitive health benefits
- VGS stock options
- 401k plan, with employer matching 4% and immediate vesting (available only for US employees)
- Life & disability insurance
- Pre-tax flexible spending accounts, dependent and healthcare FSA (available only for US employees)
- Global parental leave program
- Employee Assistance Program
- Home Internet reimbursement
- New hire home office set-up allowance
- Professional learning reimbursement
- Join an international project at one of our direct clients
- Work in the solutions architecture department on cloud-based environments
- Collaborate with a highly qualified team
Infrastructure Engineer
Yuxi Global powered by Veritas Automata
Yuxi Global powered by Veritas Automata is a technology force multiplier that digitally empowers companies.
- Design, deploy, and maintain Kubernetes clusters (K3s, RKE2, AKS, EKS, GKE) across cloud and hybrid environments.
- Implement infrastructure-as-code solutions using Terraform, Pulumi, Ansible, or equivalent automation tools.
- Engineer secure, scalable networking architectures including VPCs, subnets, VPNs, firewalls, service meshes, load balancers, and cross-region connectivity.
- Architect and maintain CI/CD pipelines, GitOps tooling, and automated delivery workflows using GitHub Actions, ArgoCD, Flux, or GitLab CI.
- Configure and operate observability platforms including Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Thanos for full-stack visibility.
- Collaborate with SRE and platform teams to improve reliability, reduce operational toil, and optimize performance and cost.
- Implement and maintain cloud security best practices including IAM, RBAC, secrets management, encryption, and compliance controls.
- Participate in on-call rotation, incident response, and root cause analysis for platform-related production issues.
- Develop and document runbooks, architecture diagrams, operational standards, and troubleshooting guides.
- Mentor junior engineers and contribute to capability-building around modern infrastructure practices.
At Symmetrio, we are recruiting for an IT Infrastructure Specialist who supports the deployment and ongoing operation of customer environments used to deliver company applications. This heavily customer-facing role is hands-on and includes testing/validating hardware configurations, implementing core infrastructure services, and providing support for both new deployments and existing clients.
**NYC Metro candidates are highly preferred for occasional travel.**
**Infrastructure Management**
- Test and validate server and hardware configurations to ensure they are suitable for reliable application delivery
- Implement, administer, and maintain Windows-based physical server environments and VMware and Nutanix virtual infrastructure
- Configure and support Active Directory services, including users, groups, and Group Policy (GPO) administration
- Deploy, maintain, and troubleshoot app delivery (Citrix/Omnissa) based environments
- Install, configure, maintain, and troubleshoot Microsoft SQL Server and related components
- Coordinate software releases and patching activities; monitor environments to maintain security, stability, and performance
**Support & Operations**
- Follow change management procedures and provide escalation-level troubleshooting when required
- Maintain accurate documentation for systems, configurations, processes, and support activities
- Provide hands-on technical support to new and existing clients via phone/email/remote support
- Handle daily support cases and service requests, ensuring timely response and resolution
- Participate in occasional coverage aligned with West Coast business hours
**Best Practices & Professional Development**
- Apply IT service management and standards practices (e.g., ITIL and ISO) to support operations
- Maintain familiarity with disaster recovery best practices and procedures
- Assess business requirements and recommend practical technical solutions
- Stay current with industry trends and emerging technologies




