Job Closed

This listing is no longer active.

Senior Infrastructure Engineer

Infrastructure Engineer | Other | Remote | Senior | Team 51-200 | Since 2017 | H1B Sponsor

Location

United States

Posted

167 days ago

Salary

$140K - $180K / year

Seniority

Senior

Job Description

Prometheum

  • Design, build, and maintain AWS cloud infrastructure using Terraform, Terragrunt, Helm, ArgoCD, Kubernetes (EKS), and CI/CD pipelines (GitHub Actions)
  • Manage infrastructure across multiple AWS accounts and environments, ensuring consistency, proper isolation, and security
  • Maintain and optimize Kubernetes clusters, including EKS upgrades, component updates, and capacity planning
  • Build and maintain observability systems (Prometheus, Grafana, Datadog) with comprehensive alerting and dashboards
  • Manage dependency updates and security patches across Docker images, Helm charts, Terraform modules, and application dependencies using automation tools like Renovate
  • Enhance security posture through least privilege access, signed images, admission controllers (Kyverno), and mTLS
  • Participate in on-call rotation to respond to incidents, troubleshoot issues, identify root causes, and implement preventive measures
  • Document infrastructure patterns, best practices, and operational procedures

Job Requirements

  • 5+ years of experience architecting, designing, and implementing cloud solutions on AWS
  • Production experience with Docker and Kubernetes (AWS EKS strongly preferred)
  • Strong Infrastructure-as-Code skills using Terraform and Terragrunt (or similar DRY configuration patterns)
  • Experience managing infrastructure across multiple AWS accounts with IAM, SSO, and account isolation
  • Hands-on experience with GitOps workflows and tools (ArgoCD preferred)
  • Experience with CI/CD pipelines and automation (GitHub Actions preferred)
  • Experience with observability tools (Prometheus, Grafana, Datadog) for metrics, alerting, and dashboards
  • Experience with Cloudflare and Cloudflare Zero Trust for network security, DNS, and secure access
  • Proficiency in at least one programming language: Python, Go, Rust, or TypeScript
  • Strong troubleshooting skills in containerized Linux environments
  • Experience applying SRE principles: SLO/SLIs, golden signals, MTTR, progressive rollouts, and change management
  • Experience setting up, managing, and maintaining high-availability blockchain infrastructure for production environments (Nice to have)
  • Experience working in a highly regulated environment (Nice to have)
  • Experience building and operating multi-region, multi-cloud production systems (Nice to have)
  • Experience using AI-related tools in DevOps and infrastructure toolchains (Nice to have)

Benefits

  • Competitive salary based on experience
  • Excellent benefits, including health, vision, and dental insurance
  • Fully remote position with equipment provided.

More Infrastructure Engineer Jobs

Senior Infrastructure Engineer – AI/ML

Mitratech

Mitratech is a privately-held, Austin, Texas-based company providing computer software solutions to companies across the globe. The company has been in operatio

  • Design, deploy, and maintain scalable and secure infrastructure supporting AI and ML workloads.
  • Build and maintain AWS cloud environments for compute (EC2, ECS/EKS, Lambda), storage (S3, EFS, FSx), and networking (VPC, Transit Gateway, PrivateLink, Route 53, load balancers).
  • Implement security best practices using IAM, KMS, Secrets Manager, GuardDuty, and Security Hub.
  • Support and optimize AI/ML workloads across AWS services (SageMaker, Bedrock, Batch, Step Functions).
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, AWS CDK, and CloudFormation.
  • Manage containerized workloads and orchestration platforms (Docker, EKS, Fargate), including GPU scheduling and scaling.
  • Set up and maintain monitoring and observability frameworks using CloudWatch and OpenTelemetry.
  • Build and manage CI/CD pipelines (CircleCI, GitHub Actions, GitLab CI) for infrastructure automation and ML/Gen AI deployments.
  • Collaborate with ML and Generative AI teams to scale models, optimize performance, and design efficient prompt or inference pipelines.
  • Develop runbooks and SOPs for AI service deployment, troubleshooting, and performance optimization.
  • Ensure security, compliance, and data protection across AI datasets and environments.

Germany

Senior Infrastructure Engineer

Very Good Security - VGS

Very Good Security, or VGS, is a computer and network security company offering a modern approach to data security, compliance, and privacy with its SaaS soluti

Role Description

We are looking for a well-versed, passionate engineer who wants to play a key role in site reliability engineering and cloud operations of our global cloud infrastructure. You will likely be successful in this role if you identify with the following traits: attention to detail, problem solver, customer-oriented, versatile, resilient, and confident.

What you will be doing at VGS:

  • Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
  • Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
  • Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
  • Performance tuning and capacity planning: Identify bottlenecks and optimization opportunities, and implement scaling strategies to handle traffic spikes and growing workloads efficiently.
  • Collaborate with cross-functional teams: Work closely with software engineers, product teams, and DevOps to enhance system reliability and delivery pipelines.
  • Improve operational processes: Champion continuous improvement initiatives in deployment, scaling, and performance testing, while advocating for the adoption of SRE best practices across the organization.
  • Mentorship and leadership: Provide technical mentorship to junior engineers, contribute to strategic decisions around infrastructure, and ensure best practices are implemented at scale.
  • Be proactive and innovative: We rely on your feedback to build a world-class product.
  • Be a part of a team that believes in the core values of transparency, collaboration, grit, and humility; in going above and beyond what is required to do the right thing for our customers and the company; and in having fun while doing all this!

Qualifications

  • Proven experience in Infrastructure/SRE roles, with a track record of managing production systems in complex, large-scale environments.
  • Strong proficiency in AWS, including infrastructure-as-code (Terraform, CloudFormation, etc.).
  • Solid understanding of cloud-native architecture, Linux systems, microservices, infrastructure-as-code (Terraform, CloudFormation, CDK), CI/CD (CircleCI, GitHub Actions, Argo), GitOps, authentication and authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services.
  • Strong plus if you are a database wiz.
  • Expertise in monitoring and observability tools like Prometheus, Grafana, OpenTelemetry, New Relic, or similar tools to measure system health and performance.
  • Programming and scripting experience in languages such as Python, Go, Bash, or other relevant languages used in automating infrastructure.
  • Solid understanding of networking, security, and load balancing in cloud-native environments.
  • Strong communication and collaboration skills, with the ability to lead cross-functional initiatives and mentor junior team members.
  • Experience with incident management and disaster recovery best practices.
  • Strong written and verbal communication skills.

Requirements

  • $140,000 - $190,000 a year

Benefits

  • Flexible work hours and flexible PTO
  • Competitive health benefits
  • VGS stock options
  • 401k plan, with employer matching 4% and immediate vesting (available only for US employees)
  • Life & disability insurance
  • Pre-tax flexible spending accounts, dependent and healthcare FSA (available only for US employees)
  • Global parental leave program
  • Employee Assistance Program
  • Home Internet reimbursement
  • New hire home office set-up allowance
  • Professional learning reimbursement

United States
$140K - $190K / year
Job Closed
Full Time | Remote | Team 5,001-10,000 | H1B No Sponsor

  • Join an international project with one of our direct clients
  • Work in a solutions architecture department focused on cloud-based environments
  • Collaborate with a highly qualified team

Spain

Infrastructure Engineer

Yuxi Global powered by Veritas Automata

Yuxi Global powered by Veritas Automata is a technology force multiplier that digitally empowers companies.

Full Time | Remote | Team 201-500 | H1B No Sponsor

  • Design, deploy, and maintain Kubernetes clusters (K3s, RKE2, AKS, EKS, GKE) across cloud and hybrid environments.
  • Implement infrastructure-as-code solutions using Terraform, Pulumi, Ansible, or equivalent automation tools.
  • Engineer secure, scalable networking architectures including VPCs, subnets, VPNs, firewalls, service meshes, load balancers, and cross-region connectivity.
  • Architect and maintain CI/CD pipelines, GitOps tooling, and automated delivery workflows using GitHub Actions, ArgoCD, Flux, or GitLab CI.
  • Configure and operate observability platforms including Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Thanos for full-stack visibility.
  • Collaborate with SRE and platform teams to improve reliability, reduce operational toil, and optimize performance and cost.
  • Implement and maintain cloud security best practices including IAM, RBAC, secrets management, encryption, and compliance controls.
  • Participate in on-call rotation, incident response, and root cause analysis for platform-related production issues.
  • Develop and document runbooks, architecture diagrams, operational standards, and troubleshooting guides.
  • Mentor junior engineers and contribute to capability-building around modern infrastructure practices.

Colombia
Job Closed