Deployment Engineer
Location
United States
Posted
4 days ago
Salary
$113.8K - $142.2K / year
Seniority
Senior
Job Description
Deployment Engineer
Armada
• Execute installation, commissioning, startup, and infrastructure validation activities for modular data center deployments
• Perform hands-on technical work across electrical, mechanical, controls/BAS, networking, and low-voltage systems
• Execute deployment procedures, commissioning plans, operational readiness testing, and infrastructure validation processes
• Conduct site assessments and maintain field documentation including commissioning reports, punch lists, and as-built updates
• Ensure deployment activities are completed in accordance with Armada operational, engineering, and safety standards
• Troubleshoot infrastructure issues across power, cooling, controls, monitoring, and network-connected systems
• Read, interpret, and apply electrical schematics, mechanical drawings, and control diagrams during deployment and operational activities
• Utilize field diagnostic tools and test equipment to identify and resolve infrastructure issues
• Participate in root cause analysis and corrective action implementation during deployment and operational incidents
• Exercise independent technical judgment while escalating high-risk or complex issues appropriately
• Execute BMS, EPMS, and DCIM integration and validation activities
• Support system startup, alarm validation, monitoring verification, and operational turnover processes
• Verify infrastructure systems meet operational, performance, and quality requirements prior to deployment completion
• Participate in incident response and operational recovery activities as required
• Partner with Senior Deployment Engineers, Engineering, Manufacturing, Supply Chain, and Customer Operations teams during deployment execution
• Coordinate field activities with vendors, subcontractors, and third-party service providers
• Provide operational feedback and field observations that improve deployment quality, repeatability, and infrastructure reliability
• Maintain clear communication with internal and external stakeholders throughout deployment activities
Job Requirements
- 4–7 years of experience in mission-critical infrastructure, data centers, field engineering, or related technical operations environments
- Hands-on experience with deployment, commissioning, startup, troubleshooting, or operational support activities within critical infrastructure environments
- Working knowledge of critical infrastructure systems including:
  - Electrical systems
  - Mechanical/HVAC systems
  - Controls/BAS systems
  - Low-voltage systems
- Ability to independently troubleshoot and resolve technical infrastructure issues within established operational frameworks
- Familiarity with BMS, EPMS, and/or DCIM platforms
- Ability to read and interpret electrical schematics, mechanical drawings, and control diagrams
- Proficiency with field diagnostic equipment, hand tools, and power tools
- Strong analytical, troubleshooting, and problem-solving skills
- Excellent communication and coordination skills in fast-paced operational environments
- Valid driver’s license with a clean driving record
- U.S. citizenship required
- Must be eligible to obtain and maintain a U.S. security clearance
Benefits
- Competitive base salary and equity
- Medical, dental, and vision (subsidized cost)
- Health savings accounts (HSA), flexible spending accounts (FSA), and dependent care FSAs (DCFSA)
- Retirement plan options, including 401(k) and Roth 401(k)
- Unlimited paid time off (PTO)
- 14 paid company holidays per year
More DevOps Engineer Jobs
Senior Site Reliability Engineer
The Leaflet
An independent platform for cutting-edge, progressive, legal, and political opinion.
• Ensure the availability, reliability, and performance of high-traffic Java-based applications in a distributed environment.
• Troubleshoot and resolve complex issues across production and non-production environments.
• Participate in pre- and post-deployment performance testing and monitoring to continuously improve application performance.
• Optimize Java application performance with a focus on JVM tuning, efficient resource utilization, and horizontal scaling.
• Deploy and manage the Grafana stack (Grafana, Prometheus, Loki, Mimir, Alloy) to deliver real-time monitoring, logging, and alerting.
• Implement and refine observability strategies that enhance visibility into application and infrastructure health.
• Create and maintain dashboards, alerts, and log queries for comprehensive system health monitoring.
• Integrate AI/ML models into the observability pipeline for anomaly detection, predictive alerting, and intelligent alert correlation and noise reduction.
• Design, build, and operate agentic AI workflows that automate operational tasks such as alert triage, root cause analysis, runbook execution, and incident summarization.
• Develop tool-calling LLM agents that interact with infrastructure APIs (Kubernetes, Grafana, Jira, Slack, PagerDuty) to execute diagnostic and remediation actions autonomously or with human-in-the-loop approval.
• Build and maintain MCP (Model Context Protocol) servers and integrations that expose internal systems as tool surfaces for AI agents.
• Evaluate, select, and operationalize LLM frameworks and orchestration platforms (e.g., LangChain, LangGraph, CrewAI, n8n, or custom solutions) for production-grade agentic systems.
• Implement guardrails, evaluation harnesses, and feedback loops to ensure AI agent outputs are accurate, safe, and continuously improving.
• Champion the adoption of AI-assisted development and operations practices across the SRE and broader engineering organization.
• Support the operations team’s incident response efforts, conduct post-mortems, and identify root causes to prevent recurrence.
• Leverage AI tools to accelerate incident timelines, auto-generate post-mortem drafts, and surface patterns across historical incidents.
• Document and share lessons learned, contributing to a culture of continuous improvement.
• Identify repetitive operational workflows and engineer AI-augmented or fully automated replacements.
• Build self-service tools and chatbot interfaces that allow engineering teams to query system status, retrieve logs, and execute standard operating procedures through natural language.
• Measure and report on toil reduction metrics to quantify the impact of automation initiatives.
• Work closely with developers, architects, and data/ML engineers to design solutions that improve reliability and leverage AI capabilities.
• Collaborate with DevOps and NOC teams to support the application platform.
• Communicate SRE practices, AI/automation capabilities, and operational insights to technical and non-technical stakeholders.
• Provide feedback on application performance, potential improvements, and observability metrics.
Role Description
We are looking for a Senior DevOps Engineer - Multi-Cloud.
Qualifications
- Experience in DevOps practices and tools.
- Strong knowledge of cloud platforms (AWS, Azure, GCP).
- Proficiency in scripting and automation.
- Experience with CI/CD pipelines.
- Familiarity with containerization technologies (Docker, Kubernetes).
Requirements
- Ability to work in a remote-first environment.
- Strong communication skills.
- Problem-solving mindset.
- Team player with a collaborative approach.
Benefits
- Flexible hours and remote-first mode.
- Competitive compensation.
- Complete Hardware/Software setup – anything you need for work.
- Open-door culture, transparent communication, and top management at a handshake distance.
- Health insurance, vacation, sick leaves, holidays, paid maternity/paternity leave.
- Access to our learning & development center: workshops, webinars, training platform, and edutainment events.
- Virtual team buildings and social activities.
Company Description
Innovecs is a global digital services company with a presence in the US, the UK, the EU, Israel, Australia, and Ukraine. Specializing in software solutions, the Innovecs team has experience in Supply Chain, Healthtech, Collaboration Tech, and Gaming.
- Included in the Inc. 5000, the list of fastest-growing private companies in the US.
- Ranked as one of the best global outsourcing service providers by IAOP.
- Honored with the Global Good Awards for Employee Engagement & Wellbeing.
- Won gold at the Employer Brand Management Awards.
- Included in the Global Top 100 Inspiring Workplaces Ranking.
DevOps Engineer, Fluent Ukrainian
SupportYourApp
SupportYourApp is an industry leader in premium outsourced customer support that provides tech companies with reliable, cost-effective services. A multinational
• Build, maintain, and optimize CI/CD pipelines for the company's web products, sites, and internal services in Jenkins and GitLab CI/CD
• Support the gradual migration of deployment processes from Jenkins to GitLab CI
• Ensure stable, repeatable, and predictable deployments with rollback mechanisms and a minimal number of manual steps
• Configure and maintain Docker-based runtime environments for web applications and services
• Standardize Docker, docker-compose, deployment scripts, and runtime configurations so that solutions do not require regular rework
• Administer Linux servers in a production environment: configuration, patch management, troubleshooting, performance analysis
• Automate infrastructure setup, configuration management, and maintenance processes with Ansible and Bash
• Maintain web infrastructure: Nginx, SSL/TLS, reverse proxy, routing, Cloudflare, DNS, caching, and basic security rules
• Configure, maintain, and improve monitoring, logging, and alerting for production systems
• Analyze deployment failures and production incidents, determine the root cause, and propose preventive actions
• Support backup/restore, monitoring, and basic troubleshooting for MySQL/PostgreSQL
• Ensure the reliability and stability of production systems
• Analyze production incidents, conduct root cause analysis, and implement preventive actions
• Participate in post-incident reviews and prepare technical conclusions after incidents
• Implement and maintain security practices for Linux and web infrastructure: hardening, access control, updates, vulnerability remediation
• Document infrastructure decisions, deployment workflows, configurations, and important changes
• Coordinate production changes with the team, warn about risks, and avoid making critical changes without transparent communication
• Proactively identify weak points in deployment, infrastructure, and application architecture that could lead to instability, and initiate their remediation



