ST Engineering iDirect

Shaping the Future of How the World Connects

DevOps Engineer

Full Time · Remote · Mid Level · Team 501-1,000 · Since 1994 · H1B Sponsor

Location

United States

Posted

78 days ago

Salary

0

Seniority

Mid Level

Job Description

Overview

At ST Engineering iDirect, we’re reshaping the future of global connectivity. As a leader in satellite communications, our groundbreaking technology empowers customers to grow, innovate, and transform their networks. Here, your skills and passion meet our vision and expertise to create something extraordinary. If you're ready to tackle technology’s biggest challenges and redefine how the world connects, the most exciting chapter of your career awaits. With ST Engineering iDirect, the sky isn’t the limit; it’s just the beginning!

We are seeking a DevOps Engineer to improve productivity, site and platform reliability, quality, scalability, and security through small batch updates, automation, monitoring, Continuous Integration, and Continuous Deployment (Continuous Flow). This role enables the engineering delivery teams to handle tasks such as infrastructure provisioning and build, bundle, test, and deploy in a self-service model, for both development and production environments.

We are looking for a DevOps Engineer who is passionate about automation, reliability, scalability, and accelerating software delivery. In this role, you will partner with engineering teams to streamline infrastructure, improve deployment workflows, and enhance the overall developer experience.

Responsibilities

- Support teams with self-service tools for provisioning, building, testing, and deploying applications.
- Improve system reliability, security, and scalability using automation and modern DevOps practices.
- Maintain and enhance CI/CD pipelines (Jenkins, GitLab CI/CD).
- Work across cloud infrastructure (AWS), networking, system administration, and security.
- Implement infrastructure-as-code and environment automation.
- Drive operational excellence through monitoring, logging, and process improvements.

Qualifications

- Bachelor’s degree in Systems, Electrical, or Software Engineering, or another technical field. A graduate degree is a plus.
- 5+ years’ experience in one or more of the following areas:
  - Software programming
  - Shell scripting
  - Network architecture
  - Infrastructure architecture
  - System administration
  - Information security
- Additional knowledge or depth of experience in the following areas is a plus:
  - Strong practical Linux and Windows-based systems administration skills in a cloud or virtualized environment.
  - Experience building sophisticated and highly automated infrastructure.
  - Prior success in automating a real-world production environment.
  - Experience with seamless/automated build scripts used for release management across all environments.
  - Understanding of and experience with code deployment (tagging).
- Strong knowledge of cloud platforms (AWS).
- Experience with automation and infrastructure-as-code.
- Ability to collaborate with engineering teams and improve delivery workflows.
- Constant interest in decreasing time to market, increasing quality, and finding new and better ways to do things.

Nice to Have

- Hands-on experience with Ansible for configuration management.
- Proficiency with Terraform for infrastructure-as-code.
- Experience building automated infrastructure and deployment workflows.
- Understanding of code deployment strategies, tagging, and release processes.
- Knowledge of networking fundamentals (DNS, VPNs, load balancing, firewalls).
- Experience with CI/CD tools such as Jenkins and GitLab.

More DevOps Engineer Jobs

Senior Site Reliability Engineer

Jobgether

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

DevOps Engineer · 78 days ago
Full Time · Remote · H1B No Sponsor

Role Description

This role offers the opportunity to play a critical part in scaling and maintaining a high-growth platform used by a global audience. You will be responsible for ensuring system reliability, performance, and security as infrastructure demands continue to expand. Working in a fully remote and highly collaborative environment, you will partner closely with engineering teams to build resilient, scalable systems. This is a hands-on position suited for someone who thrives in fast-paced environments and enjoys solving complex operational challenges. You’ll have a direct impact on uptime, system health, and long-term infrastructure strategy while contributing to automation and continuous improvement initiatives.

- Act as a primary responder for incidents and outages, ensuring high availability and rapid resolution of production issues.
- Own and continuously improve monitoring, alerting, and logging systems to enhance observability and system health.
- Manage and optimize database infrastructure, including MySQL, PostgreSQL, ClickHouse, and Redis.
- Maintain and enhance server infrastructure and deployment pipelines for improved efficiency and reliability.
- Collaborate with engineering teams to design and implement scalable, fault-tolerant systems.
- Contribute to the development of internal SRE tools and automation to streamline operations.

Qualifications

- 3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
- Strong expertise in AWS and Kubernetes, with hands-on experience managing cloud-native systems.
- Proven experience handling incident response and maintaining production-grade systems.
- Solid background in database operations, performance tuning, and optimization.
- Familiarity with observability tools, monitoring frameworks, and logging best practices.
- Strong communication skills and ability to work effectively in a remote, asynchronous environment.
- Fluent English proficiency (written and spoken).
- Bonus: Experience with SOC2 compliance, scaling high-growth platforms, or working with ClickHouse or similar technologies.

Benefits

- Competitive salary with equity and annual compensation reviews
- Fully remote work environment with flexible working conditions
- Generous paid time off (35 days annually) and sabbatical opportunities
- Comprehensive healthcare coverage or reimbursement options
- Parental leave to support family growth
- Home office stipend for optimal remote setup
- Learning and development budget for continuous skill enhancement
- Performance-based bonus opportunities
- Company-sponsored global retreats and team offsites

Portugal
Job Closed
Full Time · Remote · Team 1-10 · H1B No Sponsor

• Own infrastructure across AWS, GCP, and Azure environments
• Build and maintain CI/CD pipelines, observability stacks, and incident response workflows
• Define and enforce SLOs/SLIs; lead postmortems
• Author and maintain IaC (Terraform preferred)
• Write internal tooling and automation using AI-assisted development workflows
• Partner closely with engineering on reliability reviews and architecture decisions

Texas
Senior Site Reliability Engineer

Attentive

The most comprehensive text message marketing solution.

DevOps Engineer · 78 days ago
Full Time · Remote · Team 1,001-5,000 · Since 2016 · H1B Sponsor

Attentive® is the AI marketing platform for 1:1 personalization redefining the way brands and people connect. We’re the only marketing platform that combines powerful technology with human expertise to build authentic customer relationships. By unifying SMS, RCS, email, and push notifications, our AI-powered personalization engine delivers bespoke experiences that drive performance, revenue, and loyalty through real-time behavioral insights.

Recognized as the #1 provider in SMS Marketing by G2, Attentive partners with more than 8,000 customers across 70+ industries. Leading global brands like Crate and Barrel, Urban Outfitters, and Carter’s work with us to enable billions of interactions that power tens of billions in revenue for our customers.

With a distributed global workforce and employee hubs in New York City, San Francisco, London, and Sydney, Attentive’s team has been consistently recognized for its performance and culture. We’re proud to be included in Deloitte’s Fast 500 (four years running!), LinkedIn’s Top Startups, Forbes’ Cloud 100 (five years running!), Inc.’s Best Workplaces, and the Human Rights Campaign Foundation's Corporate Equality Index!

About the Role

What You’ll Accomplish

- Design and deliver high-impact solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
- Lead execution on key projects: Take ownership of projects, driving them from discovery through execution
- Partner across teams: Collaborate with engineers from AI/ML, Data, Platform, and Product teams to develop best-in-class platforms and services
- Establish standards and best practices: Define and enforce production standards, processes, and tools to ensure operational excellence
- Champion reliability goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
- Mentor and knowledge share: Guide and mentor junior team members, fostering technical growth and helping to develop the next generation of engineering leaders
- Innovate and inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo

Your Expertise

- 5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles
- Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code
- Experience with cloud-native technologies and Infrastructure-as-Code (e.g., Kubernetes, Terraform, AWS)
- Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
- Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
- Proficient in designing and maintaining CI/CD pipelines, deployment strategies, and release automation to enable fast, safe delivery
- Fluency in frontier AI-assisted development tools and agents (Claude Code, Codex, Cursor, or similar)
- Excellent verbal and written communication skills with the ability to collaborate across technical and non-technical teams
- Familiarity with working in dynamic, reliability-focused production environments (preferred)

What We Use

- Our services run primarily in Kubernetes, hosted on AWS EKS
- Our tooling includes Terraform, Helm, ArgoCD, Istio, CloudFlare, Datadog, and Incident.io
- Our backend is primarily Java / Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, AirFlow, Postgres, and Redis
- Our frontend is built with React and TypeScript, and uses best practices like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
- Our automation is driven by custom and open source machine learning models, lots of data and built with Python, Metaflow, HuggingFace 🤗, PyTorch, TensorFlow, and Pandas

You'll get competitive perks and benefits, from health & wellness to equity, to help you bring your best self to work.

For US based applicants:

- The US base salary range for this full-time position is $220,000 - $275,000 annually + equity + benefits
- Our salary ranges are determined by role, level and location

By applying for this position, your data will be processed as per Attentive's Privacy Policy.

Attentive Company Values

- Default to Action: Move swiftly and with purpose
- Be One Unstoppable Team: Rally as each other’s champions
- Champion the Customer: Our success is defined by our customers' success
- Act Like an Owner: Take responsibility for Attentive’s success

Learn more about AWAKE, Attentive’s collective of employee resource groups.

If you do not meet all the requirements listed here, we still encourage you to apply! No job description is perfect, and we may also have another opportunity that closely matches your skills and experience.

At Attentive, we know that our Company's strength lies in the diversity of our employees. Attentive is an Equal Opportunity Employer and we welcome applicants from all backgrounds. Our policy is to provide equal employment opportunities for all employees, applicants and covered individuals regardless of protected characteristics. We prioritize and maintain a fair, inclusive and equitable workplace free from discrimination, harassment, and retaliation. Attentive is also committed to providing reasonable accommodations for candidates with disabilities. If you need any assistance or reasonable accommodations, please let your recruiter know.

United States
$220K - $275K / year
Full Time · Remote · Team 1,001-5,000 · H1B No Sponsor

• Build and operate metrics/monitoring platforms: **Prometheus and/or VictoriaMetrics** (scrape configs, exporters, recording rules)
• Design and maintain alerting strategy: thresholds, anomaly detection where applicable, alert routing, deduplication, and noise reduction
• Integrate monitoring/alerting and events with **BigPanda** (correlation, enrichment, routing, incident workflows)
• Create and maintain dashboards and operational visibility (Grafana or equivalent)
• Develop and maintain runbooks, operational playbooks, and incident response procedures
• Participate in **on-call shifts**: triage alerts, manage incidents, coordinate response, and lead communication during outages
• Perform root-cause analysis, postmortems, and implement corrective/preventive actions
• Improve service reliability via SLOs/SLIs, capacity planning, and automation to reduce toil
• Support monitoring for core infrastructure and services on **Windows and Linux**, including HA components and clusters
• Collaborate with DevOps/Engineering to instrument applications and standardize telemetry (metrics, logs, traces where applicable)

United Kingdom