DevOps Engineer Remote Jobs in Arkansas (US)
This page tracks remote DevOps engineer openings that are location-eligible for Arkansas.
Open jobs
1,633
Hiring companies this week
10
Salary sample
$116,633 - $184,000
Jobs added last hour
0
1633 Jobs
1077 Companies
Nagarro (Frankfurt: NA9) is a leader in digital product engineering and drives technology-led business breakthroughs.
• Analyze and optimize existing New Relic dashboards, telemetry, and monitoring setup for the TRAIT application. • Review and refine PagerDuty alert triggers, escalation policies, and incident workflows to ensure only actionable events generate alerts. • Identify obsolete dashboards, alerts, and monitoring components and optimize them based on current operational requirements. • Support DevOps operational activities including monitoring production environments, incident response, root cause analysis, and reliability improvements. • Collaborate with development, infrastructure, and support teams to improve application observability and operational health. • Assist in automation and monitoring integration within CI/CD and cloud environments. • Recommend and implement observability best practices for logging, metrics, tracing, and alerting.
Scientific Research Corporation is an advanced information technology and engineering company that provides innovative products and services to government and private industry, as well as independent institutions. At the core of our capabilities is a seasoned team of highly skilled engineers and scientists with multidisciplinary backgrounds. This team is challenged daily to provide cutting edge technology solutions to our clients.
Role Description The Site Reliability Engineer will support a premier Navy program team in reviewing, assessing, and improving the reliability, resilience, observability, and operational maintainability of next generation Navy afloat architecture. The candidate will work with technical staff, Navy stakeholders, cybersecurity teams, infrastructure engineers, software teams, and operational representatives to ensure that SRE principles are considered early in architecture, design, integration, test, deployment, and sustainment planning. - Reviewing proposed Navy afloat architecture designs for reliability, availability, scalability, maintainability, cybersecurity alignment, and operational supportability - Identifying architecture and implementation risks that could affect system uptime, Fleet usability, maintainability, troubleshooting, patching, monitoring, recovery, or sustainment - Defining and recommending SRE‑aligned practices for Navy afloat systems, including service level objectives, operational metrics, monitoring requirements, alerting thresholds, error budget concepts, incident response workflows, and reliability reporting - Assisting engineering teams in translating operational reliability requirements into technical design considerations, implementation standards, and sustainment procedures - Evaluating system designs against real‑world afloat constraints, including limited bandwidth, intermittent connectivity, shipboard infrastructure limits, cybersecurity controls, maintenance windows, disconnected operations, and mission availability requirements - Supporting development of observability strategies, including logging, metrics, tracing, dashboards, alerts, health checks, and performance monitoring - Recommending automation opportunities to reduce manual operational workload, improve repeatability, reduce configuration drift, and improve deployment and sustainment reliability - Supporting root cause analysis for operational issues, test findings, 
integration failures, or architecture concerns, then converting findings into corrective actions and long‑term reliability improvements - Assisting with reliability‑focused documentation, including architecture review comments, risk assessments, operational concepts, monitoring plans, sustainment recommendations, incident response workflows, and executive‑level technical summaries - Working with cybersecurity stakeholders to ensure reliability recommendations also support DoD cybersecurity requirements, including STIG compliance, vulnerability management, audit logging, privileged access controls, and continuous monitoring - Participating in technical working groups, architecture reviews, design reviews, test planning sessions, and customer briefings - Supporting planning for deployment, installation, test, checkout, transition to operations, and sustainment handoff activities - Helping define operational readiness criteria for new or updated afloat capabilities before Fleet deployment - Providing recommendations that balance modern SRE practices with Navy operational constraints, cybersecurity mandates, lifecycle supportability, and mission execution needs - Communicating clearly with both technical and non‑technical stakeholders, including government sponsors, program managers, engineers, cybersecurity staff, and operational users FILLING THIS POSITION IS CONTINGENT UPON FUNDING #LI-JC1 Qualifications - Ability to obtain and maintain a DoD Secret clearance - U.S. 
citizenship required due to DoD contract and clearance requirements - Ability to support a remote eligible role with coordination to the primary office in Charleston, South Carolina - Ability to obtain CSWF / DoD 8140 aligned IAT Level II qualification within the required contract or program timeline - Current or ability to obtain one qualifying IAT Level II certification, typically including one of the following: - CompTIA Security+ CE - CompTIA CySA+ - GIAC GSEC - ISC2 SSCP - EC-Council CND - Five or more years of experience in one or more of the following areas: - Site reliability engineering - Systems engineering - Platform engineering - DevSecOps - Network or infrastructure operations - Cloud, hybrid cloud, or enterprise hosting environments - Mission critical IT operations - Practical experience with Linux and Windows server environments, including system hardening, patching, configuration, troubleshooting, logging, and operational sustainment - Working knowledge of networking fundamentals, including TCP/IP, DNS, routing, switching, firewalls, load balancing, VPNs, segmentation, and network troubleshooting - Experience designing, reviewing, or operating highly available systems with attention to uptime, resilience, observability, recoverability, and operational risk - Experience with monitoring, alerting, log aggregation, performance analysis, and incident response - Understanding of SRE principles, including: - Service level indicators - Service level objectives - Error budgets - Toil reduction - Automation first operations - Blameless post incident review - Capacity planning - Reliability risk assessment - Experience supporting cybersecurity compliance in regulated environments, preferably DoD or federal environments - Familiarity with vulnerability management, STIGs, security baselines, patch compliance, privileged access, audit logging, and continuous monitoring - Ability to evaluate architecture and design decisions for operational reliability, 
maintainability, cybersecurity posture, and lifecycle sustainment - Ability to translate technical findings into clear written recommendations for government sponsors, engineering teams, cybersecurity stakeholders, and program leadership - Strong written and verbal communication skills, including the ability to document technical risks, operational impacts, and recommended mitigations Requirements - Prior experience as an SRE in Fortune 100 or similar large scale environments - Active DoD Secret clearance - Experience supporting Navy, NIWC, NAVWAR, Fleet, tactical, afloat, or shipboard systems - Experience with afloat or disconnected operations where bandwidth, latency, hardware constraints, cybersecurity requirements, and operational availability drive architecture decisions - Experience reviewing or contributing to next generation architecture for Navy, DoD, tactical edge, or mission critical platforms - Experience with DoD Risk Management Framework, Authority to Operate support, continuous monitoring, vulnerability remediation, POA&Ms, STIG implementation, and cyber inspection readiness - Experience with containerization and orchestration technologies such as Docker, Kubernetes, OpenShift, Rancher, or similar platforms - Experience with infrastructure as code and configuration management tools such as Ansible, Terraform, Puppet, Chef, PowerShell DSC, or similar technologies - Experience with CI/CD pipelines and secure software delivery using tools such as GitLab, Jenkins, GitHub Actions, Azure DevOps, Nexus, Artifactory, or similar platforms - Experience with observability platforms and tooling such as Prometheus, Grafana, ELK / Elastic Stack, Splunk, OpenTelemetry, Datadog, New Relic, or similar capabilities - Experience with cloud or hybrid environments, including AWS, Azure, Azure Government, GovCloud, private cloud, VMware, or other enterprise hosting platforms - Experience with backup, disaster recovery, fail-over planning, continuity of operations, and 
data protection for mission critical systems - Experience performing root cause analysis and converting incident findings into architectural, operational, or automation improvements - Familiarity with Zero Trust principles, identity and access management, certificate management, privileged access management, endpoint security, and secure remote administration - Familiarity with Navy change control, configuration management, test events, installation readiness reviews, deployment planning, or Fleet Readiness Change Board style processes - Experience working directly with government customers, system owners, cybersecurity teams, network engineers, software teams, and operational users - One or more of the following certifications: - Active Security+ CE or higher DoD 8140 / IAT Level II qualifying certification - CompTIA CySA+ - ISC2 SSCP - GIAC GSEC - GIAC GCIH - GIAC GCIA - GIAC GCWN or GCUX - Red Hat Certified System Administrator - Red Hat Certified Engineer - Certified Kubernetes Administrator - AWS Certified SysOps Administrator - AWS Solutions Architect - Microsoft Azure Administrator - VMware Certified Professional - Cisco CCNA or CCNP - ITIL Foundation - Certified ScrumMaster or SAFe certification, where relevant to program execution Benefits - Medical, dental, and vision plans - 401(k) with a company match - Life insurance - Vacation and sick paid time off accruals starting at 10 days of vacation and 5 days of sick leave annually - 11 paid holidays - Tuition reimbursement - A work environment that encourages excellence
Role Description SouthState Bank is a nationally chartered bank that provides consumer, commercial, mortgage and wealth management solutions to more than one million customers throughout Florida, Alabama, Georgia, the Carolinas, and Virginia. With its Correspondent Banking Division, the Bank serves clients coast to coast. The Deposit Operations Virtual Review Specialist I’s primary responsibilities will include, but are not limited to: - Review and correction of mobile and ATM deposited images for purposes of fraud prevention and determination of funds availability for account owners. - Ensure compliance with Federal Regulation CC and Check 21 requirements. - Perform duties within designated timeframes and decision complex exception items. This position can sit remotely in one of these states: FL, GA, AL, VA, SC, NC, TX, CO. Hours: Monday-Friday 12:00PM EST – 9:00PM EASTERN TIME ZONE Qualifications - High School Diploma. - A minimum of two years of banking and teller experience is required. - Proficient with general office machines, PC experience, and programs to include Microsoft Office Excel, Outlook, Teams, and other software that might be utilized in the department. Requirements - Promptly and accurately review 130-160 deposits per day. - Contribute to a team that is responsible for approving approximately 40,000 deposits, valuing over $150,000,000, and rejecting approximately 300 fraudulent deposits, valuing over $1,000,000.00, in high-risk channels monthly. - Placing holds as necessary on accounts to protect the bank and account owner from loss or further loss. - Identify possible fraudulent items and process complex exception items with minimum supervision. - Review and process account adjustments. - Investigate and research duplicate deposits made in Branch, ATM, or mobile deposit. - Facilitate the resolution of Tier 2 support requests within established Service Level Agreements. 
- Meet or exceed the accuracy, productivity and all other goals assigned by management. - Make decisions in accordance with department procedures while managing both customer service and risk. - Ensure compliance with all bank policies and procedures including a department Attendance Policy. - Work to become cross-trained in other departmental tasks to provide backup support for other teams when needed. - Interact courteously and tactfully with managers, co-workers, customers and/or vendors. - Can be flexible to work weekends, holidays and/or extended hours as needed. - Assist with other duties as requested or assigned. - Additional team tasks include teller/merchant capture duplicate exception review, Proof Suspense general ledger clearing, Deposit Correction Notice review and mailing, and Block transaction review. - All specialists are cross trained on one or more additional tasks. - Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions. Benefits - This position can sit remotely in one of these states: FL, GA, AL, VA, SC, NC, TX, CO. - Hours: Monday-Friday 12:00PM EST – 9:00PM EASTERN TIME ZONE. - This position requires a large amount of time in front of a computer. - Must be able to effectively access and interpret information on computer screens, documents, and reports. Work Environment - Telecommuting roles must have a secure home office environment that is free from background noise and distractions. - Must have a reliable private internet connection that is not supplied by use of cellular data (hot spot). Cable or fiber connections are preferred. - Requirements are subject to change, as new systems and technology are delivered. Travel - Travel may be required to come to meetings and/or training as needed.
• Build and operate our AWS infrastructure, Kubernetes/EKS clusters, GitOps pipelines (ArgoCD/Helm), and CI/CD (TeamCity) • Write and own infrastructure-as-code across both Terraform and AWS CDK, setting reusable, well-tested, peer-reviewed patterns. • Build and enhance our security posture, including topics such as: least-privilege IAM, secrets management, vulnerability and patch management, dependency/supply-chain (SCA) and code (SAST/DAST) scanning, and shifting security left into the developer workflow. • Analyze performance and other observability metrics using tools like Datadog to assess the health of our infrastructure, spot regressions and bottlenecks, and drive improvements. • Lead a team of DevOps and Security engineers to prioritize, schedule, and implement DevOps & Security-related tasks. • Help lead the overall technical vision and strategy for all things DevOps & Security. • Discuss with and educate other team members in DevOps methodology, design, and best practices.
Defining what it means to build and deliver the most extraordinary sports & entertainment experiences.
At DraftKings, AI is becoming an integral part of both our present and future, powering how work gets done today, guiding smarter decisions, and sparking bold ideas. It's transforming how we enhance customer experiences, streamline operations, and unlock new possibilities. Our teams are energized by innovation and readily embrace emerging technology. We're not waiting for the future to arrive. We're shaping it, one bold step at a time. To those who see AI as a driver of progress, come build the future together. The Crown Is Yours As a Senior Site Reliability Engineer, you'll build and scale the critical infrastructure behind every product. In this role, you'll take on complex challenges across global data centers, multiple cloud platforms, and on-premise systems, designing automation-first solutions that elevate performance and eliminate operational friction. You'll be trusted to drive stability at scale, influence architectural decisions, and build tools that empower our teams to move fast and deliver reliably. This is where your impact won't just be felt, it'll be foundational. What You'll Do - Drive stability and scalability across our global compute platform spanning numerous data centers, multiple public clouds, and on-premise environments, serving as the foundation for every product. - Operate and evolve our GitOps delivery model, using Rancher Fleet and Flux with Helm to deploy core cluster services and application workloads declaratively and repeatably. - Build self-healing, fault-tolerant infrastructure and internal tooling that eliminates repetitive operational work and reduces toil for both platform and application teams. - Own cluster autoscaling and capacity strategy, including Karpenter, HPA and KEDA, and predictive scaling driven by event and calendar data. - Define SLOs and reliability metrics for platform components, using Datadog and our logging pipeline to surface cluster and workload health. 
- Support technical growth by sharing knowledge, participating in design discussions, and contributing to a collaborative team culture, including on-call rotation. What You'll Bring - Bachelor's degree in Computer Science or relevant education, experience, and training. - At least 4 years managing distributed cloud and on-premise environments at scale, with strong hands-on AWS experience. Exposure to GCP, vSphere, or Nutanix is a plus. - Deep expertise in container orchestration with Kubernetes, including the ability to design, scale, and troubleshoot complex workloads. - Strong experience developing software for automation and infrastructure tooling such as Go and Python. - Working knowledge of networking and Linux-based systems, including container runtimes such as Docker and containerd, packet-level debugging, and kernel troubleshooting. - Experience with Infrastructure as Code (IaC) and configuration management tools to ensure scalable and repeatable infrastructure provisioning. #LI-MF1 Join Our Team We're a publicly traded (NASDAQ: DKNG) technology company headquartered in Boston. As a regulated gaming company, you may be required to obtain a gaming license issued by the appropriate state agency as a condition of employment. Don't worry, we'll guide you through the process if this is relevant to your role. The US base salary range for this full-time position is 128,000.00 USD - 160,000.00 USD, plus bonus, equity, and benefits as applicable. Our ranges are determined by role, level, and location. The compensation information displayed on each job posting reflects the range for new hire pay rates for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific pay range and how that was determined during the hiring process. 
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Imagine a world in which every single human being can freely share in the sum of all knowledge.
Title: Senior Site Reliability Engineer, Wikimedia Enterprise Location: Remote Category: Advancement The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to join our team, reporting to the Sr. Engineering Manager. As the Site Reliability Engineer, you will play a key role in designing, developing, and maintaining reliable, scalable, and highly available infrastructure for our API services. You will contribute heavily to the high impact challenges behind innovating, building, and maintaining Wikipedia’s data feeds for high volume reusers. In this role, you will foster cross department collaboration with the Wikimedia Foundation SRE teams. You will own reliability targets (SLOs) for critical APIs, balancing performance, cost, and availability through data-driven decisions. You will be involved in designing and running the infrastructure and services that interact with the base of Wikimedia Foundation’s projects, including, but not limited to: Kubernetes clusters, application servers, code collaboration infrastructure, and other developer-facing services. You will participate in incident response and be on-call. This role requires frequent work with other members of the enterprise and Foundation SRE team to maintain and improve our systems, as well as interacting with people not in SRE, like Security, Release and Software Engineers, together striving to move our projects and technologies forward. Wikimedia Enterprise is a new, revenue-generating product that provides fast, comprehensive, reliable, and secure data ingestion for organizations that wish to repurpose Wikimedia/Wikipedia content in third party environments. Wikimedia Enterprise aims to improve the user experience for Wikimedia/Wikipedia readers beyond our own websites; increase the reach and discoverability of Wikimedia/Wikipedia content; and improve awareness and ease of attribution and verifiability of Wikimedia/Wikipedia content by the organizations that reuse our content the most. 
You can learn more about the project in WIRED and Insider. We are a distributed and diverse team of engineers with a drive to explore, experiment, and embrace technologies. We act sort of like a startup within the Wikimedia Foundation: we build quickly, deploy often, and our work has a very high impact on the global knowledge ecosystem. If you are up to the challenge of working on something fast paced, of creating services that will revolutionize the systems distributing our knowledge for billions of people across the world, and enjoy the idea of working with a globally distributed team, you might be just the person we need. You are responsible for: - Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets to ensure reliability targets are met - Build and enhance observability systems (metrics, logs, and distributed tracing) to enable proactive detection and faster troubleshooting - Drive reliability engineering practices, including capacity planning, load testing, and resilience validation (e.g., chaos testing) - Improve developer experience (DevEx) by enabling self-service infrastructure and streamlining deployment workflows - Partner with engineering team members to embed reliability best practices early in the development lifecycle - Design, implement, and optimize CI/CD and GitOps workflows using tools such as GitLab (or similar) and ArgoCD(or similar), enabling automated, reliable deployments with support for progressive delivery strategies like canary and blue-green releases - Implement secure-by-default infrastructure and enforce best practices (e.g., IAM, secrets management, encryption) - Continuously optimize infrastructure cost and efficiency using FinOps principles while maintaining performance and availability - Establish and track operational metrics such as MTTR, MTTD, and incident frequency to drive continuous improvement - Reduce operational toil by identifying repetitive work and implementing automation-first solutions 
- Contribute to and evolve internal platform capabilities that standardize infrastructure and improve scalability across teams - Collaborating with a global and asynchronously communicating team (don’t worry if you have never worked remotely, we’ll help you get used to it) - Mentoring peers in your areas of technical and operational strength Skills and Experience: - Automation & Configuration Management: Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible) and proficiency in at least one programming language (e.g., Python, Go, or similar) - Cloud Infrastructure: Experience designing, operating, and optimizing cloud-based systems across platforms such as AWS, Azure, or GCP, including scalability, reliability, and cost efficiency - CI/CD & Deployment Practices: Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab or similar, ArgoCD), with familiarity in progressive delivery approaches such as canary and blue-green deployments - Incident Management & Reliability Operations: Experience with incident response, on-call practices, and leading postmortems, with a focus on continuous improvement and operational excellence - SRE Principles & Observability: Strong understanding of SRE best practices, including SLOs, SLIs, and error budgets, along with experience in observability (metrics, logging, and distributed tracing e.g., Prometheus, OpenTelemetry) - Collaboration & Communication: Ability to work effectively in a distributed, cross-functional environment, with strong documentation and communication skills - Familiarity with Wikimedia or other open source projects is a plus. 
- If you are passionate about building and maintaining reliable, scalable, and highly available infrastructure on AWS, and thrive in a dynamic and collaborative environment, we encourage you to apply for this exciting opportunity to join our team at Wikimedia Enterprise Qualities that are important to us: - Proven experience operating highly available, large-scale distributed systems, with a deep understanding of reliability, scalability, and failure modes - Ownership mindset: Takes end-to-end responsibility for system reliability, proactively identifying and addressing risks before they impact users - Bias for automation: Continuously seeks to reduce operational toil through automation and scalable solutions - Continuous improvement mindset: Actively learns from incidents and drives improvements through blameless postmortems and iterative enhancements - Customer and reliability focus: Prioritizes user experience by balancing availability, performance, and cost - Adaptability and learning: Comfortable working in a fast-evolving environment and learning new tools and technologies as needed Additionally, we’d love it if you have: - Experience managing and troubleshooting event streaming platforms at scale (e.g., Kafka, Kinesis, or similar) - Hands-on experience with cloud platforms such as AWS and/or GCP, including designing and operating production systems - Familiarity with data lake architectures and large-scale data processing frameworks (e.g., Iceberg, Flink, Spark) - Experience with continuous profiling and performance optimization tools to identify bottlenecks and improve system efficiency - Experience working with or contributing to open source projects, particularly in infrastructure or data ecosystems - Prior participation in the Wikimedia movement About the Wikimedia Foundation The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. 
Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge freely. We host Wikipedia and the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive donations from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA. As an equal opportunity employer, the Wikimedia Foundation values having a diverse workforce and continuously strives to maintain an inclusive and equitable workplace. We encourage people with a diverse range of backgrounds to apply. We do not discriminate against any person based upon their race, traits historically associated with race, religion, color, national origin, sex, pregnancy or related medical conditions, parental status, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, or any other legally protected characteristics. The Wikimedia Foundation is a remote-first organization with staff members including contractors based in 40+ countries*. Salaries at the Wikimedia Foundation are set in a way that is competitive, equitable, and consistent with our values and culture. 
The anticipated annual pay range of this position for applicants based within the United States is US$116,633 to US$181,243 with multiple individualized factors, including cost of living in the location, being the determinants of the offered pay. For applicants located outside of the US, the pay range will be adjusted to the country of hire. We neither ask for nor take into consideration the salary history of applicants. The compensation for a successful applicant will be based on their skills, experience and location. *Please note that we are currently able to hire in the following: US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming (*US Territory or Federal District) Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland and the United Kingdom. Our non-US employees are hired through a local third party Employer of Record (EOR) and must have current work authorization in their location. (*citizens/permanent residents only) We periodically review and streamline this list to ensure alignment with our hiring requirements. All applicants can reach out to their recruiter to understand more about the specific pay range for their location during the interview process. If you are a qualified applicant requiring assistance or an accommodation to complete any step of the application process due to a disability, you may contact us.
• Create and maintain environment-specific Kubernetes configuration packages (e.g., Helm values files, YAML manifests) that incorporate STIG-aligned hardening and RBAC policy enforcement per IL environment. • Implement and manage secrets injection strategies using FedRAMP-compliant tools (e.g., AWS Secrets Manager, Azure Key Vault), ensuring compatibility with automated deployment pipelines. • Integrate CI/CD pipelines to enable secure, automated deployment and rollback of ArcGIS components and related services, supporting ongoing agility and compliance across environments. • Coordinate with Kubernetes and Data Layer Engineers to ensure version control, pipeline integrity, and integration testing for all containerized services deployed under WO-003. • Coordinate with the WO-009 infrastructure team to provide estimated cloud resource usage profiles for Kubernetes-based deployments in IL2, IL4, and IL5 environments. • Supply container and workload specifications to the WO-009 infrastructure team to support provisioning, tagging, and prepay or reservation planning. • Implement resource tagging in alignment with program-wide cloud governance policy and reconcile actual usage against estimates provided by WO-009. • Support cost optimization efforts led by WO-009 by adjusting workloads and scaling strategies in response to performance and utilization feedback.
• Design and scale hosting architecture in IL-5 environments with a path to IL-6 • Evaluate and select deployment strategies across government networks, GovCloud, and hybrid platforms • Build a production environment that supports zero-downtime deployments and enterprise-scale growth • Design, build, and operate a modern CI/CD pipeline with integrated testing, security scanning, and compliance validation • Establish and manage development, staging, and production environments • Ensure alignment with DoD requirements (e.g., Iron Bank, STIGs, ACAS/OpenSCAP, FIPS) • Lead transition from incumbent contractor and government-furnished environments • Inventory, assess, and rationalize existing services, dependencies, and infrastructure • Simplify and consolidate architecture into a scalable, maintainable system • Own release cadence from capability intake through production deployment • Ensure deployment quality through monitoring, rollback readiness, and performance metrics (e.g., uptime, failure rates) • Support continuous delivery aligned with government timelines and operational needs • Partner with Platform Security and RMF leadership to align on ATO strategy • Ensure the pipeline generates required artifacts for authorization and continuous compliance • Embed security and compliance into the delivery lifecycle—not as a post-deployment step
Red Cell Partners, founded in 2020, is a dynamic and rapidly growing firm specializing in launching and scaling innovative companies across various industries. With a focus on deli
• This is a rare opportunity to build a production deployment pipeline for mission-critical DoD software from the ground up. • As the DevSecOps Project Lead, you will own the end-to-end software delivery lifecycle—from code intake through secure, scalable production deployment. • You will design and operate the DevSecOps pipeline, define the hosting architecture, and integrate security and compliance directly into delivery. • This role spans both modern commercial DevOps practices and the realities of DoD deployment (IL-5 today with a path to IL-6), requiring you to make critical decisions across government networks, cloud environments, and hybrid architectures. • This is a senior role combining hands-on execution with system-level ownership. You will build key components directly while guiding the transition from an existing government-furnished environment to a scalable, long-term production system.
Empowering lean security operations teams of any skill to successfully secure their environments. WE ARE HIRING!
Role Description We are seeking a highly skilled Senior DevOps / Site Reliability Engineer (SRE) to join our globally distributed engineering organization. This is a hands-on senior-level role focused on building, operating, and scaling reliable cloud-native infrastructure and distributed data platforms. The ideal candidate will have strong expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, incident management, and infrastructure reliability. This role combines DevOps engineering practices with SRE principles to improve scalability, resiliency, operational efficiency, and platform performance across production environments. The engineer will work closely with platform, development, and operations teams to drive automation, operational excellence, and reliability best practices for mission-critical systems. Key Responsibilities - Administer and maintain Kubernetes clusters and containerized workloads. - Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments. - Develop and maintain CI/CD pipelines for reliable application deployments. - Implement and manage Infrastructure as Code (IaC) using Terraform and Helm. - Build automation tooling and operational workflows using Python, Go, or Bash. - Drive observability initiatives including monitoring, logging, tracing, and alerting improvements. - Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations. - Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB. - Improve platform reliability, scalability, and operational efficiency using SRE best practices. - Collaborate with cross-functional teams across multiple time zones. - Perform Linux system administration and networking troubleshooting. - Contribute to incident response processes, postmortems, and reliability improvements. - Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions. 
- Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence. Qualifications - 5+ years of experience in DevOps, SRE, or Platform Engineering roles. - Strong expertise with Kubernetes, Docker, and container orchestration. - Hands-on experience managing production cloud environments. - Strong Infrastructure as Code experience with Terraform and Helm. - Experience with CI/CD tools and deployment automation. - Advanced troubleshooting skills in Linux systems, networking, and distributed systems. - Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack. - Strong programming and scripting skills in Python, Bash, or Go. - Experience supporting high-availability production systems and on-call operations. - Knowledge of incident management and reliability engineering practices. - Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB. - Understanding of AI-driven operational tooling and automated remediation concepts. - Excellent communication, collaboration, and problem-solving skills. - Resides on the East Coast. Benefits - Pre-IPO Stock Options - Medical, Dental & Vision care - 401(k) - Employee Assistance Program - Employee Discount Program - Life Insurance - Paid time off - Referral Program - Rewards and Recognition Program Compensation The base compensation range for this role is USD 165,000-215,000 per year. Total compensation includes bonus opportunity and equity, and will vary based on candidate location.
AWS, Cloud, Azure, Kubernetes, Terraform, TypeScript