Job Closed
This listing is no longer active.
Empowering the people that make global commerce happen.
Director of DevOps and Site Reliability Engineering (SRE)
Location
United States
Posted
102 days ago
Salary
$0
Seniority
Senior
Job Description
CargoSprint
About CargoSprint
At CargoSprint, we “Build and Empower the Best Teams”! CargoSprint is made up of a world-class team of highly motivated individuals who are passionate about transforming the cargo industry. We have developed cutting-edge digital solutions that streamline cargo operations, enhance efficiency, and improve the overall experience for everyone involved. Our workplace fosters innovation, collaboration, and the drive to solve industry challenges. CargoSprint is dedicated to delivering game-changing solutions that connect the cargo industry like never before, and we are looking for driven, enthusiastic people who share our vision of innovation and excellence. If you think we are a great mutual fit, we want to hear from you!
About You
You are passionate about the role and thrive on solving complex problems with a talented team of colleagues who both challenge and support you. You believe in lifelong learning, constantly honing your skills and staying on the cutting edge of technology. Most importantly, you want to engage your talents to make a meaningful difference by revolutionizing the cargo industry.
About the role
We’re seeking a hands-on Sr. Engineering Manager to lead our DevOps, Site Reliability Engineering (SRE), and Database teams. You’ll be responsible for building scalable, secure, and high-performing cloud infrastructure, with a strong emphasis on Azure Cloud, while fostering a culture of reliability, automation, and continuous improvement.
What you'll do
- Lead and mentor a distributed team of DevOps, SRE, and Database engineers.
- Architect and operate secure, scalable, and cost-efficient Azure Cloud environments.
- Implement and optimize CI/CD pipelines, infrastructure as code (IaC), and observability platforms.
- Champion AIOps and AI-driven tooling (e.g., GitHub Copilot, Azure DevOps AI, intelligent alerting) to improve developer productivity and operational efficiency.
- Establish and enforce SRE practices: SLIs/SLOs, incident response, on-call processes, and postmortems.
- Oversee performance, scalability, and reliability of PostgreSQL, MySQL, SQL Server, CosmosDB, and Redis databases in production.
- Partner cross-functionally with product and engineering teams to align infrastructure with business priorities.
- Drive cost optimization, disaster recovery, and security compliance initiatives.
Qualifications
Compensation and Benefits
At CargoSprint, we are “Empowering the People that make Global Commerce Happen”, and we know that starts with our CargoSprinters. That’s why we offer competitive pay and benefits designed to fuel our team’s success:
- Health & Wellness: Medical, dental, and vision plans for you and your family
- Future-Ready: 401(k) with company match
- Work Life Balance: Generous flexible PTO program and paid holidays
- Grow With Us: Professional development opportunities
Does this role sound like the next step in your career? We’d love to hear from you! If you don’t meet all the requirements exactly, we encourage you to use your cover letter to tell us about your unique experience. Talent comes from many places, and skills are transferable.
Our Commitment to an Extraordinary Work Environment
At CargoSprint, we value diversity and inclusivity. We strive to create a welcoming and supportive community for employees from all backgrounds. Regardless of your gender, sexual orientation, physical ability, religion, ethnicity, race, or age, you will find a place where you can thrive and be your authentic self. Our CargoSprint Recruitment Team personally reviews every application.
Job Requirements
- 10+ years of experience in DevOps, Infrastructure, or SRE roles, including 3+ years of leadership experience managing multiple teams.
- Deep hands-on expertise with Azure Cloud, including networking, identity, security, and monitoring services.
- Proficiency in Kubernetes, Docker, Terraform, Azure DevOps, and CI/CD ecosystems.
- Proven experience managing relational and NoSQL databases at scale.
- Experience building observability stacks with Prometheus, Grafana, ELK, or Azure Monitor.
- Strong problem-solving, communication, and mentoring skills.
- Track record of integrating AI tools to reduce toil and improve operational insights.
Nice to Have
- Experience in multi-cloud environments (AWS or GCP).
- Familiarity with AIOps, MLOps, or GenAI-assisted automation.
- Experience working in regulated or enterprise-scale environments (e.g., finance, healthcare).
- Prior success in high-growth startups or scaling SaaS platforms.
More DevOps Engineer Jobs
About Us
Resonance is a technology company building a more sustainable and valuable fashion industry for designers, brands, manufacturers, consumers, and the planet. The company’s AI-powered operating system, ONE, enables brands to design, sell, and make in that order – empowering designers to operate with no unnecessary inventory and eliminating the financial and environmental burdens of the legacy fashion industry. Resonance ONE is our end-to-end platform that powers every aspect of an apparel brand’s business, constantly learning and optimizing how garments are designed, sold, and made. Headquartered in New York City and Santiago, Dominican Republic, Resonance has partnered with more than 30 brands – including THE KIT and Rebecca Minkoff – to create garments that use 97% less dye, 70% less water, and 50% less material than any other fashion brand, and immediately eliminate overproduction. Want to know more? Visit our website and read articles about us.
About the Role
We’re looking for a talented DevOps Engineer to join our remote team and help scale the sophisticated infrastructure behind Resonance ONE. As a DevOps Engineer at Resonance, you will play a critical role in designing, building, and maintaining a complex full-stack platform that underpins everything from digital design tools to e-commerce and manufacturing automation. Our stack spans a wide range of modern technologies – from machine learning services (OpenAI and other ML models) to a robust cloud backend (AWS infrastructure, AWS Lambda), data and analytics systems (Hasura GraphQL engine, Snowflake data warehouse, Looker BI), event streaming (Kafka), and orchestration tools (Kubernetes with Argo Workflows, plus integrations with tools like Airtable) – all working in concert to realize our mission. In this role, you will ensure these diverse components work together in harmony, securely and at scale.
You’ll have the opportunity to shape and implement scalable DevOps practices and systems from the ground up in a forward-thinking, AI-driven organization. You will collaborate closely with software engineers, data scientists, and product teams to continuously improve our development pipeline, deployment processes, and infrastructure automation. This is a unique chance to tackle challenging problems in an architecture that pushes the boundaries of technology – all while enabling fashion brands to innovate without waste.
Responsibilities
- Architect and Maintain Cloud Infrastructure: Build, maintain, and scale our AWS cloud infrastructure using infrastructure-as-code and modern CI/CD pipelines (e.g. Argo Workflows). Ensure reliable, automated deployments of our applications and machine learning services across development, staging, and production environments.
- Container Orchestration: Manage our Kubernetes clusters and containerized microservices, optimizing for high availability, security, and efficient resource usage. Continuously improve our cluster deployment, scaling strategies, and rollback processes to support a rapidly growing platform.
- CI/CD & Automation: Design and implement continuous integration and delivery pipelines that empower our development team to ship code and ML model updates quickly and safely. Automate routine operations and workflows, reducing manual work through scripts, AWS Lambda functions, and other automation tools.
- Monitoring & Reliability: Implement robust monitoring, logging, and alerting (using tools like Prometheus, CloudWatch, etc.) to proactively track system performance and reliability. Quickly troubleshoot and resolve infrastructure issues or bottlenecks across the stack to maintain high uptime and responsive services.
- Data & Pipeline Integration: Work closely with our data engineering team to support a seamless flow of data through the platform. Maintain and optimize our event streaming and pipeline architecture (Kafka) and its integration with downstream systems like our Snowflake data warehouse and Looker analytics, ensuring data is delivered accurately and on time.
- AI/ML Infrastructure: Collaborate with machine learning engineers to deploy and scale AI/ML models in production. Support the integration of OpenAI and other ML models into our applications, implementing the infrastructure (compute, storage, containers) needed for model training, inference, and monitoring model performance in a live environment.
- Tool Integration & Support: Integrate and manage internal and third-party tools that extend our platform’s functionality – for example, maintaining our Hasura GraphQL engine that interfaces with databases, or automating workflows involving external services like Airtable. Ensure these tools are properly deployed, updated, and aligned with our security and compliance standards.
- DevOps Best Practices & Culture: Champion DevOps best practices across the engineering organization. This includes improving our release processes (e.g. implementing GitOps workflows), optimizing build/test pipelines, and mentoring developers on using infrastructure tools. You will continually evaluate new technologies and processes to enhance deployment speed, reliability, and scalability, while balancing rapid iteration with operational stability.
Requirements
Minimum Requirements
- Experience: 5+ years of experience in DevOps, SRE, or related infrastructure engineering roles, with a track record of managing complex, distributed systems at scale.
- Cloud Proficiency: Strong expertise in AWS and cloud architecture (compute, storage, networking, and security). You have designed and maintained scalable infrastructure using services like EC2/ECS/EKS, S3, RDS, VPC, and Lambda, and you understand how to build secure and cost-efficient cloud environments.
- Containers & Orchestration: Hands-on experience with containerization and orchestration – you have managed production Kubernetes clusters (or similar orchestration platforms), and you’re comfortable with Docker and container lifecycle management.
- CI/CD & Automation: Proven ability to create and manage CI/CD pipelines using tools such as Jenkins, CircleCI, GitHub Actions, or Argo. You automate workflows wherever possible and have experience implementing GitOps or similar practices to streamline deployments.
- Infrastructure as Code: Proficiency in scripting and infrastructure-as-code (Terraform, CloudFormation, or equivalent). You can manage infrastructure configuration in a reproducible way and have experience automating cloud resource provisioning.
- Monitoring & Troubleshooting: Solid knowledge of monitoring and logging frameworks (e.g. Prometheus, Grafana, ELK stack, CloudWatch) and experience setting up alerts and dashboards. You excel at diagnosing issues across the full stack – from network and infrastructure to application logs – and ensuring high reliability.
- Data Pipeline Familiarity: Familiarity with event-driven architecture and data pipelines. You have worked with messaging or streaming systems (e.g. Kafka, Kinesis) and understand how to connect various data stores and services (relational and NoSQL databases, data warehouses like Snowflake) in a production environment.
- Security Mindset: Good understanding of security best practices in cloud and DevOps (managing secrets, IAM roles, VPC security, etc.). You are vigilant about maintaining compliance and protecting sensitive data across all systems.
- Collaboration & Communication: Excellent communication skills and a collaborative attitude. You can work effectively on a remote, cross-functional team, partnering with software engineers, data scientists, product managers, and QA to achieve common goals.
- Adaptability: Self-driven and adaptable to change. You thrive in fast-paced, ambiguous environments and take ownership of delivering results. You prefer simple, elegant solutions and have a knack for prioritizing what will scale and add value, in line with our mission to deliver results and delight our users.
Preferred Qualifications
- Startup / 0→1 Experience: Experience working in a startup or building systems from scratch. You’re comfortable with the scrappiness and ingenuity required to design new infrastructure and processes in a rapidly evolving environment.
- MLOps & AI Services: Exposure to MLOps or AI-driven platforms. Experience deploying or managing machine learning models in production, or familiarity with ML frameworks and services (e.g. handling model serving, working with OpenAI or similar AI APIs) is a strong plus.
- Data & Analytics Tools: Experience with data warehousing and analytics tools – for example, deploying or maintaining Snowflake, or integrating BI platforms like Looker into a data pipeline. Understanding of how to optimize data flows and query performance in such systems is a plus.
- GraphQL / Hasura: Familiarity with GraphQL APIs and frameworks (especially Hasura). You understand how GraphQL layers interface with backend databases and can optimize or troubleshoot in such an environment.
- Orchestration & Serverless: Experience with workflow orchestration tools like Argo Workflows (or similar, e.g. Airflow, Tekton) for running complex jobs/pipelines. Experience managing serverless functions (AWS Lambda) as part of a larger system is also beneficial.
- Domain Interest: A passion for our mission of sustainability and transforming the fashion industry. Interest or experience in e-commerce, manufacturing processes, or fashion technology is a plus – you enjoy applying technology to solve real-world problems in new domains.
Benefits
- Compensation & Benefits: We offer full benefits (medical, dental, and vision) and a competitive salary, along with equity participation. You’ll be joining a passionate team with a shared mission and ample opportunities for growth.
- Remote Work: This is a fully remote position. We embrace a remote-first culture that allows you to work from anywhere, while staying closely connected with a diverse, global team. (Periodic travel to our NYC or Dominican Republic hubs for team gatherings is optional/occasional.*)
- Mission-Driven Culture: Work on something meaningful – every feature you help ship and every system you optimize contributes to eliminating waste in the fashion industry and driving sustainable innovation. We foster a creative, inclusive environment where new ideas are encouraged.
- Equal Opportunity Employer: Resonance Companies is an equal opportunity employer and values diversity in our company. We do not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other status protected by applicable law. All employment decisions are based on qualifications, merit, and business need.
(Note: The role is remote; any mention of travel or specific location is flexible and can be adjusted based on company policy.)
Site Reliability Engineer - US - Remote
HomeVision
Comprehensive collateral underwriting, powered by machine intelligence
HomeVision is building products to modernize real estate valuation and create a more efficient, transparent, and equitable housing market. We leverage technologies like NLP, computer vision, and large language models (LLMs) to streamline appraisals and help appraisers work more efficiently. We’re backed by Initialized Capital, growing fast, and looking for an SRE to help us scale.
What you’ll do
- Build and maintain infrastructure for our SaaS products, primarily on AWS
- Create tooling and manage platform components to support developers
- Tackle software development projects, often related to authentication, reliability, observability, and other platform concerns
- Help with day-to-day needs like testing environments and deployments
- Handle IT tasks such as onboarding and account provisioning
- Work a flexible schedule while generally being available until 6 PM Pacific Time to provide internal support and monitoring coverage
What you’ll bring
- Competitive salary, equity, and health benefits
- A high degree of ownership and autonomy
- Support for your professional growth
- Fully remote, flexible work environment
No recruiters or automated submissions.
DevOps Engineer (Remote)
Information & Technology Management
"Empowering IT Excellence: Unlocking Your Business Potential"
DevOps Engineer - FULLY REMOTE
Duration: 12 months
REQUIRED QUALIFICATIONS:
PREFERRED QUALIFICATIONS
- Good working knowledge of load balancing technologies.
- Understanding of authentication/authorization mechanisms like OAuth2, OpenID, and SAML.
- Deep understanding of virtualization technologies, storage, and networking backend infrastructure.
- Expertise in creating and troubleshooting PowerShell scripts.
- Experience in automation of day-to-day tasks using PowerShell.
- Knowledge of managing and integrating on-prem TFS and cloud-based VSTS with Azure Web Apps.
About The Role
We are looking for a highly skilled DevOps Engineer with expertise in infrastructure automation and quality assurance (QA) automation. This role will be pivotal in designing secure CI/CD pipelines, implementing robust testing frameworks, and ensuring system reliability across our cloud environments.
Your Impact:
DevOps & Cloud Infrastructure: (50%)
- Build, optimize, and maintain scalable CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, ArgoCD).
- Manage cloud infrastructure (AWS/Azure/GCP) using Terraform, Pulumi, or CloudFormation.
- Automate deployments using Kubernetes, Docker, and serverless architectures.
- Implement observability solutions (Prometheus, Grafana, Datadog) for performance monitoring.
Quality Assurance (QA) Automation: (50%)
- Develop and maintain automated test suites (Selenium, Cypress, Playwright, Postman).
- Ensure high test coverage (unit, integration, E2E) within CI/CD workflows.
- Collaborate with developers to shift-left testing and improve code quality.
- Analyze test results, track defects, and optimize performance/load testing (JMeter, k6).
Skills & Qualifications:
Compensation: Market-competitive compensation including, for most roles, exposure to pre-launch tokens.
In addition, 0G Labs is committed to the health and well-being of all of our team members. To that end, we provide reimbursements towards a holistic set of experiences and courses:
- Core self: Transcendental Meditation
- Mind: Landmark Education
- Emotion: Art of Communication
- Presence: Speech Coach
- Body: Fitness, gym and exercise memberships/classes




