Arize AI

Arize AI is a machine learning observability platform for ML practitioners to detect and troubleshoot model issues

Senior DevOps Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2019 | H1B Sponsor

Location

Singapore

Posted

52 days ago

Salary

0

Seniority

Senior

Job Description

  • Work hands-on with the infrastructure that supports our distributed & highly scalable services in both SaaS and on-prem offerings
  • Gather requirements from customers and adapt manifests and software to support new environments
  • Use and augment monitoring tools to observe platform health, ensure performance and reliability
  • Interact with the product team to test new features and package new on-prem releases
  • Automate and optimize the release pipeline to make it as frictionless as possible
  • Exhibit continuous curiosity for emerging technology that could solve our challenges

Job Requirements

  • 3+ years of experience as a DevOps Engineer, Cloud Engineer, Infrastructure Engineer or similar
  • Excellent communication skills and ability to work directly with customers to understand and address their infrastructure needs
  • Experience and fluency in Kubernetes
  • A self-starter with the ability to thrive in a fast-paced environment
  • Experience working with multiple cloud providers (AWS, GCP, Azure) and understanding how to adapt cloud-native architectures for on-premises environments
  • Strong troubleshooting skills

Benefits

  • Unlimited paid time off
  • Generous parental leave plan
  • Mental and wellness support

More DevOps Engineer Jobs


Sr. DevOps Engineer - 11131

Coupa Software

Spend is the fuel to help your company deliver performance, profitability, and purpose!

DevOps Engineer | 52 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2006 | H1B Sponsor

Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter, more profitable business decisions to improve operating margins.

Why join Coupa?

  • Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
  • Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
  • Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.

Learn more on the Life at Coupa blog and hear from our employees about their experiences working at Coupa.

The Impact of a Sr. DevOps Engineer at Coupa

As a Sr. DevOps Engineer (Platform), you will play a crucial role in the development of solutions for our Enterprise platform. You will be developing applications that provide self-service and increased efficiency to a diverse group of internal customers across Cloud Operations, Engineering, Customer Success & Support, and Customer Value Management. When you are successful you will significantly accelerate the ability of our teams to better serve our customers.

What You'll Do

  • Own, build, and operate modern cloud platforms delivering APIs, integrations, security, messaging, streaming, identity, reliability, and container services
  • Design and run a secure, scalable, multi-region and multi-cloud platform across AWS and Azure in a regulated environment supporting hundreds of services
  • Act as a technical lead and partner, providing guidance and direction to multiple platform engineering teams
  • Drive containerization efforts to improve efficiency, scalability, reliability, and cost optimization
  • Define and execute long-term platform strategies and roadmaps aligned with business priorities in collaboration with engineering and product teams
  • Take full ownership of platform reliability and scalability, using data-driven insights to deliver high-quality features while managing technical debt effectively

What You Will Bring to Coupa

  • 4+ years of hands-on platform/system engineering experience using Go, Python, Java, Ruby, or equivalent languages
  • 2+ years in a lead engineering role, collaborating with diverse, globally distributed teams
  • Strong experience with containerization and CI/CD technologies, including Docker, Kubernetes, GitHub Actions, ArgoCD, and cloud container services (EKS, AKS, ECS)
  • Experience with Infrastructure as Code and multi-cloud deployments across AWS, Azure, and GCP
  • Proven ability to build, run, and scale modern full-stack cloud applications reliably in public cloud environments
  • Excellent communication skills and technical leadership, with the ability to own roadmaps, align with business priorities, and solve complex problems; supported by a degree in Computer Science, Computer Engineering, or equivalent

Coupa complies with relevant laws and regulations regarding equal opportunity and offers a welcoming and inclusive work environment. Decisions related to hiring, compensation, training, or evaluating performance are made fairly, and we provide equal employment opportunities to all qualified candidates and employees.

Please be advised that inquiries or resumes from recruiters will not be accepted.

By submitting your application, you acknowledge that you have read Coupa’s Privacy Policy and understand that Coupa receives/collects your application, including your personal data, for the purposes of managing Coupa's ongoing recruitment and placement activities, including for employment purposes in the event of a successful application and for notification of future job opportunities if you did not succeed the first time. You will find more details about how your application is processed, the purposes of processing, and how long we retain your application in our Privacy Policy.

Colombia
Job Closed

Senior DevOps Engineer

Tulip

Tulip, the leader in frontline operations, is helping companies around the world equip their workforce with connected apps, leading to higher quality work, improved efficiency, and end-to-end traceability across operations. Companies of all sizes and across industries have implemented composable solutions with Tulip’s cloud-native, no-code platform to solve some of the most pressing challenges in operations: error-proofing processes and boosting productivity, capturing and analyzing real-time data, and continuous improvement.

DevOps Engineer | 52 days ago
Full Time | Remote | Team 310 | Since 2014

To be considered for this role, candidates must have United States Citizenship due to the nature of the assignments. If you do not have U.S. Citizenship, you will not be considered for the position.

Tulip, the leader in AI-native frontline operations, is helping companies around the world equip their workforce with composable, connected apps, leading to higher quality work, improved efficiency, and end-to-end traceability across operations. Tulip's cloud-native, no-code platform, powered by embedded AI, is driving the digital transformation of industrial environments through composable, human-centric solutions that go beyond disrupting the Manufacturing Execution System (MES) category.

A spinoff out of MIT, Tulip is headquartered in Somerville, MA, with offices in Germany, Hungary, Singapore, and Israel. Tulip has been recognized as a World Economic Forum Global Innovator, a 2024 Deloitte Technology Fast award winner, one of Energage's Top Workplaces USA, and one of Built In Boston's "Best Places to Work" and "Best Midsize Places to Work."

About You

You're a senior infrastructure engineer who believes that the best system is one that runs itself. You thrive on building automation that eliminates toil, designing for resilience at global scale, and owning the full lifecycle of cloud infrastructure, from architecture to observability. You bring a bias for action and a continuous improvement mindset to everything you do: if something can be automated, you'll automate it; if a system is fragile, you'll make it robust. You're equally comfortable diving deep into a production incident and partnering with developers to make their lives easier.

What Skills Do I Need?

  • 5-7+ years of hands-on DevOps or Infrastructure Engineering experience, with demonstrated ownership of production cloud environments at scale
  • Proficiency with modern cloud infrastructure tooling: experience with Kubernetes, Helm, Terraform, Ansible, and major cloud providers (AWS and/or Azure) is highly relevant
  • Experience managing enterprise-grade data persistence layers, including NoSQL and SQL databases, key/value stores, and messaging systems (e.g., AMQP, MQTT)
  • Familiarity with observability and monitoring tooling (e.g., Prometheus, Mimir, Thanos, Grafana) and a strong understanding of what good SRE practice looks like in a fast-growing SaaS environment
  • Exposure to modern programming or scripting languages used in infrastructure contexts (e.g., Go, TypeScript, Python, Bash)
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • Must be able to work Eastern Time Zone (Boston) hours

Key Responsibilities

  • Own the deployment, health, and continuous improvement of Tulip's multi-cloud, multi-region SaaS environments, including clusters spanning the US, Europe, and Asia
  • Design and evolve cloud architecture to ensure customer availability, stability, and performance as Tulip scales globally
  • Own and continuously improve Tulip's CI/CD infrastructure, driving toward a fully automated, human-interaction-free software delivery lifecycle
  • Build automation tooling and internal systems that reduce operational toil and increase developer velocity; if it can be automated, automate it
  • Define and maintain observability standards across Tulip's cloud environments, including metrics, alerting, logging, and distributed tracing
  • Proactively identify performance degradation and capacity risks before they impact customers; lead incident response and drive root cause analysis
  • Serve as a close partner to application engineering teams throughout the software development lifecycle, providing infrastructure guidance and support
  • Participate in the on-call rotation and contribute to a culture of continuous improvement through documentation, runbooks, and process iteration

Key Collaborators

  • Application Engineering teams
  • Edge & Hardware teams
  • QA team
  • Product Management
  • Customer Support

Working At Tulip

We know even great candidates experience imposter syndrome. Even if you don’t match every requirement, applying gives you the opportunity to be considered. We’re building a strong, diverse team that values hard work, families, and personal well-being. Benefits of working with us include:

  • Direct impact on product and culture
  • Company equity
  • Competitive benefits package including Health, Dental, Vision, Short-term Disability, Long-term Disability, Life Insurance, AD&D Insurance, Flexible Spending Account (FSA), Commuter Benefits, Parental Leave, and 401(K)
  • Flexible work schedule and unlimited vacation policy
  • Virtual company events and happy hours
  • Fitness subsidies

We are an equal opportunity employer. At Tulip, we celebrate all. Qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Help us build an inclusive community that will transform frontline operations.

The compensation information displayed on each job posting reflects the range for new hire pay rates for the position across all US locations. Within the range posted, actual compensation will be determined depending on multiple factors including job-related knowledge & skills, experience, business needs, geographical location, market compensation data, and internal equity. Expected compensation ranges for this role may change over time. The salary range for this position is $130,000 - $190,000 per year.

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

Please note that we may use AI-based tools to support parts of our hiring process. All data processing is carried out in compliance with local data protection laws, ensuring all personal candidate information is handled securely and ethically.

United States
Job Closed

Staff DevOps Engineer

Caris Life Sciences

Fulfilling the promise of precision medicine through quality and innovation.

DevOps Engineer | 52 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2008 | H1B No Sponsor

  • Serve as a Staff DevOps Engineer specializing in AWS and Kubernetes to design, implement, and optimize scalable, secure cloud-native infrastructure
  • Lead PoC initiatives, oversee monitoring solutions, and translate SOX compliance into actionable cloud implementation plans
  • Break down silos by building a comprehensive team knowledge base, ensuring broad support capabilities
  • Provide technical leadership in cloud migration, security, and DevOps best practices, driving innovation and operational excellence across the organization
  • Lead the design, implementation, and management of Kubernetes clusters on AWS EKS, ensuring high availability, scalability, and security
  • Implement and manage advanced features including autoscaling, monitoring, logging, and security policies
  • Spearhead proof-of-concept (PoC) initiatives for new tools and environments, evaluating their potential benefits for the organization
  • Manage the full lifecycle of Kubernetes clusters, including regular upgrades, patch management, version control, and performance optimization
  • Provide expert-level support and guidance to teams for deploying and optimizing applications on Kubernetes, including container orchestration and service mesh implementation
  • Design and implement monitoring and alerting solutions for applications and infrastructure using CloudWatch, Prometheus, and Datadog
  • Develop observability standards and dashboards, leveraging AI/AIOps approaches and SRE agents to enable anomaly detection, alert noise reduction, and automated root cause analysis
  • Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform or AWS CDK, and implement CI/CD pipelines for efficient application deployment and image management
  • Design and implement security solutions, including the deployment and management of security tools, and translate SOX compliance requirements into actionable implementation plans for cloud environments
  • Lead initiatives for cloud migration and modernization of legacy applications, collaborating with cross-functional teams to support their cloud and infrastructure needs
  • Provide technical leadership and mentorship to junior engineers on cloud technologies and DevOps practices, implementing knowledge-sharing initiatives to ensure broad support capabilities across the team
  • Stay current with emerging AWS services and features, evaluating their potential benefits and optimizing cloud resource utilization and cost-efficiency
  • Develop and maintain comprehensive documentation, including a team knowledge base, runbooks, and process documentation to eliminate information silos
  • Proactively identify areas of inefficiency and develop strategic plans for process improvements across the DevOps and cloud infrastructure landscape
  • Participate in on-call rotations to support critical cloud infrastructure and respond to emergency issues as needed

All locations: New Jersey | Oklahoma | Minnesota
$140K - $179K / year
Job Closed

Senior Cloud Operations Engineer

MRI Technologies

Proof of U.S. Citizenship is a requirement for this position. Must be able to complete a U.S. government background investigation. MRI Technologies is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. As we are a Federal Contractor, most positions require the employee to obtain and maintain a U.S. Government background investigation. MRI also completes a pre-screening background check for anyone offered employment.

DevOps Engineer | 52 days ago

Role Description

MRI Technologies has an exciting opportunity for a Senior Cloud Operations Engineer on the Mission Enabling Services Contract (MESC) at NASA. In this role, you will build and operate Azure cloud infrastructure from the ground up, establishing the landing zones, governance controls, identity integration, and compliance foundations that will carry NASA's mission workloads for years to come.

You will work alongside cloud architects, security engineers, and NASA stakeholders, leading the Azure expansion of a production platform that already operates at enterprise scale on GCP. You are the subject-matter expert on Azure infrastructure design and federal compliance requirements, translating deep technical knowledge into a government-grade environment that meets NIST 800-171, CMMC 2.0, and NASA security standards.

This is a build role, not a maintenance role. The platform is early-stage, and your decisions now will define how it operates across the life of the program.

A typical day might include:

  • Reviewing Terraform changes for a new Azure subscription configuration
  • Coordinating with the security team on a Defender for Cloud alert
  • Designing a new landing zone architecture
  • Implementing Azure Policy assignments across management groups
  • Troubleshooting a network connectivity issue between the Azure environment and the existing GCP platform
  • Building CI/CD pipeline stages for infrastructure provisioning
  • Updating cost management tagging standards
  • Drafting runbook documentation for the operations team

The work is hands-on, high-trust, and connected to infrastructure that supports NASA mission success.

Qualifications

  • Bachelor's Degree in Computer Science, IT, or equivalent
  • 6 or more years of relevant cloud engineering experience
  • Deep experience designing and operating Azure cloud environments at an enterprise or government scale
  • Terraform (HCL) for Azure infrastructure automation
  • Azure Active Directory / Entra ID administration and federated identity configuration
  • Azure Policy, Management Groups, and subscription governance
  • CI/CD pipeline experience (Azure DevOps, GitLab, or equivalent)
  • Experience with NIST 800-171 or CMMC 2.0 compliance requirements
  • Strong networking fundamentals: VNets, NSGs, ExpressRoute/VPN, DNS
  • Ability to excel in a remote work environment

Requirements

  • Azure Solutions Architect Expert or equivalent certification (preferred)
  • Experience with multi-cloud environments (Azure + GCP integration) (preferred)
  • Microsoft Sentinel, Defender for Cloud, or Azure Monitor experience (preferred)
  • Prior federal government or DoD cloud engineering experience (preferred)
  • Experience with CUI handling and controlled environment design (preferred)
  • Familiarity with CMMC 2.0 Level 2 self-assessment or C3PAO audit preparation (preferred)

Benefits

  • Comprehensive benefits package including medical, dental, vision
  • Company-paid life and disability insurance
  • Paid time off
  • 401(k)
  • Flexible work schedule
  • Strong career development opportunities working alongside NASA's mission teams

Company Description

Proof of U.S. Citizenship is a requirement for this position. Must be able to complete a U.S. government background investigation. MRI Technologies is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status. As we are a Federal Contractor, most positions require the employee to obtain and maintain a U.S. Government background investigation. MRI also completes a pre-screening background check for anyone offered employment.

United States