Senior Cloud Engineer

Platform Engineer | Full Time | Remote | Senior | Team 51-200

Location

United States

Posted

97 days ago

Seniority

Senior

Job Description

ST6 (Seal Team Six) is an elite team of battle-hardened software operators dedicated to building enduringly great software companies. Our focus is on professionalizing and scaling software businesses from $100 million to $500 million. We partner with top-tier private equity software firms such as TA, Hg, Insight Partners, and Genstar to acquire and build one platform company per year. Our companies are not the largest or flashiest, but they are among the best-run software businesses, creating value for customers and shareholders at an accelerated pace. To date, our team has built six platform companies, each culminating in multiple liquidity transactions with multi-billion-dollar valuations.

The Senior Cloud Engineer will be instrumental in maintaining and enhancing the company's cloud infrastructure. This role focuses on optimizing cloud operations to drive cost efficiency, ensure high performance, and maintain the scalability and reliability of the company's services. The position will work closely with the development and operations teams to automate processes, manage cloud resources, and ensure strict compliance with security standards.

Key Responsibilities
- Maintain 99.9% availability for customer-facing hosted solutions, ensuring high reliability.
- Reduce cloud operating costs consistently by using automation and innovative tools.
- Ensure scalability of systems without compromising performance.
- Achieve and maintain compliance with key security standards, such as ISO 27001.
- Maintain a minimum 90% CSAT rating, with a 10% response rate, for all cloud service-related cases.

Key Qualities
- A meticulous approach to work, prioritizing accuracy and thoroughness to ensure high-quality outcomes
- Full responsibility for one's actions and their outcomes, emphasizing accountability and learning from experience

Skills
- Experience in risk assessment
- Experience in disaster recovery
- Experience in cloud / SaaS

Related Categories

Platform Engineer

More Platform Engineer Jobs

Data Platform Engineer

Lola Blankets

Platform Engineer | 97 days ago

Full Time Remote

• Own our data ingestion layer end-to-end, including completing our migration to open-source ingestion tooling (dlt) and maintaining reliability as the stack evolves
• Manage dbt models, tests, documentation, and the semantic layer: the definitions that determine what every metric means across the business
• Own Dagster orchestration: scheduling, retries, alerting, and failure handling across all pipeline runs
• Keep Lightdash metadata, dimension/measure definitions, and access controls accurate and current
• Accelerate data refresh cycles to support near-real-time operational use across the business
• Build monitoring, failure alerting, and anomaly detection into the stack so issues surface proactively
• Chase data through systems when things go wrong: trace why records drop or transform unexpectedly between source and dashboard, and resolve the root cause rather than the symptom
• Establish and document data quality standards and lineage practices across the warehouse
• Partner with our Technology and Engineering Lead on platform infrastructure, system integrations, and technical initiatives where data is a core component
• Build and maintain reverse ETL pipelines to push warehouse data back into operational tools
• Support real-time event pipeline development as new data sources and product surfaces come online
• Contribute to A/B testing infrastructure and the systems that support consistent metric definitions across the org
• Own separation of dev and production environments: deployment pipelines, change management, access controls, and release practices
• Run a PII audit across the stack and implement data warehouse governance standards
• Maintain infrastructure documentation and ensure the platform is operable beyond any single person
• Continuously evaluate our platform stack to ensure we're using the right tools, favoring open-source, cost-effective, and maintainable solutions

Airflow Cloud ETL Python SQL TypeScript

United States

Job Closed

Senior ML Platform Engineer

Synthesia

Create studio-quality videos with AI avatars and voiceovers in 140+ languages. Trusted by Reuters, BBC, Amazon and more.

Platform Engineer | 97 days ago

Full Time Remote | Team 501-1,000 | Since 2017 | No H-1B sponsorship

• Design and improve the platform systems that support model training, evaluation, and production serving.
• Build infrastructure and tooling that make ML workloads more reliable, scalable, and cost-efficient.
• Develop internal tools and workflows that are easy to operate by both humans and agents.
• Work on the architecture behind how models are deployed, served, and operated across research and product environments.
• Improve how we schedule, monitor, and debug workloads running on GPUs and cloud infrastructure.
• Develop internal tools, abstractions, and agentic systems that reduce operational overhead for researchers and engineers.
• Drive improvements across observability, automation, reliability, and developer experience.
• Collaborate closely with researchers and product engineers to understand pain points and turn them into robust platform capabilities.
• Contribute to technical direction and make pragmatic architectural tradeoffs as the platform grows.

Cloud Distributed Systems Kubernetes Linux Python Terraform

United Kingdom

Lead AI/ML Engineer - Clinical Platform - Remote

UnitedHealth Group

UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of

Platform Engineer | 97 days ago

Full Time Remote

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.

We are seeking a Lead AI/ML Engineer to drive the design, development, and operationalization of AI/ML solutions that improve the reliability, efficiency, and clinical impact of Optum Clinical Manager (OCM) workflows and integrations. This is a hands-on technical leadership role responsible for delivering production-grade ML systems end-to-end (data, modeling, MLOps, monitoring, and continuous improvement) while mentoring engineers and partnering closely with product, clinical, platform, and operations teams.

You will help build AI capabilities that support:
- Clinical workflow intelligence (prioritization, recommendations, decision support)
- Operational reliability (anomaly detection, incident prediction, noise reduction)
- Automation and agentic workflows (triage, routing, diagnostics, self-heal where appropriate)
- Improved data quality, latency visibility, and integration observability

You’ll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities:

Technical Leadership & Delivery (Hands-On)
- Lead the architecture and implementation of scalable AI/ML solutions integrated into the OCM ecosystem (APIs, event streams, workflow engines, and integration layers)
- Own the end-to-end ML lifecycle: problem framing, feature engineering, model development, validation, deployment, monitoring, drift detection, and retraining strategy
- Establish best practices for MLOps: CI/CD for ML, model registries, automated evaluation gates, reproducible training, and secure deployment patterns
- Build production-grade inference services (real-time and batch) with clear SLOs, instrumentation, and rollback strategies
- Define and enforce data governance for ML features and training datasets (quality checks, lineage, documentation)

Clinical Platform Impact (OCM-Style Systems)
- Partner with product and clinical stakeholders to identify high-impact use cases and translate them into measurable outcomes (quality, productivity, stability, member/patient impact)
- Embed AI into workflows responsibly, with explainability, auditing, and human-in-the-loop guardrails

Reliability, Observability & Responsible AI
- Implement ML monitoring (performance, drift, bias checks where applicable) and integrate signals into operational dashboards and alerting
- Ensure solutions meet security and compliance needs (PHI/PII protection, least-privilege access, auditability)
- Drive responsible AI practices: evaluation transparency, documentation, risk assessment, and safe deployment patterns

People Leadership & Collaboration
- Mentor and guide ML engineers and software engineers, raising the bar on engineering quality, design rigor, and operational excellence
- Lead technical design reviews, influence platform direction, and align teams across engineering, data, operations, and product
- Act as a team player: unblock others, foster shared ownership, and improve execution predictability

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role, as well as provide development for other roles you may be interested in.

Required Qualifications:
- Bachelor's Degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Engineering, or a related STEM field
- 10+ years of software engineering experience, with 3+ years building and deploying ML systems into production
- Proven hands-on experience delivering end-to-end ML solutions (data → model → deployment → monitoring → iteration)
- Experience building API-based inference services and data pipelines in cloud-native environments (containerization, orchestration, CI/CD)
- Experience collaborating across functions (product, operations, data, security, compliance) and translating needs into technical solutions
- Solid skills in Python and modern ML libraries (e.g., PyTorch, TensorFlow, scikit-learn), plus strong software engineering fundamentals
- Expertise in MLOps practices: model versioning, reproducibility, automated testing/validation, monitoring, drift detection
- Solid understanding of data engineering concepts (feature stores, streaming/batch processing, data quality checks, lineage)
- Proven leadership behaviors: mentoring, influencing without authority, driving clarity, and executing with accountability
- Excellent communication skills, with a proven ability to explain complex ML concepts to non-ML stakeholders and align on measurable outcomes

Preferred Qualifications:
- Master's Degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Engineering, or a related STEM field

Healthcare & Regulated Data
- Experience with healthcare systems and workflows (care management, utilization management, clinical operations) and/or working with PHI/PII in regulated environments (HIPAA-aligned controls)
- Familiarity with clinical data standards and patterns (claims/encounters, care plans, and HL7/FHIR concepts where relevant)

Advanced ML & Applied AI
- Experience with LLMs / GenAI for enterprise use cases (summarization, classification, retrieval, workflow copilots), including RAG architectures, evaluation frameworks, prompt/version control, and safety guardrails
- Applied experience in one or more areas: anomaly detection and time-series modeling, ranking/recommendation systems, NLP for clinical/operational text, causal inference / uplift modeling for operational optimization

Platform & Operational Excellence
- Experience with observability platforms and building ML-driven alerting/noise reduction (AIOps)
- Experience designing event-driven architectures (e.g., Kafka-style streaming), feature computation at scale, and real-time decisioning
- Experience with security-by-design and governance (model documentation, audit trails, approvals)

Leadership & Program Influence
- Experience leading technical roadmaps, shaping platform standards, and coordinating across multiple teams
- Track record of establishing ML engineering standards (coding practices, model review process, reusable components)

*All employees working remotely will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you’ll find a far-reaching choice of benefits and incentives. The salary for this role will range from $112,700 to $193,200 annually based on full-time employment. We comply with all minimum wage laws as applicable.

Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations. UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.

AI/ML Observability/Monitoring CI/CD Docker/Containers Python PyTorch TensorFlow scikit-learn Data Engineering Apache Kafka

United States

$112K - $193K / year

Job Closed

Platform Engineer (Reliability) - Unannounced Project

Scopely

Scopely is a touchscreen entertainment network that collaborates and partners with elite game developers and global entertainment companies to deliver industry-

Platform Engineer | 97 days ago

Full Time Remote

Scopely is looking for a Senior Platform Engineer (Reliability) to join a new, truly unique multiplayer strategy game, based in Spain, Ireland, Portugal, or the UK on a remote or hybrid basis. We can provide visa sponsorship and relocation assistance from any location.

At Scopely, we care deeply about what we do and want to inspire play every day, whether in our work environments alongside our talented colleagues or through our deep connections with our communities of players. We are a global team of game lovers who are developing, publishing, and innovating in the mobile games industry, connecting millions of people around the world daily. We are in the early stages of development on an ambitious, unannounced Strategy/MMO title, creating a team of talented and passionate game makers to join us on this exciting journey!

What You Will Do
- Reliability and Operational Excellence: Shape the reliability and operational excellence engineering practices that maintain high system uptime, focusing on reducing operational toil through automation, clear ownership, and well-defined runbooks.
- Performance and Scalability: Drive performance testing, tuning, and capacity planning to ensure systems scale effectively and meet SLAs, making informed trade-offs between cost, scalability, and reliability.
- Systems Engineering and Automation: Identify systemic manual processes and eliminate them through automation and software-driven solutions, improving efficiency and reducing human error.
- Software and Systems Engineering: Work across services and codebases to identify, debug, and resolve reliability and performance issues, contributing code changes where necessary to improve system behavior in production.
- Security & Compliance: Embed security, compliance, and governance into our Engineering Platform and delivery pipelines by default, minimizing the need for manual enforcement and ensuring data privacy and regulatory compliance.
- Observability: Design, implement, and operate observability solutions that provide actionable insights into system health, reliability, and cost, enabling teams to detect and resolve issues proactively.
- Incident Management: Participate in incident response and postmortem reviews, driving learning, systemic fixes, and preventative improvements rather than short-term workarounds.
- Cost Optimization: Make system cost and efficiency visible, helping teams understand and optimize their cloud usage in line with business objectives, budget constraints, and cloud governance best practices.
- Cross-Functional Collaboration: Partner with Engineering and Product teams to shape and deliver an Engineering Platform roadmap that balances delivery speed, reliability, and long-term sustainability.

What We’re Looking For
- Strong background in software engineering, with experience applying SRE or platform practices to improve system reliability, scalability, and performance
- Experience owning or operating systems in production, including incident response, troubleshooting, and driving improvements based on operational learnings
- Demonstrated ability to take ownership of complex systems in production and improve them over time
- Ability to quickly navigate and understand unfamiliar codebases, proactively identifying and implementing improvements that enhance reliability, observability, and overall system health with minimal supervision
- Experience debugging complex distributed systems and analyzing issues across service boundaries
- Strong communication skills and the ability to collaborate effectively with both technical and non-technical stakeholders
- Passion for continuous improvement and staying current with emerging cloud, security, and automation trends
- Infrastructure as Code: Solid experience with Terraform or similar IaC tools.
- Containerized Platforms: Experience operating containerized workloads on cloud-native platforms (Kubernetes/EKS, ECS, or equivalent).
- Cloud Architecture: Familiarity with the AWS Well-Architected Framework or equivalent cloud architecture standards.
- Observability & Logging: Experience designing observability strategies and implementing solutions for metrics, logs, and traces.
- Software Engineering & Automation: Strong programming skills (e.g., Go, Python, or similar) with experience building systems, tooling, or services that improve reliability and developer workflows.

Bonus points
- Experience mentoring other engineers, setting technical standards, and influencing best practices across teams
- Experience with observability through Datadog
- Exposure to cost optimization, capacity planning, and cloud governance at scale

About Scopely
Scopely is a leading video game and global interactive entertainment company, home to many of the world’s most beloved and enduring experiences, including two of the most successful mobile games of all time, “MONOPOLY GO!” and “Pokémon GO,” along with “Stumble Guys,” “Star Trek™ Fleet Command,” “MARVEL Strike Force,” “WWE Champions,” the Scrabble® franchise, “Yahtzee® With Buddies,” and many others. Across mobile, web, PC, and console, Scopely creates, develops, publishes, and live-operates one of the most diversified and award-winning portfolios in the games industry, bringing hundreds of millions of players together through a shared love of play.

Founded in 2011, Scopely is powered by its exceptional team, including thousands of world-class gamemakers around the globe, a distinctive tenet-driven culture, and its proprietary technology platform, Playgami. Together, these strengths have fueled Scopely’s position as the #1 mobile games company in the U.S. and #2 globally, generating more than $10 billion in lifetime revenue. Whether building global sensations like “MONOPOLY GO!” from the ground up or expanding through strategic acquisitions, including the FoxNext, GSN, and Niantic games businesses, Scopely consistently delivers experiences players love today and return to for years to come. Recognized multiple times as one of the "100 Most Influential Companies in the World" by TIME magazine and one of Fast Company's "World's Most Innovative Companies" and “Best Workplaces for Innovators,” Scopely believes that video games can be a force for good, creating meaningful connections, vibrant communities, and making life better through play. Scopely has global operations and partners across four continents in more than a dozen countries worldwide. For more information, visit: https://www.scopely.com/.

Notice to Candidates: Scopely will never request payment or financial information during the application or hiring process. Please apply only through our official website and verify that all Talent Partner communications come from an email address ending in @scopely.com. Should you have any questions or encounter any fraudulent requests/emails/websites, please immediately contact recruiting@scopely.com. Our job applicant privacy policies are available in the California Privacy Notice and the EEA/UK Privacy Notice.

Employment at Scopely is based solely on a person's merit and qualifications. Scopely does not discriminate against any employee or applicant because of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), or any other basis protected by law. We also consider qualified applicants with arrest or conviction records, consistent with applicable federal, state and local law.

Observability/Monitoring Distributed Systems Infrastructure as Code Terraform Kubernetes Amazon EKS Amazon ECS AWS Python Datadog

Ireland + 3 more
