Smarsh enables organizations to manage the risk and uncover the value within their communications data.
Lead Data Scientist
Location
New York
Posted
3 days ago
Salary
$166K - $214K / year
Seniority
Senior
Job Description
• Collect, analyze, and interpret small and large datasets to uncover meaningful insights that support the development of statistical methods and machine learning algorithms.
• Lead the design, training, and deployment of NLP and transformer-based models for financial surveillance and supervisory use cases (e.g., misconduct detection, market abuse, trade manipulation, insider communication).
• Develop machine learning models and other analytics following established workflows, while looking for optimization and improvement opportunities.
• Perform data annotation and quality review.
• Conduct exploratory data analysis and model failure-state analysis.
• Contribute to model governance, documentation, and explainability frameworks aligned with internal and regulatory AI standards.
• Guide clients and prospects through machine learning model and analytic fine-tuning and development processes.
• Provide guidance to junior team members on model development and EDA.
• Work with Product Manager(s) to intake project/product requirements and translate them into technical tasks within the team’s tooling, techniques, and procedures.
• Pursue continued self-led personal development.
Job Requirements
- Strong understanding of **financial markets, compliance, surveillance, supervision, or regulatory technology**
- Experience with one or more data science and machine/deep learning frameworks and tools, including scikit-learn, H2O, Keras, PyTorch, TensorFlow, pandas, NumPy, caret, and tidyverse
- Command of data science and statistics principles (regression, Bayes, time series, clustering, P/R, AUROC, exploratory data analysis, etc.)
- Strong knowledge of key programming concepts (e.g. split-apply-combine, data structures, object-oriented programming)
- Solid statistics knowledge (hypothesis testing, ANOVA, chi-square tests, etc.)
- Knowledge of NLP transfer learning, including word embedding models (GloVe, fastText, word2vec) and transformer models (BERT, SBERT, HuggingFace, GPT-x, etc.)
- Experience with natural language processing toolkits like NLTK, spaCy, Nvidia NeMo
- Knowledge of microservices architecture and continuous delivery concepts in machine learning and related technologies such as helm, Docker and Kubernetes
- Familiarity with deep learning techniques for NLP
- Familiarity with LLMs, using Ollama and LangChain
- Excellent verbal and written skills
- Proven collaborator, thriving on teamwork
**Preferred Qualifications**
- Master’s or Doctor of Philosophy degree in Computer Science, Applied Math, Statistics, or a scientific field
- Familiarity with cloud computing platforms (AWS, GCP, Azure)
- Experience with automated supervision/surveillance/compliance tools
More Data Scientist Jobs
Senior Director, Data Science & Advanced Analytics
DataSpring
DataSpring is the trusted data connector at the core of healthcare. For more than 25 years, we have powered the industry with the largest and most complete healthcare data foundation in the U.S., including more than 4.8 million provider data records sourced directly from providers and member data representing 75% of covered lives supplied by health plans. By improving how essential information flows across the system, DataSpring helps healthcare operate more efficiently, accurately, and with greater confidence.
Role Description
The Sr. Director, Data Science & Advanced Analytics is responsible for defining and leading DataSpring's enterprise data science and advanced analytics strategy. This leadership role will focus on developing predictive models, machine learning solutions, and scalable analytics platforms to enable data-informed decisions across the organization. The Senior Director will build and lead a high-performing team that transforms provider and member data into actionable intelligence, supports business innovation, and drives measurable outcomes.
- Define and execute a roadmap for enterprise data science that aligns with CAQH’s mission, data strategy, and product portfolio.
- Identify key opportunities for predictive modeling, machine learning, and optimization to support provider and member data initiatives.
- Lead the development of scalable models, algorithms, and decision-support tools to improve operations, data quality, and customer engagement.
- Establish best practices for model development, validation, monitoring, and continuous improvement.
- Guide the implementation of a robust, cloud-native analytics environment that enables rapid experimentation and insight generation.
- Drive the development of reusable data products, features, and frameworks that scale across use cases.
- Champion MLOps, automation, and reproducibility for production-grade model deployment and monitoring.
- Lead the development of analytics dashboards, KPIs, and visualizations that empower business units to make data-informed decisions.
- Partner with stakeholders to translate complex analytical outputs into business value, actionable insights, and measurable outcomes.
- Standardize and govern enterprise-wide analytics metrics and methodologies to ensure consistency and reliability.
- Work closely with Data Engineering, Architecture, and Governance teams to ensure data science solutions are interoperable, secure, and aligned with CAQH’s enterprise data ecosystem.
- Collaborate with product, operations, and growth teams to embed analytics into workflows and external-facing solutions.
- Partner with external vendors, research institutions, and data providers to expand modeling capabilities and data assets.
- Recruit, lead, and mentor a high-performing team of data scientists, ML engineers, and analysts.
- Foster a culture of curiosity, innovation, and continuous learning through coaching, technical leadership, and performance management.
- Build team capacity and maturity across advanced analytics, statistical modeling, and AI/ML disciplines.
Qualifications
- Proven ability to lead enterprise data science strategy and deliver actionable insights that drive business impact.
- Strong expertise in statistical modeling, machine learning, optimization, and data mining techniques.
- Proficiency in Python, R, SQL, Spark, and data science frameworks (e.g., scikit-learn, TensorFlow, PyTorch, XGBoost).
- Experience deploying models in cloud environments (Azure preferred) using MLOps tools and practices.
- Exceptional communication and stakeholder management skills, with the ability to explain complex models and methods to non-technical audiences.
- Deep understanding of BI, KPI frameworks, and performance measurement.
- Familiarity with data governance and compliance frameworks (e.g., HIPAA, HITRUST, GDPR).
- Knowledge of healthcare data and standards (e.g., FHIR, HL7, X12) is a strong plus.
Requirements
- 10+ years of experience in data science, analytics, or applied statistics, including 5+ years in a senior leadership role.
- Demonstrated success in building and leading high-performing analytics or data science teams.
- Proven track record in deploying machine learning models in production environments with measurable business outcomes.
- Experience working with large, complex datasets, preferably in healthcare, health tech, or life sciences.
- Bachelor’s degree in computer science, data science, statistics, applied mathematics, or a related field required.
- Master’s or PhD in a quantitative discipline preferred.
- Certifications in machine learning, data science, or cloud platforms (e.g., Azure) are a plus.
Benefits
- Competitive compensation and a comprehensive benefits package for full-time employees.
- Medical, dental, and vision coverage.
- 401(k) with company contributions and matching.
- Paid parental leave.
- Tuition assistance.
- Generous paid time off.
- Commitment to investing in our people and supporting professional growth over time.
Senior Data Scientist
Flagstar Bank
Flagstar Bank N.A. was acquired by New York Community Bancorp, Inc., the holding company for Flagstar Bank N.A.
• Independently perform data analytics (ranging from data analysis and regression to machine learning)
• Present complex concepts in a format digestible by a diverse audience across the organization
• Build and improve metrics, reports, and analyses in BI dashboards that support a wide array of banking businesses
• Manage ongoing data analytics processes, document calculations and code, and adhere to proper risk and compliance practices
• Troubleshoot issues and suggest alternative approaches or controls to mitigate identified risks and gaps
• Develop and design data-driven solutions in the areas of client insights and segmentation, sales & marketing lead generation, performance analysis, and product analytics
• Partner across functions, including the lines of business, IT, the Data Office, and third-party vendors, to co-build the data foundation and analytics delivery infrastructure for action-driven insights and reporting
• Assume end-to-end responsibility for assigned key initiatives, and leverage partnerships to achieve the shared vision
Senior Data Scientist
MNTN
MNTN provides advertising software for brands to reach their audience across Connected TV, web, and mobile. MNTN Performance TV has redefined what it means to advertise on television, transforming Connected TV into a direct-response, performance marketing channel. Our web retargeting has been leveraged by thousands of top brands for over a decade, driving billions of dollars in revenue. Our solutions give advertisers total transparency and complete control over their campaigns all with the fastest go-live in the industry. As a result, thousands of top brands have partnered with MNTN, including Tarte, Decked, and National University.
Role Description
We’re looking for a Senior Data Scientist to help shape MNTN’s sovereign identity data backbone powering targeting, bidding, measurement, and cross-device attribution for Performance TV marketing. In this role, you’ll develop the methodologies, models, and graph-based approaches that unify identity signals across fragmented data sources and improve the accuracy, scalability, and interpretability of identity resolution. This is a hands-on role for someone excited by large-scale data science, AI-assisted development, and deep research in graph-based entity resolution.
- Design and improve graph-based approaches for identity resolution across devices, households, and identifiers which improve match quality, coverage, and stability across a rapidly changing identity landscape.
- Use Scala, Spark, SQL, and cloud-native tools to analyze large identity datasets, build models, and productionize data science workflows.
- Contribute production-grade code to shared repositories, using strong engineering practices to build clear, scalable, and maintainable systems.
- Define validation strategies and measure model performance and business impact on targeting, measurement, and attribution.
- Help shape the team’s approach to identity science and partner across Engineering, Product, and Analytics to deliver production-ready solutions.
- Leverage LLMs, AI editors (Cursor, Copilot, Claude Code), and agentic workflows to accelerate research, prototyping, documentation, testing, and iteration.
- Apply privacy-by-design principles to ensure identity science work is auditable, compliant, and aligned with governance standards.
Qualifications
- 5+ years of experience in data science, machine learning, or applied research working with large-scale datasets in production.
- Strong experience building identity graphs, entity resolution systems, record linkage pipelines, or related graph-based matching systems.
- Deep expertise in Scala, Spark, SQL, and/or Python for distributed processing, model development, and experimentation.
- Strong foundation in applied statistics, machine learning, graph algorithms, clustering, probabilistic matching, and model evaluation.
- Experience productionizing data science solutions in partnership with engineering, including testing, monitoring, and reproducibility.
- Experience mentoring data scientists and helping define technical direction across a team.
- Comfortable with AI-assisted workflows and modern development tools, including LLMs.
- Deep ownership mindset - you care about correctness, explainability, scalability, observability, and maintainability.
- Entrepreneurial, customer-first mindset - you connect identity science quality to marketing performance and attribution accuracy.
Requirements
- Experience in adtech, martech, measurement, attribution, or privacy-sensitive consumer data environments.
- Hands-on experience with Google Cloud Services and cloud-native data and ML tooling such as BigQuery, Dataproc, GCS, and Kafka.
- Experience with graph algorithms and techniques such as label propagation, connected components, community detection, graph embeddings, and/or link prediction.
- Familiarity with privacy-enhancing technologies, data governance practices, and evolving identity standards.
Benefits
- 100% remote within the US
- Flexible vacation policy
- Annual vacation allowance for travel related expenses
- Three-day weekend every month of the year
- Competitive compensation
- 100% healthcare coverage
- 401k plan
- Flexible Spending Account (FSA) for dependent, medical, and dental care
- Access to coaching, therapy, and professional development
Data Annotation Specialist, Data Science
Cohere
At Cohere, our mission is to build machines that understand the world, and to make them safely accessible to all.
• Evaluate the model's ability to respond to coding requests, workflows, and code base-related questions using available tools.
• Assess agent trajectories and model capabilities for code generation, tabular and graphic manipulation, and debugging requests.
• Prompt models to complete complex data science tasks and review the accuracy of generated responses.
• Label, proofread, and improve machine-written and human-written software engineering-related outputs.
• Report quality and performance trends related to model/agent behaviour and project assignments.



