Job Closed
This listing is no longer active.
Deception Detection for the Information Age.
Staff Data Engineer
Location
New York | Texas | Washington
Posted
126 days ago
Salary
$160K - $190K / year
Seniority
Lead
Job Description
Staff Data Engineer
BLACKBIRD.AI
- Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
- Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
- Design and implement AI-powered enrichment stages within pipelines, applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
- Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
- Work with AI/ML researchers to implement, integrate, and scale AI processing
- Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
- Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
- Design and implement data quality frameworks, monitoring, and alerting systems
- Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
- Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
- Make critical build-vs-buy decisions and establish architectural standards for the data organization
- Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing
Job Requirements
- 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
- Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
- Strong experience building and operating data pipelines at scale (handling TBs+ of data)
- Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
- Proficiency in Python, dbt, and SQL for data processing and pipeline development
- Experience with large-scale batch and streaming data processing patterns
- Strong understanding of cloud platforms (AWS, Azure)
- Excellent communication skills and ability to mentor engineers
Preferred Qualifications
- Experience designing both batch and streaming/near real-time data architectures
- Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
- Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
- Experience with Agentic AI, context engineering, and evaluation
- Background in trust & safety, security, or content moderation domains
- Experience with data observability tools and building comprehensive monitoring systems
- Prior experience at a startup or fast-paced environment
- Experience applying agentic coding tools in day-to-day development
- Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases
Benefits
- Competitive compensation package, 401(k), and equity - **everyone has a stake in our growth!**
- Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - **an apple a day doesn't always keep the doctor away!**
- Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
- A flexible work environment with opportunities to collaborate with your team in person - **you can have it all!**
- Inclusion and Impact - **soar to new heights!**
- Professional development stipend - **never stop learning!**
More Data Engineer Jobs
Senior Data Engineer
SmarterDx
SmarterDx, founded in 2020 in New York, New York, is a health technology company focused on clinical AI solutions that enhance hospital revenue integrity and ca
- Design, develop, and maintain dbt data models that support our healthcare analytics products.
- Integrate and transform customer data to conform to our data specifications and pipelines.
- Design and execute initiatives that improve data platform and pipeline automation and resilience.
- Participate in a rotation of engineers that diagnose, triage, and solve production data issues.
- Apply industry standards and best practices to data testing, observability, and platform stability.
- Build data pipelines for coaching and user analytics
- Create data systems that power product features
- Establish data infrastructure and architecture
- Process and analyze AI/LLM outputs
- Support business operations and analytics
- Create accessible analytics infrastructure
- Work closely with product on instrumentation and data collection
Role Description
Business Intelligence at Jamf powers data-driven decision-making across the organization. As a Data Platform Engineer II, you’ll be responsible not just for building & transforming data, but for owning critical data infrastructure: from ingestion and storage, to governance, quality, and consumption by analytics/ML tools. You will partner with analysts, data scientists, product owners and engineers to ensure that Jamf’s data assets are reliable, high-performance, secure, and scalable.
For those candidates who live near a Jamf office, you may be expected to work periodically in-office or collaborative work location with other Jamf employees in your area for certain events or moments that matter.
What you can expect to do in this role:
- Design, build, maintain, and improve the data platform infrastructure (Snowflake environments, airflow workflows, orchestration, CI/CD pipelines for dbt / transformations)
- Develop and maintain Terraform (or equivalent IaC) definitions for provisioning data infrastructure (compute, storage, permissions, networking where needed)
- Automate deployment of data transformations (e.g. dbt CI/CD, staging / production pipelines)
- Ensure data platform availability, reliability, security and performance (e.g. enforce roles & permissions in Snowflake, resource monitoring, concurrency/usage optimisation)
- Instrument monitoring, logging and alerting of data workflows (Airflow / Kubernetes / dbt jobs)
- Collaborate with Data Engineers / Analysts / Architects to define platform capabilities, set standards & best practices around schema design, governance, version control, and performance
- Run capacity planning, ensure cost-efficiency, scaling strategy (e.g. concurrency limits in Snowflake warehouse sizing, cluster autoscaling, etc)
- Facilitate onboarding of teams to the data platform: document usage patterns, create templates or utilities (for example dbt macros, shared libraries)
- Participate in architecture reviews, evaluate new platform tooling (e.g. enhancements to orchestration, transformation frameworks, security strategy, etc)
- Troubleshoot critical incidents and participate in incident / post-mortem cycles for platform issues
Qualifications
- Minimum of 3 years experience building data pipelines with Python (Required)
- Minimum of 3 years experience working with data warehouse or other cloud based database technology, with strong proficiency in SQL. (Required)
- Experience with Docker / Kubernetes (Required)
- Exposure to Infrastructure-as-Code (IaC) such as Terraform or DevOps (Required)
- Experience working with dbt (Preferred)
- Strong experience with cloud infrastructure: AWS (EC2, ECR, S3, Glue, RDS, etc) or equivalent public cloud provider
- Hands-on experience in CI/CD, version control, unit / integration testing for data pipelines
- Comfortable working in agile teams, and mentoring others
- Strong Communication Skills
- Excellent Interpersonal Skills
- Excellent Organizational Skills
- Proven Analytical Skills
- Ability to communicate complex technical terms in an easy to understand, non-technical manner
- Ability to interact effectively with co-workers in a result driven culture
- Self-starter, energetic multi-tasker, highly motivated and team player
- Ability to engage with and establish trust and rapport with all levels of customers and employees
- Agile practitioner experienced in Scrum or Kanban
- General knowledge of Apple products and eco-systems
- Bachelor's Degree in Mathematics, Computer Science or related field (Required)
- A combination of relevant experience and education may be considered
Requirements
- Participation in ongoing security training is mandatory
- Established security protocols will be adhered to, sensitive data will be handled responsibly, and data protection practices are followed, including understanding relevant privacy regulations and reporting breaches
- Acknowledging the Jamf Code of Conduct, where applicable security and privacy policies can be found, is a requirement of all roles at Jamf
Benefits
- Named a 2025 Best Companies to Work For by U.S. News
- Named a 2024 Best Technology Company to Work For by U.S. News
- Named one of Forbes Most Trusted Companies in 2024
- Named a 2024 Best Companies to Work For by U.S. News
- Opportunity to make a real and meaningful impact for more than 75,000 global customers
- Support for new innovations and OS releases the moment they are made available by Apple
- Work with a small and empowered team where the culture is based on trust, ownership, and respect
- Clear career path that enables you to grow under supportive leadership and management
- Access to the Jamf Engineering blog for insights on innovative projects
- Pay Transparency Range: $85,100 - $181,700 USD
- Define the target architecture for Customer360 on Snowflake or Databricks, including ingestion patterns, modeling standards, and governance.
- Design and lead the Golden Record / identity resolution approach (deterministic matching first), including identifiers, survivorship rules, confidence scoring.
- Create the canonical customer model (core entities/relationships) and align marts/domains (e.g., insurance, cards, loans) into a unified customer layer.
- Establish data quality frameworks: checks (null/uniqueness/RI/thresholds), monitoring/alerts, lineage/source-of-truth mapping, and data SLAs.
- Define activation-ready outputs (customer attributes, segments, eligibility indicators) and support low-latency enablement patterns where needed.


