Job Closed
This listing is no longer active.
Greenbox Capital offers funding solutions for small and mid-sized businesses with the aim of making working capital more accessible to all, even those considered at high risk.
Lead Data Engineer
Location
Florida
Posted
141 days ago
Salary
$150K - $170K / year
Seniority
Senior
Job Description
Lead Data Engineer
Greenbox Capital
- Design, develop, and maintain scalable data pipelines and ETL processes using Azure Data Factory, Azure Databricks, and other Azure services.
- Own the data engineering framework, including pipeline patterns, orchestration standards, and reusable components.
- Collaborate with data scientists, software engineers, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions.
- Define, document, and enforce best practices for ADF, Databricks, Spark, and data modeling.
- Implement and maintain data storage solutions using Azure SQL Database, Azure Data Lake Storage, and Azure Cosmos DB.
- Ensure data quality and integrity by implementing data validation, cleansing, and transformation processes.
- Implement data quality checks, validation frameworks, and monitoring for critical data assets.
- Design and support governance patterns leveraging Databricks Unity Catalog and Azure-native controls.
- Develop and maintain documentation for data engineering processes and solutions.
Job Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in data engineering, with demonstrated ownership of production systems.
- 3+ years of experience in the Azure ecosystem.
- Proficiency in Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Data Lake Storage, Azure Cosmos DB, and Databricks Unity Catalog.
- Material experience working with relational and NoSQL databases, JSON and XML data, and REST APIs.
- Deep hands-on experience with: Azure Data Factory (orchestration, patterns, parameterization), Azure Databricks / Apache Spark (PySpark, performance tuning, cluster design), Azure Data Lake Storage, and Azure SQL.
- Advanced programming experience in Python and SQL.
- Solid understanding of data modeling, ETL/ELT design, and analytical data platforms.
- Experience with Azure DevOps or GitHub and CI/CD pipelines.
- Experience designing and deploying data governance frameworks from the ground up.
- Proven experience owning data engineering frameworks, not just individual pipelines.
Benefits
- Competitive Pay - We know your worth and we pay accordingly.
- Flexible PTO - Work hard, rest well. Take the time you need to recharge.
- Full Benefits Package - Health, dental, and vision.
- Smart, Supportive Teammates - Collaborate with sharp minds who are kind, driven and uphold our core values: Commitment, Communication, Teamwork, Service and Integrity!
More Data Engineer Jobs
- Architect and deliver complex, multi-year data unification and Customer 360 solutions.
- Design and implement Master Data Management (MDM) capabilities: identity resolution, match/merge logic, survivorship rules, and stewardship workflows.
- Build and operate scalable, automated data pipelines across cloud and on-prem sources; hands-on experience with Profisee (MDM) and Fivetran (integration).
- Develop and optimize solutions on Azure Data Lake, Azure Synapse, Azure Data Factory, Power BI, and Microsoft Fabric (including lakehouse/medallion patterns).
- Establish and operationalize data governance frameworks: data cataloging, lineage tracking, stewardship operating models, access controls, and regulatory/compliance policies; experience integrating Microsoft Purview is a plus.
- Define and monitor data quality baselines (accuracy, consistency, completeness, timeliness) and drive continuous improvement.
- Create and evolve semantic models to support analytics, reporting, and AI scenarios; enforce standardized definitions and calculations.
- Lead change management and stakeholder engagement to drive adoption and value realization.
- Apply DataOps practices for observability, performance tuning, hypercare, and post–go-live optimization.
- Collaborate with executive, business, and IT stakeholders; translate business needs into clear technical designs and roadmaps.
- Develop, enhance, and troubleshoot Mainframe batch processes using JCL, Easytrieve, and SAS.
- Build and maintain automation and data processing scripts using Python.
- Support distributed data processing workloads using Apache Spark and the PySpark API.
- Write efficient SQL queries for data extraction, analysis, and transformation.
- Work with Google Cloud Platform (GCP) services - primarily Cloud Storage - for data movement and storage management.
- Collaborate with data analysts, engineers, and business teams to support data initiatives and enhance data workflows.
- Participate in documentation, code reviews, and best practices for data and code quality.
- Investigate data issues, perform root-cause analysis, and implement corrective actions.
Remote Sensing Data Engineer
Living Carbon
Public benefit company with a mission to fight climate change by enhancing CO2 capture and storage in trees
- Conduct remote sensing analytics and modeling
- Develop scalable analytical and reporting tools
- Manage GIS data collection, storage, and version control across team members
- Engage in strategic planning & process improvement
- Provide support to collaborate with Land and Forestry teams
- Manage and analyze large datasets off-line and in cloud computing and storage platforms
- Analyze large and complex geospatial datasets and remote sensing data
- Design and implement novel predictive, statistical, and machine learning models related to forestry, land use, carbon sequestration, biodiversity, conservation planning, and climate resilience
- Automate statistical and geospatial analysis processes using Python, R, or other programming languages
- Create clear and impactful reporting tools to communicate geospatial information and insights
- Maintain and update internal geospatial databases, ensuring data quality, consistency, and version control
- Integrate data from Land, Forestry, and Carbon teams to support commercial initiatives
- Ensure high standards of data accuracy and ethical use in all geospatial analyses and models
- Conduct quality control checks on geospatial datasets
- Provide technical mapping support to other team members as needed
- Work closely with Land, Forestry, and Carbon teams to uncover new operational insights
- Identify opportunities to improve geospatial workflows and contribute to the development of best practices
- Support research and development efforts in geospatial analytics and remote sensing applications.
Staff Data Engineer
Netwrix Corporation
Data security starts with identity, #1 attack vector. Fast, cost-effective solutions trusted by 13,500 organizations
- Design and maintain standardized data schemas used across different data sources and storage systems
- Define data contracts and models to ensure consistent representation of entities such as users, groups, resources, and permissions
- Develop and maintain schema evolution and versioning processes to support iterative product development
- Ensure data models are optimized for both transactional and analytical workloads
- Collaborate with engineering and product teams to align data models with business logic and reporting requirements
- Design and optimize ClickHouse schemas for analytical and time-series workloads
- Maintain PostgreSQL schemas for metadata, configuration, and application-level data
- Develop indexing, partitioning, and retention strategies that balance performance, scalability, and cost
- Define transformation specifications to ensure consistency between raw and analytical data layers
- Establish naming conventions, data types, and relationship standards for all stored data
- Implement validation and normalization checks to ensure incoming data adheres to defined schemas
- Partner with QA and product teams to verify that stored data accurately represents system behavior and business intent
- Maintain clear documentation and metadata definitions for all datasets and structures
- Manage schema migrations and versioning through CI/CD workflows
- Collaborate with DevOps teams to deploy and monitor databases in Kubernetes-based environments
- Use Infrastructure as Code tools (Helm, Terraform, or similar) for consistent database provisioning
- Support observability and monitoring for data performance and reliability.




