Staff Data Engineer
Location
California
Posted
104 days ago
Salary
$155.6K - $320.3K / year
Seniority
Lead
Job Description
Zigsaw
- Design and implement robust data infrastructure in AWS, using Spark with Scala
- Evolve our core data pipelines to efficiently scale for our massive growth
- Store data in optimal engines and formats, matching your designs to our performance needs and cost factors
- Collaborate with our cross-functional teams to design data solutions that meet business needs
- Design and implement knowledge graphs, exposing their functionality both via Batch Processing and APIs
- Leverage and optimize AWS resources while designing for scale
- Collaborate closely with our Data Science and Product teams
Job Requirements
- Production data engineering experience
- Proficiency in Spark and Scala, with proven experience building data infrastructure in Spark using Scala
- Experience in delivering significant technical initiatives and building reliable, large scale services
- Experience in delivering APIs backed by relationship-heavy datasets
- Familiarity with data lakes, cloud warehouses, and storage formats
- Strong proficiency in AWS services
- Expertise in SQL for data manipulation and extraction
- Excellent written and verbal communication skills
- Bachelor's degree in Computer Science or a related field
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options
More Data Engineer Jobs
- Assist with connecting systems to data sources
- Help manage and maintain data connections in Azure
- Build and maintain ETL pipelines (Extract, Transform, Load)
- Help automate manual data processes
- Work with large, enterprise-level datasets
- Document data sources, pipelines, and processes

- Supervise junior members of the data engineering team by guiding, planning, and reviewing the team's work
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional / non-functional business requirements
- Extend our machine learning platform by designing tools that interface with cloud services and our current code base and that provide new flexibility in model building
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Python, and AWS
- Build analytics tools that provide actionable insights into key business performance metrics and support the needs of the analytics team
- Create data-handling tools for analytics and data scientist team members that assist them in building and optimizing our decision-making process
Geospatial Data Engineer
Orcrist Technologies GmbH
Pioneering Future Technologies with Advanced AI and Data Analytics
- Build and operate data pipelines that supply GEOINT services with accurate, compliant, and performant spatial data
- Own ingestion, transformation, versioning, and distribution across cloud and air-gapped environments
- Collaborate with the Data Analytics Team in creating value-adding data products
- Develop ingestion pipelines using Python, GDAL, Rasterio, tippecanoe, and PostGIS for vector/raster/3D datasets
- Automate tiling, generalization, and 3D tile generation (Cesium 3D Tiles, quantized mesh, terrain) with incremental update workflows
- Implement data quality checks (topology validation, completeness, coordinate reference integrity) and provenance tracking (lineage metadata, checksums)
- Manage storage lifecycle across cloud (S3/GCS) and on-prem object stores; optimize performance and cost
- Package data for offline distribution (MBTiles, geopackages, zipped 3D tiles), including delta updates and secure transfer
- Collaborate with Data Acquisition and Licensing to enforce usage rights, export control, and compliance
- Monitor pipelines (Prometheus, Grafana), maintain runbooks, and participate in on-call/incident response
- Own end-to-end sourcing of new geospatial datasets (commercial and freely available)
- Design and maintain a unified data architecture: database schemas, data models, and micro-architecture solutions to ensure scalability and reliability
- Optimize database performance at all levels: indexing, partitioning, clustering, and tuning configuration parameters
- Ensure full compliance with GDPR, UK Data Protection Act, and other relevant regulations: data masking, consent management, retention policies, and privacy impact assessments
- Optimize queries, schemas, and indexes where needed
- Set up basic data quality checks
- Support GDPR and UK data protection requirements, including data masking, access control, and retention policies
- Take data notebooks and calculation logic and turn them into reliable, production-ready pipelines
- Ensure scalability, reliability, and reproducibility




