CreatorIQ

The most trusted software to unify and power advanced influencer marketing for the world’s most innovative enterprises

Senior Data Engineer, Reporting

Data Engineer · Full Time · Remote · Senior · Team 201-500 · Since 2014 · H1B Sponsor

Location

Poland

Posted

41 days ago

Salary

zł289K - zł325K / year

Seniority

Senior

Bachelor Degree · 5 yrs exp · English · Airflow · Apache · ETL · NumPy · Pandas · Python · SQL

Job Description

  • Design and implement ETL pipelines that migrate data from transactional databases to analytical data warehouses
  • Create real-time data ingestion systems that process campaign data, user metrics, and business intelligence
  • Build multi-tenant data models with appropriate partitioning strategies for enterprise-scale clients
  • Develop data quality frameworks with comprehensive validation, monitoring, and alerting
  • Implement Row-Level Security (RLS) and Role-Based Access Control (RBAC) in analytical databases
  • Design dynamic permission models supporting organization-level and division-level data access
  • Build session-based context management for secure multi-tenant queries
  • Create comprehensive audit trails and access logging to meet compliance requirements
  • Design database schemas with advanced partitioning and indexing strategies
  • Build materialized views and aggregated tables for real-time analytics
  • Implement query optimization, data skipping, and compression techniques
  • Handle high-concurrency embedded dashboard usage with sub-second query performance
  • Build dashboard data sources with optimized SQL transformations
  • Handle complex data structures and parsing requirements
  • Create flat, denormalized tables optimized for embedded analytics consumption
  • Implement custom field handling for tenant-specific metadata requirements

Job Requirements

  • 5+ years of data engineering experience with production-scale systems
  • Expert-level SQL skills with analytical databases (columnar databases preferred)
  • Strong Python programming with data libraries: pandas, numpy, pyarrow
  • Experience with ETL orchestration tools: Apache Airflow, Prefect, dbt, or similar
  • Deep understanding of analytical databases, partitioning strategies, and OLAP optimization
  • Experience building SaaS data platforms with tenant isolation requirements
  • Knowledge of Row-Level Security (RLS) implementation in analytical databases
  • Understanding of RBAC patterns and session-based access control
  • Experience with authentication flows in data systems
  • Familiarity with compliance requirements (SOC 2, GDPR) for multi-tenant data

Benefits

  • 26 days vacation
  • Floating and set holidays
  • Wellness allowance
  • Paid parental leave
  • Medical insurance
  • Life insurance
  • Business travel insurance
  • Stock options as part of our equity-sharing program.
  • Comprehensive perks program providing stipends for cell phone and internet, home office setup, mental wellness, professional development and tuition reimbursement, plus occasional company-funded meal opportunities throughout the year.

More Data Engineer Jobs

Full Time · Remote · Team 51-200

🧠 We're hiring a Data Engineer! 🧠
Are you passionate about working with large volumes of data and cloud projects?

🎯 Role objective: Take part in migration and development projects in cloud environments, using Azure Synapse and PySpark to deliver efficient, scalable solutions.

🎓 Profile requirements
- Education: Systems Engineering or a related degree.
- Experience: At least 2 years on cloud projects with Azure Synapse and developing notebooks with PySpark.
- Essential knowledge:
  - PySpark (development and migration of code from other platforms)
  - Azure Synapse
  - Advanced SQL
  - Git (version control)
  - Building and managing pipelines

💡 Personal competencies
- Teamwork
- Effective communication
- Adaptability
- Results orientation
- Analytical skills

⚙️ Main responsibilities
- Develop and maintain PySpark notebooks.
- Carry out code migration projects to PySpark.
- Implement and optimize data pipelines.
- Collaborate with multidisciplinary teams to ensure project quality.
- Document processes and best practices.

🕒 Working conditions
- 📍 Location: Colombia (remote)
- ⏰ Schedule: Full time
- 💰 Salary: Negotiable

🌐 TALYCAP GLOBAL – Connecting the best IT talent with high-impact projects.
#IngenieroDeDatos #AzureSynapse #PySpark #TalycapGlobal

Colombia

Role Description
At Sunrise Robotics, this role owns the data platform that powers our robotics, AI, and deployment systems. You'll design and build the infrastructure that handles large-scale, multi-modal data, from ingestion and storage to processing and access. This includes video, sensor data, and robot telemetry used across training, simulation, and production systems. Your work will enable teams to reliably collect, version, and use data to improve system performance and accelerate development. This is a hands-on role focused on building robust, scalable systems that support real-world robotics deployments.

What You'll Do
- Design and build data pipelines for ingesting, processing, and serving multi-modal data (video, images, sensor data)
- Own and evolve our data platform, ensuring reliability, scalability, and ease of use across teams
- Implement data storage and processing systems on AWS (e.g. S3, serverless pipelines, data lake architecture)
- Build systems for data versioning, lineage, and reproducibility
- Optimise data handling for large-scale, memory-intensive workloads
- Enable efficient data access for AI, simulation, and deployment teams
- Collaborate with AI, robotics, and infrastructure teams to align data systems with real-world use cases
- Contribute to monitoring, observability, and tooling for data quality and system performance

Qualifications
- Strong Python engineering skills, including data structures and efficient data handling
- Experience working with large, multi-modal datasets (e.g. video, images, sensor data)
- Experience building and operating data infrastructure on AWS (e.g. S3, serverless processing, data lakes)
- Experience with data versioning and lineage systems (e.g. DVC, LakeFS, Pachyderm, or similar)
- Experience with asynchronous programming and memory-efficient processing
- Experience with data pipeline orchestration tools (e.g. Airflow, Dagster, Prefect, or similar)
- Familiarity with MLOps workflows and tools (e.g. MLflow, ClearML, ZenML, or similar)
- Experience with infrastructure-as-code tools (e.g. Terraform, Pulumi, CloudFormation)
- Experience working with databases (e.g. SQLite, MongoDB, Cassandra)

Requirements
- Experience working with robotics or other real-time, sensor-driven systems
- Experience with streaming data and edge data processing (e.g. Jetson, Coral, NXP)
- Experience with synthetic data generation (e.g. Unity, Isaac Sim)
- Experience with data visualisation and monitoring tools (e.g. Grafana, Plotly, Rerun)
- Familiarity with ROS/ROS2 or robotics frameworks
- Experience working with telemetry data (e.g. logs from physical systems)

Benefits
- High exposure: Work closely with robotics, AI, and infrastructure teams on core systems
- Career acceleration: Build and own the data platform in a fast-scaling robotics company
- Real impact: Your work will directly enable how data is used to train, evaluate, and improve systems running in production

Slovenia
Full Time · Remote · Team 1,001-5,000 · H1B No Sponsor

  • Develop scalable data processing architectures and solutions for data transformation
  • Ingest data from internal and external sources using cloud-native platforms
  • Apply industry-standard software development practices and patterns
  • Use AI, machine learning, and big data to enhance service delivery
  • Help shape departmental strategy and decisions regarding technical approaches

North America
$97.9K - $133.5K / year
Job Closed

Staff Data Science Engineer

SimSpace Corporation


Data Engineer · 41 days ago
Full Time · Remote · Team 201-500

Role Description
Staff Data Science Engineer sought by SimSpace Corporation (Boston, MA).
- Design, implement, and deploy advanced mathematical and machine-learning algorithms (e.g., supervised, unsupervised, reinforcement learning, NLP, anomaly detection) to support cyber-range simulations, delivering production models with documented accuracy, latency, and throughput metrics.
- Develop and maintain end-to-end AI/ML pipelines (data ingestion, feature engineering, model training, validation, inference, monitoring), ensuring test coverage, reproducibility of experiments, and documented performance benchmarks.
- Construct and optimize numerical methods and computational models using Python, NumPy, SciPy, Pandas, and JAX/TensorFlow/PyTorch to solve large-scale (10M+ row) data and optimization problems relevant to cyber-range operations.
- Architect scalable model-serving systems in Docker/Podman/Kubernetes, achieving reliable deployments with measured service uptime of 99 percent or greater and documented resource-utilization improvements.
- Develop and integrate new AI-driven cybersecurity capabilities (e.g., automated scoring engines, classification systems, reinforcement-learning-based adversary behaviors) with quantified gains in accuracy, precision/recall, or scenario realism, validated against internal evaluation datasets.
- Author and maintain production-quality Python services, enforcing code standards, implementing unit/integration testing with unittest/pytest, and reducing defect rates via measurable static/dynamic analysis reports.
- Design, evaluate, and improve model performance using quantitative metrics (e.g., AUC, F1, perplexity, reward curves, convergence rates), generating written model-evaluation reports used in release-readiness decisions.
- Perform algorithmic research on emerging ML/AI/cyber methods, producing technical assessments, prototypes, and feasibility studies that directly inform quarterly engineering and product roadmaps.
- Lead cross-team technical initiatives, producing written design documents, conducting architecture reviews, and driving the integration of DS/AI services across engineering, product management, platform teams, and cybersecurity content engineering.
- Mentor senior-level engineers and data scientists by conducting formal code reviews, mathematical model reviews, and algorithm correctness checks, with documented feedback that improves model accuracy, stability, or performance.
- Apply computational mathematics methods (e.g., linear algebra, numerical optimization, differential equations, stochastic processes) to design, implement, and validate algorithms and models with documented quantitative results.
- Produce internal documentation (design specs, API references, model cards, validation reports) ensuring compliance with internal engineering, security, and AI governance standards.
- Define and establish technical standards, best practices, and design patterns for AI/ML development across the Data Science team.
- Drive high-performance computing initiatives to optimize AI/ML system performance, including distributed computing and GPU acceleration strategies.
- Collaborate with cross-functional teams, including product development, engineering, cybersecurity content developers, and external stakeholders, to align technical solutions with organizational objectives.
- Prepare and deliver technical reports, presentations, and briefings to leadership, stakeholders, and customers on project status, technical approaches, and strategic recommendations.
- Evaluate and recommend new technologies, tools, and methodologies to advance SimSpace's AI/ML and cybersecurity capabilities.
- Attend and participate in team and company meetings, and contribute to strategic planning and technical roadmap development.
- May work remotely from anywhere in the US.

Qualifications
- Ph.D. in Computational Mathematics, Computer Science, Applied Mathematics, or a closely related field.
- 1 year of experience in computational mathematics, scientific computing, machine learning, data science, or algorithm development. Experience may be gained through employment, research, or doctoral work.
- Demonstrated experience applying mathematical or machine-learning algorithms (e.g., regression, classification, clustering, reinforcement learning, NLP, numerical optimization) to datasets of at least 1 million observations or high-dimensional data.
- Demonstrated experience developing scientific or ML software in Python using at least three of the following packages: NumPy, Pandas, SciPy, Matplotlib.
- Demonstrated experience implementing, training, and validating machine-learning models using at least three of the following frameworks: PyTorch, TensorFlow, JAX, scikit-learn.
- Demonstrated experience writing automated tests for ML or scientific code using at least two of the following: unittest, pytest, hypothesis.
- Demonstrated experience building and deploying containerized applications using at least one of the following: Docker, Podman, Kubernetes.
- Demonstrated experience producing documented research or production-quality software artifacts (e.g., peer-reviewed publications, open-source contributions, internal enterprise algorithms or models) demonstrating algorithm correctness or performance validation.
- Demonstrated experience applying computational mathematics methods (e.g., linear algebra, numerical optimization, differential equations, stochastic processes, network or graph analysis) to design or evaluate algorithms or models, with documented quantitative results.
- Demonstrated understanding of statistics, computational complexity and performance, parallelization, databases, optimization, linear programming, hypothesis testing, research methodology, and existing scientific literature and results in the field of data science and AI/ML.

Requirements
- *Experience may be gained through academic coursework during or after master's or PhD degree.
- *Experience may be gained concurrently.
- **Demonstrated knowledge or experience is equivalent to at least 6 months of experience, as it cannot be learned during a reasonable period of on-the-job training.
- **May work remotely from anywhere in the US.

Benefits
- Salary Range: $183,801-$184,000
- Please e-mail resume to careers@simspace.com.

Company Description
SimSpace is an Equal Opportunity Employer:
- In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification document form upon hire.
- SimSpace is committed to providing an inclusive and welcoming environment for all members of our staff, clients, volunteers, subcontractors, and vendors.
- Research shows that women and people from underrepresented groups only apply to jobs if they meet all of the qualifications. However, no one ever meets 100% of the qualifications. SimSpace encourages you to break that statistic and to apply.
- We also consider qualified applicants regardless of criminal histories, in accordance with applicable law. We are committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures.
- SimSpace does not accept unsolicited resumes from employment agencies.
- Actual compensation for the position is based on a variety of factors, including, but not limited to, affordability, skills, qualifications, and experience, and may vary from the range.

United States
$183.8K - $184K / year
Job Closed