For the Future of Industry | For the Future of Work
Doctoral Fellow – Data Engineering, Pipelines, PySpark
Location
Brazil
Posted
105 days ago
Salary
R$9K / month
Seniority
Senior
Job Description
Sistema Fibra
• Plan and align the project with the Androidization strategy
• Gather and validate functional and technical requirements
• Design the solution architecture and the data model
• Automate integration and processing of operational data using Python
• Model and automate refined tables incorporating business rules
• Implement monitoring and proactive alerts
• Publish and validate the tables in the production environment
• Document the entire technical and functional platform architecture
• Provide training and formalize the technical handover
Job Requirements
- Degree: PhD
- Education: Degree in Computer Engineering, Computer Science, Statistics, or related fields
- English: Intermediate
- Strong knowledge of data engineering using Python
- Basic knowledge of cloud computing platforms, especially for data integration and Data Lake storage
- Experience with code versioning using Git
- Ability to design scalable, automated architectures for data ingestion and transformation
- Ability to automate observability mechanisms and alerts
- Ability to create technical and operational documentation
- Experience with PySpark
Benefits
- Remote work




