Job Closed
This listing is no longer active.
TechBiz Global is a leading IT recruitment and software development company
Data Engineer – Web Scraping, ETL
Location: Romania
Posted: 43 days ago
Salary: 0
Seniority: Senior
Job Description
- Monitor and manage overnight scraper and ingestion runs, triaging failures and applying fixes in real time to minimize data gaps before US market open
- Verify data completeness and quality across all automated feeds, flagging anomalies and coordinating with the Houston team on persistent issues
- Maintain run logs, error documentation, and escalation notes for seamless async handoffs
- Build and maintain scrapers, parsers, and ingestion pipelines across a growing set of energy market data domains
- Contribute to the design and build-out of our broader ETL infrastructure, including scheduling, orchestration, and error handling
- Write transformation logic to clean, normalize, and load raw data into PostgreSQL staging and production tables
- Optimize existing pipelines for performance, reliability, and cost efficiency
- Help build monitoring dashboards and alerting for pipeline health and data freshness
- Document data lineage, schema changes, and pipeline dependencies
Job Requirements
- 3+ years of experience building and maintaining ETL pipelines or data engineering systems
- Strong Python skills with experience in web scraping, data parsing, and automation
- Proficiency in SQL and experience working with relational databases (PostgreSQL preferred)
- Experience with headless browsers, anti-bot mitigation, and scraping resilience patterns
- Strong debugging instincts and ability to triage pipeline failures quickly
- Clear written communication in English for async collaboration and documentation
- Experience with energy, commodities, or financial data pipelines (preferred)
- Familiarity with FERC-regulated pipeline data, EIA reporting, or utility/regulatory filings (preferred)
- Experience with PostgreSQL-specific features (partitioning, materialized views, JSONB, pg_cron, logical replication)
- Familiarity with infrastructure-as-code (Terraform, CloudFormation) or containerized deployments (preferred)
- Prior experience working on a distributed team across time zones
Benefits
- Competitive compensation and benefits with room to grow as the team scales
- Direct impact on the data infrastructure behind a leading natural gas intelligence platform
- A small, senior team where your work is visible and valued from day one
- Flexible remote work with clear async workflows
- Exposure to the North American energy markets and commodity data at scale
More Data Engineer Jobs
Senior Data Engineer
Empower
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.
- Transform financial lives through innovative data solutions
- Support internal mobility and create a diverse environment
- Lead data architecture design, API assessment, and ETL requirements gathering during the Discovery & Design phase.
- Develop and configure CMIC ERP API integration to establish reliable data exchange between the ERP system and the AWS platform.
- Design and implement data pipelines using AWS Glue for ETL processing of subcontractor documents and ERP data.
- Integrate Amazon Textract to extract structured data from insurance certificates, bonding letters, and financial documents.
- Build and maintain data models to support AI-powered validation, risk profiling, and executive reporting.
- Configure Amazon S3 data lake architecture to store and manage raw, processed, and curated data assets.
- Implement AWS Lambda and AWS Step Functions to orchestrate data workflows and automated processing pipelines.
- Develop and expose data through Amazon API Gateway to support application and dashboard consumption.
- Ensure data quality, validation, and integrity across all integration points and pipeline outputs.
- Conduct data integration testing and support user acceptance testing (UAT) for data-dependent features.
- Collaborate with Full Stack, AI/ML, and DevOps team members to ensure seamless end-to-end data flows.
- Contribute to knowledge transfer documentation, data pipeline runbooks, and operations guides.
Lead Data Engineer (AWS Architect)
Dreamix
Bespoke software development company that provides custom end-to-end product development following the highest standards
Dreamix was founded 19 years ago by passionate IT students who wanted to create the dreamiest workplace where everyone is heard, works under transparent management, and lives up to their full potential. Now, many years later, we deliver software solutions for renowned companies from Germany, the UK, Switzerland, and Silicon Valley. We believe that the employer-employee relationship must be a partnership, not a transaction. We are committed to investing as much as possible in our employees and we expect the same from you.

About the Role:
We are looking for a skilled and motivated Lead Data Engineer to drive the design, implementation, and evolution of cloud-native data platforms on AWS. In this role, you will own the end-to-end architecture of data ingestion, transformation, and delivery pipelines, from heterogeneous source systems through to executive-facing BI dashboards. You will combine deep technical expertise with strong stakeholder engagement to ensure data solutions are robust, scalable, and aligned with business needs.

Responsibilities:
- Lead teams of data engineers and analytics professionals, providing technical direction and mentorship across the project lifecycle
- Architect and implement cloud-native data platforms on AWS, designing schemas and ingestion pipelines that handle structured data from diverse sources including Excel feeds, legacy databases, and third-party systems
- Design, develop, and maintain ETL/ELT pipelines using AWS Glue and Step Functions, ensuring reliable data transformation, validation, and quality enforcement at every stage
- Own the infrastructure-as-code practice using Terraform: provisioning, versioning, and managing all cloud resources with full reproducibility and environment consistency
- Deliver end-to-end reporting solutions through Power BI, translating complex data models into intuitive dashboards that provide stakeholders with actionable, near-real-time insights
- Translate complex technical concepts and architectural decisions to non-technical stakeholders clearly and persuasively, facilitating informed decision-making at all levels
- Establish and enforce best practices around data quality, pipeline monitoring, documentation, and incident response
- Participate in pre-sales and discovery efforts, leveraging technical expertise to support scoping, estimation, and business development initiatives

Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field
- 7+ years of experience in data engineering, cloud infrastructure, or platform development, with a demonstrated track record of delivering production-grade data solutions
- Strong hands-on experience with AWS data services, particularly Glue, Step Functions, S3, RDS/Redshift, IAM, and CloudWatch
- Proficiency in SQL and Python for data transformation, pipeline orchestration, and scripting
- Solid experience with Terraform or equivalent IaC tooling for managing cloud infrastructure
- Proven ability to design and optimize ETL/ELT pipelines for data ingestion from heterogeneous sources
- Experience building dashboards and data models in Power BI (or equivalent BI tools such as Tableau or Looker)
- Strong understanding of data modeling, schema design, and data warehousing concepts
- Exceptional communication, presentation, and stakeholder management skills; comfortable leading technical discussions with engineering teams and presenting insights to C-level audiences alike
- Strong leadership capabilities with experience mentoring engineers and driving best practices across teams

Nice to Have:
- Experience with additional AWS services such as Lambda, Athena, Lake Formation, or EventBridge
- Familiarity with CI/CD pipelines for data infrastructure (e.g., GitHub Actions, CodePipeline)
- Exposure to data governance, cataloging, or lineage tooling
- Experience working in consulting or multi-client delivery environments

What you'll get:
- A collaborative, knowledge-sharing culture built on transparency and mutual respect
- Flexible working hours that allow you to balance your work and personal life
- Unlimited home office to help you stay productive and focused
- Opportunities for professional development, including certifications and training
- Additional benefits for academic teaching and speaking engagements
- Knowledge-sharing sessions where you can learn from our Dreamix team
- Team and company-wide events that bring us together
- Amazing week-long summer office and winter office initiatives
- Additional health insurance and dental allowance to ensure your well-being
- Multisport card to encourage a healthy and active lifestyle
- Office massages to help you relax and unwind

If you find the above interesting, send us your CV! Only shortlisted candidates will be contacted. The confidentiality of all applications is assured!

By applying for this job, you voluntarily agree and submit your personal information. Any personal data that you provide will be processed in strict confidentiality by Dreamix ltd. only for the purposes of selection and recruitment and will not be transferred to other data controllers unless required by law. It will be stored, processed, retrieved, and deleted in accordance with the GDPR.
Data Engineer
Verisian
Accelerate drug time to market through real study traceability and unparalleled trial integrity
- Join our world-class engineering team in building the Verisian Platform
- Work on an application that exposes our clinical trial insights to various stakeholders
- Support core modules targeting planning, exploration, building, validation, submission, and review of clinical trials
- Analyze clinical trial documentation and turn them into actionable insights
- Create data validation rules and develop a data analysis engine in Python to detect inconsistencies and errors
- Lead analysis, design, building, and testing of components of the engine and data validation rules
- Troubleshoot customer issues and deploy required fixes




