Data Engineer, Web Scraping
Location
United States
Posted
66 days ago
Salary
$105K - $125K / year
Seniority
Mid Level
Job Description
10a Labs
About 10a Labs: 10a Labs is the safety and threat-intelligence layer trusted by frontier AI labs, AI unicorns, Fortune 10 companies, and leading global technology platforms. Our adversarial red teaming, model evaluations, and intelligence collection enable engineering, safety, and security teams to stay ahead of evolving threats and deploy AI systems safely.

In this role, you will:
- Design, implement, and optimize end-to-end data pipelines for scraping and processing structured and unstructured data using Google Cloud Platform (or similar) and best practices;
- Conduct ad hoc web scraping and data collection to support research and intelligence initiatives;
- Prepare data for further analysis, including data cleaning, transformation, anonymization, and masking;
- Contribute to the development of internal and external APIs, following best practices;
- Collaborate with ML engineers, other data engineers, and software developers to deliver actionable insights and functional tools, including internal and external dashboards, APIs, and data dumps; and
- Drive other critical initiatives.
Requirements:
- Degree (or equivalent work experience) in Computer Science, Engineering, Information Science, Data Science or a related field (graduate degree preferred)
- 2+ years of professional experience in data engineering or a closely related field
- Ability to communicate complex technical ideas clearly to non-technical audiences
- Proficiency in Python, SQL
- Experience with web scraping/crawling (e.g., Beautiful Soup, Selenium, Scrapy)
- Experience with Google Cloud Platform (or similar), including storage and database services (e.g., Cloud Storage, CloudSQL, Cloud Spanner) and workflow orchestration (e.g., Cloud Composer/Airflow, Cloud Run, Pub/Sub)
- Experience building and managing data pipelines, especially for text data
- Comfort working in fast-moving, high-impact environments, such as startups, AI research labs, or security-focused teams

Compensation & Benefits:
- Salary Range: $105K–$125K, depending on experience and location
- Bonus: Performance-based annual bonus
- Professional Development: Support for conferences, continuing education, or leadership training
- Work Environment: Fully remote, U.S.-based
- Health Benefits: Comprehensive health, dental, and vision coverage
- Time Off: Generous PTO and paid holiday schedule
- Retirement: 401(k) plan
More Data Engineer Jobs
Staff Software Engineer, Data Engineering
Omada Health
A digital-first chronic care provider, helping members change mindsets to improve health and build lasting change.
• Own the vision and technical roadmap for Omada’s enterprise data architecture, spanning ingestion, storage, modeling, and serving layers for analytics and applied statistics use cases.
• Design, implement, and evolve scalable, secure, and cost‑efficient data solutions (datalakes, warehouses, marts, semantic layers) that support governed, cross‑functional analytics and self‑service.
• Define and socialize architectural patterns, data contracts, and integration standards used by data and product teams across the organization.
• Anticipate future needs (e.g., new product lines, new modalities, AI/ML workloads) and drive proactive architectural changes rather than reacting to incidents or point‑in‑time requests.
• Lead the design of logical and physical data models to support enterprise metrics, dashboards, and ad hoc analytics, with a focus on reusability and clear ownership.
• Implement robust data quality, validation, and monitoring frameworks that underpin trusted “single source of truth” definitions for core concepts (e.g., active member, MAU, GLP‑1 member).
Senior Data Platform Engineer (Python/AWS) - Finance
Truelogic
About Truelogic
At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we’ve been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals. Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference. By applying for this position, you’re taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.

Our Client
Our client is a leading publicly traded financial services company specializing in the U.S. mortgage market, operating a comprehensive end-to-end platform that supports loan origination, servicing, and investment management.

Job Summary
The Sr Data Platform Engineer - Python/AWS Specialist leads the design, development, and management of our enterprise data pipeline infrastructure, with a primary focus on Python-based solutions and AWS cloud services. This role supports critical business functions through sophisticated data engineering, including pricing analytics, trading systems, hedging models, and pooling operations, ensuring scalable, performant, and reliable data solutions across the organization.

Responsibilities
- Architect and maintain Python applications using Object-Oriented Programming (OOP) and enterprise design patterns.
- Build RESTful APIs and microservices using FastAPI or Flask.
- Utilize frameworks like Pandas, NumPy, SQLAlchemy, and PySpark for sophisticated data processing.
- Implement solutions using Lambda, Glue, Step Functions, S3, EventBridge, and Kinesis.
- Manage reproducible deployments via CloudFormation, CDK, or Terraform.
- Deploy and orchestrate services using Docker, ECS, or Kubernetes.
- Design scalable ETL/ELT pipelines using Airflow, Prefect, or AWS Step Functions.
- Build real-time and batch processing systems using serverless and event-driven patterns.
- CI/CD workflows, automated testing (pytest), and data validation.

Qualifications and Job Requirements
- Software Experience: 5+ years in software development with 4+ years of dedicated production Python experience (including async programming).
- AWS Expertise: 3+ years of hands-on experience with data-centric AWS services.
- Architecture Knowledge: Deep understanding of ETL/ELT patterns and modern data architecture principles.
- Communication: Proven ability to explain complex technical concepts to business stakeholders and project managers.

What We Offer
- 100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
- Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
- Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
- Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
- Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.

Why You’ll Like Working Here
- A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
- Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
- Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you're working with the best in your field.

Apply now!
Director, Data Platform & Analytics
Newsela
Newsela is a New York, New York-based educational and technology startup that was established in 2013. The company is committed to "transforming the way learners access the world t
• Own and drive the data and analytics strategy, roadmap, and execution across the organization.
• Scale and mentor a high-performing team spanning data engineering, analytics, and platform.
• Partner with the executive team and cross-functional leaders to surface insights that inform business strategy and drive user growth and retention.
• Empower cross-functional partners across Product, Marketing, and Customer Success with the tools, frameworks, and self-serve access to make data-informed decisions confidently.
• Architect and evolve a scalable data platform, encompassing pipelines, warehouse infrastructure, and self-serve capabilities, for both internal and external users.
• Own the data catalog, ensuring data assets are well-documented, discoverable, and understood across the organization.
• Define and enforce data quality standards, monitoring, and remediation processes to ensure trustworthy data at every layer.
• Establish and maintain a data governance framework covering ownership, access controls, and data definitions across the organization.
• Own the BI environment in Tableau and self-serve analytics tool with Heap and Pendo, ensuring accurate reporting and broad data availability.
Data Engineer – Azure Cloud & Security
AspenView Technology Partners
AspenView Technology Partners empowers organizations to thrive with agile, expert-staffed, nearshore IT teams.
About the role
We're building secure, scalable, and automated data solutions on Azure, and we're looking for a Data Engineer who can bridge the worlds of data architecture and cloud security. You'll be joining a team that works on complex, high-impact data platforms where engineering rigor and security best practices go hand in hand. This is a fully remote position.

What you will do:
- Design, plan, and implement secure and scalable data architectures on Azure (on-premise and cloud)
- Build and maintain ETL/ELT pipelines using Azure Data Factory, Databricks, and Azure Functions
- Develop and optimize data platforms for analytics and decision-making (Data Warehouse / Lakehouse)
- Write and maintain data processing code using Python and PySpark
- Implement and manage Azure networking and security components (Firewall, VNET, Private Endpoints, NSG/ASG, FrontDoor)
- Deploy and manage infrastructure using Terraform or Bicep (IaC)
- Set up and maintain CI/CD pipelines via Azure DevOps or GitHub Actions
- Lead technical projects or engineering teams, ensuring delivery and quality standards
- Collaborate on data governance and security practices across the data platform

What you bring:
- 3+ years of experience as an analyst or engineer in development, databases, data governance, or data architecture
- 3+ years designing and automating data architectures in cloud and on-premise environments
- Hands-on experience with the core Azure data stack: Data Factory, Databricks, Azure Functions, and Microsoft Fabric
- Strong SQL skills and experience with both relational and non-relational databases
- Proficiency in Python and PySpark for data processing
- Solid understanding of Azure networking and security (VNET, Firewall, Private Link, NSG/ASG)
- Experience with Infrastructure as Code (Terraform or Bicep) and CI/CD pipelines
- Proven track record leading teams or technical projects

Nice if you have:
- Azure Associate or Expert certification
- English proficiency at B2 level or above
- Experience with Azure Synapse or Microsoft Fabric
- Familiarity with Site-to-Site VPN or ExpressRoute for on-prem connectivity
- Experience working in complex or regulated environments
- Background in data governance and security frameworks
- Experience with self-hosted agents in CI/CD pipelines

Equal Opportunity Employer: AspenView is proud to be an equal opportunity employer. We believe in creating an environment where all employees feel welcome, valued, and empowered to succeed. We celebrate diversity and strive to build a culture of inclusion where all individuals, regardless of their race, color, gender, gender identity or expression, sexual orientation, disability, age, or any other characteristic, can thrive. We encourage applicants from all walks of life to join our team and make a lasting impact.



