Software Mind

Software House focused on results since 1999

Senior Data Engineer – AI Ingestion Platform

Data Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 1999 | H1B No Sponsor

Location

Argentina

Posted

3 days ago

Salary

Not specified

Seniority

Senior

Bachelor Degree | 6 yrs exp | English | AWS | DynamoDB | ETL | Python

Job Description

  • Build and own the historical email ingestion pipeline via Microsoft Graph API
  • Implement the SharePoint / OneDrive document ingestion pipeline with scoped folder access
  • Design and implement the PII minimisation pre-processing layer
  • Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation
  • Define and implement the data processing schema; produce and maintain schema documentation
  • Build the OCR routing orchestrator and integrate the OCR service for scanned documents
  • Implement the raw text / content extraction layer for all supported document types
  • Define and prototype the push vs. pull ingestion strategy, from one-time PoC through to incremental nightly pipeline
  • Ensure data lineage and audit traceability are built into pipeline outputs from the outset

Job Requirements

  • 6+ years in data engineering; strong pipeline and ETL/ELT experience required
  • Proficiency in Python for data pipeline development
  • Experience with Microsoft Graph API or similar enterprise email/document APIs (M365, Exchange Online)
  • Experience with AWS data services: S3, DynamoDB, Glue, and/or Lambda-based event-driven processing
  • Familiarity with PII detection and data minimisation techniques (regex-based, NER-based, or purpose-built libraries)
  • Experience with vector store indexing or semantic search pipeline construction

Benefits

  • Remote work options
