Job Closed
This listing is no longer active.
Powering Change
Senior Data Engineer III
Location: United States
Posted: 158 days ago
Salary: $138K - $147K per year
Seniority: Senior
Job Description
MetroStar
- Design, maintain, and validate data schemas supporting federation and integration between C2SET and external systems
- Build and support SQL- and Python-based ETL pipelines for operational, simulation, and analytics data
- Ensure data integrity, correctness, and performance across distributed and multi-tier data sources
- Troubleshoot data mismatches, malformed messages, schema drift, and integration issues
- Partner with modeling and simulation engineers to analyze simulation outputs and support tuning of behaviors and decision logic
- Design and execute data-driven experiments to evaluate model changes and operational impacts
- Develop datasets, scripts, and tooling to support repeatable validation and performance analysis
- Support Government analysts and integrators by delivering timely, reliable data refreshes and analysis
- Perform database performance tuning and optimization for operational workloads
- Maintain data documentation, metadata repositories, and data governance artifacts
- Act as a functional lead for data engineering and analytics activities within the integrations team
Job Requirements
- An active U.S. Government issued Secret security clearance
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, Mathematics, or a related technical field, or equivalent experience
- More than ten years of experience in data engineering, database engineering, database administration, or applied analytics
- Strong hands-on experience with SQL and Python, including building and maintaining ETL pipelines
- Proven experience designing, implementing, and maintaining complex data schemas and data models
- Experience working with structured and semi-structured data formats such as JSON and XML
- Ability to design, validate, and troubleshoot data exchanges across distributed or federated systems
- Experience with database performance tuning, indexing strategies, and query optimization
- Experience analyzing large or complex data sets and translating results into actionable technical recommendations
- Ability to collaborate effectively with software engineers, modeling and simulation engineers, and Government stakeholders
Benefits
- Health, dental, and vision insurance
- 401(k) retirement plan with company match
- Paid time off (PTO) and holidays
- Parental Leave and dependent care
- Flexible work arrangements
- Professional development opportunities
- Employee assistance and wellness programs
More Data Engineer Jobs
Data Migration Specialist
Prospyr Medical: A HIPAA-compliant solution that makes it easy for aesthetics providers to manage and grow their practices.
- Own end-to-end data migrations for new and expanding customers
- Import and validate patient, appointment, invoice, payment, provider, service, and membership data
- Map data from a wide range of legacy systems (EMRs, POS tools, spreadsheets, exports)
- Identify, clean, normalize, and reconcile inconsistent or incomplete data
- Perform QA checks to ensure data accuracy, completeness, and integrity post-migration
- Work directly with customers during onboarding to define migration scope and timelines
- Explain data requirements, limitations, and tradeoffs in a clear, customer-friendly way
- Support go-live readiness by ensuring migrated data aligns with customer workflows
- Troubleshoot and resolve migration issues quickly and accurately
- Maintain and improve migration playbooks, templates, and checklists
- Document repeatable patterns for common legacy systems
- Partner with Engineering and Product to improve migration tooling and automation
- Surface recurring data issues and upstream product improvements
- Partner closely with Customer Experience and Implementation teams on go-live execution
- Coordinate with Engineering on complex migrations or edge cases
- Provide internal visibility into migration status, risks, and blockers
- Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows.
- Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance.
- Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems.
- Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets.
- Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards.
- Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution.
- Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines.
- Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs.
- Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets.
- Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy.
- Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them.
- Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes.
- Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals.
- Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners.
Data Engineer – Geolocation Team
IPinfo.io – IP Data Provider: We're the trusted source for IP address data, handling over 40 billion API requests per month for more than 500,000 users.
- Design, build, and operate data collection and analysis pipelines
- Work with large-scale internet measurement data (we collect 75+ TB per week, including BGP, DNS, ping, and traceroute data from 1200+ global vantage points)
- Research, apply, and implement techniques from cutting-edge internet measurement research
- Maintain a high bar for signal quality and defensibility, prioritizing observable network behavior over heuristics or guesswork
- Communicate findings clearly by contributing to blog posts, technical documentation, and research publications, both internally and externally
- Utilize extract, transform, load (ETL) technologies using Snowflake and other cloud data platforms
- Interpret data, analyze results using statistical techniques, and provide ongoing reports
- Develop and implement databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality
- Acquire data from primary or secondary data sources and maintain databases/data systems
- Evaluate and optimize data structures
- Identify, analyze, and interpret trends or patterns in complex data sets
- Filter and "clean" data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems
- Monitor, troubleshoot, and improve pipeline transparency, performance, scalability, and reliability, using Snowflake OpenFlow and related ELT/ETL tools
- Ensure AI/ML readiness of data by preparing and maintaining semantic models, ensuring robust data quality, and establishing and enforcing data access
- Produce field mapping and translation documentation for use in both manual and scripted migrations
- Work within Agile methodology, managing tasks and tickets as assigned
- Communicate with clients and team members for requirements gathering, clarification, and planning for data conversions
- Document work and work processes for use by team members