Johnson & Johnson

Johnson & Johnson is an award-winning, family-owned-and-operated company that has been providing health and wellness products for more than 120 years. Employing more than 120,000 p

Principal Data Engineer - Safety Analytics

Location

All locations: Pennsylvania | New Jersey

Posted

24 days ago

Salary

$102K - $177.1K / year

Seniority

Senior

Job Description

Title: Principal Data Engineer - Safety Analytics (Global Medical Safety)
Remote type: Hybrid Work
Locations: Horsham, Pennsylvania, United States of America; Titusville, New Jersey, United States of America
Time type: Full time
Job requisition ID: R-070696

Job Description:

At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and solutions are personal. Through our expertise in Innovative Medicine and MedTech, we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow, and profoundly impact health for humanity.

As guided by Our Credo, Johnson & Johnson is responsible to our employees who work with us throughout the world. We provide an inclusive work environment where each person is considered as an individual. At Johnson & Johnson, we respect the diversity and dignity of our employees and recognize their merit.

Job Function: Data Analytics & Computational Sciences
Job Sub Function: Data Engineering
Job Category: Scientific/Technology
All Job Posting Locations: Horsham, Pennsylvania, United States of America; Titusville, New Jersey, United States of America

About Innovative Medicine

Our expertise in Innovative Medicine is informed and inspired by patients, whose insights fuel our science-based advancements. Visionaries like you work on teams that save lives by developing the medicines of tomorrow. Join us in developing treatments, finding cures, and pioneering the path from lab to life while championing patients every step of the way. Learn more at https://www.jnj.com/innovative-medicine

Preferred Location: Horsham, PA or Titusville, NJ. Remote work will be considered on a case-by-case basis.

Role Overview

We are seeking a Principal Data Engineer to provide technical leadership within Global Medical Safety (GMS), supporting the Safety Analytics organization. This role is focused on building and enabling modern safety analytics tools using AI, Machine Learning, and GenAI, underpinned by robust, compliant, and scalable data engineering on Google Cloud Platform (GCP).

The Principal Data Engineer is responsible for end-to-end ownership of safety analytics data engineering, spanning data intake, data quality and continuity, pipeline and architecture design, automation, performance optimization, and compliance. The role enables advanced analytical, machine learning, and predictive capabilities for pharmacovigilance and serves as a technical data engineering leader within Global Medical Safety.

This is a Principal-level individual contributor role with broad technical influence, working closely with safety scientists, analytics teams, data scientists, IT, and platform partners to deliver trusted, production-grade analytics capabilities for safety decision-making.

Key Responsibilities

Safety Analytics & Pharmacovigilance Enablement
- Design and maintain production-grade data pipelines and curated datasets that directly support pharmacovigilance activities, including safety monitoring, analytics, and regulatory reporting.
- Ensure data engineering solutions produce reproducible, explainable, and trusted analytics outputs suitable for safety decision support and inspection readiness.
- Enable AI/ML and GenAI workflows for safety analytics, including:
  - Feature engineering and feature store enablement
  - Embeddings, vectorized representations, and semantic retrieval
  - Retrieval-Augmented Generation (RAG) patterns for safety analytics tools

End-to-End Data Architecture & Lifecycle Ownership
- Own the end-to-end data lifecycle for safety analytics, from source system intake through transformation, serving, and downstream analytical consumption, ensuring data continuity, traceability, and integrity.
- Lead architectural decisions across ingestion, transformation, storage, and serving layers on GCP (e.g., BigQuery, Dataform, object storage).
- Design, implement, and automate scalable, reusable data pipelines and architectures to support evolving safety analytics needs.

Data Quality, Governance & Compliance
- Establish and enforce data quality, validation, lineage, and observability standards for safety analytics datasets.
- Define and implement data governance practices, including data contracts, schema versioning, access control, stewardship, and lifecycle management.
- Ensure safety analytics data and systems meet Global Medical Safety requirements for reliability, auditability, and regulatory use.

GxP Validation & Regulatory Readiness
- Apply GxP validation expertise to data pipelines, analytics services, and supporting infrastructure.
- Partner with quality and compliance teams to implement CSV/CSA-aligned controls, audit trails, documentation, and organizational change.
- Balance delivery velocity and innovation with the rigor required for regulated pharmacovigilance systems.

Services, APIs & Microservices
- Design and build APIs and microservices-based architectures to operationalize safety analytics and ML capabilities (e.g., feature serving, retrieval services, analytics backends).
- Deploy and operate services on GCP (e.g., Cloud Run, GKE) with a strong focus on security, scalability, and observability.
- Enforce contract-first integration patterns between producing and consuming systems to ensure reliability and safe evolution.

Infrastructure, CI/CD & Cost Optimization
- Provision and manage cloud infrastructure using Terraform (Infrastructure as Code) on GCP.
- Build and maintain CI/CD pipelines (e.g., Jenkins) for data pipelines, analytics services, feature pipelines, and ML data assets.
- Continuously optimize the performance and cost efficiency of data and analytics infrastructure while maintaining compliance and reliability standards.

Technical Leadership & Stakeholder Engagement
- Serve as a technical authority and data engineering leader for Safety Analytics within Global Medical Safety.
- Review and influence designs across pipelines, services, feature stores, and AI/ML integrations to maintain a high technical bar.
- Collaborate closely with safety scientists, epidemiologists, biostatisticians, analytics teams, IT, and platform partners to translate safety needs into scalable technical solutions.
- Communicate complex technical concepts and tradeoffs clearly to both technical and non-technical stakeholders.
- Enable and upskill teams through mentorship, guidance, and knowledge sharing on modern data, cloud, and AI technologies.

Qualifications
- Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience) is required.
- 5+ years of experience in data engineering or analytics engineering with increasing responsibilities.
- Proficient programming skills in Python and SQL.
- Deep understanding of data architecture for analytics and ML (e.g., batch/streaming, modeling, performance optimization).
- Proven ability to translate complex problems into clear, concise, and testable programming code/tools.
- Experience implementing data contracts, data validation, schema versioning, and governance practices, as well as a solid understanding of leading cloud concepts (GCP preferred).
- Experience designing and operating APIs and microservices-based architectures.
- Excellent written and verbal communication, customer service, interpersonal, and teamwork skills to foster a collaborative team environment.
- Solid understanding of SDLC and Agile methodologies, alongside basic project management skills.
- Experience building production workloads on Google Cloud Platform (GCP) is preferred.
- Experience provisioning infrastructure using Terraform (Infrastructure as Code) and building CI/CD pipelines (e.g., Jenkins) is preferred.
- Experience in pharmaceuticals, life sciences, healthcare, or a related regulated domain is preferred.
- GCP certification is preferred.
- Experience enabling AI/ML and GenAI workflows (e.g., feature engineering, RAG patterns, semantic retrieval) for analytical applications is preferred.

Johnson & Johnson is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, age, national origin, disability, protected veteran status or other characteristics protected by federal, state or local law. We actively seek qualified candidates who are protected veterans and individuals with disabilities as defined under VEVRAA and Section 503 of the Rehabilitation Act.

Johnson & Johnson is committed to providing an interview process that is inclusive of our applicants’ needs. If you are an individual with a disability and would like to request an accommodation, external applicants should contact us via https://www.jnj.com/contact-us/careers. Internal employees should contact AskGS to be directed to their accommodation resource.

Required Skills:

Preferred Skills: Advanced Analytics, Agility Jumps, Coaching, Critical Thinking, Data Engineering, Data Governance, Data Modeling, Data Privacy Standards, Data Science, Digital Fluency, Execution Focus, Hybrid Clouds, Organizing, Presentation Design, Technical Development, Technical Writing, Technologically Savvy

The anticipated base pay range for this position is: $102,000.00 - $177,100.00

Additional Description for Pay Transparency: Subject to the terms of their respective plans, employees are eligible to participate in the Company’s consolidated retirement plan (pension) and savings plan (401(k)). Subject to the terms of their respective policies and date of hire, employees are eligible for the following time off benefits:
- Vacation – 120 hours per calendar year
- Sick time – 40 hours per calendar year; for employees who reside in the State of Colorado – 48 hours per calendar year; for employees who reside in the State of Washington – 56 hours per calendar year
- Holiday pay, including Floating Holidays – 13 days per calendar year
- Work, Personal and Family Time – up to 40 hours per calendar year
- Parental Leave – 480 hours within one year of the birth/adoption/foster care of a child
- Bereavement Leave – 240 hours for an immediate family member; 40 hours for an extended family member per calendar year
- Caregiver Leave – 80 hours in a 52-week rolling period (10 days)
- Volunteer Leave – 32 hours per calendar year
- Military Spouse Time-Off – 80 hours per calendar year
