This project showcases a streamlined data pipeline built to handle healthcare data using modern data engineering tools. I used Apache Airflow (via Astronomer) to orchestrate workflows, Google BigQuery as a scalable data warehouse, and dbt for transformations and quality checks.
Starting with synthetic data generation, the pipeline uploads the data to Google Cloud Storage, creates external tables in BigQuery, and applies dbt transformations, adapting dynamically to development and production environments. Docker keeps the setup consistent across machines. Overall, the project highlights my ability to integrate cloud tools, automate processes, and maintain data quality.
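
As a rough illustration of two of these steps, the sketch below generates a few synthetic patient rows, serializes them to CSV, and picks a target dataset based on the environment. It is not the project's actual code: the field names, the placeholder diagnosis codes, the `PIPELINE_ENV` variable, and the dataset names `healthcare_dev` / `healthcare_prod` are all assumptions made for the example. It uses only the Python standard library, so it stops short of the Cloud Storage upload and the BigQuery external table.

```python
import csv
import io
import os
import random

# Hypothetical mapping from environment name to BigQuery dataset.
DATASETS = {"dev": "healthcare_dev", "prod": "healthcare_prod"}

# Hypothetical column layout for the synthetic records.
FIELDNAMES = ["patient_id", "age", "diagnosis_code"]


def resolve_dataset(env=None):
    """Return the dataset for the given environment.

    Falls back to the PIPELINE_ENV variable (an assumed name), then to "dev".
    """
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in DATASETS:
        raise ValueError(f"unknown environment: {env!r}")
    return DATASETS[env]


def generate_patients(n, seed=0):
    """Return n synthetic patient rows; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [
        {
            "patient_id": i,
            "age": rng.randint(0, 99),
            # Placeholder codes, chosen only for illustration.
            "diagnosis_code": rng.choice(["A00", "B20", "C34"]),
        }
        for i in range(1, n + 1)
    ]


def to_csv(rows):
    """Serialize rows to CSV text, ready to be written to a file or uploaded."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = generate_patients(3)
    print(f"target dataset: {resolve_dataset()}")
    print(to_csv(rows), end="")
```

Seeding the generator keeps the synthetic data reproducible between runs, and resolving the dataset in one place means the same task code can run unchanged in development and production.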