This project built a straightforward, scalable data pipeline for healthcare data on Google Cloud, using BigQuery, dbt, and GitHub Actions for CI/CD. For a full breakdown of the process, challenges, and insights gained, see my article on Medium: https://medium.com/@jushijun/building-an-end-to-end-data-pipeline-for-healthcare-data-with-bigquery-dbt-and-github-ci-cd-8e772b01e318
GitHub: https://github.com/shj37/dbt-redshift-aws-banking-data-warehouse
Goal
Create a pipeline to ingest, transform, and deploy healthcare data for actionable insights.
Tools Used
- Google Cloud Platform (GCP)
- BigQuery
- Google Cloud Storage (GCS)
- dbt (data build tool)
- GitHub Actions
- Python
- SQL
How I Did It
- Set up GCP with a storage bucket and BigQuery datasets.
- Generated mock healthcare data and stored it in GCS.
- Built transformation models in dbt to clean and structure the data.
- Automated testing and deployment with GitHub Actions.
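The generate, clean, and test flow above can be sketched in plain Python with only the standard library. This is a minimal illustration under stated assumptions, not the project's code. The column names, the sample ICD-10 codes, and the functions `generate_mock_patients`, `stage_patients`, and `run_tests` are all hypothetical. In the project itself, the raw file would go to GCS and be loaded into BigQuery, the cleaning would live in dbt SQL models, and the checks would run in GitHub Actions. None of those pieces are shown here.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical schema for illustration; the project's real columns are not specified.
FIELDS = ["patient_id", "gender", "birth_date", "diagnosis_code", "visit_date"]
DIAGNOSES = ["E11.9", "I10", "J45.909"]  # sample ICD-10 codes, chosen for illustration


def generate_mock_patients(n, seed=42):
    """Return n synthetic patient rows (no real PHI); a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        birth = date(1940, 1, 1) + timedelta(days=rng.randint(0, 30000))  # born by early 2022
        visit = date(2023, 1, 1) + timedelta(days=rng.randint(0, 364))    # visit during 2023
        rows.append({
            "patient_id": f"P{i:05d}",
            "gender": rng.choice(["M", "F", "m", " f "]),  # deliberately messy raw values
            "birth_date": birth.isoformat(),
            "diagnosis_code": rng.choice(DIAGNOSES),
            "visit_date": visit.isoformat(),
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text, standing in for the raw file uploaded to storage."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def stage_patients(csv_text):
    """Staging-style cleanup: dedupe on patient_id, normalize gender, cast date strings."""
    seen = set()
    staged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["patient_id"] in seen:
            continue  # keep the first occurrence of each patient_id
        seen.add(row["patient_id"])
        staged.append({
            "patient_id": row["patient_id"],
            "gender": row["gender"].strip().upper(),
            "birth_date": date.fromisoformat(row["birth_date"]),
            "diagnosis_code": row["diagnosis_code"],
            "visit_date": date.fromisoformat(row["visit_date"]),
        })
    return staged


def run_tests(rows):
    """Checks in the spirit of dbt's unique / not_null / accepted_values tests; returns failures."""
    failures = []
    ids = [r["patient_id"] for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("unique: patient_id")
    if any(not r["patient_id"] for r in rows):
        failures.append("not_null: patient_id")
    if any(r["gender"] not in {"M", "F"} for r in rows):
        failures.append("accepted_values: gender")
    if any(r["visit_date"] < r["birth_date"] for r in rows):
        failures.append("visit_date before birth_date")
    return failures


if __name__ == "__main__":
    staged = stage_patients(to_csv(generate_mock_patients(100)))
    print(len(staged), run_tests(staged))  # → 100 []
```

In a CI job, a non-empty failure list would be treated as a failed step. This mirrors how `dbt test` exits with a non-zero status when a test fails, which lets a GitHub Actions workflow block a deployment.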
Results
- Delivered a working end-to-end pipeline that ingests mock healthcare data from GCS, transforms it with dbt in BigQuery, and deploys changes through GitHub Actions.
- Learned a ton about cloud data tools and automation.