
Automating Healthcare Data Pipelines with Airflow, BigQuery, and dbt

This project showcases a streamlined data pipeline built to handle healthcare data using modern data engineering tools. I used Apache Airflow (via Astronomer) to orchestrate workflows, Google BigQuery as a scalable data warehouse, and dbt for transformations and quality checks. Starting with synthetic data generation, the pipeline uploads data to Google Cloud Storage, creates external …
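The post walks through the actual DAG, but as a rough sketch of how this kind of orchestration can look, the following Airflow DAG generates a tiny synthetic file, uploads it to Google Cloud Storage, and then hands off to dbt. The bucket, paths, schedule, and dbt command are placeholders, not the project's real configuration.

```python
# Illustrative Airflow DAG: generate synthetic records, push them to GCS,
# then let dbt handle BigQuery transformations and tests. Names are placeholders.
import csv
import io
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from google.cloud import storage


def generate_and_upload(bucket_name: str, blob_path: str) -> None:
    """Write a small synthetic patients file and upload it to GCS."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["patient_id", "admitted_at", "diagnosis_code"])
    writer.writerow([1, "2024-01-01T08:00:00", "E11.9"])  # synthetic example row
    storage.Client().bucket(bucket_name).blob(blob_path).upload_from_string(
        buf.getvalue(), content_type="text/csv"
    )


with DAG(
    dag_id="healthcare_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    upload = PythonOperator(
        task_id="generate_and_upload",
        python_callable=generate_and_upload,
        op_kwargs={"bucket_name": "my-healthcare-bucket", "blob_path": "raw/patients.csv"},
    )
    run_dbt = BashOperator(
        task_id="run_dbt",
        # Hypothetical project path inside the Astronomer image.
        bash_command="cd /usr/local/airflow/dbt && dbt build --profiles-dir .",
    )
    upload >> run_dbt
```

In a setup like this, the BigQuery external table over the GCS files is typically defined before the dbt models query it; that step is left out of the sketch above.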

Building an End-to-End Data Pipeline for Healthcare Data with BigQuery, dbt, and GitHub CI/CD

This project focused on building a straightforward, scalable data pipeline for healthcare data using modern tools and cloud technology. For a full breakdown of the process, challenges, and insights gained, check out my article on Medium: https://medium.com/@jushijun/building-an-end-to-end-data-pipeline-for-healthcare-data-with-bigquery-dbt-and-github-ci-cd-8e772b01e318 GitHub: https://github.com/shj37/dbt-redshift-aws-banking-data-warehouse Goal: Create a pipeline to ingest, transform, and deploy healthcare data for actionable insights. Tools Used How …
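As a hedged illustration of the CI/CD piece, this is roughly what a GitHub Actions job would execute to validate the dbt models against BigQuery on each push; the project directory and target name are assumptions, not taken from the repo.

```python
# Rough Python equivalent of the CI steps a GitHub Actions job might run:
# install dbt packages, build the BigQuery models, and fail the job on errors.
# Paths and the profiles target are placeholders, not the repo's actual setup.
import subprocess
import sys

DBT_PROJECT_DIR = "dbt"   # hypothetical project folder
DBT_TARGET = "ci"         # hypothetical profiles.yml target used in CI

for args in (
    ["dbt", "deps", "--project-dir", DBT_PROJECT_DIR],
    ["dbt", "build", "--project-dir", DBT_PROJECT_DIR, "--target", DBT_TARGET],
):
    result = subprocess.run(args)
    if result.returncode != 0:
        # A non-zero exit code makes the CI job fail, blocking the merge.
        sys.exit(result.returncode)
```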

Building a Banking Customer Data Warehouse: An End-to-End Guide Using AWS and dbt

I built a data warehouse for banking customer data using AWS and dbt, documented step by step. The guide walks through schema design, cloud setup, and data transformation, perfect for anyone curious about practical data engineering. Check out the full article here: https://medium.com/@jushijun/building-a-banking-customer-data-warehouse-an-end-to-end-guide-using-aws-and-dbt-c058ebe7af35. GitHub: https://github.com/shj37/dbt-redshift-aws-banking-data-warehouse
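The article covers the full AWS setup; as one small, hypothetical illustration of the loading step (the repo name suggests Redshift as the warehouse), this sketch copies a customers file from S3 into a staging table via the Redshift Data API before dbt takes over. Cluster, database, role, and table names are illustrative only.

```python
# Minimal sketch of one warehouse-loading step: copy a customers file from S3
# into a Redshift staging table, then let dbt handle downstream transformations.
import boto3

redshift = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY staging.customers
    FROM 's3://banking-demo-bucket/raw/customers.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS CSV IGNOREHEADER 1;
"""

redshift.execute_statement(
    ClusterIdentifier="banking-dwh",   # hypothetical cluster name
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
```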

Implementing a Medallion Architecture Data Pipeline on Azure with Data Factory, Databricks, and dbt

In this project, I built a medallion architecture data pipeline on Azure, integrating Azure Data Factory, Databricks, and dbt to process and transform data across bronze, silver, and gold layers. It was a practical, real-world application of data engineering concepts, focusing on scalability and data quality. For a full breakdown of the process, challenges, and …
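To make the layer hand-offs concrete, here is a minimal PySpark sketch of a bronze-to-silver step as it might appear in one of the Databricks notebooks; the paths, table names, and cleansing rules are illustrative, not the project's actual code.

```python
# Sketch of a bronze-to-silver step as it might run in a Databricks notebook:
# read raw files landed by Data Factory, apply light cleansing, and write a
# Delta table that the silver/gold dbt models can build on.
# `spark` is the SparkSession provided by the Databricks runtime.
from pyspark.sql import functions as F

bronze_df = spark.read.format("json").load("/mnt/bronze/events/")  # raw landing zone

silver_df = (
    bronze_df
    .dropDuplicates(["event_id"])                      # basic data-quality step
    .withColumn("ingested_at", F.current_timestamp())  # audit/lineage column
)

(
    silver_df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("silver.events")   # hypothetical silver-layer table
)
```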

Building a Weather Data Pipeline with Apache Airflow, AWS, and Amazon RDS

Detailed walkthrough at Medium.com: https://medium.com/@jushijun/building-a-weather-data-pipeline-with-apache-airflow-aws-and-amazon-rds-fca4ab31540c In this project, I developed a fully automated weather data pipeline to streamline the ingestion, transformation, and storage of weather data. Using Apache Airflow for orchestration, I integrated the OpenWeatherMap API to extract real-time data, processed it with Python, and stored the results in a PostgreSQL database hosted on Amazon RDS. …
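For a sense of what the extract-and-load task does under Airflow, here is a standalone sketch that calls the OpenWeatherMap current-weather endpoint and writes one reading to Postgres on RDS; the table name, columns, and connection handling are assumptions for illustration, not the project's schema.

```python
# Standalone sketch of the extract-and-load step the Airflow task performs:
# pull current weather from OpenWeatherMap and insert it into Postgres on RDS.
import os

import psycopg2
import requests

API_KEY = os.environ["OPENWEATHER_API_KEY"]
CITY = "Toronto"  # example city

resp = requests.get(
    "https://api.openweathermap.org/data/2.5/weather",
    params={"q": CITY, "appid": API_KEY, "units": "metric"},
    timeout=10,
)
resp.raise_for_status()
payload = resp.json()

# Hypothetical table: weather_readings(city, temp_c, humidity, recorded_at).
conn = psycopg2.connect(os.environ["RDS_POSTGRES_DSN"])
with conn, conn.cursor() as cur:
    cur.execute(
        """
        INSERT INTO weather_readings (city, temp_c, humidity, recorded_at)
        VALUES (%s, %s, %s, to_timestamp(%s))
        """,
        (CITY, payload["main"]["temp"], payload["main"]["humidity"], payload["dt"]),
    )
conn.close()
```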

Building an Incremental Data Pipeline with dbt, Snowflake, and Amazon S3

Detailed walkthrough at Medium.com: https://medium.com/@jushijun/building-an-incremental-data-pipeline-with-dbt-snowflake-and-amazon-s3-e8bee58e69d7 In this project, I developed a scalable data pipeline integrating Amazon S3, Snowflake, and dbt to efficiently manage and transform order data. The pipeline generates synthetic order data, stores it in S3, stages it in Snowflake, and uses dbt to perform incremental loads based on CDC timestamps, ensuring optimal performance by …
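As a sketch of the pipeline's first stage, the snippet below synthesizes a batch of orders carrying an updated_at CDC timestamp and drops the file in S3 for Snowflake to stage; the bucket, key, and column names are placeholders.

```python
# Generate a small batch of synthetic orders with a CDC timestamp column and
# upload it to S3, where Snowflake stages it for dbt's incremental models.
import csv
import io
import random
import uuid
from datetime import datetime, timezone

import boto3

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["order_id", "customer_id", "amount", "updated_at"])
now = datetime.now(timezone.utc).isoformat()
for _ in range(100):
    writer.writerow(
        [uuid.uuid4(), random.randint(1, 500), round(random.uniform(5, 500), 2), now]
    )

boto3.client("s3").put_object(
    Bucket="orders-demo-bucket",        # hypothetical bucket
    Key=f"raw/orders/{now}.csv",
    Body=buf.getvalue().encode("utf-8"),
)
```

Downstream, an incremental dbt model typically keeps loads small by filtering on that column with a predicate along the lines of updated_at > (select max(updated_at) from {{ this }}), which is the CDC-timestamp pattern the summary above refers to.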

Real-Time Weather Data Streaming and Analysis on Azure

Project description in detail at Medium.com. This project showcases the development of an end-to-end real-time streaming data pipeline using Microsoft Azure services. The objective was to ingest weather data from a public API, process it in real time, and deliver actionable insights through visualizations and alerts, all while prioritizing cost efficiency and security. By leveraging a …
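The excerpt does not name the specific Azure services, so purely as an assumption, this sketch shows what the ingestion edge could look like with Azure Event Hubs: a small producer that pulls one reading from a public weather API and publishes it for real-time processing downstream. The hub name and the example API are placeholders.

```python
# Hypothetical ingestion step for the streaming pipeline: push a weather reading
# from a public API into Azure Event Hubs for downstream real-time processing.
# Event Hubs is assumed here; the post may use a different Azure ingestion service.
import json
import os

import requests
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION_STRING"],
    eventhub_name="weather-events",   # placeholder hub name
)

# Example public endpoint; the project's actual weather API may differ.
reading = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={"latitude": 43.65, "longitude": -79.38, "current_weather": "true"},
    timeout=10,
).json()

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))   # one event per reading
    producer.send_batch(batch)
```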