Shijun Ju’s Resume

jushijun@gmail.com ⋄ Barrie, ON ⋄ linkedin.com/in/shijunju

SUMMARY

Results-driven Data Scientist and AI Engineer with a Ph.D. in Economics and experience in AI, machine learning, and data analysis. Proficient in Python, with a strong background in statistical modeling. Proven ability to carry out complex AI projects and deliver solutions that drive business outcomes.

  • Keywords: LLM, Speech-to-text, Fine-tuning, LoRA, QLoRA, Economics

RELEVANT EXPERIENCE

KaggleX Fellowship: LLM Fine-tuning, PyTorch, Keras; Remote; Sep 2024 – Dec 2024 (Presentation Video; PPT with links to code and models; Project Details)

  • Selected as one of 81 fellowship recipients from over 3,000 applicants for the KaggleX fellowship program.
  • Final project, ReguGuard AI, was selected as one of 16 final showcases at the end-of-program celebration.
  • Developed ReguGuard AI, a chatbot for financial risk compliance, by fine-tuning large language models (Gemma-2b-en and Gemma-7b) with LoRA and QLoRA on GPU and TPU.
  • Achieved 78.6% accuracy with the fine-tuned model.
  • Adapted LlamaIndex's RAFTDatasetPack module and used the OpenAI API to generate over 14,000 question-answer pairs for training.
  • Tested a wide range of model parameter settings and evaluated their effect on performance metrics.

Skinopathy: Student AI Researcher; Speech-to-text model Fine-tuning, PyTorch, QLoRA; Remote; Sep 2024 – Dec 2024 LINK

  • Fine-tuned the OpenAI Whisper Large-v3 Turbo model to improve audio recognition of dermatology terminology using QLoRA.
  • Achieved a 33% relative reduction in Word Error Rate (WER), lowering it from 14.5% to 9.5%.
  • Cleaned and processed over 100 dermatology documents, which were used to synthesize more than 780 English audio samples with varied accents and genders using the Google Text-to-Speech API for training purposes.

Data Engineering Projects: Microsoft Fabric, Azure Data Factory; Canada; Aug 2024

  • Optimized an end-to-end data engineering pipeline leveraging Azure services LINK:
    • Designed and implemented ETL pipelines to ingest on-premises SQL Server data into Azure Data Lake via Azure Data Factory
    • Performed data transformations from the Bronze to the Gold layer with Azure Databricks
    • Loaded transformed data into Azure Synapse Analytics
    • Developed interactive dashboards and reports using Power BI
  • Developed a sentiment analysis pipeline leveraging Microsoft Fabric LINK:
    • Orchestrated data ingestion from the Bing API using Data Factory and stored the data in OneLake
    • Performed transformations with Synapse Data Engineering
    • Applied machine learning models for sentiment analysis via Synapse Data Science
    • Created interactive visualizations and reports using Power BI

AI Projects: AutoML, LLM, NLP; Canada; May 2024 – July 2024

  • Ranked in the Top 1% (15th out of 1,847) in the 2024 KaggleX Skill Assessment Challenge, showcasing exceptional data science skills.
  • Combined AutoML frameworks (AutoGluon, LightAutoML) to predict used car prices.
  • Executed comprehensive topic modeling on 3,000+ multiple-choice questions using BERTopic, providing insights into educational content trends.

3auk.com, LiuXueWangXiao.com: Founder; China; Jun 2021 – Aug 2023

  • Founded and led an online learning platform, enhancing user experience through data-driven design and interactive features.
  • Spearheaded the development of educational tools, fostering engagement and improving learning outcomes.

Educational Applications: Programmer; JS, Unity (C#); China; Jun 2019 – Aug 2021

  • Designed and implemented a Virtual Reality Classroom, integrating immersive technology to enhance educational experiences.
  • Developed cloud-based applications that personalized learning and improved efficiency, showcasing strong programming and analytical skills.

Web Scraping and Data Collection: Programmer; China; Jan 2020 – Feb 2020

  • Utilized Python (Scrapy) and MongoDB for data collection and transformation, demonstrating expertise in data engineering and management.

TECHNICAL SKILLS

  • Programming Languages: Python, PySpark, R, SQL, SAS, LaTeX, JavaScript
  • Data Analytics / AI Tools: AutoML (AutoGluon, LightAutoML, MLJAR), PyTorch, Keras, LoRA, QLoRA, BERTopic
  • Big Data, Cloud Technologies: Azure (AI Services, Data Factory, Data Lake, Databricks, Synapse Analytics, OpenAI, Power BI), AWS Machine Learning

EDUCATION

  • Graduate Certificate in Artificial Intelligence, Georgian College, Canada - Dec 2024 (expected)
  • Graduate Certificate in Marketing Analytics, Centennial College, Canada - May 2024
  • Ph.D. in Economics, University of Pittsburgh, United States - 2015
  • B.A. with Honorary M.A. in Economics, University of Edinburgh, Scotland - Jun 2006

PROFESSIONAL CERTIFICATIONS & TRAINING

  • Passed all three levels of Chartered Financial Analyst (CFA) exams

LANGUAGES

  • English (fluent), Mandarin Chinese (native)