jushijun@gmail.com ⋄ Barrie, ON ⋄ linkedin.com/in/shijunju
SUMMARY
Results-driven Data Scientist and AI Engineer with a Ph.D. in Economics and experience in AI, machine learning, and data analysis. Proficient in Python, with a strong background in statistical modeling and AI technologies. Proven ability to conduct complex AI projects and deliver innovative solutions that drive business outcomes.
- Keywords: LLM, Speech-to-text, Fine-tuning, LoRA, QLoRA, Economics
RELEVANT EXPERIENCE
KaggleX Fellowship: LLM Fine-tuning, PyTorch, Keras; Remote; Sep 2024 – Dec 2024; Presentation Video, PPT with links to code and models, Project Details
- Selected as one of 81 fellowship recipients from over 3,000 applicants for the KaggleX fellowship program.
- Final project, ReguGuard AI, was selected as one of 16 final showcases at the end-of-program celebration.
- Developed ReguGuard AI, a chatbot for financial risk compliance, by fine-tuning large language models (Gemma-2b-en and Gemma-7b) with LoRA and QLoRA on GPU and TPU.
- Achieved 78.6% accuracy with the fine-tuned model.
- Adapted LlamaIndex's RAFT DatasetPack module and used the OpenAI API to generate over 14,000 question-answer pairs for training.
- Tested model parameters extensively to understand their effect on performance metrics.
Skinopathy: Student AI Researcher; Speech-to-text model Fine-tuning, PyTorch, QLoRA; Remote; Sep 2024 – Dec 2024 LINK
- Fine-tuned the OpenAI Whisper Large-v3 Turbo model to improve audio recognition of dermatology terminology using QLoRA.
- Reduced Word Error Rate (WER) by about a third, from 14.5% to 9.5%.
- Cleaned and processed over 100 dermatology documents, which were used to synthesize more than 780 English audio samples with varied accents and genders using the Google Text-to-Speech API for training purposes.
Data Engineering Projects: Microsoft Fabric, Azure Data Factory; Canada; Aug 2024
- Optimized an end-to-end data engineering pipeline leveraging Azure services LINK:
  - Designed and implemented ETL pipelines to ingest on-premises SQL Server data into Azure Data Lake via Azure Data Factory.
  - Performed data transformations from Bronze to Gold layer with Azure Databricks.
  - Loaded transformed data into Azure Synapse Analytics.
  - Developed interactive dashboards and reports using Power BI.
- Developed a sentiment analysis pipeline leveraging Microsoft Fabric LINK:
  - Orchestrated data ingestion from the Bing API using Data Factory and stored data in OneLake.
  - Performed transformations with Synapse Data Engineering.
  - Applied machine learning models for sentiment analysis via Synapse Data Science.
  - Created interactive visualizations and reports using Power BI.
AI Projects: AutoML, LLM, NLP; Canada; May 2024 – Jul 2024
- Ranked in the Top 1% (15th out of 1,847) in the 2024 KaggleX Skill Assessment Challenge, showcasing exceptional data science skills.
- Successfully combined AutoML frameworks (AutoGluon, LightAutoML) to predict used car prices, demonstrating proficiency in automated machine learning techniques.
- Executed comprehensive topic modeling on 3,000+ multiple-choice questions using BERTopic, providing insights into educational content trends.
3auk.com, LiuXueWangXiao.com: Founder; China; Jun 2021 – Aug 2023
- Founded and led an online learning platform, enhancing user experience through data-driven design and interactive features.
- Spearheaded the development of educational tools, fostering engagement and improving learning outcomes.
Educational Applications: Programmer; JS, Unity (C#); China; Jun 2019 – Aug 2021
- Designed and implemented a Virtual Reality Classroom, integrating immersive technology to enhance educational experiences.
- Developed cloud-based applications that personalized learning and improved efficiency, showcasing strong programming and analytical skills.
Web Scraping and Data Collection: Programmer; China; Jan 2020 – Feb 2020
- Utilized Python (Scrapy) and MongoDB for data collection and transformation, demonstrating expertise in data engineering and management.
TECHNICAL SKILLS
- Programming Languages: Python, PySpark, R, SQL, SAS, LaTeX, JavaScript
- Data Analytics / AI Tools: AutoML (AutoGluon, LightAutoML, MLJAR), PyTorch, Keras, LoRA, QLoRA, BERTopic
- Big Data, Cloud Technologies: Azure (AI Services, Data Factory, Data Lake, Databricks, Synapse Analytics, OpenAI, Power BI), AWS Machine Learning
EDUCATION
- Graduate Certificate in Artificial Intelligence, Georgian College, Canada - Dec 2024 (expected)
- Graduate Certificate in Marketing Analytics, Centennial College, Canada - May 2024
- Ph.D. in Economics, University of Pittsburgh, United States - 2015
- B.A. with Honorary M.A. in Economics, University of Edinburgh, Scotland - Jun 2006
PROFESSIONAL CERTIFICATIONS & TRAINING
- Passed all three levels of Chartered Financial Analyst (CFA) exams
LANGUAGES
- English (fluent), Mandarin Chinese (native)