Columbia University - Mailman School of Public Health Expected May 2023
Master of Science in Biostatistics, Specialization in Data Science
Relevant Coursework: Introduction to Databases, Data Science (R), Biostatistical Methods
Shanghai University of Finance and Economics (SUFE) - School of Statistics and Management, China June 2021
Bachelor of Science in Statistics
Relevant Coursework: Data Mining (Python), Machine Learning, Computational Programming (C++), Mathematical Analysis, Probability, Advanced Mathematics, Regression Analysis (R, SAS), Random Process, Database Theory (SQL, C#), EDA (R)
PROFESSIONAL EXPERIENCE
Ping An Technology | Data Scientist Intern, Shanghai, China April 2021 – July 2021
Implemented and deployed data pipelines, data models, and ETL processes with PySpark in Apache Zeppelin
Optimized web crawler code and the feature engineering process, improving efficiency by 40%
Used LightGBM and XGBoost to predict whether customers would renew auto and health insurance policies (AUC 0.93); researched the feasibility of implementing a GNN-based anti-fraud model for auto insurance
Shengqu Games | Data Analyst Intern, Shanghai, China October 2020 – April 2021
Reduced reporting time from hours to seconds by using SQL and Python to query data and automate reporting
Researched user payment and churn issues and provided in-depth data reports to the game planning teams using Tableau and Plotly
Predicted user churn with AUC 0.92 using K-means, RFM analysis, and an ensemble of XGBoost and Random Forest
Hylink Digital Solution | Data Analyst Intern, Shanghai, China August 2020 – October 2020
Wrote Python code to update daily and weekly reports, and built a new Excel dashboard to present campaign performance to our client (PayPal)
Conducted exploratory data analysis (using Python and Tableau) on weekly campaign data from several media platforms, such as Baidu and Google, to provide business insights to our client