Applied machine learning

This project uses machine learning on the online traffic data of a WeChat official account to predict the account's net increase in followers and, in turn, to better understand user behavior. WeChat is one of the biggest social media platforms in China.

 

My approach

 

1. Data preparation

Download the raw data and split it into three parts: cross-validation data (70%), development data (20%), and test data (10%).

2. Baseline experiment

Run the linear regression algorithm with default settings on the dataset to establish the baseline performance.

3. Data exploration

Run a k-means clustering experiment to get a better sense of the structure of the data.

4. Feature engineering

Conduct several rounds of feature engineering and error analysis in LightSide to refine the feature space.

5. Tuning

Tune the SVM regression algorithm to get the final model.
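The five steps above can be sketched roughly as follows. This is a minimal illustration in Python with scikit-learn, which stands in for the LightSide workflow actually used in the project. The synthetic data, the number of clusters, and the parameter grid are all assumptions made for the example, and the feature engineering step is omitted because it was done interactively.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the traffic data (assumption: the real data is not shown here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.02 * X[:, 0] + rng.normal(scale=0.03, size=500)

# 1. Data preparation: 70% cross-validation, 20% development, 10% test.
n = len(X)
n_dev, n_test = n // 5, n // 10
X_cv, X_rest, y_cv, y_rest = train_test_split(
    X, y, test_size=n_dev + n_test, random_state=0
)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=n_test, random_state=0
)

# 2. Baseline: linear regression with default settings.
baseline = LinearRegression().fit(X_cv, y_cv)
baseline_mae = mean_absolute_error(y_dev, baseline.predict(X_dev))

# 3. Data exploration: k-means (3 clusters is an arbitrary choice for illustration).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_cv)
cluster_sizes = np.bincount(kmeans.labels_)

# 5. Tuning: grid search over SVM regression settings (hypothetical grid),
#    scored by cross-validated mean absolute error on the cross-validation set.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVR()),
    param_grid={"svr__C": [0.1, 1, 10], "svr__epsilon": [0.001, 0.01, 0.1]},
    scoring="neg_mean_absolute_error",
    cv=5,
).fit(X_cv, y_cv)
tuned_mae = mean_absolute_error(y_dev, search.predict(X_dev))

print("cluster sizes:", cluster_sizes)
print("baseline dev MAE:", baseline_mae)
print("tuned SVR dev MAE:", tuned_mae, "with", search.best_params_)
```

In this sketch the development set is used to compare models, and the test set is held back so that it can be scored once with the final model.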

 
The model improved significantly after feature engineering.


Result

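The final model improves modestly on the baseline in correlation coefficient, mean absolute error, and relative absolute error, while root mean squared error and root relative squared error are slightly worse.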

Metric                        Baseline    Final model
Correlation coefficient       0.2841      0.3009
Mean absolute error           0.0148      0.0143
Root mean squared error       0.0376      0.0379
Relative absolute error       81.0229%    78.6965%
Root relative squared error   95.931%     96.6811%
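For readers unfamiliar with the five metrics, the sketch below computes them in plain Python under their usual textbook definitions. It is an illustration, not the evaluation code used in the project: the function name `regression_report` and the sample values are hypothetical, and the relative errors here are measured against a predictor that always outputs the mean of the evaluated targets, whereas some tools use the training-set mean as the reference instead.

```python
import math


def regression_report(actual, predicted):
    """Return the five regression metrics for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Errors of the model.
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the reference predictor that always outputs mean_a.
    abs_base = sum(abs(a - mean_a) for a in actual)
    sq_base = sum((a - mean_a) ** 2 for a in actual)

    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(sq_base * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / abs_base,
        "rrse_percent": 100 * math.sqrt(sq_err / sq_base),
    }


# Hypothetical values, for illustration only.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Because the squared-error metrics weight large individual errors more heavily than the absolute-error metrics, a model can lower its mean absolute error while its root mean squared error rises slightly, which is the pattern in the table above.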
 

Reflection

From this project, I learned how to do error analysis and expand the feature space systematically. I also found that only informative data and meaningful features lead to high-quality models. This knowledge will help me design better-reasoned products in the future.