Running Head: YELP DATASET ANALYSIS
Yelp dataset analysis
Xiaoyu Chen
Wang Yang
University of Waterloo
Winter 2017
ECE 656 Database System
Abstract
The aim of this project is to analyze the Yelp dataset, which contains a large number of
records. To base the analysis on valid data, we first performed data cleaning, removing or
modifying records that did not conform to certain constraints. After importing the records into
the database, we added indexes in order to examine the difference between working with and
without an index. We then selected two topics to analyze. The first is to predict the rating that a
user would give to a business the user has never visited; for this, we analyzed the distribution of
users' ratings and the reasons behind those ratings. The second is to determine whether a
business's rating is declining or improving, and to identify the trend in how its ratings change.
Keywords: data cleaning, data indexing, Yelp, database
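As an illustration of the second topic, one simple way to judge whether a business's rating is declining or improving is to fit a straight line to its star ratings over time and look at the sign of the slope. The sketch below is only a minimal example under stated assumptions, not necessarily the method used in this project: the function name `rating_trend` and the input shape (a list of `(date, stars)` pairs) are hypothetical, and ordinary least squares is our choice for the illustration.

```python
from datetime import date


def rating_trend(reviews):
    """Return the least-squares slope of star ratings, in stars per day.

    `reviews` is a list of (date, stars) pairs (a hypothetical input shape).
    A negative slope suggests a declining rating, a positive one improvement.
    """
    if len(reviews) < 2:
        raise ValueError("need at least two reviews to estimate a trend")

    # Measure time as days elapsed since the earliest review.
    first_day = min(day for day, _ in reviews)
    xs = [(day - first_day).days for day, _ in reviews]
    ys = [float(stars) for _, stars in reviews]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least squares: slope = cov(x, y) / var(x).
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all reviews share the same date")
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        (date(2016, 1, 1), 5),
        (date(2016, 1, 11), 4),
        (date(2016, 1, 21), 3),
    ]
    slope = rating_trend(sample)
    print("declining" if slope < 0 else "improving or flat", slope)
```

A slope near zero on only a handful of reviews says little, so in practice one would probably also require a minimum number of reviews before labeling a business as declining or improving.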
Yelp dataset analysis
Introduction
The Yelp dataset is an online database provided by Yelp's official website that contains
information about businesses and their users. The dataset has five data types: businesses,
reviews, users, check-ins, and tips.
The businesses store the information of every