
Yelp Data Analysis with Python and MySQL (python和mysql实现Yelp数据分析.pdf)

Published: 2017-06-21, approx. 32,100 characters, 23 pages
Running Head: YELP DATASET ANALYSIS

Yelp dataset analysis
Xiaoyu Chen Wang Yang
University of Waterloo
Winter 2017
ECE 656 Database System

Abstract

The aim of this project is to analyze the Yelp dataset, which contains a large quantity of records. To base the analysis on valid data, we first performed data cleaning, removing or modifying the records that do not conform to certain constraints. After importing the records into the database, we also added indexes in order to find the difference between working with and without an index. We selected two topics to analyze. The first is to predict the rating a user would give to a business that the user has never visited; for this we analyzed the distribution of users' ratings and the reasons behind them. The second is to determine whether a business's rating is declining or improving, and to identify the trend in how its ratings change.

Keywords: data cleaning, data indexing, Yelp, database

Yelp dataset analysis

Introduction

The Yelp dataset is an online database, provided by the official website, that contains information about businesses and their users. The dataset has five different data types: businesses, reviews, users, checkins and tips. The businesses store the information of every