Codes added
commit
67d6328405
157
Datasets/ml-100k/README
Normal file
@ -0,0 +1,157 @@
SUMMARY & USAGE LICENSE
=============================================

MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.

This data set consists of:
    * 100,000 ratings (1-5) from 943 users on 1682 movies.
    * Each user has rated at least 20 movies.
    * Simple demographic info for the users (age, gender, occupation, zip)
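
As a rough worked example of what the counts above imply, the sketch
below computes the average number of ratings per user and the density
of the user-by-movie rating matrix. It uses only the three counts
listed above; the variable names are illustrative.

```python
# Counts taken from the summary above.
n_ratings = 100_000
n_users = 943
n_movies = 1682

# Average number of ratings per user (every user has at least 20).
ratings_per_user = n_ratings / n_users

# Fraction of all possible (user, movie) pairs that carry a rating.
density = n_ratings / (n_users * n_movies)

print(f"ratings per user: {ratings_per_user:.1f}")
print(f"matrix density:   {density:.2%}")
```

Roughly 6% of the possible user/movie pairs are rated, so most
methods applied to this data have to cope with a sparse matrix.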

The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998. This data has been cleaned up - users
who had fewer than 20 ratings or did not have complete demographic
information were removed from this data set. Detailed descriptions of
the data files can be found at the end of this file.

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set. The data set may be used for any research
purposes under the following conditions:

    * The user may not state or imply any endorsement from the
      University of Minnesota or the GroupLens Research Group.

    * The user must acknowledge the use of the data set in
      publications resulting from the use of the data set
      (see below for citation information).

    * The user may not redistribute the data without separate
      permission.

    * The user may not use this information for any commercial or
      revenue-bearing purposes without first obtaining permission
      from a faculty member of the GroupLens Research Project at the
      University of Minnesota.

If you have any further questions or comments, please contact GroupLens
<grouplens-info@cs.umn.edu>.

CITATION
==============================================

To acknowledge use of the dataset in publications, please cite the
following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872


ACKNOWLEDGEMENTS
==============================================

Thanks to Al Borchers for cleaning up this data and writing the
accompanying scripts.

PUBLISHED WORK THAT HAS USED THIS DATASET
==============================================

Herlocker, J., Konstan, J., Borchers, A., Riedl, J. An Algorithmic
Framework for Performing Collaborative Filtering. Proceedings of the
1999 Conference on Research and Development in Information
Retrieval. Aug. 1999.

FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
==============================================

The GroupLens Research Project is a research group in the Department
of Computer Science and Engineering at the University of Minnesota.
Members of the GroupLens Research Project are involved in many
research projects related to the fields of information filtering,
collaborative filtering, and recommender systems. The project is led
by professors John Riedl and Joseph Konstan. The project began to
explore automated collaborative filtering in 1992, but is best
known for its worldwide trial of an automated collaborative filtering
system for Usenet news in 1996. The technology developed in the
Usenet trial formed the base for the formation of Net Perceptions,
Inc., which was founded by members of GroupLens Research. Since then
the project has expanded its scope to research overall information
filtering solutions, integrating content-based methods as well as
improving current collaborative filtering technology.

Further information on the GroupLens Research project, including
research publications, can be found at the following web site:

        http://www.grouplens.org/

GroupLens Research currently operates a movie recommender based on
collaborative filtering:

        http://www.movielens.org/

DETAILED DESCRIPTIONS OF DATA FILES
==============================================

Here are brief descriptions of the data.

ml-data.tar.gz   -- Compressed tar file. To rebuild the u data files do this:
                gunzip ml-data.tar.gz
                tar xvf ml-data.tar
                mku.sh

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies. Users and items are
              numbered consecutively from 1. The data is randomly
              ordered. This is a tab separated list of
                 user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC
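
A minimal sketch of reading one u.data record as described above:
four tab-separated fields, with the last one interpreted as Unix
seconds in UTC. The names Rating and parse_rating are illustrative,
and the sample record is made up rather than copied from the file.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int
    timestamp: datetime


def parse_rating(line: str) -> Rating:
    """Parse one tab-separated u.data line into a Rating."""
    user_id, item_id, rating, ts = line.rstrip("\n").split("\t")
    # Time stamps are Unix seconds, so interpret them in UTC.
    when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    return Rating(int(user_id), int(item_id), int(rating), when)


# A made-up record in the u.data layout (not copied from the file).
example = parse_rating("1\t2\t3\t875000000\n")
print(example)
```

Passing an explicit UTC time zone keeps the result independent of
the local time zone of the machine that runs the code.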

u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a pipe ("|")
              separated list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres: a 1 indicates the movie
              is of that genre and a 0 indicates it is not; movies can be
              in several genres at once.
              The movie ids are the ones used in the u.data data set.
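
The genre flags described above can be decoded as in the sketch
below. It works on a record that has already been split into fields,
so it makes no assumption about the delimiter. The function name
movie_genres and the sample record are hypothetical; the genre order
is the one listed in the field description.

```python
from typing import List, Sequence

GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def movie_genres(fields: Sequence[str]) -> List[str]:
    """Return the genre names whose flag is "1" in a split u.item record."""
    flags = fields[-len(GENRES):]  # the last 19 fields
    if len(flags) != len(GENRES):
        raise ValueError("record has fewer than 19 genre fields")
    return [name for name, flag in zip(GENRES, flags) if flag == "1"]


# Hypothetical record: 5 leading fields followed by 19 genre flags.
record = ["0", "Example Movie", "01-Jan-1995", "", "http://example.invalid/"]
record += ["0"] * 19
record[5 + GENRES.index("Comedy")] = "1"
record[5 + GENRES.index("Romance")] = "1"
print(movie_genres(record))  # ['Comedy', 'Romance']
```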

u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a pipe
              ("|") separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.
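
A small sketch of parsing a u.user record. The u.user file added in
this commit separates fields with "|" (for example
1|24|M|technician|85711), so that is the default delimiter here.
The names User and parse_user are illustrative.

```python
from typing import NamedTuple


class User(NamedTuple):
    user_id: int
    age: int
    gender: str
    occupation: str
    zip_code: str  # kept as text: values such as "05201" and "T8H1N" occur


def parse_user(line: str, sep: str = "|") -> User:
    """Parse one u.user record; `sep` is the field delimiter."""
    user_id, age, gender, occupation, zip_code = line.rstrip("\n").split(sep)
    return User(int(user_id), int(age), gender, occupation, zip_code)


# Two records taken from the u.user file.
print(parse_user("8|36|M|administrator|05201"))
print(parse_user("74|39|M|scientist|T8H1N"))
```

Keeping the zip code as a string preserves leading zeros and the
non-numeric postal codes that appear in the file.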

u.occupation -- A list of the occupations.

u1.base    -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test       are 80%/20% splits of the u data into training and test data.
u2.base       The test sets of u1, ..., u5 are disjoint; this is for
u2.test       5-fold cross validation (where you repeat your experiment
u3.base       with each training and test set and average the results).
u3.test       These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test
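
The 80%/20% splits can be sketched in Python as below. This mirrors
what the mku.sh script in this commit does with head and tail (each
test set is one contiguous fifth of the input, the training set is
the rest), minus the final sort. The function name five_fold_splits
is hypothetical, and the sketch assumes the row count divides evenly.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def five_fold_splits(rows: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; the test sets are disjoint contiguous blocks."""
    if len(rows) % k != 0:
        raise ValueError("this sketch assumes the row count is divisible by k")
    fold = len(rows) // k
    splits = []
    for i in range(k):
        test = list(rows[i * fold:(i + 1) * fold])
        train = list(rows[:i * fold]) + list(rows[(i + 1) * fold:])
        splits.append((train, test))
    return splits


rows = list(range(10))  # stand-in for the 100000 lines of u.data
splits = five_fold_splits(rows)
for train, test in splits:
    print(len(train), len(test))  # prints "8 2" five times
```

For cross validation, an experiment would be run once per
(train, test) pair and the five results averaged.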

ua.base    -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test       split the u data into a training set and a test set with
ub.base       exactly 10 ratings per user in the test set. The sets
ub.test       ua.test and ub.test are disjoint. These data sets can
              be generated from u.data by mku.sh.

allbut.pl  -- The script that generates training and test sets where
              all but n of a user's ratings are in the training data.
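
The selection rule of allbut.pl (the Perl script in this commit) can
be restated in Python roughly as follows: for each user, the start-th
through stop-th ratings in input order go to the test set, up to an
optional overall limit, and everything else goes to the training
set. The function name allbut_split is hypothetical and the input
lines are toy data.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def allbut_split(lines: Iterable[str], start: int, stop: int,
                 max_test: int = 0) -> Tuple[List[str], List[str]]:
    """Return (base, test): each user's start-th..stop-th ratings go to test."""
    seen = defaultdict(int)  # ratings seen so far, per user
    base, test = [], []
    for line in lines:
        user = line.split()[0]  # the user id is the first field
        seen[user] += 1
        # max_test <= 0 means "no limit", as in the Perl script.
        under_limit = max_test <= 0 or len(test) < max_test
        if under_limit and start <= seen[user] <= stop:
            test.append(line)
        else:
            base.append(line)
    return base, test


# Toy input: user 1 has three ratings, user 2 has two.
lines = ["1\t10\t5\t0", "1\t11\t4\t0", "2\t10\t3\t0", "1\t12\t2\t0", "2\t13\t1\t0"]
base, test = allbut_split(lines, start=1, stop=2)
print(test)  # the first two ratings of each user
print(base)  # everything else
```

With start=1, stop=10 this corresponds to the ua split, and with
start=11, stop=20 to the ub split, which is why the two test sets
do not overlap.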

mku.sh     -- A shell script to generate all the u data sets from u.data.
34
Datasets/ml-100k/allbut.pl
Normal file
@ -0,0 +1,34 @@
#!/usr/local/bin/perl

# Split a ratings file into <base_name>.base (training) and
# <base_name>.test (test). For each user, the start-th through stop-th
# ratings in input order go to the test file; all other ratings go to
# the base file. At most max_test ratings are written to the test file;
# a max_test of 0 or less means no limit.

# get args (all four are required, as the usage message states)
if (@ARGV < 4) {
    print STDERR "Usage: $0 base_name start stop max_test [ratings ...]\n";
    exit 1;
}
$basename = shift;
$start = shift;
$stop = shift;
$maxtest = shift;

# open files
open( TESTFILE, ">$basename.test" ) or die "Cannot open $basename.test for writing\n";
open( BASEFILE, ">$basename.base" ) or die "Cannot open $basename.base for writing\n";

# init variables
$testcnt = 0;

while (<>) {
    # the user id is the first whitespace-separated field
    ($user) = split;
    if (! defined $ratingcnt{$user}) {
        $ratingcnt{$user} = 0;
    }
    ++$ratingcnt{$user};
    if (($testcnt < $maxtest || $maxtest <= 0)
        && $ratingcnt{$user} >= $start && $ratingcnt{$user} <= $stop) {
        ++$testcnt;
        print TESTFILE;
    }
    else {
        print BASEFILE;
    }
}
25
Datasets/ml-100k/mku.sh
Normal file
@ -0,0 +1,25 @@
#!/bin/sh

# u.data is tab separated, so sort on tab-delimited fields
TAB=`printf '\t'`

# remove the temporary file and exit if interrupted (HUP, INT, TERM)
trap 'rm -f tmp.$$; exit 1' 1 2 15

# u1..u5: the i-th block of 20000 ratings is the test set,
# the remaining 80000 ratings are the training set
for i in 1 2 3 4 5
do
	head -n `expr $i \* 20000` u.data | tail -n 20000 > tmp.$$
	sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.test
	head -n `expr \( $i - 1 \) \* 20000` u.data > tmp.$$
	tail -n `expr \( 5 - $i \) \* 20000` u.data >> tmp.$$
	sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.base
done

# ua: each user's first 10 ratings form the test set
# (run through perl so the script need not be executable or on PATH)
perl allbut.pl ua 1 10 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ua.base > tmp.$$
mv tmp.$$ ua.base
sort -t"$TAB" -k 1,1n -k 2,2n ua.test > tmp.$$
mv tmp.$$ ua.test

# ub: each user's 11th through 20th ratings form the test set
perl allbut.pl ub 11 20 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ub.base > tmp.$$
mv tmp.$$ ub.base
sort -t"$TAB" -k 1,1n -k 2,2n ub.test > tmp.$$
mv tmp.$$ ub.test
1683
Datasets/ml-100k/movies.csv
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/test.csv
Normal file
File diff suppressed because it is too large
80000
Datasets/ml-100k/train.csv
Normal file
File diff suppressed because it is too large
100000
Datasets/ml-100k/u.data
Normal file
File diff suppressed because it is too large
20
Datasets/ml-100k/u.genre
Normal file
@ -0,0 +1,20 @@
unknown|0
Action|1
Adventure|2
Animation|3
Children's|4
Comedy|5
Crime|6
Documentary|7
Drama|8
Fantasy|9
Film-Noir|10
Horror|11
Musical|12
Mystery|13
Romance|14
Sci-Fi|15
Thriller|16
War|17
Western|18
3
Datasets/ml-100k/u.info
Normal file
@ -0,0 +1,3 @@
943 users
1682 items
100000 ratings
1682
Datasets/ml-100k/u.item
Normal file
File diff suppressed because it is too large
21
Datasets/ml-100k/u.occupation
Normal file
@ -0,0 +1,21 @@
administrator
artist
doctor
educator
engineer
entertainment
executive
healthcare
homemaker
lawyer
librarian
marketing
none
other
programmer
retired
salesman
scientist
student
technician
writer
943
Datasets/ml-100k/u.user
Normal file
@ -0,0 +1,943 @@
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
9|29|M|student|01002
10|53|M|lawyer|90703
11|39|F|other|30329
12|28|F|other|06405
13|47|M|educator|29206
14|45|M|scientist|55106
15|49|F|educator|97301
16|21|M|entertainment|10309
17|30|M|programmer|06355
18|35|F|other|37212
19|40|M|librarian|02138
20|42|F|homemaker|95660
21|26|M|writer|30068
22|25|M|writer|40206
23|30|F|artist|48197
24|21|F|artist|94533
25|39|M|engineer|55107
26|49|M|engineer|21044
27|40|F|librarian|30030
28|32|M|writer|55369
29|41|M|programmer|94043
30|7|M|student|55436
31|24|M|artist|10003
32|28|F|student|78741
33|23|M|student|27510
34|38|F|administrator|42141
35|20|F|homemaker|42459
36|19|F|student|93117
37|23|M|student|55105
38|28|F|other|54467
39|41|M|entertainment|01040
40|38|M|scientist|27514
41|33|M|engineer|80525
42|30|M|administrator|17870
43|29|F|librarian|20854
44|26|M|technician|46260
45|29|M|programmer|50233
46|27|F|marketing|46538
47|53|M|marketing|07102
48|45|M|administrator|12550
49|23|F|student|76111
50|21|M|writer|52245
51|28|M|educator|16509
52|18|F|student|55105
53|26|M|programmer|55414
54|22|M|executive|66315
55|37|M|programmer|01331
56|25|M|librarian|46260
57|16|M|none|84010
58|27|M|programmer|52246
59|49|M|educator|08403
60|50|M|healthcare|06472
61|36|M|engineer|30040
62|27|F|administrator|97214
63|31|M|marketing|75240
64|32|M|educator|43202
65|51|F|educator|48118
66|23|M|student|80521
67|17|M|student|60402
68|19|M|student|22904
69|24|M|engineer|55337
70|27|M|engineer|60067
71|39|M|scientist|98034
72|48|F|administrator|73034
73|24|M|student|41850
74|39|M|scientist|T8H1N
75|24|M|entertainment|08816
76|20|M|student|02215
77|30|M|technician|29379
78|26|M|administrator|61801
79|39|F|administrator|03755
80|34|F|administrator|52241
81|21|M|student|21218
82|50|M|programmer|22902
83|40|M|other|44133
84|32|M|executive|55369
85|51|M|educator|20003
86|26|M|administrator|46005
87|47|M|administrator|89503
88|49|F|librarian|11701
89|43|F|administrator|68106
90|60|M|educator|78155
91|55|M|marketing|01913
92|32|M|entertainment|80525
93|48|M|executive|23112
94|26|M|student|71457
95|31|M|administrator|10707
96|25|F|artist|75206
97|43|M|artist|98006
98|49|F|executive|90291
99|20|M|student|63129
100|36|M|executive|90254
101|15|M|student|05146
102|38|M|programmer|30220
103|26|M|student|55108
104|27|M|student|55108
105|24|M|engineer|94043
106|61|M|retired|55125
107|39|M|scientist|60466
108|44|M|educator|63130
109|29|M|other|55423
110|19|M|student|77840
111|57|M|engineer|90630
112|30|M|salesman|60613
113|47|M|executive|95032
114|27|M|programmer|75013
115|31|M|engineer|17110
116|40|M|healthcare|97232
117|20|M|student|16125
118|21|M|administrator|90210
119|32|M|programmer|67401
120|47|F|other|06260
121|54|M|librarian|99603
122|32|F|writer|22206
123|48|F|artist|20008
124|34|M|student|60615
125|30|M|lawyer|22202
126|28|F|lawyer|20015
127|33|M|none|73439
128|24|F|marketing|20009
129|36|F|marketing|07039
130|20|M|none|60115
131|59|F|administrator|15237
132|24|M|other|94612
133|53|M|engineer|78602
134|31|M|programmer|80236
135|23|M|student|38401
136|51|M|other|97365
137|50|M|educator|84408
138|46|M|doctor|53211
139|20|M|student|08904
140|30|F|student|32250
141|49|M|programmer|36117
142|13|M|other|48118
143|42|M|technician|08832
144|53|M|programmer|20910
145|31|M|entertainment|V3N4P
146|45|M|artist|83814
147|40|F|librarian|02143
148|33|M|engineer|97006
149|35|F|marketing|17325
150|20|F|artist|02139
151|38|F|administrator|48103
152|33|F|educator|68767
153|25|M|student|60641
154|25|M|student|53703
155|32|F|other|11217
156|25|M|educator|08360
157|57|M|engineer|70808
158|50|M|educator|27606
159|23|F|student|55346
160|27|M|programmer|66215
161|50|M|lawyer|55104
162|25|M|artist|15610
163|49|M|administrator|97212
164|47|M|healthcare|80123
165|20|F|other|53715
166|47|M|educator|55113
167|37|M|other|L9G2B
168|48|M|other|80127
169|52|F|other|53705
170|53|F|healthcare|30067
171|48|F|educator|78750
172|55|M|marketing|22207
173|56|M|other|22306
174|30|F|administrator|52302
175|26|F|scientist|21911
176|28|M|scientist|07030
177|20|M|programmer|19104
178|26|M|other|49512
179|15|M|entertainment|20755
180|22|F|administrator|60202
181|26|M|executive|21218
182|36|M|programmer|33884
183|33|M|scientist|27708
184|37|M|librarian|76013
185|53|F|librarian|97403
186|39|F|executive|00000
187|26|M|educator|16801
188|42|M|student|29440
189|32|M|artist|95014
190|30|M|administrator|95938
191|33|M|administrator|95161
192|42|M|educator|90840
193|29|M|student|49931
194|38|M|administrator|02154
195|42|M|scientist|93555
196|49|M|writer|55105
197|55|M|technician|75094
198|21|F|student|55414
199|30|M|writer|17604
200|40|M|programmer|93402
201|27|M|writer|E2A4H
202|41|F|educator|60201
203|25|F|student|32301
204|52|F|librarian|10960
205|47|M|lawyer|06371
206|14|F|student|53115
207|39|M|marketing|92037
208|43|M|engineer|01720
209|33|F|educator|85710
210|39|M|engineer|03060
211|66|M|salesman|32605
212|49|F|educator|61401
213|33|M|executive|55345
214|26|F|librarian|11231
215|35|M|programmer|63033
216|22|M|engineer|02215
217|22|M|other|11727
218|37|M|administrator|06513
219|32|M|programmer|43212
220|30|M|librarian|78205
221|19|M|student|20685
222|29|M|programmer|27502
223|19|F|student|47906
224|31|F|educator|43512
225|51|F|administrator|58202
226|28|M|student|92103
227|46|M|executive|60659
228|21|F|student|22003
229|29|F|librarian|22903
230|28|F|student|14476
231|48|M|librarian|01080
232|45|M|scientist|99709
233|38|M|engineer|98682
234|60|M|retired|94702
235|37|M|educator|22973
236|44|F|writer|53214
237|49|M|administrator|63146
238|42|F|administrator|44124
239|39|M|artist|95628
240|23|F|educator|20784
241|26|F|student|20001
242|33|M|educator|31404
243|33|M|educator|60201
244|28|M|technician|80525
245|22|M|student|55109
246|19|M|student|28734
247|28|M|engineer|20770
248|25|M|student|37235
249|25|M|student|84103
250|29|M|executive|95110
251|28|M|doctor|85032
252|42|M|engineer|07733
253|26|F|librarian|22903
254|44|M|educator|42647
255|23|M|entertainment|07029
256|35|F|none|39042
257|17|M|student|77005
258|19|F|student|77801
259|21|M|student|48823
260|40|F|artist|89801
261|28|M|administrator|85202
262|19|F|student|78264
263|41|M|programmer|55346
264|36|F|writer|90064
265|26|M|executive|84601
266|62|F|administrator|78756
267|23|M|engineer|83716
268|24|M|engineer|19422
269|31|F|librarian|43201
270|18|F|student|63119
271|51|M|engineer|22932
272|33|M|scientist|53706
273|50|F|other|10016
274|20|F|student|55414
275|38|M|engineer|92064
276|21|M|student|95064
277|35|F|administrator|55406
278|37|F|librarian|30033
279|33|M|programmer|85251
280|30|F|librarian|22903
281|15|F|student|06059
282|22|M|administrator|20057
283|28|M|programmer|55305
284|40|M|executive|92629
285|25|M|programmer|53713
286|27|M|student|15217
287|21|M|salesman|31211
288|34|M|marketing|23226
289|11|M|none|94619
290|40|M|engineer|93550
291|19|M|student|44106
292|35|F|programmer|94703
293|24|M|writer|60804
294|34|M|technician|92110
295|31|M|educator|50325
296|43|F|administrator|16803
297|29|F|educator|98103
298|44|M|executive|01581
299|29|M|doctor|63108
300|26|F|programmer|55106
301|24|M|student|55439
302|42|M|educator|77904
303|19|M|student|14853
304|22|F|student|71701
305|23|M|programmer|94086
306|45|M|other|73132
307|25|M|student|55454
308|60|M|retired|95076
309|40|M|scientist|70802
310|37|M|educator|91711
311|32|M|technician|73071
312|48|M|other|02110
313|41|M|marketing|60035
314|20|F|student|08043
315|31|M|educator|18301
316|43|F|other|77009
317|22|M|administrator|13210
318|65|M|retired|06518
319|38|M|programmer|22030
320|19|M|student|24060
321|49|F|educator|55413
322|20|M|student|50613
323|21|M|student|19149
324|21|F|student|02176
325|48|M|technician|02139
326|41|M|administrator|15235
327|22|M|student|11101
328|51|M|administrator|06779
329|48|M|educator|01720
330|35|F|educator|33884
331|33|M|entertainment|91344
332|20|M|student|40504
333|47|M|other|V0R2M
334|32|M|librarian|30002
335|45|M|executive|33775
336|23|M|salesman|42101
337|37|M|scientist|10522
338|39|F|librarian|59717
339|35|M|lawyer|37901
340|46|M|engineer|80123
341|17|F|student|44405
342|25|F|other|98006
343|43|M|engineer|30093
344|30|F|librarian|94117
345|28|F|librarian|94143
346|34|M|other|76059
347|18|M|student|90210
348|24|F|student|45660
349|68|M|retired|61455
350|32|M|student|97301
351|61|M|educator|49938
352|37|F|programmer|55105
353|25|M|scientist|28480
354|29|F|librarian|48197
355|25|M|student|60135
356|32|F|homemaker|92688
357|26|M|executive|98133
358|40|M|educator|10022
359|22|M|student|61801
360|51|M|other|98027
361|22|M|student|44074
362|35|F|homemaker|85233
363|20|M|student|87501
364|63|M|engineer|01810
365|29|M|lawyer|20009
366|20|F|student|50670
367|17|M|student|37411
368|18|M|student|92113
369|24|M|student|91335
370|52|M|writer|08534
371|36|M|engineer|99206
372|25|F|student|66046
373|24|F|other|55116
374|36|M|executive|78746
375|17|M|entertainment|37777
376|28|F|other|10010
377|22|M|student|18015
378|35|M|student|02859
379|44|M|programmer|98117
380|32|M|engineer|55117
381|33|M|artist|94608
382|45|M|engineer|01824
383|42|M|administrator|75204
384|52|M|programmer|45218
385|36|M|writer|10003
386|36|M|salesman|43221
387|33|M|entertainment|37412
388|31|M|other|36106
389|44|F|writer|83702
390|42|F|writer|85016
391|23|M|student|84604
392|52|M|writer|59801
393|19|M|student|83686
394|25|M|administrator|96819
395|43|M|other|44092
396|57|M|engineer|94551
397|17|M|student|27514
398|40|M|other|60008
399|25|M|other|92374
400|33|F|administrator|78213
401|46|F|healthcare|84107
402|30|M|engineer|95129
403|37|M|other|06811
404|29|F|programmer|55108
405|22|F|healthcare|10019
406|52|M|educator|93109
407|29|M|engineer|03261
408|23|M|student|61755
409|48|M|administrator|98225
410|30|F|artist|94025
411|34|M|educator|44691
412|25|M|educator|15222
413|55|M|educator|78212
414|24|M|programmer|38115
415|39|M|educator|85711
416|20|F|student|92626
417|27|F|other|48103
418|55|F|none|21206
419|37|M|lawyer|43215
420|53|M|educator|02140
421|38|F|programmer|55105
422|26|M|entertainment|94533
423|64|M|other|91606
424|36|F|marketing|55422
425|19|M|student|58644
426|55|M|educator|01602
427|51|M|doctor|85258
428|28|M|student|55414
429|27|M|student|29205
430|38|M|scientist|98199
431|24|M|marketing|92629
432|22|M|entertainment|50311
433|27|M|artist|11211
434|16|F|student|49705
435|24|M|engineer|60007
436|30|F|administrator|17345
437|27|F|other|20009
438|51|F|administrator|43204
439|23|F|administrator|20817
440|30|M|other|48076
441|50|M|technician|55013
442|22|M|student|85282
443|35|M|salesman|33308
444|51|F|lawyer|53202
445|21|M|writer|92653
446|57|M|educator|60201
447|30|M|administrator|55113
448|23|M|entertainment|10021
449|23|M|librarian|55021
450|35|F|educator|11758
451|16|M|student|48446
452|35|M|administrator|28018
453|18|M|student|06333
454|57|M|other|97330
455|48|M|administrator|83709
456|24|M|technician|31820
457|33|F|salesman|30011
458|47|M|technician|Y1A6B
459|22|M|student|29201
460|44|F|other|60630
461|15|M|student|98102
462|19|F|student|02918
463|48|F|healthcare|75218
464|60|M|writer|94583
465|32|M|other|05001
466|22|M|student|90804
467|29|M|engineer|91201
468|28|M|engineer|02341
469|60|M|educator|78628
470|24|M|programmer|10021
471|10|M|student|77459
472|24|M|student|87544
473|29|M|student|94708
474|51|M|executive|93711
475|30|M|programmer|75230
476|28|M|student|60440
477|23|F|student|02125
478|29|M|other|10019
479|30|M|educator|55409
480|57|M|retired|98257
481|73|M|retired|37771
482|18|F|student|40256
483|29|M|scientist|43212
484|27|M|student|21208
485|44|F|educator|95821
486|39|M|educator|93101
487|22|M|engineer|92121
488|48|M|technician|21012
489|55|M|other|45218
490|29|F|artist|V5A2B
491|43|F|writer|53711
492|57|M|educator|94618
493|22|M|engineer|60090
494|38|F|administrator|49428
495|29|M|engineer|03052
496|21|F|student|55414
497|20|M|student|50112
498|26|M|writer|55408
499|42|M|programmer|75006
500|28|M|administrator|94305
501|22|M|student|10025
502|22|M|student|23092
503|50|F|writer|27514
504|40|F|writer|92115
505|27|F|other|20657
506|46|M|programmer|03869
507|18|F|writer|28450
508|27|M|marketing|19382
509|23|M|administrator|10011
510|34|M|other|98038
511|22|M|student|21250
512|29|M|other|20090
513|43|M|administrator|26241
514|27|M|programmer|20707
515|53|M|marketing|49508
516|53|F|librarian|10021
517|24|M|student|55454
518|49|F|writer|99709
519|22|M|other|55320
520|62|M|healthcare|12603
521|19|M|student|02146
522|36|M|engineer|55443
523|50|F|administrator|04102
524|56|M|educator|02159
525|27|F|administrator|19711
526|30|M|marketing|97124
527|33|M|librarian|12180
528|18|M|student|55104
529|47|F|administrator|44224
530|29|M|engineer|94040
531|30|F|salesman|97408
532|20|M|student|92705
533|43|M|librarian|02324
534|20|M|student|05464
535|45|F|educator|80302
536|38|M|engineer|30078
537|36|M|engineer|22902
538|31|M|scientist|21010
539|53|F|administrator|80303
540|28|M|engineer|91201
541|19|F|student|84302
542|21|M|student|60515
543|33|M|scientist|95123
544|44|F|other|29464
545|27|M|technician|08052
546|36|M|executive|22911
547|50|M|educator|14534
548|51|M|writer|95468
549|42|M|scientist|45680
550|16|F|student|95453
551|25|M|programmer|55414
552|45|M|other|68147
553|58|M|educator|62901
554|32|M|scientist|62901
555|29|F|educator|23227
556|35|F|educator|30606
557|30|F|writer|11217
558|56|F|writer|63132
559|69|M|executive|10022
560|32|M|student|10003
561|23|M|engineer|60005
562|54|F|administrator|20879
563|39|F|librarian|32707
564|65|M|retired|94591
565|40|M|student|55422
566|20|M|student|14627
567|24|M|entertainment|10003
568|39|M|educator|01915
569|34|M|educator|91903
570|26|M|educator|14627
571|34|M|artist|01945
572|51|M|educator|20003
573|68|M|retired|48911
574|56|M|educator|53188
575|33|M|marketing|46032
576|48|M|executive|98281
577|36|F|student|77845
578|31|M|administrator|M7A1A
579|32|M|educator|48103
580|16|M|student|17961
581|37|M|other|94131
582|17|M|student|93003
583|44|M|engineer|29631
584|25|M|student|27511
585|69|M|librarian|98501
586|20|M|student|79508
587|26|M|other|14216
588|18|F|student|93063
589|21|M|lawyer|90034
590|50|M|educator|82435
591|57|F|librarian|92093
592|18|M|student|97520
593|31|F|educator|68767
594|46|M|educator|M4J2K
595|25|M|programmer|31909
596|20|M|artist|77073
597|23|M|other|84116
598|40|F|marketing|43085
599|22|F|student|R3T5K
600|34|M|programmer|02320
601|19|F|artist|99687
602|47|F|other|34656
603|21|M|programmer|47905
604|39|M|educator|11787
605|33|M|engineer|33716
606|28|M|programmer|63044
607|49|F|healthcare|02154
608|22|M|other|10003
609|13|F|student|55106
610|22|M|student|21227
611|46|M|librarian|77008
612|36|M|educator|79070
613|37|F|marketing|29678
614|54|M|educator|80227
615|38|M|educator|27705
616|55|M|scientist|50613
617|27|F|writer|11201
618|15|F|student|44212
619|17|M|student|44134
620|18|F|writer|81648
621|17|M|student|60402
622|25|M|programmer|14850
623|50|F|educator|60187
624|19|M|student|30067
625|27|M|programmer|20723
626|23|M|scientist|19807
627|24|M|engineer|08034
628|13|M|none|94306
629|46|F|other|44224
630|26|F|healthcare|55408
631|18|F|student|38866
632|18|M|student|55454
633|35|M|programmer|55414
634|39|M|engineer|T8H1N
635|22|M|other|23237
636|47|M|educator|48043
637|30|M|other|74101
638|45|M|engineer|01940
639|42|F|librarian|12065
640|20|M|student|61801
641|24|M|student|60626
642|18|F|student|95521
643|39|M|scientist|55122
644|51|M|retired|63645
645|27|M|programmer|53211
646|17|F|student|51250
647|40|M|educator|45810
648|43|M|engineer|91351
649|20|M|student|39762
650|42|M|engineer|83814
651|65|M|retired|02903
652|35|M|other|22911
653|31|M|executive|55105
654|27|F|student|78739
655|50|F|healthcare|60657
656|48|M|educator|10314
657|26|F|none|78704
658|33|M|programmer|92626
659|31|M|educator|54248
660|26|M|student|77380
661|28|M|programmer|98121
662|55|M|librarian|19102
663|26|M|other|19341
664|30|M|engineer|94115
665|25|M|administrator|55412
666|44|M|administrator|61820
667|35|M|librarian|01970
668|29|F|writer|10016
669|37|M|other|20009
670|30|M|technician|21114
671|21|M|programmer|91919
672|54|F|administrator|90095
673|51|M|educator|22906
674|13|F|student|55337
675|34|M|other|28814
676|30|M|programmer|32712
677|20|M|other|99835
678|50|M|educator|61462
679|20|F|student|54302
680|33|M|lawyer|90405
681|44|F|marketing|97208
682|23|M|programmer|55128
683|42|M|librarian|23509
684|28|M|student|55414
685|32|F|librarian|55409
686|32|M|educator|26506
687|31|F|healthcare|27713
688|37|F|administrator|60476
689|25|M|other|45439
690|35|M|salesman|63304
691|34|M|educator|60089
692|34|M|engineer|18053
693|43|F|healthcare|85210
694|60|M|programmer|06365
695|26|M|writer|38115
696|55|M|other|94920
697|25|M|other|77042
698|28|F|programmer|06906
699|44|M|other|96754
700|17|M|student|76309
701|51|F|librarian|56321
702|37|M|other|89104
703|26|M|educator|49512
704|51|F|librarian|91105
705|21|F|student|54494
706|23|M|student|55454
707|56|F|librarian|19146
708|26|F|homemaker|96349
709|21|M|other|N4T1A
710|19|M|student|92020
711|22|F|student|15203
712|22|F|student|54901
713|42|F|other|07204
714|26|M|engineer|55343
715|21|M|technician|91206
716|36|F|administrator|44265
717|24|M|technician|84105
718|42|M|technician|64118
719|37|F|other|V0R2H
720|49|F|administrator|16506
721|24|F|entertainment|11238
722|50|F|homemaker|17331
723|26|M|executive|94403
724|31|M|executive|40243
725|21|M|student|91711
726|25|F|administrator|80538
727|25|M|student|78741
728|58|M|executive|94306
729|19|M|student|56567
730|31|F|scientist|32114
731|41|F|educator|70403
732|28|F|other|98405
733|44|F|other|60630
734|25|F|other|63108
735|29|F|healthcare|85719
|
||||
736|48|F|writer|94618
|
||||
737|30|M|programmer|98072
|
||||
738|35|M|technician|95403
|
||||
739|35|M|technician|73162
|
||||
740|25|F|educator|22206
|
||||
741|25|M|writer|63108
|
||||
742|35|M|student|29210
|
||||
743|31|M|programmer|92660
|
||||
744|35|M|marketing|47024
|
||||
745|42|M|writer|55113
|
||||
746|25|M|engineer|19047
|
||||
747|19|M|other|93612
|
||||
748|28|M|administrator|94720
|
||||
749|33|M|other|80919
|
||||
750|28|M|administrator|32303
|
||||
751|24|F|other|90034
|
||||
752|60|M|retired|21201
|
||||
753|56|M|salesman|91206
|
||||
754|59|F|librarian|62901
|
||||
755|44|F|educator|97007
|
||||
756|30|F|none|90247
|
||||
757|26|M|student|55104
|
||||
758|27|M|student|53706
|
||||
759|20|F|student|68503
|
||||
760|35|F|other|14211
|
||||
761|17|M|student|97302
|
||||
762|32|M|administrator|95050
|
||||
763|27|M|scientist|02113
|
||||
764|27|F|educator|62903
|
||||
765|31|M|student|33066
|
||||
766|42|M|other|10960
|
||||
767|70|M|engineer|00000
|
||||
768|29|M|administrator|12866
|
||||
769|39|M|executive|06927
|
||||
770|28|M|student|14216
|
||||
771|26|M|student|15232
|
||||
772|50|M|writer|27105
|
||||
773|20|M|student|55414
|
||||
774|30|M|student|80027
|
||||
775|46|M|executive|90036
|
||||
776|30|M|librarian|51157
|
||||
777|63|M|programmer|01810
|
||||
778|34|M|student|01960
|
||||
779|31|M|student|K7L5J
|
||||
780|49|M|programmer|94560
|
||||
781|20|M|student|48825
|
||||
782|21|F|artist|33205
|
||||
783|30|M|marketing|77081
|
||||
784|47|M|administrator|91040
|
||||
785|32|M|engineer|23322
|
||||
786|36|F|engineer|01754
|
||||
787|18|F|student|98620
|
||||
788|51|M|administrator|05779
|
||||
789|29|M|other|55420
|
||||
790|27|M|technician|80913
|
||||
791|31|M|educator|20064
|
||||
792|40|M|programmer|12205
|
||||
793|22|M|student|85281
|
||||
794|32|M|educator|57197
|
||||
795|30|M|programmer|08610
|
||||
796|32|F|writer|33755
|
||||
797|44|F|other|62522
|
||||
798|40|F|writer|64131
|
||||
799|49|F|administrator|19716
|
||||
800|25|M|programmer|55337
|
||||
801|22|M|writer|92154
|
||||
802|35|M|administrator|34105
|
||||
803|70|M|administrator|78212
|
||||
804|39|M|educator|61820
|
||||
805|27|F|other|20009
|
||||
806|27|M|marketing|11217
|
||||
807|41|F|healthcare|93555
|
||||
808|45|M|salesman|90016
|
||||
809|50|F|marketing|30803
|
||||
810|55|F|other|80526
|
||||
811|40|F|educator|73013
|
||||
812|22|M|technician|76234
|
||||
813|14|F|student|02136
|
||||
814|30|M|other|12345
|
||||
815|32|M|other|28806
|
||||
816|34|M|other|20755
|
||||
817|19|M|student|60152
|
||||
818|28|M|librarian|27514
|
||||
819|59|M|administrator|40205
|
||||
820|22|M|student|37725
|
||||
821|37|M|engineer|77845
|
||||
822|29|F|librarian|53144
|
||||
823|27|M|artist|50322
|
||||
824|31|M|other|15017
|
||||
825|44|M|engineer|05452
|
||||
826|28|M|artist|77048
|
||||
827|23|F|engineer|80228
|
||||
828|28|M|librarian|85282
|
||||
829|48|M|writer|80209
|
||||
830|46|M|programmer|53066
|
||||
831|21|M|other|33765
|
||||
832|24|M|technician|77042
|
||||
833|34|M|writer|90019
|
||||
834|26|M|other|64153
|
||||
835|44|F|executive|11577
|
||||
836|44|M|artist|10018
|
||||
837|36|F|artist|55409
|
||||
838|23|M|student|01375
|
||||
839|38|F|entertainment|90814
|
||||
840|39|M|artist|55406
|
||||
841|45|M|doctor|47401
|
||||
842|40|M|writer|93055
|
||||
843|35|M|librarian|44212
|
||||
844|22|M|engineer|95662
|
||||
845|64|M|doctor|97405
|
||||
846|27|M|lawyer|47130
|
||||
847|29|M|student|55417
|
||||
848|46|M|engineer|02146
|
||||
849|15|F|student|25652
|
||||
850|34|M|technician|78390
|
||||
851|18|M|other|29646
|
||||
852|46|M|administrator|94086
|
||||
853|49|M|writer|40515
|
||||
854|29|F|student|55408
|
||||
855|53|M|librarian|04988
|
||||
856|43|F|marketing|97215
|
||||
857|35|F|administrator|V1G4L
|
||||
858|63|M|educator|09645
|
||||
859|18|F|other|06492
|
||||
860|70|F|retired|48322
|
||||
861|38|F|student|14085
|
||||
862|25|M|executive|13820
|
||||
863|17|M|student|60089
|
||||
864|27|M|programmer|63021
|
||||
865|25|M|artist|11231
|
||||
866|45|M|other|60302
|
||||
867|24|M|scientist|92507
|
||||
868|21|M|programmer|55303
|
||||
869|30|M|student|10025
|
||||
870|22|M|student|65203
|
||||
871|31|M|executive|44648
|
||||
872|19|F|student|74078
|
||||
873|48|F|administrator|33763
|
||||
874|36|M|scientist|37076
|
||||
875|24|F|student|35802
|
||||
876|41|M|other|20902
|
||||
877|30|M|other|77504
|
||||
878|50|F|educator|98027
|
||||
879|33|F|administrator|55337
|
||||
880|13|M|student|83702
|
||||
881|39|M|marketing|43017
|
||||
882|35|M|engineer|40503
|
||||
883|49|M|librarian|50266
|
||||
884|44|M|engineer|55337
|
||||
885|30|F|other|95316
|
||||
886|20|M|student|61820
|
||||
887|14|F|student|27249
|
||||
888|41|M|scientist|17036
|
||||
889|24|M|technician|78704
|
||||
890|32|M|student|97301
|
||||
891|51|F|administrator|03062
|
||||
892|36|M|other|45243
|
||||
893|25|M|student|95823
|
||||
894|47|M|educator|74075
|
||||
895|31|F|librarian|32301
|
||||
896|28|M|writer|91505
|
||||
897|30|M|other|33484
|
||||
898|23|M|homemaker|61755
|
||||
899|32|M|other|55116
|
||||
900|60|M|retired|18505
|
||||
901|38|M|executive|L1V3W
|
||||
902|45|F|artist|97203
|
||||
903|28|M|educator|20850
|
||||
904|17|F|student|61073
|
||||
905|27|M|other|30350
|
||||
906|45|M|librarian|70124
|
||||
907|25|F|other|80526
|
||||
908|44|F|librarian|68504
|
||||
909|50|F|educator|53171
|
||||
910|28|M|healthcare|29301
|
||||
911|37|F|writer|53210
|
||||
912|51|M|other|06512
|
||||
913|27|M|student|76201
|
||||
914|44|F|other|08105
|
||||
915|50|M|entertainment|60614
|
||||
916|27|M|engineer|N2L5N
|
||||
917|22|F|student|20006
|
||||
918|40|M|scientist|70116
|
||||
919|25|M|other|14216
|
||||
920|30|F|artist|90008
|
||||
921|20|F|student|98801
|
||||
922|29|F|administrator|21114
|
||||
923|21|M|student|E2E3R
|
||||
924|29|M|other|11753
|
||||
925|18|F|salesman|49036
|
||||
926|49|M|entertainment|01701
|
||||
927|23|M|programmer|55428
|
||||
928|21|M|student|55408
|
||||
929|44|M|scientist|53711
|
||||
930|28|F|scientist|07310
|
||||
931|60|M|educator|33556
|
||||
932|58|M|educator|06437
|
||||
933|28|M|student|48105
|
||||
934|61|M|engineer|22902
|
||||
935|42|M|doctor|66221
|
||||
936|24|M|other|32789
|
||||
937|48|M|educator|98072
|
||||
938|38|F|technician|55038
|
||||
939|26|F|student|33319
|
||||
940|32|M|administrator|02215
|
||||
941|20|M|student|97229
|
||||
942|48|F|librarian|78209
|
||||
943|22|M|student|77841
|
80000
Datasets/ml-100k/u1.base
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/u1.test
Normal file
File diff suppressed because it is too large
80000
Datasets/ml-100k/u2.base
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/u2.test
Normal file
File diff suppressed because it is too large
80000
Datasets/ml-100k/u3.base
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/u3.test
Normal file
File diff suppressed because it is too large
80000
Datasets/ml-100k/u4.base
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/u4.test
Normal file
File diff suppressed because it is too large
80000
Datasets/ml-100k/u5.base
Normal file
File diff suppressed because it is too large
20000
Datasets/ml-100k/u5.test
Normal file
File diff suppressed because it is too large
90570
Datasets/ml-100k/ua.base
Normal file
File diff suppressed because it is too large
9430
Datasets/ml-100k/ua.test
Normal file
File diff suppressed because it is too large
90570
Datasets/ml-100k/ub.base
Normal file
File diff suppressed because it is too large
9430
Datasets/ml-100k/ub.test
Normal file
File diff suppressed because it is too large
725
P0. Data preparation.ipynb
Normal file
File diff suppressed because one or more lines are too long
1256
P1. Baseline.ipynb
Normal file
File diff suppressed because it is too large
2414
P2. Evaluation.ipynb
Normal file
File diff suppressed because it is too large
957
P3. k-nearest neighbours.ipynb
Normal file
@ -0,0 +1,957 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Self-made simplified I-KNN"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import helpers\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"import scipy.sparse as sparse\n",
|
||||
"from collections import defaultdict\n",
|
||||
"from itertools import chain\n",
|
||||
"import random\n",
|
||||
"\n",
|
||||
"train_read=pd.read_csv('./Datasets/ml-100k/train.csv', sep='\\t', header=None)\n",
|
||||
"test_read=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None)\n",
|
||||
"train_ui, test_ui, user_code_id, user_id_code, item_code_id, item_id_code = helpers.data_to_csr(train_read, test_read)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class IKNN():\n",
|
||||
" \n",
|
||||
" def fit(self, train_ui):\n",
|
||||
" self.train_ui=train_ui\n",
|
||||
" \n",
|
||||
" train_iu=train_ui.transpose()\n",
|
||||
" norms=np.linalg.norm(train_iu.A, axis=1) # here we compute the length (L2 norm) of each item's ratings vector\n",
|
||||
" norms=np.vectorize(lambda x: max(x,1))(norms[:,None]) # to avoid dividing by zero\n",
|
||||
"\n",
|
||||
" normalized_train_iu=sparse.csr_matrix(train_iu/norms)\n",
|
||||
"\n",
|
||||
" self.similarity_matrix_ii=normalized_train_iu*normalized_train_iu.transpose()\n",
|
||||
" \n",
|
||||
" self.estimations=np.array(train_ui*self.similarity_matrix_ii/((train_ui>0)*self.similarity_matrix_ii))\n",
|
||||
" \n",
|
||||
" def recommend(self, user_code_id, item_code_id, topK=10):\n",
|
||||
" \n",
|
||||
" top_k = defaultdict(list)\n",
|
||||
" for nb_user, user in enumerate(self.estimations):\n",
|
||||
" \n",
|
||||
" user_rated=self.train_ui.indices[self.train_ui.indptr[nb_user]:self.train_ui.indptr[nb_user+1]]\n",
|
||||
" for item, score in enumerate(user):\n",
|
||||
" if item not in user_rated and not np.isnan(score):\n",
|
||||
" top_k[user_code_id[nb_user]].append((item_code_id[item], score))\n",
|
||||
" result=[]\n",
|
||||
" # Let's choose k best items in the format: (user, item1, score1, item2, score2, ...)\n",
|
||||
" for uid, item_scores in top_k.items():\n",
|
||||
" item_scores.sort(key=lambda x: x[1], reverse=True)\n",
|
||||
" result.append([uid]+list(chain(*item_scores[:topK])))\n",
|
||||
" return result\n",
|
||||
" \n",
|
||||
" def estimate(self, user_code_id, item_code_id, test_ui):\n",
|
||||
" result=[]\n",
|
||||
" for user, item in zip(*test_ui.nonzero()):\n",
|
||||
" result.append([user_code_id[user], item_code_id[item], \n",
|
||||
" self.estimations[user,item] if not np.isnan(self.estimations[user,item]) else 1])\n",
|
||||
" return result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"toy train ui:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"array([[3, 4, 0, 0, 5, 0, 0, 4],\n",
|
||||
" [0, 1, 2, 3, 0, 0, 0, 0],\n",
|
||||
" [0, 0, 0, 5, 0, 3, 4, 0]], dtype=int64)"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"similarity matrix:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"array([[1. , 0.9701425 , 0. , 0. , 1. ,\n",
|
||||
" 0. , 0. , 1. ],\n",
|
||||
" [0.9701425 , 1. , 0.24253563, 0.12478355, 0.9701425 ,\n",
|
||||
" 0. , 0. , 0.9701425 ],\n",
|
||||
" [0. , 0.24253563, 1. , 0.51449576, 0. ,\n",
|
||||
" 0. , 0. , 0. ],\n",
|
||||
" [0. , 0.12478355, 0.51449576, 1. , 0. ,\n",
|
||||
" 0.85749293, 0.85749293, 0. ],\n",
|
||||
" [1. , 0.9701425 , 0. , 0. , 1. ,\n",
|
||||
" 0. , 0. , 1. ],\n",
|
||||
" [0. , 0. , 0. , 0.85749293, 0. ,\n",
|
||||
" 1. , 1. , 0. ],\n",
|
||||
" [0. , 0. , 0. , 0.85749293, 0. ,\n",
|
||||
" 1. , 1. , 0. ],\n",
|
||||
" [1. , 0.9701425 , 0. , 0. , 1. ,\n",
|
||||
" 0. , 0. , 1. ]])"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"estimations matrix:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"array([[4. , 4. , 4. , 4. , 4. ,\n",
|
||||
" nan, nan, 4. ],\n",
|
||||
" [1. , 1.35990333, 2.15478388, 2.53390319, 1. ,\n",
|
||||
" 3. , 3. , 1. ],\n",
|
||||
" [ nan, 5. , 5. , 4.05248907, nan,\n",
|
||||
" 3.95012863, 3.95012863, nan]])"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[[0, 20, 4.0, 30, 4.0],\n",
|
||||
" [10, 50, 3.0, 60, 3.0, 0, 1.0, 40, 1.0, 70, 1.0],\n",
|
||||
" [20, 10, 5.0, 20, 5.0]]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# toy example\n",
|
||||
"toy_train_read=pd.read_csv('./Datasets/toy-example/train.csv', sep='\\t', header=None, names=['user', 'item', 'rating', 'timestamp'])\n",
|
||||
"toy_test_read=pd.read_csv('./Datasets/toy-example/test.csv', sep='\\t', header=None, names=['user', 'item', 'rating', 'timestamp'])\n",
|
||||
"\n",
|
||||
"toy_train_ui, toy_test_ui, toy_user_code_id, toy_user_id_code, \\\n",
|
||||
"toy_item_code_id, toy_item_id_code = helpers.data_to_csr(toy_train_read, toy_test_read)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"model=IKNN()\n",
|
||||
"model.fit(toy_train_ui)\n",
|
||||
"\n",
|
||||
"print('toy train ui:')\n",
|
||||
"display(toy_train_ui.A)\n",
|
||||
"\n",
|
||||
"print('similarity matrix:')\n",
|
||||
"display(model.similarity_matrix_ii.A)\n",
|
||||
"\n",
|
||||
"print('estimations matrix:')\n",
|
||||
"display(model.estimations)\n",
|
||||
"\n",
|
||||
"model.recommend(toy_user_code_id, toy_item_code_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model=IKNN()\n",
|
||||
"model.fit(train_ui)\n",
|
||||
"\n",
|
||||
"top_n=pd.DataFrame(model.recommend(user_code_id, item_code_id, topK=10))\n",
|
||||
"\n",
|
||||
"top_n.to_csv('Recommendations generated/ml-100k/Self_IKNN_reco.csv', index=False, header=False)\n",
|
||||
"\n",
|
||||
"estimations=pd.DataFrame(model.estimate(user_code_id, item_code_id, test_ui))\n",
|
||||
"estimations.to_csv('Recommendations generated/ml-100k/Self_IKNN_estimations.csv', index=False, header=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"943it [00:00, 8845.73it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>RMSE</th>\n",
|
||||
" <th>MAE</th>\n",
|
||||
" <th>precision</th>\n",
|
||||
" <th>recall</th>\n",
|
||||
" <th>F_1</th>\n",
|
||||
" <th>F_05</th>\n",
|
||||
" <th>precision_super</th>\n",
|
||||
" <th>recall_super</th>\n",
|
||||
" <th>NDCG</th>\n",
|
||||
" <th>mAP</th>\n",
|
||||
" <th>MRR</th>\n",
|
||||
" <th>LAUC</th>\n",
|
||||
" <th>HR</th>\n",
|
||||
" <th>Reco in test</th>\n",
|
||||
" <th>Test coverage</th>\n",
|
||||
" <th>Shannon</th>\n",
|
||||
" <th>Gini</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1.018363</td>\n",
|
||||
" <td>0.808793</td>\n",
|
||||
" <td>0.000318</td>\n",
|
||||
" <td>0.000108</td>\n",
|
||||
" <td>0.00014</td>\n",
|
||||
" <td>0.000189</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>0.000214</td>\n",
|
||||
" <td>0.000037</td>\n",
|
||||
" <td>0.000368</td>\n",
|
||||
" <td>0.496391</td>\n",
|
||||
" <td>0.003181</td>\n",
|
||||
" <td>0.392153</td>\n",
|
||||
" <td>0.11544</td>\n",
|
||||
" <td>4.174741</td>\n",
|
||||
" <td>0.965327</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" RMSE MAE precision recall F_1 F_05 \\\n",
|
||||
"0 1.018363 0.808793 0.000318 0.000108 0.00014 0.000189 \n",
|
||||
"\n",
|
||||
" precision_super recall_super NDCG mAP MRR LAUC \\\n",
|
||||
"0 0.0 0.0 0.000214 0.000037 0.000368 0.496391 \n",
|
||||
"\n",
|
||||
" HR Reco in test Test coverage Shannon Gini \n",
|
||||
"0 0.003181 0.392153 0.11544 4.174741 0.965327 "
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import evaluation_measures as ev\n",
|
||||
"estimations_df=pd.read_csv('Recommendations generated/ml-100k/Self_IKNN_estimations.csv', header=None)\n",
|
||||
"reco=np.loadtxt('Recommendations generated/ml-100k/Self_IKNN_reco.csv', delimiter=',')\n",
|
||||
"\n",
|
||||
"ev.evaluate(test=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None),\n",
|
||||
" estimations_df=estimations_df, \n",
|
||||
" reco=reco,\n",
|
||||
" super_reactions=[4,5])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"943it [00:00, 7423.18it/s]\n",
|
||||
"943it [00:00, 7890.87it/s]\n",
|
||||
"943it [00:00, 7370.82it/s]\n",
|
||||
"943it [00:00, 8035.93it/s]\n",
|
||||
"943it [00:00, 8071.70it/s]\n",
|
||||
"943it [00:00, 7893.80it/s]\n",
|
||||
"943it [00:00, 8159.55it/s]\n",
|
||||
"943it [00:00, 7982.77it/s]\n",
|
||||
"943it [00:00, 7514.53it/s]\n",
|
||||
"943it [00:00, 8047.34it/s]\n",
|
||||
"943it [00:00, 7874.80it/s]\n",
|
||||
"943it [00:00, 7657.62it/s]\n",
|
||||
"943it [00:00, 8281.73it/s]\n",
|
||||
"943it [00:00, 8253.33it/s]\n",
|
||||
"943it [00:00, 8332.31it/s]\n",
|
||||
"943it [00:00, 8348.73it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Model</th>\n",
|
||||
" <th>RMSE</th>\n",
|
||||
" <th>MAE</th>\n",
|
||||
" <th>precision</th>\n",
|
||||
" <th>recall</th>\n",
|
||||
" <th>F_1</th>\n",
|
||||
" <th>F_05</th>\n",
|
||||
" <th>precision_super</th>\n",
|
||||
" <th>recall_super</th>\n",
|
||||
" <th>NDCG</th>\n",
|
||||
" <th>mAP</th>\n",
|
||||
" <th>MRR</th>\n",
|
||||
" <th>LAUC</th>\n",
|
||||
" <th>HR</th>\n",
|
||||
" <th>Reco in test</th>\n",
|
||||
" <th>Test coverage</th>\n",
|
||||
" <th>Shannon</th>\n",
|
||||
" <th>Gini</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_RP3Beta</td>\n",
|
||||
" <td>3.702446</td>\n",
|
||||
" <td>3.527273</td>\n",
|
||||
" <td>0.282185</td>\n",
|
||||
" <td>0.192092</td>\n",
|
||||
" <td>0.186749</td>\n",
|
||||
" <td>0.216980</td>\n",
|
||||
" <td>0.204185</td>\n",
|
||||
" <td>0.240096</td>\n",
|
||||
" <td>0.339114</td>\n",
|
||||
" <td>0.204905</td>\n",
|
||||
" <td>0.572157</td>\n",
|
||||
" <td>0.593544</td>\n",
|
||||
" <td>0.875928</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.077201</td>\n",
|
||||
" <td>3.875892</td>\n",
|
||||
" <td>0.974947</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_TopPop</td>\n",
|
||||
" <td>2.508258</td>\n",
|
||||
" <td>2.217909</td>\n",
|
||||
" <td>0.188865</td>\n",
|
||||
" <td>0.116919</td>\n",
|
||||
" <td>0.118732</td>\n",
|
||||
" <td>0.141584</td>\n",
|
||||
" <td>0.130472</td>\n",
|
||||
" <td>0.137473</td>\n",
|
||||
" <td>0.214651</td>\n",
|
||||
" <td>0.111707</td>\n",
|
||||
" <td>0.400939</td>\n",
|
||||
" <td>0.555546</td>\n",
|
||||
" <td>0.765642</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.038961</td>\n",
|
||||
" <td>3.159079</td>\n",
|
||||
" <td>0.987317</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_SVD</td>\n",
|
||||
" <td>0.952784</td>\n",
|
||||
" <td>0.750597</td>\n",
|
||||
" <td>0.095228</td>\n",
|
||||
" <td>0.047497</td>\n",
|
||||
" <td>0.053142</td>\n",
|
||||
" <td>0.067082</td>\n",
|
||||
" <td>0.084871</td>\n",
|
||||
" <td>0.076457</td>\n",
|
||||
" <td>0.109075</td>\n",
|
||||
" <td>0.050124</td>\n",
|
||||
" <td>0.241366</td>\n",
|
||||
" <td>0.520459</td>\n",
|
||||
" <td>0.499470</td>\n",
|
||||
" <td>0.992047</td>\n",
|
||||
" <td>0.217893</td>\n",
|
||||
" <td>4.405246</td>\n",
|
||||
" <td>0.953484</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_SVDBaseline</td>\n",
|
||||
" <td>0.930321</td>\n",
|
||||
" <td>0.734643</td>\n",
|
||||
" <td>0.092683</td>\n",
|
||||
" <td>0.042046</td>\n",
|
||||
" <td>0.048568</td>\n",
|
||||
" <td>0.063218</td>\n",
|
||||
" <td>0.082940</td>\n",
|
||||
" <td>0.068730</td>\n",
|
||||
" <td>0.098937</td>\n",
|
||||
" <td>0.044405</td>\n",
|
||||
" <td>0.203936</td>\n",
|
||||
" <td>0.517696</td>\n",
|
||||
" <td>0.469777</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.058442</td>\n",
|
||||
" <td>3.085857</td>\n",
|
||||
" <td>0.988824</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_SVDBiased</td>\n",
|
||||
" <td>0.940375</td>\n",
|
||||
" <td>0.742264</td>\n",
|
||||
" <td>0.092153</td>\n",
|
||||
" <td>0.039645</td>\n",
|
||||
" <td>0.046804</td>\n",
|
||||
" <td>0.061886</td>\n",
|
||||
" <td>0.079399</td>\n",
|
||||
" <td>0.055967</td>\n",
|
||||
" <td>0.102017</td>\n",
|
||||
" <td>0.047972</td>\n",
|
||||
" <td>0.216876</td>\n",
|
||||
" <td>0.516515</td>\n",
|
||||
" <td>0.441145</td>\n",
|
||||
" <td>0.997455</td>\n",
|
||||
" <td>0.167388</td>\n",
|
||||
" <td>4.235348</td>\n",
|
||||
" <td>0.962085</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_Baseline</td>\n",
|
||||
" <td>0.949459</td>\n",
|
||||
" <td>0.752487</td>\n",
|
||||
" <td>0.091410</td>\n",
|
||||
" <td>0.037652</td>\n",
|
||||
" <td>0.046030</td>\n",
|
||||
" <td>0.061286</td>\n",
|
||||
" <td>0.079614</td>\n",
|
||||
" <td>0.056463</td>\n",
|
||||
" <td>0.095957</td>\n",
|
||||
" <td>0.043178</td>\n",
|
||||
" <td>0.198193</td>\n",
|
||||
" <td>0.515501</td>\n",
|
||||
" <td>0.437964</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.033911</td>\n",
|
||||
" <td>2.836513</td>\n",
|
||||
" <td>0.991139</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_SVD</td>\n",
|
||||
" <td>0.939326</td>\n",
|
||||
" <td>0.740022</td>\n",
|
||||
" <td>0.074549</td>\n",
|
||||
" <td>0.031755</td>\n",
|
||||
" <td>0.038425</td>\n",
|
||||
" <td>0.050562</td>\n",
|
||||
" <td>0.065665</td>\n",
|
||||
" <td>0.050602</td>\n",
|
||||
" <td>0.077117</td>\n",
|
||||
" <td>0.031574</td>\n",
|
||||
" <td>0.165509</td>\n",
|
||||
" <td>0.512485</td>\n",
|
||||
" <td>0.414634</td>\n",
|
||||
" <td>0.981866</td>\n",
|
||||
" <td>0.080087</td>\n",
|
||||
" <td>3.858982</td>\n",
|
||||
" <td>0.975271</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_GlobalAvg</td>\n",
|
||||
" <td>1.125760</td>\n",
|
||||
" <td>0.943534</td>\n",
|
||||
" <td>0.061188</td>\n",
|
||||
" <td>0.025968</td>\n",
|
||||
" <td>0.031383</td>\n",
|
||||
" <td>0.041343</td>\n",
|
||||
" <td>0.040558</td>\n",
|
||||
" <td>0.032107</td>\n",
|
||||
" <td>0.067695</td>\n",
|
||||
" <td>0.027470</td>\n",
|
||||
" <td>0.171187</td>\n",
|
||||
" <td>0.509546</td>\n",
|
||||
" <td>0.384942</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.025974</td>\n",
|
||||
" <td>2.711772</td>\n",
|
||||
" <td>0.992003</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_Random</td>\n",
|
||||
" <td>1.518551</td>\n",
|
||||
" <td>1.218784</td>\n",
|
||||
" <td>0.050583</td>\n",
|
||||
" <td>0.024085</td>\n",
|
||||
" <td>0.027323</td>\n",
|
||||
" <td>0.034826</td>\n",
|
||||
" <td>0.031223</td>\n",
|
||||
" <td>0.026436</td>\n",
|
||||
" <td>0.054902</td>\n",
|
||||
" <td>0.020652</td>\n",
|
||||
" <td>0.137928</td>\n",
|
||||
" <td>0.508570</td>\n",
|
||||
" <td>0.353128</td>\n",
|
||||
" <td>0.987699</td>\n",
|
||||
" <td>0.183261</td>\n",
|
||||
" <td>5.093805</td>\n",
|
||||
" <td>0.908215</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_I-KNN</td>\n",
|
||||
" <td>1.030386</td>\n",
|
||||
" <td>0.813067</td>\n",
|
||||
" <td>0.026087</td>\n",
|
||||
" <td>0.006908</td>\n",
|
||||
" <td>0.010593</td>\n",
|
||||
" <td>0.016046</td>\n",
|
||||
" <td>0.021137</td>\n",
|
||||
" <td>0.009522</td>\n",
|
||||
" <td>0.024214</td>\n",
|
||||
" <td>0.008958</td>\n",
|
||||
" <td>0.048068</td>\n",
|
||||
" <td>0.499885</td>\n",
|
||||
" <td>0.154825</td>\n",
|
||||
" <td>0.402333</td>\n",
|
||||
" <td>0.434343</td>\n",
|
||||
" <td>5.133650</td>\n",
|
||||
" <td>0.877999</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_U-KNNBaseline</td>\n",
|
||||
" <td>0.935327</td>\n",
|
||||
" <td>0.737424</td>\n",
|
||||
" <td>0.002545</td>\n",
|
||||
" <td>0.000755</td>\n",
|
||||
" <td>0.001105</td>\n",
|
||||
" <td>0.001602</td>\n",
|
||||
" <td>0.002253</td>\n",
|
||||
" <td>0.000930</td>\n",
|
||||
" <td>0.003444</td>\n",
|
||||
" <td>0.001362</td>\n",
|
||||
" <td>0.011760</td>\n",
|
||||
" <td>0.496724</td>\n",
|
||||
" <td>0.021209</td>\n",
|
||||
" <td>0.482821</td>\n",
|
||||
" <td>0.059885</td>\n",
|
||||
" <td>2.232578</td>\n",
|
||||
" <td>0.994487</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_I-KNNBaseline</td>\n",
|
||||
" <td>0.935327</td>\n",
|
||||
" <td>0.737424</td>\n",
|
||||
" <td>0.002545</td>\n",
|
||||
" <td>0.000755</td>\n",
|
||||
" <td>0.001105</td>\n",
|
||||
" <td>0.001602</td>\n",
|
||||
" <td>0.002253</td>\n",
|
||||
" <td>0.000930</td>\n",
|
||||
" <td>0.003444</td>\n",
|
||||
" <td>0.001362</td>\n",
|
||||
" <td>0.011760</td>\n",
|
||||
" <td>0.496724</td>\n",
|
||||
" <td>0.021209</td>\n",
|
||||
" <td>0.482821</td>\n",
|
||||
" <td>0.059885</td>\n",
|
||||
" <td>2.232578</td>\n",
|
||||
" <td>0.994487</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Ready_U-KNN</td>\n",
|
||||
" <td>1.023495</td>\n",
|
||||
" <td>0.807913</td>\n",
|
||||
" <td>0.000742</td>\n",
|
||||
" <td>0.000205</td>\n",
|
||||
" <td>0.000305</td>\n",
|
||||
" <td>0.000449</td>\n",
|
||||
" <td>0.000536</td>\n",
|
||||
" <td>0.000198</td>\n",
|
||||
" <td>0.000845</td>\n",
|
||||
" <td>0.000274</td>\n",
|
||||
" <td>0.002744</td>\n",
|
||||
" <td>0.496441</td>\n",
|
||||
" <td>0.007423</td>\n",
|
||||
" <td>0.602121</td>\n",
|
||||
" <td>0.010823</td>\n",
|
||||
" <td>2.089186</td>\n",
|
||||
" <td>0.995706</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_TopRated</td>\n",
|
||||
" <td>1.033085</td>\n",
|
||||
" <td>0.822057</td>\n",
|
||||
" <td>0.000954</td>\n",
|
||||
" <td>0.000188</td>\n",
|
||||
" <td>0.000298</td>\n",
|
||||
" <td>0.000481</td>\n",
|
||||
" <td>0.000644</td>\n",
|
||||
" <td>0.000223</td>\n",
|
||||
" <td>0.001043</td>\n",
|
||||
" <td>0.000335</td>\n",
|
||||
" <td>0.003348</td>\n",
|
||||
" <td>0.496433</td>\n",
|
||||
" <td>0.009544</td>\n",
|
||||
" <td>0.699046</td>\n",
|
||||
" <td>0.005051</td>\n",
|
||||
" <td>1.945910</td>\n",
|
||||
" <td>0.995669</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_BaselineUI</td>\n",
|
||||
" <td>0.967585</td>\n",
|
||||
" <td>0.762740</td>\n",
|
||||
" <td>0.000954</td>\n",
|
||||
" <td>0.000170</td>\n",
|
||||
" <td>0.000278</td>\n",
|
||||
" <td>0.000463</td>\n",
|
||||
" <td>0.000644</td>\n",
|
||||
" <td>0.000189</td>\n",
|
||||
" <td>0.000752</td>\n",
|
||||
" <td>0.000168</td>\n",
|
||||
" <td>0.001677</td>\n",
|
||||
" <td>0.496424</td>\n",
|
||||
" <td>0.009544</td>\n",
|
||||
" <td>0.600530</td>\n",
|
||||
" <td>0.005051</td>\n",
|
||||
" <td>1.803126</td>\n",
|
||||
" <td>0.996380</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Self_IKNN</td>\n",
|
||||
" <td>1.018363</td>\n",
|
||||
" <td>0.808793</td>\n",
|
||||
" <td>0.000318</td>\n",
|
||||
" <td>0.000108</td>\n",
|
||||
" <td>0.000140</td>\n",
|
||||
" <td>0.000189</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.000000</td>\n",
|
||||
" <td>0.000214</td>\n",
|
||||
" <td>0.000037</td>\n",
|
||||
" <td>0.000368</td>\n",
|
||||
" <td>0.496391</td>\n",
|
||||
" <td>0.003181</td>\n",
|
||||
" <td>0.392153</td>\n",
|
||||
" <td>0.115440</td>\n",
|
||||
" <td>4.174741</td>\n",
|
||||
" <td>0.965327</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Model RMSE MAE precision recall F_1 \\\n",
|
||||
"0 Self_RP3Beta 3.702446 3.527273 0.282185 0.192092 0.186749 \n",
|
||||
"0 Self_TopPop 2.508258 2.217909 0.188865 0.116919 0.118732 \n",
|
||||
"0 Ready_SVD 0.952784 0.750597 0.095228 0.047497 0.053142 \n",
|
||||
"0 Self_SVDBaseline 0.930321 0.734643 0.092683 0.042046 0.048568 \n",
|
||||
"0 Ready_SVDBiased 0.940375 0.742264 0.092153 0.039645 0.046804 \n",
|
||||
"0 Ready_Baseline 0.949459 0.752487 0.091410 0.037652 0.046030 \n",
|
||||
"0 Self_SVD 0.939326 0.740022 0.074549 0.031755 0.038425 \n",
|
||||
"0 Self_GlobalAvg 1.125760 0.943534 0.061188 0.025968 0.031383 \n",
|
||||
"0 Ready_Random 1.518551 1.218784 0.050583 0.024085 0.027323 \n",
|
||||
"0 Ready_I-KNN 1.030386 0.813067 0.026087 0.006908 0.010593 \n",
|
||||
"0 Ready_U-KNNBaseline 0.935327 0.737424 0.002545 0.000755 0.001105 \n",
|
||||
"0 Ready_I-KNNBaseline 0.935327 0.737424 0.002545 0.000755 0.001105 \n",
|
||||
"0 Ready_U-KNN 1.023495 0.807913 0.000742 0.000205 0.000305 \n",
|
||||
"0 Self_TopRated 1.033085 0.822057 0.000954 0.000188 0.000298 \n",
|
||||
"0 Self_BaselineUI 0.967585 0.762740 0.000954 0.000170 0.000278 \n",
|
||||
"0 Self_IKNN 1.018363 0.808793 0.000318 0.000108 0.000140 \n",
|
||||
"\n",
|
||||
" F_05 precision_super recall_super NDCG mAP MRR \\\n",
|
||||
"0 0.216980 0.204185 0.240096 0.339114 0.204905 0.572157 \n",
|
||||
"0 0.141584 0.130472 0.137473 0.214651 0.111707 0.400939 \n",
|
||||
"0 0.067082 0.084871 0.076457 0.109075 0.050124 0.241366 \n",
|
||||
"0 0.063218 0.082940 0.068730 0.098937 0.044405 0.203936 \n",
|
||||
"0 0.061886 0.079399 0.055967 0.102017 0.047972 0.216876 \n",
|
||||
"0 0.061286 0.079614 0.056463 0.095957 0.043178 0.198193 \n",
|
||||
"0 0.050562 0.065665 0.050602 0.077117 0.031574 0.165509 \n",
|
||||
"0 0.041343 0.040558 0.032107 0.067695 0.027470 0.171187 \n",
|
||||
"0 0.034826 0.031223 0.026436 0.054902 0.020652 0.137928 \n",
|
||||
"0 0.016046 0.021137 0.009522 0.024214 0.008958 0.048068 \n",
|
||||
"0 0.001602 0.002253 0.000930 0.003444 0.001362 0.011760 \n",
|
||||
"0 0.001602 0.002253 0.000930 0.003444 0.001362 0.011760 \n",
|
||||
"0 0.000449 0.000536 0.000198 0.000845 0.000274 0.002744 \n",
|
||||
"0 0.000481 0.000644 0.000223 0.001043 0.000335 0.003348 \n",
|
||||
"0 0.000463 0.000644 0.000189 0.000752 0.000168 0.001677 \n",
|
||||
"0 0.000189 0.000000 0.000000 0.000214 0.000037 0.000368 \n",
|
||||
"\n",
|
||||
" LAUC HR Reco in test Test coverage Shannon Gini \n",
|
||||
"0 0.593544 0.875928 1.000000 0.077201 3.875892 0.974947 \n",
|
||||
"0 0.555546 0.765642 1.000000 0.038961 3.159079 0.987317 \n",
|
||||
"0 0.520459 0.499470 0.992047 0.217893 4.405246 0.953484 \n",
|
||||
"0 0.517696 0.469777 1.000000 0.058442 3.085857 0.988824 \n",
|
||||
"0 0.516515 0.441145 0.997455 0.167388 4.235348 0.962085 \n",
|
||||
"0 0.515501 0.437964 1.000000 0.033911 2.836513 0.991139 \n",
|
||||
"0 0.512485 0.414634 0.981866 0.080087 3.858982 0.975271 \n",
|
||||
"0 0.509546 0.384942 1.000000 0.025974 2.711772 0.992003 \n",
|
||||
"0 0.508570 0.353128 0.987699 0.183261 5.093805 0.908215 \n",
|
||||
"0 0.499885 0.154825 0.402333 0.434343 5.133650 0.877999 \n",
|
||||
"0 0.496724 0.021209 0.482821 0.059885 2.232578 0.994487 \n",
|
||||
"0 0.496724 0.021209 0.482821 0.059885 2.232578 0.994487 \n",
|
||||
"0 0.496441 0.007423 0.602121 0.010823 2.089186 0.995706 \n",
|
||||
"0 0.496433 0.009544 0.699046 0.005051 1.945910 0.995669 \n",
|
||||
"0 0.496424 0.009544 0.600530 0.005051 1.803126 0.996380 \n",
|
||||
"0 0.496391 0.003181 0.392153 0.115440 4.174741 0.965327 "
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import imp\n",
"import evaluation_measures as ev\n",
"imp.reload(ev)\n",
"\n",
|
||||
"dir_path=\"Recommendations generated/ml-100k/\"\n",
|
||||
"super_reactions=[4,5]\n",
|
||||
"test=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None)\n",
|
||||
"\n",
|
||||
"ev.evaluate_all(test, dir_path, super_reactions)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ready-made KNNs - Surprise implementation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### I-KNN - basic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Computing the cosine similarity matrix...\n",
|
||||
"Done computing similarity matrix.\n",
|
||||
"Generating predictions...\n",
|
||||
"Generating top N recommendations...\n",
|
||||
"Generating predictions...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import helpers\n",
|
||||
"import surprise as sp\n",
|
||||
"import imp\n",
|
||||
"imp.reload(helpers)\n",
|
||||
"\n",
|
||||
"sim_options = {'name': 'cosine',\n",
|
||||
" 'user_based': False} # compute similarities between items\n",
|
||||
"algo = sp.KNNBasic(sim_options=sim_options)\n",
|
||||
"\n",
|
||||
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_I-KNN_reco.csv',\n",
|
||||
"                   estimations_path='Recommendations generated/ml-100k/Ready_I-KNN_estimations.csv')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### U-KNN - basic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Computing the cosine similarity matrix...\n",
|
||||
"Done computing similarity matrix.\n",
|
||||
"Generating predictions...\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"ename": "KeyboardInterrupt",
|
||||
"evalue": "",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
|
||||
"\u001b[0;32m<ipython-input-10-dd4f59625a08>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_U-KNN_reco.csv',\n\u001b[0;32m---> 11\u001b[0;31m estimations_path='Recommendations generated/ml-100k/Ready_Baseline_U-KNN_estimations.csv')\n\u001b[0m",
|
||||
"\u001b[0;32m/mnt/c/Users/rkwie/Repositories/Warsztaty z uczenia maszynowego - systemy rekomendacyjne/helpers.py\u001b[0m in \u001b[0;36mready_made\u001b[0;34m(algo, reco_path, estimations_path)\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0mantitrainset\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtrainset\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuild_anti_testset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# We want to predict ratings of pairs (user, item) which are not in train set\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Generating predictions...'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0mpredictions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malgo\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mantitrainset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Generating top N recommendations...'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0mtop_n\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_top_n\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpredictions\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36mtest\u001b[0;34m(self, testset, verbose)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mr_ui_trans\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 166\u001b[0m verbose=verbose)\n\u001b[0;32m--> 167\u001b[0;31m for (uid, iid, r_ui_trans) in testset]\n\u001b[0m\u001b[1;32m 168\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpredictions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 169\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mr_ui_trans\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 166\u001b[0m verbose=verbose)\n\u001b[0;32m--> 167\u001b[0;31m for (uid, iid, r_ui_trans) in testset]\n\u001b[0m\u001b[1;32m 168\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpredictions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 169\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36mpredict\u001b[0;34m(self, uid, iid, r_ui, clip, verbose)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0mdetails\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0mest\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mestimate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miuid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0miiid\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;31m# If the details dict was also returned\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/knns.py\u001b[0m in \u001b[0;36mestimate\u001b[0;34m(self, u, i)\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 110\u001b[0m \u001b[0mneighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msim\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 111\u001b[0;31m \u001b[0mk_neighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mheapq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlargest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mneighbors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 113\u001b[0m \u001b[0;31m# compute weighted average\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m/usr/lib/python3.6/heapq.py\u001b[0m in \u001b[0;36mnlargest\u001b[0;34m(n, iterable, key)\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[0;31m# General case, slowest method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 568\u001b[0m \u001b[0mit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0miter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 569\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 570\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 571\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m/usr/lib/python3.6/heapq.py\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[0;31m# General case, slowest method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 568\u001b[0m \u001b[0mit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0miter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 569\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 570\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 571\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/knns.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(t)\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 110\u001b[0m \u001b[0mneighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msim\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 111\u001b[0;31m \u001b[0mk_neighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mheapq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlargest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mneighbors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 113\u001b[0m \u001b[0;31m# compute weighted average\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
|
||||
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import helpers\n",
|
||||
"import surprise as sp\n",
|
||||
"import imp\n",
|
||||
"imp.reload(helpers)\n",
|
||||
"\n",
|
||||
"sim_options = {'name': 'cosine',\n",
|
||||
" 'user_based': True} # compute similarities between users\n",
|
||||
"algo = sp.KNNBasic(sim_options=sim_options)\n",
|
||||
"\n",
|
||||
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_U-KNN_reco.csv',\n",
|
||||
"                   estimations_path='Recommendations generated/ml-100k/Ready_U-KNN_estimations.csv')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### I-KNN - on top baseline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import helpers\n",
|
||||
"import surprise as sp\n",
|
||||
"import imp\n",
|
||||
"imp.reload(helpers)\n",
|
||||
"\n",
|
||||
"sim_options = {'name': 'cosine',\n",
|
||||
" 'user_based': False} # compute similarities between items\n",
|
||||
"algo = sp.KNNBaseline(sim_options=sim_options)\n",
|
||||
"\n",
|
||||
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_I-KNNBaseline_reco.csv',\n",
|
||||
" estimations_path='Recommendations generated/ml-100k/Ready_I-KNNBaseline_estimations.csv')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# project task 4: use a version of your choice of Surprise KNN algorithm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# read the docs and try to find the best parameter configuration (let's say in terms of RMSE)\n",
|
||||
"# https://surprise.readthedocs.io/en/stable/knn_inspired.html##surprise.prediction_algorithms.knns.KNNBaseline\n",
|
||||
"# the solution here can be similar to examples above\n",
|
||||
"# please save the output in 'Recommendations generated/ml-100k/Self_KNNSurprisetask_reco.csv' and\n",
|
||||
"# 'Recommendations generated/ml-100k/Self_KNNSurprisetask_estimations.csv'"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
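The KNN cells above rely on Surprise to do the neighbourhood computation. As a rough illustration of the idea behind an item-based KNN with cosine similarity, here is a minimal NumPy sketch. It is not Surprise's implementation: the function names `item_cosine_similarity` and `predict_rating`, the dense toy matrix, the convention that 0 marks a missing rating, and the global-mean fallback are all assumptions made for this example.

```python
import numpy as np


def item_cosine_similarity(ratings):
    """Cosine similarity between item columns, using only users who rated both items.

    `ratings` is a dense users x items array in which 0 marks a missing rating
    (an assumption of this sketch).
    """
    n_items = ratings.shape[1]
    rated = ratings > 0
    sim = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(n_items):
            common = rated[:, i] & rated[:, j]
            if common.any():
                a, b = ratings[common, i], ratings[common, j]
                sim[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sim


def predict_rating(ratings, sim, user, item, k=2):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    # Items the user has rated, excluding the one we want to predict.
    candidates = [j for j in np.flatnonzero(ratings[user] > 0) if j != item]
    neighbours = sorted(candidates, key=lambda j: sim[item, j], reverse=True)[:k]
    weights = np.array([sim[item, j] for j in neighbours])
    if weights.sum() <= 0:
        # Hypothetical fallback for this sketch: the global mean rating.
        return float(ratings[ratings > 0].mean())
    return float(weights @ ratings[user, neighbours] / weights.sum())


# Toy data: 3 users x 3 items, user 0 has not rated item 2.
ratings = np.array([[5.0, 4.0, 0.0],
                    [4.0, 5.0, 3.0],
                    [1.0, 2.0, 5.0]])
sim = item_cosine_similarity(ratings)
prediction = predict_rating(ratings, sim, user=0, item=2)
print(round(prediction, 3))
```

Because the prediction is a weighted average of ratings the user already gave, it always stays inside the range of those ratings. Switching to a user-based variant would amount to running the same code on the transposed matrix.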
80
P4. Appendix - embeddings in high demensional spaces.ipynb
Normal file
@ -0,0 +1,80 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['dimensions: 1, cases when observation is the nearest: 0.0%',\n",
|
||||
" 'dimensions: 2, cases when observation is the nearest: 0.0%',\n",
|
||||
" 'dimensions: 3, cases when observation is the nearest: 0.0%',\n",
|
||||
" 'dimensions: 10, cases when observation is the nearest: 10.0%',\n",
|
||||
" 'dimensions: 20, cases when observation is the nearest: 61.0%',\n",
|
||||
" 'dimensions: 30, cases when observation is the nearest: 96.0%',\n",
|
||||
" 'dimensions: 40, cases when observation is the nearest: 98.0%',\n",
|
||||
" 'dimensions: 50, cases when observation is the nearest: 100.0%',\n",
|
||||
" 'dimensions: 60, cases when observation is the nearest: 100.0%',\n",
|
||||
" 'dimensions: 70, cases when observation is the nearest: 100.0%',\n",
|
||||
" 'dimensions: 80, cases when observation is the nearest: 100.0%',\n",
|
||||
" 'dimensions: 90, cases when observation is the nearest: 100.0%']"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import numpy as np\n",
"import pandas as pd\n",
"import random\n",
"from numpy.linalg import norm\n",
"\n",
"dimensions=[1,2,3]+[10*i for i in range(1,10)]\n",
"nb_vectors=10000\n",
"trials=100\n",
"k=1 # by setting k=1 we want to check how often the closest vector to the average of 2 random vectors is one of these 2 vectors\n",
"\n",
"result=[]\n",
"for dimension in dimensions:\n",
"    vectors=np.random.normal(0,1,size=(nb_vectors, dimension))\n",
"    successes=0\n",
"    for i in range(trials):\n",
"        i1,i2=random.sample(range(nb_vectors),2)\n",
"        target=(vectors[i1]+vectors[i2])/2\n",
"\n",
"        distances=pd.DataFrame(enumerate(np.dot(target, vectors.transpose())/norm(target)/norm(vectors.transpose(), axis=0)))\n",
"        distances=distances.sort_values(by=[1], ascending=False)\n",
"        if (i1 in (list(distances[0][:k]))) | (i2 in (list(distances[0][:k]))):\n",
"            successes+=1\n",
"    result.append(successes/trials)\n",
"    \n",
"[f'dimensions: {i}, cases when observation is the nearest: {100*round(j,3)}%' for i,j in zip(dimensions, result)]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
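The experiment in the appendix notebook can also be written without pandas. The sketch below is one possible compact variant, not the notebook's code: the function name `parent_is_nearest_rate`, the seeded generator and the smaller default sizes are assumptions chosen so that it runs quickly. It averages two random Gaussian vectors and checks whether the vector with the highest cosine similarity to that average is one of the two parents.

```python
import numpy as np


def parent_is_nearest_rate(dimension, nb_vectors=2000, trials=50, seed=0):
    """Fraction of trials in which the vector most similar (by cosine) to the
    average of two random vectors is one of those two vectors."""
    rng = np.random.default_rng(seed)
    vectors = rng.normal(0.0, 1.0, size=(nb_vectors, dimension))
    # Normalise once so that a dot product with a unit target is the cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

    successes = 0
    for _ in range(trials):
        i1, i2 = rng.choice(nb_vectors, size=2, replace=False)
        target = (vectors[i1] + vectors[i2]) / 2
        similarities = unit_vectors @ (target / np.linalg.norm(target))
        nearest = int(np.argmax(similarities))
        successes += nearest in (i1, i2)
    return successes / trials


for dimension in (2, 10, 100):
    print(dimension, parent_is_nearest_rate(dimension))
```

Taking `np.argmax` corresponds to `k=1` in the notebook. Unlike sorting a DataFrame, it does not give the whole ranking, so checking the top `k>1` neighbours would need `np.argsort` or `np.argpartition` instead.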
1641
P4. Matrix Factorization.ipynb
Normal file
File diff suppressed because one or more lines are too long
1391
P5. Graph-based.ipynb
Normal file
File diff suppressed because one or more lines are too long
214
evaluation_measures.py
Normal file
@ -0,0 +1,214 @@
|
||||
import os
|
||||
import sys
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import math
|
||||
from sklearn.preprocessing import normalize
|
||||
from tqdm import tqdm
|
||||
from datetime import datetime, date
|
||||
import random
|
||||
import scipy.sparse as sparse
|
||||
from os import listdir
|
||||
from os.path import isfile, join
|
||||
from collections import defaultdict
|
||||
|
||||
|
||||
def evaluate(test,
|
||||
estimations_df,
|
||||
reco,
|
||||
super_reactions=[4,5],
|
||||
topK=10):
|
||||
|
||||
estimations_df=estimations_df.copy()
|
||||
reco=reco.copy()
|
||||
test_df=test.copy()
|
||||
|
||||
# prepare testset
|
||||
test_df.columns=['user', 'item', 'rating', 'timestamp']
|
||||
test_df['user_code'] = test_df['user'].astype("category").cat.codes
|
||||
test_df['item_code'] = test_df['item'].astype("category").cat.codes
|
||||
|
||||
user_code_id = dict(enumerate(test_df['user'].astype("category").cat.categories))
|
||||
user_id_code = dict((v, k) for k, v in user_code_id.items())
|
||||
item_code_id = dict(enumerate(test_df['item'].astype("category").cat.categories))
|
||||
item_id_code = dict((v, k) for k, v in item_code_id.items())
|
||||
|
||||
test_ui = sparse.csr_matrix((test_df['rating'], (test_df['user_code'], test_df['item_code'])))
|
||||
|
||||
#prepare estimations
|
||||
estimations_df.columns=['user', 'item' ,'score']
|
||||
estimations_df['user_code']=[user_id_code[user] for user in estimations_df['user']]
|
||||
estimations_df['item_code']=[item_id_code[item] for item in estimations_df['item']]
|
||||
estimations=sparse.csr_matrix((estimations_df['score'], (estimations_df['user_code'], estimations_df['item_code'])), shape=test_ui.shape)
|
||||
|
||||
#compute_estimations
|
||||
estimations_df=estimations_metrics(test_ui, estimations)
|
||||
|
||||
#prepare reco
|
||||
users=reco[:,:1]
|
||||
items=reco[:,1::2]
|
||||
# Let's use inner ids instead of real ones
|
||||
users=np.vectorize(lambda x: user_id_code.setdefault(x, -1))(users) # maybe users we recommend are not in test set
|
||||
items=np.vectorize(lambda x: item_id_code.setdefault(x, -1))(items) # maybe items we recommend are not in test set
|
||||
# Let's put them into one array
|
||||
reco=np.concatenate((users, items), axis=1)
|
||||
|
||||
#compute ranking metrics
|
||||
ranking_df=ranking_metrics(test_ui, reco, super_reactions=super_reactions, topK=topK)
|
||||
|
||||
#compute diversity metrics
|
||||
diversity_df=diversity_metrics(test_ui, reco, topK)
|
||||
|
||||
result=pd.concat([estimations_df, ranking_df, diversity_df], axis=1)
|
||||
|
||||
return(result)
|
||||
|
||||
|
||||
def ranking_metrics(test_ui, reco, super_reactions=[], topK=10):
|
||||
|
||||
nb_items=test_ui.shape[1]
|
||||
relevant_users, super_relevant_users, prec, rec, F_1, F_05, prec_super, rec_super, ndcg, mAP, MRR, LAUC, HR=\
|
||||
0,0,0,0,0,0,0,0,0,0,0,0,0
|
||||
|
||||
cg = (1.0 / np.log2(np.arange(2, topK + 2)))
|
||||
cg_sum = np.cumsum(cg)
|
||||
|
||||
for (nb_user, user) in tqdm(enumerate(reco[:,0])):
|
||||
u_rated_items=test_ui.indices[test_ui.indptr[user]:test_ui.indptr[user+1]]
|
||||
nb_u_rated_items=len(u_rated_items)
|
||||
if nb_u_rated_items>0: # skip users with no items in test set (still possible that there will be no super items)
|
||||
relevant_users+=1
|
||||
|
||||
u_super_items=u_rated_items[np.vectorize(lambda x: x in super_reactions)\
|
||||
(test_ui.data[test_ui.indptr[user]:test_ui.indptr[user+1]])]
|
||||
# more natural seems u_super_items=[item for item in u_rated_items if test_ui[user,item] in super_reactions]
|
||||
# but accesing test_ui[user,item] is expensive -we should avoid doing it
|
||||
if len(u_super_items)>0:
|
||||
super_relevant_users+=1
|
||||
|
||||
user_successes=np.zeros(topK)
|
||||
nb_user_successes=0
|
||||
user_super_successes=np.zeros(topK)
|
||||
nb_user_super_successes=0
|
||||
|
||||
# evaluation
|
||||
for (item_position,item) in enumerate(reco[nb_user,1:topK+1]):
|
||||
if item in u_rated_items:
|
||||
user_successes[item_position]=1
|
||||
nb_user_successes+=1
|
||||
if item in u_super_items:
|
||||
user_super_successes[item_position]=1
|
||||
nb_user_super_successes+=1
|
||||
|
||||
prec_u=nb_user_successes/topK
|
||||
prec+=prec_u
|
||||
|
||||
rec_u=nb_user_successes/nb_u_rated_items
|
||||
rec+=rec_u
|
||||
|
||||
F_1+=2*(prec_u*rec_u)/(prec_u+rec_u) if prec_u+rec_u>0 else 0
|
||||
F_05+=(0.5**2+1)*(prec_u*rec_u)/(0.5**2*prec_u+rec_u) if prec_u+rec_u>0 else 0
|
||||
|
||||
prec_super+=nb_user_super_successes/topK
|
||||
rec_super+=nb_user_super_successes/max(len(u_super_items),1)
|
||||
ndcg+=np.dot(user_successes,cg)/cg_sum[min(topK, nb_u_rated_items)-1]
|
||||
|
||||
cumsum_successes=np.cumsum(user_successes)
|
||||
mAP+=np.dot(cumsum_successes/np.arange(1,topK+1), user_successes)/min(topK, nb_u_rated_items)
|
||||
MRR+=1/(user_successes.nonzero()[0][0]+1) if user_successes.nonzero()[0].size>0 else 0
|
||||
LAUC+=(np.dot(cumsum_successes, 1-user_successes)+\
|
||||
(nb_user_successes+nb_u_rated_items)/2*((nb_items-nb_u_rated_items)-(topK-nb_user_successes)))/\
|
||||
((nb_items-nb_u_rated_items)*nb_u_rated_items)
|
||||
|
||||
HR+=nb_user_successes>0
|
||||
|
||||
|
||||
result=[]
|
||||
result.append(('precision', prec/relevant_users))
|
||||
result.append(('recall', rec/relevant_users))
|
||||
result.append(('F_1', F_1/relevant_users))
|
||||
result.append(('F_05', F_05/relevant_users))
|
||||
result.append(('precision_super', prec_super/super_relevant_users))
|
||||
result.append(('recall_super', rec_super/super_relevant_users))
|
||||
result.append(('NDCG', ndcg/relevant_users))
|
||||
result.append(('mAP', mAP/relevant_users))
|
||||
result.append(('MRR', MRR/relevant_users))
|
||||
result.append(('LAUC', LAUC/relevant_users))
|
||||
result.append(('HR', HR/relevant_users))
|
||||
|
||||
df_result=pd.DataFrame()
|
||||
if len(result)>0:
|
||||
df_result=(pd.DataFrame(list(zip(*result))[1])).T
|
||||
df_result.columns=list(zip(*result))[0]
|
||||
return df_result
|
||||
|
||||
|
||||
def estimations_metrics(test_ui, estimations):
|
||||
result=[]
|
||||
|
||||
RMSE=(np.sum((estimations.data-test_ui.data)**2)/estimations.nnz)**(1/2)
|
||||
result.append(['RMSE', RMSE])
|
||||
|
||||
MAE=np.sum(abs(estimations.data-test_ui.data))/estimations.nnz
|
||||
result.append(['MAE', MAE])
|
||||
|
||||
df_result=pd.DataFrame()
|
||||
if len(result)>0:
|
||||
df_result=(pd.DataFrame(list(zip(*result))[1])).T
|
||||
df_result.columns=list(zip(*result))[0]
|
||||
return df_result
|
||||
|
||||
def diversity_metrics(test_ui, reco, topK=10):
|
||||
|
||||
frequencies=defaultdict(int)
|
||||
|
||||
for item in list(set(test_ui.indices)):
|
||||
frequencies[item]=0
|
||||
|
||||
for item in reco[:,1:].flat:
|
||||
frequencies[item]+=1
|
||||
|
||||
nb_reco_outside_test=frequencies[-1]
|
||||
del frequencies[-1]
|
||||
|
||||
frequencies=np.array(list(frequencies.values()))
|
||||
|
||||
nb_rec_items=len(frequencies[frequencies>0])
|
||||
nb_reco_inside_test=np.sum(frequencies)
|
||||
|
||||
frequencies=frequencies/np.sum(frequencies)
|
||||
frequencies=np.sort(frequencies)
|
||||
|
||||
with np.errstate(divide='ignore'): # let's put zeros we items with 0 frequency and ignore division warning
|
||||
log_frequencies=np.nan_to_num(np.log(frequencies), posinf=0, neginf=0)
|
||||
|
||||
result=[]
|
||||
result.append(('Reco in test', nb_reco_inside_test/(nb_reco_inside_test+nb_reco_outside_test)))
|
||||
result.append(('Test coverage', nb_rec_items/test_ui.shape[1]))
|
||||
result.append(('Shannon', -np.dot(frequencies, log_frequencies)))
|
||||
result.append(('Gini', np.dot(frequencies, np.arange(1-len(frequencies), len(frequencies), 2))/(len(frequencies)-1)))
|
||||
|
||||
df_result=(pd.DataFrame(list(zip(*result))[1])).T
|
||||
df_result.columns=list(zip(*result))[0]
|
||||
return df_result
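
The Shannon entropy and Gini index used in `diversity_metrics` can be illustrated on a small frequency vector. This is a minimal sketch with made-up counts, not part of the committed code; it mirrors the formulas above (sorted normalised frequencies, `log(0)` treated as 0).

```python
import numpy as np

# Hypothetical counts: how many times each of 4 items was recommended.
counts = np.array([0, 1, 1, 2])

# Normalise to a probability distribution and sort ascending (needed for the Gini formula).
p = np.sort(counts / counts.sum())

# log(0) gives -inf; replace it with 0 so that 0 * log(0) contributes nothing.
with np.errstate(divide='ignore'):
    log_p = np.nan_to_num(np.log(p), posinf=0, neginf=0)

shannon = -np.dot(p, log_p)

# Gini index: weights run from 1-n to n-1 in steps of 2 over the sorted frequencies.
n = len(p)
gini = np.dot(p, np.arange(1 - n, n, 2)) / (n - 1)

print(shannon, gini)
```

A Gini value near 0 indicates recommendations spread evenly over items, while a value near 1 indicates that a few items dominate.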
|
||||
|
||||
|
||||
|
||||
def evaluate_all(test,
|
||||
dir_path="Recommendations generated/ml-100k/",
|
||||
super_reactions=[4,5],
|
||||
topK=10):
|
||||
|
||||
models = list(set(['_'.join(f.split('_')[:2]) for f in listdir(dir_path)
|
||||
if isfile(dir_path+f)]))
|
||||
result=[]
|
||||
for model in models:
|
||||
estimations_df=pd.read_csv('{}{}_estimations.csv'.format(dir_path, model), header=None)
|
||||
reco=np.loadtxt('{}{}_reco.csv'.format(dir_path, model), delimiter=',')
|
||||
to_append=evaluate(test, estimations_df, reco, super_reactions, topK)
|
||||
|
||||
to_append.insert(0, "Model", model)
|
||||
result.append(to_append)
|
||||
result=pd.concat(result)
|
||||
result=result.sort_values(by='recall', ascending=False)
|
||||
return result
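
`evaluate_all` derives model names from file names by keeping the first two underscore-separated tokens. The sketch below shows that step in isolation; the file names are hypothetical examples of the `<prefix>_<name>_reco.csv` and `<prefix>_<name>_estimations.csv` pattern the function appears to expect.

```python
# Hypothetical directory listing; real names depend on what was saved to dir_path.
files = [
    'Ready_Baseline_reco.csv',
    'Ready_Baseline_estimations.csv',
    'Self_TopPop_reco.csv',
]

# Keep the first two tokens of each name and drop duplicates.
models = sorted(set('_'.join(f.split('_')[:2]) for f in files))

print(models)
```

A model whose name itself contains an underscore would be truncated by this rule, so file names need to follow the two-token convention.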
|
75
helpers.py
Normal file
@ -0,0 +1,75 @@
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
import scipy.sparse as sparse
|
||||
import surprise as sp
|
||||
import time
|
||||
from collections import defaultdict
|
||||
from itertools import chain
|
||||
|
||||
def data_to_csr(train_read, test_read):
|
||||
train_read.columns=['user', 'item', 'rating', 'timestamp']
|
||||
test_read.columns=['user', 'item', 'rating', 'timestamp']
|
||||
|
||||
# Let's build the whole dataset (train and test together) so that codes are consistent across both
|
||||
train_and_test=pd.concat([train_read, test_read], axis=0, ignore_index=True)
|
||||
train_and_test['user_code'] = train_and_test['user'].astype("category").cat.codes
|
||||
train_and_test['item_code'] = train_and_test['item'].astype("category").cat.codes
|
||||
|
||||
user_code_id = dict(enumerate(train_and_test['user'].astype("category").cat.categories))
|
||||
user_id_code = dict((v, k) for k, v in user_code_id.items())
|
||||
item_code_id = dict(enumerate(train_and_test['item'].astype("category").cat.categories))
|
||||
item_id_code = dict((v, k) for k, v in item_code_id.items())
|
||||
|
||||
train_df=pd.merge(train_read, train_and_test, on=list(train_read.columns))
|
||||
test_df=pd.merge(test_read, train_and_test, on=list(train_read.columns))
|
||||
|
||||
# Get the number of users and items
|
||||
(U,I)=(train_and_test['user_code'].max()+1, train_and_test['item_code'].max()+1)
|
||||
|
||||
# Create sparse csr matrices
|
||||
train_ui = sparse.csr_matrix((train_df['rating'], (train_df['user_code'], train_df['item_code'])), shape=(U, I))
|
||||
test_ui = sparse.csr_matrix((test_df['rating'], (test_df['user_code'], test_df['item_code'])), shape=(U, I))
|
||||
|
||||
return train_ui, test_ui, user_code_id, user_id_code, item_code_id, item_id_code
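
The core idea of `data_to_csr` is to map raw user and item ids to contiguous integer codes and use those codes as row and column indices. The following is a minimal sketch on a made-up ratings table (the ids and ratings are illustrative assumptions), showing only the encoding and matrix construction, without the train/test split.

```python
import pandas as pd
import scipy.sparse as sparse

# Hypothetical ratings with non-contiguous raw ids.
ratings = pd.DataFrame({'user': [10, 10, 42],
                        'item': [7, 99, 7],
                        'rating': [5, 3, 4]})

# Categorical encoding assigns codes 0..n-1 in sorted order of the raw ids.
user_cat = ratings['user'].astype('category')
item_cat = ratings['item'].astype('category')

# code -> raw id lookups, as in the helper above.
user_code_id = dict(enumerate(user_cat.cat.categories))
item_code_id = dict(enumerate(item_cat.cat.categories))

# Build the user-item matrix from (value, (row, col)) triples.
ui = sparse.csr_matrix(
    (ratings['rating'].to_numpy(),
     (user_cat.cat.codes.to_numpy(), item_cat.cat.codes.to_numpy())),
    shape=(len(user_code_id), len(item_code_id)))

print(ui.toarray())
```

Encoding train and test together, as the helper does, ensures that the same raw id always maps to the same row or column in both matrices.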
|
||||
|
||||
|
||||
def get_top_n(predictions, n=10):
|
||||
|
||||
# Here we create a dictionary whose values are lists of (item, score) pairs, keyed by user
|
||||
top_n = defaultdict(list)
|
||||
for uid, iid, true_r, est, _ in predictions:
|
||||
top_n[uid].append((iid, est))
|
||||
|
||||
result=[]
|
||||
# Let's choose the n best items in the format: (user, item1, score1, item2, score2, ...)
|
||||
for uid, user_ratings in top_n.items():
|
||||
user_ratings.sort(key=lambda x: x[1], reverse=True)
|
||||
result.append([uid]+list(chain(*user_ratings[:n])))
|
||||
return result
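
To see the row format produced by `get_top_n`, the same logic can be run on a few hand-written predictions. In this sketch plain 5-tuples stand in for Surprise's prediction objects (an assumption made so the example runs without the library), and `n` is set to 2.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical predictions as (uid, iid, true_rating, estimate, details) tuples.
predictions = [
    ('u1', 'a', None, 3.5, {}),
    ('u1', 'b', None, 4.8, {}),
    ('u1', 'c', None, 1.0, {}),
    ('u2', 'a', None, 2.0, {}),
]

# Group (item, estimate) pairs by user.
top_n = defaultdict(list)
for uid, iid, true_r, est, _ in predictions:
    top_n[uid].append((iid, est))

# Sort each user's pairs by estimate and flatten the best n into one row.
n = 2
rows = []
for uid, user_ratings in top_n.items():
    user_ratings.sort(key=lambda x: x[1], reverse=True)
    rows.append([uid] + list(chain(*user_ratings[:n])))

print(rows)
```

Users with fewer than `n` candidate items yield shorter rows, which is why the resulting table can be ragged before it is written to CSV.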
|
||||
|
||||
|
||||
def ready_made(algo, reco_path, estimations_path):
|
||||
reader = sp.Reader(line_format='user item rating timestamp', sep='\t')
|
||||
trainset = sp.Dataset.load_from_file('./Datasets/ml-100k/train.csv', reader=reader)
|
||||
trainset = trainset.build_full_trainset() # converts the dataset to a surprise.trainset.Trainset, which Surprise algorithms need for fitting
|
||||
|
||||
testset = sp.Dataset.load_from_file('./Datasets/ml-100k/test.csv', reader=reader)
|
||||
testset = sp.Trainset.build_testset(testset.build_full_trainset())
|
||||
|
||||
algo.fit(trainset)
|
||||
|
||||
antitrainset = trainset.build_anti_testset() # We want to predict ratings for (user, item) pairs that are not in the training set
|
||||
print('Generating predictions...')
|
||||
predictions = algo.test(antitrainset)
|
||||
print('Generating top N recommendations...')
|
||||
top_n = get_top_n(predictions, n=10)
|
||||
top_n=pd.DataFrame(top_n)
|
||||
top_n.to_csv(reco_path, index=False, header=False)
|
||||
|
||||
print('Generating estimations for the test set...')
|
||||
predictions = algo.test(testset)
|
||||
predictions_df=[]
|
||||
for uid, iid, true_r, est, _ in predictions:
|
||||
predictions_df.append([uid, iid, est])
|
||||
predictions_df=pd.DataFrame(predictions_df)
|
||||
predictions_df.to_csv(estimations_path, index=False, header=False)
|