Codes added

commit 67d6328405
Author: Robert
Date: 2020-05-21 13:42:50 +02:00
35 changed files with 913321 additions and 0 deletions

Datasets/ml-100k/README (new file, 157 lines)

@@ -0,0 +1,157 @@
SUMMARY & USAGE LICENSE
=============================================
MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.
This data set consists of:
* 100,000 ratings (1-5) from 943 users on 1682 movies.
* Each user has rated at least 20 movies.
* Simple demographic info for the users (age, gender, occupation, zip)
The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998. This data has been cleaned up - users
who had fewer than 20 ratings or did not have complete demographic
information were removed from this data set. Detailed descriptions of
the data files can be found at the end of this file.
Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set. The data set may be used for any research
purposes under the following conditions:
* The user may not state or imply any endorsement from the
University of Minnesota or the GroupLens Research Group.
* The user must acknowledge the use of the data set in
publications resulting from the use of the data set
(see below for citation information).
* The user may not redistribute the data without separate
permission.
* The user may not use this information for any commercial or
revenue-bearing purposes without first obtaining permission
from a faculty member of the GroupLens Research Project at the
University of Minnesota.
If you have any further questions or comments, please contact GroupLens
<grouplens-info@cs.umn.edu>.
CITATION
==============================================
To acknowledge use of the dataset in publications, please cite the
following paper:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872
ACKNOWLEDGEMENTS
==============================================
Thanks to Al Borchers for cleaning up this data and writing the
accompanying scripts.
PUBLISHED WORK THAT HAS USED THIS DATASET
==============================================
Herlocker, J., Konstan, J., Borchers, A., and Riedl, J. An Algorithmic
Framework for Performing Collaborative Filtering. Proceedings of the
1999 Conference on Research and Development in Information
Retrieval. Aug. 1999.
FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
==============================================
The GroupLens Research Project is a research group in the Department
of Computer Science and Engineering at the University of Minnesota.
Members of the GroupLens Research Project are involved in many
research projects related to the fields of information filtering,
collaborative filtering, and recommender systems. The project is led
by professors John Riedl and Joseph Konstan. The project began to
explore automated collaborative filtering in 1992, but is best known
for its worldwide trial of an automated collaborative filtering
system for Usenet news in 1996. The technology developed in the
Usenet trial formed the basis for Net Perceptions, Inc., which was
founded by members of GroupLens Research. Since then the project has
expanded its scope to research overall information filtering
solutions, integrating content-based methods as well as
improving current collaborative filtering technology.
Further information on the GroupLens Research project, including
research publications, can be found at the following web site:
http://www.grouplens.org/
GroupLens Research currently operates a movie recommender based on
collaborative filtering:
http://www.movielens.org/
DETAILED DESCRIPTIONS OF DATA FILES
==============================================
Here are brief descriptions of the data.
ml-data.tar.gz -- Compressed tar file. To rebuild the u data files do this:
gunzip ml-data.tar.gz
tar xvf ml-data.tar
mku.sh
u.data -- The full u data set, 100000 ratings by 943 users on 1682 items.
Each user has rated at least 20 movies. Users and items are
numbered consecutively from 1. The data is randomly
ordered. This is a tab separated list of
user id | item id | rating | timestamp.
The timestamps are unix seconds since 1/1/1970 UTC.
u.info -- The number of users, items, and ratings in the u data set.
u.item -- Information about the items (movies); this is a tab separated
list of
movie id | movie title | release date | video release date |
IMDb URL | unknown | Action | Adventure | Animation |
Children's | Comedy | Crime | Documentary | Drama | Fantasy |
Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
Thriller | War | Western |
The last 19 fields are the genres, a 1 indicates the movie
is of that genre, a 0 indicates it is not; movies can be in
several genres at once.
The movie ids are the ones used in the u.data data set.
u.genre -- A list of the genres.
u.user -- Demographic information about the users; this is a tab
separated list of
user id | age | gender | occupation | zip code
The user ids are the ones used in the u.data data set.
u.occupation -- A list of the occupations.
u1.base -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test are 80%/20% splits of the u data into training and test data.
u2.base Each of u1, ..., u5 has a disjoint test set; this is for
u2.test 5-fold cross validation (where you repeat your experiment
u3.base with each training and test set and average the results).
u3.test These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test
ua.base -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test split the u data into a training set and a test set with
ub.base exactly 10 ratings per user in the test set. The sets
ub.test ua.test and ub.test are disjoint. These data sets can
be generated from u.data by mku.sh.
allbut.pl -- The script that generates training and test sets where
all but n of a user's ratings are in the training data.
mku.sh -- A shell script to generate all the u data sets from u.data.
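The tab-separated u.data layout described above loads directly with pandas. A minimal sketch; the two sample rows and the column names here are illustrative choices, not part of the dataset files:

```python
import io

import pandas as pd

# Two rows in u.data's layout: user id, item id, rating, unix timestamp,
# separated by tabs (sample values chosen for illustration).
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

ratings = pd.read_csv(
    io.StringIO(sample),  # in practice, pass the path to u.data instead
    sep="\t",
    names=["user_id", "item_id", "rating", "timestamp"],
)
# The timestamps are seconds since 1/1/1970 UTC.
ratings["timestamp"] = pd.to_datetime(ratings["timestamp"], unit="s")
```

The same pattern applies to u.item and u.user by switching `sep` to `"|"`.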

Datasets/ml-100k/allbut.pl (new file, 34 lines)

@@ -0,0 +1,34 @@
#!/usr/local/bin/perl
# get args
# all four positional arguments are required
if (@ARGV < 4) {
print STDERR "Usage: $0 base_name start stop max_test [ratings ...]\n";
exit 1;
}
$basename = shift;
$start = shift;
$stop = shift;
$maxtest = shift;
# open files
open( TESTFILE, ">$basename.test" ) or die "Cannot open $basename.test for writing\n";
open( BASEFILE, ">$basename.base" ) or die "Cannot open $basename.base for writing\n";
# init variables
$testcnt = 0;
while (<>) {
($user) = split;
if (! defined $ratingcnt{$user}) {
$ratingcnt{$user} = 0;
}
++$ratingcnt{$user};
if (($testcnt < $maxtest || $maxtest <= 0)
&& $ratingcnt{$user} >= $start && $ratingcnt{$user} <= $stop) {
++$testcnt;
print TESTFILE;
}
else {
print BASEFILE;
}
}
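The script's core rule, sending each user's start-th through stop-th ratings to the test set, can be mirrored in pandas. A sketch with a made-up two-user table, ignoring the max_test cap:

```python
import pandas as pd

# Tiny made-up ratings table standing in for u.data.
ratings = pd.DataFrame({
    "user_id": [1] * 12 + [2] * 12,
    "item_id": list(range(1, 13)) * 2,
    "rating":  [3] * 24,
})

# Each user's start..stop-th ratings (here 1..10, the values mku.sh uses
# for "ua") go to the test set; everything else goes to the base set.
start, stop = 1, 10
nth = ratings.groupby("user_id").cumcount() + 1  # 1-based count per user
test = ratings[(nth >= start) & (nth <= stop)]
base = ratings.drop(test.index)
```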

Datasets/ml-100k/mku.sh (new file, 25 lines)

@@ -0,0 +1,25 @@
#!/bin/sh
trap 'rm -f tmp.$$; exit 1' 1 2 15
# u.data is tab separated, so sort on a literal tab rather than a space
TAB=`printf '\t'`
for i in 1 2 3 4 5
do
head -`expr $i \* 20000` u.data | tail -20000 > tmp.$$
sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.test
head -`expr \( $i - 1 \) \* 20000` u.data > tmp.$$
tail -`expr \( 5 - $i \) \* 20000` u.data >> tmp.$$
sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.base
done
allbut.pl ua 1 10 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ua.base > tmp.$$
mv tmp.$$ ua.base
sort -t"$TAB" -k 1,1n -k 2,2n ua.test > tmp.$$
mv tmp.$$ ua.test
allbut.pl ub 11 20 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ub.base > tmp.$$
mv tmp.$$ ub.base
sort -t"$TAB" -k 1,1n -k 2,2n ub.test > tmp.$$
mv tmp.$$ ub.test
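In pandas terms, the loop in mku.sh slices the (randomly ordered) u.data into five consecutive 20% blocks, each serving once as a test set. A sketch with synthetic stand-in data:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for u.data: 100 ratings in random order.
rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "user_id": rng.integers(1, 11, size=100),
    "item_id": rng.integers(1, 21, size=100),
    "rating":  rng.integers(1, 6, size=100),
})

# Fold i's test set is the i-th consecutive 20% block; the remaining
# 80% is its base (training) set.
fold = 3  # u3, for example
n = len(ratings) // 5
test = ratings.iloc[(fold - 1) * n : fold * n]
base = ratings.drop(test.index)
```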

Datasets/ml-100k/movies.csv (new file, 1683 lines)
File diff suppressed because it is too large.

Datasets/ml-100k/test.csv (new file, 20000 lines)
File diff suppressed because it is too large.

Datasets/ml-100k/train.csv (new file, 80000 lines)
File diff suppressed because it is too large.

Datasets/ml-100k/u.data (new file, 100000 lines)
File diff suppressed because it is too large.

Datasets/ml-100k/u.genre (new file, 20 lines)

@@ -0,0 +1,20 @@
unknown|0
Action|1
Adventure|2
Animation|3
Children's|4
Comedy|5
Crime|6
Documentary|7
Drama|8
Fantasy|9
Film-Noir|10
Horror|11
Musical|12
Mystery|13
Romance|14
Sci-Fi|15
Thriller|16
War|17
Western|18
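The index order above matches the last 19 fields of each u.item row, so the genre flags can be decoded positionally. A sketch using a hand-written row in that layout (the IMDb URL field is left empty here rather than invented):

```python
# Genre names in the index order given by u.genre.
genres = ["unknown", "Action", "Adventure", "Animation", "Children's",
          "Comedy", "Crime", "Documentary", "Drama", "Fantasy",
          "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
          "Sci-Fi", "Thriller", "War", "Western"]

# A u.item-style row: id | title | release date | video release date |
# IMDb URL | 19 genre flags. The URL field is left empty in this example.
row = ("1|Toy Story (1995)|01-Jan-1995||"
       "|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0")
fields = row.split("|")
flags = fields[-19:]  # the 19 trailing 0/1 genre indicators
movie_genres = [g for g, f in zip(genres, flags) if f == "1"]
```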

Datasets/ml-100k/u.info (new file, 3 lines)

@@ -0,0 +1,3 @@
943 users
1682 items
100000 ratings

Datasets/ml-100k/u.item (new file, 1682 lines)
File diff suppressed because it is too large.

Datasets/ml-100k/u.occupation (new file, 21 lines)

@@ -0,0 +1,21 @@
administrator
artist
doctor
educator
engineer
entertainment
executive
healthcare
homemaker
lawyer
librarian
marketing
none
other
programmer
retired
salesman
scientist
student
technician
writer

Datasets/ml-100k/u.user (new file, 943 lines)

@@ -0,0 +1,943 @@
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
9|29|M|student|01002
10|53|M|lawyer|90703
11|39|F|other|30329
12|28|F|other|06405
13|47|M|educator|29206
14|45|M|scientist|55106
15|49|F|educator|97301
16|21|M|entertainment|10309
17|30|M|programmer|06355
18|35|F|other|37212
19|40|M|librarian|02138
20|42|F|homemaker|95660
21|26|M|writer|30068
22|25|M|writer|40206
23|30|F|artist|48197
24|21|F|artist|94533
25|39|M|engineer|55107
26|49|M|engineer|21044
27|40|F|librarian|30030
28|32|M|writer|55369
29|41|M|programmer|94043
30|7|M|student|55436
31|24|M|artist|10003
32|28|F|student|78741
33|23|M|student|27510
34|38|F|administrator|42141
35|20|F|homemaker|42459
36|19|F|student|93117
37|23|M|student|55105
38|28|F|other|54467
39|41|M|entertainment|01040
40|38|M|scientist|27514
41|33|M|engineer|80525
42|30|M|administrator|17870
43|29|F|librarian|20854
44|26|M|technician|46260
45|29|M|programmer|50233
46|27|F|marketing|46538
47|53|M|marketing|07102
48|45|M|administrator|12550
49|23|F|student|76111
50|21|M|writer|52245
51|28|M|educator|16509
52|18|F|student|55105
53|26|M|programmer|55414
54|22|M|executive|66315
55|37|M|programmer|01331
56|25|M|librarian|46260
57|16|M|none|84010
58|27|M|programmer|52246
59|49|M|educator|08403
60|50|M|healthcare|06472
61|36|M|engineer|30040
62|27|F|administrator|97214
63|31|M|marketing|75240
64|32|M|educator|43202
65|51|F|educator|48118
66|23|M|student|80521
67|17|M|student|60402
68|19|M|student|22904
69|24|M|engineer|55337
70|27|M|engineer|60067
71|39|M|scientist|98034
72|48|F|administrator|73034
73|24|M|student|41850
74|39|M|scientist|T8H1N
75|24|M|entertainment|08816
76|20|M|student|02215
77|30|M|technician|29379
78|26|M|administrator|61801
79|39|F|administrator|03755
80|34|F|administrator|52241
81|21|M|student|21218
82|50|M|programmer|22902
83|40|M|other|44133
84|32|M|executive|55369
85|51|M|educator|20003
86|26|M|administrator|46005
87|47|M|administrator|89503
88|49|F|librarian|11701
89|43|F|administrator|68106
90|60|M|educator|78155
91|55|M|marketing|01913
92|32|M|entertainment|80525
93|48|M|executive|23112
94|26|M|student|71457
95|31|M|administrator|10707
96|25|F|artist|75206
97|43|M|artist|98006
98|49|F|executive|90291
99|20|M|student|63129
100|36|M|executive|90254
101|15|M|student|05146
102|38|M|programmer|30220
103|26|M|student|55108
104|27|M|student|55108
105|24|M|engineer|94043
106|61|M|retired|55125
107|39|M|scientist|60466
108|44|M|educator|63130
109|29|M|other|55423
110|19|M|student|77840
111|57|M|engineer|90630
112|30|M|salesman|60613
113|47|M|executive|95032
114|27|M|programmer|75013
115|31|M|engineer|17110
116|40|M|healthcare|97232
117|20|M|student|16125
118|21|M|administrator|90210
119|32|M|programmer|67401
120|47|F|other|06260
121|54|M|librarian|99603
122|32|F|writer|22206
123|48|F|artist|20008
124|34|M|student|60615
125|30|M|lawyer|22202
126|28|F|lawyer|20015
127|33|M|none|73439
128|24|F|marketing|20009
129|36|F|marketing|07039
130|20|M|none|60115
131|59|F|administrator|15237
132|24|M|other|94612
133|53|M|engineer|78602
134|31|M|programmer|80236
135|23|M|student|38401
136|51|M|other|97365
137|50|M|educator|84408
138|46|M|doctor|53211
139|20|M|student|08904
140|30|F|student|32250
141|49|M|programmer|36117
142|13|M|other|48118
143|42|M|technician|08832
144|53|M|programmer|20910
145|31|M|entertainment|V3N4P
146|45|M|artist|83814
147|40|F|librarian|02143
148|33|M|engineer|97006
149|35|F|marketing|17325
150|20|F|artist|02139
151|38|F|administrator|48103
152|33|F|educator|68767
153|25|M|student|60641
154|25|M|student|53703
155|32|F|other|11217
156|25|M|educator|08360
157|57|M|engineer|70808
158|50|M|educator|27606
159|23|F|student|55346
160|27|M|programmer|66215
161|50|M|lawyer|55104
162|25|M|artist|15610
163|49|M|administrator|97212
164|47|M|healthcare|80123
165|20|F|other|53715
166|47|M|educator|55113
167|37|M|other|L9G2B
168|48|M|other|80127
169|52|F|other|53705
170|53|F|healthcare|30067
171|48|F|educator|78750
172|55|M|marketing|22207
173|56|M|other|22306
174|30|F|administrator|52302
175|26|F|scientist|21911
176|28|M|scientist|07030
177|20|M|programmer|19104
178|26|M|other|49512
179|15|M|entertainment|20755
180|22|F|administrator|60202
181|26|M|executive|21218
182|36|M|programmer|33884
183|33|M|scientist|27708
184|37|M|librarian|76013
185|53|F|librarian|97403
186|39|F|executive|00000
187|26|M|educator|16801
188|42|M|student|29440
189|32|M|artist|95014
190|30|M|administrator|95938
191|33|M|administrator|95161
192|42|M|educator|90840
193|29|M|student|49931
194|38|M|administrator|02154
195|42|M|scientist|93555
196|49|M|writer|55105
197|55|M|technician|75094
198|21|F|student|55414
199|30|M|writer|17604
200|40|M|programmer|93402
201|27|M|writer|E2A4H
202|41|F|educator|60201
203|25|F|student|32301
204|52|F|librarian|10960
205|47|M|lawyer|06371
206|14|F|student|53115
207|39|M|marketing|92037
208|43|M|engineer|01720
209|33|F|educator|85710
210|39|M|engineer|03060
211|66|M|salesman|32605
212|49|F|educator|61401
213|33|M|executive|55345
214|26|F|librarian|11231
215|35|M|programmer|63033
216|22|M|engineer|02215
217|22|M|other|11727
218|37|M|administrator|06513
219|32|M|programmer|43212
220|30|M|librarian|78205
221|19|M|student|20685
222|29|M|programmer|27502
223|19|F|student|47906
224|31|F|educator|43512
225|51|F|administrator|58202
226|28|M|student|92103
227|46|M|executive|60659
228|21|F|student|22003
229|29|F|librarian|22903
230|28|F|student|14476
231|48|M|librarian|01080
232|45|M|scientist|99709
233|38|M|engineer|98682
234|60|M|retired|94702
235|37|M|educator|22973
236|44|F|writer|53214
237|49|M|administrator|63146
238|42|F|administrator|44124
239|39|M|artist|95628
240|23|F|educator|20784
241|26|F|student|20001
242|33|M|educator|31404
243|33|M|educator|60201
244|28|M|technician|80525
245|22|M|student|55109
246|19|M|student|28734
247|28|M|engineer|20770
248|25|M|student|37235
249|25|M|student|84103
250|29|M|executive|95110
251|28|M|doctor|85032
252|42|M|engineer|07733
253|26|F|librarian|22903
254|44|M|educator|42647
255|23|M|entertainment|07029
256|35|F|none|39042
257|17|M|student|77005
258|19|F|student|77801
259|21|M|student|48823
260|40|F|artist|89801
261|28|M|administrator|85202
262|19|F|student|78264
263|41|M|programmer|55346
264|36|F|writer|90064
265|26|M|executive|84601
266|62|F|administrator|78756
267|23|M|engineer|83716
268|24|M|engineer|19422
269|31|F|librarian|43201
270|18|F|student|63119
271|51|M|engineer|22932
272|33|M|scientist|53706
273|50|F|other|10016
274|20|F|student|55414
275|38|M|engineer|92064
276|21|M|student|95064
277|35|F|administrator|55406
278|37|F|librarian|30033
279|33|M|programmer|85251
280|30|F|librarian|22903
281|15|F|student|06059
282|22|M|administrator|20057
283|28|M|programmer|55305
284|40|M|executive|92629
285|25|M|programmer|53713
286|27|M|student|15217
287|21|M|salesman|31211
288|34|M|marketing|23226
289|11|M|none|94619
290|40|M|engineer|93550
291|19|M|student|44106
292|35|F|programmer|94703
293|24|M|writer|60804
294|34|M|technician|92110
295|31|M|educator|50325
296|43|F|administrator|16803
297|29|F|educator|98103
298|44|M|executive|01581
299|29|M|doctor|63108
300|26|F|programmer|55106
301|24|M|student|55439
302|42|M|educator|77904
303|19|M|student|14853
304|22|F|student|71701
305|23|M|programmer|94086
306|45|M|other|73132
307|25|M|student|55454
308|60|M|retired|95076
309|40|M|scientist|70802
310|37|M|educator|91711
311|32|M|technician|73071
312|48|M|other|02110
313|41|M|marketing|60035
314|20|F|student|08043
315|31|M|educator|18301
316|43|F|other|77009
317|22|M|administrator|13210
318|65|M|retired|06518
319|38|M|programmer|22030
320|19|M|student|24060
321|49|F|educator|55413
322|20|M|student|50613
323|21|M|student|19149
324|21|F|student|02176
325|48|M|technician|02139
326|41|M|administrator|15235
327|22|M|student|11101
328|51|M|administrator|06779
329|48|M|educator|01720
330|35|F|educator|33884
331|33|M|entertainment|91344
332|20|M|student|40504
333|47|M|other|V0R2M
334|32|M|librarian|30002
335|45|M|executive|33775
336|23|M|salesman|42101
337|37|M|scientist|10522
338|39|F|librarian|59717
339|35|M|lawyer|37901
340|46|M|engineer|80123
341|17|F|student|44405
342|25|F|other|98006
343|43|M|engineer|30093
344|30|F|librarian|94117
345|28|F|librarian|94143
346|34|M|other|76059
347|18|M|student|90210
348|24|F|student|45660
349|68|M|retired|61455
350|32|M|student|97301
351|61|M|educator|49938
352|37|F|programmer|55105
353|25|M|scientist|28480
354|29|F|librarian|48197
355|25|M|student|60135
356|32|F|homemaker|92688
357|26|M|executive|98133
358|40|M|educator|10022
359|22|M|student|61801
360|51|M|other|98027
361|22|M|student|44074
362|35|F|homemaker|85233
363|20|M|student|87501
364|63|M|engineer|01810
365|29|M|lawyer|20009
366|20|F|student|50670
367|17|M|student|37411
368|18|M|student|92113
369|24|M|student|91335
370|52|M|writer|08534
371|36|M|engineer|99206
372|25|F|student|66046
373|24|F|other|55116
374|36|M|executive|78746
375|17|M|entertainment|37777
376|28|F|other|10010
377|22|M|student|18015
378|35|M|student|02859
379|44|M|programmer|98117
380|32|M|engineer|55117
381|33|M|artist|94608
382|45|M|engineer|01824
383|42|M|administrator|75204
384|52|M|programmer|45218
385|36|M|writer|10003
386|36|M|salesman|43221
387|33|M|entertainment|37412
388|31|M|other|36106
389|44|F|writer|83702
390|42|F|writer|85016
391|23|M|student|84604
392|52|M|writer|59801
393|19|M|student|83686
394|25|M|administrator|96819
395|43|M|other|44092
396|57|M|engineer|94551
397|17|M|student|27514
398|40|M|other|60008
399|25|M|other|92374
400|33|F|administrator|78213
401|46|F|healthcare|84107
402|30|M|engineer|95129
403|37|M|other|06811
404|29|F|programmer|55108
405|22|F|healthcare|10019
406|52|M|educator|93109
407|29|M|engineer|03261
408|23|M|student|61755
409|48|M|administrator|98225
410|30|F|artist|94025
411|34|M|educator|44691
412|25|M|educator|15222
413|55|M|educator|78212
414|24|M|programmer|38115
415|39|M|educator|85711
416|20|F|student|92626
417|27|F|other|48103
418|55|F|none|21206
419|37|M|lawyer|43215
420|53|M|educator|02140
421|38|F|programmer|55105
422|26|M|entertainment|94533
423|64|M|other|91606
424|36|F|marketing|55422
425|19|M|student|58644
426|55|M|educator|01602
427|51|M|doctor|85258
428|28|M|student|55414
429|27|M|student|29205
430|38|M|scientist|98199
431|24|M|marketing|92629
432|22|M|entertainment|50311
433|27|M|artist|11211
434|16|F|student|49705
435|24|M|engineer|60007
436|30|F|administrator|17345
437|27|F|other|20009
438|51|F|administrator|43204
439|23|F|administrator|20817
440|30|M|other|48076
441|50|M|technician|55013
442|22|M|student|85282
443|35|M|salesman|33308
444|51|F|lawyer|53202
445|21|M|writer|92653
446|57|M|educator|60201
447|30|M|administrator|55113
448|23|M|entertainment|10021
449|23|M|librarian|55021
450|35|F|educator|11758
451|16|M|student|48446
452|35|M|administrator|28018
453|18|M|student|06333
454|57|M|other|97330
455|48|M|administrator|83709
456|24|M|technician|31820
457|33|F|salesman|30011
458|47|M|technician|Y1A6B
459|22|M|student|29201
460|44|F|other|60630
461|15|M|student|98102
462|19|F|student|02918
463|48|F|healthcare|75218
464|60|M|writer|94583
465|32|M|other|05001
466|22|M|student|90804
467|29|M|engineer|91201
468|28|M|engineer|02341
469|60|M|educator|78628
470|24|M|programmer|10021
471|10|M|student|77459
472|24|M|student|87544
473|29|M|student|94708
474|51|M|executive|93711
475|30|M|programmer|75230
476|28|M|student|60440
477|23|F|student|02125
478|29|M|other|10019
479|30|M|educator|55409
480|57|M|retired|98257
481|73|M|retired|37771
482|18|F|student|40256
483|29|M|scientist|43212
484|27|M|student|21208
485|44|F|educator|95821
486|39|M|educator|93101
487|22|M|engineer|92121
488|48|M|technician|21012
489|55|M|other|45218
490|29|F|artist|V5A2B
491|43|F|writer|53711
492|57|M|educator|94618
493|22|M|engineer|60090
494|38|F|administrator|49428
495|29|M|engineer|03052
496|21|F|student|55414
497|20|M|student|50112
498|26|M|writer|55408
499|42|M|programmer|75006
500|28|M|administrator|94305
501|22|M|student|10025
502|22|M|student|23092
503|50|F|writer|27514
504|40|F|writer|92115
505|27|F|other|20657
506|46|M|programmer|03869
507|18|F|writer|28450
508|27|M|marketing|19382
509|23|M|administrator|10011
510|34|M|other|98038
511|22|M|student|21250
512|29|M|other|20090
513|43|M|administrator|26241
514|27|M|programmer|20707
515|53|M|marketing|49508
516|53|F|librarian|10021
517|24|M|student|55454
518|49|F|writer|99709
519|22|M|other|55320
520|62|M|healthcare|12603
521|19|M|student|02146
522|36|M|engineer|55443
523|50|F|administrator|04102
524|56|M|educator|02159
525|27|F|administrator|19711
526|30|M|marketing|97124
527|33|M|librarian|12180
528|18|M|student|55104
529|47|F|administrator|44224
530|29|M|engineer|94040
531|30|F|salesman|97408
532|20|M|student|92705
533|43|M|librarian|02324
534|20|M|student|05464
535|45|F|educator|80302
536|38|M|engineer|30078
537|36|M|engineer|22902
538|31|M|scientist|21010
539|53|F|administrator|80303
540|28|M|engineer|91201
541|19|F|student|84302
542|21|M|student|60515
543|33|M|scientist|95123
544|44|F|other|29464
545|27|M|technician|08052
546|36|M|executive|22911
547|50|M|educator|14534
548|51|M|writer|95468
549|42|M|scientist|45680
550|16|F|student|95453
551|25|M|programmer|55414
552|45|M|other|68147
553|58|M|educator|62901
554|32|M|scientist|62901
555|29|F|educator|23227
556|35|F|educator|30606
557|30|F|writer|11217
558|56|F|writer|63132
559|69|M|executive|10022
560|32|M|student|10003
561|23|M|engineer|60005
562|54|F|administrator|20879
563|39|F|librarian|32707
564|65|M|retired|94591
565|40|M|student|55422
566|20|M|student|14627
567|24|M|entertainment|10003
568|39|M|educator|01915
569|34|M|educator|91903
570|26|M|educator|14627
571|34|M|artist|01945
572|51|M|educator|20003
573|68|M|retired|48911
574|56|M|educator|53188
575|33|M|marketing|46032
576|48|M|executive|98281
577|36|F|student|77845
578|31|M|administrator|M7A1A
579|32|M|educator|48103
580|16|M|student|17961
581|37|M|other|94131
582|17|M|student|93003
583|44|M|engineer|29631
584|25|M|student|27511
585|69|M|librarian|98501
586|20|M|student|79508
587|26|M|other|14216
588|18|F|student|93063
589|21|M|lawyer|90034
590|50|M|educator|82435
591|57|F|librarian|92093
592|18|M|student|97520
593|31|F|educator|68767
594|46|M|educator|M4J2K
595|25|M|programmer|31909
596|20|M|artist|77073
597|23|M|other|84116
598|40|F|marketing|43085
599|22|F|student|R3T5K
600|34|M|programmer|02320
601|19|F|artist|99687
602|47|F|other|34656
603|21|M|programmer|47905
604|39|M|educator|11787
605|33|M|engineer|33716
606|28|M|programmer|63044
607|49|F|healthcare|02154
608|22|M|other|10003
609|13|F|student|55106
610|22|M|student|21227
611|46|M|librarian|77008
612|36|M|educator|79070
613|37|F|marketing|29678
614|54|M|educator|80227
615|38|M|educator|27705
616|55|M|scientist|50613
617|27|F|writer|11201
618|15|F|student|44212
619|17|M|student|44134
620|18|F|writer|81648
621|17|M|student|60402
622|25|M|programmer|14850
623|50|F|educator|60187
624|19|M|student|30067
625|27|M|programmer|20723
626|23|M|scientist|19807
627|24|M|engineer|08034
628|13|M|none|94306
629|46|F|other|44224
630|26|F|healthcare|55408
631|18|F|student|38866
632|18|M|student|55454
633|35|M|programmer|55414
634|39|M|engineer|T8H1N
635|22|M|other|23237
636|47|M|educator|48043
637|30|M|other|74101
638|45|M|engineer|01940
639|42|F|librarian|12065
640|20|M|student|61801
641|24|M|student|60626
642|18|F|student|95521
643|39|M|scientist|55122
644|51|M|retired|63645
645|27|M|programmer|53211
646|17|F|student|51250
647|40|M|educator|45810
648|43|M|engineer|91351
649|20|M|student|39762
650|42|M|engineer|83814
651|65|M|retired|02903
652|35|M|other|22911
653|31|M|executive|55105
654|27|F|student|78739
655|50|F|healthcare|60657
656|48|M|educator|10314
657|26|F|none|78704
658|33|M|programmer|92626
659|31|M|educator|54248
660|26|M|student|77380
661|28|M|programmer|98121
662|55|M|librarian|19102
663|26|M|other|19341
664|30|M|engineer|94115
665|25|M|administrator|55412
666|44|M|administrator|61820
667|35|M|librarian|01970
668|29|F|writer|10016
669|37|M|other|20009
670|30|M|technician|21114
671|21|M|programmer|91919
672|54|F|administrator|90095
673|51|M|educator|22906
674|13|F|student|55337
675|34|M|other|28814
676|30|M|programmer|32712
677|20|M|other|99835
678|50|M|educator|61462
679|20|F|student|54302
680|33|M|lawyer|90405
681|44|F|marketing|97208
682|23|M|programmer|55128
683|42|M|librarian|23509
684|28|M|student|55414
685|32|F|librarian|55409
686|32|M|educator|26506
687|31|F|healthcare|27713
688|37|F|administrator|60476
689|25|M|other|45439
690|35|M|salesman|63304
691|34|M|educator|60089
692|34|M|engineer|18053
693|43|F|healthcare|85210
694|60|M|programmer|06365
695|26|M|writer|38115
696|55|M|other|94920
697|25|M|other|77042
698|28|F|programmer|06906
699|44|M|other|96754
700|17|M|student|76309
701|51|F|librarian|56321
702|37|M|other|89104
703|26|M|educator|49512
704|51|F|librarian|91105
705|21|F|student|54494
706|23|M|student|55454
707|56|F|librarian|19146
708|26|F|homemaker|96349
709|21|M|other|N4T1A
710|19|M|student|92020
711|22|F|student|15203
712|22|F|student|54901
713|42|F|other|07204
714|26|M|engineer|55343
715|21|M|technician|91206
716|36|F|administrator|44265
717|24|M|technician|84105
718|42|M|technician|64118
719|37|F|other|V0R2H
720|49|F|administrator|16506
721|24|F|entertainment|11238
722|50|F|homemaker|17331
723|26|M|executive|94403
724|31|M|executive|40243
725|21|M|student|91711
726|25|F|administrator|80538
727|25|M|student|78741
728|58|M|executive|94306
729|19|M|student|56567
730|31|F|scientist|32114
731|41|F|educator|70403
732|28|F|other|98405
733|44|F|other|60630
734|25|F|other|63108
735|29|F|healthcare|85719
736|48|F|writer|94618
737|30|M|programmer|98072
738|35|M|technician|95403
739|35|M|technician|73162
740|25|F|educator|22206
741|25|M|writer|63108
742|35|M|student|29210
743|31|M|programmer|92660
744|35|M|marketing|47024
745|42|M|writer|55113
746|25|M|engineer|19047
747|19|M|other|93612
748|28|M|administrator|94720
749|33|M|other|80919
750|28|M|administrator|32303
751|24|F|other|90034
752|60|M|retired|21201
753|56|M|salesman|91206
754|59|F|librarian|62901
755|44|F|educator|97007
756|30|F|none|90247
757|26|M|student|55104
758|27|M|student|53706
759|20|F|student|68503
760|35|F|other|14211
761|17|M|student|97302
762|32|M|administrator|95050
763|27|M|scientist|02113
764|27|F|educator|62903
765|31|M|student|33066
766|42|M|other|10960
767|70|M|engineer|00000
768|29|M|administrator|12866
769|39|M|executive|06927
770|28|M|student|14216
771|26|M|student|15232
772|50|M|writer|27105
773|20|M|student|55414
774|30|M|student|80027
775|46|M|executive|90036
776|30|M|librarian|51157
777|63|M|programmer|01810
778|34|M|student|01960
779|31|M|student|K7L5J
780|49|M|programmer|94560
781|20|M|student|48825
782|21|F|artist|33205
783|30|M|marketing|77081
784|47|M|administrator|91040
785|32|M|engineer|23322
786|36|F|engineer|01754
787|18|F|student|98620
788|51|M|administrator|05779
789|29|M|other|55420
790|27|M|technician|80913
791|31|M|educator|20064
792|40|M|programmer|12205
793|22|M|student|85281
794|32|M|educator|57197
795|30|M|programmer|08610
796|32|F|writer|33755
797|44|F|other|62522
798|40|F|writer|64131
799|49|F|administrator|19716
800|25|M|programmer|55337
801|22|M|writer|92154
802|35|M|administrator|34105
803|70|M|administrator|78212
804|39|M|educator|61820
805|27|F|other|20009
806|27|M|marketing|11217
807|41|F|healthcare|93555
808|45|M|salesman|90016
809|50|F|marketing|30803
810|55|F|other|80526
811|40|F|educator|73013
812|22|M|technician|76234
813|14|F|student|02136
814|30|M|other|12345
815|32|M|other|28806
816|34|M|other|20755
817|19|M|student|60152
818|28|M|librarian|27514
819|59|M|administrator|40205
820|22|M|student|37725
821|37|M|engineer|77845
822|29|F|librarian|53144
823|27|M|artist|50322
824|31|M|other|15017
825|44|M|engineer|05452
826|28|M|artist|77048
827|23|F|engineer|80228
828|28|M|librarian|85282
829|48|M|writer|80209
830|46|M|programmer|53066
831|21|M|other|33765
832|24|M|technician|77042
833|34|M|writer|90019
834|26|M|other|64153
835|44|F|executive|11577
836|44|M|artist|10018
837|36|F|artist|55409
838|23|M|student|01375
839|38|F|entertainment|90814
840|39|M|artist|55406
841|45|M|doctor|47401
842|40|M|writer|93055
843|35|M|librarian|44212
844|22|M|engineer|95662
845|64|M|doctor|97405
846|27|M|lawyer|47130
847|29|M|student|55417
848|46|M|engineer|02146
849|15|F|student|25652
850|34|M|technician|78390
851|18|M|other|29646
852|46|M|administrator|94086
853|49|M|writer|40515
854|29|F|student|55408
855|53|M|librarian|04988
856|43|F|marketing|97215
857|35|F|administrator|V1G4L
858|63|M|educator|09645
859|18|F|other|06492
860|70|F|retired|48322
861|38|F|student|14085
862|25|M|executive|13820
863|17|M|student|60089
864|27|M|programmer|63021
865|25|M|artist|11231
866|45|M|other|60302
867|24|M|scientist|92507
868|21|M|programmer|55303
869|30|M|student|10025
870|22|M|student|65203
871|31|M|executive|44648
872|19|F|student|74078
873|48|F|administrator|33763
874|36|M|scientist|37076
875|24|F|student|35802
876|41|M|other|20902
877|30|M|other|77504
878|50|F|educator|98027
879|33|F|administrator|55337
880|13|M|student|83702
881|39|M|marketing|43017
882|35|M|engineer|40503
883|49|M|librarian|50266
884|44|M|engineer|55337
885|30|F|other|95316
886|20|M|student|61820
887|14|F|student|27249
888|41|M|scientist|17036
889|24|M|technician|78704
890|32|M|student|97301
891|51|F|administrator|03062
892|36|M|other|45243
893|25|M|student|95823
894|47|M|educator|74075
895|31|F|librarian|32301
896|28|M|writer|91505
897|30|M|other|33484
898|23|M|homemaker|61755
899|32|M|other|55116
900|60|M|retired|18505
901|38|M|executive|L1V3W
902|45|F|artist|97203
903|28|M|educator|20850
904|17|F|student|61073
905|27|M|other|30350
906|45|M|librarian|70124
907|25|F|other|80526
908|44|F|librarian|68504
909|50|F|educator|53171
910|28|M|healthcare|29301
911|37|F|writer|53210
912|51|M|other|06512
913|27|M|student|76201
914|44|F|other|08105
915|50|M|entertainment|60614
916|27|M|engineer|N2L5N
917|22|F|student|20006
918|40|M|scientist|70116
919|25|M|other|14216
920|30|F|artist|90008
921|20|F|student|98801
922|29|F|administrator|21114
923|21|M|student|E2E3R
924|29|M|other|11753
925|18|F|salesman|49036
926|49|M|entertainment|01701
927|23|M|programmer|55428
928|21|M|student|55408
929|44|M|scientist|53711
930|28|F|scientist|07310
931|60|M|educator|33556
932|58|M|educator|06437
933|28|M|student|48105
934|61|M|engineer|22902
935|42|M|doctor|66221
936|24|M|other|32789
937|48|M|educator|98072
938|38|F|technician|55038
939|26|F|student|33319
940|32|M|administrator|02215
941|20|M|student|97229
942|48|F|librarian|78209
943|22|M|student|77841
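Each u.user row above is pipe-separated: user id | age | gender | occupation | zip code. A minimal parsing sketch for the first row; note that zip codes must stay strings, since some keep leading zeros (e.g. 05201) and some are Canadian postal prefixes (e.g. T8H1N):

```python
# Parse one u.user row into a dict (row taken from the data above).
line = "1|24|M|technician|85711"
user_id, age, gender, occupation, zip_code = line.split("|")
user = {
    "user_id": int(user_id),
    "age": int(age),
    "gender": gender,
    "occupation": occupation,
    "zip": zip_code,  # kept as a string, never converted to int
}
```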

80000
Datasets/ml-100k/u1.base Normal file

File diff suppressed because it is too large Load Diff

20000
Datasets/ml-100k/u1.test Normal file

File diff suppressed because it is too large Load Diff

80000
Datasets/ml-100k/u2.base Normal file

File diff suppressed because it is too large Load Diff

20000
Datasets/ml-100k/u2.test Normal file

File diff suppressed because it is too large Load Diff

80000
Datasets/ml-100k/u3.base Normal file

File diff suppressed because it is too large

20000
Datasets/ml-100k/u3.test Normal file

File diff suppressed because it is too large

80000
Datasets/ml-100k/u4.base Normal file

File diff suppressed because it is too large

20000
Datasets/ml-100k/u4.test Normal file

File diff suppressed because it is too large

80000
Datasets/ml-100k/u5.base Normal file

File diff suppressed because it is too large

20000
Datasets/ml-100k/u5.test Normal file

File diff suppressed because it is too large

90570
Datasets/ml-100k/ua.base Normal file

File diff suppressed because it is too large

9430
Datasets/ml-100k/ua.test Normal file

File diff suppressed because it is too large

90570
Datasets/ml-100k/ub.base Normal file

File diff suppressed because it is too large

9430
Datasets/ml-100k/ub.test Normal file

File diff suppressed because it is too large

725
P0. Data preparation.ipynb Normal file

File diff suppressed because one or more lines are too long

1256
P1. Baseline.ipynb Normal file

File diff suppressed because it is too large

2414
P2. Evaluation.ipynb Normal file

File diff suppressed because it is too large


@ -0,0 +1,957 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Self-made simplified I-KNN"
]
},
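  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the class below uses cosine similarity between item rating vectors (the notation here is ours, added for clarity):\n",
    "$$\\mathrm{sim}(i,j)=\\frac{\\sum_{u} r_{ui}\\, r_{uj}}{\\lVert r_{\\cdot i}\\rVert\\, \\lVert r_{\\cdot j}\\rVert}$$\n",
    "and estimates a rating as a similarity-weighted average over the items the user has rated:\n",
    "$$\\hat{r}_{ui}=\\frac{\\sum_{j:\\ r_{uj}>0} \\mathrm{sim}(i,j)\\, r_{uj}}{\\sum_{j:\\ r_{uj}>0} \\mathrm{sim}(i,j)}$$"
   ]
  },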
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import helpers\n",
"import pandas as pd\n",
"import numpy as np\n",
"import scipy.sparse as sparse\n",
"from collections import defaultdict\n",
"from itertools import chain\n",
"import random\n",
"\n",
"train_read=pd.read_csv('./Datasets/ml-100k/train.csv', sep='\\t', header=None)\n",
"test_read=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None)\n",
"train_ui, test_ui, user_code_id, user_id_code, item_code_id, item_id_code = helpers.data_to_csr(train_read, test_read)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"class IKNN():\n",
" \n",
" def fit(self, train_ui):\n",
" self.train_ui=train_ui\n",
" \n",
" train_iu=train_ui.transpose()\n",
    "        norms=np.linalg.norm(train_iu.A, axis=1) # here we compute the length of each item's ratings vector\n",
" norms=np.vectorize(lambda x: max(x,1))(norms[:,None]) # to avoid dividing by zero\n",
"\n",
" normalized_train_iu=sparse.csr_matrix(train_iu/norms)\n",
"\n",
" self.similarity_matrix_ii=normalized_train_iu*normalized_train_iu.transpose()\n",
" \n",
    "        self.estimations=np.array(train_ui*self.similarity_matrix_ii/((train_ui>0)*self.similarity_matrix_ii)) # similarity-weighted average of the user's ratings\n",
" \n",
" def recommend(self, user_code_id, item_code_id, topK=10):\n",
" \n",
" top_k = defaultdict(list)\n",
" for nb_user, user in enumerate(self.estimations):\n",
" \n",
" user_rated=self.train_ui.indices[self.train_ui.indptr[nb_user]:self.train_ui.indptr[nb_user+1]]\n",
" for item, score in enumerate(user):\n",
" if item not in user_rated and not np.isnan(score):\n",
" top_k[user_code_id[nb_user]].append((item_code_id[item], score))\n",
" result=[]\n",
" # Let's choose k best items in the format: (user, item1, score1, item2, score2, ...)\n",
" for uid, item_scores in top_k.items():\n",
" item_scores.sort(key=lambda x: x[1], reverse=True)\n",
" result.append([uid]+list(chain(*item_scores[:topK])))\n",
" return result\n",
" \n",
" def estimate(self, user_code_id, item_code_id, test_ui):\n",
" result=[]\n",
" for user, item in zip(*test_ui.nonzero()):\n",
" result.append([user_code_id[user], item_code_id[item], \n",
" self.estimations[user,item] if not np.isnan(self.estimations[user,item]) else 1])\n",
" return result"
]
},
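  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal in-memory sanity check of the class above (the matrix below is made up for illustration): the similarity matrix should come out symmetric with ones on the diagonal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# hypothetical 3 users x 4 items rating matrix, just to exercise fit()\n",
    "tiny_ui=sparse.csr_matrix(np.array([[5, 3, 0, 1],\n",
    "                                    [4, 0, 0, 1],\n",
    "                                    [0, 0, 5, 4]]))\n",
    "tiny_model=IKNN()\n",
    "tiny_model.fit(tiny_ui)\n",
    "tiny_model.similarity_matrix_ii.A"
   ]
  },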
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"toy train ui:\n"
]
},
{
"data": {
"text/plain": [
"array([[3, 4, 0, 0, 5, 0, 0, 4],\n",
" [0, 1, 2, 3, 0, 0, 0, 0],\n",
" [0, 0, 0, 5, 0, 3, 4, 0]], dtype=int64)"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"similarity matrix:\n"
]
},
{
"data": {
"text/plain": [
"array([[1. , 0.9701425 , 0. , 0. , 1. ,\n",
" 0. , 0. , 1. ],\n",
" [0.9701425 , 1. , 0.24253563, 0.12478355, 0.9701425 ,\n",
" 0. , 0. , 0.9701425 ],\n",
" [0. , 0.24253563, 1. , 0.51449576, 0. ,\n",
" 0. , 0. , 0. ],\n",
" [0. , 0.12478355, 0.51449576, 1. , 0. ,\n",
" 0.85749293, 0.85749293, 0. ],\n",
" [1. , 0.9701425 , 0. , 0. , 1. ,\n",
" 0. , 0. , 1. ],\n",
" [0. , 0. , 0. , 0.85749293, 0. ,\n",
" 1. , 1. , 0. ],\n",
" [0. , 0. , 0. , 0.85749293, 0. ,\n",
" 1. , 1. , 0. ],\n",
" [1. , 0.9701425 , 0. , 0. , 1. ,\n",
" 0. , 0. , 1. ]])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"estimations matrix:\n"
]
},
{
"data": {
"text/plain": [
"array([[4. , 4. , 4. , 4. , 4. ,\n",
" nan, nan, 4. ],\n",
" [1. , 1.35990333, 2.15478388, 2.53390319, 1. ,\n",
" 3. , 3. , 1. ],\n",
" [ nan, 5. , 5. , 4.05248907, nan,\n",
" 3.95012863, 3.95012863, nan]])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[[0, 20, 4.0, 30, 4.0],\n",
" [10, 50, 3.0, 60, 3.0, 0, 1.0, 40, 1.0, 70, 1.0],\n",
" [20, 10, 5.0, 20, 5.0]]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# toy example\n",
"toy_train_read=pd.read_csv('./Datasets/toy-example/train.csv', sep='\\t', header=None, names=['user', 'item', 'rating', 'timestamp'])\n",
"toy_test_read=pd.read_csv('./Datasets/toy-example/test.csv', sep='\\t', header=None, names=['user', 'item', 'rating', 'timestamp'])\n",
"\n",
"toy_train_ui, toy_test_ui, toy_user_code_id, toy_user_id_code, \\\n",
"toy_item_code_id, toy_item_id_code = helpers.data_to_csr(toy_train_read, toy_test_read)\n",
"\n",
"\n",
"model=IKNN()\n",
"model.fit(toy_train_ui)\n",
"\n",
"print('toy train ui:')\n",
"display(toy_train_ui.A)\n",
"\n",
"print('similarity matrix:')\n",
"display(model.similarity_matrix_ii.A)\n",
"\n",
"print('estimations matrix:')\n",
"display(model.estimations)\n",
"\n",
"model.recommend(toy_user_code_id, toy_item_code_id)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"model=IKNN()\n",
"model.fit(train_ui)\n",
"\n",
"top_n=pd.DataFrame(model.recommend(user_code_id, item_code_id, topK=10))\n",
"\n",
"top_n.to_csv('Recommendations generated/ml-100k/Self_IKNN_reco.csv', index=False, header=False)\n",
"\n",
"estimations=pd.DataFrame(model.estimate(user_code_id, item_code_id, test_ui))\n",
"estimations.to_csv('Recommendations generated/ml-100k/Self_IKNN_estimations.csv', index=False, header=False)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"943it [00:00, 8845.73it/s]\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>RMSE</th>\n",
" <th>MAE</th>\n",
" <th>precision</th>\n",
" <th>recall</th>\n",
" <th>F_1</th>\n",
" <th>F_05</th>\n",
" <th>precision_super</th>\n",
" <th>recall_super</th>\n",
" <th>NDCG</th>\n",
" <th>mAP</th>\n",
" <th>MRR</th>\n",
" <th>LAUC</th>\n",
" <th>HR</th>\n",
" <th>Reco in test</th>\n",
" <th>Test coverage</th>\n",
" <th>Shannon</th>\n",
" <th>Gini</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.018363</td>\n",
" <td>0.808793</td>\n",
" <td>0.000318</td>\n",
" <td>0.000108</td>\n",
" <td>0.00014</td>\n",
" <td>0.000189</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000214</td>\n",
" <td>0.000037</td>\n",
" <td>0.000368</td>\n",
" <td>0.496391</td>\n",
" <td>0.003181</td>\n",
" <td>0.392153</td>\n",
" <td>0.11544</td>\n",
" <td>4.174741</td>\n",
" <td>0.965327</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" RMSE MAE precision recall F_1 F_05 \\\n",
"0 1.018363 0.808793 0.000318 0.000108 0.00014 0.000189 \n",
"\n",
" precision_super recall_super NDCG mAP MRR LAUC \\\n",
"0 0.0 0.0 0.000214 0.000037 0.000368 0.496391 \n",
"\n",
" HR Reco in test Test coverage Shannon Gini \n",
"0 0.003181 0.392153 0.11544 4.174741 0.965327 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import evaluation_measures as ev\n",
"estimations_df=pd.read_csv('Recommendations generated/ml-100k/Self_IKNN_estimations.csv', header=None)\n",
"reco=np.loadtxt('Recommendations generated/ml-100k/Self_IKNN_reco.csv', delimiter=',')\n",
"\n",
"ev.evaluate(test=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None),\n",
" estimations_df=estimations_df, \n",
" reco=reco,\n",
" super_reactions=[4,5])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"943it [00:00, 7423.18it/s]\n",
"943it [00:00, 7890.87it/s]\n",
"943it [00:00, 7370.82it/s]\n",
"943it [00:00, 8035.93it/s]\n",
"943it [00:00, 8071.70it/s]\n",
"943it [00:00, 7893.80it/s]\n",
"943it [00:00, 8159.55it/s]\n",
"943it [00:00, 7982.77it/s]\n",
"943it [00:00, 7514.53it/s]\n",
"943it [00:00, 8047.34it/s]\n",
"943it [00:00, 7874.80it/s]\n",
"943it [00:00, 7657.62it/s]\n",
"943it [00:00, 8281.73it/s]\n",
"943it [00:00, 8253.33it/s]\n",
"943it [00:00, 8332.31it/s]\n",
"943it [00:00, 8348.73it/s]\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Model</th>\n",
" <th>RMSE</th>\n",
" <th>MAE</th>\n",
" <th>precision</th>\n",
" <th>recall</th>\n",
" <th>F_1</th>\n",
" <th>F_05</th>\n",
" <th>precision_super</th>\n",
" <th>recall_super</th>\n",
" <th>NDCG</th>\n",
" <th>mAP</th>\n",
" <th>MRR</th>\n",
" <th>LAUC</th>\n",
" <th>HR</th>\n",
" <th>Reco in test</th>\n",
" <th>Test coverage</th>\n",
" <th>Shannon</th>\n",
" <th>Gini</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_RP3Beta</td>\n",
" <td>3.702446</td>\n",
" <td>3.527273</td>\n",
" <td>0.282185</td>\n",
" <td>0.192092</td>\n",
" <td>0.186749</td>\n",
" <td>0.216980</td>\n",
" <td>0.204185</td>\n",
" <td>0.240096</td>\n",
" <td>0.339114</td>\n",
" <td>0.204905</td>\n",
" <td>0.572157</td>\n",
" <td>0.593544</td>\n",
" <td>0.875928</td>\n",
" <td>1.000000</td>\n",
" <td>0.077201</td>\n",
" <td>3.875892</td>\n",
" <td>0.974947</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_TopPop</td>\n",
" <td>2.508258</td>\n",
" <td>2.217909</td>\n",
" <td>0.188865</td>\n",
" <td>0.116919</td>\n",
" <td>0.118732</td>\n",
" <td>0.141584</td>\n",
" <td>0.130472</td>\n",
" <td>0.137473</td>\n",
" <td>0.214651</td>\n",
" <td>0.111707</td>\n",
" <td>0.400939</td>\n",
" <td>0.555546</td>\n",
" <td>0.765642</td>\n",
" <td>1.000000</td>\n",
" <td>0.038961</td>\n",
" <td>3.159079</td>\n",
" <td>0.987317</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_SVD</td>\n",
" <td>0.952784</td>\n",
" <td>0.750597</td>\n",
" <td>0.095228</td>\n",
" <td>0.047497</td>\n",
" <td>0.053142</td>\n",
" <td>0.067082</td>\n",
" <td>0.084871</td>\n",
" <td>0.076457</td>\n",
" <td>0.109075</td>\n",
" <td>0.050124</td>\n",
" <td>0.241366</td>\n",
" <td>0.520459</td>\n",
" <td>0.499470</td>\n",
" <td>0.992047</td>\n",
" <td>0.217893</td>\n",
" <td>4.405246</td>\n",
" <td>0.953484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_SVDBaseline</td>\n",
" <td>0.930321</td>\n",
" <td>0.734643</td>\n",
" <td>0.092683</td>\n",
" <td>0.042046</td>\n",
" <td>0.048568</td>\n",
" <td>0.063218</td>\n",
" <td>0.082940</td>\n",
" <td>0.068730</td>\n",
" <td>0.098937</td>\n",
" <td>0.044405</td>\n",
" <td>0.203936</td>\n",
" <td>0.517696</td>\n",
" <td>0.469777</td>\n",
" <td>1.000000</td>\n",
" <td>0.058442</td>\n",
" <td>3.085857</td>\n",
" <td>0.988824</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_SVDBiased</td>\n",
" <td>0.940375</td>\n",
" <td>0.742264</td>\n",
" <td>0.092153</td>\n",
" <td>0.039645</td>\n",
" <td>0.046804</td>\n",
" <td>0.061886</td>\n",
" <td>0.079399</td>\n",
" <td>0.055967</td>\n",
" <td>0.102017</td>\n",
" <td>0.047972</td>\n",
" <td>0.216876</td>\n",
" <td>0.516515</td>\n",
" <td>0.441145</td>\n",
" <td>0.997455</td>\n",
" <td>0.167388</td>\n",
" <td>4.235348</td>\n",
" <td>0.962085</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_Baseline</td>\n",
" <td>0.949459</td>\n",
" <td>0.752487</td>\n",
" <td>0.091410</td>\n",
" <td>0.037652</td>\n",
" <td>0.046030</td>\n",
" <td>0.061286</td>\n",
" <td>0.079614</td>\n",
" <td>0.056463</td>\n",
" <td>0.095957</td>\n",
" <td>0.043178</td>\n",
" <td>0.198193</td>\n",
" <td>0.515501</td>\n",
" <td>0.437964</td>\n",
" <td>1.000000</td>\n",
" <td>0.033911</td>\n",
" <td>2.836513</td>\n",
" <td>0.991139</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_SVD</td>\n",
" <td>0.939326</td>\n",
" <td>0.740022</td>\n",
" <td>0.074549</td>\n",
" <td>0.031755</td>\n",
" <td>0.038425</td>\n",
" <td>0.050562</td>\n",
" <td>0.065665</td>\n",
" <td>0.050602</td>\n",
" <td>0.077117</td>\n",
" <td>0.031574</td>\n",
" <td>0.165509</td>\n",
" <td>0.512485</td>\n",
" <td>0.414634</td>\n",
" <td>0.981866</td>\n",
" <td>0.080087</td>\n",
" <td>3.858982</td>\n",
" <td>0.975271</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_GlobalAvg</td>\n",
" <td>1.125760</td>\n",
" <td>0.943534</td>\n",
" <td>0.061188</td>\n",
" <td>0.025968</td>\n",
" <td>0.031383</td>\n",
" <td>0.041343</td>\n",
" <td>0.040558</td>\n",
" <td>0.032107</td>\n",
" <td>0.067695</td>\n",
" <td>0.027470</td>\n",
" <td>0.171187</td>\n",
" <td>0.509546</td>\n",
" <td>0.384942</td>\n",
" <td>1.000000</td>\n",
" <td>0.025974</td>\n",
" <td>2.711772</td>\n",
" <td>0.992003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_Random</td>\n",
" <td>1.518551</td>\n",
" <td>1.218784</td>\n",
" <td>0.050583</td>\n",
" <td>0.024085</td>\n",
" <td>0.027323</td>\n",
" <td>0.034826</td>\n",
" <td>0.031223</td>\n",
" <td>0.026436</td>\n",
" <td>0.054902</td>\n",
" <td>0.020652</td>\n",
" <td>0.137928</td>\n",
" <td>0.508570</td>\n",
" <td>0.353128</td>\n",
" <td>0.987699</td>\n",
" <td>0.183261</td>\n",
" <td>5.093805</td>\n",
" <td>0.908215</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_I-KNN</td>\n",
" <td>1.030386</td>\n",
" <td>0.813067</td>\n",
" <td>0.026087</td>\n",
" <td>0.006908</td>\n",
" <td>0.010593</td>\n",
" <td>0.016046</td>\n",
" <td>0.021137</td>\n",
" <td>0.009522</td>\n",
" <td>0.024214</td>\n",
" <td>0.008958</td>\n",
" <td>0.048068</td>\n",
" <td>0.499885</td>\n",
" <td>0.154825</td>\n",
" <td>0.402333</td>\n",
" <td>0.434343</td>\n",
" <td>5.133650</td>\n",
" <td>0.877999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_U-KNNBaseline</td>\n",
" <td>0.935327</td>\n",
" <td>0.737424</td>\n",
" <td>0.002545</td>\n",
" <td>0.000755</td>\n",
" <td>0.001105</td>\n",
" <td>0.001602</td>\n",
" <td>0.002253</td>\n",
" <td>0.000930</td>\n",
" <td>0.003444</td>\n",
" <td>0.001362</td>\n",
" <td>0.011760</td>\n",
" <td>0.496724</td>\n",
" <td>0.021209</td>\n",
" <td>0.482821</td>\n",
" <td>0.059885</td>\n",
" <td>2.232578</td>\n",
" <td>0.994487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_I-KNNBaseline</td>\n",
" <td>0.935327</td>\n",
" <td>0.737424</td>\n",
" <td>0.002545</td>\n",
" <td>0.000755</td>\n",
" <td>0.001105</td>\n",
" <td>0.001602</td>\n",
" <td>0.002253</td>\n",
" <td>0.000930</td>\n",
" <td>0.003444</td>\n",
" <td>0.001362</td>\n",
" <td>0.011760</td>\n",
" <td>0.496724</td>\n",
" <td>0.021209</td>\n",
" <td>0.482821</td>\n",
" <td>0.059885</td>\n",
" <td>2.232578</td>\n",
" <td>0.994487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ready_U-KNN</td>\n",
" <td>1.023495</td>\n",
" <td>0.807913</td>\n",
" <td>0.000742</td>\n",
" <td>0.000205</td>\n",
" <td>0.000305</td>\n",
" <td>0.000449</td>\n",
" <td>0.000536</td>\n",
" <td>0.000198</td>\n",
" <td>0.000845</td>\n",
" <td>0.000274</td>\n",
" <td>0.002744</td>\n",
" <td>0.496441</td>\n",
" <td>0.007423</td>\n",
" <td>0.602121</td>\n",
" <td>0.010823</td>\n",
" <td>2.089186</td>\n",
" <td>0.995706</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_TopRated</td>\n",
" <td>1.033085</td>\n",
" <td>0.822057</td>\n",
" <td>0.000954</td>\n",
" <td>0.000188</td>\n",
" <td>0.000298</td>\n",
" <td>0.000481</td>\n",
" <td>0.000644</td>\n",
" <td>0.000223</td>\n",
" <td>0.001043</td>\n",
" <td>0.000335</td>\n",
" <td>0.003348</td>\n",
" <td>0.496433</td>\n",
" <td>0.009544</td>\n",
" <td>0.699046</td>\n",
" <td>0.005051</td>\n",
" <td>1.945910</td>\n",
" <td>0.995669</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_BaselineUI</td>\n",
" <td>0.967585</td>\n",
" <td>0.762740</td>\n",
" <td>0.000954</td>\n",
" <td>0.000170</td>\n",
" <td>0.000278</td>\n",
" <td>0.000463</td>\n",
" <td>0.000644</td>\n",
" <td>0.000189</td>\n",
" <td>0.000752</td>\n",
" <td>0.000168</td>\n",
" <td>0.001677</td>\n",
" <td>0.496424</td>\n",
" <td>0.009544</td>\n",
" <td>0.600530</td>\n",
" <td>0.005051</td>\n",
" <td>1.803126</td>\n",
" <td>0.996380</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Self_IKNN</td>\n",
" <td>1.018363</td>\n",
" <td>0.808793</td>\n",
" <td>0.000318</td>\n",
" <td>0.000108</td>\n",
" <td>0.000140</td>\n",
" <td>0.000189</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000214</td>\n",
" <td>0.000037</td>\n",
" <td>0.000368</td>\n",
" <td>0.496391</td>\n",
" <td>0.003181</td>\n",
" <td>0.392153</td>\n",
" <td>0.115440</td>\n",
" <td>4.174741</td>\n",
" <td>0.965327</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Model RMSE MAE precision recall F_1 \\\n",
"0 Self_RP3Beta 3.702446 3.527273 0.282185 0.192092 0.186749 \n",
"0 Self_TopPop 2.508258 2.217909 0.188865 0.116919 0.118732 \n",
"0 Ready_SVD 0.952784 0.750597 0.095228 0.047497 0.053142 \n",
"0 Self_SVDBaseline 0.930321 0.734643 0.092683 0.042046 0.048568 \n",
"0 Ready_SVDBiased 0.940375 0.742264 0.092153 0.039645 0.046804 \n",
"0 Ready_Baseline 0.949459 0.752487 0.091410 0.037652 0.046030 \n",
"0 Self_SVD 0.939326 0.740022 0.074549 0.031755 0.038425 \n",
"0 Self_GlobalAvg 1.125760 0.943534 0.061188 0.025968 0.031383 \n",
"0 Ready_Random 1.518551 1.218784 0.050583 0.024085 0.027323 \n",
"0 Ready_I-KNN 1.030386 0.813067 0.026087 0.006908 0.010593 \n",
"0 Ready_U-KNNBaseline 0.935327 0.737424 0.002545 0.000755 0.001105 \n",
"0 Ready_I-KNNBaseline 0.935327 0.737424 0.002545 0.000755 0.001105 \n",
"0 Ready_U-KNN 1.023495 0.807913 0.000742 0.000205 0.000305 \n",
"0 Self_TopRated 1.033085 0.822057 0.000954 0.000188 0.000298 \n",
"0 Self_BaselineUI 0.967585 0.762740 0.000954 0.000170 0.000278 \n",
"0 Self_IKNN 1.018363 0.808793 0.000318 0.000108 0.000140 \n",
"\n",
" F_05 precision_super recall_super NDCG mAP MRR \\\n",
"0 0.216980 0.204185 0.240096 0.339114 0.204905 0.572157 \n",
"0 0.141584 0.130472 0.137473 0.214651 0.111707 0.400939 \n",
"0 0.067082 0.084871 0.076457 0.109075 0.050124 0.241366 \n",
"0 0.063218 0.082940 0.068730 0.098937 0.044405 0.203936 \n",
"0 0.061886 0.079399 0.055967 0.102017 0.047972 0.216876 \n",
"0 0.061286 0.079614 0.056463 0.095957 0.043178 0.198193 \n",
"0 0.050562 0.065665 0.050602 0.077117 0.031574 0.165509 \n",
"0 0.041343 0.040558 0.032107 0.067695 0.027470 0.171187 \n",
"0 0.034826 0.031223 0.026436 0.054902 0.020652 0.137928 \n",
"0 0.016046 0.021137 0.009522 0.024214 0.008958 0.048068 \n",
"0 0.001602 0.002253 0.000930 0.003444 0.001362 0.011760 \n",
"0 0.001602 0.002253 0.000930 0.003444 0.001362 0.011760 \n",
"0 0.000449 0.000536 0.000198 0.000845 0.000274 0.002744 \n",
"0 0.000481 0.000644 0.000223 0.001043 0.000335 0.003348 \n",
"0 0.000463 0.000644 0.000189 0.000752 0.000168 0.001677 \n",
"0 0.000189 0.000000 0.000000 0.000214 0.000037 0.000368 \n",
"\n",
" LAUC HR Reco in test Test coverage Shannon Gini \n",
"0 0.593544 0.875928 1.000000 0.077201 3.875892 0.974947 \n",
"0 0.555546 0.765642 1.000000 0.038961 3.159079 0.987317 \n",
"0 0.520459 0.499470 0.992047 0.217893 4.405246 0.953484 \n",
"0 0.517696 0.469777 1.000000 0.058442 3.085857 0.988824 \n",
"0 0.516515 0.441145 0.997455 0.167388 4.235348 0.962085 \n",
"0 0.515501 0.437964 1.000000 0.033911 2.836513 0.991139 \n",
"0 0.512485 0.414634 0.981866 0.080087 3.858982 0.975271 \n",
"0 0.509546 0.384942 1.000000 0.025974 2.711772 0.992003 \n",
"0 0.508570 0.353128 0.987699 0.183261 5.093805 0.908215 \n",
"0 0.499885 0.154825 0.402333 0.434343 5.133650 0.877999 \n",
"0 0.496724 0.021209 0.482821 0.059885 2.232578 0.994487 \n",
"0 0.496724 0.021209 0.482821 0.059885 2.232578 0.994487 \n",
"0 0.496441 0.007423 0.602121 0.010823 2.089186 0.995706 \n",
"0 0.496433 0.009544 0.699046 0.005051 1.945910 0.995669 \n",
"0 0.496424 0.009544 0.600530 0.005051 1.803126 0.996380 \n",
"0 0.496391 0.003181 0.392153 0.115440 4.174741 0.965327 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
    "import imp\n",
    "import evaluation_measures as ev\n",
    "imp.reload(ev) # reload to pick up any changes to the module\n",
    "\n",
"dir_path=\"Recommendations generated/ml-100k/\"\n",
"super_reactions=[4,5]\n",
"test=pd.read_csv('./Datasets/ml-100k/test.csv', sep='\\t', header=None)\n",
"\n",
"ev.evaluate_all(test, dir_path, super_reactions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ready-made KNNs - Surprise implementation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### I-KNN - basic"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Generating predictions...\n",
"Generating top N recommendations...\n",
"Generating predictions...\n"
]
}
],
"source": [
"import helpers\n",
"import surprise as sp\n",
"import imp\n",
"imp.reload(helpers)\n",
"\n",
"sim_options = {'name': 'cosine',\n",
" 'user_based': False} # compute similarities between items\n",
"algo = sp.KNNBasic(sim_options=sim_options)\n",
"\n",
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_I-KNN_reco.csv',\n",
" estimations_path='Recommendations generated/ml-100k/Ready_Baseline_I-KNN_estimations.csv')"
]
},
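  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The neighbourhood size can be tuned as well; a sketch (the parameter values here are illustrative, not tuned):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# KNNBasic also accepts k (max number of neighbours, default 40) and min_k (minimum, default 1)\n",
    "algo_k20 = sp.KNNBasic(k=20, min_k=2, sim_options={'name': 'cosine', 'user_based': False})"
   ]
  },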
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### U-KNN - basic"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computing the cosine similarity matrix...\n",
"Done computing similarity matrix.\n",
"Generating predictions...\n"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-10-dd4f59625a08>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_U-KNN_reco.csv',\n\u001b[0;32m---> 11\u001b[0;31m estimations_path='Recommendations generated/ml-100k/Ready_Baseline_U-KNN_estimations.csv')\n\u001b[0m",
"\u001b[0;32m/mnt/c/Users/rkwie/Repositories/Warsztaty z uczenia maszynowego - systemy rekomendacyjne/helpers.py\u001b[0m in \u001b[0;36mready_made\u001b[0;34m(algo, reco_path, estimations_path)\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0mantitrainset\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtrainset\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuild_anti_testset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# We want to predict ratings of pairs (user, item) which are not in train set\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Generating predictions...'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0mpredictions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malgo\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mantitrainset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Generating top N recommendations...'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0mtop_n\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_top_n\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpredictions\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36mtest\u001b[0;34m(self, testset, verbose)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mr_ui_trans\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 166\u001b[0m verbose=verbose)\n\u001b[0;32m--> 167\u001b[0;31m for (uid, iid, r_ui_trans) in testset]\n\u001b[0m\u001b[1;32m 168\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpredictions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 169\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mr_ui_trans\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 166\u001b[0m verbose=verbose)\n\u001b[0;32m--> 167\u001b[0;31m for (uid, iid, r_ui_trans) in testset]\n\u001b[0m\u001b[1;32m 168\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mpredictions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 169\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/algo_base.py\u001b[0m in \u001b[0;36mpredict\u001b[0;34m(self, uid, iid, r_ui, clip, verbose)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0mdetails\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 105\u001b[0;31m \u001b[0mest\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mestimate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miuid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0miiid\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 106\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;31m# If the details dict was also returned\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/knns.py\u001b[0m in \u001b[0;36mestimate\u001b[0;34m(self, u, i)\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 110\u001b[0m \u001b[0mneighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msim\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 111\u001b[0;31m \u001b[0mk_neighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mheapq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlargest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mneighbors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 113\u001b[0m \u001b[0;31m# compute weighted average\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.6/heapq.py\u001b[0m in \u001b[0;36mnlargest\u001b[0;34m(n, iterable, key)\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[0;31m# General case, slowest method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 568\u001b[0m \u001b[0mit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0miter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 569\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 570\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 571\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.6/heapq.py\u001b[0m in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[0;31m# General case, slowest method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 568\u001b[0m \u001b[0mit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0miter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 569\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melem\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 570\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 571\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/.local/lib/python3.6/site-packages/surprise/prediction_algorithms/knns.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(t)\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 110\u001b[0m \u001b[0mneighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msim\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0myr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 111\u001b[0;31m \u001b[0mk_neighbors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mheapq\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlargest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mneighbors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 113\u001b[0m \u001b[0;31m# compute weighted average\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"import helpers\n",
"import surprise as sp\n",
"import imp\n",
"imp.reload(helpers)\n",
"\n",
"sim_options = {'name': 'cosine',\n",
" 'user_based': True} # compute similarities between users\n",
"algo = sp.KNNBasic(sim_options=sim_options)\n",
"\n",
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_U-KNN_reco.csv',\n",
" estimations_path='Recommendations generated/ml-100k/Ready_Baseline_U-KNN_estimations.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### I-KNN - on top baseline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import helpers\n",
"import surprise as sp\n",
"import imp\n",
"imp.reload(helpers)\n",
"\n",
"sim_options = {'name': 'cosine',\n",
" 'user_based': False} # compute similarities between items\n",
"algo = sp.KNNBaseline()\n",
"\n",
"helpers.ready_made(algo, reco_path='Recommendations generated/ml-100k/Ready_I-KNNBaseline_reco.csv',\n",
" estimations_path='Recommendations generated/ml-100k/Ready_I-KNNBaseline_estimations.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# project task 4: use a version of your choice of Surprise KNNalgorithm"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# read the docs and try to find best parameter configuration (let say in terms of RMSE)\n",
"# https://surprise.readthedocs.io/en/stable/knn_inspired.html##surprise.prediction_algorithms.knns.KNNBaseline\n",
"# the solution here can be similar to examples above\n",
"# please save the output in 'Recommendations generated/ml-100k/Self_KNNSurprisetask_reco.csv' and\n",
"# 'Recommendations generated/ml-100k/Self_KNNSurprisetask_estimations.csv'"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
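The U-KNN run above was interrupted (`KeyboardInterrupt`) inside `heapq.nlargest`, the call Surprise's `KNNBasic` makes at knns.py line 111 to pick the k most similar neighbours for every single prediction; over the full anti-testset this adds up, which is why the cell is slow. A minimal sketch of that selection step on made-up `(similarity, rating)` pairs:

```python
import heapq

# toy neighbours as (similarity, rating) pairs, the shape built in surprise's knns.py
neighbors = [(0.9, 4), (0.1, 2), (0.5, 5), (0.7, 3)]

# keep the 2 pairs with the highest similarity t[0], as KNNBasic does with its k
k_neighbors = heapq.nlargest(2, neighbors, key=lambda t: t[0])
print(k_neighbors)  # → [(0.9, 4), (0.7, 3)]
```

`nlargest` falls back to the "general case, slowest method" branch shown in the traceback whenever a `key` is given, so its cost is paid once per (user, item) pair being predicted.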


@ -0,0 +1,80 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['dimensions: 1, cases when observation is the nearest: 0.0%',\n",
" 'dimensions: 2, cases when observation is the nearest: 0.0%',\n",
" 'dimensions: 3, cases when observation is the nearest: 0.0%',\n",
" 'dimensions: 10, cases when observation is the nearest: 10.0%',\n",
" 'dimensions: 20, cases when observation is the nearest: 61.0%',\n",
" 'dimensions: 30, cases when observation is the nearest: 96.0%',\n",
" 'dimensions: 40, cases when observation is the nearest: 98.0%',\n",
" 'dimensions: 50, cases when observation is the nearest: 100.0%',\n",
" 'dimensions: 60, cases when observation is the nearest: 100.0%',\n",
" 'dimensions: 70, cases when observation is the nearest: 100.0%',\n",
" 'dimensions: 80, cases when observation is the nearest: 100.0%',\n",
" 'dimensions: 90, cases when observation is the nearest: 100.0%']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import random\n",
"from numpy.linalg import norm\n",
"\n",
"dimensions=[1,2,3]+[10*i for i in range(1,10)]\n",
"nb_vectors=10000\n",
"trials=100\n",
"k=1 # by setting k=1 we want to check how often the closest vector to the avarage of 2 random vectors is one of these 2 vectors\n",
"\n",
"result=[]\n",
"for dimension in dimensions:\n",
" vectors=np.random.normal(0,1,size=(nb_vectors, dimension))\n",
" successes=0\n",
" for i in range(trials):\n",
" i1,i2=random.sample(range(nb_vectors),2)\n",
" target=(vectors[i1]+vectors[i2])/2\n",
"\n",
" distances=pd.DataFrame(enumerate(np.dot(target, vectors.transpose())/norm(target)/norm(vectors.transpose(), axis=0)))\n",
" distances=distances.sort_values(by=[1], ascending=False)\n",
" if (i1 in (list(distances[0][:k]))) | (i2 in (list(distances[0][:k]))):\n",
" successes+=1\n",
" result.append(successes/trials)\n",
" \n",
"[f'dimensions: {i}, cases when observation is the nearest: {100*round(j,3)}%' for i,j in zip(dimensions, result)]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
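The experiment above can be reproduced in a smaller, seeded form; the vector counts and trial numbers below are arbitrary choices for this sketch, and a direct `argmax` over cosine similarities replaces the pandas sort:

```python
import numpy as np

rng = np.random.default_rng(0)
nb_vectors, trials = 1000, 50

def nearest_hits(dimension):
    # fraction of trials in which one of the two averaged vectors
    # is the cosine-nearest neighbour of their average
    vectors = rng.normal(0, 1, size=(nb_vectors, dimension))
    norms = np.linalg.norm(vectors, axis=1)
    hits = 0
    for _ in range(trials):
        i1, i2 = rng.choice(nb_vectors, size=2, replace=False)
        target = (vectors[i1] + vectors[i2]) / 2
        sims = vectors @ target / (norms * np.linalg.norm(target))
        if sims.argmax() in (i1, i2):
            hits += 1
    return hits / trials

low, high = nearest_hits(2), nearest_hits(50)
print(low, high)  # the hit rate grows sharply with dimensionality
```

In low dimensions the average of two random vectors is almost never their nearest neighbour; from a few dozen dimensions up it almost always is, which is the concentration effect the notebook measures.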

File diff suppressed because one or more lines are too long

1391
P5. Graph-based.ipynb Normal file

File diff suppressed because one or more lines are too long

214
evaluation_measures.py Normal file

@ -0,0 +1,214 @@
import os
import sys
import numpy as np
import pandas as pd
import math
from sklearn.preprocessing import normalize
from tqdm import tqdm
from datetime import datetime, date
import random
import scipy.sparse as sparse
from os import listdir
from os.path import isfile, join
from collections import defaultdict


def evaluate(test,
estimations_df,
reco,
super_reactions=[4,5],
topK=10):
estimations_df=estimations_df.copy()
reco=reco.copy()
test_df=test.copy()
# prepare testset
test_df.columns=['user', 'item', 'rating', 'timestamp']
test_df['user_code'] = test_df['user'].astype("category").cat.codes
test_df['item_code'] = test_df['item'].astype("category").cat.codes
user_code_id = dict(enumerate(test_df['user'].astype("category").cat.categories))
user_id_code = dict((v, k) for k, v in user_code_id.items())
item_code_id = dict(enumerate(test_df['item'].astype("category").cat.categories))
item_id_code = dict((v, k) for k, v in item_code_id.items())
test_ui = sparse.csr_matrix((test_df['rating'], (test_df['user_code'], test_df['item_code'])))
#prepare estimations
estimations_df.columns=['user', 'item' ,'score']
estimations_df['user_code']=[user_id_code[user] for user in estimations_df['user']]
estimations_df['item_code']=[item_id_code[item] for item in estimations_df['item']]
estimations=sparse.csr_matrix((estimations_df['score'], (estimations_df['user_code'], estimations_df['item_code'])), shape=test_ui.shape)
#compute_estimations
estimations_df=estimations_metrics(test_ui, estimations)
#prepare reco
users=reco[:,:1]
items=reco[:,1::2]
# Let's use inner ids instead of real ones
users=np.vectorize(lambda x: user_id_code.setdefault(x, -1))(users) # maybe users we recommend are not in test set
items=np.vectorize(lambda x: item_id_code.setdefault(x, -1))(items) # maybe items we recommend are not in test set
# Let's put them into one array
reco=np.concatenate((users, items), axis=1)
#compute ranking metrics
ranking_df=ranking_metrics(test_ui, reco, super_reactions=super_reactions, topK=topK)
#compute diversity metrics
diversity_df=diversity_metrics(test_ui, reco, topK)
result=pd.concat([estimations_df, ranking_df, diversity_df], axis=1)
return(result)


def ranking_metrics(test_ui, reco, super_reactions=[], topK=10):
nb_items=test_ui.shape[1]
relevant_users, super_relevant_users, prec, rec, F_1, F_05, prec_super, rec_super, ndcg, mAP, MRR, LAUC, HR=\
0,0,0,0,0,0,0,0,0,0,0,0,0
cg = (1.0 / np.log2(np.arange(2, topK + 2)))
cg_sum = np.cumsum(cg)
for (nb_user, user) in tqdm(enumerate(reco[:,0])):
u_rated_items=test_ui.indices[test_ui.indptr[user]:test_ui.indptr[user+1]]
nb_u_rated_items=len(u_rated_items)
if nb_u_rated_items>0: # skip users with no items in test set (still possible that there will be no super items)
relevant_users+=1
u_super_items=u_rated_items[np.vectorize(lambda x: x in super_reactions)\
(test_ui.data[test_ui.indptr[user]:test_ui.indptr[user+1]])]
# more natural seems u_super_items=[item for item in u_rated_items if test_ui[user,item] in super_reactions]
            # but accessing test_ui[user,item] is expensive - we should avoid doing it
if len(u_super_items)>0:
super_relevant_users+=1
user_successes=np.zeros(topK)
nb_user_successes=0
user_super_successes=np.zeros(topK)
nb_user_super_successes=0
# evaluation
for (item_position,item) in enumerate(reco[nb_user,1:topK+1]):
if item in u_rated_items:
user_successes[item_position]=1
nb_user_successes+=1
if item in u_super_items:
user_super_successes[item_position]=1
nb_user_super_successes+=1
prec_u=nb_user_successes/topK
prec+=prec_u
rec_u=nb_user_successes/nb_u_rated_items
rec+=rec_u
F_1+=2*(prec_u*rec_u)/(prec_u+rec_u) if prec_u+rec_u>0 else 0
F_05+=(0.5**2+1)*(prec_u*rec_u)/(0.5**2*prec_u+rec_u) if prec_u+rec_u>0 else 0
prec_super+=nb_user_super_successes/topK
rec_super+=nb_user_super_successes/max(len(u_super_items),1)
ndcg+=np.dot(user_successes,cg)/cg_sum[min(topK, nb_u_rated_items)-1]
cumsum_successes=np.cumsum(user_successes)
mAP+=np.dot(cumsum_successes/np.arange(1,topK+1), user_successes)/min(topK, nb_u_rated_items)
MRR+=1/(user_successes.nonzero()[0][0]+1) if user_successes.nonzero()[0].size>0 else 0
LAUC+=(np.dot(cumsum_successes, 1-user_successes)+\
(nb_user_successes+nb_u_rated_items)/2*((nb_items-nb_u_rated_items)-(topK-nb_user_successes)))/\
((nb_items-nb_u_rated_items)*nb_u_rated_items)
HR+=nb_user_successes>0
result=[]
result.append(('precision', prec/relevant_users))
result.append(('recall', rec/relevant_users))
result.append(('F_1', F_1/relevant_users))
result.append(('F_05', F_05/relevant_users))
result.append(('precision_super', prec_super/super_relevant_users))
result.append(('recall_super', rec_super/super_relevant_users))
result.append(('NDCG', ndcg/relevant_users))
result.append(('mAP', mAP/relevant_users))
result.append(('MRR', MRR/relevant_users))
result.append(('LAUC', LAUC/relevant_users))
result.append(('HR', HR/relevant_users))
df_result=pd.DataFrame()
if len(result)>0:
df_result=(pd.DataFrame(list(zip(*result))[1])).T
df_result.columns=list(zip(*result))[0]
return df_result


def estimations_metrics(test_ui, estimations):
result=[]
RMSE=(np.sum((estimations.data-test_ui.data)**2)/estimations.nnz)**(1/2)
result.append(['RMSE', RMSE])
MAE=np.sum(abs(estimations.data-test_ui.data))/estimations.nnz
result.append(['MAE', MAE])
df_result=pd.DataFrame()
if len(result)>0:
df_result=(pd.DataFrame(list(zip(*result))[1])).T
df_result.columns=list(zip(*result))[0]
return df_result


def diversity_metrics(test_ui, reco, topK=10):
frequencies=defaultdict(int)
for item in list(set(test_ui.indices)):
frequencies[item]=0
for item in reco[:,1:].flat:
frequencies[item]+=1
nb_reco_outside_test=frequencies[-1]
del frequencies[-1]
frequencies=np.array(list(frequencies.values()))
nb_rec_items=len(frequencies[frequencies>0])
nb_reco_inside_test=np.sum(frequencies)
frequencies=frequencies/np.sum(frequencies)
frequencies=np.sort(frequencies)
    with np.errstate(divide='ignore'): # let's put zeros for items with 0 frequency and ignore the division warning
log_frequencies=np.nan_to_num(np.log(frequencies), posinf=0, neginf=0)
result=[]
result.append(('Reco in test', nb_reco_inside_test/(nb_reco_inside_test+nb_reco_outside_test)))
result.append(('Test coverage', nb_rec_items/test_ui.shape[1]))
result.append(('Shannon', -np.dot(frequencies, log_frequencies)))
result.append(('Gini', np.dot(frequencies, np.arange(1-len(frequencies), len(frequencies), 2))/(len(frequencies)-1)))
df_result=(pd.DataFrame(list(zip(*result))[1])).T
df_result.columns=list(zip(*result))[0]
return df_result


def evaluate_all(test,
dir_path="Recommendations generated/ml-100k/",
super_reactions=[4,5],
topK=10):
models = list(set(['_'.join(f.split('_')[:2]) for f in listdir(dir_path)
if isfile(dir_path+f)]))
result=[]
for model in models:
estimations_df=pd.read_csv('{}{}_estimations.csv'.format(dir_path, model), header=None)
reco=np.loadtxt('{}{}_reco.csv'.format(dir_path, model), delimiter=',')
to_append=evaluate(test, estimations_df, reco, super_reactions, topK)
to_append.insert(0, "Model", model)
result.append(to_append)
result=pd.concat(result)
result=result.sort_values(by='recall', ascending=False)
return result
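The Shannon and Gini expressions in `diversity_metrics` can be sanity-checked on a toy frequency vector: perfectly uniform recommendation frequencies should give maximal entropy (ln n) and a Gini index of 0. The snippet below re-states the two formulas standalone rather than importing the module:

```python
import numpy as np

# toy recommendation frequencies: each of 4 items recommended 5 times out of 20,
# normalised and (trivially) sorted, as diversity_metrics prepares them
frequencies = np.array([5, 5, 5, 5]) / 20

shannon = -np.dot(frequencies, np.log(frequencies))
gini = np.dot(frequencies,
              np.arange(1 - len(frequencies), len(frequencies), 2)) / (len(frequencies) - 1)

print(round(shannon, 4), round(gini, 4))  # → 1.3863 0.0
```

Skewing the frequencies lowers the entropy and pushes the Gini index toward 1, so both metrics reward recommenders that spread recommendations across the catalogue instead of repeating a few popular items.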

75
helpers.py Normal file

@ -0,0 +1,75 @@
import pandas as pd
import numpy as np
import scipy.sparse as sparse
import surprise as sp
import time
from collections import defaultdict
from itertools import chain


def data_to_csr(train_read, test_read):
train_read.columns=['user', 'item', 'rating', 'timestamp']
test_read.columns=['user', 'item', 'rating', 'timestamp']
# Let's build whole dataset
train_and_test=pd.concat([train_read, test_read], axis=0, ignore_index=True)
train_and_test['user_code'] = train_and_test['user'].astype("category").cat.codes
train_and_test['item_code'] = train_and_test['item'].astype("category").cat.codes
user_code_id = dict(enumerate(train_and_test['user'].astype("category").cat.categories))
user_id_code = dict((v, k) for k, v in user_code_id.items())
item_code_id = dict(enumerate(train_and_test['item'].astype("category").cat.categories))
item_id_code = dict((v, k) for k, v in item_code_id.items())
train_df=pd.merge(train_read, train_and_test, on=list(train_read.columns))
test_df=pd.merge(test_read, train_and_test, on=list(train_read.columns))
# Take number of users and items
(U,I)=(train_and_test['user_code'].max()+1, train_and_test['item_code'].max()+1)
# Create sparse csr matrices
train_ui = sparse.csr_matrix((train_df['rating'], (train_df['user_code'], train_df['item_code'])), shape=(U, I))
test_ui = sparse.csr_matrix((test_df['rating'], (test_df['user_code'], test_df['item_code'])), shape=(U, I))
return train_ui, test_ui, user_code_id, user_id_code, item_code_id, item_id_code


def get_top_n(predictions, n=10):
# Here we create a dictionary which items are lists of pairs (item, score)
top_n = defaultdict(list)
for uid, iid, true_r, est, _ in predictions:
top_n[uid].append((iid, est))
result=[]
# Let's choose k best items in the format: (user, item1, score1, item2, score2, ...)
for uid, user_ratings in top_n.items():
user_ratings.sort(key=lambda x: x[1], reverse=True)
result.append([uid]+list(chain(*user_ratings[:n])))
return result


def ready_made(algo, reco_path, estimations_path):
reader = sp.Reader(line_format='user item rating timestamp', sep='\t')
trainset = sp.Dataset.load_from_file('./Datasets/ml-100k/train.csv', reader=reader)
trainset = trainset.build_full_trainset() # <class 'surprise.trainset.Trainset'> -> it is needed for using Surprise package
testset = sp.Dataset.load_from_file('./Datasets/ml-100k/test.csv', reader=reader)
testset = sp.Trainset.build_testset(testset.build_full_trainset())
algo.fit(trainset)
antitrainset = trainset.build_anti_testset() # We want to predict ratings of pairs (user, item) which are not in train set
print('Generating predictions...')
predictions = algo.test(antitrainset)
print('Generating top N recommendations...')
top_n = get_top_n(predictions, n=10)
top_n=pd.DataFrame(top_n)
top_n.to_csv(reco_path, index=False, header=False)
    print('Generating estimations...')
predictions = algo.test(testset)
predictions_df=[]
for uid, iid, true_r, est, _ in predictions:
predictions_df.append([uid, iid, est])
predictions_df=pd.DataFrame(predictions_df)
predictions_df.to_csv(estimations_path, index=False, header=False)
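`get_top_n` operates on plain `(user, item, true_rating, estimate, details)` tuples, so it can be exercised without fitting any Surprise model; the function body is repeated below so the snippet runs standalone, and the toy predictions are made up for illustration:

```python
from collections import defaultdict
from itertools import chain

def get_top_n(predictions, n=10):
    # same logic as helpers.get_top_n: bucket (item, score) pairs per user,
    # then flatten the n highest-scored items after the user id
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))
    result = []
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        result.append([uid] + list(chain(*user_ratings[:n])))
    return result

# toy Surprise-style predictions: (user, item, true rating, estimate, details)
predictions = [('u1', 'i1', 4, 3.5, None), ('u1', 'i2', 5, 4.8, None),
               ('u2', 'i1', 3, 2.1, None)]
print(get_top_n(predictions, n=1))  # → [['u1', 'i2', 4.8], ['u2', 'i1', 2.1]]
```

Each output row has the `(user, item1, score1, item2, score2, ...)` shape that `ready_made` writes to the `*_reco.csv` files.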