praca_magisterska/dataset/deezer_clean_data/dataset_descriptions.txt

27 lines
1.3 KiB
Plaintext
Raw Normal View History

The paper that uses the dataset can be cited as:
@misc{1802.03997,
author = {Benedek Rozemberczki and Ryan Davies and Rik Sarkar and Charles Sutton},
title = {GEMSEC: Graph Embedding with Self Clustering},
year = {2018},
eprint = {arXiv:1802.03997}}
It can be accessed at:
https://arxiv.org/abs/1802.03997
--------------------------------------
--------------------------------------
Deezer Datasets
--------------------------------------
--------------------------------------
We collected data from the music streaming service Deezer (November 2017). These datasets represent friendhips networks of users from 3 European countries. Nodes represent the users and edges are the mutual friendships. We reindexed the nodes in order to achieve a certain level of anonimity. The csv files contain the edges -- nodes are indexed from 0. The json files contain the genre preferences of users -- each key is a user id, the genres loved are given as lists. Genre notations are consistent across users. In each dataset users could like 84 distinct genres. Liked genre lists were compiled based on the liked song lists. The countries included are Romania, Croatia and Hungary. For each dataset we listed the number of nodes an edges.
Country #Nodes #Edges
--------------------------
RO 41,773 125,826
HR 54,573 498,202
HU 47,538 222,887
--------------------------