Amazon Products (Japanese)

This challenge requires extracting the product category from a product description. The data is taken from the Japanese Amazon site and consists of over 8000 product offers, scraped with a simple Python bot. Most product descriptions contain the category, or a synonym of it, as a substring somewhere in the text. There is no predefined set of possible categories, so this task is NOT sequence classification: the category has to be extracted from the description itself rather than chosen from a fixed list of labels.
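The substring property described above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `find_category_span` and `substring_coverage` and the toy description/category pairs are hypothetical illustrations, not taken from the dataset or from the scraper.

```python
from typing import List, Optional, Tuple


def find_category_span(description: str, category: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first occurrence of
    `category` in `description`, or None if it does not occur verbatim."""
    if not category:
        return None
    start = description.find(category)
    if start == -1:
        return None
    return start, start + len(category)


def substring_coverage(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (description, category) pairs where the category
    appears verbatim in the description."""
    if not pairs:
        return 0.0
    hits = sum(1 for desc, cat in pairs if find_category_span(desc, cat) is not None)
    return hits / len(pairs)


# Hypothetical toy pairs, made up for illustration only.
EXAMPLES = [
    ("ワイヤレス イヤホン Bluetooth 5.0 防水", "イヤホン"),
    ("ステンレス製 水筒 500ml 保温保冷", "水筒"),
    ("USB-C 充電ケーブル 2m", "ケーブル"),
    ("軽量 折りたたみ傘 晴雨兼用", "雨具"),  # category is not a literal substring
]

if __name__ == "__main__":
    for desc, cat in EXAMPLES:
        print(cat, find_category_span(desc, cat))
    print("coverage:", substring_coverage(EXAMPLES))
```

Pairs where the lookup returns `None` correspond to the synonym case mentioned above, which a plain substring match cannot recover.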

The script used to generate this dataset can be found at https://github.com/aleksander-mendoza/MachineLearningMiniprojects/blob/master/amazon_products/scraper.py