import pandas as pd
import pickle
import numpy as np
import faiss
from sklearn.metrics import ndcg_score, dcg_score, average_precision_score

Organizational matters: course credit

Additional mini-project or homework assignment?

true_relevance = np.asarray([[10, 2, 0, 1, 5]])
scores = np.asarray([[9, 5, 2, 1, 1]])

ideal order: 10, 5, 2

our order: 10, 2, 0

for p = 3 (passed as k=3 to scikit-learn)

# Cumulative gain: sum of the true relevance of our top-3 results
CG = 10 + 2 + 0
CG
12
np.log2(2)
1.0
# Discounted cumulative gain: the result at rank i is divided by log2(i + 1)
DCG = 10 / np.log2(2) + 2 / np.log2(3) + 0 / np.log2(4)
DCG
11.261859507142916
dcg_score(true_relevance, scores, k=3)
11.261859507142916
# Ideal DCG: the same sum over the best possible ordering (10, 5, 2)
IDCG = 10 / np.log2(2) + 5 / np.log2(3) + 2 / np.log2(4)
IDCG
14.154648767857287
DCG / IDCG
0.7956297391650307
ndcg_score(true_relevance, scores, k=3)
0.7956297391650307
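
The hand calculation above can be wrapped in a small helper. This is a minimal sketch using only the standard library; the names dcg_at_k and ndcg_at_k are illustrative and not part of scikit-learn. It breaks ties by input order, which may differ from how scikit-learn handles tied scores.

```python
import math

def dcg_at_k(relevance, scores, k):
    """DCG@k: sum of rel_i / log2(i + 1) over the k highest-scored items."""
    # Rank item indices by predicted score, highest first (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(relevance[i] / math.log2(rank + 1)
               for rank, i in enumerate(order[:k], start=1))

def ndcg_at_k(relevance, scores, k):
    """nDCG@k: DCG@k divided by the DCG@k of the ideal ordering."""
    # The ideal ordering ranks items by their true relevance.
    ideal = dcg_at_k(relevance, relevance, k)
    return dcg_at_k(relevance, scores, k) / ideal if ideal > 0 else 0.0

true_relevance = [10, 2, 0, 1, 5]
scores = [9, 5, 2, 1, 1]

print(dcg_at_k(true_relevance, scores, k=3))
print(ndcg_at_k(true_relevance, scores, k=3))
```

For the example above this should reproduce the DCG and nDCG values computed by hand.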

Question:

How does this relate to practice?

TASK

Compute CG, DCG, and nDCG by hand for k = 10 and check that the results match scikit-learn:

TRUE =     np.asarray([[1,0,1,0,0,0,0,1,0,1,0,0,0,0,1]])
PREDICTED = np.asarray([[15,14,13,12,11,10,9,8,7,6,5,4,3,2,1]])
ndcg_score(TRUE, PREDICTED, k=10)
0.7137727261090451
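
One possible way to do the manual part of the task is sketched below, using only the standard library. It relies on the fact that PREDICTED is strictly decreasing, so the ranking is simply the input order; the variable names are illustrative.

```python
import math

TRUE = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
K = 10

# PREDICTED is strictly decreasing, so the ranking equals the input order.
top_k = TRUE[:K]

# CG@10: plain sum of the relevance labels in the top 10.
cg = sum(top_k)

# DCG@10: a hit at rank r (1-based) is discounted by log2(r + 1).
dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(top_k, start=1))

# IDCG@10: the best ordering puts all five relevant documents first.
ideal = sorted(TRUE, reverse=True)[:K]
idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))

ndcg = dcg / idcg
print(cg, dcg, idcg, ndcg)
```

The relevant documents sit at ranks 1, 3, 8 and 10 within the top 10, so CG is 4, and the nDCG should agree with the scikit-learn value shown above.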

TF-IDF search engine: what are the pros and cons?
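
To make one of the trade-offs concrete, here is a toy TF-IDF search in plain Python. It is a simplified sketch, not scikit-learn's TfidfVectorizer: the tokenizer is a whitespace split, the smoothed IDF formula is one common variant, and the three documents are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption; the exact formula varies between libraries).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

docs = ["cheap laptop deals", "affordable notebook offers", "football match results"]
doc_vectors, idf = tfidf_vectors(docs)

def search(query):
    """Score every document against the query; unseen terms get weight 0."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    return [cosine(q, d) for d in doc_vectors]

print(search("inexpensive laptop"))
```

The query matches the first document through the shared word "laptop", but the second document scores zero even though it means nearly the same thing. Exact term matching is fast and easy to interpret, yet it cannot see synonyms or paraphrases.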

How could a better search engine be built (using transformers or other neural models)? What are the potential advantages and disadvantages of such an approach?

from sentence_transformers import SentenceTransformer
sentences = ["Hello World", "Hallo Welt"]

model = SentenceTransformer('LaBSE')
embeddings = model.encode(sentences)
print(embeddings)
[[-0.07142267 -0.07716201 -0.0304776  ...  0.01356028 -0.04016104
  -0.02446149]
 [-0.06508801 -0.06923407 -0.03735013 ...  0.01013562 -0.04027326
  -0.02171571]]
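
The two rows printed above look very similar, which is what one would hope for a sentence and its translation. A common way to quantify this is cosine similarity. The helper below is a plain-Python sketch with toy vectors standing in for the real embeddings; the function name is illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumptions for illustration, not real model output).
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]    # same direction as v1
v3 = [-3.0, 0.0, 1.0]   # orthogonal to v1

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

With the notebook's variables, the same function could be called as cosine_similarity(embeddings[0], embeddings[1]).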

Task

  1. Install faiss and work through the tutorial: https://github.com/facebookresearch
  2. Load the article texts from BBC News Train.csv
  3. Use one of the transformer models (you can use the sentence-transformers library) to create document embeddings
  4. Load the embeddings into a faiss index
  5. Search for the query 'consumer electronics market'
r = pd.read_csv('BBC News Train.csv')
DOCUMENTS = list(r.Text)

# Encode every article once; this is the slow step, so cache the result on disk
embeddings = model.encode(DOCUMENTS)

with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)

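QUERY_STR = 'consumer electronics market'
query = model.encode([QUERY_STR])  # shape (1, dim); same model as for the documents
with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
# Exact (brute-force) L2 index; the dimension is taken from the embeddings
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.ascontiguousarray(embeddings))
# D holds squared L2 distances, I the row indices of the 5 nearest documents
D, I = index.search(query, 5)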
I
array([[1363, 1371,  898,  744,  292]])
D
array([[1.3110979, 1.4027176, 1.4045265, 1.442167 , 1.442167 ]],
      dtype=float32)
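
To see what a flat L2 index computes, the search can be reproduced by brute force: compare the query with every stored vector and keep the k smallest squared Euclidean distances. The sketch below is plain Python on toy 2-D vectors; flat_l2_search is a hypothetical helper, not a faiss function.

```python
def flat_l2_search(vectors, query, k):
    """Exhaustive nearest-neighbour search by squared Euclidean distance."""
    # Pair each squared distance with the index of the vector it belongs to.
    dists = [(sum((x - q) ** 2 for x, q in zip(vec, query)), idx)
             for idx, vec in enumerate(vectors)]
    dists.sort()  # smallest distance first
    top = dists[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy data (assumptions for illustration).
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [0.9, 0.1]

D_toy, I_toy = flat_l2_search(vectors, query, k=2)
print(I_toy, D_toy)
```

As with the faiss output above, smaller distances mean closer matches, and the indices point back into the original list of vectors.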
DOCUMENTS[1363]
'internet boom for gift shopping cyberspace is becoming a very popular destination for christmas shoppers.  forecasts predict that british people will spend £4bn buying gifts online during the festive season  an increase of 64% on 2003. surveys also show that the average amount that people are spending is rising  as is the range of goods that they are happy to buy online. savvy shoppers are also using the net to find the hot presents that are all but sold out in high street stores.  almost half of the uk population now shop online according to figures collected by the interactive media in retail group which represents web retailers. about 85% of this group  18m people  expect to do a lot of their christmas gift buying online this year  reports the industry group. on average each shopper will spend £220 and britons lead europe in their affection for online shopping.  almost a third of all the money spent online this christmas will come out of british wallets and purses compared to 29% from german shoppers and only 4% from italian gift buyers. james roper  director of the imrg  said shoppers were now much happier to buy so-called big ticket items such as lcd television sets and digital cameras. mr roper added that many retailers were working hard to reassure consumers that online shopping was safe and that goods ordered as presents would arrive in time for christmas. he advised consumers to give shops a little more time than usual to fulfil orders given that online buying is proving so popular. a survey by hostway suggests that many men prefer to shop online to avoid the embarrassment of buying some types of presents  such as lingerie  for wives and girlfriends. much of this online shopping is likely to be done during work time  according to research carried out by security firm saint bernard software. the research reveals that up to two working days will be lost by staff who do their shopping via their work computer. 
worst offenders will be those in the 18-35 age bracket  suggests the research  who will spend up to five hours per week in december browsing and buying at online shops.  iggy fanlo  chief revenue officer at shopping.com  said that the growing numbers of people using broadband was driving interest in online shopping.  when you consider narrowband and broadband the conversion to sale is two times higher   he said. higher speeds meant that everything happened much faster  he said  which let people spend time browsing and finding out about products before they buy.  the behaviour of online shoppers was also changing  he said.  the single biggest reason people went online before this year was price   he said.  the number one reason now is convenience.   very few consumers click on the lowest price   he said.  they are looking for good prices and merchant reliability.  consumer comments and reviews were also proving popular with shoppers keen to find out who had the most reliable customer service. data collected by ebay suggests that some smart shoppers are getting round the shortages of hot presents by buying them direct through the auction site. according to ebay uk there are now more than 150 robosapiens remote control robots for sale via the site. the robosapiens toy is almost impossible to find in online and offline stores. similarly many shoppers are turning to ebay to help them get hold of the hard-to-find slimline playstation 2  which many retailers are only selling as part of an expensive bundle. the high demand for the playstation 2 has meant that prices for it are being driven up. in shops the ps2 is supposed to sell for £104.99. in some ebay uk auctions the price has risen to more than double this figure. many people are also using ebay to get hold of gadgets not even released in this country. the portable version of the playstation has only just gone on sale in japan yet some enterprising ebay users are selling the device to uk gadget fans.'