Commit Graph

50 Commits

Author SHA1 Message Date
fa3138df29 count occurences feature 2015-10-01 13:36:54 +02:00
209e374226 repaired concordia test 2015-08-19 20:53:40 +02:00
68fecaddf8 adding all tokenized examples 2015-08-19 20:49:26 +02:00
a765443a01 simple search returns matched pattern fragments 2015-08-07 12:54:57 +02:00
28704c2f43 separated tokenization and adding to index 2015-08-01 17:03:39 +02:00
5a57406875 finished original word positions 2015-06-27 12:40:24 +02:00
a8c5fa0c75 original word positions 2015-06-27 10:09:49 +02:00
dba70b4e24 done word positions 2015-06-26 22:50:53 +02:00
724bf0d080 new responsibilities of tokenized sentence 2015-06-26 15:38:24 +02:00
9b1735516c working sentence tokenizer 2015-06-25 20:49:22 +02:00
8432dd321f tokenizer in progress 2015-06-25 10:12:51 +02:00
0baf3e4ef2 character intervals in progress 2015-06-22 13:52:56 +02:00
07d5d4438b clear index, examples 2015-05-04 20:40:44 +02:00
87a26bfa3b cleaned configuration, doc 2015-04-30 21:15:18 +02:00
b790c6898f stable release 2015-04-30 09:29:10 +02:00
bb7608d05e anubis searcher -> concordia searcher (Former-commit-id: 8afe194adf3163ee62caa30732d9c9dd095df66b) 2015-04-24 11:48:32 +02:00
f64449311d removed stop words - works slower (Former-commit-id: 97ce33b0a6ea3c89aaa5a4c69cad248c7b2c8203) 2015-04-21 21:33:08 +02:00
5c2ae86097 output concordia score (Former-commit-id: fa7db09fe9319fa844d294ca4e7deb22d1328151) 2015-04-21 20:44:49 +02:00
7549703414 best overlay computation (Former-commit-id: 986f3d6b611fd276a7b26073daa0094caf078d1e) 2015-04-21 15:14:48 +02:00
024fbf72aa concordia search (Former-commit-id: 609c3a54e930ebae45a2e9a07f63991ec4abc9a6) 2015-04-17 14:17:59 +02:00
4e02afc897 anubis search v1 - very slow for some patterns (Former-commit-id: ae327d7d24f4bc959d3749745a8c395093a17a50) 2015-04-16 11:39:39 +02:00
0d4bdf12de removed using namespace std (Former-commit-id: dbb5129e1f94d83eca887ada0f89d6bb45250f1e) 2015-04-15 14:14:10 +02:00
3a03b01f42 std vectors (Former-commit-id: 5816e87c856f7edc242cc707851a0e2ad05aeb38) 2015-04-15 10:55:26 +02:00
e02bbaa0fa getTmMatches (Former-commit-id: 94aa3db2db88195c61c6ac70006c0e1d743dc854) 2015-04-14 20:14:30 +02:00
f03b4ad954 fixed lcp search (Former-commit-id: 18192126d134323569bc43205ccc60788d9e6cb6) 2015-04-12 12:06:41 +02:00
2533fd5b44 extended markers - length, bitwise operators (Former-commit-id: 948a7fc68bf0b2284ce631d877fc13fa3eaa4882) 2015-04-09 22:17:19 +02:00
f83aaef4ed trimming anonymized sentence (Former-commit-id: 316b76717e4075e466828c628e064076d39481c5) 2014-08-15 13:22:04 +02:00
8f953883bf anubis search continued (Former-commit-id: 95a08f242a03311d067303bfff07bf4890796da5) 2014-06-24 18:23:46 +02:00
e8ea5881a5 lcp search (Former-commit-id: 925a5de8bc33256b594c369907f202e29f809f47) 2014-05-15 22:20:31 +02:00
e99eb77b28 anonymizing sentences (Former-commit-id: 5d8bd7e16258fda7c02a7cc0e1da589d73418f0d) 2014-04-29 14:46:04 +02:00
Rafał Jaworski 5eaf981bc0 working text utils (Former-commit-id: fa44e4578a007291948e4709a0cfd4278fd3af66) 2014-04-24 14:26:35 +02:00
Rafał Jaworski 93c3f50b14 utf8case included (Former-commit-id: a330ce0a63a7f0b452eb95273321f165894849f4) 2014-04-24 12:04:37 +02:00
9358863f8d text utils stub (Former-commit-id: d4459220f5696839d98848e9c30a61c084763a91) 2014-04-24 08:36:48 +02:00
13c97f572d sentence anonymizer stub, regex replacement (Former-commit-id: edb1247f7b29fd62913114be84d3391507a0890e) 2014-04-13 12:21:30 +02:00
8a38831306 logarithmic score (Former-commit-id: ec2704b3a206cc39ed42d19620bef6ce0fedbc7e) 2014-03-14 12:05:06 +01:00
4b921decae limits control (Former-commit-id: 83d90cb63b3f1447938d16010e66f4345dfe0617) 2014-03-14 11:30:17 +01:00
655087582e anubis search stub (Former-commit-id: 41cf0c8811767219f6f58bc06d9729d724269e73) 2014-03-11 14:32:10 +01:00
d5e692ebfd Anubis search stub (Former-commit-id: 4d8c76f85afbe910daca69e695c86165c32adbd8) 2014-03-11 14:29:30 +01:00
8ec610a2ac building (Former-commit-id: 5c0f29cc2061c88159e3d87fc0d21da32edd05bb) 2014-02-20 18:46:04 +01:00
fb65cc9c66 suffix markers (Former-commit-id: 7426cce771f548dcd4eb7478aafa912fb73784bf) 2014-02-20 10:49:17 +01:00
b318770752 redesigned project (Former-commit-id: d35841126fda627a04a1a16a26b91943401b6fcf) 2013-12-14 15:23:17 +01:00
13e3cd9e33 logging test 2013-12-06 22:59:29 +01:00
47405834a3 concordia-console, new approach to suffix array - 4 sauchars per one saidx 2013-12-06 22:29:25 +01:00
7c1ed7fb6e suachar_t changed to int 2013-12-01 23:34:46 +01:00
0d8a057278 suffix array simple search 2013-11-28 16:47:57 +01:00
d3cccff654 concordia index 2013-11-20 17:43:29 +01:00
656e9dbae9 concordia index stub 2013-11-14 20:36:34 +01:00
b238995a16 working hash generator 2013-11-14 15:44:50 +01:00
3aa4091e4d word map 2013-11-12 22:08:37 +01:00
3c208270b9 init 2013-10-24 17:08:58 +02:00