Commit Graph

125 Commits

SHA1 Message Date
829c70d320 long file support 2019-02-26 13:46:57 +01:00
d39c0400c9 occurrence refactoring 2019-01-22 14:07:28 +01:00
73b3d22d97 removing throw declarations 2019-01-18 13:30:51 +01:00
210929751d const to the getter for total occurrences count 2019-01-16 13:15:30 +01:00
ec621fb310 working full search 2019-01-09 18:31:52 +01:00
5a7cbbe9e9 full search stub - tests needed 2019-01-09 15:30:56 +01:00
53b100b2e4 lowercasing bad utf 2018-12-13 17:43:01 +01:00
2eda92fe7a interval contains 2018-12-12 21:45:07 +01:00
4258caf522 correction 2018-08-29 11:08:01 +02:00
bd4ff81e32 ensuring UTF-8 strings 2017-10-15 18:54:15 +02:00
61631c52a3 lexicon search 2017-10-10 15:39:47 +02:00
5e809efcce corrected tokenizer 2017-05-05 12:58:32 +02:00
96a5bc3108 original sentence in tokenized sentence 2017-04-28 13:48:32 +02:00
4faae4e91a slight change 2017-04-27 13:52:03 +02:00
dceb0d9f47 date recognition 2017-04-27 10:37:29 +02:00
bd73749388 new tokenizer 2017-04-26 17:02:18 +02:00
a0673df75a cpplint corrections 2017-04-22 23:47:48 +02:00
970dda5dc2 option of white space tokenization while searching 2017-04-22 23:45:51 +02:00
31e4f091ad multiple results 2017-04-21 14:51:58 +02:00
c3826919ba changes in CMakeLists.txt 2017-03-03 11:28:54 +01:00
cf7b1592f7 updated todo 2016-11-01 22:23:30 +01:00
7e005bfca7 changed significance factor to 2 2016-10-22 18:02:04 +02:00
8bc739ff20 added boundary on simple search results 2016-01-25 22:42:42 +01:00
b3d7c993aa tokenize only option - no word map 2016-01-01 20:45:07 +01:00
bbf3853d2a added lowercasing when tokenizing by space 2015-12-29 21:44:46 +01:00
0a8d2fdd39 tokenize by whitespace option 2015-12-27 20:54:40 +01:00
873d7c300c added parameterless constructor for concordia 2015-10-19 15:38:10 +02:00
1adabf4833 add index path as required argument to concordia constructor 2015-10-16 22:14:11 +02:00
f585ff9e01 corpus figures creator 2015-10-06 13:34:03 +02:00
96c74c47ac corpus analyzer 2015-10-04 16:24:58 +02:00
2601dc83bf test corpus for corpus analyzer 2015-10-03 16:19:10 +02:00
4e17e28f7f working corpus analyzer 2015-10-03 16:18:49 +02:00
fa3138df29 count occurrences feature 2015-10-01 13:36:54 +02:00
fd32ff7e12 todo 2015-09-07 08:15:46 +02:00
cdeb57ccfa todo 2015-08-26 20:14:43 +02:00
bd62420cd5 updated tutorial 2015-08-24 14:30:20 +02:00
0a3fd8a04e added an extremely important improvement to the concordia search algorithm - gapped overlays cut-off 2015-08-24 13:10:06 +02:00
209e374226 repaired concordia test 2015-08-19 20:53:40 +02:00
68fecaddf8 adding all tokenized examples 2015-08-19 20:49:26 +02:00
a765443a01 simple search returns matched pattern fragments 2015-08-07 12:54:57 +02:00
28704c2f43 separated tokenization and adding to index 2015-08-01 17:03:39 +02:00
5a57406875 finished original word positions 2015-06-27 12:40:24 +02:00
a8c5fa0c75 original word positions 2015-06-27 10:09:49 +02:00
dba70b4e24 done word positions 2015-06-26 22:50:53 +02:00
724bf0d080 new responsibilities of tokenized sentence 2015-06-26 15:38:24 +02:00
9b1735516c working sentence tokenizer 2015-06-25 20:49:22 +02:00
8432dd321f tokenizer in progress 2015-06-25 10:12:51 +02:00
0baf3e4ef2 character intervals in progress 2015-06-22 13:52:56 +02:00
4c0f2fd08d modified todo 2015-06-12 12:25:02 +02:00
dff52abff7 modified todo 2015-06-11 11:17:45 +02:00