RandomSec/main/tests/data
Tom Morris a3fab26cca
Fix the text format guesser so it doesn't inappropriately guess WikiText (#2924)
* Fix text guesser so it doesn't guess wikitext

Fixes #2850
- Add simple magic detector for zip & gzip files to keep
  it from attempting to guess binary files
- Add a counter for C0 controls for the same reason
- Tighten wikitable counters to require marker at
  beginning of the line, per the specification
- Refactor to use Apache Commons instead of private
  counting methods
- Add tests for most TextGuesser formats
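
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class `BinarySniffer` and all of its method names are hypothetical, it uses only the JDK rather than Apache Commons, and the choice to exclude tab, line feed, and carriage return from the C0 count is an assumption. The magic bytes (gzip `1F 8B`, zip local file header `PK 03 04`) and the MediaWiki table markers (`{|`, `|-`, `|}`) are standard.

```java
// Hypothetical helper: names are illustrative and not taken from the commit.
final class BinarySniffer {
    private BinarySniffer() {}

    /** True if the buffer starts with the gzip magic bytes 0x1F 0x8B. */
    static boolean looksLikeGzip(byte[] head) {
        return head.length >= 2
                && (head[0] & 0xFF) == 0x1F
                && (head[1] & 0xFF) == 0x8B;
    }

    /** True if the buffer starts with the zip local file header signature 'P' 'K' 0x03 0x04. */
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
                && head[0] == 'P' && head[1] == 'K'
                && head[2] == 0x03 && head[3] == 0x04;
    }

    /**
     * Counts C0 control characters (below U+0020), ignoring tab, LF, and CR,
     * which are common in plain text. Assumption: a high count suggests binary data.
     */
    static int countC0Controls(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                count++;
            }
        }
        return count;
    }

    /** Counts lines that begin with the given marker, e.g. "{|" for a wikitable start. */
    static int countLinesStartingWith(String text, String marker) {
        int count = 0;
        for (String line : text.split("\\R")) { // \R matches any line break (Java 8+)
            if (line.startsWith(marker)) {
                count++;
            }
        }
        return count;
    }
}
```

Anchoring the marker at the start of the line means a stray `{|` in the middle of ordinary text is not counted, which is the kind of false positive the tightened wikitable counters are meant to avoid.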

* Remove misplaced duplicate test data file

* Fix LGTM warning + minor cleanups

* Use BoundedInputStream to prevent runaway lines
2020-07-15 08:56:00 +02:00
changes Remove DataExtensions from change serialization 2018-05-22 23:09:51 +02:00
big5.html Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
big5.txt Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
birds.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
Colorado-Municipalities-small-xlsx.gz Fix the text format guesser so it doesn't inappropriately guess WikiText (#2924) 2020-07-15 08:56:00 +02:00
euc-jp.html Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
euc-jp.txt Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
euc-kr.html Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
euc-kr.txt Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
example_project_metadata_save_mode.json Add customMetadata to project metadata parsing test 2019-06-04 12:02:49 +01:00
example_project_metadata.json Add customMetadata to project metadata parsing test 2019-06-04 12:02:49 +01:00
example-latin1.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
example-linebreaks-in-cells.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
example-linebreaks-in-cells.tsv Normalize line endings 2013-03-23 18:46:20 -04:00
example-utf8.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
excel95.xls Add Excel95 import test and improve other importer tests (#2844) 2020-06-30 08:20:56 +02:00
films.ods Truncate any completely empty columns on the right (#2842) 2020-06-30 08:19:00 +02:00
food.csv fixed client tests 2010-05-31 17:56:07 +00:00
food.small.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
government_contracts.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
grid_small.json Unit test for issue #137 2017-08-13 12:37:20 -05:00
jorf.xml add UT for issue #1509 2018-02-25 15:31:59 -05:00
movies-condensed.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
movies.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
nobel-prize-winners.csv Nobel prize winners CSV data typos 2018-01-07 21:28:07 +01:00
ozone_8hr_dmax.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
ozone_sites.csv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
presidents.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00
shift_jis.html Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
shift_jis.txt Restore character encoding guesser (#2755) 2020-06-22 06:04:51 +02:00
Wpi Data.tsv Major refactor to separate the webapp part from the embedded servlet engine part 2010-05-28 23:19:08 +00:00