Add files

Ryszard Staruch 2024-07-31 15:11:48 +02:00
commit e95e1a194f
1192 changed files with 2201 additions and 0 deletions
README.md
test/images
014fd08927785b2056411ddf4e54c965.png051b3f8ec0508da2b46984147123b827.png06a1ce76aa4b62747b04e7b50183762c.png094a80c8d2c906e81312741d267fd707.png0a5ed6de4998edf794e8be1085784de4.png0bc1a1a414cc7d698db0fbeb840a1577.png0d4e0b45e6f5872d98b8d539eeb8b491.png0f60bf67955c1d24b922b18318031229.png11eaece58eb87c302761db1933fa66ef.png13038cd679c390f8329cb9a84e6201e1.png130bbb2c38d6e39103ca8524a7481728.png13ac946de4b886bd7cac88ec5a269495.png1643205598fc6d2d5d9ab98f7c31a296.png17e9d206de543677e9b50acbe5f074f6.png1a04c7901921a4a27af41f07d25b57d3.png1a9112ff3b933ec861c23abf628cbfc8.png1c82c9e95ebb6e90768ef1534a28a3bd.png1f4dbafbeefac7eb31ba976612899105.png218330e412cf8e9f484337d9f76feac6.png21c24fcfa244a5c767c211eeb5d7c8de.png22ae0572f746709b72635a7b428c4c72.png23e121dd7bf58f1d995eac8ec7b4e13b.png241439383fb65ba2166377992aa03bab.png24cbbae26bed47353f84f6264f6c9e17.png25a80b6d45881ae086353bd460e82fe3.png25f7163e087c46a76995e42e81e5020f.png26aa0be7b5df887aad5a27256d6681cc.png27476085334fdb0f5c29c6eabc656361.png27911ec971b8ff785316738fb296c331.png295904919a86e929f4520a985a27a1ea.png29bc37ac4404f77d5246e288c8a48f6e.png2a56f7dab9dd04922613d9a1f5490359.png2c23cc79c862f02d1d26e8f35013c97f.png2cab17553c4f407eb7ddc3585bd485a9.png2e3b3f6d93ed78b6cef7843d8bad8a40.png2f06045f645498492732b7cb0d66c09a.png31c8eedaa5d60d6ec47dd118dc5ac2ed.png3953b3b36c6a98c4943895bf9e740ad0.png3bda278501e85ca6c7f76bd756bef382.png3c69b19e112c83f0c20a63a19d6b4624.png3d91d8d07f7a302f0b1a798d27d85a37.png4041373cb8b2a380f7046d002329cb58.png4069973024822155c65e4649b096d8ac.png41b9511fa6a181f9cc20c4cfb9eb61fc.png43659f75793f624c25fd6fa0273f1292.png44f19eaacc091f539036d17e9f89ea1f.png47c140f36ae0ff068484ff4da83be2b2.png487f25fa31e3caeb02f96995977f5767.png49bae58d5b70141dd22da13b5023c738.png4a0f5296dbedfe8cbe106045f85407a0.png4b055c586ad5eca1f36f2bb419b1d9d1.png535357dad2402c50a2f6cfcf80d2f46e.png5353fa0016c1f81daba4d83d35604812.png54f5f9193225113f3f1079255f66b362.png560b08a062d8007ace58d4cf2b6e44e5.png579e8783a62cdebf32cb
eac9bca2bd2d.png58f145bdcf0222fe479c7093cb7fdb96.png59cd1bcea094c15cd1b7fdf582e01778.png5a057cc97e801d4e943068ca933c16a3.png5a4674a7eea45e501e1feedd3f394e5d.png5a5226a2e1f55c26ca132ffe8ed726cd.png5a83433cc315eb27e82837cf49ba8a1a.png5b3394003042008c784b2355b8238b30.png5b85ecc407a35a5d5f28c5be00cf0236.png5c86ce8683ae5121de7be6f3d1fa519e.png5cbf132ec9365b601a53fdcbdb93ae2b.png5cd469e8d2c02282d2596484fc80ebd7.png5f9a3e649f6d669632fa80a3abadcb96.png61ca03ac388a78809cf31a2eeedccfda.png6257ce56eb1eb610ac92a6da04c78d45.png63bd6752b8bae43e2b6fa5de3a404aee.png64fe7a86a9faf88a1b0e1fb72aaec9f9.png65035ae776d1c76fbd0401f72d801356.png65178336b7b85f2be9cb145b0abf2c8c.png653764a4f8938ac40f35b7b433a94b85.png6864c3be1f8fe81634a61460a22263fc.png69115ccfe023c05e45c51877663df0b0.png69c67d59c305c5366f95403dbb614d74.png6be7ce1ebfba0db2eec3796b3ccd11eb.png6c545ba60aa8cb81b09d9b9e547c55e1.png6d79a316417751256c97ddef290bcc17.png718cba757475c5cb36e151a6a44754fb.png719ece743298b89647004b3aaf9cd020.png726049ff3fb23d35c65d72c736770bbe.png72a2a4aa8fb59c012a3893058229f47b.png73721c6592232ad47cce540140bef537.png73d480046bebdd2d57e8f6eb7fa5a650.png75164ca455a533b12e6a6408d424f381.png7669fe655c325346980d7c84f9f24638.png76d70e9b6037848af7e898953e7565b1.png773d45da7ed1562e0afa6740f6c83d67.png7790f82c149092ec3bd04b9ae83ce003.png77a731f54482e209800ed63f42dddbba.png77fa516e93655ea6381fb114b01320a9.png783ae486e17b6a7202ec1e0b80174bf5.png7844e280fd5ca7423f68abcd648cfb0b.png78988200024c57bf17f36126db8267a7.png7bf1286bcb75feaf6df76d6bdf594187.png7d277423e4731fdb5a82c32895e46f59.png

README.md Normal file

@@ -0,0 +1,25 @@
HANOI challenge
This challenge is based on the HANOI corpus, described in detail below. It is a binary classification challenge: the aim is to classify interpreter notes as written by either a trainee or a professional. The training split of the dataset consists of 988 examples in the form of scans of interpreter notes, 786 of them made by professionals and 202 by trainees (university students taking an interpreting course).
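Because the training split is imbalanced, it helps to know what accuracy a trivial classifier would reach before judging a model. The sketch below works this out from the counts stated above. The variable names are illustrative, and it assumes that the evaluation data follows roughly the same class distribution as the training split, which this description does not state.

```python
# Class counts of the training split, as given in the challenge description.
counts = {"pro": 786, "trainee": 202}

total = sum(counts.values())

# A trivial baseline always predicts the most frequent label.
majority_label = max(counts, key=counts.get)

# Accuracy of that baseline on the training split itself.
# Assumption: the test split has a similar class balance (not stated in the README).
baseline_accuracy = counts[majority_label] / total

print(f"examples: {total}")
print(f"majority label: {majority_label}")
print(f"majority-class accuracy: {baseline_accuracy:.3f}")
```

A model is only informative if it clearly beats this figure, so a result just under 80% accuracy is no better than always answering `pro`.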
HANOI, or Handwritten Notation of Interpreters, is a corpus of handwritten notes for consecutive interpreting, collected from professional interpreters and interpreting students. It is the only resource of its kind in the world.
Interpreting is the act of translating spoken language. Professional interpreters are needed, for example, to translate a discussion between international guests speaking in their native languages during a conference. There are several types of interpreting, one of which is consecutive interpreting. In this case, the interpreter waits for the speaker to finish the whole speech before starting to interpret. Because such speeches can last up to 20 minutes, interpreters rely on handwritten notes to convey the content of the original speech accurately. The interpreter listens to the source language and, at the same time, notes down selected content in order to remember it and later recreate it in the target language.
There are rules for note-taking. The writing should be sparse and diagonal, using abbreviations, acronyms, and symbols. Interpreters often take notes in two or more languages at the same time. The resulting specialized multilingual text, the so-called semi-product of interpreting, serves a unique function: supporting short-term memory during interpretation. Developing note-taking skills for interpreting is a process that starts at university with a course in notation and continues throughout an interpreter's entire career. Every interpreter's notation style is different, and it is virtually impossible to read someone else's notes.
The notes of consecutive interpreters constitute a unique type of handwritten text, quite unlike the notes people use for everyday tasks, school, and work. Interestingly, the notes of professional interpreters and of those who are new to the skill are also different. The notation of interpreting trainees is more reminiscent of 'traditional' notes: there are grammatically correct sentences and multi-syllable words, pages are densely written, and there are no symbols, abbreviations, or distinctive lines that would divide the speech into separate ideas.
(Description adapted from https://hanoi.amu.edu.pl/)
Given the unique characteristics of interpreter notes outlined above, and the differences between notes made by trainees and notes made by professionals, an interesting question arises: could a machine learning model reliably identify the interpreting experience of a note's author across several hundred examples? Take part in the challenge and prove that the answer can be 'yes'!
Metric: accuracy
Labels: trainee, pro
Dataset authors: https://csi.amu.edu.pl/zespoly/zespol-lingwistyki-diachronicznej
License: CC BY-NC 4.0
HANOI is part of the Digital Research Infrastructure for the Humanities and Arts DARIAH-PL, funded from the Intelligent Development Operational Programme, Polish National Centre for Research and Development, ID: POIR.04.02.00-00-D006/20.
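The metric and labels listed above can be illustrated with a small scoring sketch. It assumes that predictions and reference labels are available as parallel lists of the strings `trainee` and `pro`. The function name `accuracy` and the input format are hypothetical, since this README does not describe the submission format or the official evaluation script.

```python
from typing import Sequence

# Label set taken from the README; everything else here is illustrative.
LABELS = ("trainee", "pro")


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of predictions that match the expected labels."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty list")
    for label in list(expected) + list(predicted):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)


# Toy example with made-up labels, not real challenge data.
gold = ["pro", "pro", "trainee", "pro"]
pred = ["pro", "trainee", "trainee", "pro"]
print(accuracy(gold, pred))
```

Rejecting unknown labels early guards against silent scoring errors, such as a prediction file that uses `professional` instead of `pro`.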

Binary file not shown.


Some files were not shown because too many files have changed in this diff.