diff --git a/README.md b/README.md
index e410d12..7609006 100644
--- a/README.md
+++ b/README.md
@@ -518,10 +518,6 @@ The following files will be used in example calculations, `expected.tsv`:
     Foo baz BAR
     Ok 7777
 
-`in.tsv`:
-
-
-
 Without any flags, the `Accuracy` metric is:
 
     $ geval -o out.tsv -e expected.tsv --metric Accuracy
@@ -541,7 +537,7 @@ Without any flags, the `Accuracy` metric is:
     $ geval -o out.tsv -e expected.tsv --metric Accuracy:l
     0.4
 
-Why the result is differnt for lower-casing and upper-casing? Some
+Why is the result different for lower-casing and upper-casing? Some
 characters, e.g. German _ß_, are tricky. If you upper-case _Straße_
 you've got _STRASSE_, but if you lower-case it, you obtain _straße_,
 not _strasse_! For this reason, when you want to disregard case when
@@ -555,12 +551,12 @@ than lower- or upper-casing:
 
 ### Manipulations with regular expressions
 
-#### `m<REGEXP>` matching a given PCRE regexp
+#### `m<REGEXP>` — matching a given PCRE regexp
 
 The evaluation metric will be calculated only on the parts of the
 outputs matching a given regular expression. This can be used when you
 want to focus on some specific parts of a text. For instance, we could
-calculate Accuracy only considering (disregarding all other
+calculate Accuracy considering only numbers (disregarding all other
 characters, including spaces).
 
     $ geval -o out.tsv -e expected.tsv --metric 'Accuracy:m<\d+>'
@@ -569,9 +565,11 @@ characters, including spaces).
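-(Note that apostrophes are due to using Bash here, if you put it into
-the `config.txt` file you should omit apostrophes: `--metric Accuracy:m<\d+>`.)
+(Note that the single quotes are only needed because Bash is used here; if you put it into
+the `config.txt` file, you should omit them: `--metric Accuracy:m<\d+>`.)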
 
-All matches are considered and concatenated, if no match is found, an empty string is assumed
-(hence, e.g., `testtttttt` is considered a hit for `test` after this normalization).
-Note that both  `aaa 3 4 bbb` and `aaa BBB 34` will be normalized to `34` here.
+All matches are considered and concatenated; if no match is found, an
+empty string is assumed (hence, e.g., `testtttttt` is considered a hit
+for `test` after this normalization, as both will be transformed into
+the empty string). Note that both `aaa 3 4 bbb` and `aaa BBB 34` will
+be normalized to `34` here.
 
 You can use regexp anchoring operators (`^` or `$`). This will refer
 to the beginning or end of the whole *line*. You could use it to
@@ -619,6 +617,9 @@ You can use special operators `\0`, `\1`, `\2` to refer to parts matched by the
 
 This will sort all tokens, e.g. `foo bar baz` will be treated as `bar baz foo`.
 
+    $ geval -o out.tsv -e expected.tsv --metric 'Accuracy:S'
+    0.3
+
 ### Filtering
 
 #### `f<FEATURE>` — filtering
@@ -626,12 +627,12 @@ This will sort all tokens, e.g. `foo bar baz` will be treated as `bar baz foo`.
 Flags such as `u`, `m<...>`, `s<...><...>` etc. work within a line
-(item), they won't change the number items being evaluated. To
+(item); they won't change the number of items being evaluated. To
 consider only a subset of items, use the `f<FEATURE>` flag — only the
-lines containing the feature FEATURE will be considered during metric
+lines containing the feature FEATURE will be taken into account during metric
 calculation. Features are the same as listed by the `--worst-features`
 option, e.g. `exp:foo` would accept only lines with the expected
 output containing the token `foo`, `in[2]:bar` — lines with the second
-columns of input contaning the token `bar` (contrary to
-`--worst-features` square brackets should be used be instead of angle ones for indexing).
+column of input containing the token `bar` (contrary to
+`--worst-features`, square brackets, not angle brackets, should be used for indexing).
 
 You *MUST* supply an input file when you use the `f<...>` flag. Assume
 the following `in.txt` file:
@@ -690,7 +691,8 @@ This is handy, when combined with the `{...}` operator (see below).
 This sets the priority level, considered when the results are displayed in the Gonito platform.
 It has no effect in GEval as such (it is simply disregarded in GEval).
 
-    $ geval --precision 3 -o out.tsv -e expected.tsv --metric 'Accuracy:P<1>' --metric 'MultiLabel-F1:P<3>'               Accuracy:P<1>	0.200
+    $ geval --precision 3 -o out.tsv -e expected.tsv --metric 'Accuracy:P<1>' --metric 'MultiLabel-F1:P<3>'
+    Accuracy:P<1>	0.200
     MultiLabel-F1.0:P<3>	0.511
 
 The priority is interpreted by Gonito in the following way:
diff --git a/test/Spec.hs b/test/Spec.hs
index 2730046..d54bd3d 100644
--- a/test/Spec.hs
+++ b/test/Spec.hs
@@ -383,6 +383,8 @@ main = hspec $ do
       runGEvalTest "flags-regexp-substitution" `shouldReturnAlmost` 0.3
     it "regexp-substitution-ref" $ do
       runGEvalTest "flags-regexp-substitution-ref" `shouldReturnAlmost` 0.5
+    it "sort" $ do
+      runGEvalTest "flags-sort" `shouldReturnAlmost` 0.3
     it "filtering" $ do
       runGEvalTest "flags-filtering" `shouldReturnAlmost` 0.25
   describe "evaluating single lines" $ do
diff --git a/test/flags-sort/flags-sort-solution/test-A/out.tsv b/test/flags-sort/flags-sort-solution/test-A/out.tsv
new file mode 100644
index 0000000..4be9eae
--- /dev/null
+++ b/test/flags-sort/flags-sort-solution/test-A/out.tsv
@@ -0,0 +1,10 @@
+foo 999 BAR
+29008 STRASSE
+xyz
+aaa BBB 34
+qwerty 1000
+WWW WWW WWW WWW WWW WWW WWW WWW
+testtttttt
+104
+Foo baz BAR
+Ok 7777
diff --git a/test/flags-sort/flags-sort/config.txt b/test/flags-sort/flags-sort/config.txt
new file mode 100644
index 0000000..0de8e69
--- /dev/null
+++ b/test/flags-sort/flags-sort/config.txt
@@ -0,0 +1 @@
+--metric Accuracy:S
diff --git a/test/flags-sort/flags-sort/test-A/expected.tsv b/test/flags-sort/flags-sort/test-A/expected.tsv
new file mode 100644
index 0000000..a95a323
--- /dev/null
+++ b/test/flags-sort/flags-sort/test-A/expected.tsv
@@ -0,0 +1,10 @@
+foo 123 bar
+29008 Straße
+xyz
+aaa 3 4 bbb
+qwerty 100
+WWW WWW
+test
+104
+BAR Foo baz
+OK 7777