This commit is contained in:
parent
55521f222e
commit
ab1d7e2546
4
.gitignore
vendored
Normal file
@ -0,0 +1,4 @@
|
||||
.Rproj.user
|
||||
.Rhistory
|
||||
.RData
|
||||
.Ruserdata
|
13
11/11.Rproj
Normal file
@ -0,0 +1,13 @@
|
||||
Version: 1.0
|
||||
|
||||
RestoreWorkspace: Default
|
||||
SaveWorkspace: Default
|
||||
AlwaysSaveHistory: Default
|
||||
|
||||
EnableCodeIndexing: Yes
|
||||
UseSpacesForTab: Yes
|
||||
NumSpacesForTab: 2
|
||||
Encoding: UTF-8
|
||||
|
||||
RnwWeave: Sweave
|
||||
LaTeX: pdfLaTeX
|
10
11/3.txt
@ -1,10 +0,0 @@
|
||||
I know this is not quite what was asked for, but if you are after an interesting graphical interpretation, I recommend:
https://www.gwern.net/Traffic

And now the 2 answers:
UNSW-NB15:
description: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys
NSL_KDD:
description: I could not find the dataset itself, but I found a dump of it :)
link: https://github.com/jmnwong/NSL-KDD-Dataset
|
91
11/README.md
Normal file
@ -0,0 +1,91 @@
|
||||
# Dataset characterization and historical status (task 1)

## 1. kddcup99

Prepared for the Fifth International Conference on Knowledge Discovery and Data Mining, as part of a competition to find the best-designed predictive model for detecting potential attacks.
The dataset covers 4 attack categories (DoS, R2L, U2R, probing). The training data contains 24 attack types, and the test data adds 14 more.
The data was simulated in a military network environment. It amounts to 4 GB of network traffic from 7 weeks (about 5 million connection records).
A connection is a sequence of TCP packets with a well-defined start and end time, labeled either as normal or with the code of an attack. Each connection record is about 100 bytes.
|
||||
|
||||
As for current usage, I found:

- A 2016 paper describing its applications in machine learning between 2010 and 2015 - https://peerj.com/preprints/1954/
- A 2018 paper whose author says the dataset is often used as a benchmark - https://arxiv.org/abs/1811.05372

### I am not sure this means it is still in use (the newest source is already 3 years old), but I believe it is, and that is the assessment I am leaving :)
|
||||
|
||||
<br>
|
||||
|
||||
## 2. network

A network traffic capture made with tcpdump between a LAN and external networks.
Thanks to tcpdump filtering, only TCP and UDP traffic was collected.
|
||||
|
||||
### Each TCP packet record consists of:
|
||||
|
||||
- Time stamp
|
||||
- Source IP address
|
||||
- Source port
|
||||
- Destination IP address
|
||||
- Destination port
|
||||
- Flags (syn, fin, push, rst, or .)
|
||||
- Data sequence number of this packet
|
||||
- Data sequence number of the data expected in return
|
||||
- Number of bytes of receive buffer space available
|
||||
- Indication of whether or not the data is urgent
|
||||
|
||||
### Each UDP packet record consists of:
|
||||
|
||||
- Time stamp
|
||||
- Source IP address
|
||||
- Source port
|
||||
- Destination IP address
|
||||
- Destination port
|
||||
- Length of the packet
|
||||
All IP addresses have been modified so that no potentially sensitive data is disclosed.
|
||||
|
||||
|
||||
### The dataset's page was last edited on April 4, 2001, the most recent article listed on the site (http://ivpr.cs.uml.edu/publications/) is from 2000, and I found no mention of this data in newer work, so I classify this dataset as historical.
|
||||
|
||||
<br>
|
||||
|
||||
## 3. system calls

The dataset contains system call traces of active system processes.
Each trace file (\*.int) contains a list of number pairs, in this order:

- the PID of the process
- a number identifying the system call

The mapping from numbers to system calls is included in the documentation in the `UserDoc` folder.
The documentation can also be downloaded as PostScript from: https://www.cs.unm.edu/~immsec/software/stide_user_doc.ps
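
A minimal sketch of how such a trace could be read, written in Python rather than the R used elsewhere in this repository. It assumes each line holds two whitespace-separated integers (PID, then system call number) and that a line starting with `-1` ends the stream, as the STIDE user documentation in `UserDoc` describes. The function names `read_pairs`, `syscalls_by_pid`, and `windows` are made up for this illustration, and the sample input is the small example from that documentation. A `-1` in the second column (a missing element) is not treated specially here.

```python
from collections import defaultdict

def read_pairs(lines):
    """Yield (pid, syscall) pairs from lines that each hold two integers."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "-1":
            break  # the STIDE docs use -1 as an end-of-stream marker
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        yield int(fields[0]), int(fields[1])

def syscalls_by_pid(pairs):
    """Group system call numbers by PID, preserving their order."""
    traces = defaultdict(list)
    for pid, call in pairs:
        traces[pid].append(call)
    return dict(traces)

def windows(trace, length):
    """Return every run of `length` consecutive calls in one trace."""
    return [tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)]

# Sample input copied from the STIDE user documentation (PIDs 744, 1069 and 9).
SAMPLE = """744 24
744 13
1069 4
1069 24
1069 4
744 5
9 24
1069 13
744 81
9 13
9 2
1069 5
1069 18
-1
"""

traces = syscalls_by_pid(read_pairs(SAMPLE.splitlines()))
for pid, trace in traces.items():
    print(pid, trace, windows(trace, 3))
```

For a real trace, an open file handle can be passed to `read_pairs` instead of `SAMPLE.splitlines()`. Grouping by PID first keeps interleaved processes from contaminating each other's windows.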
|
||||
|
||||
<br>
|
||||
|
||||
## 4. UNIX shell log

9 datasets of user activity on UNIX systems (USER0 and USER1 are the same person working on different machines).
The data has been stripped of all network addresses, personal data, timestamps, etc.
The token representation used in the dataset is described very well here (http://kdd.ics.uci.edu/databases/UNIX_user_data/README), so I will not repeat it.

### I found no recent work using this dataset, and the UCI KDD site is archival because it was absorbed by UCI ML, so I assume the dataset is archival.
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
# Additional datasets (task 3)

## 1. UNSW-NB15:

- description: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
- link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys

## 2. NSL_KDD:

- description: I could not find the dataset itself, but I found a dump of it :)
- link: https://github.com/jmnwong/NSL-KDD-Dataset

## P.S.

I know this is not quite what was asked for, but if you are after an interesting graphical interpretation, I recommend:
https://www.gwern.net/Traffic
|
494021
11/kddcup99/Data/kddcup.data_10_percent_corrected
Normal file
File diff suppressed because it is too large
64
11/kddcup99/response.R
Normal file
@ -0,0 +1,64 @@
|
||||
#
|
||||
#
|
||||
# First, I apologize for analyzing only the 10% subset instead of the full dataset: my hardware cannot
# process the full data without overheating. I hope this can be forgiven :)
|
||||
#
|
||||
#
|
||||
|
||||
library(ggplot2)
|
||||
|
||||
# Column names: the 41 features documented in kddcup.names, plus the label column.
# The first line of kddcup.names lists the 23 labels (22 attacks and 'normal'); those are
# values of the label column, not column names, so they do not belong in this vector.
headers <- c(
  'duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes',
  'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in',
  'num_compromised', 'root_shell', 'su_attempted', 'num_root',
  'num_file_creations', 'num_shells', 'num_access_files', 'num_outbound_cmds',
  'is_host_login', 'is_guest_login', 'count', 'srv_count',
  'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate',
  'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
  'dst_host_count', 'dst_host_srv_count',
  'dst_host_same_srv_rate', 'dst_host_diff_srv_rate',
  'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate',
  'dst_host_serror_rate', 'dst_host_srv_serror_rate',
  'dst_host_rerror_rate', 'dst_host_srv_rerror_rate',
  'label')

# The data file has no header row, so header = FALSE keeps the first record as data.
kddcup99 <- read.csv('kddcup99/Data/kddcup.data_10_percent_corrected',
                     header = FALSE, col.names = headers)

# Each row has 42 values (41 features + label). The 64 names listed in
# http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names are those 41 features plus the 23 labels.
# With the labels mistakenly used as column names, the 5th and 6th columns were read as
# `imap` and `ipsweep`; they are in fact src_bytes and dst_bytes, so those are analyzed below.

print('Most common src_bytes in kddcup99:')
print(tail(names(sort(table(kddcup99$src_bytes)))))

print('Most common dst_bytes in kddcup99:')
print(tail(names(sort(table(kddcup99$dst_bytes)))))
|
358760
11/network/Data/base.csv
Normal file
File diff suppressed because it is too large
628775
11/network/Data/net1.csv
Normal file
File diff suppressed because it is too large
481851
11/network/Data/net2.csv
Normal file
File diff suppressed because it is too large
509263
11/network/Data/net3.csv
Normal file
File diff suppressed because it is too large
632036
11/network/Data/net4.csv
Normal file
File diff suppressed because it is too large
72
11/network/response.R
Normal file
@ -0,0 +1,72 @@
|
||||
library(ggplot2)
|
||||
|
||||
base <- read.csv('network/Data/base.csv')
|
||||
net1 <- read.csv('network/Data/net1.csv')
|
||||
net2 <- read.csv('network/Data/net2.csv')
|
||||
net3 <- read.csv('network/Data/net3.csv')
|
||||
net4 <- read.csv('network/Data/net4.csv')
|
||||
|
||||
# base
|
||||
print('Most common src_port in base:')
|
||||
print(tail(names(sort(table(base$src_port)))))
|
||||
|
||||
print('Most common src_addr in base:')
|
||||
print(tail(names(sort(table(base$src_addr)))))
|
||||
|
||||
print('Most common dest_port in base:')
|
||||
print(tail(names(sort(table(base$dest_port)))))
|
||||
|
||||
print('Most common dest_addr in base:')
|
||||
print(tail(names(sort(table(base$dest_addr)))))
|
||||
|
||||
# net1
|
||||
print('Most common src_port in net1:')
|
||||
print(tail(names(sort(table(net1$src_port)))))
|
||||
|
||||
print('Most common src_addr in net1:')
|
||||
print(tail(names(sort(table(net1$src_addr)))))
|
||||
|
||||
print('Most common dest_port in net1:')
|
||||
print(tail(names(sort(table(net1$dest_port)))))
|
||||
|
||||
print('Most common dest_addr in net1:')
|
||||
print(tail(names(sort(table(net1$dest_addr)))))
|
||||
|
||||
# net2
|
||||
print('Most common src_port in net2:')
|
||||
print(tail(names(sort(table(net2$src_port)))))
|
||||
|
||||
print('Most common src_addr in net2:')
|
||||
print(tail(names(sort(table(net2$src_addr)))))
|
||||
|
||||
print('Most common dest_port in net2:')
|
||||
print(tail(names(sort(table(net2$dest_port)))))
|
||||
|
||||
print('Most common dest_addr in net2:')
|
||||
print(tail(names(sort(table(net2$dest_addr)))))
|
||||
|
||||
# net3
|
||||
print('Most common src_port in net3:')
|
||||
print(tail(names(sort(table(net3$src_port)))))
|
||||
|
||||
print('Most common src_addr in net3:')
|
||||
print(tail(names(sort(table(net3$src_addr)))))
|
||||
|
||||
print('Most common dest_port in net3:')
|
||||
print(tail(names(sort(table(net3$dest_port)))))
|
||||
|
||||
print('Most common dest_addr in net3:')
|
||||
print(tail(names(sort(table(net3$dest_addr)))))
|
||||
|
||||
# net4
|
||||
print('Most common src_port in net4:')
|
||||
print(tail(names(sort(table(net4$src_port)))))
|
||||
|
||||
print('Most common src_addr in net4:')
|
||||
print(tail(names(sort(table(net4$src_addr)))))
|
||||
|
||||
print('Most common dest_port in net4:')
|
||||
print(tail(names(sort(table(net4$dest_port)))))
|
||||
|
||||
print('Most common dest_addr in net4:')
|
||||
print(tail(names(sort(table(net4$dest_addr)))))
|
69
11/unix/Data/README
Normal file
@ -0,0 +1,69 @@
|
||||
This file contains 9 sets of sanitized user data drawn from the
|
||||
command histories of 8 UNIX computer users at Purdue over the course
|
||||
of up to 2 years (USER0 and USER1 were generated by the same person,
|
||||
working on different platforms and different projects). The data is
|
||||
drawn from tcsh(1) history files and has been parsed and sanitized to
|
||||
remove filenames, user names, directory structures, web addresses,
|
||||
host names, and other possibly identifying items. Command names,
|
||||
flags, and shell metacharacters have been preserved. Additionally,
|
||||
**SOF** and **EOF** tokens have been inserted at the start and end of
|
||||
shell sessions, respectively. Sessions are concatenated by date order
|
||||
and tokens appear in the order issued within the shell session, but no
|
||||
timestamps are included in this data. For example, the two sessions:
|
||||
|
||||
# Start session 1
|
||||
cd ~/private/docs
|
||||
ls -laF | more
|
||||
cat foo.txt bar.txt zorch.txt > somewhere
|
||||
exit
|
||||
# End session 1
|
||||
|
||||
# Start session 2
|
||||
cd ~/games/
|
||||
xquake &
|
||||
fg
|
||||
vi scores.txt
|
||||
mailx john_doe@somewhere.com
|
||||
exit
|
||||
# End session 2
|
||||
|
||||
would be represented by the token stream
|
||||
|
||||
**SOF**
|
||||
cd
|
||||
<1> # one "file name" argument
|
||||
ls
|
||||
-laF
|
||||
|
|
||||
more
|
||||
cat
|
||||
<3> # three "file" arguments
|
||||
>
|
||||
<1>
|
||||
exit
|
||||
**EOF**
|
||||
**SOF**
|
||||
cd
|
||||
<1>
|
||||
xquake
|
||||
&
|
||||
fg
|
||||
vi
|
||||
<1>
|
||||
mailx
|
||||
<1>
|
||||
exit
|
||||
**EOF**
|
||||
|
||||
|
||||
This data is made available under conditions of anonymity for the
|
||||
contributing users and may be used for research purposes only.
|
||||
Summaries and research results employing this data may be published,
|
||||
but literal tokens or token sequences from the data may not be
|
||||
published except with express consent of the originators of the data.
|
||||
No portion of this data may be released with or included in a
|
||||
commercial product, nor may any portion of this data be sold or
|
||||
redistributed for profit or as part of a profit-making endeavor.
|
||||
|
||||
Please direct any questions regarding this data to Terran Lane:
|
||||
terran@ecn.purdue.edu.
|
8974
11/unix/Data/user_0
Normal file
File diff suppressed because it is too large
19881
11/unix/Data/user_1
Normal file
File diff suppressed because it is too large
18738
11/unix/Data/user_2
Normal file
File diff suppressed because it is too large
16866
11/unix/Data/user_3
Normal file
File diff suppressed because it is too large
37817
11/unix/Data/user_4
Normal file
File diff suppressed because it is too large
34821
11/unix/Data/user_5
Normal file
File diff suppressed because it is too large
64152
11/unix/Data/user_6
Normal file
File diff suppressed because it is too large
17329
11/unix/Data/user_7
Normal file
File diff suppressed because it is too large
54042
11/unix/Data/user_8
Normal file
File diff suppressed because it is too large
124
11/unix/response.txt
Normal file
@ -0,0 +1,124 @@
|
||||
I was not sure what other statistical analysis to run on these files, so I simply counted the most frequent
commands for each user. Below are the command I used for the analysis and the results.
I included only the 10 most frequent entries per user to keep the output manageable.
Note that tokens such as <1>, **EOF**, etc. will of course show up, but they should not be
taken into account in the statistical analysis.

Command: sort <file_name> | uniq -c | sort -rn | head -n 10

Results:
|
||||
|
||||
user_0
|
||||
2147 <1>
|
||||
803 ls
|
||||
567 **SOF**
|
||||
567 **EOF**
|
||||
507 cd
|
||||
485 finger
|
||||
450 elm
|
||||
442 exit
|
||||
251 <2>
|
||||
230 fg
|
||||
|
||||
user_1
|
||||
6069 <1>
|
||||
1951 cd
|
||||
1929 ls
|
||||
1733 vi
|
||||
884 <2>
|
||||
515 **SOF**
|
||||
515 **EOF**
|
||||
397 smake
|
||||
350 ll
|
||||
315 more
|
||||
|
||||
user_2
|
||||
5432 <1>
|
||||
1597 cd
|
||||
1069 **SOF**
|
||||
1069 **EOF**
|
||||
989 a.out
|
||||
932 <2>
|
||||
816 ls
|
||||
626 quota
|
||||
612 xcc
|
||||
497 rm
|
||||
|
||||
user_3
|
||||
4382 <1>
|
||||
1710 ls
|
||||
988 cd
|
||||
806 more
|
||||
778 vi
|
||||
704 elm
|
||||
577 fg
|
||||
511 lo
|
||||
501 **SOF**
|
||||
501 **EOF**
|
||||
|
||||
|
||||
user_4
|
||||
10699 <1>
|
||||
4501 cd
|
||||
2395 ll
|
||||
1682 vi
|
||||
1465 dir
|
||||
1396 <2>
|
||||
955 **SOF**
|
||||
955 **EOF**
|
||||
641 elm
|
||||
559 logout
|
||||
|
||||
|
||||
user_5
|
||||
8987 <1>
|
||||
2862 cd
|
||||
2748 <2>
|
||||
2144 ls
|
||||
1279 less
|
||||
1183 grep
|
||||
973 make
|
||||
887 ll
|
||||
778 -
|
||||
632 <
|
||||
|
||||
|
||||
user_6
|
||||
16298 <1>
|
||||
8761 ls
|
||||
5680 cd
|
||||
3419 **SOF**
|
||||
3419 **EOF**
|
||||
2830 vi
|
||||
2419 elm
|
||||
2015 <2>
|
||||
1457 rm
|
||||
996 exit
|
||||
|
||||
|
||||
user_7
|
||||
3463 <1>
|
||||
1522 **SOF**
|
||||
1522 **EOF**
|
||||
1133 ls
|
||||
848 cd
|
||||
741 z
|
||||
615 <2>
|
||||
595 m
|
||||
514 clear
|
||||
237 rm
|
||||
|
||||
|
||||
user_8
|
||||
14269 <1>
|
||||
5108 ll
|
||||
5016 cd
|
||||
2188 <2>
|
||||
1983 **SOF**
|
||||
1983 **EOF**
|
||||
1553 k
|
||||
1259 m
|
||||
1177 z
|
||||
796 vi
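
As a possible refinement (a sketch only, not the command that produced the results above), the placeholder tokens could be dropped before counting. The Python below assumes one token per line, as in the user_N files. The name `top_tokens` and the in-memory sample are made up for illustration. Flags and shell metacharacters are still counted, because only the `<N>`, `**SOF**`, and `**EOF**` tokens are filtered.

```python
import re
from collections import Counter

# Placeholder tokens described in the dataset README: argument counts such as <1>,
# plus the session markers **SOF** and **EOF**.
PLACEHOLDER = re.compile(r"^(<\d+>|\*\*SOF\*\*|\*\*EOF\*\*)$")

def top_tokens(tokens, n=10):
    """Return the n most frequent tokens, skipping placeholder tokens."""
    cleaned = (token.strip() for token in tokens)
    counts = Counter(t for t in cleaned if t and not PLACEHOLDER.match(t))
    return counts.most_common(n)

# Hypothetical mini-session in the dataset's token format.
sample = ["**SOF**", "cd", "<1>", "ls", "-laF", "cd", "<1>", "exit", "**EOF**"]
print(top_tokens(sample, 2))
```

To run it on a real file, pass an open file handle, for example `top_tokens(open("unix/Data/user_0"))`.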
|
||||
|
||||
|
216
11/wywolania/Data/UserDoc/graphic1.eps
Normal file
@ -0,0 +1,216 @@
|
||||
%!PS-Adobe-2.0 EPSF-2.0
|
||||
%%Title: graphic1.eps
|
||||
%%Creator: fig2dev Version 3.2 Patchlevel 0-beta2
|
||||
%%CreationDate: Tue Feb 24 15:25:25 1998
|
||||
%%For: julie@snow (Julie Rehmeyer,,,)
|
||||
%%Orientation: Portrait
|
||||
%%BoundingBox: 0 0 223 79
|
||||
%%Pages: 0
|
||||
%%BeginSetup
|
||||
%%IncludeFeature: *PageSize Letter
|
||||
%%EndSetup
|
||||
%%Magnification: 0.70
|
||||
%%EndComments
|
||||
/$F2psDict 200 dict def
|
||||
$F2psDict begin
|
||||
$F2psDict /mtrx matrix put
|
||||
/col-1 {0 setgray} bind def
|
||||
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||
|
||||
end
|
||||
save
|
||||
-47.0 140.0 translate
|
||||
1 -1 scale
|
||||
|
||||
/cp {closepath} bind def
|
||||
/ef {eofill} bind def
|
||||
/gr {grestore} bind def
|
||||
/gs {gsave} bind def
|
||||
/sa {save} bind def
|
||||
/rs {restore} bind def
|
||||
/l {lineto} bind def
|
||||
/m {moveto} bind def
|
||||
/rm {rmoveto} bind def
|
||||
/n {newpath} bind def
|
||||
/s {stroke} bind def
|
||||
/sh {show} bind def
|
||||
/slc {setlinecap} bind def
|
||||
/slj {setlinejoin} bind def
|
||||
/slw {setlinewidth} bind def
|
||||
/srgb {setrgbcolor} bind def
|
||||
/rot {rotate} bind def
|
||||
/sc {scale} bind def
|
||||
/sd {setdash} bind def
|
||||
/ff {findfont} bind def
|
||||
/sf {setfont} bind def
|
||||
/scf {scalefont} bind def
|
||||
/sw {stringwidth} bind def
|
||||
/tr {translate} bind def
|
||||
/tnt {dup dup currentrgbcolor
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||
bind def
|
||||
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||
4 -2 roll mul srgb} bind def
|
||||
/DrawEllipse {
|
||||
/endangle exch def
|
||||
/startangle exch def
|
||||
/yrad exch def
|
||||
/xrad exch def
|
||||
/y exch def
|
||||
/x exch def
|
||||
/savematrix mtrx currentmatrix def
|
||||
x y tr xrad yrad sc 0 0 1 startangle endangle arc
|
||||
closepath
|
||||
savematrix setmatrix
|
||||
} def
|
||||
|
||||
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||
/$F2psEnd {$F2psEnteredState restore end} def
|
||||
%%EndProlog
|
||||
|
||||
$F2psBegin
|
||||
10 setmiterlimit
|
||||
n 0 3367 m 0 0 l 6449 0 l 6449 3367 l cp clip
|
||||
0.04200 0.04200 sc
|
||||
7.500 slw
|
||||
% Ellipse
|
||||
n 1800 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 1500 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 2078 2378 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 2378 2978 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 1778 2978 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 1246 2994 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 3900 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 5700 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 3900 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 3600 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 4200 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 5400 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 5400 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 6000 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Ellipse
|
||||
n 6000 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||
|
||||
% Polyline
|
||||
n 1800 1800 m 2100 2400 l gs 0.00 setgray ef gr gs col0 s gr
|
||||
% Polyline
|
||||
n 1800 1800 m 1500 2400 l 1800 3000 l gs col0 s gr
|
||||
% Polyline
|
||||
n 1500 2400 m 1200 3000 l gs 0.00 setgray ef gr gs col0 s gr
|
||||
% Polyline
|
||||
n 2100 2400 m 2400 3000 l gs 0.00 setgray ef gr gs col0 s gr
|
||||
% Polyline
|
||||
n 3900 1800 m 3900 2400 l 4200 3000 l gs col0 s gr
|
||||
% Polyline
|
||||
n 3900 2400 m 3600 3000 l gs col0 s gr
|
||||
% Polyline
|
||||
n 5700 1800 m 6000 2400 l 6000 3000 l gs col0 s gr
|
||||
% Polyline
|
||||
n 5700 1800 m 5400 2400 l 5400 3000 l gs col0 s gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
1725 1575 m
|
||||
gs 1 -1 sc (24) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
1125 2400 m
|
||||
gs 1 -1 sc (13) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
1200 3300 m
|
||||
gs 1 -1 sc (5) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
3825 1575 m
|
||||
gs 1 -1 sc (13) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
2325 2400 m
|
||||
gs 1 -1 sc (4) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
1725 3300 m
|
||||
gs 1 -1 sc (2) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
3525 3300 m
|
||||
gs 1 -1 sc (81) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
4125 3300 m
|
||||
gs 1 -1 sc (18) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
5625 1575 m
|
||||
gs 1 -1 sc (4) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
5325 3300 m
|
||||
gs 1 -1 sc (4) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
6000 3300 m
|
||||
gs 1 -1 sc (5) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
6225 2475 m
|
||||
gs 1 -1 sc (13) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
3600 2475 m
|
||||
gs 1 -1 sc (5) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
5025 2475 m
|
||||
gs 1 -1 sc (24) col0 sh gr
|
||||
/Times-Roman ff 180.00 scf sf
|
||||
2325 3300 m
|
||||
gs 1 -1 sc (13) col0 sh gr
|
||||
$F2psEnd
|
||||
rs
|
BIN
11/wywolania/Data/UserDoc/user_doc.dvi
Executable file
Binary file not shown.
1416
11/wywolania/Data/UserDoc/user_doc.ps
Executable file
File diff suppressed because it is too large
583
11/wywolania/Data/UserDoc/user_doc.tex
Normal file
@ -0,0 +1,583 @@
|
||||
\documentclass{amsart}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{array}
|
||||
\usepackage{moreverb}
|
||||
\title{DRAFT: User Documentation for the STIDE software package}
|
||||
\author{Julie Rehmeyer}
|
||||
\date{\today}
|
||||
\begin{document}
|
||||
\maketitle
|
||||
|
||||
|
||||
\section{Software Purpose} \label{sec:intro}
|
||||
STIDE stands for Sequence Time-Delay Embedding, and it implements the
|
||||
time-delay embedding method of anomaly detection. Its primary
|
||||
function is to accept as input a time series (or a set of time
|
||||
series), divide it into a set of fixed-length sequences, compare that
|
||||
set of sequences with an existing database of fixed length sequences,
|
||||
and report on the consistency of the time series with the existing
|
||||
database. It can also be used to create a database of fixed-length
|
||||
sequences from scratch, or to add to a pre-existing database.
|
||||
|
||||
The STIDE software was originally developed by Steve Hofmeyr, a
|
||||
graduate student in the Computer Science Department at the University
|
||||
of New Mexico, as part of a research program that is applying ideas
|
||||
from immunology to problems in computer security. In particular,
|
||||
STIDE was written to assist in detecting intrusions by identifying the
|
||||
unusual sequences of system calls that may be created during an
|
||||
attempted intrusion \cite{lightweight, ci, principles, self}. In this
|
||||
context, the time series being considered consists of the system calls
|
||||
made by a single process. We first record the system calls made by a
|
||||
process exhibiting normal behavior (i.e., in non-exploited situations),
|
||||
and then use STIDE to divide that continuous stream of system calls
|
||||
into sequences of a given length and store them in a database.
|
||||
Subsequently, when we want to know if another instance of the same
|
||||
program has been attacked, we record the system calls the process has
|
||||
generated and use STIDE to compare the resulting sequences of system
|
||||
calls with the database of normal sequences. A large number of
|
||||
sequences created by the potentially attacked process that weren't
|
||||
created by the uncompromised processes suggests that the process may
|
||||
have been exploited.
|
||||
|
||||
In practice, because of limitations in available system call tracing
|
||||
mechanisms, it is far easier for us to record simultaneously the system
|
||||
calls generated by several processes that are running at the same
|
||||
time. STIDE is designed to handle this sort of situation. It can
|
||||
simultaneously process multiple interwoven time series by requiring
|
||||
that each element in the input stream be preceded by an identifier to
|
||||
specify which series it comes from. In our work, that identifier is
|
||||
the process ID.
|
||||
|
||||
The simplest way that STIDE can analyze information about the
|
||||
consistency of new data with an existing database is to report the
|
||||
number of anomalous sequences, i.e., the number of sequences in the
|
||||
input which do not exist in the database.
|
||||
|
||||
It can also report the minimum Hamming distance \cite{lightweight}.
|
||||
Given a sequence from the data stream and a sequence from the
|
||||
database, we can compute the number of entries that are different
|
||||
between the two sequences and get the Hamming distance between those
|
||||
two sequences. The minimum of the Hamming distances between the input
|
||||
sequence and all of the sequences from the database is the minimum
|
||||
Hamming distance for the input sequence.
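
Stated as a formula (the notation is introduced here only for illustration
and is not part of STIDE itself): for two sequences
$x = (x_1, \dots, x_k)$ and $y = (y_1, \dots, y_k)$ of the same length $k$,
\[
  d(x, y) = \bigl|\{\, i : x_i \neq y_i \,\}\bigr| ,
\]
and for a database $D$ of length-$k$ sequences, the minimum Hamming
distance of an input sequence $x$ is
\[
  d_{\min}(x) = \min_{y \in D} d(x, y) .
\]
As a hypothetical example with $k = 3$, take $x = (2, 20, 3)$ and
$D = \{(14, 185, 20),\ (20, 3, 2),\ (2, 20, 5)\}$. The three distances are
$3$, $3$ and $1$, so $d_{\min}(x) = 1$.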
|
||||
|
||||
The final option is that it can report a ``locality frame count''
|
||||
\cite{ci}. When a process is exploited, there may be a short period
|
||||
of time (a locality) when the percentage of anomalous sequences is
|
||||
much higher. Although ten anomalies over the course of a long
|
||||
run may not be cause for concern, ten anomalies within thirty
|
||||
sequences might be. Thus it can be useful to observe how many
|
||||
anomalies there are {\it locally}. The number of sequences that are
|
||||
considered to be ``local'' to one another is called the size of the
|
||||
locality frame. In this mode, STIDE reports the largest number of
|
||||
anomalies it finds within any locality frame.
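
One way to write this down (the notation, and the treatment of the first
few sequences as shorter frames, are assumptions made here for
illustration): let $a_i = 1$ if the $i$-th input sequence is anomalous and
$a_i = 0$ otherwise, let $n$ be the number of input sequences, and let $F$
be the size of the locality frame. The reported value is then
\[
  \mathrm{LFC} = \max_{1 \le t \le n} \;
  \sum_{i = \max(1,\, t - F + 1)}^{t} a_i .
\]
As a hypothetical example, with $F = 3$ and $a = (0, 1, 1, 0, 1, 0)$ the
frame sums are $0, 1, 2, 2, 2, 1$, so $\mathrm{LFC} = 2$.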
|
||||
|
||||
An additional advantage of calculating locality frame counts is that
|
||||
it provides an ``on-line'' measure. Ultimately, we are interested in a
|
||||
system which would detect intrusions as the system is running.
|
||||
Because locality frame counts are calculated locally, one can
|
||||
immediately be notified when an intrusion may be occurring.
|
||||
|
||||
\section{Input Data Format} \label{sec:input}
|
||||
The input data consists of the time series to be analyzed. It is read
|
||||
from standard input. It is expected to be a series of pairs of
|
||||
positive integers, one pair per line, where the first integer
|
||||
identifies the data stream and the second integer is the element of
|
||||
the data stream. The end of the data stream can either be designated
|
||||
by the end of the file or by an occurrence of the number $-1$ as a
|
||||
stream identifier. In our work, the stream identifier is the process
|
||||
identification number (PID), and the elements of the data stream are
|
||||
system call numbers.
|
||||
|
||||
The following is a small example of an input file, tracking three
|
||||
processes, with PIDs 744, 1069 and 9.
|
||||
|
||||
\vspace{.15in}
|
||||
\begin{tabular}{l l}
|
||||
744 & 24 \\
|
||||
744 & 13 \\
|
||||
1069 & 4 \\
|
||||
1069 & 24 \\
|
||||
1069 & 4 \\
|
||||
744 & 5 \\
|
||||
9 & 24 \\
|
||||
1069 & 13 \\
|
||||
744 & 81 \\
|
||||
9 & 13 \\
|
||||
9 & 2 \\
|
||||
1069 & 5 \\
|
||||
1069 & 18 \\
|
||||
-1
|
||||
\end{tabular}
|
||||
\vspace{.15in}
|
||||
|
||||
If the number $-1$ occurs as a data element, STIDE interprets that as
|
||||
a missing data element. It does not form any sequences going through
|
||||
that data element. It clears the sequence and starts from scratch.
|
||||
|
||||
For example, suppose that the sequence length is 3 and the input is as
|
||||
follows:
|
||||
\nopagebreak
|
||||
\vspace{5pt}
|
||||
\begin{tabular}{l l}
|
||||
220 & 14 \\
|
||||
220 & 185 \\
|
||||
220 & 20 \\
|
||||
220 & -1 \\
|
||||
220 & 2 \\
|
||||
220 & 20 \\
|
||||
220 & 3 \\
|
||||
220 & 2 \\
|
||||
-1
|
||||
\end{tabular}
|
||||
\vspace{.15in}
|
||||
|
||||
STIDE would derive three sequences from this input: 14, 185, 20; 2,
|
||||
20, 3; and 20, 3, 2.
|
||||
|
||||
\section{Configuration Options}
|
||||
There are a number of options which affect STIDE's behavior. Every
|
||||
option has a default value. The values may be changed through command
|
||||
line arguments or through a configuration file. Values set by the
|
||||
configuration file override default values and values set by the
|
||||
command line override those set by either the configuration file or
|
||||
the defaults. The following options are available:
|
||||
|
||||
\vspace{.2in}
|
||||
\setlength{\extrarowheight}{3pt}
|
||||
|
||||
\begin{tabular}{l|l|l|l}
|
||||
|
||||
\vspace{-3pt}
|
||||
Short &&& \\
|
||||
Name & Long name & Legitimate Values & Default Value \\
|
||||
\hline
|
||||
|
||||
{\tt a} & {\tt add\_to\_db} & on or off & off \\
|
||||
{\tt c} & {\tt config\_name} & filenames & stide.config \\
|
||||
{\tt d} & {\tt db\_name} & filenames & default.db \\
|
||||
{\tt f} & {\tt lf\_size} & 1 -- 999 & 1 \\
|
||||
{\tt g} & {\tt output\_graph} & on or off & off \\
|
||||
{\tt l} & {\tt seq\_len} & 1 -- 199 & 6 \\
|
||||
{\tt p} & {\tt pair\_offset} & integers & 0 \\
|
||||
{\tt s} & {\tt write\_db\_stats} & on or off & off \\
|
||||
{\tt v} & {\tt verbose} & on or off & off \\
|
||||
{\tt V} & {\tt very\_verbose} & on or off & off \\
|
||||
{\tt hd} & {\tt compute\_hdist} & on or off & off \\
|
||||
{\tt me} & {\tt max\_elements} & 1 -- 999 & 500 \\
|
||||
{\tt ms} & {\tt max\_streams} & 1 -- 999 & 100 \\
|
||||
{\tt aof} & {\tt add\_output\_format} & see below & see below \\
|
||||
{\tt cof} & {\tt compare\_output\_format} & see below & see below \\
|
||||
|
||||
\end{tabular}
|
||||
|
||||
\vspace{.2in}
|
||||
|
||||
\subsection{Descriptions of Options}
|
||||
|
||||
\subsubsection{Option {\tt add\_to\_db} }
|
||||
|
||||
This flag indicates that you want the input data to be added to the
|
||||
database. If there is no pre-existing database, it indicates that you
|
||||
want to create a new database from the input data. Note that you
|
||||
cannot simultaneously compare data and add it to the database. If
|
||||
this switch is off, STIDE compares the input data with the database
|
||||
without adding it.
|
||||
|
||||
\subsubsection{Option {\tt{config\_name}}}
|
||||
This is the name of the configuration file to be used. See
|
||||
Section~\ref{subsec:config} for more information about the
|
||||
configuration file.
|
||||
|
||||
\subsubsection{Option {\tt db\_name}}
|
||||
This is the name of an existing database or the name under which to
|
||||
store a new database that will be created from the input data.
|
||||
|
||||
\subsubsection{Option {\tt lf\_size}}
|
||||
This is the size of the locality frame (see Section~\ref{sec:intro}
|
||||
for an explanation of locality frame count). The value 1 effectively
|
||||
turns off locality frames.
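
To see why a size of 1 effectively disables locality frames, consider the following Python sketch. It assumes that the locality frame count of a sequence is the number of anomalous sequences among the most recent {\tt lf\_size} sequences of its stream; Section~\ref{sec:intro} remains the authoritative definition, and the function name is hypothetical.

```python
from collections import deque


def locality_frame_counts(anomaly_flags, lf_size):
    """Return, for each sequence, the number of anomalies in its locality frame.

    anomaly_flags holds one 0/1 flag per sequence of a single stream, in order.
    The frame is assumed to cover the current sequence and the lf_size - 1
    sequences before it.
    """
    frame = deque(maxlen=lf_size)    # old flags fall out automatically
    counts = []
    for flag in anomaly_flags:
        frame.append(1 if flag else 0)
        counts.append(sum(frame))
    return counts


flags = [0, 1, 1, 0, 0, 1]
print(locality_frame_counts(flags, 3))
print(locality_frame_counts(flags, 1))   # identical to the flags themselves
```

Under this assumption, a frame of size 1 contains only the current sequence, so the count carries no information beyond the anomaly flag.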
|
||||
|
||||
\subsubsection{Option {\tt output\_graph}}
|
||||
This causes STIDE to create a file {\tt db\_name.dot} containing a
|
||||
graph of the entire database forest formatted as input for the program
|
||||
Dot. Running Dot on the file translates it into PostScript format.
|
||||
The result is a graphical image of the database.
|
||||
|
||||
\subsubsection{Option {\tt seq\_len}}
|
||||
A database stores trees of sequences of a set length. When building a
|
||||
new database, the length of the sequences to be stored is set with
|
||||
{\tt seq\_len}. When adding to or comparing with an existing
|
||||
database, one must use the same sequence length that was used when the
|
||||
database was generated. In those situations, STIDE will automatically
|
||||
figure out the correct sequence length and use it regardless of the
|
||||
user specification or the default.\footnote{STIDE can do this for
|
||||
revision 1 databases only. STIDE can still process old-style
|
||||
databases, but cannot implement this feature. STIDE recognizes
|
||||
revision 1 databases by their initial line: {\tt \#DBrev: 1 } and the
|
||||
following line: {\tt \#DBseq\_len: } followed by an integer giving
|
||||
the sequence length. When STIDE processes an old-style database, it
|
||||
converts it to a revision 1 database if it is in {\tt add\_to\_db}
|
||||
mode.}
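
The header check described in the footnote might look roughly like the following Python sketch. Only the two header lines quoted above are taken from this document; the rest of the database layout is not assumed, and both function names are hypothetical.

```python
def read_db_seq_len(lines):
    """Return the stored sequence length of a revision 1 database, else None."""
    lines = list(lines)
    if len(lines) < 2:
        return None
    if lines[0].split() != ["#DBrev:", "1"]:
        return None                  # treated as an old-style database
    fields = lines[1].split()
    if len(fields) != 2 or fields[0] != "#DBseq_len:" or not fields[1].isdigit():
        return None
    return int(fields[1])


def effective_seq_len(db_lines, requested):
    """Prefer the length stored in the database over the requested one."""
    stored = read_db_seq_len(db_lines)
    return stored if stored is not None else requested


header = ["#DBrev: 1\n", "#DBseq_len: 10\n"]
print(effective_seq_len(header, 6))   # the stored length wins over the request
```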
|
||||
|
||||
\subsubsection{Option {\tt pair\_offset}} \label{subsubsec:po}
|
||||
In {\tt verbose} or {\tt very\_verbose} modes, STIDE reports on
|
||||
particular sequences of interest (see Sections \ref{subsubsec:verbose}
|
||||
and \ref{subsubsec:very-verbose}). One of the pieces of information
|
||||
one might be interested in is where a particular sequence occurs in
|
||||
the input. Recall that the input data is a stream of pairs (stream
|
||||
number, element number), and each element in the sequence being
|
||||
considered came from one of those input pairs. STIDE reports on where
|
||||
the sequence occurred in the input by reporting the pair number of the
|
||||
last element of the sequence.
|
||||
|
||||
These numbers may be offset by a fixed amount by setting {\tt
|
||||
pair\_offset}.
|
||||
|
||||
\subsubsection{Option {\tt write\_db\_stats}}
|
||||
This flag causes STIDE to print out statistics on the database. The
|
||||
statistics it will print are the number of nodes in the database, the
|
||||
number of unique sequences, the number of branches, and the average
|
||||
database branch factor. See Section~\ref{sec:output} for more
|
||||
information.
|
||||
|
||||
\subsubsection{Option {\tt verbose}} \label{subsubsec:verbose}
|
||||
When adding to the database in {\tt verbose} mode, STIDE will print
|
||||
information about each new sequence being added to the database, where
|
||||
the precise information is specified by the {\tt add\_output\_format}
|
||||
parameter (see Section~\ref{subsubsec:aof}). When comparing the input
|
||||
data with an existing database in {\tt verbose} mode, it will print

|
||||
information about each sequence that is itself a miss or whose
|
||||
locality frame contains a miss, where the precise information is
|
||||
specified by the {\tt compare\_output\_format} parameter (see
|
||||
Section~\ref{subsubsec:cof}). In either case, when adding or
|
||||
comparing, STIDE will first print out a header with a list of the names
|
||||
of the variables being printed.
|
||||
|
||||
\subsubsection{Option {\tt very\_verbose}} \label{subsubsec:very-verbose}
|
||||
In {\tt very\_verbose} mode, STIDE will print out the information specified
|
||||
by {\tt add\_output\_format} or {\tt compare\_output\_format} for each sequence
|
||||
encountered in the input data, regardless of whether the sequence is
|
||||
new. As in {\tt verbose} mode, STIDE will first print out a header
|
||||
with a list of the names of the variables being printed.
|
||||
|
||||
\subsubsection{Option {\tt compute\_hdist}}
|
||||
This switch causes the Hamming distance \cite{lightweight} to be
|
||||
computed (see Section~\ref{sec:intro} for an explanation of Hamming
|
||||
distance).
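
As a rough illustration, the minimum Hamming distance of a sequence can be computed as in the Python sketch below. It models the database as a plain set of equal-length tuples and scans it linearly, which is an assumption made for brevity and not how STIDE stores or searches its database; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming(seq, database):
    """Smallest Hamming distance between seq and any database sequence.

    For an empty database the sketch returns len(seq); that fallback is an
    assumption of this example.
    """
    return min((hamming(seq, known) for known in database), default=len(seq))


database = {(24, 13, 5), (13, 5, 81)}
print(min_hamming((24, 13, 2), database))
```

A sequence that is already in the database has a minimum distance of 0, so larger values indicate sequences that are further from anything seen before.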
|
||||
|
||||
\subsubsection{Option {\tt max\_elements}}
|
||||
This is the maximum number of unique data elements that STIDE might
|
||||
encounter in the input data.
|
||||
|
||||
\subsubsection{Option {\tt max\_streams}}
|
||||
This is the maximum number of different data streams that STIDE might
|
||||
encounter in the input data.
|
||||
|
||||
\subsubsection{Option {\tt add\_output\_format}} \label{subsubsec:aof}
|
||||
|
||||
When adding to the database in {\tt verbose} or {\tt very\_verbose}
|
||||
modes, STIDE will print the {\tt add\_output\_format} string for every
|
||||
sequence of interest (see Sections \ref{subsubsec:verbose} and
|
||||
\ref{subsubsec:very-verbose}). Substitutions are made for control
|
||||
characters as follows:
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
\begin{tabular}{c|l}
|
||||
\vspace{-4pt}
|
||||
Control \\ Char & Meaning \\ \hline
|
||||
\%s & Stream Identification Number \\
|
||||
\%d & Database Size \\
|
||||
\vspace{-4pt}
|
||||
\%p & Pair number of last data element of \\
|
||||
& sequence in the whole input stream \\
|
||||
\vspace{-4pt}
|
||||
\%i & Pair number of last data element of \\
|
||||
& sequence in its particular data stream \\
|
||||
\verb+\+t & Tab \\
|
||||
\verb+\+n & Newline \\
|
||||
\end{tabular}
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
See Section~\ref{subsubsec:po} for more information about the meaning
|
||||
of the \%p and \%i control characters.
|
||||
|
||||
The default value of {\tt add\_output\_format} is:
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\verb+"DB Size: %d\tStream: %s\tPair Number: %p\n"+
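
The substitution of control characters can be illustrated with a short Python sketch. It covers only the control characters listed in the table above; the function name {\tt expand\_format} and the dictionary of values are hypothetical, and unknown control characters are simply left unchanged, which is an assumption.

```python
import re


def expand_format(fmt, values):
    """Replace %x control characters and the \\t and \\n escapes in fmt.

    values maps a control character (without the percent sign) to its value.
    """
    escapes = {"\\t": "\t", "\\n": "\n"}

    def substitute(match):
        token = match.group(0)
        if token in escapes:
            return escapes[token]
        key = token[1]
        return str(values[key]) if key in values else token

    return re.sub(r"%[a-z]|\\[tn]", substitute, fmt)


default_format = r"DB Size: %d\tStream: %s\tPair Number: %p\n"
print(expand_format(default_format, {"d": 7, "s": 744, "p": 12, "i": 4}), end="")
```

The same approach would apply to {\tt compare\_output\_format} by supplying values for its additional control characters.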
|
||||
|
||||
\subsubsection{Option {\tt compare\_output\_format}} \label{subsubsec:cof}
|
||||
When comparing data in {\tt verbose} mode, STIDE will print the
|
||||
{\tt compare\_output\_format} string for every sequence which is
|
||||
itself an anomaly or whose locality frame contains an anomaly. In
|
||||
{\tt very\_verbose} mode, STIDE will print the string indicated for
|
||||
{\it every} sequence, regardless of whether it is an anomaly.
|
||||
Substitutions are made for control characters as follows:
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
\begin{tabular}{c|l}
|
||||
\vspace{-4pt}
|
||||
Control \\ Char & Meaning \\ \hline
|
||||
\%s & Stream Identification Number \\
|
||||
\vspace{-4pt}
|
||||
\%p & Pair number of last data element of \\
|
||||
& sequence in the whole input stream \\
|
||||
\vspace{-4pt}
|
||||
\%i & Pair number of last data element of \\
|
||||
& sequence in its particular data stream \\
|
||||
\%a & 1 if this sequence is an anomaly, 0 otherwise \\
|
||||
\%c & locality frame count of this sequence \\
|
||||
\%h & Hamming distance \\
|
||||
\verb+\+t & Tab \\
|
||||
\verb+\+n & Newline \\
|
||||
\end{tabular}
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
See Section~\ref{subsubsec:po} for more information about the meaning
|
||||
of the \%p and \%i control characters.
|
||||
|
||||
The default value of {\tt compare\_output\_format} is:
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\verb+"Pair Number: %p\tStream Number: %s\n"+
|
||||
|
||||
\subsection{Command-Line Arguments}
|
||||
All parameters may be set using the command line, in one of two ways.
|
||||
The short name may be used, preceded by a hyphen and followed by a
|
||||
value (if appropriate). The long name may also be used, but it must
|
||||
be preceded by {\it two} hyphens and followed by a value (if
|
||||
appropriate). Values set by the command line override those set in
|
||||
any other way.
|
||||
|
||||
Switches which are simply turned on or off need not be followed by a
|
||||
value. Parameters may be set in any order. There must be space
|
||||
between the parameter name and the value. Flags may not be combined.
|
||||
|
||||
STIDE expects the input data to come from standard input.
|
||||
|
||||
\subsubsection{Examples}
|
||||
|
||||
To use STIDE to create a database called ``our\_data.db'' from the
|
||||
input file ``input1.dat'' with sequences of length 10, using the
|
||||
default configuration file name, in verbose mode, with output format
|
||||
``\verb+%p\t%s\t%d\n+'', one could type:
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\begin{verbatim}
|
||||
stide -d our_data.db -a -l 10 -v -aof "%p\t%s\t%d\n" < input1.dat
|
||||
\end{verbatim}
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
To add the data from the file ``input2.dat'' to that database, using
|
||||
the same configuration file, not in verbose mode, and to create a
|
||||
graph in dot format, one could type:
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\begin{verbatim}
|
||||
stide -d our_data.db --output_graph --add_to_db -l 10 < input2.dat
|
||||
\end{verbatim}
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
Then to compare the data in file ``input3.dat'' to the database and
|
||||
have the results reported using locality frame counts with locality
|
||||
frame size 20, using the configuration file ``run3.config'', one would
|
||||
type:
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\begin{verbatim}
|
||||
stide -d our_data.db -f 20 -l 10 -c run3.config < input3.dat
|
||||
\end{verbatim}
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\subsection{Configuration File} \label{subsec:config}
|
||||
All parameters may be set using a configuration file. The first line
|
||||
of a configuration file must be:\footnote{Old-style configuration
|
||||
files lack this line. STIDE will assume that configuration files
|
||||
that lack this line are old-style, and will try to parse them
|
||||
accordingly, issuing a warning to the user.}
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
\begin{verbatim}
|
||||
#ConfigFileRev: 1
|
||||
\end{verbatim}
|
||||
|
||||
\vspace{5pt}
|
||||
|
||||
After the first line, lines may be commented out using a ``\#'' sign.
|
||||
Each parameter is set on its own line, using the long name followed by
|
||||
a colon, followed by the value. Lines may be continued by putting a
|
||||
backslash as the last character of the line. White space at the
|
||||
beginning of lines will be ignored. Parameters which are simple
|
||||
switches may be set with the value ``on'' or ``off'', or with no value
|
||||
at all (which will turn them on).
|
||||
|
||||
Configuration file values override default values and are overridden
|
||||
by command-line values.
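
The parsing rules and the order of precedence can be summarized in a Python sketch. It is a simplified reading of the rules in this section, not STIDE's parser. In particular, it assumes that a ``\#'' outside double quotes starts a comment, that white space is tolerated inside the revision line, and that surrounding double quotes are dropped from values; all function names are hypothetical.

```python
import re


def strip_comment(line):
    """Cut the line at the first '#' that is not inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i]
    return line


def parse_config(text):
    """Parse a revision 1 configuration file into a dict of long names."""
    lines = text.splitlines()
    if not lines or not re.fullmatch(r"#\s*ConfigFileRev:\s*1", lines[0].strip()):
        raise ValueError("not a revision 1 configuration file")
    logical, pending = [], ""
    for raw in lines[1:]:
        line = pending + raw.strip()     # surrounding white space is ignored
        if line.endswith("\\"):          # a trailing backslash continues the line
            pending = line[:-1]
            continue
        pending = ""
        logical.append(line)
    if pending:
        logical.append(pending)
    settings = {}
    for line in logical:
        line = strip_comment(line).strip()
        if not line:
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        settings[name.strip()] = value if value else "on"  # bare switch means on
    return settings


def effective_settings(defaults, config, command_line):
    """Defaults are overridden by the file, which the command line overrides."""
    merged = dict(defaults)
    merged.update(config)
    merged.update(command_line)
    return merged
```

Merging three dictionaries in this order reproduces the precedence stated above: a value given on the command line always wins, and a default is used only when neither of the other two sources sets the parameter.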
|
||||
|
||||
\subsubsection{Example}
|
||||
|
||||
The following is a sample configuration file:
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
\begin{boxedverbatim}
|
||||
|
||||
# ConfigFileRev: 1
|
||||
# Sample STIDE configuration file containing default values.
|
||||
|
||||
db_name: default.db # name of database
|
||||
seq_len: 6 # length of sequences
|
||||
max_elements: 1000 # maximum number of unique elements
|
||||
# in input
|
||||
max_streams: 500 # maximum number of unique streams
|
||||
# in input
|
||||
pair_offset: 0 # offset for pair number count
|
||||
add_output_format: \
|
||||
"DB Size: %d\tStream: %s\tPair Number: %p\n"
|
||||
compare_output_format: \
|
||||
"Pair Number: %p\tStream Number: %s\n"
|
||||
lf_size: 1 # 1 causes locality frame counts not
|
||||
# to be computed
|
||||
add_to_db: off # Add this data to the database, or,
|
||||
# if there is no database, create a
|
||||
# new one -- do not do comparisons
|
||||
output_graph: off # Outputs graphing information in Dot
|
||||
# format
|
||||
compute_hdist: off # Compute Hamming distances
|
||||
write_db_stats: off # At end, print out statistics about
|
||||
# database
|
||||
verbose: off # Verbose mode
|
||||
very_verbose: off # Very verbose mode
|
||||
|
||||
\end{boxedverbatim}
|
||||
|
||||
\section{Output Data} \label{sec:output}
|
||||
For every run, STIDE will first output the final configuration data
|
||||
assembled from the defaults, the configuration file and the
|
||||
command-line arguments, in a format which could be used as a
|
||||
configuration file. The subsequent output depends on whether STIDE was
|
||||
adding to the database or making comparisons.
|
||||
|
||||
\subsection{Output Data About Comparisons}
|
||||
|
||||
If you have run the program to compare sequences, at the end STIDE
|
||||
will print out the number of different streams in the input, the total
|
||||
number of pairs read from the input, the total number of sequences
|
||||
read from the input, the number of sequences that were anomalous, and
|
||||
the percentage of sequences that were anomalous. If locality frame
|
||||
counts were being computed, STIDE reports the maximum locality frame
|
||||
count encountered in any stream, and if Hamming distances were being
|
||||
computed, STIDE reports the largest minimum Hamming distance of any
|
||||
sequence in any stream.
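
The summary figures can be illustrated with the following Python sketch, which treats the database as a set of tuples and counts a sequence as anomalous when it is absent from that set. The function names and the dictionary keys are hypothetical, and whether STIDE counts pairs with a missing element in the pair total is not specified here, so the sketch simply counts every pair it is given.

```python
from collections import defaultdict


def windows(pairs, seq_len):
    """Yield every full-length sequence, keeping a separate window per stream."""
    recent = defaultdict(list)
    for stream_id, element in pairs:
        window = recent[stream_id]
        if element == -1:                # missing element resets the window
            window.clear()
            continue
        window.append(element)
        del window[:-seq_len]            # keep only the last seq_len elements
        if len(window) == seq_len:
            yield tuple(window)


def compare_summary(pairs, database, seq_len):
    """Compute the end-of-run totals for a comparison against database."""
    pairs = list(pairs)
    sequences = list(windows(pairs, seq_len))
    anomalies = sum(seq not in database for seq in sequences)
    percent = 100.0 * anomalies / len(sequences) if sequences else 0.0
    return {
        "streams": len({stream_id for stream_id, _ in pairs}),
        "pairs": len(pairs),
        "sequences": len(sequences),
        "anomalies": anomalies,
        "percent_anomalous": percent,
    }


training = [(744, 24), (744, 13), (744, 5), (744, 81)]
database = set(windows(training, 3))
print(compare_summary([(5, 24), (5, 13), (5, 5), (5, 81), (5, 99)], database, 3))
```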
|
||||
|
||||
If the {\tt verbose} switch was on and the {\tt
|
||||
compare\_output\_format} parameter is set appropriately, STIDE will
|
||||
print out information about each sequence which is either itself an
|
||||
anomaly or whose locality frame contains an anomaly (if locality
|
||||
frames are being computed). If the {\tt very\_verbose} switch was on
|
||||
and the {\tt compare\_output\_format} parameter is set appropriately,
|
||||
STIDE will print out information about each sequence, regardless of
|
||||
whether it is an anomaly. The precise information to be output is
|
||||
specified by the user in {\tt compare\_output\_format}. See Section
|
||||
\ref{subsubsec:cof} for details on what information {\tt
|
||||
compare\_output\_format} may request.
|
||||
|
||||
\subsection{Output Data About The Database}
|
||||
|
||||
If you are adding to the database, STIDE will not print out any
|
||||
information automatically (beyond the configuration information).
|
||||
However, one can get further information about the growth of the
|
||||
database by turning on {\tt verbose} or {\tt very\_verbose} modes, and
|
||||
one can get information about the shape and complexity of a database
|
||||
using the {\tt write\_db\_stats} switch.
|
||||
|
||||
\subsubsection{Database Growth Information}
|
||||
|
||||
In {\tt verbose} mode, STIDE will print out information on each new
|
||||
sequence which is added to the database. In {\tt very\_verbose} mode,
|
||||
STIDE will print out information on each sequence read in, regardless
|
||||
of whether it is new. The information that STIDE produces is
|
||||
determined by the {\tt add\_output\_format} parameter. See Section
|
||||
\ref{subsubsec:aof} for details on what information may be requested.
|
||||
|
||||
\subsubsection{Database Statistics}
|
||||
|
||||
The {\tt write\_db\_stats} switch causes STIDE to print out
|
||||
information about the shape and complexity of the database. The {\tt
|
||||
write\_db\_stats} switch may be used either when adding to the
|
||||
database or when making comparisons.
|
||||
|
||||
The sequences are stored as forests (groups of trees). Each path down
|
||||
each tree represents a sequence that STIDE has encountered. STIDE can
|
||||
compute the number of nodes on the trees, the number of leaves (leaves
|
||||
are the ends of the trees, i.e., the last element in a sequence), the
|
||||
number of branches, and the average branch factor, which is the number
|
||||
of branches divided by the difference between the number of nodes and
|
||||
the number of sequences.
|
||||
|
||||
For example, consider the sequences of length 3 derived from the first sample input file in
|
||||
Section~\ref{sec:input}:
|
||||
\nopagebreak
|
||||
\vspace{5pt}
|
||||
|
||||
\begin{tabular}{c}
|
||||
24, 13, 5 \\
|
||||
13, 5, 81 \\
|
||||
4, 24, 4 \\
|
||||
24, 4, 13 \\
|
||||
4, 13, 5 \\
|
||||
13, 5, 18 \\
|
||||
24, 13, 2 \\
|
||||
\end{tabular}
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
We can represent those sequences by the forest:
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
\begin{picture}(350, 80)
|
||||
\put(40,0){\includegraphics{graphic1.eps}}
|
||||
\end{picture}
|
||||
|
||||
\vspace{.15in}
|
||||
|
||||
In this database, the number of nodes is 15, the number of leaves is
|
||||
7, and the number of branches is 12. There are 7 unique sequences.
|
||||
The average branch factor is $12 / (15 - 7) = 1.5$.
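
These figures can be checked with a small Python sketch that stores the seven sequences in a forest of nested dictionaries. This is an illustrative data structure, not STIDE's internal representation, and the function names are hypothetical. Because every stored sequence has the same length, the number of leaves equals the number of unique sequences, so the sketch divides by the number of non-leaf nodes.

```python
def build_forest(sequences):
    """Store sequences as nested dicts: each key is an element, each value its children."""
    forest = {}
    for seq in sequences:
        node = forest
        for element in seq:
            node = node.setdefault(element, {})
    return forest


def forest_stats(forest):
    """Return (nodes, leaves, branches, average branch factor)."""
    nodes = leaves = branches = 0
    stack = list(forest.values())        # one children-dict per root node
    while stack:
        children = stack.pop()
        nodes += 1
        if children:
            branches += len(children)    # one branch per child
            stack.extend(children.values())
        else:
            leaves += 1
    factor = branches / (nodes - leaves) if nodes > leaves else 0.0
    return nodes, leaves, branches, factor


sequences = [
    (24, 13, 5), (13, 5, 81), (4, 24, 4), (24, 4, 13),
    (4, 13, 5), (13, 5, 18), (24, 13, 2),
]
print(forest_stats(build_forest(sequences)))
```

For the seven sequences above the sketch reports 15 nodes, 7 leaves, 12 branches, and an average branch factor of 1.5, matching the values given in the text.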
|
||||
|
||||
\begin{thebibliography}{99}
|
||||
|
||||
\bibitem{lightweight} S. Hofmeyr, S. Forrest, and A. Somayaji,
|
||||
``Lightweight intrusion detection for networked operating systems.''
|
||||
Submitted to {\em Journal of Computer Security} (July, 1997).
|
||||
|
||||
\bibitem{ci} S. Forrest, S. Hofmeyr, and A. Somayaji, ``Computer
|
||||
immunology.'' {\em Communications of the ACM} Vol. 40, No. 10, pp.
|
||||
88-96 (1997).
|
||||
|
||||
\bibitem{principles} A. Somayaji, S. Hofmeyr, and S. Forrest,
|
||||
``Principles of a Computer Immune System.'' New Security Paradigms
|
||||
Workshop (presented September, 1997).
|
||||
|
||||
\bibitem{self} S. Forrest, S.~A. Hofmeyr, A. Somayaji, and T.~A.
|
||||
Longstaff, ``A sense of self for Unix processes.'' In Proceedings of
|
||||
the 1996 IEEE Symposium on Computer Security and Privacy, IEEE
|
||||
Computer Society Press, Los Alamitos, CA, pp. 120-128 (1996).
|
||||
\end{thebibliography}
|
||||
|
||||
\end{document}
|
339
11/wywolania/Data/stide_v1.1/COPYING
Normal file
339
11/wywolania/Data/stide_v1.1/COPYING
Normal file
@ -0,0 +1,339 @@
|
||||
GNU GENERAL PUBLIC LICENSE
|
||||
Version 2, June 1991
|
||||
|
||||
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||||
675 Mass Ave, Cambridge, MA 02139, USA
|
||||
Everyone is permitted to copy and distribute verbatim copies
|
||||
of this license document, but changing it is not allowed.
|
||||
|
||||
Preamble
|
||||
|
||||
The licenses for most software are designed to take away your
|
||||
freedom to share and change it. By contrast, the GNU General Public
|
||||
License is intended to guarantee your freedom to share and change free
|
||||
software--to make sure the software is free for all its users. This
|
||||
General Public License applies to most of the Free Software
|
||||
Foundation's software and to any other program whose authors commit to
|
||||
using it. (Some other Free Software Foundation software is covered by
|
||||
the GNU Library General Public License instead.) You can apply it to
|
||||
your programs, too.
|
||||
|
||||
When we speak of free software, we are referring to freedom, not
|
||||
price. Our General Public Licenses are designed to make sure that you
|
||||
have the freedom to distribute copies of free software (and charge for
|
||||
this service if you wish), that you receive source code or can get it
|
||||
if you want it, that you can change the software or use pieces of it
|
||||
in new free programs; and that you know you can do these things.
|
||||
|
||||
To protect your rights, we need to make restrictions that forbid
|
||||
anyone to deny you these rights or to ask you to surrender the rights.
|
||||
These restrictions translate to certain responsibilities for you if you
|
||||
distribute copies of the software, or if you modify it.
|
||||
|
||||
For example, if you distribute copies of such a program, whether
|
||||
gratis or for a fee, you must give the recipients all the rights that
|
||||
you have. You must make sure that they, too, receive or can get the
|
||||
source code. And you must show them these terms so they know their
|
||||
rights.
|
||||
|
||||
We protect your rights with two steps: (1) copyright the software, and
|
||||
(2) offer you this license which gives you legal permission to copy,
|
||||
distribute and/or modify the software.
|
||||
|
||||
Also, for each author's protection and ours, we want to make certain
|
||||
that everyone understands that there is no warranty for this free
|
||||
software. If the software is modified by someone else and passed on, we
|
||||
want its recipients to know that what they have is not the original, so
|
||||
that any problems introduced by others will not reflect on the original
|
||||
authors' reputations.
|
||||
|
||||
Finally, any free program is threatened constantly by software
|
||||
patents. We wish to avoid the danger that redistributors of a free
|
||||
program will individually obtain patent licenses, in effect making the
|
||||
program proprietary. To prevent this, we have made it clear that any
|
||||
patent must be licensed for everyone's free use or not licensed at all.
|
||||
|
||||
The precise terms and conditions for copying, distribution and
|
||||
modification follow.
|
||||
|
||||
GNU GENERAL PUBLIC LICENSE
|
||||
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
||||
|
||||
0. This License applies to any program or other work which contains
|
||||
a notice placed by the copyright holder saying it may be distributed
|
||||
under the terms of this General Public License. The "Program", below,
|
||||
refers to any such program or work, and a "work based on the Program"
|
||||
means either the Program or any derivative work under copyright law:
|
||||
that is to say, a work containing the Program or a portion of it,
|
||||
either verbatim or with modifications and/or translated into another
|
||||
language. (Hereinafter, translation is included without limitation in
|
||||
the term "modification".) Each licensee is addressed as "you".
|
||||
|
||||
Activities other than copying, distribution and modification are not
|
||||
covered by this License; they are outside its scope. The act of
|
||||
running the Program is not restricted, and the output from the Program
|
||||
is covered only if its contents constitute a work based on the
|
||||
Program (independent of having been made by running the Program).
|
||||
Whether that is true depends on what the Program does.
|
||||
|
||||
1. You may copy and distribute verbatim copies of the Program's
|
||||
source code as you receive it, in any medium, provided that you
|
||||
conspicuously and appropriately publish on each copy an appropriate
|
||||
copyright notice and disclaimer of warranty; keep intact all the
|
||||
notices that refer to this License and to the absence of any warranty;
|
||||
and give any other recipients of the Program a copy of this License
|
||||
along with the Program.
|
||||
|
||||
You may charge a fee for the physical act of transferring a copy, and
|
||||
you may at your option offer warranty protection in exchange for a fee.
|
||||
|
||||
2. You may modify your copy or copies of the Program or any portion
|
||||
of it, thus forming a work based on the Program, and copy and
|
||||
distribute such modifications or work under the terms of Section 1
|
||||
above, provided that you also meet all of these conditions:
|
||||
|
||||
a) You must cause the modified files to carry prominent notices
|
||||
stating that you changed the files and the date of any change.
|
||||
|
||||
b) You must cause any work that you distribute or publish, that in
|
||||
whole or in part contains or is derived from the Program or any
|
||||
part thereof, to be licensed as a whole at no charge to all third
|
||||
parties under the terms of this License.
|
||||
|
||||
c) If the modified program normally reads commands interactively
|
||||
when run, you must cause it, when started running for such
|
||||
interactive use in the most ordinary way, to print or display an
|
||||
announcement including an appropriate copyright notice and a
|
||||
notice that there is no warranty (or else, saying that you provide
|
||||
a warranty) and that users may redistribute the program under
|
||||
these conditions, and telling the user how to view a copy of this
|
||||
License. (Exception: if the Program itself is interactive but
|
||||
does not normally print such an announcement, your work based on
|
||||
the Program is not required to print an announcement.)
|
||||
|
||||
These requirements apply to the modified work as a whole. If
|
||||
identifiable sections of that work are not derived from the Program,
|
||||
and can be reasonably considered independent and separate works in
|
||||
themselves, then this License, and its terms, do not apply to those
|
||||
sections when you distribute them as separate works. But when you
|
||||
distribute the same sections as part of a whole which is a work based
|
||||
on the Program, the distribution of the whole must be on the terms of
|
||||
this License, whose permissions for other licensees extend to the
|
||||
entire whole, and thus to each and every part regardless of who wrote it.
|
||||
|
||||
Thus, it is not the intent of this section to claim rights or contest
|
||||
your rights to work written entirely by you; rather, the intent is to
|
||||
exercise the right to control the distribution of derivative or
|
||||
collective works based on the Program.
|
||||
|
||||
In addition, mere aggregation of another work not based on the Program
|
||||
with the Program (or with a work based on the Program) on a volume of
|
||||
a storage or distribution medium does not bring the other work under
|
||||
the scope of this License.
|
||||
|
||||
3. You may copy and distribute the Program (or a work based on it,
|
||||
under Section 2) in object code or executable form under the terms of
|
||||
Sections 1 and 2 above provided that you also do one of the following:
|
||||
|
||||
a) Accompany it with the complete corresponding machine-readable
|
||||
source code, which must be distributed under the terms of Sections
|
||||
1 and 2 above on a medium customarily used for software interchange; or,
|
||||
|
||||
b) Accompany it with a written offer, valid for at least three
|
||||
years, to give any third party, for a charge no more than your
|
||||
cost of physically performing source distribution, a complete
|
||||
machine-readable copy of the corresponding source code, to be
|
||||
distributed under the terms of Sections 1 and 2 above on a medium
|
||||
customarily used for software interchange; or,
|
||||
|
||||
c) Accompany it with the information you received as to the offer
|
||||
to distribute corresponding source code. (This alternative is
|
||||
allowed only for noncommercial distribution and only if you
|
||||
received the program in object code or executable form with such
|
||||
an offer, in accord with Subsection b above.)
|
||||
|
||||
The source code for a work means the preferred form of the work for
|
||||
making modifications to it. For an executable work, complete source
|
||||
code means all the source code for all modules it contains, plus any
|
||||
associated interface definition files, plus the scripts used to
|
||||
control compilation and installation of the executable. However, as a
|
||||
special exception, the source code distributed need not include
|
||||
anything that is normally distributed (in either source or binary
|
||||
form) with the major components (compiler, kernel, and so on) of the
|
||||
operating system on which the executable runs, unless that component
|
||||
itself accompanies the executable.
|
||||
|
||||
If distribution of executable or object code is made by offering
|
||||
access to copy from a designated place, then offering equivalent
|
||||
access to copy the source code from the same place counts as
|
||||
distribution of the source code, even though third parties are not
|
||||
compelled to copy the source along with the object code.
|
||||
|
||||
4. You may not copy, modify, sublicense, or distribute the Program
|
||||
except as expressly provided under this License. Any attempt
|
||||
otherwise to copy, modify, sublicense or distribute the Program is
|
||||
void, and will automatically terminate your rights under this License.
|
||||
However, parties who have received copies, or rights, from you under
|
||||
this License will not have their licenses terminated so long as such
|
||||
parties remain in full compliance.
|
||||
|
||||
5. You are not required to accept this License, since you have not
|
||||
signed it. However, nothing else grants you permission to modify or
|
||||
distribute the Program or its derivative works. These actions are
|
||||
prohibited by law if you do not accept this License. Therefore, by
|
||||
modifying or distributing the Program (or any work based on the
|
||||
Program), you indicate your acceptance of this License to do so, and
|
||||
all its terms and conditions for copying, distributing or modifying
|
||||
the Program or works based on it.
|
||||
|
||||
6. Each time you redistribute the Program (or any work based on the
|
||||
Program), the recipient automatically receives a license from the
|
||||
original licensor to copy, distribute or modify the Program subject to
|
||||
these terms and conditions. You may not impose any further
|
||||
restrictions on the recipients' exercise of the rights granted herein.
|
||||
You are not responsible for enforcing compliance by third parties to
|
||||
this License.
|
||||
|
||||
7. If, as a consequence of a court judgment or allegation of patent
|
||||
infringement or for any other reason (not limited to patent issues),
|
||||
conditions are imposed on you (whether by court order, agreement or
|
||||
otherwise) that contradict the conditions of this License, they do not
|
||||
excuse you from the conditions of this License. If you cannot
|
||||
distribute so as to satisfy simultaneously your obligations under this
|
||||
License and any other pertinent obligations, then as a consequence you
|
||||
may not distribute the Program at all. For example, if a patent
|
||||
license would not permit royalty-free redistribution of the Program by
|
||||
all those who receive copies directly or indirectly through you, then
|
||||
the only way you could satisfy both it and this License would be to
|
||||
refrain entirely from distribution of the Program.
|
||||
|
||||
If any portion of this section is held invalid or unenforceable under
|
||||
any particular circumstance, the balance of the section is intended to
|
||||
apply and the section as a whole is intended to apply in other
|
||||
circumstances.
|
||||
|
||||
It is not the purpose of this section to induce you to infringe any
|
||||
patents or other property right claims or to contest validity of any
|
||||
such claims; this section has the sole purpose of protecting the
|
||||
integrity of the free software distribution system, which is
|
||||
implemented by public license practices. Many people have made
|
||||
generous contributions to the wide range of software distributed
|
||||
through that system in reliance on consistent application of that
|
||||
system; it is up to the author/donor to decide if he or she is willing
|
||||
to distribute software through any other system and a licensee cannot
|
||||
impose that choice.
|
||||
|
||||
This section is intended to make thoroughly clear what is believed to
|
||||
be a consequence of the rest of this License.
|
||||
|
||||
8. If the distribution and/or use of the Program is restricted in
|
||||
certain countries either by patents or by copyrighted interfaces, the
|
||||
original copyright holder who places the Program under this License
|
||||
may add an explicit geographical distribution limitation excluding
|
||||
those countries, so that distribution is permitted only in or among
|
||||
countries not thus excluded. In such case, this License incorporates
|
||||
the limitation as if written in the body of this License.
|
||||
|
||||
9. The Free Software Foundation may publish revised and/or new versions
|
||||
of the General Public License from time to time. Such new versions will
|
||||
be similar in spirit to the present version, but may differ in detail to
|
||||
address new problems or concerns.
|
||||
|
||||
Each version is given a distinguishing version number. If the Program
|
||||
specifies a version number of this License which applies to it and "any
|
||||
later version", you have the option of following the terms and conditions
|
||||
either of that version or of any later version published by the Free
|
||||
Software Foundation. If the Program does not specify a version number of
|
||||
this License, you may choose any version ever published by the Free Software
|
||||
Foundation.
|
||||
|
||||
10. If you wish to incorporate parts of the Program into other free
|
||||
programs whose distribution conditions are different, write to the author
|
||||
to ask for permission. For software which is copyrighted by the Free
|
||||
Software Foundation, write to the Free Software Foundation; we sometimes
|
||||
make exceptions for this. Our decision will be guided by the two goals
|
||||
of preserving the free status of all derivatives of our free software and
|
||||
of promoting the sharing and reuse of software generally.
|
||||
|
||||
NO WARRANTY
|
||||
|
||||
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
|
||||
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
|
||||
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
|
||||
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
|
||||
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
||||
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
|
||||
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
||||
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
|
||||
REPAIR OR CORRECTION.
|
||||
|
||||
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
|
||||
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
|
||||
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
|
||||
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
|
||||
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
|
||||
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
|
||||
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGES.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
Appendix: How to Apply These Terms to Your New Programs
|
||||
|
||||
If you develop a new program, and you want it to be of the greatest
|
||||
possible use to the public, the best way to achieve this is to make it
|
||||
free software which everyone can redistribute and change under these terms.
|
||||
|
||||
To do so, attach the following notices to the program. It is safest
|
||||
to attach them to the start of each source file to most effectively
|
||||
convey the exclusion of warranty; and each file should have at least
|
||||
the "copyright" line and a pointer to where the full notice is found.
|
||||
|
||||
<one line to give the program's name and a brief idea of what it does.>
|
||||
Copyright (C) 19yy <name of author>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
Also add information on how to contact you by electronic and paper mail.
|
||||
|
||||
If the program is interactive, make it output a short notice like this
|
||||
when it starts in an interactive mode:
|
||||
|
||||
Gnomovision version 69, Copyright (C) 19yy name of author
|
||||
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||
This is free software, and you are welcome to redistribute it
|
||||
under certain conditions; type `show c' for details.
|
||||
|
||||
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||
parts of the General Public License. Of course, the commands you use may
|
||||
be called something other than `show w' and `show c'; they could even be
|
||||
mouse-clicks or menu items--whatever suits your program.
|
||||
|
||||
You should also get your employer (if you work as a programmer) or your
|
||||
school, if any, to sign a "copyright disclaimer" for the program, if
|
||||
necessary. Here is a sample; alter the names:
|
||||
|
||||
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
|
||||
`Gnomovision' (which makes passes at compilers) written by James Hacker.
|
||||
|
||||
<signature of Ty Coon>, 1 April 1989
|
||||
Ty Coon, President of Vice
|
||||
|
||||
This General Public License does not permit incorporating your program into
|
||||
proprietary programs. If your program is a subroutine library, you may
|
||||
consider it more useful to permit linking proprietary applications with the
|
||||
library. If this is what you want to do, use the GNU Library General
|
||||
Public License instead of this License.
|
6
11/wywolania/Data/stide_v1.1/Makefile
Normal file
@ -0,0 +1,6 @@
all:
	(cd Seq-code; make; cp stide ..)

clean:
	@rm -f stide
	@(cd Seq-code; rm -f *.o stide)
11
11/wywolania/Data/stide_v1.1/README
Normal file
@ -0,0 +1,11 @@
STIDE version 1.1

Copyright (C) 1996, 1998 The Regents of the University
of New Mexico. All rights reserved.


This code was written for GCC version 2.7.2, but should compile correctly
under other more recent versions of GCC.

For usage information invoke stide with the --help option. More detailed
documentation can be found in the UserDoc directory.
339
11/wywolania/Data/stide_v1.1/Seq-code/COPYING
Executable file
@ -0,0 +1,339 @@
|
||||
GNU GENERAL PUBLIC LICENSE
|
||||
Version 2, June 1991
|
||||
|
||||
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||||
675 Mass Ave, Cambridge, MA 02139, USA
|
||||
Everyone is permitted to copy and distribute verbatim copies
|
||||
of this license document, but changing it is not allowed.
|
||||
|
||||
Preamble
|
||||
|
||||
The licenses for most software are designed to take away your
|
||||
freedom to share and change it. By contrast, the GNU General Public
|
||||
License is intended to guarantee your freedom to share and change free
|
||||
software--to make sure the software is free for all its users. This
|
||||
General Public License applies to most of the Free Software
|
||||
Foundation's software and to any other program whose authors commit to
|
||||
using it. (Some other Free Software Foundation software is covered by
|
||||
the GNU Library General Public License instead.) You can apply it to
|
||||
your programs, too.
|
||||
|
||||
When we speak of free software, we are referring to freedom, not
|
||||
price. Our General Public Licenses are designed to make sure that you
|
||||
have the freedom to distribute copies of free software (and charge for
|
||||
this service if you wish), that you receive source code or can get it
|
||||
if you want it, that you can change the software or use pieces of it
|
||||
in new free programs; and that you know you can do these things.
|
||||
|
||||
To protect your rights, we need to make restrictions that forbid
|
||||
anyone to deny you these rights or to ask you to surrender the rights.
|
||||
These restrictions translate to certain responsibilities for you if you
|
||||
distribute copies of the software, or if you modify it.
|
||||
|
||||
For example, if you distribute copies of such a program, whether
|
||||
gratis or for a fee, you must give the recipients all the rights that
|
||||
you have. You must make sure that they, too, receive or can get the
|
||||
source code. And you must show them these terms so they know their
|
||||
rights.
|
||||
|
||||
We protect your rights with two steps: (1) copyright the software, and
|
||||
(2) offer you this license which gives you legal permission to copy,
|
||||
distribute and/or modify the software.
|
||||
|
||||
Also, for each author's protection and ours, we want to make certain
|
||||
that everyone understands that there is no warranty for this free
|
||||
software. If the software is modified by someone else and passed on, we
|
||||
want its recipients to know that what they have is not the original, so
|
||||
that any problems introduced by others will not reflect on the original
|
||||
authors' reputations.
|
||||
|
||||
Finally, any free program is threatened constantly by software
|
||||
patents. We wish to avoid the danger that redistributors of a free
|
||||
program will individually obtain patent licenses, in effect making the
|
||||
program proprietary. To prevent this, we have made it clear that any
|
||||
patent must be licensed for everyone's free use or not licensed at all.
|
||||
|
||||
The precise terms and conditions for copying, distribution and
|
||||
modification follow.
|
||||
|
||||
GNU GENERAL PUBLIC LICENSE
|
||||
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
||||
|
||||
0. This License applies to any program or other work which contains
|
||||
a notice placed by the copyright holder saying it may be distributed
|
||||
under the terms of this General Public License. The "Program", below,
|
||||
refers to any such program or work, and a "work based on the Program"
|
||||
means either the Program or any derivative work under copyright law:
|
||||
that is to say, a work containing the Program or a portion of it,
|
||||
either verbatim or with modifications and/or translated into another
|
||||
language. (Hereinafter, translation is included without limitation in
|
||||
the term "modification".) Each licensee is addressed as "you".
|
||||
|
||||
Activities other than copying, distribution and modification are not
|
||||
covered by this License; they are outside its scope. The act of
|
||||
running the Program is not restricted, and the output from the Program
|
||||
is covered only if its contents constitute a work based on the
|
||||
Program (independent of having been made by running the Program).
|
||||
Whether that is true depends on what the Program does.
|
||||
|
||||
1. You may copy and distribute verbatim copies of the Program's
|
||||
source code as you receive it, in any medium, provided that you
|
||||
conspicuously and appropriately publish on each copy an appropriate
|
||||
copyright notice and disclaimer of warranty; keep intact all the
|
||||
notices that refer to this License and to the absence of any warranty;
|
||||
and give any other recipients of the Program a copy of this License
|
||||
along with the Program.
|
||||
|
||||
You may charge a fee for the physical act of transferring a copy, and
|
||||
you may at your option offer warranty protection in exchange for a fee.
|
||||
|
||||
2. You may modify your copy or copies of the Program or any portion
|
||||
of it, thus forming a work based on the Program, and copy and
|
||||
distribute such modifications or work under the terms of Section 1
|
||||
above, provided that you also meet all of these conditions:
|
||||
|
||||
a) You must cause the modified files to carry prominent notices
|
||||
stating that you changed the files and the date of any change.
|
||||
|
||||
b) You must cause any work that you distribute or publish, that in
|
||||
whole or in part contains or is derived from the Program or any
|
||||
part thereof, to be licensed as a whole at no charge to all third
|
||||
parties under the terms of this License.
|
||||
|
||||
c) If the modified program normally reads commands interactively
|
||||
when run, you must cause it, when started running for such
|
||||
interactive use in the most ordinary way, to print or display an
|
||||
announcement including an appropriate copyright notice and a
|
||||
notice that there is no warranty (or else, saying that you provide
|
||||
a warranty) and that users may redistribute the program under
|
||||
these conditions, and telling the user how to view a copy of this
|
||||
License. (Exception: if the Program itself is interactive but
|
||||
does not normally print such an announcement, your work based on
|
||||
the Program is not required to print an announcement.)
|
||||
|
||||
These requirements apply to the modified work as a whole. If
|
||||
identifiable sections of that work are not derived from the Program,
|
||||
and can be reasonably considered independent and separate works in
|
||||
themselves, then this License, and its terms, do not apply to those
|
||||
sections when you distribute them as separate works. But when you
|
||||
distribute the same sections as part of a whole which is a work based
|
||||
on the Program, the distribution of the whole must be on the terms of
|
||||
this License, whose permissions for other licensees extend to the
|
||||
entire whole, and thus to each and every part regardless of who wrote it.
|
||||
|
||||
Thus, it is not the intent of this section to claim rights or contest
|
||||
your rights to work written entirely by you; rather, the intent is to
|
||||
exercise the right to control the distribution of derivative or
|
||||
collective works based on the Program.
|
||||
|
||||
In addition, mere aggregation of another work not based on the Program
|
||||
with the Program (or with a work based on the Program) on a volume of
|
||||
a storage or distribution medium does not bring the other work under
|
||||
the scope of this License.
|
||||
|
||||
3. You may copy and distribute the Program (or a work based on it,
|
||||
under Section 2) in object code or executable form under the terms of
|
||||
Sections 1 and 2 above provided that you also do one of the following:
|
||||
|
||||
a) Accompany it with the complete corresponding machine-readable
|
||||
source code, which must be distributed under the terms of Sections
|
||||
1 and 2 above on a medium customarily used for software interchange; or,
|
||||
|
||||
b) Accompany it with a written offer, valid for at least three
|
||||
years, to give any third party, for a charge no more than your
|
||||
cost of physically performing source distribution, a complete
|
||||
machine-readable copy of the corresponding source code, to be
|
||||
distributed under the terms of Sections 1 and 2 above on a medium
|
||||
customarily used for software interchange; or,
|
||||
|
||||
c) Accompany it with the information you received as to the offer
|
||||
to distribute corresponding source code. (This alternative is
|
||||
allowed only for noncommercial distribution and only if you
|
||||
received the program in object code or executable form with such
|
||||
an offer, in accord with Subsection b above.)
|
||||
|
||||
The source code for a work means the preferred form of the work for
|
||||
making modifications to it. For an executable work, complete source
|
||||
code means all the source code for all modules it contains, plus any
|
||||
associated interface definition files, plus the scripts used to
|
||||
control compilation and installation of the executable. However, as a
|
||||
special exception, the source code distributed need not include
|
||||
anything that is normally distributed (in either source or binary
|
||||
form) with the major components (compiler, kernel, and so on) of the
|
||||
operating system on which the executable runs, unless that component
|
||||
itself accompanies the executable.
|
||||
|
||||
If distribution of executable or object code is made by offering
|
||||
access to copy from a designated place, then offering equivalent
|
||||
access to copy the source code from the same place counts as
|
||||
distribution of the source code, even though third parties are not
|
||||
compelled to copy the source along with the object code.
|
||||
|
||||
4. You may not copy, modify, sublicense, or distribute the Program
|
||||
except as expressly provided under this License. Any attempt
|
||||
otherwise to copy, modify, sublicense or distribute the Program is
|
||||
void, and will automatically terminate your rights under this License.
|
||||
However, parties who have received copies, or rights, from you under
|
||||
this License will not have their licenses terminated so long as such
|
||||
parties remain in full compliance.
|
||||
|
||||
5. You are not required to accept this License, since you have not
|
||||
signed it. However, nothing else grants you permission to modify or
|
||||
distribute the Program or its derivative works. These actions are
|
||||
prohibited by law if you do not accept this License. Therefore, by
|
||||
modifying or distributing the Program (or any work based on the
|
||||
Program), you indicate your acceptance of this License to do so, and
|
||||
all its terms and conditions for copying, distributing or modifying
|
||||
the Program or works based on it.
|
||||
|
||||
6. Each time you redistribute the Program (or any work based on the
|
||||
Program), the recipient automatically receives a license from the
|
||||
original licensor to copy, distribute or modify the Program subject to
|
||||
these terms and conditions. You may not impose any further
|
||||
restrictions on the recipients' exercise of the rights granted herein.
|
||||
You are not responsible for enforcing compliance by third parties to
|
||||
this License.
|
||||
|
||||
7. If, as a consequence of a court judgment or allegation of patent
|
||||
infringement or for any other reason (not limited to patent issues),
|
||||
conditions are imposed on you (whether by court order, agreement or
|
||||
otherwise) that contradict the conditions of this License, they do not
|
||||
excuse you from the conditions of this License. If you cannot
|
||||
distribute so as to satisfy simultaneously your obligations under this
|
||||
License and any other pertinent obligations, then as a consequence you
|
||||
may not distribute the Program at all. For example, if a patent
|
||||
license would not permit royalty-free redistribution of the Program by
|
||||
all those who receive copies directly or indirectly through you, then
|
||||
the only way you could satisfy both it and this License would be to
|
||||
refrain entirely from distribution of the Program.
|
||||
|
||||
If any portion of this section is held invalid or unenforceable under
|
||||
any particular circumstance, the balance of the section is intended to
|
||||
apply and the section as a whole is intended to apply in other
|
||||
circumstances.
|
||||
|
||||
It is not the purpose of this section to induce you to infringe any
|
||||
patents or other property right claims or to contest validity of any
|
||||
such claims; this section has the sole purpose of protecting the
|
||||
integrity of the free software distribution system, which is
|
||||
implemented by public license practices. Many people have made
|
||||
generous contributions to the wide range of software distributed
|
||||
through that system in reliance on consistent application of that
|
||||
system; it is up to the author/donor to decide if he or she is willing
|
||||
to distribute software through any other system and a licensee cannot
|
||||
impose that choice.
|
||||
|
||||
This section is intended to make thoroughly clear what is believed to
|
||||
be a consequence of the rest of this License.
|
||||
|
||||
8. If the distribution and/or use of the Program is restricted in
|
||||
certain countries either by patents or by copyrighted interfaces, the
|
||||
original copyright holder who places the Program under this License
|
||||
may add an explicit geographical distribution limitation excluding
|
||||
those countries, so that distribution is permitted only in or among
|
||||
countries not thus excluded. In such case, this License incorporates
|
||||
the limitation as if written in the body of this License.
|
||||
|
||||
9. The Free Software Foundation may publish revised and/or new versions
|
||||
of the General Public License from time to time. Such new versions will
|
||||
be similar in spirit to the present version, but may differ in detail to
|
||||
address new problems or concerns.
|
||||
|
||||
Each version is given a distinguishing version number. If the Program
|
||||
specifies a version number of this License which applies to it and "any
|
||||
later version", you have the option of following the terms and conditions
|
||||
either of that version or of any later version published by the Free
|
||||
Software Foundation. If the Program does not specify a version number of
|
||||
this License, you may choose any version ever published by the Free Software
|
||||
Foundation.
|
||||
|
||||
10. If you wish to incorporate parts of the Program into other free
|
||||
programs whose distribution conditions are different, write to the author
|
||||
to ask for permission. For software which is copyrighted by the Free
|
||||
Software Foundation, write to the Free Software Foundation; we sometimes
|
||||
make exceptions for this. Our decision will be guided by the two goals
|
||||
of preserving the free status of all derivatives of our free software and
|
||||
of promoting the sharing and reuse of software generally.
|
||||
|
||||
NO WARRANTY
|
||||
|
||||
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
|
||||
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
|
||||
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
|
||||
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
|
||||
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
||||
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
|
||||
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
||||
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
|
||||
REPAIR OR CORRECTION.
|
||||
|
||||
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
|
||||
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
|
||||
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
|
||||
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
|
||||
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
|
||||
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
|
||||
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGES.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
Appendix: How to Apply These Terms to Your New Programs
|
||||
|
||||
If you develop a new program, and you want it to be of the greatest
|
||||
possible use to the public, the best way to achieve this is to make it
|
||||
free software which everyone can redistribute and change under these terms.
|
||||
|
||||
To do so, attach the following notices to the program. It is safest
|
||||
to attach them to the start of each source file to most effectively
|
||||
convey the exclusion of warranty; and each file should have at least
|
||||
the "copyright" line and a pointer to where the full notice is found.
|
||||
|
||||
<one line to give the program's name and a brief idea of what it does.>
|
||||
Copyright (C) 19yy <name of author>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
Also add information on how to contact you by electronic and paper mail.
|
||||
|
||||
If the program is interactive, make it output a short notice like this
|
||||
when it starts in an interactive mode:
|
||||
|
||||
Gnomovision version 69, Copyright (C) 19yy name of author
|
||||
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||
This is free software, and you are welcome to redistribute it
|
||||
under certain conditions; type `show c' for details.
|
||||
|
||||
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||
parts of the General Public License. Of course, the commands you use may
|
||||
be called something other than `show w' and `show c'; they could even be
|
||||
mouse-clicks or menu items--whatever suits your program.
|
||||
|
||||
You should also get your employer (if you work as a programmer) or your
|
||||
school, if any, to sign a "copyright disclaimer" for the program, if
|
||||
necessary. Here is a sample; alter the names:
|
||||
|
||||
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
|
||||
`Gnomovision' (which makes passes at compilers) written by James Hacker.
|
||||
|
||||
<signature of Ty Coon>, 1 April 1989
|
||||
Ty Coon, President of Vice
|
||||
|
||||
This General Public License does not permit incorporating your program into
|
||||
proprietary programs. If your program is a subroutine library, you may
|
||||
consider it more useful to permit linking proprietary applications with the
|
||||
library. If this is what you want to do, use the GNU Library General
|
||||
Public License instead of this License.
|
23
11/wywolania/Data/stide_v1.1/Seq-code/Makefile
Executable file
@ -0,0 +1,23 @@
STIDE_OBJECTS = stide.o seq_config.o seq_stream.o template.o flexitree.o

LIBES = -lm

#FLAGS = -O2
FLAGS = -g

stide : $(STIDE_OBJECTS)
	g++ -fno-implicit-templates $(FLAGS) $(STIDE_OBJECTS) $(LIBES) -o stide

template.o : template.cc ../Utils/arrays.h ../Utils/tll.h ../Utils/hash.h ../Utils/tll.cc ../Utils/arrays.cc ../Utils/hash.cc seq_stream.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c template.cc

stide.o : stide.cc ../Utils/arrays.h ../Utils/hash.h seq_stream.h seq_config.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c stide.cc
flexitree.o : flexitree.cc ../Utils/arrays.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c flexitree.cc

seq_config.o : seq_config.cc seq_config.h ../Utils/arrays.h
	g++ -fno-implicit-templates $(FLAGS) -c seq_config.cc

seq_stream.o : seq_stream.cc seq_stream.h seq_config.h flexitree.h ../Utils/arrays.h ../Utils/hash.h
	g++ -fno-implicit-templates $(FLAGS) -c seq_stream.cc
449
11/wywolania/Data/stide_v1.1/Seq-code/flexitree.cc
Executable file
@ -0,0 +1,449 @@
// flexitree.cc
#include "flexitree.h"

extern int counter;

// data structures:
// node for a linked list
class FlexiTreeNode {
public:
  FlexiTree *tree;      // the element at this node
  FlexiTreeNode *next;  // pointer to the next node
  FlexiTreeNode(int root) {tree = new FlexiTree(root); next = NULL;}
};
//===========================================================================
FlexiTree::FlexiTree(void) {
  children = NULL;
  root = -1;
  id = counter;
  counter++;
}
//===========================================================================
FlexiTree::FlexiTree(int d) {
  children = NULL;
  root = d;
  id = counter;
  counter++;
}
//============================================================================
FlexiTree::~FlexiTree(void) {
  if (children) {
    FlexiTreeNode *temp_ptr = children->next, *next_temp_ptr;
    if (children->tree) delete children->tree;
    delete children;
    while (temp_ptr) {
      next_temp_ptr = temp_ptr->next;
      if (temp_ptr->tree) delete temp_ptr->tree;
      delete temp_ptr;
      temp_ptr = next_temp_ptr;
    }
  }
}
//============================================================================
// Returns the number of nodes in this tree: one for this node plus the
// nodes of every child subtree.
int FlexiTree::NumNodes(void) {
  int size = 1;
  if (children) {
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      size += temp_ptr->tree->NumNodes();
      temp_ptr = temp_ptr->next;
    }
  }
  return size;
}
//============================================================================
// Returns the number of leaves in this tree; a node without children
// counts as a single leaf.
int FlexiTree::NumLeaves(void) {
  int size;
  if (children) {
    size = 0;
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      size += temp_ptr->tree->NumLeaves();
      temp_ptr = temp_ptr->next;
    }
  } else size = 1;
  return size;
}
//============================================================================
// Returns the number of branches in this tree: one per child plus the
// branches inside each child subtree.
int FlexiTree::NumBranches(void) {
  int branches = 0;
  if (children) {
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      branches += (temp_ptr->tree->NumBranches() + 1);
      temp_ptr = temp_ptr->next;
    }
  }
  return branches;
}
|
||||
/**********************************************************************
|
||||
* InsertSeq() *
|
||||
* Inserts a sequence in this tree and returns 1 if the sequence *
|
||||
* begins with the root of this tree and the sequence isn't already *
|
||||
* in this tree. It returns -1 if the sequence doesn't begin with *
|
||||
* the root of this tree. It returns 0 if the sequence was already *
|
||||
* in this tree. This function is recursive and only compares the *
|
||||
* portion of the sequence lying between the argument first and the *
|
||||
* argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: const Array<int> &seq Current sequence *
|
||||
* int first The first element of the sequence *
|
||||
* to consider *
|
||||
* int last Index of the last sequence element to consider *
|
||||
*********************************************************************/
|
||||
|
||||
int FlexiTree::InsertSeq(const Array<int> &seq, int first, int last)
|
||||
{
|
||||
// If the root of this tree isn't the same as the first element of
|
||||
// the sequence, return -1 to indicate that
|
||||
if (root != seq[first]) {
|
||||
return -1;
|
||||
}
|
||||
|
||||
first++; // shift the seq forward
|
||||
// If we have reached the end of the sequence now, we haven't added
|
||||
// anything to the tree, so we return 0 to indicate that it was
|
||||
// already there
|
||||
if (first > last) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
// If there are no children, create some with the correct root,
|
||||
// insert the sequence and return 1.
|
||||
if (!children) {
|
||||
children = new FlexiTreeNode(seq[first]);
|
||||
children->tree->InsertSeq(seq, first, last);
|
||||
return 1;
|
||||
}
|
||||
|
||||
// The root agrees, we're not at the end, and there are children.
|
||||
// Now we want to know if the sequence is already in the children,
|
||||
// and if not, we want to find out and add it.
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
int flag;
|
||||
while (1) {
|
||||
flag = temp_ptr->tree->InsertSeq(seq, first, last);
|
||||
// If the sequence is new and gets added, return 1
|
||||
if (flag == 1) return 1;
|
||||
// If the sequence is old, return 0
|
||||
if (flag == 0) return 0;
|
||||
// Otherwise the new root of the sequence isn't the same as the
|
||||
// root of this child tree, so we will try the next one. But
|
||||
// first, if this is the last child, we know it isn't in here, so
|
||||
// we will add it in and return 1
|
||||
if (temp_ptr->next == NULL) {
|
||||
temp_ptr->next = new FlexiTreeNode(seq[first]);
|
||||
temp_ptr->next->tree->InsertSeq(seq, first, last);
|
||||
return 1;
|
||||
}
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* IsSeqInTree() *
|
||||
* Returns 1 if the sequence has a match within this tree and *
|
||||
* returns 0 otherwise. This function is recursive and only *
|
||||
* compares the portion of the sequence lying between the argument *
|
||||
* first and the argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: const Array<int> &seq Current sequence *
|
||||
* int first The first element of the sequence to *
|
||||
* consider *
|
||||
* int last Index of the last sequence element to consider *
|
||||
********************************************************************/
|
||||
|
||||
int FlexiTree::IsSeqInTree(const Array<int> &seq, int first, int last)
|
||||
{
|
||||
// If the first element of the sequence isn't the same as the root
|
||||
// of this tree, then we know already that there isn't a match here,
|
||||
// so return 0.
|
||||
if (root != seq[first]) {
|
||||
return 0;
|
||||
}
|
||||
first++; // shift the seq forward
|
||||
|
||||
// If we have reached the end of the sequence, then we have
|
||||
// found matches all the way along, so return 1 saying that this is
|
||||
// a match.
|
||||
if (first > last) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Now we want to find out if there is a match in any of the
|
||||
// subtrees below this tree. The subtrees are contained in the
|
||||
// linked list children->next->next->...
|
||||
FlexiTreeNode *next_node = children;
|
||||
while (next_node != NULL) {
|
||||
if (next_node->tree->IsSeqInTree(seq, first, last)) {
|
||||
return 1; //Found it!
|
||||
}
|
||||
next_node = next_node->next;
|
||||
}
|
||||
// Now we've been through all of the subtrees without finding a
|
||||
// match, so there aren't any matches.
|
||||
return 0;
|
||||
}
|
||||
/*********************************************************************
|
||||
* ComputeHDistForTree() *
|
||||
* Reports the minimum number of mismatches with any sequence on *
|
||||
* this tree. This is a highly compute-intensive method, because *
|
||||
* every path down the tree is followed. This function is *
|
||||
* recursive, and only compares the portion of the sequence lying *
|
||||
* between the argument first and the argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: Array<int> &seq Current sequence *
|
||||
* int first The first element of the sequence to *
|
||||
* consider *
|
||||
* int last Index of the last sequence element to consider *
|
||||
********************************************************************/
|
||||
|
||||
int FlexiTree::ComputeHDistForTree(Array<int> &seq, int first, int last)
|
||||
{
|
||||
|
||||
int tot_misses = 0;
|
||||
|
||||
// If the first element of the sequence isn't the same as the root
|
||||
// of this tree, then every sequence on this tree will disagree with
|
||||
// the sequence here, so we increment tot_misses
|
||||
if (root != seq[first]) {
|
||||
tot_misses++;
|
||||
}
|
||||
|
||||
first++; // shift the seq forward
|
||||
if (first > last) { // reached the end of the seq
|
||||
return tot_misses; // 0 if the last element matched, 1 otherwise
|
||||
}
|
||||
|
||||
// Now we want to add to tot_misses the smallest number of
|
||||
// mismatches with any of this tree's subtrees. This tree's
|
||||
// subtrees are in the linked list children->next->next->
|
||||
FlexiTreeNode *next_node = children;
|
||||
// last is the last element of the sequence, which is one less than
|
||||
// the number of elements in the sequence. The most misses possible
|
||||
// is the number of elements in the sequence.
|
||||
int min_misses = last + 1;
|
||||
int misses;
|
||||
while (next_node != NULL) {
|
||||
misses = next_node->tree->ComputeHDistForTree(seq, first, last);
|
||||
if (misses < min_misses) {
|
||||
min_misses = misses;
|
||||
}
|
||||
next_node = next_node->next;
|
||||
}
|
||||
return (tot_misses + min_misses);
|
||||
}
|
||||
//===========================================================================
|
||||
// format for writing out: we do it depth-first; each path is terminated by a negative number,
|
||||
// which is -(the reqd backtrack length)-1. depth should start out as 0.
|
||||
// the tree writing out will end with -1.
|
||||
void FlexiTree::Write(ostream &s, int &depth) {
|
||||
s<<root<<" ";
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (temp_ptr) {
|
||||
depth = 0;
|
||||
temp_ptr->tree->Write(s, depth);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (temp_ptr) s<<"-"<<(depth + 1)<<" ";
|
||||
}
|
||||
depth++; // now incr the count
|
||||
}
|
||||
//=============================================================================
|
||||
ostream &operator<<(ostream &s, FlexiTree &tree) {
|
||||
int depth = 0;
|
||||
tree.Write(s, depth);
|
||||
s<<" -1"; // we terminate with a -1
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
// returns 0 if we have reached the end of the file, 1 otherwise
|
||||
int FlexiTree::Read(istream &s, int &depth) {
|
||||
int next_num;
|
||||
if (s.eof()) return 0;
|
||||
s>>next_num;
|
||||
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
|
||||
if (next_num >= 0) {
|
||||
children = new FlexiTreeNode(next_num);
|
||||
if (!children->tree->Read(s, depth)) return 0;
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (depth == 0) {
|
||||
if (s.eof()) return 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||
}
|
||||
} else depth = (-1 * next_num) - 1;
|
||||
if (depth) depth--;
|
||||
return 1;
|
||||
}
|
||||
//=============================================================================
|
||||
istream &operator>>(istream &s, FlexiTree &tree) {
|
||||
int next_num, depth = 0;
|
||||
s>>next_num;
|
||||
tree.SetRoot(next_num);
|
||||
tree.Read(s, depth);
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
// writes out in the format that dot uses for dags
|
||||
int FlexiTree::OutputGraph(ostream &s) {
|
||||
// first write out the name of the tree
|
||||
s<<" "<<id<<" [label=\""<<root<<"\",shape=plaintext];"<<endl;
|
||||
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
int childid;
|
||||
while (temp_ptr) {
|
||||
childid = temp_ptr->tree->OutputGraph(s);
|
||||
s<<" "<<id<<" -> "<<childid<<";"<<endl;
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
return id;
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* IsSeqInForest() *
|
||||
* Searches through database forest to locate sequence. Returns 1 *
|
||||
* if it finds it, 0 otherwise *
|
||||
*********************************************************************/
|
||||
|
||||
int SeqForest::IsSeqInForest(const Array<int> &seq, int seq_len) const
|
||||
{
|
||||
// Have we ever seen a sequence starting with the same root?
|
||||
if (trees_found[seq[0]]) {
|
||||
// Have we seen this precise sequence?
|
||||
return trees[seq[0]].IsSeqInTree(seq, 0, seq_len-1);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*
|
||||
#include "fstream.h"
|
||||
|
||||
// for test purposes
|
||||
void main(void) {
|
||||
FlexiTree tree(1);
|
||||
Array<int> seq(10);
|
||||
|
||||
// try out insert and write
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 2; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1123:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1135:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1223:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1233:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1234:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1234:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1214:"<<tree<<endl;
|
||||
|
||||
// now try out search
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1214"<<endl;
|
||||
else cout<<"could not find 1214"<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1224"<<endl;
|
||||
else cout<<"could not find 1224"<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 4; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1244"<<endl;
|
||||
else cout<<"could not find 1244"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1135"<<endl;
else cout<<"could not find 1135"<<endl;
|
||||
|
||||
// try out insert and write with shorter and longer sequences
|
||||
seq[0] = 1; seq[1] = 3;
|
||||
tree.SeqInsert(seq, 0, 1);
|
||||
cout<<"13:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 4;
|
||||
tree.SeqInsert(seq, 0, 2);
|
||||
cout<<"114:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||
tree.SeqInsert(seq, 0, 7);
|
||||
cout<<"12311214:"<<tree<<endl;
|
||||
|
||||
if (tree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||
else cout<<"could not find 12311214"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||
if (tree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||
else cout<<"could not find 115"<<endl;
|
||||
if (tree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||
else cout<<"could not find 11"<<endl;
|
||||
|
||||
ofstream outf("test.out");
|
||||
outf<<tree;
|
||||
outf.close();
|
||||
|
||||
//counter = 0;
|
||||
|
||||
FlexiTree intree;
|
||||
ifstream inf("test.out");
|
||||
inf>>intree;
|
||||
inf.close();
|
||||
|
||||
cout<<endl<<intree;
|
||||
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||
if (intree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||
else cout<<"could not find 12311214"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||
if (intree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||
else cout<<"could not find 115"<<endl;
|
||||
if (intree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||
else cout<<"could not find 11"<<endl;
|
||||
|
||||
}
|
||||
*/
|
||||
|
||||
/*
|
||||
int FlexiTree::Read(istream &s, int &depth) {
|
||||
int next_num, depth_decr = 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
if (next_num >= 0) {
|
||||
children = new FlexiTreeNode(next_num);
|
||||
if (!children->tree->Read(s, depth)) return 0;
|
||||
if (depth) {
|
||||
depth--;
|
||||
depth_decr = 1;
|
||||
}
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (depth == 0) {
|
||||
depth_decr = 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||
if (depth) {
|
||||
depth--;
|
||||
depth_decr = 1;
|
||||
}
|
||||
}
|
||||
if (!depth_decr && depth) depth--;
|
||||
} else
|
||||
depth = (-1 * next_num) - 1;
|
||||
return 1;
|
||||
}
|
||||
*/
|
||||
|
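As a reading aid for flexitree.cc above, here is a minimal sketch of the same idea: each sequence is stored as a path in a tree, so that sequences sharing a prefix share nodes. It is not the STIDE implementation. The names `SeqTrie`, `insert` and `contains` are hypothetical, and it assumes `std::map` for the children where FlexiTree uses a hand-written linked list. The root node here plays the role that SeqForest plays in the original (one subtree per first element).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for FlexiTree/SeqForest: one node per sequence
// element, children keyed by the element value.
struct SeqTrie {
    std::map<int, std::unique_ptr<SeqTrie>> children;

    // Returns true if at least one new node had to be created, i.e. the
    // sequence was not yet present as a path from the root.
    bool insert(const std::vector<int>& seq) {
        SeqTrie* node = this;
        bool added = false;
        for (int value : seq) {
            std::unique_ptr<SeqTrie>& child = node->children[value];
            if (!child) {
                child.reset(new SeqTrie());
                added = true;
            }
            node = child.get();
        }
        return added;
    }

    // Returns true if seq matches a path starting at the root. As with
    // IsSeqInTree(), a prefix of a stored sequence also counts as a match.
    bool contains(const std::vector<int>& seq) const {
        const SeqTrie* node = this;
        for (int value : seq) {
            auto it = node->children.find(value);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return true;
    }
};
```

A caller would insert every window of `seq_len` elements from the training trace and later call `contains` on each window of the trace under test. A window that is not found would be reported as an anomaly.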
59
11/wywolania/Data/stide_v1.1/Seq-code/flexitree.h
Executable file
@ -0,0 +1,59 @@
#ifndef __FLEXITREE_H
#define __FLEXITREE_H

#include "../Utils/arrays.h"

class FlexiTreeNode;
class FlexiTree {
private:
  FlexiTreeNode *children;
  int root;
  int id;
public:
  void Write(ostream &s, int &depth);
  int Read(istream &s, int &depth);
  int OutputGraph(ostream &s);
  FlexiTree();
  FlexiTree(int d);
  ~FlexiTree();
  void SetRoot(int d) {root = d;}
  int InsertSeq(const Array<int> &seq, int first, int last);
  int IsSeqInTree(const Array<int> &seq, int first, int last);
  int ComputeHDistForTree(Array<int> &seq, int first, int last);
  friend ostream &operator<<(ostream &s, FlexiTree &tn);
  friend istream &operator>>(istream &s, FlexiTree &tn);
  int NumNodes(); // returns the number of nodes in the tree
  int NumLeaves(); // returns the number of leaves in the tree, i.e. num of distinct seqs
  int NumBranches(); // returns the total # of branches, of all nodes
};

//===========================================================================
class SeqForest {
public:
  // this structure is an array of N tree nodes, i.e. a tree for each value
  // type
  Array<FlexiTree> trees;
  // this structure is to record what types of values actually occurred -
  // for efficiency, if there were actually fewer value types than
  // specified in the config
  Array<int> trees_found;
  SeqForest(int max_trees)
    {trees.Allocate(max_trees); trees_found.Allocate(max_trees); trees_found.Set(0);}
  int IsSeqInForest(const Array<int> &seq, int seq_len) const;
};

//===========================================================================

#endif
|
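The `ComputeHDistForTree()` method declared in flexitree.h reports the smallest number of mismatches between a sequence and any path in the tree. The recursion can be sketched as below. This is an illustration under stated assumptions, not the STIDE code: `Node` and `min_mismatches` are hypothetical names, children are held in a `std::vector` instead of a linked list, and every stored path is assumed to be at least as long as the query sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree node: the element value plus its subtrees.
struct Node {
    int value;
    std::vector<Node> children;
};

// Minimum number of mismatches between seq[pos..] and any path that
// starts at `node`. Every path is followed, so the cost grows with the
// size of the tree, which is why the original calls this compute-intensive.
int min_mismatches(const Node& node, const std::vector<int>& seq,
                   std::size_t pos) {
    int misses = (node.value != seq[pos]) ? 1 : 0;
    if (pos + 1 == seq.size()) return misses;  // end of the sequence

    // Upper bound: no path can mismatch in more positions than seq has.
    int best = static_cast<int>(seq.size());
    for (const Node& child : node.children) {
        best = std::min(best, min_mismatches(child, seq, pos + 1));
    }
    return misses + best;
}
```

With the paths 1-2-3, 1-2-4 and 1-5-6 stored, the query 1-2-9 differs from the closest path in one position, and 7-5-9 differs from 1-5-6 in two.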
34
11/wywolania/Data/stide_v1.1/Seq-code/opt_info.h
Executable file
@ -0,0 +1,34 @@
#ifndef __OPT_INFO_H
#define __OPT_INFO_H

#include <string>
#include "../Utils/arrays.h"

#define NUM_OPTS 16
#define SHORT_NAME 0
#define LONG_NAME 1

class OptInfo {
public:
  string long_name;    // Long name of this option; used in
                       // configuration file and with the -- marker
                       // on the command line
  string short_name;   // Short name of this option; used with the -
                       // marker on the command line
  int set;             // Flag indicating if this option has already
                       // been set
  char type;           // type of value: legitimate values are f
                       // (flag, i.e., boolean), i (int), s (string)
                       // or h (help)
  union {              // pointer to actual value to be set
    int *flag_val;     // value if type = 'f'
    int *int_val;      // value if type = 'i'
    string *str_val;   // value if type = 's'
  };

  OptInfo() {};
};

#endif
54
11/wywolania/Data/stide_v1.1/Seq-code/sample.config
Executable file
@ -0,0 +1,54 @@
#ConfigFileRev: 1
#Sample STIDE configuration file containing default values.

db_name: default.db      # name of database
seq_len: 6               # length of sequences
max_elements: 500        # maximum number of unique elements in input
max_streams: 100         # maximum number of unique streams in input
pair_offset: 0           # offset for pair number count
add_output_format: \
"DB Size: %d\tStream: %s\tPair Number: %p\n"
                         # In verbose mode, STIDE will print
                         # this information for every new
                         # sequence added to the database. In
                         # very verbose mode, STIDE will print
                         # this information for every sequence
                         # considered. Possible data:
                         # %d Database Size
                         # %i Pair number of last data element of
                         #    sequence in its particular
                         #    data stream
                         # %p Pair number of last data element of
                         #    sequence in the whole input
                         #    stream
                         # %s Stream Number

compare_output_format: \
"Pair Number: %p\tStream Number: %s\n"
                         # In verbose mode, STIDE will print
                         # this information for every sequence
                         # which is itself an anomaly or whose
                         # locality frame contains an anomaly.
                         # In very verbose mode, STIDE will
                         # print this information for every
                         # sequence. Possible data:
                         # %a 1 if this sequence is an anomaly, 0
                         #    otherwise
                         # %c locality frame count of this sequence
                         # %h Hamming distance
                         # %i Pair number of last data element of
                         #    sequence in its particular data stream
                         # %p Pair number of last data element of
                         #    sequence in the entire input
                         # %s Stream Number
lf_size: 1               # 1 causes locality frame counts not
                         # to be computed
add_to_db: off           # Add this data to the database, or, if there
                         # is no database, create a new one -- do not
                         # do comparisons
output_graph: off        # Outputs graphing information in Dot
                         # format
compute_hdist: off       # Compute Hamming distances
write_db_stats: off      # At end, print out statistics about database
verbose: off             # See add_output_format and compare_output_format
very_verbose: off        # See add_output_format and compare_output_format
|
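The sample configuration above uses `name: value` lines with optional `#` comments. The following sketch shows how one such line could be split. It is not the parser in seq_config.cc: `trim` and `parse_line` are hypothetical helpers, and the sketch deliberately ignores the backslash continuation and quoted format strings that the real file also contains.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Splits "name: value  # comment" into name and value. Returns false
// for blank lines, comment-only lines, and lines without a colon; in
// that case name and value are left untouched.
bool parse_line(const std::string& line, std::string& name,
                std::string& value) {
    // Everything from the first '#' onwards is a comment.
    std::string body = line.substr(0, line.find('#'));
    std::size_t colon = body.find(':');
    if (colon == std::string::npos) return false;
    std::string n = trim(body.substr(0, colon));
    if (n.empty()) return false;
    name = n;
    value = trim(body.substr(colon + 1));
    return true;
}
```

Treating the first `#` as the start of a comment is the simplification that makes quoted strings unsupported here. A fuller parser would have to track whether it is inside quotes before cutting the line.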
797
11/wywolania/Data/stide_v1.1/Seq-code/seq_config.cc
Executable file
@ -0,0 +1,797 @@
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <fstream.h>
|
||||
#include <string>
|
||||
#include "seq_config.h"
|
||||
#include "opt_info.h"
|
||||
|
||||
#define LF_LIM 999
|
||||
#define SEQ_LEN_LIM 199
|
||||
#define MAX_ELEM_LIM 999
|
||||
#define MAX_STREAMS_LIM 9999
|
||||
|
||||
/**********************************************************************
|
||||
* Config() *
|
||||
* Reads in configuration information from configuration file, from *
|
||||
* the command line, and from preset defaults. *
|
||||
* *
|
||||
* Input: int argc: Number of arguments on command line *
|
||||
* char *argv[]: Array of strings of actual arguments *
|
||||
* *
|
||||
* Output: Nothing *
|
||||
*********************************************************************/
|
||||
|
||||
Config::Config(const int argc, const char *argv[])
|
||||
{
|
||||
Array<OptInfo> opt_array;
|
||||
InitOptArray(opt_array);
|
||||
|
||||
SetDefaults();
|
||||
|
||||
ReadCommandLine(argc, argv, opt_array);
|
||||
|
||||
ReadConfigFile(opt_array);
|
||||
|
||||
CheckValues();
|
||||
|
||||
InitOutputFormat();
|
||||
|
||||
OuputConfigInfo(opt_array);
|
||||
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* InitOptArray() *
|
||||
* Sets the values of opt_array so that opt_array contains all the *
|
||||
* information needed about the parameters being set by the config *
|
||||
* file and the command-line arguments. *
|
||||
* *
|
||||
* Input: Array<OptInfo> &opt_array: Array of information about *
|
||||
* options for the program *
|
||||
* *
|
||||
* Output: Nothing *
|
||||
*********************************************************************/
|
||||
|
||||
void Config::InitOptArray(Array<OptInfo> &opt_array)
|
||||
{
|
||||
opt_array.Allocate(NUM_OPTS);
|
||||
|
||||
opt_array[0].long_name = "db_name";
|
||||
opt_array[0].short_name = "d";
|
||||
opt_array[0].set = 0;
|
||||
opt_array[0].type = 's';
|
||||
opt_array[0].str_val = &db_name;
|
||||
|
||||
opt_array[1].long_name = "seq_len";
|
||||
opt_array[1].short_name = "l";
|
||||
opt_array[1].set = 0;
|
||||
opt_array[1].type = 'i';
|
||||
opt_array[1].int_val = &seq_len;
|
||||
|
||||
opt_array[2].long_name = "max_elements";
|
||||
opt_array[2].short_name = "me";
|
||||
opt_array[2].set = 0;
|
||||
opt_array[2].type = 'i';
|
||||
opt_array[2].int_val = &max_elements;
|
||||
|
||||
opt_array[3].long_name = "max_streams";
|
||||
opt_array[3].short_name = "ms";
|
||||
opt_array[3].set = 0;
|
||||
opt_array[3].type = 'i';
|
||||
opt_array[3].int_val = &max_streams;
|
||||
|
||||
opt_array[4].long_name = "cfg_name";
|
||||
opt_array[4].short_name = "c";
|
||||
opt_array[4].set = 0;
|
||||
opt_array[4].type = 's';
|
||||
opt_array[4].str_val = &cfg_name;
|
||||
|
||||
opt_array[5].long_name = "pair_offset";
|
||||
opt_array[5].short_name = "p";
|
||||
opt_array[5].set = 0;
|
||||
opt_array[5].type = 'i';
|
||||
opt_array[5].int_val = &pair_offset;
|
||||
|
||||
opt_array[6].long_name = "add_output_format";
|
||||
opt_array[6].short_name = "aof";
|
||||
opt_array[6].set = 0;
|
||||
opt_array[6].type = 's';
|
||||
opt_array[6].str_val = &add_output_format;
|
||||
|
||||
opt_array[7].long_name = "compare_output_format";
|
||||
opt_array[7].short_name = "cof";
|
||||
opt_array[7].set = 0;
|
||||
opt_array[7].type = 's';
|
||||
opt_array[7].str_val = &compare_output_format;
|
||||
|
||||
opt_array[8].long_name = "add_to_db";
|
||||
opt_array[8].short_name = "a";
|
||||
opt_array[8].set = 0;
|
||||
opt_array[8].type = 'f';
|
||||
opt_array[8].int_val = &add_to_db;
|
||||
|
||||
opt_array[9].long_name = "output_graph";
|
||||
opt_array[9].short_name = "g";
|
||||
opt_array[9].set = 0;
|
||||
opt_array[9].type = 'f';
|
||||
opt_array[9].int_val = &output_graph;
|
||||
|
||||
opt_array[10].long_name = "compute_hdist";
|
||||
opt_array[10].short_name = "hd";
|
||||
opt_array[10].set = 0;
|
||||
opt_array[10].type = 'f';
|
||||
opt_array[10].int_val = &compute_hdist;
|
||||
|
||||
opt_array[11].long_name = "lf_size";
|
||||
opt_array[11].short_name = "lf";
|
||||
opt_array[11].set = 0;
|
||||
opt_array[11].type = 'i';
|
||||
opt_array[11].int_val = &lf_size;
|
||||
|
||||
opt_array[12].long_name = "write_db_stats";
|
||||
opt_array[12].short_name = "s";
|
||||
opt_array[12].set = 0;
|
||||
opt_array[12].type = 'f';
|
||||
opt_array[12].int_val = &write_db_stats;
|
||||
|
||||
opt_array[13].long_name = "verbose";
|
||||
opt_array[13].short_name = "v";
|
||||
opt_array[13].set = 0;
|
||||
opt_array[13].type = 'f';
|
||||
opt_array[13].int_val = &verbose;
|
||||
|
||||
opt_array[14].long_name = "very_verbose";
|
||||
opt_array[14].short_name = "V";
|
||||
opt_array[14].set = 0;
|
||||
opt_array[14].type = 'f';
|
||||
opt_array[14].int_val = &very_verbose;
|
||||
|
||||
opt_array[15].long_name = "help";
|
||||
opt_array[15].short_name = "h";
|
||||
opt_array[15].set = 0;
|
||||
opt_array[15].type = 'h';
|
||||
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* SetDefaults() *
|
||||
* Sets configuration variables to their default values *
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::SetDefaults()
|
||||
{
|
||||
cfg_name = "stide.config";
|
||||
db_name = "default.db";
|
||||
seq_len = 6;
|
||||
max_elements = 500;
|
||||
max_streams = 100;
|
||||
pair_offset = 0;
|
||||
add_output_format = "DB Size: %d\tStream: %s\tPair Number: %p\n";
|
||||
compare_output_format = "Pair Number: %p\tStream Number: %s\n";
|
||||
lf_size = 1;
|
||||
add_to_db = 0;
|
||||
output_graph = 0;
|
||||
compute_hdist = 0;
|
||||
write_db_stats = 0;
|
||||
verbose = 0;
|
||||
very_verbose = 0;
|
||||
num_fvars = 0;
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReadCommandLine() *
|
||||
* Parses the command line. Updates configuration variables. *
|
||||
* *
|
||||
* const int argc Number of arguments *
|
||||
* const char *argv[], Array of arguments *
|
||||
* Array<OptInfo> &opt_array Constant array of information about *
|
||||
* the configuration variables *
|
||||
********************************************************************/
|
||||
|
||||
void Config::ReadCommandLine(const int argc, const char *argv[],
|
||||
Array<OptInfo> &opt_array)
|
||||
{
|
||||
string var_name; // Name of variable
|
||||
string var_val; // Value of variable
|
||||
int name_type; // LONG_NAME or SHORT_NAME
|
||||
int argv_i = 1; // First index of argv
|
||||
int argv_j = 0; // Second index of argv
|
||||
|
||||
while (argv_i < argc) {
|
||||
if (argv[argv_i][argv_j] != '-') {
|
||||
cerr<< "ERROR: Switches must be preceded by a dash: "<<argv[argv_i]
|
||||
<< endl << " is illegal" << endl;
|
||||
exit(-1);
|
||||
}
|
||||
argv_j++;
|
||||
if (argv[argv_i][argv_j] == '-') { // Long name
|
||||
argv_j++;
|
||||
name_type = LONG_NAME;
|
||||
}
|
||||
else {
|
||||
name_type = SHORT_NAME;
|
||||
}
|
||||
|
||||
// Read name into var_name
|
||||
var_name = argv[argv_i]+argv_j;
|
||||
|
||||
// Now we want to read the value, if there is one.
|
||||
argv_j = 0;
|
||||
if (++argv_i < argc) {
|
||||
if (argv[argv_i][argv_j] != '-') {
|
||||
var_val = argv[argv_i];
|
||||
argv_i++;
|
||||
}
|
||||
}
|
||||
|
||||
// assign value to appropriate variable
|
||||
AssignValToVar(opt_array, var_val, var_name, name_type);
|
||||
// Blank var_name and var_val for next time around
|
||||
var_name.resize(0);
|
||||
var_val.resize(0);
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* AssignValToVar() *
|
||||
* Figures out which variable to assign a given value to and does *
|
||||
* so. Updates opt_array, to say that that particular variable *
|
||||
* has been set. *
|
||||
* *
|
||||
* Input: Array<OptInfo> &opt_array Option Information *
|
||||
* const string &var_val Value to be assigned *
|
||||
* const string &var_name Name of variable to be updated *
|
||||
* const int name_type SHORT_NAME or LONG_NAME *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::AssignValToVar(Array<OptInfo> &opt_array, const string
|
||||
&var_val, const string &var_name, const
|
||||
int name_type)
|
||||
{
|
||||
int opt_i;
|
||||
|
||||
for (opt_i = 0; opt_i < NUM_OPTS; opt_i++) {
|
||||
if (((name_type == LONG_NAME) && (opt_array[opt_i].long_name ==
|
||||
var_name)) ||
|
||||
((name_type == SHORT_NAME) && (opt_array[opt_i].short_name ==
|
||||
var_name))) {
|
||||
// If we have already set this variable and shouldn't change it,
|
||||
// don't
|
||||
if (opt_array[opt_i].set == 1) {
|
||||
break;
|
||||
}
|
||||
switch (opt_array[opt_i].type) {
|
||||
case 'f': // flag
|
||||
if ((var_val.length() == 0) || (var_val == "On") ||
|
||||
(var_val == "ON") || (var_val == "on")) {
|
||||
*(opt_array[opt_i].flag_val) = 1;
|
||||
opt_array[opt_i].set = 1;
|
||||
}
|
||||
else if ((var_val != "Off") && (var_val != "off") &&
|
||||
(var_val != "OFF")) {
|
||||
cerr << "ERROR: Illegal value for parameter " << var_name
|
||||
<< ". This parameter is a simple flag," << endl
|
||||
<< "and may be followed by \"on\", \"off\", or nothing "
|
||||
<< "(which turns it on). The current value is "
|
||||
<< var_val << ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
break;
|
||||
case 'i':
|
||||
// If there isn't a value, just use the default
|
||||
if (var_val.length() == 0) {
|
||||
break;
|
||||
}
|
||||
*(opt_array[opt_i].int_val) = atoi(var_val.c_str());
|
||||
opt_array[opt_i].set = 1;
|
||||
break;
|
||||
case 's':
|
||||
// If there is no string given, just use the default
|
||||
if (var_val.length() == 0) {
|
||||
break;
|
||||
}
|
||||
*(opt_array[opt_i].str_val) = var_val;
|
||||
opt_array[opt_i].set = 1;
|
||||
break;
|
||||
case 'h':
|
||||
WriteHelpInfo();
|
||||
} // end of switch
|
||||
return; // we've found it, so we're done
|
||||
} // end of if (opt_array[opt_i]...
|
||||
} // end of for (opt_i = 0; ...
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReadConfigFile() *
|
||||
* Parses the configuration file. Updates configuration *
|
||||
* variables. *
|
||||
* *
|
||||
* Input: Array<OptInfo> &opt_array: Option information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Config::ReadConfigFile(Array<OptInfo> &opt_array)
|
||||
{
|
||||
string var_name;
|
||||
string var_val;
|
||||
|
||||
// Set up stream for reading configuration
|
||||
ifstream cfg_file(cfg_name.c_str());
|
||||
string buff;
|
||||
int buff_i = 0; // index for buff
|
||||
int opt_i = 0; // index for opt_array
|
||||
int rev_num; // revision number of configuration file
|
||||
|
||||
if (!cfg_file.is_open()) {
|
||||
cerr<<"WARNING: Cannot open configuration file "<<cfg_name
|
||||
<<". I will continue, using the" <<endl
|
||||
<<"default values and the command line arguments." << endl
|
||||
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||
<< endl;
|
||||
return;
|
||||
}
|
||||
|
||||
// First we need to determine if the configuration file is old-style
|
||||
// or new-style, i.e., is there a #ConfigFileRev: in the first
|
||||
// line. We can determine this just by checking the first
|
||||
// character.
|
||||
char c = cfg_file.peek();
|
||||
|
||||
// Config file is empty; just return
|
||||
if (cfg_file.eof()) {
|
||||
return;
|
||||
}
|
||||
|
||||
// If old-style
|
||||
if (c != '#') {
|
||||
cerr << "WARNING: The first line of the configuration file did "
|
||||
<< "not contain the string" << endl
|
||||
<< "\"#ConfigFileRev: " << CFREV << "\"." << endl
|
||||
<< "I will assume that this is an old format configuration "
|
||||
<< "file." << endl
|
||||
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||
<< endl << endl;
|
||||
ReadOldConfigFile(cfg_file, opt_array);
|
||||
return;
|
||||
}
|
||||
|
||||
// Look for "#ConfigFileRev:"
|
||||
cfg_file >> buff;
|
||||
|
||||
if (buff != "#ConfigFileRev:") {
|
||||
cerr << "ERROR: I expected the first line of the configuration "
|
||||
<< "file to either be \"#ConfigFileRev: \" followed by the "
|
||||
<< "revision number or the beginning of an old-style "
|
||||
<< "configuration file, which does not have a comment in the "
|
||||
<< "first line. I'm confused, so I will abort..."
|
||||
<< endl << endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
cfg_file >> rev_num;
|
||||
|
||||
if (rev_num > CFREV) {
|
||||
cerr << "ERROR: This version of STIDE does not know how to deal "
|
||||
<< "with configuration files" << endl
|
||||
<< "more modern than revision " << CFREV << ". Aborting..."
|
||||
<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
if (rev_num < CFREV) {
|
||||
cerr << "ERROR: Configuration files must be revision " << CFREV
|
||||
<< " or later, " << "or an old-style" << endl
|
||||
<< "configuration file without a revision number. "
|
||||
<< "Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
// Now we know everything's as we expect, so we'll parse the file
|
||||
|
||||
while (!cfg_file.eof()) {
|
||||
// Skip white space at the beginning of the line
|
||||
while (isspace(buff[buff_i])) {
|
||||
buff_i++;
|
||||
}
|
||||
|
||||
// If buff is empty, move on to next line
|
||||
if (buff.length() <= buff_i) {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
continue;
|
||||
}
|
||||
|
||||
// If we start with a comment, move on to next line
|
||||
if (buff[buff_i] == '#') {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
continue;
|
||||
}
|
||||
// Read in variable name, up to the :
|
||||
int start_place = buff_i; // the beginning place of the name
|
||||
while (buff[buff_i] != ':' && (buff_i < buff.length())) {
|
||||
buff_i++;
|
||||
}
|
||||
if (buff_i == buff.length()) {
|
||||
cerr << "ERROR: Variable names in the configuration file must "
|
||||
<< "be followed by a colon. The line " << endl
|
||||
<< buff << endl << "contains a variable name which is not "
|
||||
<< "terminated by a colon. Aborting..." <<endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
// This assigns the values in buff between start_place and buff_i
|
||||
// to var_name
|
||||
var_name.assign(buff, start_place, buff_i - start_place);
|
||||
|
||||
// Skip colon
|
||||
buff_i++;
|
||||
|
||||
// Skip white space
|
||||
while (isspace(buff[buff_i])) { buff_i++; }
|
||||
|
||||
start_place = buff_i; // the starting place of the value
|
||||
// Find last point in value. If it starts with a quote, it ends
|
||||
// with a quote.
|
||||
if ((buff[buff_i] == '\"') && (buff_i < buff.length())) {
|
||||
buff_i++; // Step past the opening quote
while ((buff_i < buff.length()) && (buff[buff_i] != '\"')) {
|
||||
buff_i++;
|
||||
}
|
||||
// Strip off first "
|
||||
start_place++;
|
||||
}
|
||||
// Otherwise, it ends with a space, a # or the end of the line
|
||||
else {
|
||||
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||
(buff[buff_i] != '#')) {
|
||||
buff_i++;
|
||||
}
|
||||
}
|
||||
var_val.assign(buff, start_place, buff_i - start_place);
|
||||
|
||||
// Now we want to check to see if the line was continued, in which
|
||||
// case we haven't gotten the value of the variable in var_val, so
|
||||
// we still need to do that.
|
||||
if (buff[buff_i-1] == '\\') {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
while (isspace(buff[buff_i])) { buff_i++; }
|
||||
start_place = buff_i;
|
||||
// Find last point in value. If it starts with a quote, it ends with a
|
||||
// quote.
|
||||
if (buff[buff_i] == '\"') {
|
||||
buff_i++;
|
||||
while ((buff[buff_i] != '\"') && (buff_i < buff.length())) {
|
||||
buff_i++;
|
||||
}
|
||||
start_place++; // Strip off first "
|
||||
}
|
||||
// Otherwise, it ends with a space, a # or the end of the line
|
||||
else {
|
||||
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||
(buff[buff_i] != '#')) {
|
||||
buff_i++;
|
||||
}
|
||||
}
|
||||
var_val.assign(buff, start_place, buff_i - start_place);
|
||||
}
|
||||
|
||||
// assign value to appropriate variable
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
} //end of while (!cfg_file.eof())...
|
||||
|
||||
}
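The per-line rules that ReadConfigFile() applies can be illustrated in isolation. The following is a minimal sketch, not STIDE code: ParseConfigLine and IsSpace are hypothetical helpers that split one "name: value" line, skip blank and comment lines, honor quoted values, and end a bare value at white space or a '#'. It assumes a standard C++ library and deliberately leaves out the backslash line continuation that ReadConfigFile() supports.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of STIDE): safe wrapper around isspace
static bool IsSpace(char c)
{
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper (not part of STIDE): parses one "name: value" line.
// Returns false for blank lines, comment lines, and lines without a colon.
bool ParseConfigLine(const std::string &line, std::string &name,
                     std::string &value)
{
  std::string::size_type i = 0;

  // Skip white space at the beginning of the line
  while (i < line.length() && IsSpace(line[i])) {
    i++;
  }
  // Blank lines and comment lines carry no setting
  if (i == line.length() || line[i] == '#') {
    return false;
  }

  // The variable name runs up to the colon
  std::string::size_type colon = line.find(':', i);
  if (colon == std::string::npos) {
    return false;
  }
  name.assign(line, i, colon - i);

  // Skip the colon and any white space after it
  i = colon + 1;
  while (i < line.length() && IsSpace(line[i])) {
    i++;
  }

  std::string::size_type start = i;
  if (i < line.length() && line[i] == '"') {
    // Quoted value: runs to the closing quote (or the end of the line)
    start = ++i;
    while (i < line.length() && line[i] != '"') {
      i++;
    }
  }
  else {
    // Bare value: ends at white space, a '#', or the end of the line
    while (i < line.length() && !IsSpace(line[i]) && line[i] != '#') {
      i++;
    }
  }
  value.assign(line, start, i - start);
  return true;
}
```

Every index is checked against line.length() before the character is read, so the sketch does not rely on what operator[] returns at the end of the string.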
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReadOldConfigFile() *
|
||||
* Reads information from an old-style configuration file. *
|
||||
* Updates configuration variables. *
|
||||
* *
|
||||
* Input: ifstream &cfg_file Configuration file (already opened) *
|
||||
* Array<OptInfo> &opt_array: Option information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::ReadOldConfigFile(ifstream &cfg_file,
|
||||
Array<OptInfo> &opt_array)
|
||||
{
|
||||
|
||||
string buff;
|
||||
string var_name;
|
||||
string var_val;
|
||||
|
||||
var_name = "max_elements";
|
||||
cfg_file>>var_val;
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
|
||||
var_name = "max_streams";
|
||||
cfg_file>>var_val;
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
|
||||
// Next line is hash table size, but we are now figuring that out
|
||||
// dynamically, so just throw it away.
|
||||
getline(cfg_file, buff);
|
||||
|
||||
// Now read in the format string
|
||||
getline(cfg_file, var_val);
|
||||
// Put the format string in the appropriate place
|
||||
if (add_to_db) {
|
||||
var_name = "add_output_format";
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
}
|
||||
else {
|
||||
var_name = "compare_output_format";
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* CheckValues() *
|
||||
* Checks configuration values that have been read in to make *
|
||||
* sure that they are within the limits. Flags are automatically *
|
||||
* checked while being read in, the output formats are checked *
|
||||
* in InitOutputFormat(), and filenames are checked when they are *
|
||||
* opened, so all that is left is the integer values. *
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::CheckValues()
|
||||
{
|
||||
if ((lf_size < 1) || (lf_size > LF_LIM)) {
|
||||
cerr << "ERROR: lf_size must be between 1 and " << LF_LIM
|
||||
<< ". It has been set to " << lf_size << ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((seq_len < 1) || (seq_len > SEQ_LEN_LIM)) {
|
||||
cerr << "ERROR: seq_len must be between 1 and " << SEQ_LEN_LIM
|
||||
<< ". It has been set to " << seq_len << ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((max_elements < 1) || (max_elements > MAX_ELEM_LIM)) {
|
||||
cerr << "ERROR: max_elements must be between 1 and " << MAX_ELEM_LIM
|
||||
<< ". It has been set to " << max_elements
|
||||
<< ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((max_streams < 1) || (max_streams > MAX_STREAMS_LIM)) {
|
||||
cerr << "ERROR: max_streams must be between 1 and " << MAX_STREAMS_LIM
|
||||
<< ". It has been set to " << max_streams
|
||||
<< ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* InitOutputFormat() *
|
||||
* Converts the string add_output_format or compare_output_format *
|
||||
* to information filling fmt_str and num_fvars, which is more *
|
||||
* convenient for output. *
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::InitOutputFormat()
|
||||
{
|
||||
// Now we analyze add_output_format or compare_output_format
|
||||
int flag = 0;
|
||||
int f_i = 0;
|
||||
num_fvars = 0;
|
||||
string *buff;
|
||||
|
||||
// If we're not in verbose or very_verbose modes, we're never going
|
||||
// to use this information, so don't waste our time doing this
|
||||
if (!(verbose || very_verbose)) {
|
||||
return;
|
||||
}
|
||||
if (add_to_db) {
|
||||
buff = &add_output_format;
|
||||
}
|
||||
else {
|
||||
buff = &compare_output_format;
|
||||
}
|
||||
|
||||
for (int i = 0; i <(*buff).length(); i++) {
|
||||
switch ((*buff)[i]) {
|
||||
case '\\':
|
||||
i++;
|
||||
switch ((*buff)[i]) {
|
||||
case 't': fmt_str[num_fvars][f_i] = '\t'; break;
|
||||
case 'n': fmt_str[num_fvars][f_i] = '\n'; break;
|
||||
}
|
||||
break;
|
||||
case '%':
|
||||
fmt_str[num_fvars][f_i] = '%';
|
||||
flag = 1;
|
||||
break;
|
||||
default:
|
||||
fmt_str[num_fvars][f_i] = (*buff)[i];
|
||||
if (flag) {
|
||||
switch (fmt_str[num_fvars][f_i]) {
|
||||
case 'd': // database size
|
||||
case 'i': // number of last value of sequence in this
|
||||
// data stream
|
||||
case 'p': // number of last value of sequence in entire
|
||||
// input
|
||||
case 's': // external stream ID
|
||||
case 'a': // flag for whether this sequence is anomalous
|
||||
case 'c': // locality frame count of this sequence
|
||||
case 'h': // Hamming distance for this sequence
|
||||
// Record that we must write that val at that position
|
||||
write_val[num_fvars] = fmt_str[num_fvars][f_i];
|
||||
fmt_str[num_fvars][f_i] = 'd';
|
||||
fmt_str[num_fvars][f_i + 1] = '\0';
|
||||
num_fvars++;
|
||||
f_i = -1;
|
||||
flag = 0;
|
||||
break;
|
||||
default: // Unknown flag
|
||||
cerr << "ERROR: Illegal control character in output format."
|
||||
<< " Type stide -h for help." << endl;
|
||||
}
|
||||
}
|
||||
} // switch ((*buff)[i ...
|
||||
f_i++;
|
||||
}
|
||||
fmt_str[num_fvars][f_i] = '\0';
|
||||
}
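The splitting that InitOutputFormat() performs can be sketched with standard containers. This is an illustration under stated assumptions, not the STIDE implementation: SplitFormat and SplitOutputFormat are hypothetical names, and std::vector and std::string replace the fixed-size fmt_str and write_val arrays. Each piece ends in "%d" so that it can be passed to printf with a single integer, and the letter that followed '%' is recorded separately, as write_val does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type (not part of STIDE)
struct SplitFormat {
  std::vector<std::string> segments; // all but the last end in "%d"
  std::string codes;                 // codes[k]: letter after '%' in segment k
  bool ok;                           // false on an illegal control character
};

// Hypothetical helper (not part of STIDE): splits an output format string
SplitFormat SplitOutputFormat(const std::string &fmt)
{
  SplitFormat out;
  out.ok = true;
  std::string piece;

  for (std::string::size_type i = 0; i < fmt.length(); i++) {
    char ch = fmt[i];
    if (ch == '\\' && i + 1 < fmt.length()) {
      // Backslash followed by t or n, written out as two characters
      char next = fmt[++i];
      if (next == 't') piece += '\t';
      else if (next == 'n') piece += '\n';
      // any other escape is dropped in this sketch
    }
    else if (ch == '%') {
      // A '%' must be followed by one of the known value letters
      if (i + 1 >= fmt.length() ||
          std::string("dipsach").find(fmt[i + 1]) == std::string::npos) {
        out.ok = false;
        return out;
      }
      out.codes += fmt[++i];
      out.segments.push_back(piece + "%d");
      piece.clear();
    }
    else {
      piece += ch;
    }
  }
  // Whatever follows the last value is a literal tail
  out.segments.push_back(piece);
  return out;
}
```

Unlike the fixed arrays, the containers grow as needed, so a long format string cannot overrun a buffer. The sketch also stops at the first illegal control character instead of continuing.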
|
||||
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* OutputConfigInfo() *
|
||||
* Writes information about the final configuration to standard *
|
||||
* output. Does so in a format that could be used as a *
|
||||
* configuration file. Changes no values anywhere. *
|
||||
* *
|
||||
* Input: const Array<OptInfo> &opt_array Option Information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::OuputConfigInfo(const Array<OptInfo> &opt_array) const
|
||||
{
|
||||
cout<<"This run was configured using configuration file "
|
||||
<< cfg_name << " and command" << endl
|
||||
<< "line arguments. The configuration values were as "
|
||||
<< "follows." << endl
|
||||
<<"#ConfigFileRev: " << CFREV << endl;
|
||||
for (int i = 0; i < NUM_OPTS; i++) {
|
||||
if (opt_array[i].type == 'i') {
|
||||
cout << opt_array[i].long_name << ": " << *(opt_array[i].int_val)
|
||||
<< endl;
|
||||
}
|
||||
if ((opt_array[i].type == 's') &&
|
||||
((add_to_db && (opt_array[i].short_name == "aof")) ||
|
||||
(!add_to_db && (opt_array[i].short_name == "cof")))) {
|
||||
cout << opt_array[i].long_name << ": \"" << *(opt_array[i].str_val)
|
||||
<< "\"" << endl;
|
||||
}
|
||||
if (opt_array[i].type == 'f') {
|
||||
if (*(opt_array[i].int_val) == 1) {
|
||||
cout << opt_array[i].long_name << ": On" << endl;
|
||||
}
|
||||
if (*(opt_array[i].int_val) == 0) {
|
||||
cout << opt_array[i].long_name << ": Off" << endl;
|
||||
}
|
||||
}
|
||||
}
|
||||
cout << endl << endl;
|
||||
|
||||
// Now print header for verbose modes
|
||||
if (verbose || very_verbose) {
|
||||
cout<<endl<<"Variables in output: "<<endl;
|
||||
for (int j = 0; j < num_fvars; j++) {
|
||||
switch (write_val[j]) {
|
||||
case 's': cout<<"stream #, "; break;
|
||||
case 'i': cout<<"index #, "; break;
|
||||
case 'h': if (compute_hdist) {cout<<"hamming miss, "; } break;
|
||||
case 'c': if (lf_size > 1) {cout<<"lfc, "; } break;
|
||||
case 'p': cout<<"pair #, "; break;
|
||||
case 'd': cout<<"db size, "; break;
|
||||
case 'a': cout<<"is anomalous?, "; break;
|
||||
}
|
||||
}
|
||||
cout<<endl;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* WriteHelpInfo() *
|
||||
* Writes help information to standard output. Changes no values.*
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Config::WriteHelpInfo() const
|
||||
{
|
||||
cout<<"STIDE accepts calls of the form:"<<endl
|
||||
<<" stide -c cfg_name -d db_name -e max_num_elements"
|
||||
<<" -lf lf_size -l seq_len"<<endl<<" -n max_num_streams"
|
||||
<<" -p pair_num_offset -aof add_out_format "
|
||||
<< endl << " -cof comp_out_format -a -g -h -m -s -v -V"
|
||||
<< endl << endl;
|
||||
cout<<"STIDE expects input to come through standard input in"
|
||||
<<" the format of a pair"<<endl
|
||||
<<"of integers per line, where the first integer is a"
|
||||
<<" stream identifier"<<endl
|
||||
<<"and the second is a data element. Command line"
|
||||
<<" arguments override"<<endl
|
||||
<<"specifications in the configuration file. All"
|
||||
<<" parameters are optional"<<endl
|
||||
<<"and can be specified in any order. Parameters"
|
||||
<<" are always preceded by a"<<endl
|
||||
<<"switch. The switches are:"<<endl<<endl;
|
||||
cout<<"-a Add to database; defaults to off"<<endl;
|
||||
cout<<"-c cfg_name The name of file containing the"
|
||||
<<" configuration;"<<endl
|
||||
<<" defaults to \"stide.config\""<<endl;
|
||||
cout<<"-d db_name The name of the file containing"
|
||||
<<" the database;"<<endl
|
||||
<<" defaults to \"default.db\""<<endl;
|
||||
cout<<"-lf lf_size The size of the locality frame;"
|
||||
<<" defaults to 1"<<endl;
|
||||
cout<<"-g Write graphing data in dot format to"
|
||||
<<" db_name.dot;"<<endl
|
||||
<<" defaults to off"<<endl;
|
||||
cout<<"-h Help; displays this information"<<endl;
|
||||
cout<<"-l seq_len Length of sequence; defaults to 6"
|
||||
<<endl;
|
||||
cout<<"-p pair_offset Offset for pair number count;"
|
||||
<<" defaults to 0"<<endl;
|
||||
cout<<"-s Display db stats; defaults to off"
|
||||
<<endl;
|
||||
cout<<"-v Verbose mode on; defaults to off"<<endl;
|
||||
cout<<"-V Very verbose mode on; defaults to off"<<endl;
|
||||
cout<<"-hd Compute Hamming distance measures;"
|
||||
<<" defaults to off"<<endl;
|
||||
cout<<"-me max_elements Maximum number of different"
|
||||
<<" elements"<<endl
|
||||
<<" in the input stream; defaults to"
|
||||
<<" 500" <<endl;
|
||||
cout<<"-ms max_num_streams Maximum number of different"
|
||||
<<" streams in input;"<<endl
|
||||
<<" defaults to 100"<<endl;
|
||||
cout<<"-aof add_out_format Format for output when adding to"
|
||||
<<" database"<<endl
|
||||
<<" in verbose or very_verbose"
|
||||
<<" modes; defaults to"<<endl
|
||||
<<" \"DB Size: %d\\tStream: "
|
||||
<<"%s\\tPair Number: %p\\n\""<<endl;
|
||||
cout<<"-cof compare_out_format Format for output when comparing"
|
||||
<<" with database"<<endl
|
||||
<<" in verbose or very_verbose modes;"
|
||||
<<" defaults to"<<endl
|
||||
<<" \"Pair Number: %p\\tStream"
|
||||
<<" Number: %s\\n\""<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
|
||||
|
68
11/wywolania/Data/stide_v1.1/Seq-code/seq_config.h
Executable file
@ -0,0 +1,68 @@
|
||||
#ifndef __SEQ_CONFIG_H
|
||||
#define __SEQ_CONFIG_H
|
||||
|
||||
#define CFREV 1
|
||||
|
||||
#include <iostream.h>
|
||||
#include <fstream.h>
|
||||
#include <string>
|
||||
#include "opt_info.h"
|
||||
|
||||
class Config {
|
||||
public:
|
||||
Config(const int argc, const char *argv[]); // Constructor; reads
|
||||
// configuration file and command
|
||||
// line arguments
|
||||
string cfg_name; // Name of configuration file
|
||||
string db_name; // Name of database
|
||||
int seq_len; // Sequence Length
|
||||
int max_elements; // Maximum number of different
|
||||
// data elements we may encounter
|
||||
int max_streams; // Maximum number of different
|
||||
// streams we may encounter
|
||||
int pair_offset; // Number by which to offset
|
||||
// num_pairs_read
|
||||
string add_output_format; // Format for verbose-mode output
|
||||
// when adding to database
|
||||
string compare_output_format; // Format for verbose-mode output
|
||||
// when comparing with an
|
||||
// existing database
|
||||
int lf_size; // Size of locality frames: 1
|
||||
// effectively means don't
|
||||
// compute locality frames
|
||||
int add_to_db; // Flag indicating that we should
|
||||
// add to the database rather
|
||||
// than make comparisons
|
||||
int output_graph; // Output graphing information in
|
||||
// Dot format
|
||||
int compute_hdist; // Compute Hamming distance
|
||||
int write_db_stats; // Write statistics about the
|
||||
// database
|
||||
int verbose; // Output information about each
|
||||
// anomaly or each new sequence
|
||||
// added to the database
|
||||
int very_verbose; // Output information about each
|
||||
// sequence encountered
|
||||
char fmt_str[10][50]; // String used for outputting
|
||||
// information in verbose mode
|
||||
char write_val[7]; // Do we write the value? used
|
||||
// with fmt_str
|
||||
int num_fvars; // Number of format variables
|
||||
|
||||
void InitOptArray(Array<OptInfo> &opt_array);
|
||||
void SetDefaults();
|
||||
void ReadCommandLine(const int argc, const char *argv[],
|
||||
Array<OptInfo> &opt_array);
|
||||
void AssignValToVar(Array<OptInfo> &opt_array, const
|
||||
string &var_val, const string
|
||||
&var_name, const int name_type);
|
||||
void ReadConfigFile(Array<OptInfo> &opt_array);
|
||||
void ReadOldConfigFile(ifstream &cfg_file,
|
||||
Array<OptInfo> &opt_array);
|
||||
void InitOutputFormat();
|
||||
void CheckValues();
|
||||
void OuputConfigInfo(const Array<OptInfo> &opt_array) const;
|
||||
void WriteHelpInfo() const;
|
||||
};
|
||||
|
||||
#endif
|
358
11/wywolania/Data/stide_v1.1/Seq-code/seq_stream.cc
Executable file
@ -0,0 +1,358 @@
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <iostream.h>
|
||||
#include <fstream.h>
|
||||
#include <stdio.h>
|
||||
#include "../Utils/hash.h"
|
||||
#include "seq_stream.h"
|
||||
|
||||
/********************************************************************
|
||||
* Init() *
|
||||
* Initializes an instance of Stream. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int intern_id internal stream identifier *
|
||||
* const int extern_id external stream identifier *
|
||||
* Output: none *
|
||||
*******************************************************************/
|
||||
|
||||
|
||||
void Stream::Init(const Config &cfg,
|
||||
const int intern_id, const int extern_id) {
|
||||
// initialize all the arrays
|
||||
current_seq.Allocate(cfg.seq_len);
|
||||
current_seq.Set(-1); // initialize the array to be empty
|
||||
num_in_seq = -1;
|
||||
num_pairs_read = 0;
|
||||
num_anoms = 0;
|
||||
num_seqs_fnd = 0;
|
||||
int_sid = intern_id;
|
||||
ext_sid = extern_id;
|
||||
max_hdist = 0;
|
||||
seq_hdist = 0;
|
||||
lf.Allocate(cfg.lf_size);
|
||||
lf.Set(0);
|
||||
seq_lfc = 0;
|
||||
max_lfc = 0;
|
||||
ready = 0;
|
||||
seq_len = cfg.seq_len;
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* Append() *
|
||||
* This function puts the integer given into the current_seq array *
|
||||
* as the last element. It flags ready according to whether *
|
||||
* current_seq is full. Updates num_in_seq, ready, current_seq, *
|
||||
* num_seqs_fnd, and num_pairs_read. *
|
||||
* *
|
||||
* Input: const int new_value The next value to be put into the *
|
||||
* current_seq array *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::Append(const int new_value)
|
||||
{
|
||||
// missing system call - zero the current sequence
|
||||
if (new_value == -1) {
|
||||
num_in_seq = -1;
|
||||
ready = 0;
|
||||
}
|
||||
else {
|
||||
num_pairs_read++;
|
||||
if (num_in_seq < seq_len - 1) { // window not yet full
|
||||
num_in_seq++;
|
||||
current_seq[num_in_seq] = new_value;
|
||||
if (num_in_seq == seq_len - 1) {
|
||||
ready = 1;
|
||||
++num_seqs_fnd;
|
||||
}
|
||||
}
|
||||
|
||||
else {
|
||||
// Roll over current_seq array
|
||||
for (int k = 0; k < num_in_seq; k++) {
|
||||
current_seq[k] = current_seq[k + 1];
|
||||
}
|
||||
current_seq[num_in_seq] = new_value;
|
||||
++num_seqs_fnd;
|
||||
}
|
||||
}
|
||||
}
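The window handling in Append() can be sketched on its own. This is an illustration only: SlidingWindow is a hypothetical class, std::vector stands in for the Array<int> template from ../Utils/arrays.h, and the counters num_pairs_read and num_seqs_fnd are omitted. As in Append(), a value of -1 marks a gap in the input and empties the window.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in (not part of STIDE) for the window kept by Stream
class SlidingWindow {
public:
  explicit SlidingWindow(int seq_len) : seq_(seq_len, -1), filled_(0)
  {
    assert(seq_len >= 1); // CheckValues() enforces the same lower bound
  }

  // Adds one value; returns true once the window holds a full sequence.
  bool Append(int value)
  {
    if (value == -1) { // gap marker: start a fresh window
      filled_ = 0;
      return false;
    }
    if (filled_ < seq_.size()) {
      // Window not yet full: store the value in the next free slot
      seq_[filled_++] = value;
    }
    else {
      // Window full: shift left by one and put the new value at the end
      for (std::vector<int>::size_type k = 0; k + 1 < seq_.size(); k++) {
        seq_[k] = seq_[k + 1];
      }
      seq_.back() = value;
    }
    return filled_ == seq_.size();
  }

  const std::vector<int> &Seq() const { return seq_; }

private:
  std::vector<int> seq_;
  std::vector<int>::size_type filled_;
};
```

Shifting costs seq_len assignments per value, which matches the original loop. A circular index would avoid the copy but would make the sequence harder to hand to a tree lookup.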
|
||||
|
||||
/********************************************************************
|
||||
* AddToDB() *
|
||||
* *
|
||||
* Adds current_seq to the database if it isn't already there; *
|
||||
* Returns 0 if it is already there, 1 if it is new. Updates *
|
||||
* normal and db_size. *
|
||||
* *
|
||||
* Input: SeqForest &normal Forest of normal sequences *
|
||||
* int &db_size Number of unique sequences in the *
|
||||
* database *
|
||||
* const int total_pairs_read Number of pairs read from the *
|
||||
* entire input stream *
|
||||
* const Config &cfg Configuration Information *
|
||||
* Output: 0 if sequence isn't new, 1 if it is *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
int Stream::AddToDB(SeqForest &normal, int &db_size, const int
|
||||
total_pairs_read, const Config &cfg) const
|
||||
{
|
||||
int is_new;
|
||||
|
||||
// If there is not a tree with the same root as this sequence has,
|
||||
// make a new tree with that root and flag trees_found
|
||||
if (!normal.trees_found[current_seq[0]]) {
|
||||
normal.trees[current_seq[0]].SetRoot(current_seq[0]);
|
||||
normal.trees_found[current_seq[0]] = 1;
|
||||
}
|
||||
// Try to add the sequence. If it's already there, is_new will be
|
||||
// set to 0, otherwise it will be set to 1.
|
||||
is_new = normal.trees[current_seq[0]].InsertSeq(current_seq, 0,
|
||||
seq_len-1);
|
||||
db_size += is_new;
|
||||
if ((is_new && cfg.verbose) || cfg.very_verbose) {
|
||||
ReportNewSeq(cfg, total_pairs_read, db_size);
|
||||
}
|
||||
if (is_new)
|
||||
return 1;
|
||||
else
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* CompareSeq() *
|
||||
* Compares the current sequence in this stream to the database, *
|
||||
* in the manner indicated by the configuration file. Reports *
|
||||
* on anomalies if told to by the configuration file. Updates *
|
||||
* num_anoms, seq_hdist, max_hdist, seq_lfc, and max_lfc. *
|
||||
* *
|
||||
* Input: const Config &cfg: Information from configuration file *
|
||||
* const SeqForest &normal: DB of normal sequences *
|
||||
* const int total_pairs_read: Number of pairs read from *
|
||||
* all of the streams *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::CompareSeq(const Config &cfg, const SeqForest &normal,
|
||||
const int total_pairs_read)
|
||||
{
|
||||
int is_anom; // flag to indicate whether current_seq is an anomaly
|
||||
|
||||
is_anom = ComputeMisses(normal);
|
||||
if ((is_anom) && (cfg.compute_hdist)) {
|
||||
ComputeHDist(normal);
|
||||
}
|
||||
if (cfg.lf_size > 1) {
|
||||
ComputeLF(is_anom, cfg.lf_size);
|
||||
}
|
||||
// if we're in verbose mode and either current_seq is an anomaly or
|
||||
// its locality frame contains an anomaly, report it
|
||||
if ((cfg.very_verbose) || (cfg.verbose && (is_anom || seq_lfc))) {
|
||||
ReportSeq(cfg, total_pairs_read, is_anom);
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeMisses() *
|
||||
* Compares the current sequence to the database sequences. If *
|
||||
* there is an exact match, we return 0. Otherwise we return 1. *
|
||||
* Updates num_anoms and seq_hdist. *
|
||||
* *
|
||||
* Input: const SeqForest &normal: DB of normal sequences *
|
||||
* Output: 0 if there is an exact match *
|
||||
* 1 if the sequence is anomalous *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
int Stream::ComputeMisses(const SeqForest &normal)
|
||||
{
|
||||
if (normal.IsSeqInForest(current_seq, seq_len)) {
|
||||
seq_hdist = 0;
|
||||
return(0);
|
||||
}
|
||||
|
||||
// We have an anomaly
|
||||
++num_anoms;
|
||||
return(1);
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeHDist() *
|
||||
* Compares the current sequence in this stream to each sequence *
|
||||
* in the database in turn, adding up the number of mismatches *
|
||||
* between the two sequences. The smallest difference between *
|
||||
* the current sequence and the database sequences is the minimum *
|
||||
* Hamming distance for the current sequence. If this minimum *
|
||||
* Hamming distance is greater than the largest minimum Hamming *
|
||||
* distance encountered so far, then the variable max_hdist is *
|
||||
* updated. Updates seq_hdist and max_hdist. *
|
||||
* *
|
||||
* Input: const SeqForest &normal: DB of normal sequences *
|
||||
* *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ComputeHDist(const SeqForest &normal)
|
||||
{
|
||||
int misses_on_this_seq; // the number of mismatches between
|
||||
// current_seq and the sequence we're
|
||||
// comparing it with at the moment
|
||||
seq_hdist = seq_len; // start with seq_hdist as high as
|
||||
// possible
|
||||
|
||||
// We compare current_seq with each sequence in our database tree
|
||||
for (int i = 0; i < normal.trees.Size(); i++) {
|
||||
// Have we seen any sequences starting with element i? If not, we
|
||||
// can go on to consider sequences starting with element i+1.
|
||||
if (normal.trees_found[i]) {
|
||||
misses_on_this_seq =
|
||||
normal.trees[i].ComputeHDistForTree(current_seq, 0,
|
||||
seq_len-1);
|
||||
if (misses_on_this_seq < seq_hdist) {
|
||||
seq_hdist = misses_on_this_seq;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (seq_hdist > max_hdist) {
|
||||
max_hdist = seq_hdist;
|
||||
}
|
||||
}
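The quantity that ComputeHDist() stores in seq_hdist can be shown with a simpler data structure. The sketch below is an assumption-laden illustration: it searches a flat list of sequences rather than the SeqForest trees used by STIDE, and Seq, HammingDistance, and MinHammingDistance are hypothetical names. The result starts at the sequence length, the largest possible distance, just as seq_hdist starts at seq_len.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> Seq;

// Number of positions at which a and b differ (equal lengths assumed)
int HammingDistance(const Seq &a, const Seq &b)
{
  assert(a.size() == b.size());
  int misses = 0;
  for (std::size_t k = 0; k < a.size(); k++) {
    if (a[k] != b[k]) {
      misses++;
    }
  }
  return misses;
}

// Smallest distance from cur to any sequence in db.
// Returns cur.size() when db is empty.
int MinHammingDistance(const Seq &cur, const std::vector<Seq> &db)
{
  int best = static_cast<int>(cur.size()); // start as high as possible
  for (std::size_t n = 0; n < db.size(); n++) {
    int d = HammingDistance(cur, db[n]);
    if (d < best) {
      best = d;
    }
  }
  return best;
}
```

A flat scan visits every stored sequence in full, whereas a tree can share the comparison of common prefixes. The value computed is the same minimum.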
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeLF() *
|
||||
* Computes the number of misses in current_seq's locality frame. *
|
||||
* Updates lf, seq_lfc and max_lfc. *
|
||||
* *
|
||||
* Input: const int is_anom Flag to indicate whether *
|
||||
* current_seq is an anomaly *
|
||||
* const int lf_size Size of locality frame *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Stream::ComputeLF(const int is_anom, const int lf_size)
|
||||
{
|
||||
// When num_seqs_fnd is at most lf_size, the locality frame
|
||||
// array is not full
|
||||
if (num_seqs_fnd <= lf_size) {
|
||||
lf[num_seqs_fnd-1] = is_anom;
|
||||
seq_lfc += is_anom;
|
||||
}
|
||||
else {
|
||||
// We're about to remove the first element of lf; since seq_lfc is
|
||||
// the sum of the elements of lf, we should subtract lf[0] from
|
||||
// seq_lfc to remove it from the sum.
|
||||
seq_lfc -= lf[0];
|
||||
// Now we add is_anom and seq_lfc is the sum of the new locality
|
||||
// frame.
|
||||
seq_lfc += is_anom;
|
||||
|
||||
// roll over the array
|
||||
for (int i = 0; i < lf_size-1; i++) {
|
||||
lf[i] = lf[i+1];
|
||||
}
|
||||
lf[lf_size-1] = is_anom;
|
||||
}
|
||||
if (seq_lfc > max_lfc) {
|
||||
max_lfc = seq_lfc;
|
||||
}
|
||||
}
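The bookkeeping in ComputeLF() amounts to a running sum over the last lf_size anomaly flags. A minimal sketch follows, assuming a std::deque in place of the Array<int> that Stream uses. LocalityFrame is a hypothetical class, not part of STIDE.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in (not part of STIDE) for lf, seq_lfc and max_lfc
class LocalityFrame {
public:
  explicit LocalityFrame(int lf_size) : size_(lf_size), count_(0), max_(0)
  {
    assert(lf_size >= 1);
  }

  // Records one sequence's anomaly flag; returns the updated frame count.
  int Add(bool is_anom)
  {
    if (static_cast<int>(flags_.size()) == size_) {
      // Frame is full: the oldest flag leaves the sum
      count_ -= flags_.front();
      flags_.pop_front();
    }
    flags_.push_back(is_anom ? 1 : 0);
    count_ += flags_.back();
    if (count_ > max_) {
      max_ = count_;
    }
    return count_;
  }

  int Max() const { return max_; }

private:
  std::deque<int> flags_; // the most recent anomaly flags, oldest first
  int size_;              // capacity of the frame
  int count_;             // sum of the flags currently in the frame
  int max_;               // largest count seen so far
};
```

Dropping the oldest flag from the front of a deque takes constant time, so no element-by-element shift is needed.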
|
||||
|
||||
/*********************************************************************
|
||||
* ReportSeq() *
|
||||
* This function reports data about a sequence. Specifically, it *
|
||||
* can report the external stream id, a number indicating where *
|
||||
* the first element of the current sequence occurs in the input, *
|
||||
* a number indicating how many pairs from this particular data *
|
||||
* stream have been read prior to the first element of the *
|
||||
* sequence, the minimum Hamming distance for the current *
|
||||
* sequence, the locality frame count, *
|
||||
* and whether this particular sequence is itself an anomaly (it *
|
||||
* could be that some other sequence in its locality frame is *
|
||||
* anomalous). The configuration file determines which of those *
|
||||
* possible data are reported and in what format. Updates no *
|
||||
* values. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int total_pairs_read Total number of pairs read *
|
||||
* from the input stream from any data *
|
||||
* stream, not just this one *
|
||||
* const int is_anom flag for whether the current *
|
||||
* sequence is itself an anomaly *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ReportSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int is_anom) const
|
||||
{
|
||||
for (int i = 0; i < cfg.num_fvars; i++) {
|
||||
switch (cfg.write_val[i]) {
|
||||
case 'a':
|
||||
printf(cfg.fmt_str[i], is_anom); break;
|
||||
case 'c':
|
||||
if (cfg.lf_size > 1) {
|
||||
printf(cfg.fmt_str[i], seq_lfc);
|
||||
}
|
||||
break;
|
||||
case 'h':
|
||||
if (cfg.compute_hdist) {
|
||||
printf(cfg.fmt_str[i], seq_hdist);
|
||||
}
|
||||
break;
|
||||
case 'i':
|
||||
printf(cfg.fmt_str[i], num_pairs_read); break;
|
||||
case 'p':
|
||||
printf(cfg.fmt_str[i], total_pairs_read); break;
|
||||
case 's':
|
||||
printf(cfg.fmt_str[i], ext_sid); break;
|
||||
}
|
||||
}
|
||||
printf("%s", cfg.fmt_str[cfg.num_fvars]);
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReportNewSeq() *
|
||||
* This function reports on sequences which have been newly added *
|
||||
* to the database. It can report the external stream *
|
||||
* identifier, where the first element of the sequence occurs *
|
||||
* both within the whole input stream and within its own data *
|
||||
* stream, and the number of unique sequences in the database *
|
||||
* after this sequence has been added. The configuration file *
|
||||
* determines which of those possible data are reported and in *
|
||||
* what format. Updates no values. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int total_pairs_read Total number of pairs read *
|
||||
* from the input stream from any data *
|
||||
* stream, not just this one *
|
||||
* const int db_size Number of unique sequences *
|
||||
* in the database *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ReportNewSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int db_size) const
|
||||
{
|
||||
for (int i = 0; i < cfg.num_fvars; i++) {
|
||||
switch (cfg.write_val[i]) {
|
||||
case 'd':
|
||||
printf(cfg.fmt_str[i], db_size); break;
|
||||
case 'i':
|
||||
printf(cfg.fmt_str[i], num_pairs_read); break;
|
||||
case 'p':
|
||||
printf(cfg.fmt_str[i], total_pairs_read); break;
|
||||
case 's':
|
||||
printf(cfg.fmt_str[i], ext_sid); break;
|
||||
}
|
||||
}
|
||||
printf(cfg.fmt_str[cfg.num_fvars]);
|
||||
}
|
||||
|
||||
|
||||
|
61
11/wywolania/Data/stide_v1.1/Seq-code/seq_stream.h
Executable file
@ -0,0 +1,61 @@
|
||||
#ifndef __STREAM_H
|
||||
#define __STREAM_H
|
||||
|
||||
#include "../Utils/arrays.h"
|
||||
#include "seq_config.h"
|
||||
#include "flexitree.h"
|
||||
|
||||
class Stream {
|
||||
public:
|
||||
Stream() {};
|
||||
void Init(const Config &cfg, const int intern_id, const int
|
||||
extern_id);
|
||||
void Append(const int next_value);
|
||||
int AddToDB(SeqForest &normal, int &db_size, int total_pairs_read,
|
||||
const Config &cfg) const;
|
||||
void CompareSeq(const Config &cfg, const SeqForest &normal, const
|
||||
int total_pairs_read);
|
||||
int GetMaxHDist(void) {return max_hdist;}
|
||||
int GetMaxLFC(void) {return max_lfc;}
|
||||
int Ready(void) {return ready;}
|
||||
int GetNumAnoms(void) {return num_anoms;}
|
||||
int GetNumPairsRead(void) {return num_pairs_read;}
|
||||
int GetNumSeqsFnd(void) {return num_seqs_fnd;}
|
||||
private:
|
||||
Array<int> current_seq; // current sequence being filled or
|
||||
// processed
|
||||
int num_in_seq; // current_seq is full up through
|
||||
// num_in_seq
|
||||
int num_pairs_read; // the number of input pairs belonging to
|
||||
// this stream that have been read so far
|
||||
int num_anoms; // the number of anomalies found so far
|
||||
int num_seqs_fnd; // the number of (not necessarily unique)
|
||||
// sequences belonging to this stream
|
||||
// found so far
|
||||
int ext_sid; // the external stream id
|
||||
int int_sid; // the internal stream id
|
||||
int max_hdist; // the largest minimum Hamming distance
|
||||
// found in this stream
|
||||
int seq_hdist; // the minimum Hamming distance for
|
||||
// current_seq
|
||||
Array<int> lf; // array for locality frame
|
||||
int seq_lfc; // the locality frame count for this
|
||||
// sequence
|
||||
int max_lfc; // the largest locality frame count
|
||||
// encountered so far
|
||||
int ready; // a flag to indicate whether this stream
|
||||
// has a full sequence ready to be
|
||||
// processed. 0 = no, 1 = yes.
|
||||
int seq_len; // sequence length
|
||||
int ComputeMisses(const SeqForest &normal);
|
||||
void ComputeHDist(const SeqForest &normal);
|
||||
void ComputeLF(const int is_anom, const int lf_size);
|
||||
void ReportSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int is_anom) const;
|
||||
void ReportNewSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int db_size) const;
|
||||
};
|
||||
|
||||
#endif
|
||||
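The `max_hdist` and `seq_hdist` members above track a "minimum Hamming distance" between the current sequence and the database of normal sequences. `ComputeHDist()` itself is not shown in this part of the commit, so the following is only a minimal sketch of that measure over plain vectors. The names `HammingDistance` and `MinHammingDistance`, and the `-1` return value for an empty database, are illustrative assumptions and not part of STIDE.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Number of positions at which two sequences differ (compared up to the
// length of the shorter one; STIDE sequences all share one length).
int HammingDistance(const std::vector<int>& a, const std::vector<int>& b) {
    int dist = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        if (a[i] != b[i]) ++dist;
    }
    return dist;
}

// Smallest Hamming distance between seq and any sequence in db.
// Returns -1 for an empty database (hypothetical convention for this sketch).
int MinHammingDistance(const std::vector<int>& seq,
                       const std::vector<std::vector<int>>& db) {
    if (db.empty()) return -1;
    int best = std::numeric_limits<int>::max();
    for (const auto& normal : db) {
        best = std::min(best, HammingDistance(seq, normal));
    }
    return best;
}
```

A distance of 0 means the sequence is already in the database. The real code works on a `SeqForest` rather than a flat list, so it can presumably prune the search instead of scanning every sequence.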
|
||||
|
574
11/wywolania/Data/stide_v1.1/Seq-code/stide.cc
Executable file
@ -0,0 +1,574 @@
|
||||
/*********************************************************************
|
||||
* *
|
||||
* STIDE: Sequence Time-Delay Embedding v1.1 *
|
||||
* *
|
||||
* Written by Steve Hofmeyr 7/21/96 *
|
||||
* Revised by Julie Rehmeyer 3/98 *
|
||||
* *
|
||||
* Copyright (C) 1996, 1998 Regents of the University of New Mexico. *
|
||||
* All Rights Reserved. *
|
||||
* *
|
||||
* This program is free software; you can redistribute it and/or *
|
||||
* modify it under the terms of the GNU General Public License as *
|
||||
* published by the Free Software Foundation; either version 2 of *
|
||||
* the License, or (at your option) any later version. *
|
||||
* *
|
||||
* This program is distributed in the hope that it will be useful, *
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||
* GNU General Public License for more details. *
|
||||
* *
|
||||
* You should have received a copy of the GNU General Public *
|
||||
* License along with this program; if not, write to the Free *
|
||||
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, *
|
||||
* USA. *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <iostream.h>
|
||||
#include <fstream.h>
|
||||
#include "../Utils/arrays.h"
|
||||
#include "../Utils/hash.h"
|
||||
#include "seq_config.h"
|
||||
#include "seq_stream.h"
|
||||
#include "flexitree.h"
|
||||
|
||||
#define DBREV 1
|
||||
|
||||
int counter = 0;
|
||||
|
||||
Stream *GetReadyStream(Array<Stream> &streams, HashTableInt
|
||||
&sid_table, int &num_streams_fnd, int
|
||||
&total_pairs_read, const Config &cfg);
|
||||
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||
int &seq_len);
|
||||
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||
int db_size, const int seq_len);
|
||||
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||
num_streams_fnd, const int num_seqs_added, const
|
||||
Array<Stream> &streams, const int db_size);
|
||||
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||
const int db_size);
|
||||
void OutputGraph(const SeqForest &db_forest, string db_name);
|
||||
int GetPrimeLargerThan(const int n);
|
||||
|
||||
/*********************************************************************
|
||||
* main() *
|
||||
* Input: int argc: Number of command-line arguments *
|
||||
* char *argv[]: array of strings containing *
|
||||
* command-line arguments *
|
||||
* Output: 0 if successful, -1 if unsuccessful *
|
||||
*********************************************************************/
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
|
||||
{
|
||||
Config cfg((const int) argc, (const char **) argv);
|
||||
// Declare configuration object and do
|
||||
// the configuration on the basis of the
|
||||
// command line arguments and the
|
||||
// configuration file
|
||||
Stream *active_stream; // This will point to the stream that
|
||||
// currently has a sequence to be worked
|
||||
// on (either added to the database or
|
||||
// compared).
|
||||
HashTableInt sid_table(GetPrimeLargerThan(cfg.max_streams));
|
||||
// Hash table relating external stream ids to
|
||||
// internal sids; make size of table
|
||||
// smallest prime larger than the number
|
||||
// of streams
|
||||
SeqForest normal(cfg.max_elements); // Uninitialized forest of
|
||||
// normal sequences
|
||||
Array<Stream> streams(cfg.max_streams); // Array of stream objects,
|
||||
// one for each data stream
|
||||
// in input, which are
|
||||
// allocated as needed
|
||||
int num_streams_fnd = 0; // Number of data streams
|
||||
// encountered to date
|
||||
int total_pairs_read = cfg.pair_offset; // Number of pairs read from
|
||||
// input to date from all
|
||||
// the data streams combined
|
||||
// -- can be offset using
|
||||
// the "-n" switch
|
||||
int db_size; // Total number of unique
|
||||
// sequences in the database
|
||||
int init_db_size = 0; // Number of unique
|
||||
// sequences in the
|
||||
// pre-existing database
|
||||
|
||||
|
||||
|
||||
|
||||
// Read database into normal, if database exists
|
||||
db_size = init_db_size = ReadDB(normal, cfg.db_name, cfg.seq_len);
|
||||
|
||||
if (cfg.add_to_db) {
|
||||
while ((active_stream =
|
||||
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||
total_pairs_read, cfg))
|
||||
!= NULL) {
|
||||
active_stream->AddToDB(normal, db_size, total_pairs_read, cfg);
|
||||
}
|
||||
WriteDB(normal, cfg.db_name, db_size, cfg.seq_len);
|
||||
if (cfg.output_graph) {
|
||||
OutputGraph(normal,cfg.db_name);
|
||||
}
|
||||
|
||||
}
|
||||
else {
|
||||
int i = 0;
|
||||
while ((active_stream =
|
||||
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||
total_pairs_read, cfg))
|
||||
!= NULL) {
|
||||
active_stream->CompareSeq(cfg, normal, total_pairs_read);
|
||||
}
|
||||
}
|
||||
FinalReport(cfg, normal, num_streams_fnd, db_size - init_db_size,
|
||||
streams, db_size);
|
||||
return(0);
|
||||
}
|
||||
|
||||
/**********************************************************************
|
||||
* GetReadyStream() *
|
||||
* This function reads a pair from the input, appends the element *
|
||||
* to the current sequence string in the appropriate data stream, *
|
||||
* finds out if that data stream has a complete sequence to be *
|
||||
* processed, continues until it has found such a data stream, and *
|
||||
* returns a pointer to it. It updates num_streams_fnd, *
|
||||
* total_pairs_read, sid_table, and streams. *
|
||||
* *
|
||||
* Input: Array<Stream> &streams: the array of streams that we have *
|
||||
* found so far *
|
||||
* HashTableInt &sid_table: hash table relating external sids *
|
||||
* to internal sids *
|
||||
* int &num_streams_fnd: the number of streams found so far; *
|
||||
* int &total_pairs_read: the number of pairs read from the *
|
||||
* input stream so far *
|
||||
* const Config &cfg: configuration information *
|
||||
* *
|
||||
* Output: a pointer to the next stream that is ready for processing *
|
||||
**********************************************************************/
|
||||
|
||||
Stream *GetReadyStream(Array<Stream> &streams, HashTableInt
|
||||
&sid_table, int &num_streams_fnd, int
|
||||
&total_pairs_read, const Config &cfg)
|
||||
|
||||
{
|
||||
Stream *ready_stream = NULL;
|
||||
int ext_sid;
|
||||
int int_sid;
|
||||
int sval;
|
||||
|
||||
cin >> ext_sid;
|
||||
while (!cin.eof()) {
|
||||
if (ext_sid == -1) {
|
||||
break;
|
||||
}
|
||||
int_sid = sid_table.ExtToInt(ext_sid, num_streams_fnd);
|
||||
cin >> sval;
|
||||
++total_pairs_read;
|
||||
|
||||
// Update num_streams_fnd, if necessary
|
||||
if (int_sid >= num_streams_fnd) {
|
||||
if (int_sid >= cfg.max_streams) { // valid indices are 0..max_streams-1
|
||||
cerr<<"ERROR: Too many streams to follow, aborting..."<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
// We need a new stream object
|
||||
streams[num_streams_fnd].Init(cfg, int_sid, ext_sid);
|
||||
num_streams_fnd = int_sid + 1;
|
||||
}
|
||||
streams[int_sid].Append(sval);
|
||||
if (streams[int_sid].Ready()) {
|
||||
ready_stream = &streams[int_sid];
|
||||
break;
|
||||
}
|
||||
cin >> ext_sid;
|
||||
}
|
||||
|
||||
return ready_stream;
|
||||
}
|
||||
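`GetReadyStream()` interleaves several data streams: each input pair is routed to its stream, and a stream becomes "ready" once it holds a full sequence. `Stream::Append()` is not shown in this part of the commit, so the sketch below assumes the usual sliding-window behaviour (the window advances by one element per pair). `CollectWindows` and its container types are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <utility>
#include <vector>

// Groups (stream id, value) pairs into per-stream sliding windows of length
// seq_len and returns every full window in the order it was completed.
std::vector<std::vector<int>> CollectWindows(
        const std::vector<std::pair<int, int>>& pairs, std::size_t seq_len) {
    std::map<int, std::deque<int>> windows;  // one window per external stream id
    std::vector<std::vector<int>> ready;
    if (seq_len == 0) return ready;
    for (const auto& p : pairs) {
        if (p.first == -1) break;            // -1 ends the input, as in GetReadyStream()
        std::deque<int>& w = windows[p.first];
        w.push_back(p.second);
        if (w.size() > seq_len) w.pop_front();  // slide the window by one element
        if (w.size() == seq_len) ready.emplace_back(w.begin(), w.end());
    }
    return ready;
}
```

The original returns one ready stream at a time so the caller can either add the sequence to the database or compare it. The sketch collects all windows up front only to keep the example short.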
|
||||
/*********************************************************************
|
||||
* ReadDB() *
|
||||
* Reads the database from a file and returns the number of unique *
|
||||
* sequences in the database. Checks for appropriate revision *
|
||||
* number. If it is a revision DBREV database, the second line *
|
||||
* will be "#DBseq_len: " followed by the sequence length. The *
|
||||
* next line will contain a single number, giving the root of the *
|
||||
* first tree. The following lines will contain the tree itself. *
|
||||
* The first seq_len numbers make up the first sequence (so the *
|
||||
* first number of the second line will be the same as the number *
|
||||
* on the first line). The next number will be a negative number *
|
||||
* between -(seq_len-1) and -2, indicating how far to backtrack in *
|
||||
* the first sequence, and the following positive numbers give the *
|
||||
* rest of the second sequence. So, for example, -3 would mean *
|
||||
* backtrack 3 numbers, take the previous numbers including the *
|
||||
* one you're on, and append the next two numbers. So after the *
|
||||
* -3 you would find two positive numbers, followed by a negative *
|
||||
* number (which you would use the same way as you used the -3, on *
|
||||
* the most recent sequence). Each tree is terminated by the *
|
||||
* number -1. So the sample input file *
|
||||
* 3 *
|
||||
* 3 4 2 9 10 3 -4 3 9 8 -2 3 -3 4 9 -1 *
|
||||
* 2 *
|
||||
* 2 3 4 5 6 7 -3 2 9 -1 *
|
||||
* yields the sequences: *
|
||||
* 3 4 2 9 10 3 *
|
||||
* 3 4 2 3 9 8 *
|
||||
* 3 4 2 3 9 3 *
|
||||
* 3 4 2 3 4 9 *
|
||||
* 2 3 4 5 6 7 *
|
||||
* 2 3 4 5 2 9 *
|
||||
* *
|
||||
* Input: SeqForest &db_forest Forest of sequences *
|
||||
* const string &db_name Name of database *
|
||||
* int &seq_len User-specified sequence length *
|
||||
* *
|
||||
* Output: the number of unique sequences in the database *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||
int &seq_len)
|
||||
{
|
||||
ifstream in_db_file(db_name.c_str()); // file to read the database from
|
||||
int db_size = 0; // size of the database
|
||||
int root; // the first element of the sequences
|
||||
// we are reading in at the moment;
|
||||
// i.e., the root of this tree
|
||||
string buff;
|
||||
int db_seq_len;
|
||||
int rev_num;
|
||||
|
||||
if (!in_db_file.is_open()) {
|
||||
cerr<<"WARNING: Cannot open database file " << db_name
|
||||
<< " for input"<<endl<<"Creating a new file"<<endl;
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Check to see if the first line contains "#DBrev:"
|
||||
in_db_file>>buff;
|
||||
if (buff == "#DBrev:") {
|
||||
in_db_file>>rev_num;
|
||||
if (rev_num > DBREV) {
|
||||
cerr << "ERROR: The revision number is greater than " << DBREV
|
||||
<< ". This version of STIDE is only capable of dealing "
|
||||
<< "with databases through DBrev " << DBREV
|
||||
<< ". Aborting..."<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
if (rev_num < DBREV) {
|
||||
cerr << "ERROR: Revision number of database must be >= " << DBREV
|
||||
<< endl;
|
||||
exit(-1);
|
||||
}
|
||||
// Now we know that it is revision DBREV. Check sequence length of
|
||||
// database against user-indicated sequence length
|
||||
in_db_file>>buff;
|
||||
// Now check to see if next line is "#DBseq_len: " followed by a
|
||||
// number
|
||||
if (buff != "#DBseq_len:") {
|
||||
cerr << "ERROR: The second line of the database does not "
|
||||
<< "contain the string \"#DBseq_len: \"" << endl
|
||||
<< "followed by the sequence length of the database, as "
|
||||
<< "required of revision " << DBREV
|
||||
<< " databases. Aborting..."<< endl;
|
||||
exit(-1);
|
||||
}
|
||||
in_db_file>>db_seq_len;
|
||||
if (db_seq_len != seq_len) {
|
||||
cerr << "WARNING: Database sequence length is " << db_seq_len
|
||||
<< ", which does not match "
|
||||
<< "sequence length specified" << endl
|
||||
<< "by user (or by default if no specification was given), "
|
||||
<< "which is " << seq_len << endl
|
||||
<< "I will use the database sequence length. If that is "
|
||||
<< "not what you intended, type Ctrl-C to abort." << endl;
|
||||
seq_len = db_seq_len;
|
||||
}
|
||||
// Read next number into root
|
||||
in_db_file >> root;
|
||||
}
|
||||
// Otherwise, we assume we have an old-style database, and let the
|
||||
// user know that that's our assumption
|
||||
else {
|
||||
cerr << "WARNING: The string \"#DBrev:\" is not in the first "
|
||||
<< "line of the database." << endl
|
||||
<< "I'm assuming that it's an older style of database, and "
|
||||
<< "will read it in" << endl
|
||||
<< "based on that assumption. If that is not what you want "
|
||||
<< "me to do, type CTRL-C" << endl << endl;
|
||||
// we have just read the first root into buff -- put it in root
|
||||
// instead
|
||||
root = atoi(buff.c_str());
|
||||
}
|
||||
|
||||
while (!in_db_file.eof()) {
|
||||
if (root == -1) break;
|
||||
db_forest.trees_found[root]++;
|
||||
in_db_file>>db_forest.trees[root];
|
||||
db_size += db_forest.trees[root].NumLeaves();
|
||||
in_db_file>>root;
|
||||
}
|
||||
in_db_file.close();
|
||||
|
||||
return db_size;
|
||||
}
|
||||
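The header of `ReadDB()` describes the on-disk tree encoding in words. The sketch below expands one encoded tree into its sequences. The actual parsing is done by the `FlexiTree` stream operator, which is not shown here, so the rule used is an assumption inferred from the sample in that header: a marker `-k` keeps the first `seq_len - k + 1` elements of the previous sequence and is followed by `k - 1` new elements. `DecodeTree` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Expands one backtrack-encoded tree into its sequences.
// Assumption (inferred from the sample in the ReadDB() header): a marker -k
// keeps the first seq_len - k + 1 elements of the previous sequence and is
// followed by k - 1 new elements; -1 ends the tree.
std::vector<std::vector<int>> DecodeTree(const std::vector<int>& tokens,
                                         std::size_t seq_len) {
    std::vector<std::vector<int>> seqs;
    std::vector<int> current;
    for (int tok : tokens) {
        if (tok == -1) break;  // end of this tree
        if (tok < 0) {
            std::size_t back = static_cast<std::size_t>(-tok);
            // A marker is only valid right after a complete sequence.
            if (back > seq_len + 1 || current.size() != seq_len) break;
            current.resize(seq_len - back + 1);  // keep the shared prefix
        } else {
            current.push_back(tok);
            if (current.size() == seq_len) seqs.push_back(current);
        }
    }
    return seqs;
}
```

Applied to the two sample trees from the header with a sequence length of 6, this reproduces the six sequences listed there.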
|
||||
|
||||
/*********************************************************************
|
||||
* WriteDB() *
|
||||
* Writes db_forest to the file db_name, with the format described *
|
||||
* in the header of ReadDB(). Prints database statistics at the *
|
||||
* end of the file. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* const string &db_name Name of file in which to *
|
||||
* put database. *
|
||||
* const int db_size Number of unique sequences *
|
||||
* in the database *
|
||||
* const int seq_len Sequence length *
|
||||
* *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||
int db_size, const int seq_len)
|
||||
{
|
||||
|
||||
ofstream out_db_file(db_name.c_str());
|
||||
|
||||
if (!out_db_file.is_open()) {
|
||||
cerr << "ERROR: Cannot open database file " << db_name
<< " for output, aborting..." << endl ;
|
||||
exit(-2);
|
||||
}
|
||||
out_db_file << "#DBrev: " << DBREV << endl;
|
||||
out_db_file << "#DBseq_len: " << seq_len << endl;
|
||||
|
||||
for (int i = 0; i < db_forest.trees.Size(); i++) {
|
||||
if (db_forest.trees_found[i]) {
|
||||
out_db_file<<i<<endl;
|
||||
out_db_file<<db_forest.trees[i]<<endl;
|
||||
}
|
||||
}
|
||||
out_db_file<<" -1"<<endl;
|
||||
// we can now write anything, so I will write the db stats
|
||||
out_db_file<<"; DB STATS"<<endl;
|
||||
WriteDBStats(db_forest, out_db_file, db_size);
|
||||
out_db_file.close();
|
||||
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* FinalReport() *
|
||||
* Reports data at end of run. The number of streams, the number *
|
||||
* of input pairs, and the number of sequences in the input are *
|
||||
* always reported. If we have done a comparison run, we report *
|
||||
* the number of anomalies, and the percentage of sequences that *
|
||||
* were anomalous. Additionally, if asked for, the Hamming *
|
||||
* distance or locality frame count is reported. If we have added *
|
||||
* to the database, we report having done so and report the number *
|
||||
* of sequences added. If database statistics are asked for, we *
|
||||
* report the number of nodes, the number of unique sequences, the *
|
||||
* number of branches, and the average database branch factor. *
|
||||
* *
|
||||
* Input: const Config &cfg: Configuration information *
|
||||
* const SeqForest &normal: DB of normal sequences *
|
||||
* const int num_streams_fnd: Total number of streams found*
|
||||
* const int num_seqs_added: Number of unique sequences *
|
||||
* added *
|
||||
* const Array<Stream> &streams: Array of data streams *
|
||||
* const int db_size: Number of unique sequences *
|
||||
* in DB *
|
||||
* *
|
||||
* Output: none *
|
||||
* *
|
||||
*********************************************************************/
|
||||
|
||||
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||
num_streams_fnd, const int num_seqs_added, const
|
||||
Array<Stream> &streams, const int db_size)
|
||||
{
|
||||
int total_pairs = 0;
|
||||
int total_seqs = 0;
|
||||
int total_anoms = 0;
|
||||
int total_max_lfc = 0;
|
||||
int total_max_hdist = 0;
|
||||
int db_nodes = 0;
|
||||
int db_seqs = 0;
|
||||
int db_branches = 0;
|
||||
int j;
|
||||
|
||||
// Sum up number of pairs input and number of seqs from all the streams
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
total_seqs += streams[j].GetNumSeqsFnd();
|
||||
total_pairs += streams[j].GetNumPairsRead();
|
||||
}
|
||||
|
||||
cout << endl;
|
||||
cout << "Number of different streams in input = "
|
||||
<< num_streams_fnd << endl;
|
||||
cout << "Total number of input pairs = "
|
||||
<< total_pairs << endl;
|
||||
cout << "Total number of sequences in input = "
|
||||
<< total_seqs << endl;
|
||||
|
||||
if (cfg.add_to_db) {
|
||||
cout << "File added to database" << endl;
|
||||
cout << "Number of new sequences added to the database: "
|
||||
<< num_seqs_added << endl;
|
||||
}
|
||||
else {
|
||||
cout << "Scan completed" << endl;
|
||||
// Sum up number of anomalies from all the streams
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
total_anoms += streams[j].GetNumAnoms();
|
||||
}
|
||||
|
||||
cout << "Number of anomalies = "
|
||||
<< total_anoms << endl;
|
||||
cout << "Percentage anomalous = "
|
||||
<< ((float)total_anoms * 100.0)/total_seqs << endl;
|
||||
|
||||
// If asked for, compute Hamming distances across streams and report
|
||||
if (cfg.compute_hdist) {
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
if (streams[j].GetMaxHDist() > total_max_hdist) {
|
||||
total_max_hdist = streams[j].GetMaxHDist();
|
||||
}
|
||||
}
|
||||
cout << "Largest minimum Hamming distance = "
|
||||
<< total_max_hdist << endl;
|
||||
}
|
||||
|
||||
// If asked for, compute lfc across streams and report
|
||||
if (cfg.lf_size > 1) {
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
if (streams[j].GetMaxLFC() > total_max_lfc) {
|
||||
total_max_lfc = streams[j].GetMaxLFC();
|
||||
}
|
||||
}
|
||||
cout << "Maximum lfc = " << total_max_lfc << endl;
|
||||
}
|
||||
}
|
||||
|
||||
// If asked for, compute db stats and report
|
||||
if (cfg.write_db_stats) {
|
||||
WriteDBStats(normal, cout, db_size);
|
||||
}
|
||||
}
|
||||
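`FinalReport()` prints the maximum locality frame count (lfc) when `cfg.lf_size > 1`. `ComputeLF()` is not shown in this part of the commit, so the sketch below assumes the common reading of that measure: the number of anomalous sequences among the most recent `lf_size` sequences. The class `LocalityFrame` and its member names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts anomalies inside a sliding frame of the most recent sequences.
class LocalityFrame {
public:
    explicit LocalityFrame(std::size_t size) : flags_(size, 0) {}

    // Records whether the newest sequence was anomalous and returns the
    // number of anomalies among the most recent flags_.size() sequences.
    int Push(bool is_anom) {
        if (flags_.empty()) return 0;
        count_ -= flags_[next_];           // forget the oldest entry
        flags_[next_] = is_anom ? 1 : 0;   // overwrite it with the newest
        count_ += flags_[next_];
        next_ = (next_ + 1) % flags_.size();
        return count_;
    }

private:
    std::vector<int> flags_;  // circular buffer of 0/1 anomaly flags
    std::size_t next_ = 0;    // slot that will be overwritten next
    int count_ = 0;           // running number of 1s in flags_
};
```

Keeping a running count makes each update constant time. The per-stream maximum reported by `GetMaxLFC()` would then be the largest value `Push` has returned so far.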
|
||||
|
||||
/*********************************************************************
|
||||
* WriteDBStats() *
|
||||
* Computes and writes to standard output the number of nodes in *
|
||||
* the database, the number of unique sequences, the number of *
|
||||
* branches, and the average database branch factor. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* ostream &out_stream Where to write info *
|
||||
* const int db_size Number of unique sequences in the *
|
||||
* database *
|
||||
* *
|
||||
* Output: none *
|
||||
*********************************************************************/
|
||||
|
||||
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||
const int db_size)
|
||||
{
|
||||
int db_nodes = 0;
|
||||
int db_branches = 0;
|
||||
|
||||
for (int i = 0; i < db_forest.trees.Size(); i++) {
|
||||
if (db_forest.trees_found[i]) {
|
||||
db_nodes += db_forest.trees[i].NumNodes();
|
||||
db_branches += db_forest.trees[i].NumBranches();
|
||||
}
|
||||
}
|
||||
|
||||
out_stream << "Number of DB nodes = " << db_nodes << endl;
|
||||
out_stream << "Number of unique sequences = "<<db_size << endl;
|
||||
out_stream << "Number of branches (edges) = "<<db_branches << endl;
|
||||
out_stream << "Average DB branch factor = "
|
||||
<<((float)db_branches/(db_nodes - db_size))<<endl;
|
||||
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* OutputGraph() *
|
||||
* Writes a file db_name.dot containing input for the program Dot. *
|
||||
* Running Dot on db_name.dot produces a PostScript file *
|
||||
* containing a picture of the whole database tree. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* const string db_name Filename to use *
|
||||
* *
|
||||
* Output: none *
|
||||
*********************************************************************/
|
||||
|
||||
void OutputGraph(const SeqForest &db_forest, const string db_name)
{
  // Build the file name as a string. The original char buffer was one byte
  // too small for ".dot" plus the terminating '\0' and was never freed.
  const string dot_filename = db_name + ".dot";
  ofstream output_file(dot_filename.c_str());

  output_file<<"digraph \""<<db_name<<"\" {"<<endl;
  output_file<<" ratio=auto;"<<endl;
  output_file<<" page=\"8.5,11\";"<<endl;
  for (int i = 0; i < db_forest.trees.Size(); i++) {
    if (db_forest.trees_found[i])
      db_forest.trees[i].OutputGraph(output_file);
  }
  output_file<<"}"<<endl;
  output_file.close();
}
|
||||
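For readers unfamiliar with the Dot input that `OutputGraph()` produces, here is a minimal sketch that renders a list of edges as a `digraph` with the same header lines. What `FlexiTree::OutputGraph()` writes for each node is not shown in this part of the commit, so the edge format and the function name `ToDot` are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Renders a list of (from, to) edges as Dot input.
// The node naming (plain integers) is a hypothetical choice for this sketch.
std::string ToDot(const std::string& name,
                  const std::vector<std::pair<int, int>>& edges) {
    std::ostringstream out;
    out << "digraph \"" << name << "\" {\n";
    out << "  ratio=auto;\n";
    out << "  page=\"8.5,11\";\n";
    for (const auto& e : edges) {
        out << "  " << e.first << " -> " << e.second << ";\n";
    }
    out << "}\n";
    return out.str();
}
```

In a real sequence tree the same system call number can appear in several nodes, so node names would need to be unique identifiers rather than the stored values.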
|
||||
|
||||
/****************************************************************************
|
||||
* GetPrimeLargerThan(int n) *
|
||||
* Returns the smallest prime larger than the input integer. *
|
||||
* Changes no values. *
|
||||
* *
|
||||
* Input: const int n *
|
||||
* Output: smallest prime larger than n *
|
||||
***************************************************************************/
|
||||
|
||||
int GetPrimeLargerThan(const int n)
{
  // Trial division by odd numbers. This avoids the variable-length array
  // "int primes[n]", which is not standard C++ and was written past its
  // end for n < 2.
  if (n < 2) {
    return 2;
  }
  int curr_num = (n % 2 == 0) ? n + 1 : n + 2;  // first odd number above n
  while(1) {
    int is_prime = 1;
    for (int d = 3; d <= curr_num / d; d += 2) {
      if ((curr_num % d) == 0) {
        is_prime = 0;
        break;
      }
    }
    if (is_prime == 1) {
      break;
    }
    curr_num = curr_num + 2;
  }
  return curr_num;
}
|
||||
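The hash table size is chosen as the smallest prime above the maximum number of streams. As a standalone illustration of that lookup, here is a minimal sketch using plain trial division. `IsPrime` and `NextPrimeAbove` are hypothetical names, and the sketch assumes `n` is far enough below `INT_MAX` that the search cannot overflow.

```cpp
// Trial division: candidate is prime if no d with d*d <= candidate divides it.
bool IsPrime(int candidate) {
    if (candidate < 2) return false;
    for (int d = 2; d <= candidate / d; ++d) {
        if (candidate % d == 0) return false;
    }
    return true;
}

// Smallest prime strictly larger than n.
// Assumes n is well below INT_MAX so the increment cannot overflow.
int NextPrimeAbove(int n) {
    int candidate = (n < 2) ? 2 : n + 1;
    while (!IsPrime(candidate)) ++candidate;
    return candidate;
}
```

Writing the loop bound as `d <= candidate / d` avoids computing `d * d`, which could overflow for large candidates.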
|
||||
|
||||
|
35
11/wywolania/Data/stide_v1.1/Seq-code/template.cc
Executable file
@ -0,0 +1,35 @@
|
||||
#include "../Utils/arrays.cc"
|
||||
#include "../Utils/tll.cc"
|
||||
#include "../Utils/hash.cc"
|
||||
#include "seq_stream.h"
|
||||
#include "flexitree.h"
|
||||
#include "opt_info.h"
|
||||
|
||||
/*
|
||||
template class List<FlexiTree>;
|
||||
template class LLNode<FlexiTree>;
|
||||
template class LinkedList<FlexiTree>;
|
||||
*/
|
||||
template class Array<FlexiTree>;
|
||||
|
||||
template class List<int>;
|
||||
template class LinkedList<int>;
|
||||
template class Array<int>;
|
||||
template class Array<LinkedList<int> >;
|
||||
|
||||
template class List<HashItem>;
|
||||
template class LLNode<HashItem>;
|
||||
template class LinkedList<HashItem>;
|
||||
template class Array<LinkedList<HashItem> >;
|
||||
|
||||
template class List<HashItemInt>;
|
||||
template class LLNode<HashItemInt>;
|
||||
template class LinkedList<HashItemInt>;
|
||||
template class Array<LinkedList<HashItemInt> >;
|
||||
template class Array<HashItemInt>;
|
||||
|
||||
template class Array<Stream>;
|
||||
template class Array<char*>;
|
||||
template class Array<OptInfo>;
|
||||
|
||||
|
141
11/wywolania/Data/stide_v1.1/Utils/arrays.cc
Executable file
@ -0,0 +1,141 @@
|
||||
// **********
|
||||
// ARRAYS.CPP
|
||||
// **********
|
||||
|
||||
#include <iostream.h>
|
||||
#include <assert.h>
|
||||
|
||||
#include "arrays.h"
|
||||
|
||||
//=============================================================================
|
||||
template <class T> void Array<T>::Init(const Array<T> &t) {
|
||||
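Init(); // reset size and data first: the copy constructor reaches this
// point with both uninitialized, and Allocate() reads size
Allocate(t.size);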
|
||||
assert(size == t.size);
|
||||
for (int i = 0; i < size; i++)
|
||||
data[i] = t.data[i];
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> void Array<T>::Allocate(int as) {
|
||||
// if previously allocated, delete old dynamic array
|
||||
if (size) delete[] data;
|
||||
size = as;
|
||||
data = new T[size];
|
||||
assert(data);
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> Array<T>::~Array() {
|
||||
delete[] data;
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> T &Array<T>::operator[](int i) const {
|
||||
if (i < 0) { cout<<"ERROR in []: "<<i<<"< 0"<<endl; exit(-1); }
|
||||
if (i >= size) { cout<<"ERROR in []: "<<i<<" >= "<<size<<endl; exit(-1); }
|
||||
assert(i >= 0);
|
||||
assert(i < size);
|
||||
return data[i];
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> T &Array<T>::Data(int i) {
|
||||
if (i < 0) { cout<<"ERROR in Data: "<<i<<"< 0"<<endl; exit(-1); }
|
||||
if (i >= size) { cout<<"ERROR in Data: "<<i<<" >= "<<size<<endl; exit(-1); }
|
||||
assert(i >= 0);
|
||||
assert(i < size);
|
||||
return data[i];
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> Array<T> &Array<T>::operator = (const Array<T> &t) {
|
||||
if (!size) // if the object is not yet allocated, do it and then assign
|
||||
Allocate(t.size);
|
||||
assert(size == t.size);
|
||||
for (int i = 0; i < size; i++)
|
||||
data[i] = t.data[i];
|
||||
return *this;
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> int Array<T>::Size() const {
|
||||
return size;
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> ostream &operator<<(ostream &s, const Array<T> &t) {
|
||||
for (int i =0; i < t.size; i++)
|
||||
s<<t.data[i]<<" ";
|
||||
return s;
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> void Array<T>::Set(T t) {
|
||||
for (int i =0; i < size; i++)
|
||||
data[i] = t;
|
||||
}
|
||||
//=============================================================================
|
||||
//=============================================================================
|
||||
// HeapSort data[0..size-1] DESCENDING
|
||||
template <class T> void SortableArray<T>::Sort() {
|
||||
// build the heap
|
||||
for (int i = Size()-1; i >= 0; i--)
|
||||
Adjust(i, Size()-1);
|
||||
|
||||
for (int i = Size()-1; i >= 1; i--) {
|
||||
// swap data
|
||||
T temp1 = Data(0);
|
||||
Data(0) = Data(i);
|
||||
Data(i) = temp1;
|
||||
|
||||
Adjust(0, i-1);
|
||||
}
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T> void SortableArray<T>::Adjust(int root, int last) {
|
||||
if (2*root <= last) {
|
||||
int child = 2*root;
|
||||
if ((child+1) <= last) {
|
||||
if (Data(child+1) < Data(child))
|
||||
child++;
|
||||
}
|
||||
if (Data(child) < Data(root)) {
|
||||
T temp = Data(root);
|
||||
Data(root) = Data(child);
|
||||
Data(child) = temp;
|
||||
|
||||
Adjust(child, last);
|
||||
}
|
||||
}
|
||||
}
|
||||
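`SortableArray::Sort()` is a heap sort that builds a min-heap and repeatedly moves the smallest element to the end, which yields descending order. The sketch below shows the same idea on a `std::vector<int>`. It deliberately swaps in the conventional 0-based child layout (`2*root + 1` and `2*root + 2`) instead of the `2*root` indexing used above, and `SiftDown` and `HeapSortDescending` are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restores the min-heap property for the subtree rooted at root,
// looking only at indices 0..last.
void SiftDown(std::vector<int>& data, std::size_t root, std::size_t last) {
    for (;;) {
        std::size_t child = 2 * root + 1;          // left child (0-based layout)
        if (child > last) return;                  // root is a leaf
        if (child + 1 <= last && data[child + 1] < data[child]) ++child;
        if (!(data[child] < data[root])) return;   // heap property already holds
        std::swap(data[root], data[child]);
        root = child;
    }
}

// Sorts data in descending order using a min-heap.
void HeapSortDescending(std::vector<int>& data) {
    if (data.size() < 2) return;
    const std::size_t last = data.size() - 1;
    // Build the heap from the last parent down to the root.
    for (std::size_t i = data.size() / 2; i-- > 0;) SiftDown(data, i, last);
    // Move the current minimum to the end and shrink the heap.
    for (std::size_t i = last; i >= 1; --i) {
        std::swap(data[0], data[i]);
        SiftDown(data, 0, i - 1);
    }
}
```

The iterative `SiftDown` does the same job as the recursive `Adjust()` above. Only the smaller child is ever compared with its parent, so each step costs at most two comparisons.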
//=============================================================================
|
||||
//=============================================================================
|
||||
// HeapSort data[0..size-1] DESCENDING
|
||||
template <class T, class C> void CompSortableArray<T, C>::Sort() {
|
||||
// build the heap
|
||||
int sz = Size();
|
||||
for (int i = sz-1; i >= 0; i--) {
|
||||
Adjust(i, sz-1);
|
||||
}
|
||||
|
||||
for (int i = sz-1; i >= 1; i--) { // redeclare i: the first loop's i is out of scope in standard C++
|
||||
// do the swap
|
||||
T temp1 = Data(0);
|
||||
Data(0) = Data(i);
|
||||
Data(i) = temp1;
|
||||
Adjust(0, i-1);
|
||||
}
|
||||
}
|
||||
//=============================================================================
|
||||
template <class T, class C> void CompSortableArray<T, C>::Adjust(int root,
|
||||
int last) {
|
||||
if (2*root <= last) {
|
||||
int child = 2*root;
|
||||
if ((child+1) <= last) {
|
||||
if (comp_ptr->Compare(Data(child+1), Data(child)) == -1) {
|
||||
child++;
|
||||
}
|
||||
}
|
||||
if (comp_ptr->Compare(Data(child), Data(root)) == -1) {
|
||||
T temp = Data(root);
|
||||
Data(root) = Data(child);
|
||||
Data(child) = temp;
|
||||
|
||||
Adjust(child, last);
|
||||
}
|
||||
}
|
||||
}
|
||||
//=============================================================================
|
||||
|
109
11/wywolania/Data/stide_v1.1/Utils/arrays.h
Executable file
@ -0,0 +1,109 @@
|
||||
// ********
|
||||
// ARRAYS.H
|
||||
// ********
|
||||
|
||||
#ifndef ARRAYS_H
|
||||
#define ARRAYS_H
|
||||
|
||||
//#define PC
|
||||
|
||||
#include <iostream.h>
|
||||
#include "errors.h"
|
||||
|
||||
// this is a template for all classes which use an array of objects
|
||||
// it is dynamically allocated
|
||||
template <class T> class Array {
|
||||
public:
|
||||
Array() {Init();}
|
||||
Array(const Array<T> &t) {Init(t);} // the copy constructor
|
||||
Array(int asize) {Init(); Allocate(asize);} // creates an array of size "asize"
|
||||
~Array(); // the destructor deletes all internal
|
||||
// objects
|
||||
void Allocate(int asize); // allocates asize objects
|
||||
// if data was already allocated,
|
||||
// deletes and re-allocates
|
||||
T &operator[](int i) const;
|
||||
Array<T> &operator = (const Array<T> &t);
|
||||
// copies one array to another,
|
||||
// requires that the assignment
|
||||
// operator be defined for array
|
||||
// elements
|
||||
void Set(T t); // sets all elements to t
|
||||
int Size() const; // returns the size of the array
|
||||
friend ostream &operator<<(ostream &s, const Array<T> &t);
|
||||
protected:
|
||||
T &Data(int i); // method derived class can use for
|
||||
// accessing data
|
||||
void Init() {size = 0; data = NULL;}// default initializer
|
||||
void Init(const Array<T> &t); // implements copy constructor
|
||||
private:
|
||||
int size; // the size of the array
|
||||
T *data; // ptr to the array of objects
|
||||
};
|
||||
|
||||
// this is a template for sortable arrays of objects, i.e. the objects provide
|
||||
// a less than comparison operator, which is used in the Sort method to perform
|
||||
// a heap sort
|
||||
template <class T> class SortableArray : public Array<T> {
|
||||
public:
|
||||
SortableArray() {Init();}
|
||||
SortableArray(const SortableArray<T> &t) {Init(t);}
|
||||
SortableArray(int asize) {Allocate(asize);}
|
||||
void Sort(); // performs a heapsort on the data,
|
||||
// using the < operator
|
||||
protected:
|
||||
void Adjust(int root, int last); // for the heap sort
|
||||
};
|
||||
|
||||
// this is a template for sortable arrays of objects, but the comparison
|
||||
// operator is provided by another class C
|
||||
template <class T, class C> class CompSortableArray : public Array<T> {
|
||||
public:
|
||||
CompSortableArray() {Init();}
|
||||
CompSortableArray(int asize, C *c_ptr) {Allocate(asize); comp_ptr = c_ptr;}
|
||||
void Sort(); // performs a heapsort on the data,
|
||||
// using comp_ptr->Compare
|
||||
protected:
|
||||
C *comp_ptr; // a ptr to the object with the Compare
|
||||
// method
|
||||
void Adjust(int root, int last);
|
||||
};
|
||||
|
||||
// this is a template for a multidimensional array of one type of object
|
||||
// when declaring this one must specify the number of dimensions first,
|
||||
// followed by the size for each array dimension
|
||||
/*
|
||||
template <class T> class MultiArray {
|
||||
public:
|
||||
MultiArray() {Init();}
|
||||
MultiArray(const MultiArray<T> &t) {Init(t);} // the copy constructor
|
||||
MultiArray(int dims, int x, ...); // a variable number of parameters
|
||||
{Init(); Allocate(xsize, ysize);}
|
||||
~Array2D(); // the destructor deletes all internal
|
||||
// objects
|
||||
void Allocate(int xsize, ...); // allocates x, y, ... size array
|
||||
// if data was already allocated,
|
||||
// deletes and re-allocates
|
||||
T Data(int x, int y); // returns object in x,y location
|
||||
Array2D<T> &operator = (const Array2D<T> &t);
|
||||
// copies one array to another,
|
||||
// requires that the assignment
|
||||
// operator be defined for array
|
||||
// elements
|
||||
void Set(T t); // sets all elements to t
|
||||
int XSize(); // returns the x size of the array
|
||||
int YSize(); // returns the y size of the array
|
||||
friend ostream &operator<<(ostream &s, const Array2D<T> &t);
|
||||
protected:
|
||||
T &Data(int i); // method derived class can use for
|
||||
// accessing data
|
||||
void Init() {size = 0; data = NULL;}// default initializer
|
||||
void Init(const Array<T> &t); // implements copy constructor
|
||||
protected:
|
||||
Array<Array<T> > data;
|
||||
};
|
||||
*/
|
||||
#endif
|
||||
|
||||
|
||||
|
16
11/wywolania/Data/stide_v1.1/Utils/cstrings.h
Executable file
@ -0,0 +1,16 @@
|
||||
/*
|
||||
This class implements strings. It is meant to offer all the functionality
|
||||
of strings in C, so whenever a C function is needed that manipulates strings,
|
||||
it must be coded into this.
|
||||
*/
|
||||
|
||||
class String {
|
||||
public:
|
||||
String(void) {data = NULL; dsize = 0;}
|
||||
String(const char *init) {dsize = strlen(init); data = new char[dsize + 1]; strcpy(data, init);} // +1 for the terminating '\0'; copy the contents
|
||||
~String(void) {delete[] data;} // data came from new[]; delete[] on NULL is a no-op
|
||||
|
||||
private:
|
||||
char *data;
|
||||
int dsize;
|
||||
};
|
20
11/wywolania/Data/stide_v1.1/Utils/errors.cc
Executable file
@ -0,0 +1,20 @@
|
||||
// **********
|
||||
// ERRORS.CPP
|
||||
// **********
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <iostream.h>
|
||||
#include "errors.h"
|
||||
|
||||
void Error(const char *msg, ...) {
|
||||
char buffer[150];
|
||||
va_list ap;
|
||||
va_start(ap, msg);
|
||||
vsnprintf(buffer, sizeof(buffer), msg, ap); // bounded: long messages are truncated instead of overflowing buffer
|
||||
cout<<endl<<buffer<<endl;
|
||||
va_end(ap);
|
||||
exit(-1);
|
||||
}
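The printf-style formatting used by `Error` can be sketched separately from the exit path. This is a minimal illustration under assumptions: `FormatBounded` is a hypothetical helper name, and the 150-byte buffer simply mirrors the size used above.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats like printf but never writes past the buffer.
std::string FormatBounded(const char *fmt, ...) {
    char buffer[150];
    va_list ap;
    va_start(ap, fmt);
    // vsnprintf writes at most sizeof(buffer) - 1 characters and
    // null-terminates the result when the size is non-zero.
    std::vsnprintf(buffer, sizeof(buffer), fmt, ap);
    va_end(ap);
    return std::string(buffer);
}
```

Returning a `std::string` keeps the formatting testable on its own; a caller that wants the behaviour of `Error` would print the result and then exit.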
|
||||
|
||||
|
16
11/wywolania/Data/stide_v1.1/Utils/errors.h
Executable file
@ -0,0 +1,16 @@
|
||||
// ********
|
||||
// ERRORS.H
|
||||
// ********
|
||||
|
||||
#ifndef ERRORS_H
|
||||
#define ERRORS_H
|
||||
|
||||
#include <stdarg.h>
|
||||
#include <assert.h>
|
||||
|
||||
// this function takes a formatted character string and params like printf,
|
||||
// prints a formatted message, and then aborts the program. It's used for
|
||||
// trapping errors and halting execution.
|
||||
void Error(const char *msg, ...);
|
||||
|
||||
#endif
|
182
11/wywolania/Data/stide_v1.1/Utils/hash.cc
Executable file
@ -0,0 +1,182 @@
|
||||
// hash.cpp
|
||||
|
||||
#include "hash.h"
|
||||
|
||||
//===========================================================================
|
||||
HashItem::HashItem(char *s, int v) {
|
||||
if (strlen(s) >= STR_LEN) { // str holds STR_LEN chars including the terminating '\0'
|
||||
cout<<endl<<"Hash item string too long";
|
||||
exit(-1);
|
||||
}
|
||||
strcpy(str, s); value = v;
|
||||
}
|
||||
//===========================================================================
|
||||
void HashItem::Set(char *s, int v) {
|
||||
if (strlen(s) >= STR_LEN) { // str holds STR_LEN chars including the terminating '\0'
|
||||
cout<<endl<<"Hash item string too long";
|
||||
exit(-1);
|
||||
}
|
||||
strcpy(str, s); value = v;
|
||||
}
|
||||
//===========================================================================
|
||||
void HashTable::Insert(HashItem &h_item) {
|
||||
int hash_index = HashFunc(h_item.str);
|
||||
#ifdef DBG
|
||||
cout<<hash_index;cout.flush();
|
||||
#endif
|
||||
data[hash_index].Insert(h_item);
|
||||
}
|
||||
//===========================================================================
|
||||
int HashTable::Retrieve(HashItem &h_item) {
|
||||
int hash_index = HashFunc(h_item.str);
|
||||
HashItem *temp_item_ptr = data[hash_index].Search(h_item);
|
||||
if (!temp_item_ptr) return 0;
|
||||
else {
|
||||
h_item = *temp_item_ptr;
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
unsigned HashTable::HashFunc(char *str) {
|
||||
unsigned k = 0;
|
||||
for (int i = 0; i < strlen(str); i++) {
|
||||
k += (unsigned)str[i] << ((i % (int)sizeof(unsigned)) * 8); // keep the shift count below the width of unsigned
|
||||
}
|
||||
return (k % data.Size());
|
||||
}
|
||||
//=============================================================================
|
||||
ostream &operator<<(ostream &s, HashTable &ht) {
|
||||
for (int i = 0; i < ht.data.Size(); i++) {
|
||||
if (!ht.data[i].Empty()) {
|
||||
ht.data[i].Write(s);
|
||||
s<<endl;
|
||||
}
|
||||
}
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
// for int hash tables
|
||||
//===========================================================================
|
||||
HashItemInt &HashItemInt::operator = (const HashItemInt &h_item) {
|
||||
key = h_item.key;
|
||||
value = h_item.value;
|
||||
return *this;
|
||||
}
|
||||
//===========================================================================
|
||||
void HashTableInt::Insert(HashItemInt &h_item) {
|
||||
int hash_index = HashFunc(h_item.key);
|
||||
data[hash_index].Insert(h_item);
|
||||
}
|
||||
//===========================================================================
|
||||
int HashTableInt::Retrieve(HashItemInt &h_item) {
|
||||
int hash_index = HashFunc(h_item.key);
|
||||
HashItemInt *temp_item_ptr;
|
||||
temp_item_ptr = data[hash_index].Search(h_item);
|
||||
if (!temp_item_ptr) return 0;
|
||||
else {
|
||||
h_item = *temp_item_ptr;
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
unsigned HashTableInt::HashFunc(int key) {
|
||||
return ((unsigned)key % (unsigned)data.Size()); // unsigned arithmetic keeps the index in range for negative keys
|
||||
}
|
||||
//=============================================================================
|
||||
ostream &operator<<(ostream &s, HashTableInt &ht) {
|
||||
for (int i = 0; i < ht.data.Size(); i++) {
|
||||
if (!ht.data[i].Empty()) {
|
||||
ht.data[i].Write(s);
|
||||
s<<endl;
|
||||
}
|
||||
}
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
void HashTableInt::PutInArray(Array<HashItemInt> &h_array, int &num_items) {
|
||||
num_items = 0;
|
||||
HashItemInt h_item;
|
||||
for (int i = 0; i < data.Size(); i++) {
|
||||
if (!data[i].Empty()) { // now iterate through the linked list
|
||||
int start = 1;
|
||||
while (data[i].GetNext(h_item, start)) {
|
||||
h_array[num_items].Set(h_item.key, h_item.value);
|
||||
start = 0;
|
||||
num_items++;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
int HashTableInt::ExtToInt(int key, int next_value)
|
||||
{
|
||||
HashItemInt h_item(key, 0);
|
||||
|
||||
// Check to see if we know this one. 0 matches any number. If we
|
||||
// do know this one, h_item.value gets set to what we knew it to be.
|
||||
if (!Retrieve(h_item)) {
|
||||
h_item.Set(key, next_value);
|
||||
Insert(h_item);
|
||||
}
|
||||
return (h_item.value);
|
||||
}
|
||||
|
||||
/*
|
||||
// to test out the hash table
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <fstream.h>
|
||||
|
||||
#define MAX_SYS_CALLS 255
|
||||
//============================================================================
|
||||
int GetCalls(HashTable &ht) {
|
||||
ifstream calls_file("calls.txt");
|
||||
char buff[255];
|
||||
int buff_len;
|
||||
int num_sys_calls = 0;
|
||||
HashItem h_item;
|
||||
while (!calls_file.eof() && num_sys_calls < MAX_SYS_CALLS) {
|
||||
calls_file.getline(buff, 254);
|
||||
buff_len = strlen(buff);
|
||||
if (buff_len) {
|
||||
// cat on a parenth to make sure only calls are matched
|
||||
strcat(buff, "(");
|
||||
#ifdef DBG
|
||||
cout<<endl<<buff; cout.flush();
|
||||
#endif
|
||||
h_item.Set(buff, num_sys_calls);
|
||||
ht.Insert(h_item);
|
||||
num_sys_calls++;
|
||||
}
|
||||
}
|
||||
calls_file.close();
|
||||
if (num_sys_calls == MAX_SYS_CALLS) return 0;
|
||||
else return 1;
|
||||
}
|
||||
//============================================================================
|
||||
void main(void) {
|
||||
HashTable hashtable(701);
|
||||
HashItem h_item("unlink(", 0);
|
||||
if (GetCalls(hashtable)) {
|
||||
cout<<endl<<hashtable;
|
||||
h_item.Set("unlink(", 0);
|
||||
if (hashtable.Retrieve(h_item))
|
||||
cout<<endl<<" unlink found, index = "<<h_item.value;
|
||||
else cout<<endl<<" unlink not found";
|
||||
h_item.Set("get_kernel_syms(", 0);
|
||||
if (hashtable.Retrieve(h_item))
|
||||
cout<<endl<<" get_kernel_syms found, index = "<<h_item.value;
|
||||
else cout<<endl<<" get_kernel_syms not found";
|
||||
h_item.Set("hello(", 0);
|
||||
if (hashtable.Retrieve(h_item))
|
||||
cout<<endl<<" hello found, index = "<<h_item.value;
|
||||
else cout<<endl<<" hello not found";
|
||||
h_item.Set("setsockopt(", 0);
|
||||
if (hashtable.Retrieve(h_item))
|
||||
cout<<endl<<" setsockopt found, index = "<<h_item.value;
|
||||
else cout<<endl<<" setsockopt not found";
|
||||
}
|
||||
}
|
||||
*/
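The idea behind `ExtToInt` (map an arbitrary external key to an internal value the first time it is seen, and return the remembered value afterwards) can be sketched with the standard library. This is an illustration under assumptions, not the stide code: `IdInterner` is a hypothetical class, it uses `std::unordered_map` instead of `HashTableInt`, and it picks the next internal id itself rather than taking `next_value` from the caller.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for HashTableInt: maps sparse external ids to
// dense internal ids in order of first appearance.
class IdInterner {
public:
    int ExtToInt(int key) {
        std::unordered_map<int, int>::iterator it = table_.find(key);
        if (it != table_.end()) return it->second;    // already known
        int id = static_cast<int>(table_.size());     // next unused internal id
        table_.emplace(key, id);
        return id;
    }
    int Size() const { return static_cast<int>(table_.size()); }

private:
    std::unordered_map<int, int> table_;
};
```

Dense internal ids make it possible to index plain arrays by the translated value, which is presumably why such a translation step is useful for sparse identifiers.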
|
||||
|
||||
|
71
11/wywolania/Data/stide_v1.1/Utils/hash.h
Executable file
@ -0,0 +1,71 @@
|
||||
#ifndef __HASH_H
|
||||
#define __HASH_H
|
||||
|
||||
#define STR_LEN 100
|
||||
#include <string.h>
|
||||
#include "arrays.h"
|
||||
#include "tll.h"
|
||||
|
||||
//#define DBG
|
||||
|
||||
class HashItem {
|
||||
public:
|
||||
HashItem(void) {strcpy(str, ""); value = 0;}
|
||||
HashItem(char *s, int v);
|
||||
void Set(char *s, int v);
|
||||
int operator == (const HashItem &h_item) {return !strcmp(str, h_item.str);}
|
||||
friend ostream &operator<<(ostream &s, HashItem &h_item) {
|
||||
s<<h_item.str<<":"<<h_item.value; return s;}
|
||||
int value;
|
||||
char str[STR_LEN];
|
||||
};
|
||||
|
||||
class HashTable {
|
||||
public:
|
||||
HashTable(int size) {data.Allocate(size);}
|
||||
void Insert(HashItem &h_item); // we insert a complete item, i.e. the
|
||||
// str and its assoc.
|
||||
int Retrieve(HashItem &h_item); // returns 0 if item is not found
|
||||
// we retrieve a complete item, the value
|
||||
// of assoc is not specified beforehand,
|
||||
// and is returned in h_item
|
||||
friend ostream &operator<<(ostream &s, HashTable &ht);
|
||||
private:
|
||||
Array<LinkedList<HashItem> > data;
|
||||
unsigned HashFunc(char *str);
|
||||
};
|
||||
|
||||
// these store ints, not strings
|
||||
class HashItemInt {
|
||||
public:
|
||||
HashItemInt(void) {key = 0; value = 0;}
|
||||
HashItemInt(int k, int v) {key = k; value = v;}
|
||||
// the copy constructor
|
||||
HashItemInt(const HashItemInt &h_item) {key = h_item.key; value = h_item.value;}
|
||||
void Set(int k, int v) {key = k; value = v;}
|
||||
int operator == (const HashItemInt &h_item)
|
||||
{return ((key == h_item.key) ? 1 : 0);}
|
||||
HashItemInt &operator = (const HashItemInt &h_item);
|
||||
friend ostream &operator<<(ostream &s, HashItemInt &h_item) {
|
||||
s<<h_item.key<<":"<<h_item.value; return s;}
|
||||
int value, key;
|
||||
};
|
||||
|
||||
class HashTableInt {
|
||||
public:
|
||||
HashTableInt(int size) {data.Allocate(size);}
|
||||
void Insert(HashItemInt &h_item); // we insert a complete item, i.e. the
|
||||
// key and its associated value
|
||||
int Retrieve(HashItemInt &h_item); // returns 0 if item is not found
|
||||
// we retrieve a complete item, the value
|
||||
// of assoc is not specified beforehand,
|
||||
// and is returned in h_item
|
||||
friend ostream &operator<<(ostream &s, HashTableInt &ht);
|
||||
void PutInArray(Array<HashItemInt> &h_array, int &num_items); // puts it into a linear array
|
||||
int ExtToInt(int key, int next_value);
|
||||
private:
|
||||
Array<LinkedList<HashItemInt> > data;
|
||||
unsigned HashFunc(int key);
|
||||
};
|
||||
|
||||
#endif
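For comparison with `HashTable::HashFunc`, a string hash reduced to a bucket index can be sketched as follows. This uses 32-bit FNV-1a, a well-known hash that is swapped in here purely for illustration; it is not the function used in hash.cc, and `BucketFor` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a over a C string.
std::uint32_t Fnv1a(const char *str) {
    std::uint32_t hash = 2166136261u;               // FNV offset basis
    for (const char *p = str; *p != '\0'; ++p) {
        hash ^= static_cast<unsigned char>(*p);     // avoid sign extension of char
        hash *= 16777619u;                          // FNV prime (wraps modulo 2^32)
    }
    return hash;
}

// Hypothetical helper: reduce the hash to a bucket index in [0, buckets).
std::size_t BucketFor(const char *str, std::size_t buckets) {
    assert(buckets > 0);
    return Fnv1a(str) % buckets;
}
```

With a table of 701 buckets, as in the commented-out test in hash.cc, `BucketFor("unlink(", 701)` always yields an index below 701, and every character of the string influences the result regardless of its position.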
|
87
11/wywolania/Data/stide_v1.1/Utils/krand.cc
Executable file
@ -0,0 +1,87 @@
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <iostream.h>
|
||||
#include "krand.h"
|
||||
|
||||
#include <time.h>
|
||||
|
||||
#define MBIG 1000000000L
|
||||
#define MSEED 161803398L
|
||||
#define FAC (1.0 / MBIG)
|
||||
|
||||
static int inext;
|
||||
static int inextp;
|
||||
static long ma[56];
|
||||
|
||||
double knuth_random(void) {
|
||||
long mj;
|
||||
if (++inext == 56) inext = 1;
|
||||
if (++inextp == 56) inextp = 1;
|
||||
mj = ma[inext] - ma[inextp];
|
||||
if (mj < 0) mj += MBIG;
|
||||
ma[inext] = mj;
|
||||
return mj * FAC;
|
||||
}
|
||||
|
||||
long seed_random(long seed) {
|
||||
long mj, mk;
|
||||
register int i, k;
|
||||
|
||||
if (seed < 0) {
|
||||
time_t tp;
|
||||
seed = time(&tp);
|
||||
}
|
||||
|
||||
if (seed >= MBIG) {
|
||||
cerr<<"Seed value too big (>= "<<MBIG<<") in seed_random().";
|
||||
exit(1);
|
||||
}
|
||||
|
||||
ma[55] = mj = seed;
|
||||
mk = 1;
|
||||
|
||||
for (i = 1; i <= 54; i++) {
|
||||
register int ii = (21 * i) % 55;
|
||||
ma[ii] = mk;
|
||||
mk = mj - mk;
|
||||
if (mk < 0) mk += MBIG;
|
||||
mj = ma[ii];
|
||||
}
|
||||
|
||||
for (k = 0; k < 4; k++) {
|
||||
for (i = 1; i <= 55; i++) {
|
||||
ma[i] -= ma[1 + (i + 30) % 55];
|
||||
if (ma[i] < 0) ma[i] += MBIG;
|
||||
}
|
||||
}
|
||||
|
||||
inext = 0;
|
||||
inextp = 31;
|
||||
|
||||
return seed;
|
||||
}
|
||||
|
||||
int krandom(int max) {
|
||||
int retval = (int)(knuth_random() * max);
|
||||
if (retval < 0 || retval >= max) {
|
||||
cout<<"ERROR: random num generator out of bounds!"<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
//for testing
|
||||
void main(void) {
|
||||
seed_random(100);
|
||||
for (int i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||
cin.get();
|
||||
seed_random(200);
|
||||
for (i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||
cin.get();
|
||||
seed_random(100);
|
||||
for (i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||
cin.get();
|
||||
}
|
||||
*/
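The generator above keeps its state in file-level statics, so only one sequence can exist per program. The same subtractive recurrence can be sketched with the state held in an object. This is an illustration under assumptions: `KnuthRandom` and `kKnuthBig` are hypothetical names, and the seeding steps are copied from `seed_random` except that negative seeds are rejected instead of being replaced by the current time.

```cpp
#include <cassert>

const long kKnuthBig = 1000000000L;   // same modulus as MBIG above

// Hypothetical object-based variant of the subtractive generator.
class KnuthRandom {
public:
    explicit KnuthRandom(long seed) { Seed(seed); }

    // Requires 0 <= seed < kKnuthBig.
    void Seed(long seed) {
        assert(seed >= 0 && seed < kKnuthBig);
        long mj = seed, mk = 1;
        ma_[0] = 0;                                // slot 0 is never used by the recurrence
        ma_[55] = mj;
        for (int i = 1; i <= 54; i++) {            // scatter initial values over slots 1..54
            int ii = (21 * i) % 55;
            ma_[ii] = mk;
            mk = mj - mk;
            if (mk < 0) mk += kKnuthBig;
            mj = ma_[ii];
        }
        for (int k = 0; k < 4; k++) {              // warm up the table
            for (int i = 1; i <= 55; i++) {
                ma_[i] -= ma_[1 + (i + 30) % 55];
                if (ma_[i] < 0) ma_[i] += kKnuthBig;
            }
        }
        inext_ = 0;
        inextp_ = 31;
    }

    // Next value in [0, 1).
    double Next() {
        if (++inext_ == 56) inext_ = 1;
        if (++inextp_ == 56) inextp_ = 1;
        long mj = ma_[inext_] - ma_[inextp_];
        if (mj < 0) mj += kKnuthBig;
        ma_[inext_] = mj;
        return mj * (1.0 / kKnuthBig);
    }

private:
    long ma_[56];
    int inext_, inextp_;
};
```

Two objects constructed with the same seed produce the same sequence, which makes runs reproducible without sharing global state.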
|
13
11/wywolania/Data/stide_v1.1/Utils/krand.h
Executable file
@ -0,0 +1,13 @@
|
||||
#ifndef __KRAND_H
|
||||
#define __KRAND_H
|
||||
/*
|
||||
knuth-random.h
|
||||
declarations for krand.cc
|
||||
*/
|
||||
|
||||
double knuth_random(void);
|
||||
long seed_random(long);
|
||||
int krandom(int);
|
||||
|
||||
#endif
|
||||
|
132
11/wywolania/Data/stide_v1.1/Utils/linklist.cc
Executable file
@ -0,0 +1,132 @@
|
||||
// linklist.cpp
|
||||
|
||||
#include "linklist.h"
|
||||
|
||||
// data structures:
|
||||
// node for a linked list
|
||||
class LLNode {
|
||||
public:
|
||||
int val; // the value at this node
|
||||
LLNode *next; // pointer to the next node
|
||||
};
|
||||
//===========================================================================
|
||||
LinkedList::LinkedList(void) {
|
||||
root = NULL;
|
||||
}
|
||||
//===========================================================================
|
||||
LinkedList::LinkedList(const LinkedList &llist) { // the copy constructor
|
||||
root = NULL;
|
||||
LLNode *temp_ptr = llist.root;
|
||||
while (temp_ptr) {
|
||||
Insert(temp_ptr->val);
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
LinkedList &LinkedList::operator = (const LinkedList &llist) {
|
||||
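if (this == &llist) return *this; // self-assignment would otherwise empty the list
while (root) { LLNode *dead = root; root = root->next; delete dead; } // free the old nodes instead of leaking them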
|
||||
LLNode *temp_ptr = llist.root;
|
||||
while (temp_ptr) {
|
||||
Insert(temp_ptr->val);
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
return *this;
|
||||
}
|
||||
//============================================================================
|
||||
LinkedList::~LinkedList(void) {
|
||||
if (root) {
|
||||
LLNode *temp_ptr = root->next, *next_temp_ptr;
|
||||
delete root;
|
||||
while (temp_ptr) {
|
||||
next_temp_ptr = temp_ptr->next;
|
||||
delete temp_ptr;
|
||||
temp_ptr = next_temp_ptr;
|
||||
}
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
// returns 1 if val was inserted, 0 if it was already in the list
|
||||
int LinkedList::Insert(int val) {
|
||||
if (!root) {
|
||||
root = new LLNode;
|
||||
root->val = val;
|
||||
root->next = NULL;
|
||||
return 1;
|
||||
} else {
|
||||
if (!Search(val)) { // only put in if it is not already in - this is ineff.
|
||||
LLNode temp_node;
|
||||
temp_node.val = root->val;
|
||||
temp_node.next = root->next;
|
||||
root->val = val;
|
||||
root->next = new LLNode;
|
||||
root->next->val = temp_node.val;
|
||||
root->next->next = temp_node.next;
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
//===========================================================================
|
||||
int LinkedList::Search(int val) {
|
||||
LLNode *curr_ptr = root;
|
||||
while (curr_ptr) {
|
||||
if (curr_ptr->val == val) return 1;
|
||||
else curr_ptr = curr_ptr->next;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
//===========================================================================
|
||||
void LinkedList::Write(ostream &s) {
|
||||
LLNode *curr_ptr = root;
|
||||
while (curr_ptr) {
|
||||
s<<curr_ptr->val<<" ";
|
||||
curr_ptr = curr_ptr->next;
|
||||
}
|
||||
}
|
||||
//=============================================================================
|
||||
ostream &operator<<(ostream &s, LinkedList &ll) {
|
||||
ll.Write(s);
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
|
||||
/*
|
||||
// this is for testing the linked list
|
||||
// something similar should be done for all data structures
|
||||
// using the Test func on the base class, we can test all descendants with
|
||||
// those methods
|
||||
#include "arrays.h"
|
||||
|
||||
void Test(LinkedList &list) {
|
||||
list.Insert(10);
|
||||
list.Insert(5);
|
||||
list.Insert(3);
|
||||
list.Insert(5);
|
||||
list.Insert(7);
|
||||
list.Insert(26);
|
||||
list.Insert(13);
|
||||
list.Insert(26);
|
||||
cout<<endl;
|
||||
list.Write(cout);
|
||||
cout<<endl<<list.Search(5);
|
||||
cout<<endl<<list.Search(0);
|
||||
cout<<endl<<list.Search(13);
|
||||
}
|
||||
|
||||
void TestArray(Array<LinkedList> &larray) {
|
||||
for (int i = 0; i < larray.Size(); i++)
|
||||
Test(larray[i]);
|
||||
}
|
||||
|
||||
#include <fstream.h>
|
||||
|
||||
void main(void) {
|
||||
ifstream inf;
|
||||
Array<LinkedList> ll(200);
|
||||
TestArray(ll);
|
||||
ll[0].Insert(999);
|
||||
cout<<endl<<ll[0];
|
||||
ll[3] = ll[0];
|
||||
cout<<endl<<ll[3]<<" = "<<ll[0];
|
||||
}
|
||||
*/
|
22
11/wywolania/Data/stide_v1.1/Utils/linklist.h
Executable file
@ -0,0 +1,22 @@
|
||||
// linklist.h
|
||||
#include <iostream.h>
|
||||
#include "list.h"
|
||||
|
||||
class LLNode;
|
||||
class LinkedList : public List {
|
||||
LLNode *root;
|
||||
public:
|
||||
LinkedList(void);
|
||||
~LinkedList(void);
|
||||
LinkedList(const LinkedList &llist); // the copy constructor
|
||||
LinkedList &operator = (const LinkedList &llist);
|
||||
int Insert(int val); // this does not insert if val already exists
|
||||
// returns 1 if it could insert
|
||||
// could later on also do frequency counts
|
||||
int Search(int val);
|
||||
void Write(ostream &s);
|
||||
friend ostream &operator<<(ostream &s, LinkedList &ll);
|
||||
int Empty(void) {return (root ? 0 : 1);}
|
||||
};
|
||||
|
||||
//typedef LinkedList *LinkedListPtr;
|
12
11/wywolania/Data/stide_v1.1/Utils/list.h
Executable file
@ -0,0 +1,12 @@
|
||||
// list.h
|
||||
|
||||
class List {
|
||||
public:
|
||||
virtual int Insert(int val) {return 0;}
|
||||
virtual int Search(int val) {return 0;}
|
||||
virtual void Write(ostream &s) {;}
|
||||
virtual int Empty(void) {return 0;}
|
||||
};
|
||||
|
||||
//typedef List *ListPtr;
|
||||
|
71
11/wywolania/Data/stide_v1.1/Utils/random.cc
Executable file
@ -0,0 +1,71 @@
|
||||
#include <string.h>
|
||||
#include <math.h>
|
||||
#include <stdlib.h>
|
||||
#include "random.h"
|
||||
|
||||
#define PCRAND
|
||||
#ifdef PCRAND
|
||||
#define RANDOM_MAX RAND_MAX
|
||||
#else
|
||||
#define RANDOM_MAX (pow(2, 31)-1)
|
||||
#endif
|
||||
|
||||
/* this random function returns 1 if a random toss is within pfactor, 0<pfactor<1*/
|
||||
int Probability(float pfactor)
|
||||
{
|
||||
return (pfactor>Random1());
|
||||
/* if a random value from 0 to 1 is less than pfactor, return true*/
|
||||
}
|
||||
/* -----------------------------------------------------------------------------------------------------------------*/
|
||||
unsigned Random(unsigned num) /* returns a random word between 0 and num-1*/
|
||||
{
|
||||
float ratio, temp;
|
||||
|
||||
ratio=num;
|
||||
temp=RANDOM_MAX;
|
||||
ratio=ratio/temp;
|
||||
#ifdef PCRAND
|
||||
temp=rand();
|
||||
#else
|
||||
temp=random();
|
||||
#endif
|
||||
if (temp*ratio>num-1) return num-1;
|
||||
else return (temp*ratio);
|
||||
}
|
||||
/* ------------------------------------------------------------------------------------------------------------*/
|
||||
/* returns a value between 0 and 1*/
|
||||
float Random1(void)
|
||||
{
|
||||
float ratio,temp;
|
||||
|
||||
#ifdef PCRAND
|
||||
ratio=rand();
|
||||
#else
|
||||
ratio=random();
|
||||
#endif
|
||||
temp=RANDOM_MAX;
|
||||
ratio=ratio/temp;
|
||||
if (ratio<=0) ratio=0.0001;
|
||||
if (ratio>=0.9999) ratio=0.9999;
|
||||
return ratio;
|
||||
}
|
||||
/* -------------------------------------------------------------------------------------------------------------------*/
|
||||
/* initializes random generator*/
|
||||
void InitRandom(int seed)
|
||||
{
|
||||
#ifdef PCRAND
|
||||
srand(seed);
|
||||
#else
|
||||
static char state[64];
|
||||
|
||||
initstate(seed,state,64);
|
||||
setstate(state);
|
||||
#endif
|
||||
}
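The routines above scale `rand()` through `float` arithmetic, which loses precision for large ranges. A sketch of the same interface on top of the C++11 `<random>` header is shown below. It is an illustration under assumptions: `UniformSource` is a hypothetical class name, and the choice of `std::mt19937` as the engine is arbitrary.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for Random() and Probability() built on <random>.
class UniformSource {
public:
    explicit UniformSource(unsigned seed) : engine_(seed) {}

    // Uniform integer in [0, num-1]; num must be positive.
    unsigned Random(unsigned num) {
        assert(num > 0);
        std::uniform_int_distribution<unsigned> dist(0, num - 1);
        return dist(engine_);
    }

    // Returns 1 with probability pfactor (0 <= pfactor <= 1), 0 otherwise.
    int Probability(double pfactor) {
        std::bernoulli_distribution dist(pfactor);
        return dist(engine_) ? 1 : 0;
    }

private:
    std::mt19937 engine_;
};
```

Keeping the engine inside an object means each consumer can have its own seed, instead of sharing the process-wide state used by `srand`.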
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
13
11/wywolania/Data/stide_v1.1/Utils/random.h
Executable file
@ -0,0 +1,13 @@
|
||||
#ifndef __RANDOM_H
|
||||
#define __RANDOM_H
|
||||
|
||||
/* these are routines to generate random nos in commonly used formats. These routines all
|
||||
use the random function and so are very random !
|
||||
*/
|
||||
|
||||
void InitRandom(int seed); /* initializes the random system to seed - uses internal state buffers*/
|
||||
int Probability(float pfactor); /* returns 1 if a random toss is within pfactor, 0 otherwise*/
|
||||
unsigned Random(unsigned num); /* returns an unsigned from 0 to num-1*/
|
||||
float Random1(void); /* returns a random floating pt between 0 and 1, i.e over interval (0,1)*/
|
||||
|
||||
#endif
|
27
11/wywolania/Data/stide_v1.1/Utils/tlist.h
Executable file
@ -0,0 +1,27 @@
|
||||
#ifndef __TLIST_H
|
||||
#define __TLIST_H
|
||||
// tlist.h
|
||||
// this is a base template class for lists
|
||||
// it is for a list of elements. An element can be of any class, but it must
|
||||
// have the operators == and = defined, so that the list can be searched
|
||||
// also the operator << must be defined for write
|
||||
|
||||
template <class Elem> class List {
|
||||
public:
|
||||
virtual Elem *Insert(const Elem &elem) {return NULL;}
|
||||
// insert elem into the list
|
||||
// returns a ptr to elem if elem inserted, NULL if elem was already there
|
||||
// i.e. doesn't put in duplicates and returns NULL for duplicates
|
||||
virtual Elem *Search(const Elem &elem) {return NULL;}
|
||||
// finds the element that matches elem and returns it. This allows assoc
|
||||
// retrieval
|
||||
// returns NULL if the elem is not found
|
||||
virtual void Write(ostream &s) {;}
|
||||
// writes out the list of elements. Requires that the element overload the
|
||||
// stream output operator
|
||||
virtual int Empty(void) {return 0;}
|
||||
// returns true if the list is empty
|
||||
};
|
||||
|
||||
#endif
|
||||
|
136
11/wywolania/Data/stide_v1.1/Utils/tll.cc
Executable file
@ -0,0 +1,136 @@
|
||||
// tll.cpp
|
||||
|
||||
#include "tll.h"
|
||||
|
||||
// data structures:
|
||||
// node for a linked list
|
||||
template <class Elem> class LLNode {
|
||||
public:
|
||||
Elem elem; // the element at this node
|
||||
LLNode<Elem> *next; // pointer to the next node
|
||||
};
|
||||
//===========================================================================
|
||||
template <class Elem> LinkedList<Elem>::LinkedList(void) {
|
||||
root = NULL;
|
||||
length = 0;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> LinkedList<Elem>::LinkedList(const LinkedList<Elem> &llist) {
|
||||
root = NULL; length = 0; get_next_ptr = NULL; // length must start at 0 before Insert() increments it
|
||||
LLNode<Elem> *temp_ptr = llist.root;
|
||||
while (temp_ptr) {
|
||||
Insert(temp_ptr->elem);
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> LinkedList<Elem> &LinkedList<Elem>::operator = (
|
||||
const LinkedList<Elem> &llist) {
|
||||
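if (this == &llist) return *this; // self-assignment would otherwise empty the list
Clear(); // free the old nodes and reset root and length before copying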
|
||||
LLNode<Elem> *temp_ptr = llist.root;
|
||||
while (temp_ptr) {
|
||||
Insert(temp_ptr->elem);
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
return *this;
|
||||
}
|
||||
//============================================================================
|
||||
template <class Elem> LinkedList<Elem>::~LinkedList(void) {
|
||||
if (root) {
|
||||
LLNode<Elem> *temp_ptr = root->next, *next_temp_ptr;
|
||||
delete root;
|
||||
while (temp_ptr) {
|
||||
next_temp_ptr = temp_ptr->next;
|
||||
delete temp_ptr;
|
||||
temp_ptr = next_temp_ptr;
|
||||
}
|
||||
}
|
||||
}
|
||||
//============================================================================
|
||||
template <class Elem> void LinkedList<Elem>::Clear(void) {
|
||||
if (root) {
|
||||
LLNode<Elem> *temp_ptr = root->next, *next_temp_ptr;
|
||||
delete root;
|
||||
while (temp_ptr) {
|
||||
next_temp_ptr = temp_ptr->next;
|
||||
delete temp_ptr;
|
||||
temp_ptr = next_temp_ptr;
|
||||
}
|
||||
}
|
||||
root = NULL;
|
||||
length = 0;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> Elem *LinkedList<Elem>::Insert(const Elem &elem) {
|
||||
if (!root) {
|
||||
root = new LLNode<Elem>;
|
||||
root->elem = elem;
|
||||
root->next = NULL;
|
||||
length++;
|
||||
return &(root->elem);
|
||||
} else {
|
||||
if (!Search(elem)) { // only put in if it is not already in - this is ineff.
|
||||
LLNode<Elem> temp_node;
|
||||
temp_node.elem = root->elem;
|
||||
temp_node.next = root->next;
|
||||
root->elem = elem;
|
||||
root->next = new LLNode<Elem>;
|
||||
root->next->elem = temp_node.elem;
|
||||
root->next->next = temp_node.next;
|
||||
length++;
|
||||
return &(root->elem);
|
||||
} else { // put the elem back in the same place
|
||||
Elem *found = Search(elem); // update the matching element, which need not be the head
*found = elem;
return found;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> Elem *LinkedList<Elem>::Search(const Elem &elem) {
|
||||
LLNode<Elem> *curr_ptr = root;
|
||||
while (curr_ptr) {
|
||||
if (curr_ptr->elem == elem)
|
||||
return &(curr_ptr->elem);
|
||||
// this is very important, because they may not be completely the same,
|
||||
// since the comparison could be done on a key only
|
||||
else curr_ptr = curr_ptr->next;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> void LinkedList<Elem>::Write(ostream &s) {
|
||||
LLNode<Elem> *curr_ptr = root;
|
||||
while (curr_ptr) {
|
||||
s<<(curr_ptr->elem)<<" ";
|
||||
curr_ptr = curr_ptr->next;
|
||||
}
|
||||
}
|
||||
//=============================================================================
|
||||
template <class Elem> ostream &operator<<(ostream &s, LinkedList<Elem> &ll) {
|
||||
ll.Write(s);
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> int LinkedList<Elem>::DeleteNext(Elem &elem) {
|
||||
if (!root) return 0;
|
||||
elem = root->elem;
|
||||
LLNode<Elem> *kill_ptr = root;
|
||||
root = root->next;
|
||||
delete kill_ptr;
|
||||
length--;
|
||||
return 1;
|
||||
}
|
||||
//===========================================================================
|
||||
template <class Elem> int LinkedList<Elem>::GetNext(Elem &elem, int start) {
|
||||
if (start) get_next_ptr = root;
|
||||
if (get_next_ptr) {
|
||||
elem = get_next_ptr->elem;
|
||||
get_next_ptr = get_next_ptr->next;
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
42
11/wywolania/Data/stide_v1.1/Utils/tll.h
Executable file
@ -0,0 +1,42 @@
#ifndef __TLL_H
#define __TLL_H

// tll.h

/* this implements a template class linklist, descended from tlist.h
   one can create an assoc array out of this by creating an elem class in
   which the comparison operator depends on the key alone. Then search will
   return the full elem and one can check the value associated with the key
*/

#include <iostream.h>
#include "tlist.h"


template <class Elem> class LLNode;
template <class Elem> class LinkedList : public List<Elem> {
  LLNode<Elem> *root;
public:
  LinkedList(void);
  ~LinkedList(void);
  LinkedList(const LinkedList<Elem> &llist); // the copy constructor
  LinkedList &operator = (const LinkedList<Elem> &llist);
  Elem *Insert(const Elem &elem); // this does not insert if val already exists
                                  // returns ptr to elem in list if it could insert
  void Clear(void);
  int DeleteNext(Elem &elem); // deletes first elem in list and returns it
  int GetNext(Elem &elem, int start); // returns the next element in the list, if start is set then returns
                                      // the first one, returns 0 if the list is now empty
  Elem *Search(const Elem &elem); // assumes the == operator defined on elem
  void Write(ostream &s);
  friend ostream &operator<<(ostream &s, LinkedList<Elem> &ll);
  int Empty(void) {return (root ? 0 : 1);}
  int Size(void) {return length;}
private:
  int length;
  LLNode<Elem> *get_next_ptr; // because the next one is ongoing

};

#endif

339
11/wywolania/Data/stide_v1.2/COPYING
Normal file
@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License. (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code. (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) 19yy <name of author>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) 19yy name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the program
    `Gnomovision' (which makes passes at compilers) written by James Hacker.

    <signature of Ty Coon>, 1 April 1989
    Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.
21
11/wywolania/Data/stide_v1.2/Makefile
Normal file
@ -0,0 +1,21 @@
STIDE_OBJECTS = config.o flexitree.o stide.o stream.o

STIDE_HEADERS = config.h flexitree.h opt_info.h stream.h

FLAGS = -g

stide: $(STIDE_OBJECTS)
	g++ $(FLAGS) $(STIDE_OBJECTS) -o stide

config.o: config.C config.h
	g++ -c $(FLAGS) config.C

flexitree.o: flexitree.C flexitree.h
	g++ -c $(FLAGS) flexitree.C

stream.o: stream.C stream.h
	g++ -c $(FLAGS) stream.C

stide.o: stide.C $(STIDE_HEADERS)
	g++ -c $(FLAGS) stide.C

13
11/wywolania/Data/stide_v1.2/README
Normal file
@ -0,0 +1,13 @@
STIDE version 1.2

Copyright (C) 1996, 1998 The Regents of the University of New Mexico.
Copyright (C) 2006 Hajime Inoue.

All rights reserved.

STIDE v1.2 should work identically to v1.1. Modern GCCs will not compile v1.1.
STIDE v1.2 was ported to STL and current C++ conventions. Please report
any bugs to hinoue@ccsl.carleton.ca.

For usage information invoke stide with the --help option. More detailed
documentation can be found in the UserDoc directory.
803
11/wywolania/Data/stide_v1.2/config.C
Normal file
@ -0,0 +1,803 @@
#include <stdlib.h>
#include <stdio.h>
#include <fstream>
#include <string>
#include <vector>
#include "config.h"
#include "opt_info.h"

#define LF_LIM 999
#define SEQ_LEN_LIM 199
#define MAX_ELEM_LIM 999
#define MAX_STREAMS_LIM 9999

using std::vector;
using std::cout;
using std::cerr;
using std::endl;

/**********************************************************************
 * Config()
 *   Reads in configuration information from configuration file, from
 *   the command line, and from preset defaults.
 *
 * Input: int argc:      Number of arguments on command line
 *        char *argv[]:  Array of strings of actual arguments
 *
 * Output: Nothing
 *********************************************************************/

Config::Config(const int argc, const char *argv[])
{
  vector<OptInfo> opt_array(NUM_OPTS);
  InitOptArray(opt_array);

  SetDefaults();

  ReadCommandLine(argc, argv, opt_array);

  ReadConfigFile(opt_array);

  CheckValues();

  InitOutputFormat();

  OuputConfigInfo(opt_array);

}

/*********************************************************************
 * InitOptArray()
 *   Sets the values of opt_array so that opt_array contains all the
 *   information needed about the parameters being set by the config
 *   file and the command-line arguments.
 *
 * Input: vector<OptInfo> &opt_array: Array of information about
 *                                    options for the program
 *
 * Output: Nothing
 *********************************************************************/

void Config::InitOptArray(vector<OptInfo> &opt_array)
{
  // opt_array.reserve(NUM_OPTS);

  opt_array[0].long_name = "db_name";
  opt_array[0].short_name = "d";
  opt_array[0].set = 0;
  opt_array[0].type = 's';
  opt_array[0].str_val = &db_name;

  opt_array[1].long_name = "seq_len";
  opt_array[1].short_name = "l";
  opt_array[1].set = 0;
  opt_array[1].type = 'i';
  opt_array[1].int_val = &seq_len;

  opt_array[2].long_name = "max_elements";
  opt_array[2].short_name = "me";
  opt_array[2].set = 0;
  opt_array[2].type = 'i';
  opt_array[2].int_val = &max_elements;

  opt_array[3].long_name = "max_streams";
  opt_array[3].short_name = "ms";
  opt_array[3].set = 0;
  opt_array[3].type = 'i';
  opt_array[3].int_val = &max_streams;

  opt_array[4].long_name = "cfg_name";
  opt_array[4].short_name = "c";
  opt_array[4].set = 0;
  opt_array[4].type = 's';
  opt_array[4].str_val = &cfg_name;

  opt_array[5].long_name = "pair_offset";
  opt_array[5].short_name = "p";
  opt_array[5].set = 0;
  opt_array[5].type = 'i';
  opt_array[5].int_val = &pair_offset;

  opt_array[6].long_name = "add_output_format";
  opt_array[6].short_name = "aof";
  opt_array[6].set = 0;
  opt_array[6].type = 's';
  opt_array[6].str_val = &add_output_format;

  opt_array[7].long_name = "compare_output_format";
  opt_array[7].short_name = "cof";
  opt_array[7].set = 0;
  opt_array[7].type = 's';
  opt_array[7].str_val = &compare_output_format;

  opt_array[8].long_name = "add_to_db";
  opt_array[8].short_name = "a";
  opt_array[8].set = 0;
  opt_array[8].type = 'f';
  opt_array[8].int_val = &add_to_db;

  opt_array[9].long_name = "output_graph";
  opt_array[9].short_name = "g";
  opt_array[9].set = 0;
  opt_array[9].type = 'f';
  opt_array[9].int_val = &output_graph;

  opt_array[10].long_name = "compute_hdist";
  opt_array[10].short_name = "hd";
  opt_array[10].set = 0;
  opt_array[10].type = 'f';
  opt_array[10].int_val = &compute_hdist;

  opt_array[11].long_name = "lf_size";
  opt_array[11].short_name = "lf";
  opt_array[11].set = 0;
  opt_array[11].type = 'i';
  opt_array[11].int_val = &lf_size;

  opt_array[12].long_name = "write_db_stats";
  opt_array[12].short_name = "s";
  opt_array[12].set = 0;
  opt_array[12].type = 'f';
  opt_array[12].int_val = &write_db_stats;

  opt_array[13].long_name = "verbose";
  opt_array[13].short_name = "v";
  opt_array[13].set = 0;
  opt_array[13].type = 'f';
  opt_array[13].int_val = &verbose;

  opt_array[14].long_name = "very_verbose";
  opt_array[14].short_name = "V";
  opt_array[14].set = 0;
  opt_array[14].type = 'f';
  opt_array[14].int_val = &very_verbose;

  opt_array[15].long_name = "help";
  opt_array[15].short_name = "h";
  opt_array[15].set = 0;
  opt_array[15].type = 'h';

}

/*********************************************************************
 * SetDefaults()
 *   Sets configuration variables to their default values
 *
 * Input: None
 *
 * Output: None
 ********************************************************************/

void Config::SetDefaults()
{
  cfg_name = "stide.config";
  db_name = "default.db";
  seq_len = 6;
  max_elements = 500;
  max_streams = 500;
  pair_offset = 0;
  add_output_format = "DB Size: %d\tStream: %s\tPair Number: %p\n";
  compare_output_format = "Pair Number: %p\tStream Number: %s\n";
  lf_size = 1;
  add_to_db = 0;
  output_graph = 0;
  compute_hdist = 0;
  write_db_stats = 0;
  verbose = 0;
  very_verbose = 0;
  num_fvars = 0;
}


/*********************************************************************
 * ReadCommandLine()
 *   Parses the command line.  Updates configuration variables.
 *
 *   const int argc              Number of arguments
 *   const char *argv[]          Array of arguments
 *   vector<OptInfo> &opt_array  Array of information about the
 *                               configuration variables (updated as
 *                               options are set)
 ********************************************************************/

void Config::ReadCommandLine(const int argc, const char *argv[],
                             vector<OptInfo> &opt_array)
{
  string var_name;   // Name of variable
  string var_val;    // Value of variable
  int name_type;     // LONG_NAME or SHORT_NAME
  int argv_i = 1;    // First index of argv
  int argv_j = 0;    // Second index of argv

  while (argv_i < argc) {
    if (argv[argv_i][argv_j] != '-') {
      cerr<< "ERROR: Switches must be preceded by a dash: "<<argv[argv_i]
          << endl << " is illegal" << endl;
      exit(-1);
    }
    argv_j++;
    if (argv[argv_i][argv_j] == '-') { // Long name
      argv_j++;
      name_type = LONG_NAME;
    }
    else {
      name_type = SHORT_NAME;
    }

    // Read name into var_name
    var_name = argv[argv_i]+argv_j;

    // Now we want to read the value, if there is one.
    argv_j = 0;
    if (++argv_i < argc) {
      if (argv[argv_i][argv_j] != '-') {
        var_val = argv[argv_i];
        argv_i++;
      }
    }

    // assign value to appropriate variable
    AssignValToVar(opt_array, var_val, var_name, name_type);
    // Blank var_name and var_val for next time around
    var_name.resize(0);
    var_val.resize(0);
  }
}

/*********************************************************************
|
||||
* AssignValToVar() *
|
||||
* Figures out which variable to assign a given value to and does *
|
||||
* so. Updates opt_array, to say that that particular variable *
|
||||
* has been set. *
|
||||
* *
|
||||
* Input: vector<OptInfo> &opt_array Option Information *
|
||||
* const string &var_val Value to be assigned *
|
||||
* const string &var_name Name of variable to be updated *
|
||||
* const int name_type SHORT_NAME or LONG_NAME *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::AssignValToVar(vector<OptInfo> &opt_array, const string
|
||||
&var_val, const string &var_name, const
|
||||
int name_type)
|
||||
{
|
||||
int opt_i;
|
||||
|
||||
for (opt_i = 0; opt_i < NUM_OPTS; opt_i++) {
|
||||
if (((name_type == LONG_NAME) && (opt_array[opt_i].long_name ==
|
||||
var_name)) ||
|
||||
((name_type == SHORT_NAME) && (opt_array[opt_i].short_name ==
|
||||
var_name))) {
|
||||
// If we have already set this variable and shouldn't change it,
|
||||
// don't
|
||||
if (opt_array[opt_i].set == 1) {
|
||||
break;
|
||||
}
|
||||
switch (opt_array[opt_i].type) {
|
||||
case 'f': // flag
|
||||
if ((var_val.length() == 0) || (var_val == "On") ||
|
||||
(var_val == "ON") || (var_val == "on")) {
|
||||
*(opt_array[opt_i].flag_val) = 1;
|
||||
opt_array[opt_i].set = 1;
|
||||
}
|
||||
else if ((var_val != "Off") && (var_val != "off") &&
|
||||
(var_val != "OFF")) {
|
||||
cerr << "ERROR: Illegal value for parameter " << var_name
|
||||
<< ". This parameter is a simple flag," << endl
|
||||
<< "and may be followed by \"on\", \"off\", or nothing "
|
||||
<< "(which turns it on). The current value is "
|
||||
<< var_val << ". Aborting...";
|
||||
exit(-1);
|
||||
}
|
||||
break;
|
||||
case 'i':
|
||||
// If there isn't a value, just use the default
|
||||
if (var_val.length() == 0) {
|
||||
break;
|
||||
}
|
||||
*(opt_array[opt_i].int_val) = atoi(var_val.c_str());
|
||||
opt_array[opt_i].set = 1;
|
||||
break;
|
||||
case 's':
|
||||
// If there is no string given, just use the default
|
||||
if (var_val.length() == 0) {
|
||||
break;
|
||||
}
|
||||
*(opt_array[opt_i].str_val) = var_val;
|
||||
opt_array[opt_i].set = 1;
|
||||
break;
|
||||
case 'h':
|
||||
WriteHelpInfo();
|
||||
} // end of switch
|
||||
return; // we've found it, so we're done
|
||||
} // end of if (opt_array[opt_i]...
|
||||
} // end of for (opt_i = 0; ...
|
||||
}
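
A minimal sketch of the lookup-and-assign pattern used by `AssignValToVar()` above, written as a standalone example. `DemoOpt` and `AssignDemo` are hypothetical names invented for this illustration; the real `OptInfo` class lives in `opt_info.h`, which is not shown here, so its layout is an assumption. The sketch keeps the "first assignment wins" rule, which is what lets an earlier source of values take priority over a later one.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for OptInfo (assumed layout, not the real class).
struct DemoOpt {
  std::string long_name;
  std::string short_name;
  char type;                // 'f' flag, 'i' integer, 's' string
  int *int_target;          // destination for 'f' and 'i' options
  std::string *str_target;  // destination for 's' options
  bool set;                 // true once a value has been assigned
};

// Returns true if an option with this name exists (whether or not its
// value changed). An option that is already set keeps its first value.
bool AssignDemo(std::vector<DemoOpt> &opts, const std::string &name,
                const std::string &val, bool is_long) {
  for (DemoOpt &opt : opts) {
    const std::string &candidate = is_long ? opt.long_name : opt.short_name;
    if (candidate != name) continue;
    if (opt.set) return true;  // first assignment wins
    switch (opt.type) {
      case 'f':  // a flag with no value, or "on", switches the flag on
        assert(opt.int_target != nullptr);
        if (val.empty() || val == "on" || val == "On" || val == "ON") {
          *opt.int_target = 1;
          opt.set = true;
        }
        break;
      case 'i':  // an empty value keeps the default
        assert(opt.int_target != nullptr);
        if (!val.empty()) {
          *opt.int_target = std::atoi(val.c_str());
          opt.set = true;
        }
        break;
      case 's':  // an empty value keeps the default
        assert(opt.str_target != nullptr);
        if (!val.empty()) {
          *opt.str_target = val;
          opt.set = true;
        }
        break;
    }
    return true;
  }
  return false;  // no option with that name
}
```

Unlike the original, this sketch reports an unknown name to the caller instead of ignoring it silently; that is a design choice of the example, not something the source does.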
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReadConfigFile() *
|
||||
* Parses the configuration file. Updates configuration *
|
||||
* variables. *
|
||||
* *
|
||||
* Input: vector<OptInfo> &opt_array: Option information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Config::ReadConfigFile(vector<OptInfo> &opt_array)
|
||||
{
|
||||
string var_name;
|
||||
string var_val;
|
||||
|
||||
// Set up stream for reading configuration
|
||||
ifstream cfg_file(cfg_name.c_str());
|
||||
string buff;
|
||||
int buff_i = 0; // index for buff
|
||||
int opt_i = 0; // index for opt_array
|
||||
int rev_num; // revision number of configuration file
|
||||
|
||||
if (!cfg_file.is_open()) {
|
||||
cerr<<"WARNING: Cannot open configuration file "<<cfg_name
|
||||
<<". I will continue, using the" <<endl
|
||||
<<"default values and the command line arguments." << endl
|
||||
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||
<< endl;
|
||||
return;
|
||||
}
|
||||
|
||||
// First we need to determine if the configuration file is old-style
|
||||
// or new-style, i.e., is there a #ConfigFileRev: in the first
|
||||
// line. We can determine this just by checking the first
|
||||
// character.
|
||||
char c = cfg_file.peek();
|
||||
|
||||
// Config file is empty; just return
|
||||
if (cfg_file.eof()) {
|
||||
return;
|
||||
}
|
||||
|
||||
// If old-style
|
||||
if (c != '#') {
|
||||
cerr << "WARNING: The first line of the configuration file did "
|
||||
<< "not contain the string" << endl
|
||||
<< "\"#ConfigFileRev: " << CFREV << "\"." << endl
|
||||
<< "I will assume that this is an old format configuration "
|
||||
<< "file." << endl
|
||||
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||
<< endl << endl;
|
||||
ReadOldConfigFile(cfg_file, opt_array);
|
||||
return;
|
||||
}
|
||||
|
||||
// Look for "#ConfigFileRev:"
|
||||
cfg_file >> buff;
|
||||
|
||||
if (buff != "#ConfigFileRev:") {
|
||||
cerr << "ERROR: I expected the first line of the configuration "
|
||||
<< "file to either be \"#ConfigFileRev: \" followed by the "
|
||||
<< "revision number or the beginning of an old-style "
|
||||
<< "configuration file, which does not have a comment in the "
|
||||
<< "first line. I'm confused, so I will abort..."
|
||||
<< endl << endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
cfg_file >> rev_num;
|
||||
|
||||
if (rev_num > CFREV) {
|
||||
cerr << "ERROR: This version of STIDE does not know how to deal "
|
||||
<< "with configuration files" << endl
|
||||
<< "more modern than revision " << CFREV << ". Aborting..."
|
||||
<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
if (rev_num < CFREV) {
|
||||
cerr << "ERROR: Configuration files must be revision " << CFREV
|
||||
<< " or later, " << "or an old-style" << endl
|
||||
<< "configuration file without a revision number. "
|
||||
<< "Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
// Now we know everything's as we expect, so we'll parse the file
|
||||
|
||||
while (!cfg_file.eof()) {
|
||||
// Skip white space at the beginning of the line
|
||||
while (isspace(buff[buff_i])) {
|
||||
buff_i++;
|
||||
}
|
||||
|
||||
// If buff is empty, move on to next line
|
||||
if (buff.length() <= buff_i) {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
continue;
|
||||
}
|
||||
|
||||
// If we start with a comment, move on to next line
|
||||
if (buff[buff_i] == '#') {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
continue;
|
||||
}
|
||||
// Read in variable name, up to the :
|
||||
int start_place = buff_i; // the beginning place of the name
|
||||
while ((buff_i < buff.length()) && (buff[buff_i] != ':')) {
|
||||
buff_i++;
|
||||
}
|
||||
if (buff_i >= buff.length()) { // ran off the end without finding a colon
|
||||
cerr << "ERROR: Variable names in the configuration file must "
|
||||
<< "be followed by a colon. The line " << endl
|
||||
<< buff << endl << "contains a variable name which is not "
|
||||
<< "terminated by a colon. Aborting..." <<endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
// This assigns the values in buff between start_place and buff_i
|
||||
// to var_name
|
||||
var_name.assign(buff, start_place, buff_i - start_place);
|
||||
|
||||
// Skip colon
|
||||
buff_i++;
|
||||
|
||||
// Skip white space
|
||||
while (isspace(buff[buff_i])) { buff_i++; }
|
||||
|
||||
start_place = buff_i; // the starting place of the value
|
||||
// Find last point in value. If it starts with a quote, it ends
|
||||
// with a quote.
|
||||
if ((buff[buff_i] == '\"') && (buff_i < buff.length())) {
|
||||
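buff_i++; // step past the opening quote before searching for the closing one
while ((buff_i < buff.length()) && (buff[buff_i] != '\"')) {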
|
||||
buff_i++;
|
||||
}
|
||||
// Strip off first "
|
||||
start_place++;
|
||||
}
|
||||
// Otherwise, it ends with a space, a # or the end of the line
|
||||
else {
|
||||
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||
(buff[buff_i] != '#')) {
|
||||
buff_i++;
|
||||
}
|
||||
}
|
||||
var_val.assign(buff, start_place, buff_i - start_place);
|
||||
|
||||
// Now we want to check to see if the line was continued, in which
|
||||
// case we haven't gotten the value of the variable in var_val, so
|
||||
// we still need to do that.
|
||||
if (buff[buff_i-1] == '\\') {
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
while (isspace(buff[buff_i])) { buff_i++; }
|
||||
start_place = buff_i;
|
||||
// Find last point in value. If it starts with a quote, it ends with a
|
||||
// quote.
|
||||
if (buff[buff_i] == '\"') {
|
||||
buff_i++;
|
||||
while ((buff[buff_i] != '\"') && (buff_i < buff.length())) {
|
||||
buff_i++;
|
||||
}
|
||||
start_place++; // Strip off first "
|
||||
}
|
||||
// Otherwise, it ends with a space, a # or the end of the line
|
||||
else {
|
||||
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||
(buff[buff_i] != '#')) {
|
||||
buff_i++;
|
||||
}
|
||||
}
|
||||
var_val.assign(buff, start_place, buff_i - start_place);
|
||||
}
|
||||
|
||||
// assign value to appropriate variable
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
buff_i = 0;
|
||||
} //end of while (!cfg_file.eof())...
|
||||
|
||||
}
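
The per-line parsing in `ReadConfigFile()` above (skip leading white space, skip comments, read the name up to the colon, then read a quoted or unquoted value) can be sketched as one small function. This is an illustrative rewrite under stated assumptions, not the code STIDE uses: `ParseConfigLine` and `IsSpace` are hypothetical names, and line continuation with a trailing backslash is deliberately left out.

```cpp
#include <cctype>
#include <string>

// Wrapper that avoids passing a negative char to std::isspace.
static bool IsSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Splits one "name: value" line. Returns false for blank lines, comment
// lines, and lines without a colon; otherwise fills name and value.
bool ParseConfigLine(const std::string &line, std::string &name,
                     std::string &value) {
  std::string::size_type i = 0;
  while (i < line.size() && IsSpace(line[i])) i++;
  if (i == line.size() || line[i] == '#') return false;

  // The name runs from the first non-blank character up to the colon.
  std::string::size_type colon = line.find(':', i);
  if (colon == std::string::npos) return false;
  name = line.substr(i, colon - i);

  // Skip the colon and any white space after it.
  i = colon + 1;
  while (i < line.size() && IsSpace(line[i])) i++;

  std::string::size_type start = i;
  if (i < line.size() && line[i] == '"') {
    // Quoted value: everything up to the closing quote (or end of line).
    start = ++i;
    while (i < line.size() && line[i] != '"') i++;
  } else {
    // Unquoted value: ends at white space, a '#', or the end of the line.
    while (i < line.size() && !IsSpace(line[i]) && line[i] != '#') i++;
  }
  value = line.substr(start, i - start);
  return true;
}
```

Checking the index against `line.size()` before reading each character keeps every access inside the string, which is the ordering the loops in the original only follow in some places.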
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReadOldConfigFile() *
|
||||
* Reads information from an old-style configuration file. *
|
||||
* Updates configuration variables. *
|
||||
* *
|
||||
* Input: ifstream &cfg_file Configuration file (already opened) *
|
||||
* vector<OptInfo> &opt_array: Option information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::ReadOldConfigFile(ifstream &cfg_file,
|
||||
vector<OptInfo> &opt_array)
|
||||
{
|
||||
|
||||
string buff;
|
||||
string var_name;
|
||||
string var_val;
|
||||
|
||||
var_name = "max_elements";
|
||||
cfg_file>>var_val;
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
|
||||
var_name = "max_streams";
|
||||
cfg_file>>var_val;
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
getline(cfg_file, buff);
|
||||
|
||||
// Next line is hash table size, but we are now figuring that out
|
||||
// dynamically, so just throw it away.
|
||||
getline(cfg_file, buff);
|
||||
|
||||
// Now read in the format string
|
||||
getline(cfg_file, var_val);
|
||||
// Put the format string in the appropriate place
|
||||
if (add_to_db) {
|
||||
var_name = "add_output_format";
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
}
|
||||
else {
|
||||
var_name = "compare_output_format";
|
||||
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* CheckValues() *
|
||||
* Checks configuration values that have been read in to make *
|
||||
* sure that they are within the limits. Flags are automatically *
|
||||
* checked while being read in, the output formats are checked *
|
||||
* in InitOutputFormat(), and filenames are checked when they are *
|
||||
* opened, so all that is left is the integer values. *
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::CheckValues()
|
||||
{
|
||||
if ((lf_size < 1) || (lf_size > LF_LIM)) {
|
||||
cerr << "ERROR: lf_size must be between 1 and " << LF_LIM
|
||||
<< ". It has been set to " << lf_size << ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((seq_len < 1) || (seq_len > SEQ_LEN_LIM)) {
|
||||
cerr << "ERROR: seq_len must be between 1 and " << SEQ_LEN_LIM
|
||||
<< ". It has been set to " << seq_len << ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((max_elements < 1) || (max_elements > MAX_ELEM_LIM)) {
|
||||
cerr << "ERROR: max_elements must be between 1 and " << MAX_ELEM_LIM
|
||||
<< ". It has been set to " << max_elements
|
||||
<< ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
if ((max_streams < 1) || (max_streams > MAX_STREAMS_LIM)) {
|
||||
cerr << "ERROR: max_streams must be between 1 and " << MAX_STREAMS_LIM
|
||||
<< ". It has been set to " << max_streams
|
||||
<< ". Aborting..." << endl;
|
||||
exit(-1);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* InitOutputFormat() *
|
||||
* Converts the string add_output_format or compare_output_format *
|
||||
* to information filling fmt_str and num_fvars, which is more *
|
||||
* convenient for output. *
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::InitOutputFormat()
|
||||
{
|
||||
// Now we analyze add_output_format or compare_output_format
|
||||
int flag = 0;
|
||||
int f_i = 0;
|
||||
num_fvars = 0;
|
||||
string *buff;
|
||||
|
||||
// If we're not in verbose or very_verbose modes, we're never going
|
||||
// to use this information, so don't waste our time doing this
|
||||
if (!(verbose || very_verbose)) {
|
||||
return;
|
||||
}
|
||||
if (add_to_db) {
|
||||
buff = &add_output_format;
|
||||
}
|
||||
else {
|
||||
buff = &compare_output_format;
|
||||
}
|
||||
|
||||
for (int i = 0; i <(*buff).length(); i++) {
|
||||
switch ((*buff)[i]) {
|
||||
case '\\':
|
||||
i++;
|
||||
switch ((*buff)[i]) {
|
||||
case 't': fmt_str[num_fvars][f_i] = '\t'; break;
|
||||
case 'n': fmt_str[num_fvars][f_i] = '\n'; break;
|
||||
}
|
||||
break;
|
||||
case '%':
|
||||
fmt_str[num_fvars][f_i] = '%';
|
||||
flag = 1;
|
||||
break;
|
||||
default:
|
||||
fmt_str[num_fvars][f_i] = (*buff)[i];
|
||||
if (flag) {
|
||||
switch (fmt_str[num_fvars][f_i]) {
|
||||
case 'd': // database size
|
||||
case 'i': // number of last value of sequence in this
|
||||
// data stream
|
||||
case 'p': // number of last value of sequence in entire
|
||||
// input
|
||||
case 's': // external stream ID
|
||||
case 'a': // flag for whether this sequence is anomalous
|
||||
case 'c': // locality frame count of this sequence
|
||||
case 'h': // Hamming distance for this sequence
|
||||
// Record that we must write that val at that position
|
||||
write_val[num_fvars] = fmt_str[num_fvars][f_i];
|
||||
fmt_str[num_fvars][f_i] = 'd';
|
||||
fmt_str[num_fvars][f_i + 1] = '\0';
|
||||
num_fvars++;
|
||||
f_i = -1;
|
||||
flag = 0;
|
||||
break;
|
||||
default: // Unknown flag
|
||||
cerr << "ERROR: Illegal control character in output format."
|
||||
<< " Type stide -h for help." << endl;
|
||||
}
|
||||
}
|
||||
} // switch ((*buff)[i ...
|
||||
f_i++;
|
||||
}
|
||||
fmt_str[num_fvars][f_i] = '\0';
|
||||
}
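
As a rough illustration of what `InitOutputFormat()` above produces, the sketch below splits a format string into printf-style fragments, each ending in `%d`, and records which value letter belongs to each fragment. `FormatPiece` and `SplitFormat` are hypothetical names; the original stores the same information in the fixed-size arrays `fmt_str` and `write_val`, whereas this version uses growable containers and rejects unknown letters instead of only printing an error. Treat it as a sketch of the idea rather than a drop-in replacement.

```cpp
#include <string>
#include <vector>

// One fragment of the output format: literal text ending in "%d", plus
// the letter naming the value to print there. field is '\0' for trailing
// text that prints no value.
struct FormatPiece {
  std::string text;
  char field;
};

// Splits a format such as "DB Size: %d\\tStream: %s\\n" (backslash
// escapes still spelled out as two characters). Returns false if a '%'
// is followed by an unknown letter or by nothing at all.
bool SplitFormat(const std::string &fmt, std::vector<FormatPiece> &pieces) {
  const std::string known = "dipsach";  // letters accepted by the original
  std::string text;
  for (std::string::size_type i = 0; i < fmt.size(); i++) {
    char c = fmt[i];
    if (c == '\\' && i + 1 < fmt.size()) {
      char next = fmt[++i];
      if (next == 't') text += '\t';
      else if (next == 'n') text += '\n';
      else text += next;  // assumption: other escaped characters kept as-is
    } else if (c == '%') {
      if (i + 1 >= fmt.size() ||
          known.find(fmt[i + 1]) == std::string::npos) {
        return false;
      }
      FormatPiece piece = {text + "%d", fmt[++i]};
      pieces.push_back(piece);
      text.clear();
    } else {
      text += c;
    }
  }
  if (!text.empty()) {
    FormatPiece tail = {text, '\0'};
    pieces.push_back(tail);
  }
  return true;
}
```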
|
||||
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* OutputConfigInfo() *
|
||||
* Writes information about the final configuration to standard *
|
||||
* output. Does so in a format that could be used as a *
|
||||
* configuration file. Changes no values anywhere. *
|
||||
* *
|
||||
* Input: const vector<OptInfo> &opt_array Option Information *
|
||||
* *
|
||||
* Output: None *
|
||||
********************************************************************/
|
||||
|
||||
void Config::OuputConfigInfo(const vector<OptInfo> &opt_array) const
|
||||
{
|
||||
cout<<"This run was configured using configuration file "
|
||||
<< cfg_name << " and command" << endl
|
||||
<< "line arguments. The configuration values were as "
|
||||
<< "follows." << endl
|
||||
<<"#ConfigFileRev: " << CFREV << endl;
|
||||
for (int i = 0; i < NUM_OPTS; i++) {
|
||||
if (opt_array[i].type == 'i') {
|
||||
cout << opt_array[i].long_name << ": " << *(opt_array[i].int_val)
|
||||
<< endl;
|
||||
}
|
||||
if ((opt_array[i].type == 's') &&
|
||||
((add_to_db && (opt_array[i].short_name == "aof")) ||
|
||||
(!add_to_db && (opt_array[i].short_name == "cof")))) {
|
||||
cout << opt_array[i].long_name << ": \"" << *(opt_array[i].str_val)
|
||||
<< "\"" << endl;
|
||||
}
|
||||
if (opt_array[i].type == 'f') {
|
||||
if (*(opt_array[i].flag_val) == 1) {
|
||||
cout << opt_array[i].long_name << ": On" << endl;
|
||||
}
|
||||
if (*(opt_array[i].flag_val) == 0) {
|
||||
cout << opt_array[i].long_name << ": Off" << endl;
|
||||
}
|
||||
}
|
||||
}
|
||||
cout << endl << endl;
|
||||
|
||||
// Now print header for verbose modes
|
||||
if (verbose || very_verbose) {
|
||||
cout<<endl<<"Variables in output: "<<endl;
|
||||
for (int j = 0; j < num_fvars; j++) {
|
||||
switch (write_val[j]) {
|
||||
case 's': cout<<"stream #, "; break;
|
||||
case 'i': cout<<"index #, "; break;
|
||||
case 'h': if (compute_hdist) {cout<<"hamming miss, "; } break;
|
||||
case 'c': if (lf_size > 1) {cout<<"lfc, "; } break;
|
||||
case 'p': cout<<"pair #, "; break;
|
||||
case 'd': cout<<"db size, "; break;
|
||||
case 'a': cout<<"is anomalous?, "; break;
|
||||
}
|
||||
}
|
||||
cout<<endl;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* WriteHelpInfo() *
|
||||
* Writes help information to standard output. Changes no values.*
|
||||
* *
|
||||
* Input: None *
|
||||
* *
|
||||
* Output: None *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Config::WriteHelpInfo() const
|
||||
{
|
||||
cout<<"STIDE accepts calls of the form:"<<endl
|
||||
<<" stide -c cfg_name -d db_name -e max_num_elements"
|
||||
<<" -lf lf_size -l seq_len"<<endl<<" -n max_num_streams"
|
||||
<<" -p pair_num_offset -aof add_out_format "
|
||||
<< endl << " -cof comp_out_format -a -g -h -m -s -v -V"
|
||||
<< endl << endl;
|
||||
cout<<"STIDE expects input to come through standard input in"
|
||||
<<" the format of a pair"<<endl
|
||||
<<"of integers per line, where the first integer is a"
|
||||
<<" stream identifier"<<endl
|
||||
<<"and the second is a data element. Command line"
|
||||
<<" arguments override"<<endl
|
||||
<<"specifications in the configuration file. All"
|
||||
<<" parameters are optional"<<endl
|
||||
<<"and can be specified in any order. Parameters"
|
||||
<<" are always preceded by a"<<endl
|
||||
<<"switch. The switches are:"<<endl<<endl;
|
||||
cout<<"-a Add to database; defaults to off"<<endl;
|
||||
cout<<"-c cfg_name The name of file containing the"
|
||||
<<" configuration;"<<endl
|
||||
<<" defaults to \"stide.config\""<<endl;
|
||||
cout<<"-d db_name The name of the file containing"
|
||||
<<" the database;"<<endl
|
||||
<<" defaults to \"default.db\""<<endl;
|
||||
cout<<"-lf lf_size The size of the locality frame;"
|
||||
<<" defaults to 1"<<endl;
|
||||
cout<<"-g Write graphing data in dot format to"
|
||||
<<" db_name.dot;"<<endl
|
||||
<<" defaults to off"<<endl;
|
||||
cout<<"-h Help; displays this information"<<endl;
|
||||
cout<<"-l seq_len Length of sequence; defaults to 6"
|
||||
<<endl;
|
||||
cout<<"-p pair_offset Offset for pair number count;"
|
||||
<<" defaults to 0"<<endl;
|
||||
cout<<"-s Display db stats; defaults to off"
|
||||
<<endl;
|
||||
cout<<"-v Verbose mode on; defaults to off"<<endl;
|
||||
cout<<"-V Very verbose mode on; defaults to off"<<endl;
|
||||
cout<<"-hd Compute Hamming distance measures;"
|
||||
<<" defaults to off"<<endl;
|
||||
cout<<"-me max_elements Maximum number of different"
|
||||
<<" elements"<<endl
|
||||
<<" in the input stream; defaults to"
|
||||
<<" 500" <<endl;
|
||||
cout<<"-ms max_num_streams Maximum number of different"
|
||||
<<" streams in input;"<<endl
|
||||
<<" defaults to 100"<<endl;
|
||||
cout<<"-aof add_out_format Format for output when adding to"
|
||||
<<" database"<<endl
|
||||
<<" in verbose or very_verbose"
|
||||
<<" modes; defaults to"<<endl
|
||||
<<" \"DB Size: %d\\tStream: "
|
||||
<<"%s\\tPair Number: %p\\n\""<<endl;
|
||||
cout<<"-cof compare_out_format Format for output when comparing"
|
||||
<<" with database"<<endl
|
||||
<<" in verbose or very_verbose modes;"
|
||||
<<" defaults to"<<endl
|
||||
<<" \"Pair Number: %p\\tStream"
|
||||
<<" Number: %s\\n\""<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
|
||||
|
72
11/wywolania/Data/stide_v1.2/config.h
Normal file
@ -0,0 +1,72 @@
|
||||
#ifndef __SEQ_CONFIG_H
|
||||
#define __SEQ_CONFIG_H
|
||||
|
||||
#define CFREV 1
|
||||
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
#include "opt_info.h"
|
||||
|
||||
using std::vector;
|
||||
using std::ifstream;
|
||||
|
||||
class Config {
|
||||
public:
|
||||
Config(const int argc, const char *argv[]); // Constructor; reads
|
||||
// configuration file and command
|
||||
// line arguments
|
||||
string cfg_name; // Name of configuration file
|
||||
string db_name; // Name of database
|
||||
int seq_len; // Sequence Length
|
||||
int max_elements; // Maximum number of different
|
||||
// data elements we may encounter
|
||||
int max_streams; // Maximum number of different
|
||||
// streams we may encounter
|
||||
int pair_offset; // Number by which to offset
|
||||
// num_pairs_read
|
||||
string add_output_format; // Format for verbose-mode output
|
||||
// when adding to database
|
||||
string compare_output_format; // Format for verbose-mode output
|
||||
// when comparing with an
|
||||
// existing database
|
||||
int lf_size; // Size of locality frames: 1
|
||||
// effectively means don't
|
||||
// compute locality frames
|
||||
int add_to_db; // Flag indicating that we should
|
||||
// add to the database rather
|
||||
// than make comparisons
|
||||
int output_graph; // Output graphing information in
|
||||
// Dot format
|
||||
int compute_hdist; // Compute Hamming distance
|
||||
int write_db_stats; // Write statistics about the
|
||||
// database
|
||||
int verbose; // Output information about each
|
||||
// anomaly or each new sequence
|
||||
// added to the database
|
||||
int very_verbose; // Output information about each
|
||||
// sequence encountered
|
||||
char fmt_str[10][50]; // String used for outputting
|
||||
// information in verbose mode
|
||||
char write_val[7]; // Do we write the value? used
|
||||
// with fmt_str
|
||||
int num_fvars; // Number of format variables
|
||||
|
||||
void InitOptArray(vector<OptInfo> &opt_array);
|
||||
void SetDefaults();
|
||||
void ReadCommandLine(const int argc, const char *argv[],
|
||||
vector<OptInfo> &opt_array);
|
||||
void AssignValToVar(vector<OptInfo> &opt_array, const
|
||||
string &var_val, const string
|
||||
&var_name, const int name_type);
|
||||
void ReadConfigFile(vector<OptInfo> &opt_array);
|
||||
void ReadOldConfigFile(ifstream &cfg_file,
|
||||
vector<OptInfo> &opt_array);
|
||||
void InitOutputFormat();
|
||||
void CheckValues();
|
||||
void OuputConfigInfo(const vector<OptInfo> &opt_array) const;
|
||||
void WriteHelpInfo() const;
|
||||
};
|
||||
|
||||
#endif
|
461
11/wywolania/Data/stide_v1.2/flexitree.C
Normal file
@ -0,0 +1,461 @@
|
||||
// flexitree.C
|
||||
#include "flexitree.h"
|
||||
|
||||
#include<iostream>
|
||||
#include<ostream>
|
||||
|
||||
extern int counter;
|
||||
|
||||
using std::endl;
|
||||
using std::cerr;
|
||||
|
||||
// data structures:
|
||||
// node for a linked list
|
||||
class FlexiTreeNode {
|
||||
public:
|
||||
FlexiTree *tree; // the element at this node
|
||||
FlexiTreeNode *next; // pointer to the next node
|
||||
FlexiTreeNode(int root) {tree = new FlexiTree(root); next = NULL;}
|
||||
};
|
||||
//===========================================================================
|
||||
FlexiTree::FlexiTree(void) {
|
||||
children = NULL;
|
||||
root = -1;
|
||||
id = counter;
|
||||
counter++;
|
||||
}
|
||||
//===========================================================================
|
||||
FlexiTree::FlexiTree(int d) {
|
||||
children = NULL;
|
||||
root = d;
|
||||
id = counter;
|
||||
counter++;
|
||||
}
|
||||
//============================================================================
|
||||
FlexiTree::~FlexiTree(void) {
|
||||
if (children) {
|
||||
FlexiTreeNode *temp_ptr = children->next, *next_temp_ptr;
|
||||
if (children->tree) delete children->tree;
|
||||
delete children;
|
||||
while (temp_ptr) {
|
||||
next_temp_ptr = temp_ptr->next;
|
||||
if (temp_ptr->tree) delete temp_ptr->tree;
|
||||
delete temp_ptr;
|
||||
temp_ptr = next_temp_ptr;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//============================================================================
|
||||
int FlexiTree::NumNodes(void) const {
|
||||
int size = 1;
|
||||
if (children) {
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (temp_ptr) {
|
||||
size += temp_ptr->tree->NumNodes();
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
return size;
|
||||
}
|
||||
//============================================================================
|
||||
int FlexiTree::NumLeaves(void) const {
|
||||
int size;
|
||||
if (children) {
|
||||
size = 0;
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (temp_ptr) {
|
||||
size += temp_ptr->tree->NumLeaves();
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
} else size = 1;
|
||||
return size;
|
||||
}
|
||||
//============================================================================
|
||||
int FlexiTree::NumBranches(void) const {
|
||||
int branches = 0;
|
||||
if (children) {
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (temp_ptr) {
|
||||
branches += (temp_ptr->tree->NumBranches() + 1);
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
return branches;
|
||||
}
|
||||
/**********************************************************************
|
||||
* InsertSeq() *
|
||||
* Inserts a sequence in this tree and returns 1 if the sequence *
|
||||
* begins with the root of this tree and the sequence isn't already *
|
||||
* in this tree. It returns -1 if the sequence doesn't begin with *
|
||||
* the root of this tree. It returns 0 if the sequence was already *
|
||||
* in this tree. This function is recursive and only compares the *
|
||||
* portion of the sequence lying between the argument first and the *
|
||||
* argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: const vector<int> &seq Current sequence *
|
||||
* int first The first element of the sequence *
|
||||
* to consider *
|
||||
* int last Index of the last element to consider *
|
||||
*********************************************************************/
|
||||
|
||||
int FlexiTree::InsertSeq(const vector<int> &seq, int first, int last)
|
||||
{
|
||||
// If the root of this tree isn't the same as the first element of
|
||||
// the sequence, return -1 to indicate that
|
||||
if (root != seq[first]) {
|
||||
return -1;
|
||||
}
|
||||
|
||||
first++; // shift the seq forward
|
||||
// If we have reached the end of the sequence now, we haven't added
|
||||
// anything to the tree, so we return 0 to indicate that it was
|
||||
// already there
|
||||
if (first > last) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
// If there are no children, create some with the correct root,
|
||||
// insert the sequence and return 1.
|
||||
if (!children) {
|
||||
children = new FlexiTreeNode(seq[first]);
|
||||
children->tree->InsertSeq(seq, first, last);
|
||||
return 1;
|
||||
}
|
||||
|
||||
// The root agrees, we're not at the end, and there are children.
|
||||
// Now we want to know if the sequence is already in the children,
|
||||
// and if not, we want to find out and add it.
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
int flag;
|
||||
while (1) {
|
||||
flag = temp_ptr->tree->InsertSeq(seq, first, last);
|
||||
// If the sequence is new and gets added, return 1
|
||||
if (flag == 1) return 1;
|
||||
// If the sequence is old, return 0
|
||||
if (flag == 0) return 0;
|
||||
// Otherwise the new root of the sequence isn't the same as the
|
||||
// root of this child tree, so we will try the next one. But
|
||||
// first, if this is the last child, we know it isn't in here, so
|
||||
// we will add it in and return 1
|
||||
if (temp_ptr->next == NULL) {
|
||||
temp_ptr->next = new FlexiTreeNode(seq[first]);
|
||||
temp_ptr->next->tree->InsertSeq(seq, first, last);
|
||||
return 1;
|
||||
}
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* IsSeqInTree() *
|
||||
* Returns 1 if the sequence has a match within this tree and *
|
||||
* returns 0 otherwise. This function is recursive and only *
|
||||
* compares the portion of the sequence lying between the argument *
|
||||
* first and the argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: vector<int> &seq Current sequence *
|
||||
* int first The first element of the sequence to *
|
||||
* consider *
|
||||
* int last Index of the last element to consider *
|
||||
********************************************************************/
|
||||
|
||||
int FlexiTree::IsSeqInTree(const vector<int> &seq, int first, int last) const
|
||||
{
|
||||
// If the first element of the sequence isn't the same as the root
|
||||
// of this tree, then we know already that there isn't a match here,
|
||||
// so return 0.
|
||||
if (root != seq[first]) {
|
||||
return 0;
|
||||
}
|
||||
first++; // shift the seq forward
|
||||
|
||||
// If we have reached the end of the sequence, then we have
|
||||
// found matches all the way along, so return 1 saying that this is
|
||||
// a match.
|
||||
if (first > last) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Now we want to find out if there is a match in any of the
|
||||
// subtrees below this tree. The subtrees are contained in the
|
||||
// linked list children->next->next->...
|
||||
FlexiTreeNode *next_node = children;
|
||||
while (next_node != NULL) {
|
||||
if (next_node->tree->IsSeqInTree(seq, first, last)) {
|
||||
return 1; //Found it!
|
||||
}
|
||||
next_node = next_node->next;
|
||||
}
|
||||
// Now we've been through all of the subtrees without finding a
|
||||
// match, so there aren't any matches.
|
||||
return 0;
|
||||
}
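
The insert and lookup logic of `InsertSeq()` and `IsSeqInTree()` above amounts to a trie over integer sequences. The sketch below shows the same idea in compact form, assuming a single sentinel root and a `std::map` of children in place of the hand-written linked list; `TrieNode`, `Insert` and `Contains` are hypothetical names, and this is not how `FlexiTree` itself is laid out.

```cpp
#include <map>
#include <memory>
#include <vector>

// One trie node; the key of each child is the next symbol in a sequence.
struct TrieNode {
  std::map<int, std::unique_ptr<TrieNode>> children;
};

// Inserts seq below root. Returns true if at least one new node was
// created, i.e. the sequence was not already stored.
bool Insert(TrieNode &root, const std::vector<int> &seq) {
  TrieNode *node = &root;
  bool added = false;
  for (int symbol : seq) {
    std::unique_ptr<TrieNode> &child = node->children[symbol];
    if (!child) {
      child.reset(new TrieNode);
      added = true;
    }
    node = child.get();
  }
  return added;
}

// Returns true if every symbol of seq can be followed from the root.
// As in IsSeqInTree(), a stored sequence that merely starts with seq
// also counts as a match, so callers should use one fixed length.
bool Contains(const TrieNode &root, const std::vector<int> &seq) {
  const TrieNode *node = &root;
  for (int symbol : seq) {
    auto it = node->children.find(symbol);
    if (it == node->children.end()) return false;
    node = it->second.get();
  }
  return true;
}
```

With a map, finding the right child is a logarithmic lookup instead of a walk along the sibling list, at the cost of more memory per node; whether that trade is worthwhile for STIDE's small alphabets is not something the source settles.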
|
||||
/*********************************************************************
|
||||
* ComputeHDistForTree() *
|
||||
* Reports the minimum number of mismatches with any sequence on *
|
||||
* this tree. This is a highly compute-intensive method, because *
|
||||
* every path down the tree is followed. This function is *
|
||||
* recursive, and only compares the portion of the sequence lying *
|
||||
* between the argument first and the argument last. *
|
||||
* *
|
||||
* *
|
||||
* Input: vector<int> &seq Current sequence *
|
||||
* int first The first element of the sequence to *
|
||||
* consider *
|
||||
* int last Index of the last element to consider *
|
||||
********************************************************************/
|
||||
|
||||
int FlexiTree::ComputeHDistForTree(vector<int> &seq, int first, int
|
||||
last) const
|
||||
{
|
||||
|
||||
int tot_misses = 0;
|
||||
|
||||
// If the first element of the sequence isn't the same as the root
|
||||
// of this tree, then every sequence on this tree will disagree with
|
||||
// the sequence here, so we increment tot_misses
|
||||
if (root != seq[first]) {
|
||||
tot_misses++;
|
||||
}
|
||||
|
||||
first++; // shift the seq forward
|
||||
if (first > last) { // reached the end of the seq
|
||||
return tot_misses; // 0 or 1: only the final element was compared here
|
||||
}
|
||||
|
||||
// Now we want to add to tot_misses the smallest number of
|
||||
// mismatches with any of this tree's subtrees. This tree's
|
||||
// subtrees are in the linked list children->next->next->
|
||||
FlexiTreeNode *next_node = children;
|
||||
// last is the last element of the sequence, which is one less than
|
||||
// the number of elements in the sequence. The most misses possible
|
||||
// is the number of elements in the sequence.
|
||||
int min_misses = last + 1;
|
||||
int misses;
|
||||
while (next_node != NULL) {
|
||||
misses = next_node->tree->ComputeHDistForTree(seq, first, last);
|
||||
if (misses < min_misses) {
|
||||
min_misses = misses;
|
||||
}
|
||||
next_node = next_node->next;
|
||||
}
|
||||
return (tot_misses + min_misses);
|
||||
}
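
A brute-force reference for the quantity `ComputeHDistForTree()` above computes: the smallest number of mismatching positions between a sequence and any stored sequence. It is a sketch for checking results on small inputs, assuming the database is available as a plain list of equal-length sequences; `HammingDistance` and `MinHammingDistance` are hypothetical helper names, not part of STIDE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions at which two equal-length sequences differ.
int HammingDistance(const std::vector<int> &a, const std::vector<int> &b) {
  assert(a.size() == b.size());
  int misses = 0;
  for (std::size_t i = 0; i < a.size(); i++) {
    if (a[i] != b[i]) misses++;
  }
  return misses;
}

// Smallest Hamming distance between seq and any sequence in db. For an
// empty database every position counts as a miss, so seq.size() is
// returned.
int MinHammingDistance(const std::vector<std::vector<int>> &db,
                       const std::vector<int> &seq) {
  int best = static_cast<int>(seq.size());
  for (const std::vector<int> &stored : db) {
    best = std::min(best, HammingDistance(stored, seq));
  }
  return best;
}
```

The tree version visits shared prefixes once, while this version compares against every stored sequence in full, so it is only suitable as a cross-check.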
|
||||
//===========================================================================
|
||||
// Format for writing out: we do it depth-first; each path is terminated by a negative number,
|
||||
// which is -(the reqd backtrack length)-1. depth should start out as 0.
|
||||
// the tree writing out will end with -1.
|
||||
void FlexiTree::Write(ostream &s, int &depth) const {
|
||||
s<<root<<" ";
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (temp_ptr) {
|
||||
depth = 0;
|
||||
temp_ptr->tree->Write(s, depth);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (temp_ptr) s<<"-"<<(depth + 1)<<" ";
|
||||
}
|
||||
depth++; // now incr the count
|
||||
}
|
||||
//=============================================================================
|
||||
ostream &operator<<(ostream &s, const FlexiTree &tree) {
|
||||
int depth = 0;
|
||||
tree.Write(s, depth);
|
||||
s<<" -1"; // we terminate with a -1
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
// returns 0 if we have reached the end of the file, 1 otherwise
|
||||
int FlexiTree::Read(istream &s, int &depth) {
|
||||
int next_num;
|
||||
if (s.eof()) return 0;
|
||||
if (!(s>>next_num)) return 0; // treat a failed read like the end of the file
|
||||
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
|
||||
if (next_num >= 0) {
|
||||
children = new FlexiTreeNode(next_num);
|
||||
if (!children->tree->Read(s, depth)) return 0;
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (depth == 0) {
|
||||
if (s.eof()) return 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||
}
|
||||
} else depth = (-1 * next_num) - 1;
|
||||
if (depth) depth--;
|
||||
return 1;
|
||||
}
|
||||
//=============================================================================
|
||||
istream &operator>>(istream &s, FlexiTree &tree) {
|
||||
int next_num, depth = 0;
|
||||
s>>next_num;
|
||||
tree.SetRoot(next_num);
|
||||
tree.Read(s, depth);
|
||||
return s;
|
||||
}
|
||||
//===========================================================================
|
||||
// writes out in the format that dot uses for dags
|
||||
int FlexiTree::OutputGraph(ostream &s) const {
|
||||
// first write out the name of the tree
|
||||
s<<" "<<id<<" [label=\""<<root<<"\",shape=plaintext];"<<endl;
|
||||
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
int childid;
|
||||
while (temp_ptr) {
|
||||
childid = temp_ptr->tree->OutputGraph(s);
|
||||
s<<" "<<id<<" -> "<<childid<<";"<<endl;
|
||||
temp_ptr = temp_ptr->next;
|
||||
}
|
||||
return id;
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* IsSeqInForest() *
|
||||
* Searches through database forest to locate sequence. Returns 1 *
|
||||
* if it finds it, 0 otherwise *
|
||||
*********************************************************************/
|
||||
|
||||
int SeqForest::IsSeqInForest(const vector<int> &seq, int seq_len) const
|
||||
{
|
||||
// Have we ever seen a sequence starting with the same root?
|
||||
if (trees_found[seq[0]]) {
|
||||
// Have we seen this precise sequence?
|
||||
return trees[seq[0]].IsSeqInTree(seq, 0, seq_len-1);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
SeqForest::SeqForest(int max_trees)
|
||||
{
|
||||
trees = vector<FlexiTree>(max_trees);
|
||||
trees_found = vector<int>(max_trees, 0);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
#include "fstream.h"
|
||||
|
||||
// for test purposes
|
||||
void main(void) {
|
||||
FlexiTree tree(1);
|
||||
vector<int> seq(10);
|
||||
|
||||
// try out insert and write
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 2; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1123:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1135:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1223:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 3;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1233:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1234:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1234:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||
tree.SeqInsert(seq, 0, 3);
|
||||
cout<<"1214:"<<tree<<endl;
|
||||
|
||||
// now try out search
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1214"<<endl;
|
||||
else cout<<"could not find 1214"<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1224"<<endl;
|
||||
else cout<<"could not find 1224"<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 4; seq[3] = 4;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1244"<<endl;
|
||||
else cout<<"could not find 1244"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1135"<<endl;
else cout<<"could not find 1135"<<endl;
|
||||
|
||||
// try out insert and write with shorter and longer sequences
|
||||
seq[0] = 1; seq[1] = 3;
|
||||
tree.SeqInsert(seq, 0, 1);
|
||||
cout<<"13:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 4;
|
||||
tree.SeqInsert(seq, 0, 2);
|
||||
cout<<"114:"<<tree<<endl;
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||
tree.SeqInsert(seq, 0, 7);
|
||||
cout<<"12311214:"<<tree<<endl;
|
||||
|
||||
if (tree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||
else cout<<"could not find 12311214"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||
if (tree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||
else cout<<"could not find 115"<<endl;
|
||||
if (tree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||
else cout<<"could not find 11"<<endl;
|
||||
|
||||
ofstream outf("test.out");
|
||||
outf<<tree;
|
||||
outf.close();
|
||||
|
||||
//counter = 0;
|
||||
|
||||
FlexiTree intree;
|
||||
ifstream inf("test.out");
|
||||
inf>>intree;
|
||||
inf.close();
|
||||
|
||||
cout<<endl<<intree;
|
||||
|
||||
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||
if (intree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||
else cout<<"could not find 12311214"<<endl;
|
||||
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||
if (intree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||
else cout<<"could not find 115"<<endl;
|
||||
if (intree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||
else cout<<"could not find 11"<<endl;
|
||||
|
||||
}
|
||||
*/
|
||||
|
||||
/*
|
||||
int FlexiTree::Read(istream &s, int &depth) {
|
||||
int next_num, depth_decr = 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
if (next_num >= 0) {
|
||||
children = new FlexiTreeNode(next_num);
|
||||
if (!children->tree->Read(s, depth)) return 0;
|
||||
if (depth) {
|
||||
depth--;
|
||||
depth_decr = 1;
|
||||
}
|
||||
FlexiTreeNode *temp_ptr = children;
|
||||
while (depth == 0) {
|
||||
depth_decr = 0;
|
||||
s>>next_num;
|
||||
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||
temp_ptr = temp_ptr->next;
|
||||
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||
if (depth) {
|
||||
depth--;
|
||||
depth_decr = 1;
|
||||
}
|
||||
}
|
||||
if (!depth_decr && depth) depth--;
|
||||
} else
|
||||
depth = (-1 * next_num) - 1;
|
||||
return 1;
|
||||
}
|
||||
*/
|
||||
|
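The tail of `ComputeHDistForTree()` shown above adds the smallest miss count found among a node's subtrees to the misses counted so far. The following is a minimal, self-contained sketch of that idea, not STIDE's implementation: `TrieNode`, `Insert` and `MinHamming` are hypothetical names introduced only for illustration, and the sketch assumes that every stored sequence has the same length as the query sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for FlexiTree: one node per sequence element.
struct TrieNode {
    int value;
    std::vector<std::unique_ptr<TrieNode>> children;
    explicit TrieNode(int v) : value(v) {}
};

// Insert seq[pos..] below `node`; the caller guarantees node.value == seq[pos].
void Insert(TrieNode &node, const std::vector<int> &seq, std::size_t pos = 0) {
    assert(pos < seq.size() && node.value == seq[pos]);
    if (pos + 1 == seq.size()) return;            // last element already stored
    for (auto &child : node.children) {
        if (child->value == seq[pos + 1]) {       // shared prefix: descend
            Insert(*child, seq, pos + 1);
            return;
        }
    }
    node.children.push_back(
        std::unique_ptr<TrieNode>(new TrieNode(seq[pos + 1])));
    Insert(*node.children.back(), seq, pos + 1);
}

// Smallest number of mismatching positions between seq[pos..] and any
// stored path that starts at `node`.
int MinHamming(const TrieNode &node, const std::vector<int> &seq,
               std::size_t pos = 0) {
    assert(pos < seq.size());
    int misses = (node.value != seq[pos]) ? 1 : 0;
    if (pos + 1 == seq.size() || node.children.empty()) return misses;
    int best = static_cast<int>(seq.size() - pos);  // upper bound on misses
    for (const auto &child : node.children) {
        int m = MinHamming(*child, seq, pos + 1);
        if (m < best) best = m;                     // keep the closest subtree
    }
    return misses + best;
}
```

With the sequences `1 2 3 4` and `1 2 5 6` stored under a root of 1, a query of `1 2 3 6` is one mismatch away from either stored sequence, so the sketch reports a minimum distance of 1.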
54
11/wywolania/Data/stide_v1.2/flexitree.h
Normal file
@ -0,0 +1,54 @@
|
||||
#ifndef __FLEXITREE_H
|
||||
#define __FLEXITREE_H
|
||||
|
||||
#include<vector>
|
||||
#include<iostream>
|
||||
#include<ostream>
|
||||
|
||||
using std::ostream;
|
||||
using std::istream;
|
||||
using std::vector;
|
||||
|
||||
class FlexiTreeNode;
|
||||
class FlexiTree {
|
||||
private:
|
||||
FlexiTreeNode *children;
|
||||
int root;
|
||||
int id;
|
||||
public:
|
||||
void Write(ostream &s, int &depth) const;
|
||||
int Read(istream &s, int &depth);
|
||||
int OutputGraph(ostream &s) const;
|
||||
FlexiTree();
|
||||
FlexiTree(int d);
|
||||
// FlexiTree(const FlexiTree& ft);
|
||||
~FlexiTree();
|
||||
void SetRoot(int d) {root = d;}
|
||||
int InsertSeq(const vector<int> &seq, int first, int last);
|
||||
int IsSeqInTree(const vector<int> &seq, int first, int last) const;
|
||||
int ComputeHDistForTree(vector<int> &seq, int first, int last) const;
|
||||
friend ostream &operator<<(ostream &s, const FlexiTree &tn);
|
||||
friend istream &operator>>(istream &s, FlexiTree &tn);
|
||||
int NumNodes() const; // returns the number of nodes in the tree
|
||||
int NumLeaves() const; // returns the number of leaves in the tree, i.e., the number of distinct sequences
|
||||
int NumBranches() const; // returns the total # of branches, of all nodes
|
||||
};
|
||||
|
||||
//===========================================================================
|
||||
class SeqForest {
|
||||
public:
|
||||
// this structure is an array of N trees, i.e. one tree for each value
// type
|
||||
vector<FlexiTree> trees;
|
||||
// this structure is to record what types of values actually occurred -
|
||||
// for efficiency, if there were actually fewer value types than
|
||||
// specified in the config
|
||||
vector<int> trees_found;
|
||||
SeqForest(int max_trees);
|
||||
int IsSeqInForest(const vector<int> &seq, int seq_len) const;
|
||||
};
|
||||
|
||||
//===========================================================================
|
||||
|
||||
#endif
|
||||
|
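The serialized form that `Write()` and `operator<<` emit (depth-first elements, a negative number after each complete path, and a final -1) can be illustrated with a small stand-alone decoder. This is only a sketch under the assumption that a negative token -n means "drop the last n-1 elements of the current path"; `DecodeTree` is a hypothetical helper, not part of the STIDE sources, and it expects only the tree text itself, without the separate root line that precedes each tree in a database file.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Decode one serialized tree into the list of sequences it stores.
std::vector<std::vector<int>> DecodeTree(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::vector<int>> sequences;
    std::vector<int> path;            // elements on the current root-to-leaf path
    int token;
    while (in >> token) {
        if (token >= 0) {             // ordinary element: extend the path
            path.push_back(token);
            continue;
        }
        // Any negative token closes the current path.
        assert(!path.empty());
        sequences.push_back(path);
        if (token == -1) break;       // end of tree
        std::size_t backtrack = static_cast<std::size_t>(-token - 1);
        assert(backtrack < path.size());  // the root always remains
        path.resize(path.size() - backtrack);
    }
    return sequences;
}
```

For the input `3 4 2 9 10 3 -4 3 9 8 -2 3 -3 4 9 -1` the decoder returns the four sequences listed for that tree in the `ReadDB()` header comment in stide.C.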
34
11/wywolania/Data/stide_v1.2/opt_info.h
Normal file
@ -0,0 +1,34 @@
|
||||
#ifndef __OPT_INFO_H
|
||||
#define __OPT_INFO_H
|
||||
|
||||
#include <string>
|
||||
|
||||
#define NUM_OPTS 16
|
||||
#define SHORT_NAME 0
|
||||
#define LONG_NAME 1
|
||||
|
||||
using std::string;
|
||||
|
||||
class OptInfo {
|
||||
public:
|
||||
string long_name; // Long name of this option; used in
|
||||
// configuration file and with the -- marker
|
||||
// on the command line
|
||||
string short_name; // Short name of this option; used with the -
|
||||
// marker on the command line
|
||||
int set; // Flag indicating if this option has already
|
||||
// been set
|
||||
char type; // type of value: legitimate values are f
|
||||
// (flag, i.e., boolean), i (int), s (string)
|
||||
// or h (help)
|
||||
union { // pointer to actual value to be set
|
||||
int *flag_val; // value if type = 'f'
|
||||
int *int_val; // value if type = 'i'
|
||||
string *str_val; // value if type = 's'
|
||||
};
|
||||
|
||||
OptInfo() {};
|
||||
};
|
||||
|
||||
#endif
|
||||
|
54
11/wywolania/Data/stide_v1.2/sample.config
Normal file
@ -0,0 +1,54 @@
|
||||
#ConfigFileRev: 1
|
||||
#Sample STIDE configuration file containing default values.
|
||||
|
||||
db_name: default.db # name of database
|
||||
seq_len: 6 # length of sequences
|
||||
max_elements: 500 # maximum number of unique elements in input
|
||||
max_streams: 100 # maximum number of unique streams in input
|
||||
pair_offset: 0 # offset for pair number count
|
||||
add_output_format: \
|
||||
"DB Size: %d\tStream: %s\tPair Number: %p\n"
|
||||
# In verbose mode, STIDE will print
|
||||
# this information for every new
|
||||
# sequence added to the database. In
|
||||
# very verbose mode, STIDE will print
|
||||
# this information for every sequence
|
||||
# considered. Possible data:
|
||||
# %d Database Size
|
||||
# %i Pair number of last data element of
|
||||
# sequence in its particular
|
||||
# data stream
|
||||
# %p Pair number of last data element of
|
||||
# sequence in the whole input
|
||||
# stream
|
||||
# %s Stream Number
|
||||
|
||||
compare_output_format: \
|
||||
"Pair Number: %p\tStream Number: %s\n"
|
||||
# In verbose mode, STIDE will print
|
||||
# this information for every sequence
|
||||
# which is itself an anomaly or whose
|
||||
# locality frame contains an anomaly.
|
||||
# In very verbose mode, STIDE will
|
||||
# print this information for every
|
||||
# sequence. Possible data:
|
||||
# %a 1 if this sequence is an anomaly, 0
|
||||
# otherwise
|
||||
# %c locality frame count of this sequence
|
||||
# %h Hamming distance
|
||||
# %i Pair number of last data element of
# sequence in its particular data stream
# %p Pair number of last data element of
# sequence in the entire input
|
||||
# %s Stream Number
|
||||
lf_size: 1 # 1 causes locality frame counts not
|
||||
# to be computed
|
||||
add_to_db: off # Add this data to the database, or, if there
|
||||
# is no database, create a new one -- do not
|
||||
# do comparisons
|
||||
output_graph: off # Outputs graphing information in Dot
|
||||
# format
|
||||
compute_hdist: off # Compute Hamming distances
|
||||
write_db_stats: off # At end, print out statistics about database
|
||||
verbose: off # See add_output_format and compare_output_format
|
||||
very_verbose: off # See add_output_format and compare_output_format
|
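The `lf_size` option and the `%c` field in the configuration above refer to a locality frame count. As a rough illustration, and assuming that the count is simply the number of anomalous sequences among the most recent `lf_size` sequences (the current one included), it could be computed as in the sketch below. `LocalityFrameCounts` is a hypothetical helper; the actual computation lives in stream.C, which is not shown here and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// For each position, count the anomalies among the most recent lf_size
// flags (the current one included). Flags are 1 for anomalous, 0 otherwise.
std::vector<int> LocalityFrameCounts(const std::vector<int> &anomaly_flags,
                                     std::size_t lf_size) {
    assert(lf_size >= 1);
    std::vector<int> counts;
    std::deque<int> frame;   // flags currently inside the locality frame
    int in_frame = 0;        // running number of anomalies in the frame
    for (int flag : anomaly_flags) {
        frame.push_back(flag);
        in_frame += flag;
        if (frame.size() > lf_size) {   // slide the frame forward by one
            in_frame -= frame.front();
            frame.pop_front();
        }
        counts.push_back(in_frame);
    }
    return counts;
}
```

With the flags `0 1 1 0 0 1` and a frame size of 3, the counts are `0 1 2 2 1 1`: the two adjacent anomalies keep the count at 2 until the first of them leaves the frame.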
576
11/wywolania/Data/stide_v1.2/stide.C
Normal file
@ -0,0 +1,576 @@
|
||||
/*********************************************************************
|
||||
* *
|
||||
* STIDE: Sequence Time-Delay Embedding v1.2 *
|
||||
* *
|
||||
* Written by Steve Hofmeyr 7/21/1996 *
|
||||
* Revised by Julie Rehmeyer 3/1998 *
|
||||
* Revised by Hajime Inoue 11/2006 *
|
||||
* *
|
||||
* Copyright (C) 1996, 1998 Regents of the University of New Mexico. *
|
||||
* Copyright (C) 2006 Hajime Inoue. *
|
||||
* All Rights Reserved. *
|
||||
* *
|
||||
* This program is free software; you can redistribute it and/or *
|
||||
* modify it under the terms of the GNU General Public License as *
|
||||
* published by the Free Software Foundation; either version 2 of *
|
||||
* the License, or (at your option) any later version. *
|
||||
* *
|
||||
* This program is distributed in the hope that it will be useful, *
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||
* GNU General Public License for more details. *
|
||||
* *
|
||||
* You should have received a copy of the GNU General Public *
|
||||
* License along with this program; if not, write to the Free *
|
||||
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, *
|
||||
* USA. *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
#include <stdlib.h>
#include <string.h>
|
||||
#include <string>
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
#include <vector>
|
||||
#include <map>
|
||||
#include "config.h"
|
||||
#include "stream.h"
|
||||
#include "flexitree.h"
|
||||
|
||||
#define DBREV 1
|
||||
|
||||
using std::vector;
|
||||
|
||||
using std::cin;
|
||||
using std::cerr;
|
||||
using std::cout;
|
||||
using std::endl;
|
||||
using std::ofstream;
using std::ifstream;
|
||||
|
||||
typedef std::map<int, int> HashTableInt;
|
||||
|
||||
int counter = 0;
|
||||
|
||||
Stream *GetReadyStream(vector<Stream> &streams, HashTableInt
|
||||
&sid_table, int &num_streams_fnd, int
|
||||
&total_pairs_read, const Config &cfg);
|
||||
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||
int &seq_len);
|
||||
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||
int db_size, const int seq_len);
|
||||
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||
num_streams_fnd, const int num_seqs_added, const
|
||||
vector<Stream> &streams, const int db_size);
|
||||
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||
const int db_size);
|
||||
void OutputGraph(const SeqForest &db_forest, string db_name);
|
||||
int GetPrimeLargerThan(const int n);
|
||||
|
||||
|
||||
int ExtToInt(HashTableInt &sid_table, int key, int next_value)
|
||||
{
|
||||
if(sid_table.find(key) == sid_table.end())
|
||||
sid_table[key] = next_value;
|
||||
|
||||
return sid_table[key];
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* main() *
|
||||
* Input: int argc: Number of command-line arguments *
|
||||
* char *argv[]: array of strings containing *
|
||||
* command-line arguments *
|
||||
* Output: 0 if successful, -1 if unsuccessful *
|
||||
*********************************************************************/
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
|
||||
{
|
||||
Config cfg((const int) argc, (const char **) argv);
|
||||
// Declare configuration object and do
|
||||
// the configuration on the basis of the
|
||||
// command line arguments and the
|
||||
// configuration file
|
||||
Stream *active_stream; // This will point to the stream that
|
||||
// currently has a sequence to be worked
|
||||
// on (either added to the database or
|
||||
// compared).
|
||||
HashTableInt sid_table;
|
||||
// Table relating external stream ids to internal sids; it is a
// std::map, so no table size has to be chosen up front
|
||||
SeqForest normal(cfg.max_elements); // Uninitialized forest of
|
||||
// normal sequences
|
||||
vector<Stream> streams(cfg.max_streams); // Array of stream objects,
|
||||
// one for each data stream
|
||||
// in input, which are
|
||||
// allocated as needed
|
||||
int num_streams_fnd = 0; // Number of data streams
|
||||
// encountered to date
|
||||
int total_pairs_read = cfg.pair_offset; // Number of pairs read from
|
||||
// input to date from all
|
||||
// the data streams combined
|
||||
// -- can be offset using
|
||||
// the "-n" switch
|
||||
int db_size; // Total number of unique
|
||||
// sequences in the database
|
||||
int init_db_size = 0; // Number of unique
|
||||
// sequences in the
|
||||
// pre-existing database
|
||||
|
||||
|
||||
|
||||
|
||||
// Read database into normal, if database exists
|
||||
db_size = init_db_size = ReadDB(normal, cfg.db_name, cfg.seq_len);
|
||||
|
||||
if (cfg.add_to_db)
|
||||
{
|
||||
while ((active_stream =
|
||||
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||
total_pairs_read, cfg)) != NULL)
|
||||
{
|
||||
active_stream->AddToDB(normal, db_size, total_pairs_read, cfg);
|
||||
}
|
||||
WriteDB(normal, cfg.db_name, db_size, cfg.seq_len);
|
||||
if (cfg.output_graph)
|
||||
{
|
||||
OutputGraph(normal,cfg.db_name);
|
||||
}
|
||||
|
||||
}
|
||||
else
|
||||
{
|
||||
int i = 0;
|
||||
while ((active_stream =
|
||||
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||
total_pairs_read, cfg)) != NULL)
|
||||
{
|
||||
active_stream->CompareSeq(cfg, normal, total_pairs_read);
|
||||
}
|
||||
}
|
||||
|
||||
FinalReport(cfg, normal, num_streams_fnd, db_size - init_db_size,
|
||||
streams, db_size);
|
||||
|
||||
return(0);
|
||||
}
|
||||
|
||||
/**********************************************************************
|
||||
* GetReadyStream() *
|
||||
* This function reads a pair from the input, appends the element *
|
||||
* to the current sequence string in the appropriate data stream, *
|
||||
* finds out if that data stream has a complete sequence to be *
|
||||
* processed, continues until it has found such a data stream, and *
|
||||
* returns a pointer to it. It updates num_streams_fnd, *
|
||||
* total_pairs_read, sid_table, and streams. *
|
||||
* *
|
||||
* Input: vector<Stream> &streams: the array of streams that we have *
|
||||
* found so far *
|
||||
* HashTableInt &sid_table: hash table relating external sids *
|
||||
* to internal sids *
|
||||
* int &num_streams_fnd: the number of streams found so far; *
|
||||
* int &total_pairs_read: the number of pairs read from the *
|
||||
* input stream so far *
|
||||
* const Config &cfg: configuration information *
|
||||
* *
|
||||
* Output: a pointer to the next stream that is ready for processing *
|
||||
**********************************************************************/
|
||||
|
||||
Stream *GetReadyStream(vector<Stream> &streams, HashTableInt
|
||||
&sid_table, int &num_streams_fnd, int
|
||||
&total_pairs_read, const Config &cfg)
|
||||
|
||||
{
|
||||
Stream *ready_stream = NULL;
|
||||
int ext_sid;
|
||||
int int_sid;
|
||||
int sval;
|
||||
|
||||
cin >> ext_sid;
|
||||
while (!cin.eof()) {
|
||||
if (ext_sid == -1) {
|
||||
break;
|
||||
}
|
||||
// int_sid = sid_table.ExtToInt(ext_sid, num_streams_fnd);
|
||||
int_sid = ExtToInt(sid_table, ext_sid, num_streams_fnd);
|
||||
cin >> sval;
|
||||
++total_pairs_read;
|
||||
|
||||
// Update num_streams_fnd, if necessary
|
||||
if (int_sid >= num_streams_fnd) {
|
||||
if (int_sid >= cfg.max_streams) {
|
||||
cerr<<"ERROR: Too many streams to follow, aborting..."<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
// We need a new stream object
|
||||
if(num_streams_fnd == streams.size())
|
||||
{
|
||||
cerr << "WRITING OVER THE END OF THE ARRAY" << endl;
|
||||
cerr << "num_streams_fnd: " << num_streams_fnd << endl;
|
||||
cerr << "cfg.max_streams: " << cfg.max_streams << endl;
|
||||
exit(-1);
|
||||
}
|
||||
streams[num_streams_fnd].Init(cfg, int_sid, ext_sid);
|
||||
num_streams_fnd = int_sid + 1;
|
||||
}
|
||||
streams[int_sid].Append(sval);
|
||||
if (streams[int_sid].Ready()) {
|
||||
ready_stream = &streams[int_sid];
|
||||
break;
|
||||
}
|
||||
cin >> ext_sid;
|
||||
}
|
||||
|
||||
return ready_stream;
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* ReadDB() *
|
||||
* Reads the database from a file and returns the number of unique *
|
||||
* sequences in the database. Checks for appropriate revision *
|
||||
* number. If it is a revision DBREV database, the second line *
|
||||
* will be "#DBseq_len: " followed by the sequence length. The *
|
||||
* next line will contain a single number, giving the root of the *
|
||||
* first tree. The following lines will contain the tree itself. *
|
||||
* The first seq_len numbers make up the first sequence (so the *
|
||||
* first number of the second line will be the same as the number *
|
||||
* on the first line). The next number will be a negative number *
|
||||
* between -seq_len and -2, indicating how far to backtrack in *
* the first sequence, and the following positive numbers give the *
* rest of the second sequence. A value of -n means: drop the last *
* n-1 numbers of the most recent sequence and append the next n-1 *
* numbers. So, for example, -3 would mean drop the last two *
* numbers and append the next two numbers. So after the *
|
||||
* -3 you would find two positive numbers, followed by a negative *
|
||||
* number (which you would use the same way as you used the -3, on *
|
||||
* the most recent sequence). Each tree is terminated by the *
|
||||
* number -1. So the sample input file *
|
||||
* 3 *
|
||||
* 3 4 2 9 10 3 -4 3 9 8 -2 3 -3 4 9 -1 *
|
||||
* 2 *
|
||||
* 2 3 4 5 6 7 -3 2 9 -1 *
|
||||
* yields the sequences: *
|
||||
* 3 4 2 9 10 3 *
|
||||
* 3 4 2 3 9 8 *
|
||||
* 3 4 2 3 9 3 *
|
||||
* 3 4 2 3 4 9 *
|
||||
* 2 3 4 5 6 7 *
|
||||
* 2 3 4 5 2 9 *
|
||||
* *
|
||||
* Input: SeqForest &db_forest Forest of sequences *
|
||||
* const string &db_name Name of database *
|
||||
* int &seq_len User-specified sequence length *
|
||||
* *
|
||||
* Output: the number of unique sequences in the database *
|
||||
* *
|
||||
********************************************************************/
|
||||
|
||||
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||
int &seq_len)
|
||||
{
|
||||
ifstream in_db_file(db_name.c_str()); // file to read the database from
|
||||
int db_size = 0; // size of the database
|
||||
int root; // the first element of the sequences
|
||||
// we are reading in at the moment;
|
||||
// i.e., the root of this tree
|
||||
string buff;
|
||||
int db_seq_len;
|
||||
int rev_num;
|
||||
|
||||
if (!in_db_file.is_open()) {
|
||||
cerr<<"WARNING: Cannot open database file " << db_name
|
||||
<< " for input"<<endl<<"Creating a new file"<<endl;
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Check to see if the first line contains "#DBrev:"
|
||||
in_db_file>>buff;
|
||||
if (buff == "#DBrev:") {
|
||||
in_db_file>>rev_num;
|
||||
if (rev_num > DBREV) {
|
||||
cerr << "ERROR: The revision number is greater than " << DBREV
|
||||
<< ". This version of STIDE is only capable of dealing "
|
||||
<< "with databases through DBrev " << DBREV
|
||||
<< ". Aborting..."<<endl;
|
||||
exit(-1);
|
||||
}
|
||||
if (rev_num < DBREV) {
|
||||
cerr << "ERROR: Revision number of database must be >= " << DBREV
|
||||
<< endl;
|
||||
exit(-1);
|
||||
}
|
||||
// Now we know that it is revision DBREV. Check sequence length of
|
||||
// database against user-indicated sequence length
|
||||
in_db_file>>buff;
|
||||
// Now check to see if next line is "#DBseq_len: " followed by a
|
||||
// number
|
||||
if (buff != "#DBseq_len:") {
|
||||
cerr << "ERROR: The second line of the database does not "
|
||||
<< "contain the string \"#DBseq_len: \"" << endl
|
||||
<< "followed by the sequence length of the database, as "
|
||||
<< "required of revision " << DBREV
|
||||
<< " databases. Aborting..."<< endl;
|
||||
exit(-1);
|
||||
}
|
||||
in_db_file>>db_seq_len;
|
||||
if (db_seq_len != seq_len) {
|
||||
cerr << "WARNING: Database sequence length is " << db_seq_len
|
||||
<< ", which does not match "
|
||||
<< "sequence length specified" << endl
|
||||
<< "by user (or by default if no specification was given), "
|
||||
<< "which is " << seq_len << endl
|
||||
<< "I will use the database sequence length. If that is "
|
||||
<< "not what you intended, type Ctrl-C to abort." << endl;
|
||||
seq_len = db_seq_len;
|
||||
}
|
||||
// Read next number into root
|
||||
in_db_file >> root;
|
||||
}
|
||||
// Otherwise, we assume we have an old-style database, and let the
|
||||
// user know that that's our assumption
|
||||
else {
|
||||
cerr << "WARNING: The string \"#DBrev: \" is not in the first "
|
||||
<< "line of the database." << endl
|
||||
<< "I'm assuming that it's an older style of database, and "
|
||||
<< "will read it in" << endl
|
||||
<< "based on that assumption. If that is not what you want "
|
||||
<< "me to do, type CTRL-C" << endl << endl;
|
||||
// we have just read the first root into buff -- put it in root
|
||||
// instead
|
||||
root = atoi(buff.c_str());
|
||||
}
|
||||
|
||||
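while (!in_db_file.eof()) {
if (root == -1) break;
// Guard against roots that fall outside the forest, which holds
// max_elements trees
if (root < 0 || root >= (int) db_forest.trees.size()) {
cerr << "ERROR: Database contains element " << root
<< ", which is outside the range allowed by max_elements. "
<< "Aborting..." << endl;
exit(-1);
}
db_forest.trees_found[root]++;
in_db_file>>db_forest.trees[root];
db_size += db_forest.trees[root].NumLeaves();
in_db_file>>root;
}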
|
||||
in_db_file.close();
|
||||
|
||||
return db_size;
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* WriteDB() *
|
||||
* Writes db_forest to the file db_name, with the format described *
|
||||
* in the header of ReadDB(). Prints database statistics at the *
|
||||
* end of the file. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* const string &db_name Name of file in which to *
|
||||
* put database. *
|
||||
* const int db_size Number of unique sequences *
|
||||
* in the database *
|
||||
* const int seq_len Sequence length *
|
||||
* *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||
int db_size, const int seq_len)
|
||||
{
|
||||
ofstream out_db_file(db_name.c_str());
|
||||
|
||||
if (!out_db_file.is_open())
|
||||
{
|
||||
cerr << "ERROR: Cannot open database file " << db_name
|
||||
<< " for output, aborting..." << endl ;
|
||||
exit(-2);
|
||||
}
|
||||
|
||||
out_db_file << "#DBrev: " << DBREV << endl;
|
||||
out_db_file << "#DBseq_len: " << seq_len << endl;
|
||||
|
||||
for (int i = 0; i < db_forest.trees.size(); i++)
|
||||
{
|
||||
if (db_forest.trees_found[i])
|
||||
{
|
||||
out_db_file<<i<<endl;
|
||||
out_db_file << db_forest.trees[i] << endl;
|
||||
}
|
||||
}
|
||||
|
||||
out_db_file<<" -1"<<endl;
|
||||
// we can now write anything, so I will write the db stats
|
||||
out_db_file<<"; DB STATS"<<endl;
|
||||
WriteDBStats(db_forest, out_db_file, db_size);
|
||||
out_db_file.close();
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* FinalReport() *
|
||||
* Reports data at end of run. The number of streams, the number *
|
||||
* of input pairs, and the number of sequences in the input are *
|
||||
* always reported. If we have done a comparison run, we report *
|
||||
* the number of anomalies, and the percentage of sequences that *
|
||||
* were anomalous. Additionally, if asked for, the Hamming *
|
||||
* distance or locality frame count is reported. If we have added *
|
||||
* to the database, we report having done so and report the number *
|
||||
* of sequences added. If database statistics are asked for, we *
|
||||
* report the number of nodes, the number of unique sequences, the *
|
||||
* number of branches, and the average database branch factor. *
|
||||
* *
|
||||
* Input: const Config &cfg: Configuration information *
|
||||
* const SeqForest &normal: DB of normal sequences *
|
||||
* const int num_streams_fnd: Total number of streams found*
|
||||
* const int num_seqs_added: Number of unique sequences *
|
||||
* added *
|
||||
* const vector<Stream> &streams: Array of data streams *
|
||||
* const int db_size: Number of unique sequences *
|
||||
* in DB *
|
||||
* *
|
||||
* Output: none *
|
||||
* *
|
||||
*********************************************************************/
|
||||
|
||||
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||
num_streams_fnd, const int num_seqs_added, const
|
||||
vector<Stream> &streams, const int db_size)
|
||||
{
|
||||
int total_pairs = 0;
|
||||
int total_seqs = 0;
|
||||
int total_anoms = 0;
|
||||
int total_max_lfc = 0;
|
||||
int total_max_hdist = 0;
|
||||
int db_nodes = 0;
|
||||
int db_seqs = 0;
|
||||
int db_branches = 0;
|
||||
int j;
|
||||
|
||||
// Sum up number of pairs input and number of seqs from all the streams
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
total_seqs += streams[j].GetNumSeqsFnd();
|
||||
total_pairs += streams[j].GetNumPairsRead();
|
||||
}
|
||||
|
||||
cout << endl;
|
||||
cout << "Number of different streams in input = "
|
||||
<< num_streams_fnd << endl;
|
||||
cout << "Total number of input pairs = "
|
||||
<< total_pairs << endl;
|
||||
cout << "Total number of sequences in input = "
|
||||
<< total_seqs << endl;
|
||||
|
||||
if (cfg.add_to_db) {
|
||||
cout << "File added to database" << endl;
|
||||
cout << "Number of new sequences added to the database: "
|
||||
<< num_seqs_added << endl;
|
||||
}
|
||||
else {
|
||||
cout << "Scan completed" << endl;
|
||||
// Sum up number of anomalies from all the streams
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
total_anoms += streams[j].GetNumAnoms();
|
||||
}
|
||||
|
||||
cout << "Number of anomalies = "
|
||||
<< total_anoms << endl;
|
||||
cout << "Percentage anomalous = "
|
||||
<< ((float)total_anoms * 100.0)/total_seqs << endl;
|
||||
|
||||
// If asked for, compute Hamming distances across streams and report
|
||||
if (cfg.compute_hdist) {
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
if (streams[j].GetMaxHDist() > total_max_hdist) {
|
||||
total_max_hdist = streams[j].GetMaxHDist();
|
||||
}
|
||||
}
|
||||
cout << "Largest minimum Hamming distance = "
|
||||
<< total_max_hdist << endl;
|
||||
}
|
||||
|
||||
// If asked for, compute lfc across streams and report
|
||||
if (cfg.lf_size > 1) {
|
||||
for (j = 0; j < num_streams_fnd; j++) {
|
||||
if (streams[j].GetMaxLFC() > total_max_lfc) {
|
||||
total_max_lfc = streams[j].GetMaxLFC();
|
||||
}
|
||||
}
|
||||
cout << "Maximum lfc = " << total_max_lfc << endl;
|
||||
}
|
||||
}
|
||||
|
||||
// If asked for, compute db stats and report
|
||||
if (cfg.write_db_stats) {
|
||||
WriteDBStats(normal, cout, db_size);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* WriteDBStats() *
|
||||
* Computes and writes to out_stream the number of nodes in *
|
||||
* the database, the number of unique sequences, the number of *
|
||||
* branches, and the average database branch factor. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* ostream &out_stream Where to write info *
|
||||
* const int db_size Number of unique sequences in the *
|
||||
* database *
|
||||
* *
|
||||
* Output: none *
|
||||
*********************************************************************/
|
||||
|
||||
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||
const int db_size)
|
||||
{
|
||||
int db_nodes = 0;
|
||||
int db_branches = 0;
|
||||
|
||||
for (int i = 0; i < db_forest.trees.size(); i++)
|
||||
{
|
||||
if (db_forest.trees_found[i])
|
||||
{
|
||||
db_nodes += db_forest.trees[i].NumNodes();
|
||||
db_branches += db_forest.trees[i].NumBranches();
|
||||
}
|
||||
}
|
||||
|
||||
out_stream << "Number of DB nodes = " << db_nodes << endl;
|
||||
out_stream << "Number of unique sequences = "<<db_size << endl;
|
||||
out_stream << "Number of branches (edges) = "<<db_branches << endl;
|
||||
out_stream << "Average DB branch factor = "
|
||||
<<((float)db_branches/(db_nodes - db_size))<<endl;
|
||||
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* OutputGraph() *
|
||||
* Writes a file db_name.dot containing input for the program Dot. *
|
||||
* Running Dot on db_name.dot produces a PostScript file *
|
||||
* containing a picture of the whole database tree. *
|
||||
* *
|
||||
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||
* database *
|
||||
* const string db_name Filename to use *
|
||||
* *
|
||||
* Output: none *
|
||||
*********************************************************************/
|
||||
|
||||
void OutputGraph(const SeqForest &db_forest, const string db_name)
|
||||
{
|
||||
// Build "<db_name>.dot" with std::string. The original char buffer was one
// byte too small for ".dot" plus the terminating NUL and was never freed.
const string dot_filename = db_name + ".dot";
ofstream output_file(dot_filename.c_str());
|
||||
|
||||
output_file<<"digraph \""<<db_name<<"\" {"<<endl;
|
||||
output_file<<" ratio=auto;"<<endl;
|
||||
output_file<<" page=\"8.5,11\";"<<endl;
|
||||
for (int i = 0; i < db_forest.trees.size(); i++) {
|
||||
if (db_forest.trees_found[i])
|
||||
db_forest.trees[i].OutputGraph(output_file);
|
||||
}
|
||||
output_file<<"}"<<endl;
|
||||
output_file.close();
|
||||
}
|
||||
|
367
11/wywolania/Data/stide_v1.2/stream.C
Normal file
@ -0,0 +1,367 @@
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <string>
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
#include "stream.h"
|
||||
|
||||
/********************************************************************
|
||||
* Init() *
|
||||
* Initializes an instance of Stream. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int intern_id internal stream identifier *
|
||||
* const int extern_id external stream identifier *
|
||||
* Output: none *
|
||||
*******************************************************************/
|
||||
|
||||
using std::cerr;
|
||||
using std::endl;
|
||||
|
||||
void Stream::Init(const Config &cfg,
|
||||
const int intern_id, const int extern_id) {
|
||||
int i;
|
||||
// initialize all the arrays
|
||||
current_seq.clear();
|
||||
// assign() sizes the vector; reserve() alone leaves size() == 0, so the
// indexed writes that used to follow were out of bounds
current_seq.assign(cfg.seq_len, -1);
|
||||
|
||||
num_in_seq = -1;
|
||||
num_pairs_read = 0;
|
||||
num_anoms = 0;
|
||||
num_seqs_fnd = 0;
|
||||
int_sid = intern_id;
|
||||
ext_sid = extern_id;
|
||||
max_hdist = 0;
|
||||
seq_hdist = 0;
|
||||
// assign() sizes and zeroes the locality frame; reserve() would not set size()
lf.assign(cfg.lf_size, 0);
|
||||
seq_lfc = 0;
|
||||
max_lfc = 0;
|
||||
ready = 0;
|
||||
seq_len = cfg.seq_len;
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* Append() *
|
||||
* This function puts the integer given into the current_seq array *
|
||||
* as the last element. It flags ready according to whether *
|
||||
* current_seq is full. Updates num_in_seq, ready, current_seq, *
|
||||
* num_seqs_fnd, and num_pairs_read. *
|
||||
* *
|
||||
* Input: const int new_value The next value to be put into the *
|
||||
* current_seq array *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::Append(const int new_value)
|
||||
{
|
||||
// missing system call: reset the current sequence window
|
||||
if (new_value == -1) {
|
||||
num_in_seq = -1;
|
||||
ready = 0;
|
||||
}
|
||||
else {
|
||||
num_pairs_read++;
|
||||
if (num_in_seq < seq_len - 1) { // window not yet full
|
||||
num_in_seq++;
|
||||
current_seq[num_in_seq] = new_value;
|
||||
if (num_in_seq == seq_len - 1) {
|
||||
ready = 1;
|
||||
++num_seqs_fnd;
|
||||
}
|
||||
}
|
||||
|
||||
else {
|
||||
// Roll over current_seq array
|
||||
for (int k = 0; k < num_in_seq; k++) {
|
||||
current_seq[k] = current_seq[k + 1];
|
||||
}
|
||||
current_seq[num_in_seq] = new_value;
|
||||
++num_seqs_fnd;
|
||||
}
|
||||
}
|
||||
}
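
The sliding-window behaviour that Append() implements can be illustrated in isolation. The following is a minimal sketch, not part of stide: `SlidingWindows` is a hypothetical helper that assumes the input is a plain vector of call numbers and that -1 marks a missing call, as in Append(). It uses a `std::deque` instead of shifting a fixed array.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper (not part of stide): returns every full window of
// length seq_len seen while scanning `calls` from left to right.
// A value of -1 is treated as a missing call and empties the window,
// mirroring the reset branch in Stream::Append().
std::vector<std::vector<int> > SlidingWindows(const std::vector<int> &calls,
                                              std::size_t seq_len) {
  std::vector<std::vector<int> > windows;
  if (seq_len == 0) return windows;

  std::deque<int> window;
  for (std::size_t i = 0; i < calls.size(); ++i) {
    if (calls[i] == -1) {  // missing call: start a fresh window
      window.clear();
      continue;
    }
    if (window.size() == seq_len) window.pop_front();  // drop the oldest value
    window.push_back(calls[i]);
    if (window.size() == seq_len)  // window is full, so a sequence is ready
      windows.push_back(std::vector<int>(window.begin(), window.end()));
  }
  return windows;
}
```

Each returned window corresponds to one point at which the stream would report itself as ready; the deque avoids the element-by-element shift at the cost of copying the window out when it is reported.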
|
||||
|
||||
/********************************************************************
|
||||
* AddToDB() *
|
||||
* *
|
||||
* Adds current_seq to the database if it isn't already there; *
|
||||
* Returns 0 if it is already there, 1 if it is new. Updates *
|
||||
* normal and db_size. *
|
||||
* *
|
||||
* Input: SeqForest &normal Forest of normal sequences *
|
||||
* int &db_size Number of unique sequences in the *
|
||||
* database *
|
||||
* const int total_pairs_read Number of pairs read from the *
|
||||
* entire input stream *
|
||||
* const Config &cfg Configuration Information *
|
||||
* Output: 0 if sequence isn't new, 1 if it is *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
int Stream::AddToDB(SeqForest &normal, int &db_size, const int
|
||||
total_pairs_read, const Config &cfg) const
|
||||
{
|
||||
int is_new;
|
||||
|
||||
// If there is not a tree with the same root as this sequence has,
|
||||
// make a new tree with that root and flag trees_found
|
||||
if (!normal.trees_found[current_seq[0]])
|
||||
{
|
||||
normal.trees[current_seq[0]].SetRoot(current_seq[0]);
|
||||
normal.trees_found[current_seq[0]] = 1;
|
||||
}
|
||||
|
||||
// Try to add the sequence. If it's already there, is_new will be
|
||||
// set to 0, otherwise it will be set to 1.
|
||||
is_new = normal.trees[current_seq[0]].InsertSeq(current_seq, 0, seq_len-1);
|
||||
db_size += is_new;
|
||||
|
||||
if ((is_new && cfg.verbose) || cfg.very_verbose)
|
||||
{
|
||||
ReportNewSeq(cfg, total_pairs_read, db_size);
|
||||
}
|
||||
|
||||
if (is_new)
|
||||
return 1;
|
||||
else
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* CompareSeq() *
|
||||
* Compares the current sequence in this stream to the database, *
|
||||
* in the manner indicated by the configuration file. Reports *
|
||||
* on anomalies if told to by the configuration file. Updates *
|
||||
* num_anoms, seq_hdist, max_hdist, seq_lfc, and max_lfc. *
|
||||
* *
|
||||
* Input: const Config &cfg: Information from configuration file *
|
||||
* const SeqForest &normal: DB of normal sequences *
|
||||
* const int total_pairs_read: Number of pairs read from *
|
||||
* all of the streams *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::CompareSeq(const Config &cfg, const SeqForest &normal,
|
||||
const int total_pairs_read)
|
||||
{
|
||||
int is_anom; // flag to indicate whether current_seq is an anomaly
|
||||
|
||||
is_anom = ComputeMisses(normal);
|
||||
if ((is_anom) && (cfg.compute_hdist)) {
|
||||
ComputeHDist(normal);
|
||||
}
|
||||
if (cfg.lf_size > 1) {
|
||||
ComputeLF(is_anom, cfg.lf_size);
|
||||
}
|
||||
// if we're in verbose mode and either current_seq is an anomaly or
|
||||
// its locality frame contains an anomaly, report it
|
||||
if ((cfg.very_verbose) || (cfg.verbose && (is_anom || seq_lfc))) {
|
||||
ReportSeq(cfg, total_pairs_read, is_anom);
|
||||
}
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeMisses() *
|
||||
* Compares the current sequence to the database sequences. If *
|
||||
* there is an exact match, we return 0. Otherwise we return 1. *
|
||||
* Updates num_anoms and seq_hdist. *
|
||||
* *
|
||||
* Input: const SeqForest &normal: DB of normal sequences *
|
||||
* Output: 0 if there is an exact match *
|
||||
* 1 if the sequence is anomalous *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
int Stream::ComputeMisses(const SeqForest &normal)
|
||||
{
|
||||
if (normal.IsSeqInForest(current_seq, seq_len)) {
|
||||
seq_hdist = 0;
|
||||
return(0);
|
||||
}
|
||||
|
||||
// We have an anomaly
|
||||
++num_anoms;
|
||||
return(1);
|
||||
}
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeHDist() *
|
||||
* Compares the current sequence in this stream to each sequence *
|
||||
* in the database in turn, adding up the number of mismatches *
|
||||
* between the two sequences. The smallest difference between *
|
||||
* the current sequence and the database sequences is the minimum *
|
||||
* Hamming distance for the current sequence. If this minimum *
|
||||
* Hamming distance is greater than the largest minimum Hamming *
|
||||
* distance encountered so far, then the variable max_hdist is *
|
||||
* updated. Updates seq_hdist and max_hdist. *
|
||||
* *
|
||||
* Input: const SeqForest &normal: DB of normal sequences *
|
||||
* *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ComputeHDist(const SeqForest &normal)
|
||||
{
|
||||
int misses_on_this_seq; // the number of mismatches between
|
||||
// current_seq and the sequence we're
|
||||
// comparing it with at the moment
|
||||
seq_hdist = seq_len; // start with seq_hdist as high as
|
||||
// possible
|
||||
|
||||
// We compare current_seq with each sequence in our database tree
|
||||
for (int i = 0; i < normal.trees.size(); i++) {
|
||||
// Have we seen any sequences starting with element i? If not, we
|
||||
// can go on to consider sequences starting with element i+1.
|
||||
if (normal.trees_found[i]) {
|
||||
misses_on_this_seq =
|
||||
normal.trees[i].ComputeHDistForTree(current_seq, 0, seq_len-1);
|
||||
if (misses_on_this_seq < seq_hdist) {
|
||||
seq_hdist = misses_on_this_seq;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (seq_hdist > max_hdist) {
|
||||
max_hdist = seq_hdist;
|
||||
}
|
||||
}
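
The minimum Hamming distance described above can be sketched without the sequence forest. This is a brute-force stand-in under stated assumptions, not stide's tree-based ComputeHDistForTree(): the database is assumed to be a flat list of equal-length sequences, and the names `HammingDistance` and `MinHammingDistance` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Number of positions at which two sequences differ. Assumes both have the
// same length; any extra elements in the longer one are ignored.
int HammingDistance(const std::vector<int> &a, const std::vector<int> &b) {
  int misses = 0;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    if (a[i] != b[i]) ++misses;
  }
  return misses;
}

// Smallest Hamming distance between `seq` and any sequence in `db`.
// Starts from the largest possible value (the sequence length), as
// Stream::ComputeHDist() does with seq_hdist = seq_len.
int MinHammingDistance(const std::vector<int> &seq,
                       const std::vector<std::vector<int> > &db) {
  int best = static_cast<int>(seq.size());
  for (std::size_t i = 0; i < db.size(); ++i) {
    const int d = HammingDistance(seq, db[i]);
    if (d < best) best = d;
  }
  return best;
}
```

A result of 0 means the sequence is in the database; the per-stream maximum of these minima is what the summary prints as the largest minimum Hamming distance.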
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ComputeLF() *
|
||||
* Computes the number of misses in current_seq's locality frame. *
|
||||
* Updates lf, seq_lfc and max_lfc. *
|
||||
* *
|
||||
* Input: const int is_anom Flag to indicate whether *
|
||||
* current_seq is an anomaly *
|
||||
* const int lf_size Size of locality frame *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
|
||||
void Stream::ComputeLF(const int is_anom, const int lf_size)
|
||||
{
|
||||
// When num_seqs_fnd is at most lf_size, the locality frame
|
||||
// array is not full
|
||||
if (num_seqs_fnd <= lf_size) {
|
||||
lf[num_seqs_fnd-1] = is_anom;
|
||||
seq_lfc += is_anom;
|
||||
}
|
||||
else {
|
||||
// We're about to remove the first element of lf; since seq_lfc is
|
||||
// the sum of the elements of lf, we should subtract lf[0] from
|
||||
// seq_lfc to remove it from the sum.
|
||||
seq_lfc -= lf[0];
|
||||
// Now we add is_anom and seq_lfc is the sum of the new locality
|
||||
// frame.
|
||||
seq_lfc += is_anom;
|
||||
|
||||
// roll over the array
|
||||
for (int i = 0; i < lf_size-1; i++) {
|
||||
lf[i] = lf[i+1];
|
||||
}
|
||||
lf[lf_size-1] = is_anom;
|
||||
}
|
||||
if (seq_lfc > max_lfc) {
|
||||
max_lfc = seq_lfc;
|
||||
}
|
||||
}
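
The locality frame count is a running sum of anomaly flags over the last lf_size sequences. The sketch below shows that bookkeeping on its own; `LocalityFrame` is a hypothetical class, not part of stide, and it uses a `std::deque` where ComputeLF() shifts a fixed-size array.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-alone version of the locality frame bookkeeping:
// keeps the last `size` anomaly flags and the running sum of those flags.
class LocalityFrame {
 public:
  explicit LocalityFrame(std::size_t size)
      : size_(size), count_(0), max_count_(0) {}

  // Records one flag and returns the count for the updated frame.
  int Push(bool is_anom) {
    if (size_ == 0) return 0;
    if (frame_.size() == size_) {  // frame is full: retire the oldest flag
      count_ -= frame_.front();
      frame_.pop_front();
    }
    frame_.push_back(is_anom ? 1 : 0);
    count_ += frame_.back();
    if (count_ > max_count_) max_count_ = count_;
    return count_;
  }

  // Largest count seen so far, the analogue of max_lfc.
  int MaxCount() const { return max_count_; }

 private:
  std::size_t size_;
  std::deque<int> frame_;
  int count_;
  int max_count_;
};
```

Subtracting the retired flag and adding the new one keeps each update constant-time, which is the same idea as the `seq_lfc -= lf[0]` and `seq_lfc += is_anom` steps in the function above.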
|
||||
|
||||
/*********************************************************************
|
||||
* ReportSeq() *
|
||||
* This function reports data about a sequence. Specifically, it *
|
||||
* can report the external stream id, a number indicating where *
|
||||
* the first element of the current sequence occurs in the input, *
|
||||
* a number indicating how many pairs from this particular data *
|
||||
* stream have been read prior to the first element of the *
|
||||
* sequence, the minimum Hamming distance for the current *
|
||||
* sequence, the locality frame count, *
|
||||
* and whether this particular sequence is itself an anomaly (it *
|
||||
* could be that some other sequence in its locality frame is *
|
||||
* anomalous). The configuration file determines which of those *
|
||||
* possible data are reported and in what format. Updates no *
|
||||
* values. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int total_pairs_read Total number of pairs read *
|
||||
* from the input stream from any data *
|
||||
* stream, not just this one *
|
||||
* const int is_anom flag for whether the current *
|
||||
* sequence is itself an anomaly *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ReportSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int is_anom) const
|
||||
{
|
||||
for (int i = 0; i < cfg.num_fvars; i++) {
|
||||
switch (cfg.write_val[i]) {
|
||||
case 'a':
|
||||
printf(cfg.fmt_str[i], is_anom); break;
|
||||
case 'c':
|
||||
if (cfg.lf_size > 1) {
|
||||
printf(cfg.fmt_str[i], seq_lfc);
|
||||
}
|
||||
break;
|
||||
case 'h':
|
||||
if (cfg.compute_hdist) {
|
||||
printf(cfg.fmt_str[i], seq_hdist);
|
||||
}
|
||||
break;
|
||||
case 'i':
|
||||
printf(cfg.fmt_str[i], num_pairs_read); break;
|
||||
case 'p':
|
||||
printf(cfg.fmt_str[i], total_pairs_read); break;
|
||||
case 's':
|
||||
printf(cfg.fmt_str[i], ext_sid); break;
|
||||
}
|
||||
}
|
||||
printf(cfg.fmt_str[cfg.num_fvars]);
|
||||
}
|
||||
|
||||
|
||||
/*********************************************************************
|
||||
* ReportNewSeq() *
|
||||
* This function reports on sequences which have been newly added *
|
||||
* to the database. It can report the external stream *
|
||||
* identifier, where the first element of the sequence occurs *
|
||||
* both within the whole input stream and within its own data *
|
||||
* stream, and the number of unique sequences in the database *
|
||||
* after this sequence has been added. The configuration file *
|
||||
* determines which of those possible data are reported and in *
|
||||
* what format. Updates no values. *
|
||||
* *
|
||||
* Input: const Config &cfg Configuration information *
|
||||
* const int total_pairs_read Total number of pairs read *
|
||||
* from the input stream from any data *
|
||||
* stream, not just this one *
|
||||
* const int db_size Number of unique sequences *
|
||||
* in the database *
|
||||
* Output: none *
|
||||
********************************************************************/
|
||||
|
||||
void Stream::ReportNewSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int db_size) const
|
||||
{
|
||||
for (int i = 0; i < cfg.num_fvars; i++) {
|
||||
switch (cfg.write_val[i]) {
|
||||
case 'd':
|
||||
printf(cfg.fmt_str[i], db_size); break;
|
||||
case 'i':
|
||||
printf(cfg.fmt_str[i], num_pairs_read); break;
|
||||
case 'p':
|
||||
printf(cfg.fmt_str[i], total_pairs_read); break;
|
||||
case 's':
|
||||
printf(cfg.fmt_str[i], ext_sid); break;
|
||||
}
|
||||
}
|
||||
printf(cfg.fmt_str[cfg.num_fvars]);
|
||||
}
|
||||
|
||||
|
||||
|
63
11/wywolania/Data/stide_v1.2/stream.h
Normal file
@ -0,0 +1,63 @@
|
||||
#ifndef STREAM_H  // names beginning with a double underscore are reserved
#define STREAM_H
|
||||
|
||||
#include <vector>
|
||||
#include "config.h"
|
||||
#include "flexitree.h"
|
||||
|
||||
using std::vector;
|
||||
|
||||
class Stream {
|
||||
public:
|
||||
Stream() {}  // members are set by Init(), which must be called before use
|
||||
void Init(const Config &cfg, const int intern_id, const int
|
||||
extern_id);
|
||||
void Append(const int next_value);
|
||||
int AddToDB(SeqForest &normal, int &db_size, int total_pairs_read,
|
||||
const Config &cfg) const;
|
||||
void CompareSeq(const Config &cfg, const SeqForest &normal, const
|
||||
int total_pairs_read);
|
||||
int GetMaxHDist(void) const {return max_hdist;}
|
||||
int GetMaxLFC(void) const {return max_lfc;}
|
||||
int Ready(void) const {return ready;}
|
||||
int GetNumAnoms(void) const {return num_anoms;}
|
||||
int GetNumPairsRead(void) const {return num_pairs_read;}
|
||||
int GetNumSeqsFnd(void) const {return num_seqs_fnd;}
|
||||
private:
|
||||
vector<int> current_seq; // current sequence being filled or
|
||||
// processed
|
||||
int num_in_seq; // current_seq is full up through
|
||||
// num_in_seq
|
||||
int num_pairs_read; // the number of input pairs belonging to
|
||||
// this stream that have been read so far
|
||||
int num_anoms; // the number of anomalies found so far
|
||||
int num_seqs_fnd; // the number of (not necessarily unique)
|
||||
// sequences belonging to this stream
|
||||
// found so far
|
||||
int ext_sid; // the external stream id
|
||||
int int_sid; // the internal stream id
|
||||
int max_hdist; // the largest minimum Hamming distance
|
||||
// found in this stream
|
||||
int seq_hdist; // the minimum Hamming distance for
|
||||
// current_seq
|
||||
vector<int> lf; // array for locality frame
|
||||
int seq_lfc; // the locality frame count for this
|
||||
// sequence
|
||||
int max_lfc; // the largest locality frame count
|
||||
// encountered so far
|
||||
int ready; // a flag to indicate whether this stream
|
||||
// has a full sequence ready to be
|
||||
// processed. 0 = no, 1 = yes.
|
||||
int seq_len; // sequence length
|
||||
int ComputeMisses(const SeqForest &normal);
|
||||
void ComputeHDist(const SeqForest &normal);
|
||||
void ComputeLF(const int is_anom, const int lf_size);
|
||||
void ReportSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int is_anom) const;
|
||||
void ReportNewSeq(const Config &cfg, const int total_pairs_read,
|
||||
const int db_size) const;
|
||||
};
|
||||
|
||||
#endif
|
||||
|
||||
|
0
11/wywolania/main.R
Normal file
BIN
426254-l9.tb2
Normal file
Binary file not shown.