...
This commit is contained in:
parent
55521f222e
commit
ab1d7e2546
4
.gitignore
vendored
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
.Rproj.user
|
||||||
|
.Rhistory
|
||||||
|
.RData
|
||||||
|
.Ruserdata
|
13
11/11.Rproj
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
Version: 1.0
|
||||||
|
|
||||||
|
RestoreWorkspace: Default
|
||||||
|
SaveWorkspace: Default
|
||||||
|
AlwaysSaveHistory: Default
|
||||||
|
|
||||||
|
EnableCodeIndexing: Yes
|
||||||
|
UseSpacesForTab: Yes
|
||||||
|
NumSpacesForTab: 2
|
||||||
|
Encoding: UTF-8
|
||||||
|
|
||||||
|
RnwWeave: Sweave
|
||||||
|
LaTeX: pdfLaTeX
|
10
11/3.txt
@ -1,10 +0,0 @@
|
|||||||
I know this is not quite what was asked for, but for an interesting graphical interpretation I recommend:
https://www.gwern.net/Traffic

And now the 2 answers:
UNSW-NB15:
description: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys
NSL_KDD:
description: I could not find the dataset itself, but I found a dump of it :)
link: https://github.com/jmnwong/NSL-KDD-Dataset
|
|
91
11/README.md
Normal file
@ -0,0 +1,91 @@
|
|||||||
|
# Characterization of the datasets and whether they are historical (task 1)

## 1. kddcup99

Prepared for the Fifth International Conference on Knowledge Discovery and Data Mining, as part of a competition to select the best-designed predictive model for detecting potential attacks.
The dataset covers 4 categories of attacks (DOS, R2L, U2R, probing). The training data contains 24 attack types, and the test data adds 14 more.
The data was simulated in a military network. It is 4 GB of network traffic from 7 weeks (about 5 million connection records).
A connection is a sequence of TCP packets with a well-defined start and end time, labeled either as normal or with an attack code. Each connection record is about 100 bytes.

As for current usage, I found:

- A 2016 paper describing its applications in machine learning in 2010-2015 - https://peerj.com/preprints/1954/
- A 2018 paper whose author says this dataset is often used as a benchmark - https://arxiv.org/abs/1811.05372

### I am not sure whether this means it is still in use (that was 3 years ago, after all) - I think it is, and that is the assessment I am leaving :)
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 2. network
|
||||||
|
|
||||||
|
A network traffic capture made with tcpdump between a certain LAN and external networks.
Thanks to tcpdump filtering, only TCP and UDP traffic was collected.

### Each TCP packet record consists of:

- Time stamp
- Source IP address
- Source port
- Destination IP address
- Destination port
- Flags (syn, fin, push, rst, or .)
- Data sequence number of this packet
- Data sequence number of the data expected in return
- Number of bytes of receive buffer space available
- Indication of whether or not the data is urgent

### Each UDP packet record consists of:

- Time stamp
- Source IP address
- Source port
- Destination IP address
- Destination port
- Length of the packet

All IP addresses have been modified so that no potentially sensitive data is exposed.
|
||||||
|
|
||||||
|
|
||||||
|
### The page for this dataset was last edited on 4 April 2001, the most recent paper listed on the site (http://ivpr.cs.uml.edu/publications/) is from 2000, and I found no mention of this data being used in newer work, so I am marking this dataset as historical.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 3. system calls

The dataset contains system call traces of active system processes.
Each trace file (\*.int) contains a list of number pairs, in this order:

- the PID of the process
- a number representing the system call

The mapping from numbers to system calls is included in the documentation in the `UserDoc` folder.
It can also be downloaded as PostScript from this address: https://www.cs.unm.edu/~immsec/software/stide_user_doc.ps
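
A minimal sketch of reading such a trace, assuming each line holds two whitespace-separated integers (the PID, then the system call number). The function name `group_by_pid` and the sample values are illustrative and not part of the dataset's tooling.

```python
from collections import defaultdict


def group_by_pid(lines):
    """Group system call numbers by PID, preserving their order per process."""
    traces = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines carry no data
        pid, syscall = int(parts[0]), int(parts[1])
        traces[pid].append(syscall)
    return dict(traces)


# Illustrative sample, not taken from the dataset.
sample = ["744 24", "744 13", "1069 4", "744 5"]
print(group_by_pid(sample))
```

Keeping one ordered list per PID matters because the calls of several processes are interleaved in a single file, while any sequence analysis has to look at each process separately.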
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
## 4. UNIX shell log
|
||||||
|
|
||||||
|
9 sets of user activity data (USER0 and USER1 are the same person on different machines) from a UNIX system.
The data has been stripped of all network addresses, personal data, timestamps, etc.
The token representation of the data in this set is described very well here (http://kdd.ics.uci.edu/databases/UNIX_user_data/README), so I will not repeat it.

### I did not find any recent work using this dataset, and the UCI KDD site is archival since it was absorbed by UCI ML, so I assume the dataset is archival.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
|
||||||
|
# Additional datasets (task 3)

## 1. UNSW-NB15:

- description: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
- link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys

## 2. NSL_KDD:

- description: I could not find the dataset itself, but I found a dump of it :)
- link: https://github.com/jmnwong/NSL-KDD-Dataset

## P.S.

I know this is not quite what was asked for, but for an interesting graphical interpretation I recommend:
https://www.gwern.net/Traffic
|
494021
11/kddcup99/Data/kddcup.data_10_percent_corrected
Normal file
File diff suppressed because it is too large
64
11/kddcup99/response.R
Normal file
@ -0,0 +1,64 @@
|
|||||||
|
#
|
||||||
|
#
|
||||||
|
# To start, I would like to apologise for analysing only the 10% subset instead of the whole dataset; my hardware limitations do not allow me
# to run this on the full data without burning out my machine. I hope this will be forgiven :)
|
||||||
|
#
|
||||||
|
#
|
||||||
|
|
||||||
|
library(ggplot2)
|
||||||
|
|
||||||
|
# Column names for the data file: the 41 features listed in
# http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names, followed by the class label (42 columns in total).
headers <- c('duration',
'protocol_type',
'service',
'flag',
'src_bytes',
'dst_bytes',
'land',
'wrong_fragment',
'urgent',
'hot',
'num_failed_logins',
'logged_in',
'num_compromised',
'root_shell',
'su_attempted',
'num_root',
'num_file_creations',
'num_shells',
'num_access_files',
'num_outbound_cmds',
'is_host_login',
'is_guest_login',
'count',
'srv_count',
'serror_rate',
'srv_serror_rate',
'rerror_rate',
'srv_rerror_rate',
'same_srv_rate',
'diff_srv_rate',
'srv_diff_host_rate',
'dst_host_count',
'dst_host_srv_count',
'dst_host_same_srv_rate',
'dst_host_diff_srv_rate',
'dst_host_same_src_port_rate',
'dst_host_srv_diff_host_rate',
'dst_host_serror_rate',
'dst_host_srv_serror_rate',
'dst_host_rerror_rate',
'dst_host_srv_rerror_rate',
'label')


# The data file has no header row, so header = FALSE keeps the first record as data.
kddcup99 <- read.csv('kddcup99/Data/kddcup.data_10_percent_corrected', header = FALSE, col.names = headers)

# kddcup.names lists 64 names because its first line holds the 23 class labels, which are values
# of the last column rather than columns of their own; each record therefore has 42 values.
# Below are the two columns that looked most sensible to analyse (the 5th and 6th columns of the file).

print('Most common src_bytes in kddcup99:')
print(tail(names(sort(table(kddcup99$src_bytes)))))

print('Most common dst_bytes in kddcup99:')
print(tail(names(sort(table(kddcup99$dst_bytes)))))
|
358760
11/network/Data/base.csv
Normal file
File diff suppressed because it is too large
628775
11/network/Data/net1.csv
Normal file
File diff suppressed because it is too large
481851
11/network/Data/net2.csv
Normal file
File diff suppressed because it is too large
509263
11/network/Data/net3.csv
Normal file
File diff suppressed because it is too large
632036
11/network/Data/net4.csv
Normal file
File diff suppressed because it is too large
72
11/network/response.R
Normal file
@ -0,0 +1,72 @@
|
|||||||
|
library(ggplot2)
|
||||||
|
|
||||||
|
base <- read.csv('network/Data/base.csv')
|
||||||
|
net1 <- read.csv('network/Data/net1.csv')
|
||||||
|
net2 <- read.csv('network/Data/net2.csv')
|
||||||
|
net3 <- read.csv('network/Data/net3.csv')
|
||||||
|
net4 <- read.csv('network/Data/net4.csv')
|
||||||
|
|
||||||
|
# base
|
||||||
|
print('Most common src_port in base:')
|
||||||
|
print(tail(names(sort(table(base$src_port)))))
|
||||||
|
|
||||||
|
print('Most common src_addr in base:')
|
||||||
|
print(tail(names(sort(table(base$src_addr)))))
|
||||||
|
|
||||||
|
print('Most common dest_port in base:')
|
||||||
|
print(tail(names(sort(table(base$dest_port)))))
|
||||||
|
|
||||||
|
print('Most common dest_addr in base:')
|
||||||
|
print(tail(names(sort(table(base$dest_addr)))))
|
||||||
|
|
||||||
|
# net1
|
||||||
|
print('Most common src_port in net1:')
|
||||||
|
print(tail(names(sort(table(net1$src_port)))))
|
||||||
|
|
||||||
|
print('Most common src_addr in net1:')
|
||||||
|
print(tail(names(sort(table(net1$src_addr)))))
|
||||||
|
|
||||||
|
print('Most common dest_port in net1:')
|
||||||
|
print(tail(names(sort(table(net1$dest_port)))))
|
||||||
|
|
||||||
|
print('Most common dest_addr in net1:')
|
||||||
|
print(tail(names(sort(table(net1$dest_addr)))))
|
||||||
|
|
||||||
|
# net2
|
||||||
|
print('Most common src_port in net2:')
|
||||||
|
print(tail(names(sort(table(net2$src_port)))))
|
||||||
|
|
||||||
|
print('Most common src_addr in net2:')
|
||||||
|
print(tail(names(sort(table(net2$src_addr)))))
|
||||||
|
|
||||||
|
print('Most common dest_port in net2:')
|
||||||
|
print(tail(names(sort(table(net2$dest_port)))))
|
||||||
|
|
||||||
|
print('Most common dest_addr in net2:')
|
||||||
|
print(tail(names(sort(table(net2$dest_addr)))))
|
||||||
|
|
||||||
|
# net3
|
||||||
|
print('Most common src_port in net3:')
|
||||||
|
print(tail(names(sort(table(net3$src_port)))))
|
||||||
|
|
||||||
|
print('Most common src_addr in net3:')
|
||||||
|
print(tail(names(sort(table(net3$src_addr)))))
|
||||||
|
|
||||||
|
print('Most common dest_port in net3:')
|
||||||
|
print(tail(names(sort(table(net3$dest_port)))))
|
||||||
|
|
||||||
|
print('Most common dest_addr in net3:')
|
||||||
|
print(tail(names(sort(table(net3$dest_addr)))))
|
||||||
|
|
||||||
|
# net4
|
||||||
|
print('Most common src_port in net4:')
|
||||||
|
print(tail(names(sort(table(net4$src_port)))))
|
||||||
|
|
||||||
|
print('Most common src_addr in net4:')
|
||||||
|
print(tail(names(sort(table(net4$src_addr)))))
|
||||||
|
|
||||||
|
print('Most common dest_port in net4:')
|
||||||
|
print(tail(names(sort(table(net4$dest_port)))))
|
||||||
|
|
||||||
|
print('Most common dest_addr in net4:')
|
||||||
|
print(tail(names(sort(table(net4$dest_addr)))))
|
69
11/unix/Data/README
Normal file
@ -0,0 +1,69 @@
|
|||||||
|
This file contains 9 sets of sanitized user data drawn from the
|
||||||
|
command histories of 8 UNIX computer users at Purdue over the course
|
||||||
|
of up to 2 years (USER0 and USER1 were generated by the same person,
|
||||||
|
working on different platforms and different projects). The data is
|
||||||
|
drawn from tcsh(1) history files and has been parsed and sanitized to
|
||||||
|
remove filenames, user names, directory structures, web addresses,
|
||||||
|
host names, and other possibly identifying items. Command names,
|
||||||
|
flags, and shell metacharacters have been preserved. Additionally,
|
||||||
|
**SOF** and **EOF** tokens have been inserted at the start and end of
|
||||||
|
shell sessions, respectively. Sessions are concatenated by date order
|
||||||
|
and tokens appear in the order issued within the shell session, but no
|
||||||
|
timestamps are included in this data. For example, the two sessions:
|
||||||
|
|
||||||
|
# Start session 1
|
||||||
|
cd ~/private/docs
|
||||||
|
ls -laF | more
|
||||||
|
cat foo.txt bar.txt zorch.txt > somewhere
|
||||||
|
exit
|
||||||
|
# End session 1
|
||||||
|
|
||||||
|
# Start session 2
|
||||||
|
cd ~/games/
|
||||||
|
xquake &
|
||||||
|
fg
|
||||||
|
vi scores.txt
|
||||||
|
mailx john_doe@somewhere.com
|
||||||
|
exit
|
||||||
|
# End session 2
|
||||||
|
|
||||||
|
would be represented by the token stream
|
||||||
|
|
||||||
|
**SOF**
|
||||||
|
cd
|
||||||
|
<1> # one "file name" argument
|
||||||
|
ls
|
||||||
|
-laF
|
||||||
|
|
|
||||||
|
more
|
||||||
|
cat
|
||||||
|
<3> # three "file" arguments
|
||||||
|
>
|
||||||
|
<1>
|
||||||
|
exit
|
||||||
|
**EOF**
|
||||||
|
**SOF**
|
||||||
|
cd
|
||||||
|
<1>
|
||||||
|
xquake
|
||||||
|
&
|
||||||
|
fg
|
||||||
|
vi
|
||||||
|
<1>
|
||||||
|
mailx
|
||||||
|
<1>
|
||||||
|
exit
|
||||||
|
**EOF**
|
||||||
|
|
||||||
|
|
||||||
|
This data is made available under conditions of anonymity for the
|
||||||
|
contributing users and may be used for research purposes only.
|
||||||
|
Summaries and research results employing this data may be published,
|
||||||
|
but literal tokens or token sequences from the data may not be
|
||||||
|
published except with express consent of the originators of the data.
|
||||||
|
No portion of this data may be released with or included in a
|
||||||
|
commercial product, nor may any portion of this data be sold or
|
||||||
|
redistributed for profit or as part of a profit-making endeavor.
|
||||||
|
|
||||||
|
Please direct any questions regarding this data to Terran Lane:
|
||||||
|
terran@ecn.purdue.edu.
|
8974
11/unix/Data/user_0
Normal file
File diff suppressed because it is too large
19881
11/unix/Data/user_1
Normal file
File diff suppressed because it is too large
18738
11/unix/Data/user_2
Normal file
File diff suppressed because it is too large
16866
11/unix/Data/user_3
Normal file
File diff suppressed because it is too large
37817
11/unix/Data/user_4
Normal file
File diff suppressed because it is too large
34821
11/unix/Data/user_5
Normal file
File diff suppressed because it is too large
64152
11/unix/Data/user_6
Normal file
File diff suppressed because it is too large
17329
11/unix/Data/user_7
Normal file
File diff suppressed because it is too large
54042
11/unix/Data/user_8
Normal file
File diff suppressed because it is too large
124
11/unix/response.txt
Normal file
@ -0,0 +1,124 @@
|
|||||||
|
I do not know how to perform any other statistical analysis on these files, so I simply counted the most frequent
commands for each user. Below I include the command I used for the analysis as well as the results.
I included only the 10 most frequent so as not to overwhelm.
Note that tokens such as <1>, **EOF**, etc. will of course show up, but they should not
be taken into account in the statistical analysis.
|
||||||
|
|
||||||
|
Command: sort <file_name> | uniq -c | sort -rn | head -n 10
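
The same count can be sketched in Python, here additionally skipping the marker tokens that the note above says to ignore. This is only an illustration: the function name `top_tokens` and the regular expression for the `<n>` placeholders are assumptions, and it counts every remaining token (flags and shell metacharacters included), just as the shell pipeline does.

```python
import re
from collections import Counter

# Marker tokens inserted by the sanitizer: session boundaries and "<n>" argument counts.
MARKER = re.compile(r"^(\*\*SOF\*\*|\*\*EOF\*\*|<\d+>)$")


def top_tokens(tokens, n=10):
    """Return the n most frequent tokens, ignoring sanitizer markers and blank lines."""
    counts = Counter(
        t for t in (tok.strip() for tok in tokens) if t and not MARKER.match(t)
    )
    return counts.most_common(n)


# Illustrative token stream, shaped like the example in the dataset README.
stream = ["**SOF**", "cd", "<1>", "ls", "-laF", "|", "more", "cd", "<1>", "exit", "**EOF**"]
print(top_tokens(stream, n=2))
```

Assuming the data files hold one token per line, as in the README example, an open file object can be passed directly as `tokens`.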
|
||||||
|
|
||||||
|
Results:
|
||||||
|
|
||||||
|
user_0
|
||||||
|
2147 <1>
|
||||||
|
803 ls
|
||||||
|
567 **SOF**
|
||||||
|
567 **EOF**
|
||||||
|
507 cd
|
||||||
|
485 finger
|
||||||
|
450 elm
|
||||||
|
442 exit
|
||||||
|
251 <2>
|
||||||
|
230 fg
|
||||||
|
|
||||||
|
user_1
|
||||||
|
6069 <1>
|
||||||
|
1951 cd
|
||||||
|
1929 ls
|
||||||
|
1733 vi
|
||||||
|
884 <2>
|
||||||
|
515 **SOF**
|
||||||
|
515 **EOF**
|
||||||
|
397 smake
|
||||||
|
350 ll
|
||||||
|
315 more
|
||||||
|
|
||||||
|
user_2
|
||||||
|
5432 <1>
|
||||||
|
1597 cd
|
||||||
|
1069 **SOF**
|
||||||
|
1069 **EOF**
|
||||||
|
989 a.out
|
||||||
|
932 <2>
|
||||||
|
816 ls
|
||||||
|
626 quota
|
||||||
|
612 xcc
|
||||||
|
497 rm
|
||||||
|
|
||||||
|
user_3
|
||||||
|
4382 <1>
|
||||||
|
1710 ls
|
||||||
|
988 cd
|
||||||
|
806 more
|
||||||
|
778 vi
|
||||||
|
704 elm
|
||||||
|
577 fg
|
||||||
|
511 lo
|
||||||
|
501 **SOF**
|
||||||
|
501 **EOF**
|
||||||
|
|
||||||
|
|
||||||
|
user_4
|
||||||
|
10699 <1>
|
||||||
|
4501 cd
|
||||||
|
2395 ll
|
||||||
|
1682 vi
|
||||||
|
1465 dir
|
||||||
|
1396 <2>
|
||||||
|
955 **SOF**
|
||||||
|
955 **EOF**
|
||||||
|
641 elm
|
||||||
|
559 logout
|
||||||
|
|
||||||
|
|
||||||
|
user_5
|
||||||
|
8987 <1>
|
||||||
|
2862 cd
|
||||||
|
2748 <2>
|
||||||
|
2144 ls
|
||||||
|
1279 less
|
||||||
|
1183 grep
|
||||||
|
973 make
|
||||||
|
887 ll
|
||||||
|
778 -
|
||||||
|
632 <
|
||||||
|
|
||||||
|
|
||||||
|
user_6
|
||||||
|
16298 <1>
|
||||||
|
8761 ls
|
||||||
|
5680 cd
|
||||||
|
3419 **SOF**
|
||||||
|
3419 **EOF**
|
||||||
|
2830 vi
|
||||||
|
2419 elm
|
||||||
|
2015 <2>
|
||||||
|
1457 rm
|
||||||
|
996 exit
|
||||||
|
|
||||||
|
|
||||||
|
user_7
|
||||||
|
3463 <1>
|
||||||
|
1522 **SOF**
|
||||||
|
1522 **EOF**
|
||||||
|
1133 ls
|
||||||
|
848 cd
|
||||||
|
741 z
|
||||||
|
615 <2>
|
||||||
|
595 m
|
||||||
|
514 clear
|
||||||
|
237 rm
|
||||||
|
|
||||||
|
|
||||||
|
user_8
|
||||||
|
14269 <1>
|
||||||
|
5108 ll
|
||||||
|
5016 cd
|
||||||
|
2188 <2>
|
||||||
|
1983 **SOF**
|
||||||
|
1983 **EOF**
|
||||||
|
1553 k
|
||||||
|
1259 m
|
||||||
|
1177 z
|
||||||
|
796 vi
|
||||||
|
|
||||||
|
|
216
11/wywolania/Data/UserDoc/graphic1.eps
Normal file
@ -0,0 +1,216 @@
|
|||||||
|
%!PS-Adobe-2.0 EPSF-2.0
|
||||||
|
%%Title: graphic1.eps
|
||||||
|
%%Creator: fig2dev Version 3.2 Patchlevel 0-beta2
|
||||||
|
%%CreationDate: Tue Feb 24 15:25:25 1998
|
||||||
|
%%For: julie@snow (Julie Rehmeyer,,,)
|
||||||
|
%%Orientation: Portrait
|
||||||
|
%%BoundingBox: 0 0 223 79
|
||||||
|
%%Pages: 0
|
||||||
|
%%BeginSetup
|
||||||
|
%%IncludeFeature: *PageSize Letter
|
||||||
|
%%EndSetup
|
||||||
|
%%Magnification: 0.70
|
||||||
|
%%EndComments
|
||||||
|
/$F2psDict 200 dict def
|
||||||
|
$F2psDict begin
|
||||||
|
$F2psDict /mtrx matrix put
|
||||||
|
/col-1 {0 setgray} bind def
|
||||||
|
/col0 {0.000 0.000 0.000 srgb} bind def
|
||||||
|
/col1 {0.000 0.000 1.000 srgb} bind def
|
||||||
|
/col2 {0.000 1.000 0.000 srgb} bind def
|
||||||
|
/col3 {0.000 1.000 1.000 srgb} bind def
|
||||||
|
/col4 {1.000 0.000 0.000 srgb} bind def
|
||||||
|
/col5 {1.000 0.000 1.000 srgb} bind def
|
||||||
|
/col6 {1.000 1.000 0.000 srgb} bind def
|
||||||
|
/col7 {1.000 1.000 1.000 srgb} bind def
|
||||||
|
/col8 {0.000 0.000 0.560 srgb} bind def
|
||||||
|
/col9 {0.000 0.000 0.690 srgb} bind def
|
||||||
|
/col10 {0.000 0.000 0.820 srgb} bind def
|
||||||
|
/col11 {0.530 0.810 1.000 srgb} bind def
|
||||||
|
/col12 {0.000 0.560 0.000 srgb} bind def
|
||||||
|
/col13 {0.000 0.690 0.000 srgb} bind def
|
||||||
|
/col14 {0.000 0.820 0.000 srgb} bind def
|
||||||
|
/col15 {0.000 0.560 0.560 srgb} bind def
|
||||||
|
/col16 {0.000 0.690 0.690 srgb} bind def
|
||||||
|
/col17 {0.000 0.820 0.820 srgb} bind def
|
||||||
|
/col18 {0.560 0.000 0.000 srgb} bind def
|
||||||
|
/col19 {0.690 0.000 0.000 srgb} bind def
|
||||||
|
/col20 {0.820 0.000 0.000 srgb} bind def
|
||||||
|
/col21 {0.560 0.000 0.560 srgb} bind def
|
||||||
|
/col22 {0.690 0.000 0.690 srgb} bind def
|
||||||
|
/col23 {0.820 0.000 0.820 srgb} bind def
|
||||||
|
/col24 {0.500 0.190 0.000 srgb} bind def
|
||||||
|
/col25 {0.630 0.250 0.000 srgb} bind def
|
||||||
|
/col26 {0.750 0.380 0.000 srgb} bind def
|
||||||
|
/col27 {1.000 0.500 0.500 srgb} bind def
|
||||||
|
/col28 {1.000 0.630 0.630 srgb} bind def
|
||||||
|
/col29 {1.000 0.750 0.750 srgb} bind def
|
||||||
|
/col30 {1.000 0.880 0.880 srgb} bind def
|
||||||
|
/col31 {1.000 0.840 0.000 srgb} bind def
|
||||||
|
|
||||||
|
end
|
||||||
|
save
|
||||||
|
-47.0 140.0 translate
|
||||||
|
1 -1 scale
|
||||||
|
|
||||||
|
/cp {closepath} bind def
|
||||||
|
/ef {eofill} bind def
|
||||||
|
/gr {grestore} bind def
|
||||||
|
/gs {gsave} bind def
|
||||||
|
/sa {save} bind def
|
||||||
|
/rs {restore} bind def
|
||||||
|
/l {lineto} bind def
|
||||||
|
/m {moveto} bind def
|
||||||
|
/rm {rmoveto} bind def
|
||||||
|
/n {newpath} bind def
|
||||||
|
/s {stroke} bind def
|
||||||
|
/sh {show} bind def
|
||||||
|
/slc {setlinecap} bind def
|
||||||
|
/slj {setlinejoin} bind def
|
||||||
|
/slw {setlinewidth} bind def
|
||||||
|
/srgb {setrgbcolor} bind def
|
||||||
|
/rot {rotate} bind def
|
||||||
|
/sc {scale} bind def
|
||||||
|
/sd {setdash} bind def
|
||||||
|
/ff {findfont} bind def
|
||||||
|
/sf {setfont} bind def
|
||||||
|
/scf {scalefont} bind def
|
||||||
|
/sw {stringwidth} bind def
|
||||||
|
/tr {translate} bind def
|
||||||
|
/tnt {dup dup currentrgbcolor
|
||||||
|
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||||
|
4 -2 roll dup 1 exch sub 3 -1 roll mul add
|
||||||
|
4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb}
|
||||||
|
bind def
|
||||||
|
/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul
|
||||||
|
4 -2 roll mul srgb} bind def
|
||||||
|
/DrawEllipse {
|
||||||
|
/endangle exch def
|
||||||
|
/startangle exch def
|
||||||
|
/yrad exch def
|
||||||
|
/xrad exch def
|
||||||
|
/y exch def
|
||||||
|
/x exch def
|
||||||
|
/savematrix mtrx currentmatrix def
|
||||||
|
x y tr xrad yrad sc 0 0 1 startangle endangle arc
|
||||||
|
closepath
|
||||||
|
savematrix setmatrix
|
||||||
|
} def
|
||||||
|
|
||||||
|
/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def
|
||||||
|
/$F2psEnd {$F2psEnteredState restore end} def
|
||||||
|
%%EndProlog
|
||||||
|
|
||||||
|
$F2psBegin
|
||||||
|
10 setmiterlimit
|
||||||
|
n 0 3367 m 0 0 l 6449 0 l 6449 3367 l cp clip
|
||||||
|
0.04200 0.04200 sc
|
||||||
|
7.500 slw
|
||||||
|
% Ellipse
|
||||||
|
n 1800 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 1500 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 2078 2378 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 2378 2978 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 1778 2978 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 1246 2994 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 3900 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 5700 1800 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 3900 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 3600 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 4200 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 5400 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 5400 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 6000 2400 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Ellipse
|
||||||
|
n 6000 3000 75 75 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
|
||||||
|
% Polyline
|
||||||
|
n 1800 1800 m 2100 2400 l gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 1800 1800 m 1500 2400 l 1800 3000 l gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 1500 2400 m 1200 3000 l gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 2100 2400 m 2400 3000 l gs 0.00 setgray ef gr gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 3900 1800 m 3900 2400 l 4200 3000 l gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 3900 2400 m 3600 3000 l gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 5700 1800 m 6000 2400 l 6000 3000 l gs col0 s gr
|
||||||
|
% Polyline
|
||||||
|
n 5700 1800 m 5400 2400 l 5400 3000 l gs col0 s gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
1725 1575 m
|
||||||
|
gs 1 -1 sc (24) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
1125 2400 m
|
||||||
|
gs 1 -1 sc (13) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
1200 3300 m
|
||||||
|
gs 1 -1 sc (5) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
3825 1575 m
|
||||||
|
gs 1 -1 sc (13) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
2325 2400 m
|
||||||
|
gs 1 -1 sc (4) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
1725 3300 m
|
||||||
|
gs 1 -1 sc (2) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
3525 3300 m
|
||||||
|
gs 1 -1 sc (81) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
4125 3300 m
|
||||||
|
gs 1 -1 sc (18) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
5625 1575 m
|
||||||
|
gs 1 -1 sc (4) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
5325 3300 m
|
||||||
|
gs 1 -1 sc (4) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
6000 3300 m
|
||||||
|
gs 1 -1 sc (5) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
6225 2475 m
|
||||||
|
gs 1 -1 sc (13) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
3600 2475 m
|
||||||
|
gs 1 -1 sc (5) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
5025 2475 m
|
||||||
|
gs 1 -1 sc (24) col0 sh gr
|
||||||
|
/Times-Roman ff 180.00 scf sf
|
||||||
|
2325 3300 m
|
||||||
|
gs 1 -1 sc (13) col0 sh gr
|
||||||
|
$F2psEnd
|
||||||
|
rs
|
BIN
11/wywolania/Data/UserDoc/user_doc.dvi
Executable file
Binary file not shown.
1416
11/wywolania/Data/UserDoc/user_doc.ps
Executable file
File diff suppressed because it is too large
583
11/wywolania/Data/UserDoc/user_doc.tex
Normal file
@ -0,0 +1,583 @@
|
|||||||
|
\documentclass{amsart}
|
||||||
|
\usepackage{graphicx}
|
||||||
|
\usepackage{array}
|
||||||
|
\usepackage{moreverb}
|
||||||
|
\title{DRAFT: User Documentation for the STIDE software package}
|
||||||
|
\author{Julie Rehmeyer}
|
||||||
|
\date{\today}
|
||||||
|
\begin{document}
|
||||||
|
\maketitle
|
||||||
|
|
||||||
|
|
||||||
|
\section{Software Purpose} \label{sec:intro}
|
||||||
|
STIDE stands for Sequence Time-Delay Embedding, and it implements the
|
||||||
|
time-delay embedding method of anomaly detection. Its primary
|
||||||
|
function is to accept as input a time series (or a set of time
|
||||||
|
series), divide it into a set of fixed-length sequences, compare that
|
||||||
|
set of sequences with an existing database of fixed length sequences,
|
||||||
|
and report on the consistency of the time series with the existing
|
||||||
|
database. It can also be used to create a database of fixed-length
|
||||||
|
sequences from scratch, or to add to a pre-existing database.
|
||||||
|
|
||||||
|
The STIDE software was originally developed by Steve Hofmeyr, a
|
||||||
|
graduate student in the Computer Science Department at the University
|
||||||
|
of New Mexico, as part of a research program that is applying ideas
|
||||||
|
from immunology to problems in computer security. In particular,
|
||||||
|
STIDE was written to assist in detecting intrusions by identifying the
|
||||||
|
unusual sequences of system calls that may be created during an
|
||||||
|
attempted intrusion \cite{lightweight, ci, principles, self}. In this
|
||||||
|
context, the time series being considered consists of the system calls
|
||||||
|
made by a single process. We first record the system calls made by a
|
||||||
|
process exhibiting normal behavior (i.e., in non-exploited situations),
|
||||||
|
and then use STIDE to divide that continuous stream of system calls
|
||||||
|
into sequences of a given length and store them in a database.
|
||||||
|
Subsequently, when we want to know if another instance of the same
|
||||||
|
program has been attacked, we record the system calls the process has
|
||||||
|
generated and use STIDE to compare the resulting sequences of system
|
||||||
|
calls with the database of normal sequences. A large number of
|
||||||
|
sequences created by the potentially attacked process that weren't
|
||||||
|
created by the uncompromised processes suggests that the process may
|
||||||
|
have been exploited.
|
||||||
|
|
||||||
|
In practice, because of limitations in available system call tracing
|
||||||
|
mechanisms, it is far easier for us to record simultaneously the system
|
||||||
|
calls generated by several processes that are running at the same
|
||||||
|
time. STIDE is designed to handle this sort of situation. It can
|
||||||
|
simultaneously process multiple interwoven time series by requiring
|
||||||
|
that each element in the input stream be preceded by an identifier to
|
||||||
|
specify which series it comes from. In our work, that identifier is
|
||||||
|
the process ID.
|
||||||
|
|
||||||
|
The simplest way that STIDE can analyze information about the
|
||||||
|
consistency of new data with an existing database is to report the
|
||||||
|
number of anomalous sequences, i.e., the number of sequences in the
|
||||||
|
input which do not exist in the database.
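
The counting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not STIDE's implementation: it assumes overlapping windows that advance one element at a time, stores the database as a plain set, and uses made-up system call numbers; the names `windows`, `build_database` and `count_anomalies` are hypothetical.

```python
def windows(trace, k):
    """Return every contiguous length-k sequence of the trace, in order."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_database(normal_traces, k):
    """Collect the set of length-k sequences seen in normal behavior."""
    return {w for trace in normal_traces for w in windows(trace, k)}


def count_anomalies(trace, database, k):
    """Count the sequences of the trace that do not appear in the database."""
    return sum(1 for w in windows(trace, k) if w not in database)


normal = [[24, 13, 5, 81, 24, 13, 5, 81]]  # made-up system call numbers
db = build_database(normal, k=3)
print(count_anomalies([24, 13, 5, 81, 4, 13, 5], db, k=3))
```

A trace shorter than the sequence length yields no windows and therefore no anomalies in this sketch.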
|
||||||
|
|
||||||
|
It can also report the minimum Hamming distance \cite{lightweight}.
|
||||||
|
Given a sequence from the data stream and a sequence from the
|
||||||
|
database, we can compute the number of entries that are different
|
||||||
|
between the two sequences and get the Hamming distance between those
|
||||||
|
two sequences. The minimum of the Hamming distances between the input
|
||||||
|
sequence and all of the sequences from the database is the minimum
|
||||||
|
Hamming distance for the input sequence.
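
As a small worked illustration of this definition, the following Python sketch computes the minimum Hamming distance of one input sequence against a toy database. The function names and the sample sequences are assumptions made for the example only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming(sequence, database):
    """Smallest Hamming distance from the sequence to any database entry."""
    return min(hamming(sequence, entry) for entry in database)


database = [(24, 13, 5), (13, 5, 81)]  # made-up normal sequences
print(min_hamming((24, 13, 81), database))
```

A sequence that is already in the database has a minimum distance of 0; with an empty database the built-in `min` raises `ValueError`, which a caller would need to handle.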
|
||||||
|
|
||||||
|
The final option is that it can report a ``locality frame count''
|
||||||
|
\cite{ci}. When a process is exploited, there may be a short period
|
||||||
|
of time (a locality) when the percentage of anomalous sequences is
|
||||||
|
much higher. Although ten anomalies over the course of a long
|
||||||
|
run may not be cause for concern, ten anomalies within thirty
|
||||||
|
sequences might be. Thus it can be useful to observe how many
|
||||||
|
anomalies there are {\it locally}. The number of sequences that are
|
||||||
|
considered to be ``local'' to one another is called the size of the
|
||||||
|
locality frame. In this mode, STIDE reports the largest number of
|
||||||
|
anomalies it finds within any locality frame.
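
The locality frame count can be sketched as a sliding window over per-sequence anomaly flags. This sketch assumes that a frame consists of the most recent {\tt lf\_size} sequences of one stream; how STIDE aligns its frames is not stated here, and the function name is hypothetical.

```python
from collections import deque


def max_locality_frame_count(anomaly_flags, lf_size):
    """Largest number of anomalies among any lf_size consecutive sequences."""
    frame = deque(maxlen=lf_size)  # holds the most recent lf_size flags
    best = 0
    for flag in anomaly_flags:
        frame.append(1 if flag else 0)
        best = max(best, sum(frame))
    return best


# 1 marks an anomalous sequence, 0 a sequence found in the database.
flags = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0]
peak = max_locality_frame_count(flags, lf_size=4)
```

With a frame size of 1 the result can never exceed 1, which matches the remark that a locality frame of size 1 effectively turns the feature off.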
|
||||||
|
|
||||||
|
An additional advantage of calculating locality frame counts is that
|
||||||
|
it provides an ``on-line'' measure. Ultimately, we are interested in a
|
||||||
|
system which would detect intrusions as the system is running.
|
||||||
|
Because locality frame counts are calculated locally, one can
|
||||||
|
immediately be notified when an intrusion may be occurring.
|
||||||
|
|
||||||
|
\section{Input Data Format} \label{sec:input}
|
||||||
|
The input data consists of the time series to be analyzed. It is read
|
||||||
|
from standard input. It is expected to be a series of pairs of
|
||||||
|
positive integers, one pair per line, where the first integer
|
||||||
|
identifies the data stream and the second integer is the element of
|
||||||
|
the data stream. The end of the data stream can either be designated
|
||||||
|
by the end of the file or by an occurrence of the number $-1$ as a
|
||||||
|
stream identifier. In our work, the stream identifier is the process
|
||||||
|
identification number (PID), and the elements of the data stream are
|
||||||
|
system call numbers.
|
||||||
|
|
||||||
|
The following is a small example of an input file, tracking three
|
||||||
|
processes, with PIDs 744, 1069 and 9.
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
\begin{tabular}{l l}
|
||||||
|
744 & 24 \\
|
||||||
|
744 & 13 \\
|
||||||
|
1069 & 4 \\
|
||||||
|
1069 & 24 \\
|
||||||
|
1069 & 4 \\
|
||||||
|
744 & 5 \\
|
||||||
|
9 & 24 \\
|
||||||
|
1069 & 13 \\
|
||||||
|
744 & 81 \\
|
||||||
|
9 & 13 \\
|
||||||
|
9 & 2 \\
|
||||||
|
1069 & 5 \\
|
||||||
|
1069 & 18 \\
|
||||||
|
-1
|
||||||
|
\end{tabular}
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
If the number $-1$ occurs as a data element, STIDE interprets that as
|
||||||
|
a missing data element. It does not form any sequences going through
|
||||||
|
that data element. It clears the sequence and starts from scratch.
|
||||||
|
|
||||||
|
For example, suppose that the sequence length is 3 and the input is as
|
||||||
|
follows:
|
||||||
|
\nopagebreak
|
||||||
|
\vspace{5pt}
|
||||||
|
\begin{tabular}{l l}
|
||||||
|
220 & 14 \\
|
||||||
|
220 & 185 \\
|
||||||
|
220 & 20 \\
|
||||||
|
220 & -1 \\
|
||||||
|
220 & 2 \\
|
||||||
|
220 & 20 \\
|
||||||
|
220 & 3 \\
|
||||||
|
220 & 2 \\
|
||||||
|
-1
|
||||||
|
\end{tabular}
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
STIDE would derive three sequences from this input: 14, 185, 20; 2,
|
||||||
|
20, 3; and 20, 3, 2.
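
The rules in this section can be sketched as a small reader that keeps one sliding window per stream. It is an illustration under stated assumptions rather than STIDE's implementation: lines are assumed to be whitespace-separated integers, blank lines are skipped, and the function name is hypothetical.

```python
def derive_sequences(lines, seq_len):
    """Yield (stream_id, sequence) for every full window in the input."""
    recent = {}  # stream id -> most recent elements of that stream
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        stream_id = int(fields[0])
        if stream_id == -1:
            break  # -1 as a stream identifier ends the input
        element = int(fields[1])
        if element == -1:
            recent[stream_id] = []  # missing element: start from scratch
            continue
        window = recent.setdefault(stream_id, [])
        window.append(element)
        if len(window) > seq_len:
            window.pop(0)  # keep only the last seq_len elements
        if len(window) == seq_len:
            yield stream_id, tuple(window)


sample = ["220 14", "220 185", "220 20", "220 -1",
          "220 2", "220 20", "220 3", "220 2", "-1"]
sequences = [seq for _, seq in derive_sequences(sample, seq_len=3)]
```

Applied to the example above, the sketch produces the same three sequences, and no sequence spans the missing element.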
|
||||||
|
|
||||||
|
\section{Configuration Options}
|
||||||
|
There are a number of options which affect STIDE's behavior. Every
|
||||||
|
option has a default value. The values may be changed through command
|
||||||
|
line arguments or through a configuration file. Values set by the
|
||||||
|
configuration file override default values and values set by the
|
||||||
|
command line override those set by either the configuration file or
|
||||||
|
the defaults. The following options are available:
|
||||||
|
|
||||||
|
\vspace{.2in}
|
||||||
|
\setlength{\extrarowheight}{3pt}
|
||||||
|
|
||||||
|
\begin{tabular}{l|l|l|l}
|
||||||
|
|
||||||
|
\vspace{-3pt}
|
||||||
|
Short &&& \\
|
||||||
|
Name & Long name & Legitimate Values & Default Value \\
|
||||||
|
\hline
|
||||||
|
|
||||||
|
{\tt a} & {\tt add\_to\_db} & on or off & off \\
|
||||||
|
{\tt c} & {\tt config\_name} & filenames & stide.config \\
|
||||||
|
{\tt d} & {\tt db\_name} & filenames & default.db \\
|
||||||
|
{\tt f} & {\tt lf\_size} & 1 -- 999 & 1 \\
|
||||||
|
{\tt g} & {\tt output\_graph} & on or off & off \\
|
||||||
|
{\tt l} & {\tt seq\_len} & 1 -- 199 & 6 \\
|
||||||
|
{\tt p} & {\tt pair\_offset} & integers & 0 \\
|
||||||
|
{\tt s} & {\tt write\_db\_stats} & on or off & off \\
|
||||||
|
{\tt v} & {\tt verbose} & on or off & off \\
|
||||||
|
{\tt V} & {\tt very\_verbose} & on or off & off \\
|
||||||
|
{\tt hd} & {\tt compute\_hdist} & on or off & off \\
|
||||||
|
{\tt me} & {\tt max\_elements} & 1 -- 999 & 500 \\
|
||||||
|
{\tt ms} & {\tt max\_streams} & 1 -- 999 & 100 \\
|
||||||
|
{\tt aof} & {\tt add\_output\_format} & see below & see below \\
|
||||||
|
{\tt cof} & {\tt compare\_output\_format} & see below & see below \\
|
||||||
|
|
||||||
|
\end{tabular}
|
||||||
|
|
||||||
|
\vspace{.2in}
|
||||||
|
|
||||||
|
\subsection{Descriptions of Options}
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt add\_to\_db} }
|
||||||
|
|
||||||
|
This flag indicates that you want the input data to be added to the
|
||||||
|
database. If there is no pre-existing database, it indicates that you
|
||||||
|
want to create a new database from the input data. Note that you
|
||||||
|
cannot simultaneously compare data and add it to the database. If
|
||||||
|
this switch is off, STIDE compares the input data with the database
|
||||||
|
without adding it.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt{config\_name}}}
|
||||||
|
This is the name of the configuration file to be used. See
|
||||||
|
Section~\ref{subsec:config} for more information about the
|
||||||
|
configuration file.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt db\_name}}
|
||||||
|
This is the name of an existing database or the name under which to
|
||||||
|
store a new database that will be created from the input data.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt lf\_size}}
|
||||||
|
This is the size of the locality frame (see Section~\ref{sec:intro}
|
||||||
|
for an explanation of locality frame count). The value 1 effectively
|
||||||
|
turns off locality frames.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt output\_graph}}
|
||||||
|
This causes STIDE to create a file {\tt db\_name.dot} containing a
|
||||||
|
graph of the entire database forest formatted as input for the program
|
||||||
|
Dot. Running Dot on the file translates it into PostScript format.
|
||||||
|
The result is a graphical image of the database.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt seq\_len}}
|
||||||
|
A database stores trees of sequences of a set length. When building a
|
||||||
|
new database, the length of the sequences to be stored is set with
|
||||||
|
{\tt seq\_len}. When adding to or comparing with an existing
|
||||||
|
database, one must use the same sequence length that was used when the
|
||||||
|
database was generated. In those situations, STIDE will automatically
|
||||||
|
figure out the correct sequence length and use it regardless of the
|
||||||
|
user specification or the default.\footnote{STIDE can do this for
|
||||||
|
revision 1 databases only. STIDE can still process old-style
|
||||||
|
databases, but cannot implement this feature. STIDE recognizes
|
||||||
|
revision 1 databases by their initial line: {\tt \#DBrev: 1 } and the
|
||||||
|
following line: {\tt \#DBseq\_len: } followed by an integer giving
|
||||||
|
the sequence length. When STIDE processes an old-style database, it
|
||||||
|
converts it to a revision 1 database if it is in {\tt add\_to\_db}
|
||||||
|
mode.}
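
The header check described in the footnote can be sketched as follows. Only the two header lines quoted there are taken from this document; the function name and the decision to return {\tt None} for anything else are assumptions.

```python
def read_db_seq_len(first_line, second_line):
    """Return the sequence length from a revision 1 header, else None."""
    if first_line.strip() != "#DBrev: 1":
        return None  # treated here as an old-style database
    prefix = "#DBseq_len:"
    if not second_line.startswith(prefix):
        return None
    return int(second_line[len(prefix):])


seq_len = read_db_seq_len("#DBrev: 1", "#DBseq_len: 10")
```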
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt pair\_offset}} \label{subsubsec:po}
|
||||||
|
In {\tt verbose} or {\tt very\_verbose} modes, STIDE reports on
|
||||||
|
particular sequences of interest (see Sections \ref{subsubsec:verbose}
|
||||||
|
and \ref{subsubsec:very-verbose}). One of the pieces of information
|
||||||
|
one might be interested in is where a particular sequence occurs in
|
||||||
|
the input. Recall that the input data is a stream of pairs (stream
|
||||||
|
number, element number), and each element in the sequence being
|
||||||
|
considered came from one of those input pairs. STIDE reports on where
|
||||||
|
the sequence occurred in the input by reporting the pair number of the
|
||||||
|
last element of the sequence.
|
||||||
|
|
||||||
|
These numbers may be offset by a fixed amount by setting {\tt
|
||||||
|
pair\_offset}.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt write\_db\_stats}}
|
||||||
|
This flag causes STIDE to print out statistics on the database. The
|
||||||
|
statistics it will print are the number of nodes in the database, the
|
||||||
|
number of unique sequences, the number of branches, and the average
|
||||||
|
database branch factor. See Section~\ref{sec:output} for more
|
||||||
|
information.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt verbose}} \label{subsubsec:verbose}
|
||||||
|
When adding to the database in {\tt verbose} mode, STIDE will print
|
||||||
|
information about each new sequence being added to the database, where
|
||||||
|
the precise information is specified by the {\tt add\_output\_format}
|
||||||
|
parameter (see Section~\ref{subsubsec:aof}). When comparing the input
|
||||||
|
data with an existing database in {\tt verbose} mode, it will print
|
||||||
|
information about each sequence that is itself a miss or whose
|
||||||
|
locality frame contains a miss, where the precise information is
|
||||||
|
specified by the {\tt compare\_output\_format} parameter (see
|
||||||
|
Section~\ref{subsubsec:cof}). In either case, when adding or
|
||||||
|
comparing, STIDE will first print out a header with a list of the names
|
||||||
|
of the variables being printed.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt very\_verbose}} \label{subsubsec:very-verbose}
|
||||||
|
In {\tt very\_verbose} mode, STIDE will print out the information specified
|
||||||
|
by {\tt add\_output\_format} or {\tt compare\_output\_format} for each sequence
|
||||||
|
encountered in the input data, regardless of whether the sequence is
|
||||||
|
new. As in {\tt verbose} mode, STIDE will first print out a header
|
||||||
|
with a list of the names of the variables being printed.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt compute\_hdist}}
|
||||||
|
This switch causes the Hamming distance \cite{lightweight} to be
|
||||||
|
computed (see Section~\ref{sec:intro} for an explanation of Hamming
|
||||||
|
distance).
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt max\_elements}}
|
||||||
|
This is the maximum number of unique data elements that STIDE might
|
||||||
|
encounter in the input data.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt max\_streams}}
|
||||||
|
This is the maximum number of different data streams that STIDE might
|
||||||
|
encounter in the input data.
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt add\_output\_format}} \label{subsubsec:aof}
|
||||||
|
|
||||||
|
When adding to the database in {\tt verbose} or {\tt very\_verbose}
|
||||||
|
modes, STIDE will print the {\tt add\_output\_format} string for every
|
||||||
|
sequence of interest (see Sections \ref{subsubsec:verbose} and
|
||||||
|
\ref{subsubsec:very-verbose}). Substitutions are made for control
|
||||||
|
characters as follows:
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
\begin{tabular}{c|l}
|
||||||
|
\vspace{-4pt}
|
||||||
|
Control \\ Char & Meaning \\ \hline
|
||||||
|
\%s & Stream Identification Number \\
|
||||||
|
\%d & Database Size \\
|
||||||
|
\vspace{-4pt}
|
||||||
|
\%p & Pair number of last data element of \\
|
||||||
|
& sequence in the whole input stream \\
|
||||||
|
\vspace{-4pt}
|
||||||
|
\%i & Pair number of last data element of \\
|
||||||
|
& sequence in its particular data stream \\
|
||||||
|
\verb+\+t & Tab \\
|
||||||
|
\verb+\+n & Newline \\
|
||||||
|
\end{tabular}
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
See Section \ref{subsubsec:po} for more information about the meaning
|
||||||
|
of the \%p and \%i control characters.
|
||||||
|
|
||||||
|
The default value of {\tt add\_output\_format} is:
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\verb+"DB Size: %d\tStream: %s\tPair Number: %p\n"+
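
The substitution can be sketched with a single regular-expression pass. This is an illustration only: the function and argument names are hypothetical, the values passed in are made up, and the surrounding double quotes are assumed to delimit the format string rather than to be printed.

```python
import re


def expand_add_format(fmt, db_size, stream_id, pair_number, stream_pair_number):
    """Substitute the add_output_format control characters in fmt."""
    values = {
        "%d": str(db_size),
        "%s": str(stream_id),
        "%p": str(pair_number),
        "%i": str(stream_pair_number),
        "\\t": "\t",  # backslash followed by t becomes a tab
        "\\n": "\n",  # backslash followed by n becomes a newline
    }
    # A single pass, so substituted text is never rescanned.
    return re.sub(r"%[dspi]|\\[tn]", lambda m: values[m.group(0)], fmt)


default_format = r"DB Size: %d\tStream: %s\tPair Number: %p\n"
line = expand_add_format(default_format, db_size=7, stream_id=744,
                         pair_number=6, stream_pair_number=3)
```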
|
||||||
|
|
||||||
|
\subsubsection{Option {\tt compare\_output\_format}} \label{subsubsec:cof}
|
||||||
|
When comparing data in {\tt verbose} mode, STIDE will print the
|
||||||
|
{\tt compare\_output\_format} string for every sequence which is
|
||||||
|
itself an anomaly or whose locality frame contains an anomaly. In
|
||||||
|
{\tt very\_verbose} mode, STIDE will print the string indicated for
|
||||||
|
{\it every} sequence, regardless of whether it is an anomaly.
|
||||||
|
Substitutions are made for control characters as follows:
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
\begin{tabular}{c|l}
|
||||||
|
\vspace{-4pt}
|
||||||
|
Control \\ Char & Meaning \\ \hline
|
||||||
|
\%s & Stream Identification Number \\
|
||||||
|
\vspace{-4pt}
|
||||||
|
\%p & Pair number of last data element of \\
|
||||||
|
& sequence in the whole input stream \\
|
||||||
|
\vspace{-4pt}
|
||||||
|
\%i & Pair number of last data element of \\
|
||||||
|
& sequence in its particular data stream \\
|
||||||
|
\%a & 1 if this sequence is an anomaly, 0 otherwise \\
|
||||||
|
\%c & locality frame count of this sequence \\
|
||||||
|
\%h & Hamming distance \\
|
||||||
|
\verb+\+t & Tab \\
|
||||||
|
\verb+\+n & Newline \\
|
||||||
|
\end{tabular}
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
See Section \ref{subsubsec:po} for more information about the meaning
|
||||||
|
of the \%p and \%i control characters.
|
||||||
|
|
||||||
|
The default value of {\tt compare\_output\_format} is:
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\verb+"Pair Number: %p\tStream Number: %s\n"+
|
||||||
|
|
||||||
|
\subsection{Command-Line Arguments}
|
||||||
|
All parameters may be set using the command line, in one of two ways.
|
||||||
|
The short name may be used, preceded by a hyphen and followed by a
|
||||||
|
value (if appropriate). The long name may also be used, but it must
|
||||||
|
be preceded by {\it two} hyphens and followed by a value (if
|
||||||
|
appropriate). Values set by the command line override those set in
|
||||||
|
any other way.
|
||||||
|
|
||||||
|
Switches which are simply turned on or off need not be followed by a
|
||||||
|
value. Parameters may be set in any order. There must be space
|
||||||
|
between the parameter name and the value. Flags may not be combined.
|
||||||
|
|
||||||
|
STIDE expects the input data to come from standard input.
|
||||||
|
|
||||||
|
\subsubsection{Examples}
|
||||||
|
|
||||||
|
To use STIDE to create a database called ``our\_data.db'' from the
|
||||||
|
input file ``input1.dat'' with sequences of length 10, using the
|
||||||
|
default configuration file name, in verbose mode, with output format
|
||||||
|
``\verb+%p\t%s\t%d\n+'', one could type:
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
stide -d our_data.db -a -l 10 -v -aof "%p\t%s\t%d\n" < input1.dat
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
To add the data from the file ``input2.dat'' to that database, using
|
||||||
|
the same configuration file, not in verbose mode, and to create a
|
||||||
|
graph in dot format, one could type:
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
stide -d our_data.db --output_graph --add_to_db -l 10 < input2.dat
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
Then to compare the data in file ``input3.dat'' to the database and
|
||||||
|
have the results reported using locality frame counts with locality
|
||||||
|
frame size 20, using the configuration file ``run3.config'', one would
|
||||||
|
type:
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
stide -d our_data.db -f 20 -l 10 -c run3.config < input3.dat
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\subsection{Configuration File} \label{subsec:config}
|
||||||
|
All parameters may be set using a configuration file. The first line
|
||||||
|
of a configuration file must be:\footnote{Old-style configuration
|
||||||
|
files lack this line. STIDE will assume that configuration files
|
||||||
|
that lack this line are old-style, and will try to parse them
|
||||||
|
accordingly, issuing a warning to the user.}
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
#ConfigFileRev: 1
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
After the first line, lines may be commented out using a ``\#'' sign.
|
||||||
|
Each parameter is set on its own line, using the long name followed by
|
||||||
|
a colon, followed by the value. Lines may be continued by putting a
|
||||||
|
backslash as the last character of the line. White space at the
|
||||||
|
beginning of lines will be ignored. Parameters which are simple
|
||||||
|
switches may be set with the value ``on'' or ``off'', or with no value
|
||||||
|
at all (which will turn them on).
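
The syntax rules above can be sketched as a small parser. It is not STIDE's parser and rests on several assumptions: a ``\#'' anywhere on a line starts a comment (so values may not contain one), quotes around a value are dropped, a switch may appear with or without its colon, and an optional space after the ``\#'' of the header line is tolerated. The function name is hypothetical.

```python
def parse_config(text):
    """Parse revision 1 configuration text into a dict of strings."""
    lines = text.splitlines()
    if not lines or lines[0].replace(" ", "") != "#ConfigFileRev:1":
        raise ValueError("missing #ConfigFileRev: 1 header")
    settings = {}
    pending = ""
    for raw in lines[1:]:
        line = pending + raw.strip()  # leading white space is ignored
        if line.endswith("\\"):
            pending = line[:-1]  # continued on the next line
            continue
        pending = ""
        line = line.split("#", 1)[0].strip()  # assumption: no '#' in values
        if not line:
            continue
        name, _, value = line.partition(":")
        value = value.strip().strip('"')
        settings[name.strip()] = value or "on"  # a bare switch means on
    return settings


example = "\n".join([
    "#ConfigFileRev: 1",
    "# illustrative values only",
    "db_name: our_data.db   # name of database",
    "lf_size: 20",
    "verbose",
])
settings = parse_config(example)
```

All values are kept as strings; range checks such as the limits on {\tt lf\_size} are left out of the sketch.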
|
||||||
|
|
||||||
|
Configuration file values override default values and are overridden
|
||||||
|
by command-line values.
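
This precedence order can be modeled with Python's {\tt collections.ChainMap}. The sketch is only a model of the rule, not of STIDE's code, and the three dictionaries hold hypothetical values.

```python
from collections import ChainMap

defaults = {"db_name": "default.db", "seq_len": "6", "lf_size": "1"}
config_file = {"seq_len": "10", "lf_size": "20"}  # hypothetical file contents
command_line = {"lf_size": "5"}                    # hypothetical arguments

# ChainMap searches its maps left to right, so the earliest map wins.
effective = ChainMap(command_line, config_file, defaults)
resolved = dict(effective)
```

In this model {\tt lf\_size} comes from the command line, {\tt seq\_len} from the configuration file, and {\tt db\_name} from the defaults.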
|
||||||
|
|
||||||
|
\subsubsection{Example}
|
||||||
|
|
||||||
|
The following is a sample configuration file:
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
\begin{boxedverbatim}
|
||||||
|
|
||||||
|
# ConfigFileRev: 1
|
||||||
|
# Sample STIDE configuration file containing default values.
|
||||||
|
|
||||||
|
db_name: default.db # name of database
|
||||||
|
seq_len: 6 # length of sequences
|
||||||
|
max_elements: 1000 # maximum number of unique elements
|
||||||
|
# in input
|
||||||
|
max_streams: 500 # maximum number of unique streams
|
||||||
|
# in input
|
||||||
|
pair_offset: 0 # offset for pair number count
|
||||||
|
add_output_format: \
|
||||||
|
"DB Size: %d\tStream: %s\tPair Number: %p\n"
|
||||||
|
compare_output_format: \
|
||||||
|
"Pair Number: %p\tStream Number: %s\n"
|
||||||
|
lf_size: 1 # 1 causes locality frame counts not
|
||||||
|
# to be computed
|
||||||
|
add_to_db: off # Add this data to the database, or,
|
||||||
|
# if there is no database, create a
|
||||||
|
# new one -- do not do comparisons
|
||||||
|
output_graph: off # Outputs graphing information in Dot
|
||||||
|
# format
|
||||||
|
compute_hdist: off # Compute Hamming distances
|
||||||
|
write_db_stats: off # At end, print out statistics about
|
||||||
|
# database
|
||||||
|
verbose: off # Verbose mode
|
||||||
|
very_verbose: off # Very verbose mode
|
||||||
|
|
||||||
|
\end{boxedverbatim}
|
||||||
|
|
||||||
|
\section{Output Data} \label{sec:output}
|
||||||
|
For every run, STIDE will first output the final configuration data
|
||||||
|
assembled from the defaults, the configuration file and the
|
||||||
|
command-line arguments, in a format which could be used as a
|
||||||
|
configuration file. The subsequent output depends on whether STIDE was
|
||||||
|
adding to the database or making comparisons.
|
||||||
|
|
||||||
|
\subsection{Output Data About Comparisons}
|
||||||
|
|
||||||
|
If you have run the program to compare sequences, at the end STIDE
|
||||||
|
will print out the number of different streams in the input, the total
|
||||||
|
number of pairs read from the input, the total number of sequences
|
||||||
|
read from the input, the number of sequences that were anomalous, and
|
||||||
|
the percentage of sequences that were anomalous. If locality frame
|
||||||
|
counts were being computed, STIDE reports the maximum locality frame
|
||||||
|
count encountered in any stream, and if Hamming distances were being
|
||||||
|
computed, STIDE reports the largest minimum Hamming distance of any
|
||||||
|
sequence in any stream.
|
||||||
|
|
||||||
|
If the {\tt verbose} switch was on and the {\tt
|
||||||
|
compare\_output\_format} parameter is set appropriately, STIDE will
|
||||||
|
print out information about each sequence which is either itself an
|
||||||
|
anomaly or whose locality frame contains an anomaly (if locality
|
||||||
|
frames are being computed). If the {\tt very\_verbose} switch was on
|
||||||
|
and the {\tt compare\_output\_format} parameter is set appropriately,
|
||||||
|
STIDE will print out information about each sequence, regardless of
|
||||||
|
whether it is an anomaly. The precise information to be output is
|
||||||
|
specified by the user in {\tt compare\_output\_format}. See Section
|
||||||
|
\ref{subsubsec:cof} for details on what information {\tt
|
||||||
|
compare\_output\_format} may request.
|
||||||
|
|
||||||
|
\subsection{Output Data About The Database}
|
||||||
|
|
||||||
|
If you are adding to the database, STIDE will not print out any
|
||||||
|
information automatically (beyond the configuration information).
|
||||||
|
However, one can get further information about the growth of the
|
||||||
|
database by turning on {\tt verbose} or {\tt very\_verbose} modes, and
|
||||||
|
one can get information about the shape and complexity of a database
|
||||||
|
using the {\tt write\_db\_stats} switch.
|
||||||
|
|
||||||
|
\subsubsection{Database Growth Information}
|
||||||
|
|
||||||
|
In {\tt verbose} mode, STIDE will print out information on each new
|
||||||
|
sequence which is added to the database. In {\tt very\_verbose} mode,
|
||||||
|
STIDE will print out information on each sequence read in, regardless
|
||||||
|
of whether it is new. The information that STIDE produces is
|
||||||
|
determined by the {\tt add\_output\_format} parameter. See Section
|
||||||
|
\ref{subsubsec:aof} for details on what information may be requested.
|
||||||
|
|
||||||
|
\subsubsection{Database Statistics}
|
||||||
|
|
||||||
|
The {\tt write\_db\_stats} switch causes STIDE to print out
|
||||||
|
information about the shape and complexity of the database. The {\tt
|
||||||
|
write\_db\_stats} switch may be used either when adding to the
|
||||||
|
database or when making comparisons.
|
||||||
|
|
||||||
|
The sequences are stored as forests (groups of trees). Each path down
|
||||||
|
each tree represents a sequence that STIDE has encountered. STIDE can
|
||||||
|
compute the number of nodes on the trees, the number of leaves (leaves
|
||||||
|
are the ends of the trees, i.e., the last element in a sequence), the
|
||||||
|
number of branches, and the average branch factor, which is the number
|
||||||
|
of branches divided by the difference between the number of nodes and
|
||||||
|
the number of sequences.
|
||||||
|
|
||||||
|
For example, consider the sequences derived from the first sample input file in
|
||||||
|
Section~\ref{sec:input}:
|
||||||
|
\nopagebreak
|
||||||
|
\vspace{5pt}
|
||||||
|
|
||||||
|
\begin{tabular}{c}
|
||||||
|
24, 13, 5 \\
|
||||||
|
13, 5, 81 \\
|
||||||
|
4, 24, 4 \\
|
||||||
|
24, 4, 13 \\
|
||||||
|
4, 13, 5 \\
|
||||||
|
13, 5, 18 \\
|
||||||
|
24, 13, 2 \\
|
||||||
|
\end{tabular}
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
We can represent those sequences by the forest:
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
\begin{picture}(350, 80)
|
||||||
|
\put(40,0){\includegraphics{graphic1.eps}}
|
||||||
|
\end{picture}
|
||||||
|
|
||||||
|
\vspace{.15in}
|
||||||
|
|
||||||
|
In this database, the number of nodes is 15, the number of leaves is
|
||||||
|
7, and the number of branches is 12. There are 7 unique sequences.
|
||||||
|
The average branch factor is $12 / (15 - 7) = 1.5$.
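
These figures can be reproduced with a short sketch that stores the sequences in nested dictionaries, one tree per distinct first element. It assumes that a ``branch'' is a parent-child link, which is consistent with the numbers above; the function name is hypothetical, and the layout is not meant to mirror STIDE's internal data structures.

```python
def forest_stats(sequences):
    """Return (nodes, leaves, branches, average branch factor)."""
    forest = {}  # element -> subtree; top-level keys are the tree roots
    for seq in sequences:
        level = forest
        for element in seq:
            level = level.setdefault(element, {})

    nodes = leaves = branches = 0
    stack = list(forest.values())
    while stack:
        children = stack.pop()
        nodes += 1
        branches += len(children)  # one branch per parent-child link
        if not children:
            leaves += 1
        stack.extend(children.values())

    unique = len(set(map(tuple, sequences)))
    # Guard against sequences of length 1, where nodes equals unique.
    factor = branches / (nodes - unique) if nodes > unique else 0.0
    return nodes, leaves, branches, factor


sequences = [(24, 13, 5), (13, 5, 81), (4, 24, 4), (24, 4, 13),
             (4, 13, 5), (13, 5, 18), (24, 13, 2)]
nodes, leaves, branches, factor = forest_stats(sequences)
```

Because sequences that share a prefix share nodes, the 7 sequences of 3 elements each need only 15 nodes rather than 21.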
|
||||||
|
|
||||||
|
\begin{thebibliography}{99}
|
||||||
|
|
||||||
|
\bibitem{lightweight} S. Hofmeyr, S. Forrest, and A. Somayaji,
|
||||||
|
``Lightweight intrusion detection for networked operating systems.''
|
||||||
|
Submitted to {\em Journal of Computer Security} (July, 1997).
|
||||||
|
|
||||||
|
\bibitem{ci} S. Forrest, S. Hofmeyr, and A. Somayaji, ``Computer
|
||||||
|
immunology'' {\em Communications of the ACM} Vol. 40, No. 10, pp.
|
||||||
|
88-96 (1997).
|
||||||
|
|
||||||
|
\bibitem{principles} A. Somayaji, S. Hofmeyr, and S. Forrest,
|
||||||
|
``Principles of a Computer Immune System.'' New Security Paradigms
|
||||||
|
Workshop (presented September, 1997).
|
||||||
|
|
||||||
|
\bibitem{self} S. Forrest, S.~A. Hofmeyr, A. Somayaji, and T.~A.
|
||||||
|
Longstaff, ``A sense of self for Unix processes.'' In Proceedings of
|
||||||
|
the 1996 IEEE Symposium on Computer Security and Privacy, IEEE
|
||||||
|
Computer Society Press, Los Alamitos, CA, pp. 120-128 (1996).
|
||||||
|
\end{thebibliography}
|
||||||
|
|
||||||
|
\end{document}
|
339
11/wywolania/Data/stide_v1.1/COPYING
Normal file
@ -0,0 +1,339 @@
|
|||||||
|
GNU GENERAL PUBLIC LICENSE
|
||||||
|
Version 2, June 1991
|
||||||
|
|
||||||
|
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||||||
|
675 Mass Ave, Cambridge, MA 02139, USA
|
||||||
|
Everyone is permitted to copy and distribute verbatim copies
|
||||||
|
of this license document, but changing it is not allowed.
|
||||||
|
|
||||||
|
Preamble
|
||||||
|
|
||||||
|
The licenses for most software are designed to take away your
|
||||||
|
freedom to share and change it. By contrast, the GNU General Public
|
||||||
|
License is intended to guarantee your freedom to share and change free
|
||||||
|
software--to make sure the software is free for all its users. This
|
||||||
|
General Public License applies to most of the Free Software
|
||||||
|
Foundation's software and to any other program whose authors commit to
|
||||||
|
using it. (Some other Free Software Foundation software is covered by
|
||||||
|
the GNU Library General Public License instead.) You can apply it to
|
||||||
|
your programs, too.
|
||||||
|
|
||||||
|
When we speak of free software, we are referring to freedom, not
|
||||||
|
price. Our General Public Licenses are designed to make sure that you
|
||||||
|
have the freedom to distribute copies of free software (and charge for
|
||||||
|
this service if you wish), that you receive source code or can get it
|
||||||
|
if you want it, that you can change the software or use pieces of it
|
||||||
|
in new free programs; and that you know you can do these things.
|
||||||
|
|
||||||
|
To protect your rights, we need to make restrictions that forbid
|
||||||
|
anyone to deny you these rights or to ask you to surrender the rights.
|
||||||
|
These restrictions translate to certain responsibilities for you if you
|
||||||
|
distribute copies of the software, or if you modify it.
|
||||||
|
|
||||||
|
For example, if you distribute copies of such a program, whether
|
||||||
|
gratis or for a fee, you must give the recipients all the rights that
|
||||||
|
you have. You must make sure that they, too, receive or can get the
|
||||||
|
source code. And you must show them these terms so they know their
|
||||||
|
rights.
|
||||||
|
|
||||||
|
We protect your rights with two steps: (1) copyright the software, and
|
||||||
|
(2) offer you this license which gives you legal permission to copy,
|
||||||
|
distribute and/or modify the software.
|
||||||
|
|
||||||
|
Also, for each author's protection and ours, we want to make certain
|
||||||
|
that everyone understands that there is no warranty for this free
|
||||||
|
software. If the software is modified by someone else and passed on, we
|
||||||
|
want its recipients to know that what they have is not the original, so
|
||||||
|
that any problems introduced by others will not reflect on the original
|
||||||
|
authors' reputations.
|
||||||
|
|
||||||
|
Finally, any free program is threatened constantly by software
|
||||||
|
patents. We wish to avoid the danger that redistributors of a free
|
||||||
|
program will individually obtain patent licenses, in effect making the
|
||||||
|
program proprietary. To prevent this, we have made it clear that any
|
||||||
|
patent must be licensed for everyone's free use or not licensed at all.
|
||||||
|
|
||||||
|
The precise terms and conditions for copying, distribution and
|
||||||
|
modification follow.
|
||||||
|
|
||||||
|
GNU GENERAL PUBLIC LICENSE
|
||||||
|
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
||||||
|
|
||||||
|
0. This License applies to any program or other work which contains
|
||||||
|
a notice placed by the copyright holder saying it may be distributed
|
||||||
|
under the terms of this General Public License. The "Program", below,
|
||||||
|
refers to any such program or work, and a "work based on the Program"
|
||||||
|
means either the Program or any derivative work under copyright law:
|
||||||
|
that is to say, a work containing the Program or a portion of it,
|
||||||
|
either verbatim or with modifications and/or translated into another
|
||||||
|
language. (Hereinafter, translation is included without limitation in
|
||||||
|
the term "modification".) Each licensee is addressed as "you".
|
||||||
|
|
||||||
|
Activities other than copying, distribution and modification are not
|
||||||
|
covered by this License; they are outside its scope. The act of
|
||||||
|
running the Program is not restricted, and the output from the Program
|
||||||
|
is covered only if its contents constitute a work based on the
|
||||||
|
Program (independent of having been made by running the Program).
|
||||||
|
Whether that is true depends on what the Program does.
|
||||||
|
|
||||||
|
1. You may copy and distribute verbatim copies of the Program's
|
||||||
|
source code as you receive it, in any medium, provided that you
|
||||||
|
conspicuously and appropriately publish on each copy an appropriate
|
||||||
|
copyright notice and disclaimer of warranty; keep intact all the
|
||||||
|
notices that refer to this License and to the absence of any warranty;
|
||||||
|
and give any other recipients of the Program a copy of this License
|
||||||
|
along with the Program.
|
||||||
|
|
||||||
|
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) 19yy <name of author>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) 19yy name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.
6
11/wywolania/Data/stide_v1.1/Makefile
Normal file
@ -0,0 +1,6 @@
all:
	(cd Seq-code; make; cp stide ..)

clean:
	@rm -f stide
	@(cd Seq-code; rm -f *.o stide)
11
11/wywolania/Data/stide_v1.1/README
Normal file
@ -0,0 +1,11 @@
STIDE version 1.1

Copyright (C) 1996, 1998 The Regents of the University
of New Mexico. All rights reserved.


This code was written for GCC version 2.7.2, but should compile correctly
under other more recent versions of GCC.

For usage information invoke stide with the --help option. More detailed
documentation can be found in the UserDoc directory.
339
11/wywolania/Data/stide_v1.1/Seq-code/COPYING
Executable file
@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) 19yy <name of author>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) 19yy name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the program
    `Gnomovision' (which makes passes at compilers) written by James Hacker.

    <signature of Ty Coon>, 1 April 1989
    Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.
23
11/wywolania/Data/stide_v1.1/Seq-code/Makefile
Executable file
@@ -0,0 +1,23 @@
STIDE_OBJECTS = stide.o seq_config.o seq_stream.o template.o flexitree.o

LIBES = -lm

#FLAGS = -O2
FLAGS = -g

stide : $(STIDE_OBJECTS)
	g++ -fno-implicit-templates $(FLAGS) $(STIDE_OBJECTS) $(LIBES) -o stide

template.o : template.cc ../Utils/arrays.h ../Utils/tll.h ../Utils/hash.h ../Utils/tll.cc ../Utils/arrays.cc ../Utils/hash.cc seq_stream.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c template.cc

stide.o : stide.cc ../Utils/arrays.h ../Utils/hash.h seq_stream.h seq_config.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c stide.cc
flexitree.o : flexitree.cc ../Utils/arrays.h flexitree.h
	g++ -fno-implicit-templates $(FLAGS) -c flexitree.cc

seq_config.o : seq_config.cc seq_config.h ../Utils/arrays.h
	g++ -fno-implicit-templates $(FLAGS) -c seq_config.cc

seq_stream.o : seq_stream.cc seq_stream.h seq_config.h flexitree.h ../Utils/arrays.h ../Utils/hash.h
	g++ -fno-implicit-templates $(FLAGS) -c seq_stream.cc
449
11/wywolania/Data/stide_v1.1/Seq-code/flexitree.cc
Executable file
@@ -0,0 +1,449 @@
// flexitree.cc
#include "flexitree.h"

extern int counter;

// data structures:
// node for a linked list
class FlexiTreeNode {
public:
  FlexiTree *tree;       // the element at this node
  FlexiTreeNode *next;   // pointer to the next node
  FlexiTreeNode(int root) {tree = new FlexiTree(root); next = NULL;}
};
//===========================================================================
FlexiTree::FlexiTree(void) {
  children = NULL;
  root = -1;
  id = counter;
  counter++;
}
//===========================================================================
FlexiTree::FlexiTree(int d) {
  children = NULL;
  root = d;
  id = counter;
  counter++;
}
//============================================================================
FlexiTree::~FlexiTree(void) {
  if (children) {
    FlexiTreeNode *temp_ptr = children->next, *next_temp_ptr;
    if (children->tree) delete children->tree;
    delete children;
    while (temp_ptr) {
      next_temp_ptr = temp_ptr->next;
      if (temp_ptr->tree) delete temp_ptr->tree;
      delete temp_ptr;
      temp_ptr = next_temp_ptr;
    }
  }
}
//============================================================================
int FlexiTree::NumNodes(void) {
  int size = 1;
  if (children) {
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      size += temp_ptr->tree->NumNodes();
      temp_ptr = temp_ptr->next;
    }
  }
  return size;
}
//============================================================================
int FlexiTree::NumLeaves(void) {
  int size;
  if (children) {
    size = 0;
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      size += temp_ptr->tree->NumLeaves();
      temp_ptr = temp_ptr->next;
    }
  } else size = 1;
  return size;
}
//============================================================================
int FlexiTree::NumBranches(void) {
  int branches = 0;
  if (children) {
    FlexiTreeNode *temp_ptr = children;
    while (temp_ptr) {
      branches += (temp_ptr->tree->NumBranches() + 1);
      temp_ptr = temp_ptr->next;
    }
  }
  return branches;
}
/**********************************************************************
 * InsertSeq()
 * Inserts a sequence in this tree and returns 1 if the sequence
 * begins with the root of this tree and the sequence isn't already
 * in this tree. It returns -1 if the sequence doesn't begin with
 * the root of this tree. It returns 0 if the sequence was already
 * in this tree. This function is recursive and only compares the
 * portion of the sequence lying between the argument first and the
 * argument last.
 *
 *
 * Input:  const Array<int> &seq   Current sequence
 *         int first               Index of the first element of the
 *                                 sequence to consider
 *         int last                Index of the last element of the
 *                                 sequence (its length minus one)
 *********************************************************************/

int FlexiTree::InsertSeq(const Array<int> &seq, int first, int last)
{
  // If the root of this tree isn't the same as the first element of
  // the sequence, return -1 to indicate that
  if (root != seq[first]) {
    return -1;
  }

  first++; // shift the seq forward
  // If we have reached the end of the sequence now, we haven't added
  // anything to the tree, so we return 0 to indicate that it was
  // already there
  if (first > last) {
    return 0;
  }

  // If there are no children, create some with the correct root,
  // insert the sequence and return 1.
  if (!children) {
    children = new FlexiTreeNode(seq[first]);
    children->tree->InsertSeq(seq, first, last);
    return 1;
  }

  // The root agrees, we're not at the end, and there are children.
  // Now we want to know if the sequence is already in the children,
  // and if not, we want to find out and add it.
  FlexiTreeNode *temp_ptr = children;
  int flag;
  while (1) {
    flag = temp_ptr->tree->InsertSeq(seq, first, last);
    // If the sequence is new and gets added, return 1
    if (flag == 1) return 1;
    // If the sequence is old, return 0
    if (flag == 0) return 0;
    // Otherwise the new root of the sequence isn't the same as the
    // root of this child tree, so we will try the next one. But
    // first, if this is the last child, we know it isn't in here, so
    // we will add it in and return 1
    if (temp_ptr->next == NULL) {
      temp_ptr->next = new FlexiTreeNode(seq[first]);
      temp_ptr->next->tree->InsertSeq(seq, first, last);
      return 1;
    }
    temp_ptr = temp_ptr->next;
  }
}

/*********************************************************************
 * IsSeqInTree()
 * Returns 1 if the sequence has a match within this tree and
 * returns 0 otherwise. This function is recursive and only
 * compares the portion of the sequence lying between the argument
 * first and the argument last.
 *
 *
 * Input:  const Array<int> &seq   Current sequence
 *         int first               Index of the first element of the
 *                                 sequence to consider
 *         int last                Index of the last element of the
 *                                 sequence (its length minus one)
 ********************************************************************/

int FlexiTree::IsSeqInTree(const Array<int> &seq, int first, int last)
{
  // If the first element of the sequence isn't the same as the root
  // of this tree, then we know already that there isn't a match here,
  // so return 0.
  if (root != seq[first]) {
    return 0;
  }
  first++; // shift the seq forward

  // If we have reached the end of the sequence, then we have
  // found matches all the way along, so return 1 saying that this is
  // a match.
  if (first > last) {
    return 1;
  }

  // Now we want to find out if there is a match in any of the
  // subtrees below this tree. The subtrees are contained in the
  // linked list children->next->next->...
  FlexiTreeNode *next_node = children;
  while (next_node != NULL) {
    if (next_node->tree->IsSeqInTree(seq, first, last)) {
      return 1; //Found it!
    }
    next_node = next_node->next;
  }
  // Now we've been through all of the subtrees without finding a
  // match, so there aren't any matches.
  return 0;
}
/*********************************************************************
 * ComputeHDistForTree()
 * Reports the minimum number of mismatches with any sequence on
 * this tree. This is a highly compute-intensive method, because
 * every path down the tree is followed. This function is
 * recursive, and only compares the portion of the sequence lying
 * between the argument first and the argument last.
 *
 *
 * Input:  Array<int> &seq   Current sequence
 *         int first         Index of the first element of the
 *                           sequence to consider
 *         int last          Index of the last element of the
 *                           sequence (its length minus one)
 ********************************************************************/

int FlexiTree::ComputeHDistForTree(Array<int> &seq, int first, int
                                   last)
{

  int tot_misses = 0;

  // If the first element of the sequence isn't the same as the root
  // of this tree, then every sequence on this tree will disagree with
  // the sequence here, so we increment tot_misses
  if (root != seq[first]) {
    tot_misses++;
  }

  first++; // shift the seq forward
  if (first > last) { // reached the end of the seq
    return tot_misses; // the mismatch count at this node (0 or 1)
  }

  // Now we want to add to tot_misses the smallest number of
  // mismatches with any of this tree's subtrees. This tree's
  // subtrees are in the linked list children->next->next->
  FlexiTreeNode *next_node = children;
  // last is the index of the last element of the sequence, which is
  // one less than the number of elements in the sequence. The most
  // misses possible is the number of elements in the sequence.
  int min_misses = last + 1;
  int misses;
  while (next_node != NULL) {
    misses = next_node->tree->ComputeHDistForTree(seq, first, last);
    if (misses < min_misses) {
      min_misses = misses;
    }
    next_node = next_node->next;
  }
  return (tot_misses + min_misses);
}
//===========================================================================
// Format for writing out: the tree is written depth-first; each path is
// terminated by a negative number, which is -(the required backtrack length)-1.
// depth should start out as 0. The written tree ends with -1.
void FlexiTree::Write(ostream &s, int &depth) {
  s<<root<<" ";
  FlexiTreeNode *temp_ptr = children;
  while (temp_ptr) {
    depth = 0;
    temp_ptr->tree->Write(s, depth);
    temp_ptr = temp_ptr->next;
    if (temp_ptr) s<<"-"<<(depth + 1)<<" ";
  }
  depth++; // now incr the count
}
//=============================================================================
ostream &operator<<(ostream &s, FlexiTree &tree) {
  int depth = 0;
  tree.Write(s, depth);
  s<<" -1"; // we terminate with a -1
  return s;
}
//===========================================================================
// returns 0 if we have reached the end of the file, 1 otherwise
int FlexiTree::Read(istream &s, int &depth) {
  int next_num;
  if (s.eof()) return 0;
  s>>next_num;

  if (next_num == -1) return 0; // we have reached the end of the tree

  if (next_num >= 0) {
    children = new FlexiTreeNode(next_num);
    if (!children->tree->Read(s, depth)) return 0;
    FlexiTreeNode *temp_ptr = children;
    while (depth == 0) {
      if (s.eof()) return 0;
      s>>next_num;
      if (next_num == -1) return 0; // we have reached the end of the tree
      temp_ptr->next = new FlexiTreeNode(next_num);
      temp_ptr = temp_ptr->next;
      if (!temp_ptr->tree->Read(s, depth)) return 0;
    }
  } else depth = (-1 * next_num) - 1;
  if (depth) depth--;
  return 1;
}
//=============================================================================
istream &operator>>(istream &s, FlexiTree &tree) {
  int next_num, depth = 0;
  s>>next_num;
  tree.SetRoot(next_num);
  tree.Read(s, depth);
  return s;
}
//===========================================================================
// writes out in the format that dot uses for dags
int FlexiTree::OutputGraph(ostream &s) {
  // first write out the name of the tree
  s<<" "<<id<<" [label=\""<<root<<"\",shape=plaintext];"<<endl;

  FlexiTreeNode *temp_ptr = children;
  int childid;
  while (temp_ptr) {
    childid = temp_ptr->tree->OutputGraph(s);
    s<<" "<<id<<" -> "<<childid<<";"<<endl;
    temp_ptr = temp_ptr->next;
  }
  return id;
}

/*********************************************************************
 * IsSeqInForest()
 * Searches through database forest to locate sequence. Returns 1
 * if it finds it, 0 otherwise
 *********************************************************************/

// Explicit int return type added to match the declaration in flexitree.h.
int SeqForest::IsSeqInForest(const Array<int> &seq, int seq_len) const
{
  // Have we ever seen a sequence starting with the same root?
  if (trees_found[seq[0]]) {
    // Have we seen this precise sequence?
    return trees[seq[0]].IsSeqInTree(seq, 0, seq_len-1);
  }
  return 0;
}

/*
#include "fstream.h"

// for test purposes
void main(void) {
  FlexiTree tree(1);
  Array<int> seq(10);

  // try out insert and write
  seq[0] = 1; seq[1] = 1; seq[2] = 2; seq[3] = 3;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1123:"<<tree<<endl;
  seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1134:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 3;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1223:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 3;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1233:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1234:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1234:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
  tree.SeqInsert(seq, 0, 3);
  cout<<"1214:"<<tree<<endl;

  // now try out search
  seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
  if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1214"<<endl;
  else cout<<"could not find 1214"<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 4;
  if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1224"<<endl;
  else cout<<"could not find 1224"<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 4; seq[3] = 4;
  if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1244"<<endl;
  else cout<<"could not find 1244"<<endl;
  seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
  if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1134"<<endl;
  else cout<<"could not find 1134"<<endl;

  // try out insert and write with shorter and longer sequences
  seq[0] = 1; seq[1] = 3;
  tree.SeqInsert(seq, 0, 1);
  cout<<"13:"<<tree<<endl;
  seq[0] = 1; seq[1] = 1; seq[2] = 4;
  tree.SeqInsert(seq, 0, 2);
  cout<<"114:"<<tree<<endl;
  seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
  tree.SeqInsert(seq, 0, 7);
  cout<<"12311214:"<<tree<<endl;

  if (tree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
  else cout<<"could not find 12311214"<<endl;
  seq[0] = 1; seq[1] = 1; seq[2] = 5;
  if (tree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
  else cout<<"could not find 115"<<endl;
  if (tree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
  else cout<<"could not find 11"<<endl;

  ofstream outf("test.out");
  outf<<tree;
  outf.close();

  //counter = 0;

  FlexiTree intree;
  ifstream inf("test.out");
  inf>>intree;
  inf.close();

  cout<<endl<<intree;

  seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
  if (intree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
  else cout<<"could not find 12311214"<<endl;
  seq[0] = 1; seq[1] = 1; seq[2] = 5;
  if (intree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
  else cout<<"could not find 115"<<endl;
  if (intree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
  else cout<<"could not find 11"<<endl;

}
*/

/*
int FlexiTree::Read(istream &s, int &depth) {
  int next_num, depth_decr = 0;
  s>>next_num;
  if (next_num == -1) return 0; // we have reached the end of the tree
  if (next_num >= 0) {
    children = new FlexiTreeNode(next_num);
    if (!children->tree->Read(s, depth)) return 0;
    if (depth) {
      depth--;
      depth_decr = 1;
    }
    FlexiTreeNode *temp_ptr = children;
    while (depth == 0) {
      depth_decr = 0;
      s>>next_num;
      if (next_num == -1) return 0; // we have reached the end of the tree
      temp_ptr->next = new FlexiTreeNode(next_num);
      temp_ptr = temp_ptr->next;
      if (!temp_ptr->tree->Read(s, depth)) return 0;
      if (depth) {
        depth--;
        depth_decr = 1;
      }
    }
    if (!depth_decr && depth) depth--;
  } else
    depth = (-1 * next_num) - 1;
  return 1;
}
*/

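The insert, lookup, and Hamming-distance routines in flexitree.cc all walk a prefix tree of integer sequences. The idea can be sketched in a few lines of modern C++. This is only an illustration under stated assumptions, not the STIDE implementation: `SeqTrie`, `insert`, `contains`, and `min_hamming` are hypothetical names, the root is an empty sentinel (playing the role of `SeqForest`), and children live in a `std::map` rather than in the linked list of `FlexiTreeNode` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for FlexiTree/SeqForest (not the STIDE classes):
// every node keeps its children in a map keyed by element value.
struct SeqTrie {
  std::map<int, std::unique_ptr<SeqTrie> > children;

  // Stores seq[pos..end). Returns true if at least one node was created,
  // false if the whole sequence was already present.
  bool insert(const std::vector<int> &seq, std::size_t pos = 0) {
    if (pos == seq.size()) return false;
    std::unique_ptr<SeqTrie> &child = children[seq[pos]];
    bool created = !child;
    if (created) child.reset(new SeqTrie());
    bool deeper = child->insert(seq, pos + 1);
    return created || deeper;
  }

  // True if seq[pos..end) follows a stored path. A query that is a prefix
  // of a stored sequence also counts, as in IsSeqInTree().
  bool contains(const std::vector<int> &seq, std::size_t pos = 0) const {
    if (pos == seq.size()) return true;
    std::map<int, std::unique_ptr<SeqTrie> >::const_iterator it =
        children.find(seq[pos]);
    return it != children.end() && it->second->contains(seq, pos + 1);
  }

  // Minimum number of mismatching positions against any stored path.
  // Every branch is followed, so the cost grows with the size of the tree.
  int min_hamming(const std::vector<int> &seq, std::size_t pos = 0) const {
    if (pos == seq.size()) return 0;
    // Worst case: every remaining element mismatches.
    int best = static_cast<int>(seq.size() - pos);
    for (const auto &entry : children) {
      int miss = (entry.first != seq[pos]) ? 1 : 0;
      int total = miss + entry.second->min_hamming(seq, pos + 1);
      if (total < best) best = total;
    }
    return best;
  }
};
```

With the sequences 1 1 2 3 and 1 2 3 4 stored, a query for 1 2 2 4 is not found by `contains`, while `min_hamming` reports a distance of 1, since only the third element differs from 1 2 3 4.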
59
11/wywolania/Data/stide_v1.1/Seq-code/flexitree.h
Executable file
@@ -0,0 +1,59 @@
#ifndef __FLEXITREE_H
#define __FLEXITREE_H

#include "../Utils/arrays.h"

class FlexiTreeNode;
class FlexiTree {
private:
  FlexiTreeNode *children;
  int root;
  int id;
public:
  void Write(ostream &s, int &depth);
  int Read(istream &s, int &depth);
  int OutputGraph(ostream &s);
  FlexiTree();
  FlexiTree(int d);
  ~FlexiTree();
  void SetRoot(int d) {root = d;}
  int InsertSeq(const Array<int> &seq, int first, int last);
  int IsSeqInTree(const Array<int> &seq, int first, int last);
  int ComputeHDistForTree(Array<int> &seq, int first, int last);
  friend ostream &operator<<(ostream &s, FlexiTree &tn);
  friend istream &operator>>(istream &s, FlexiTree &tn);
  int NumNodes();    // returns the number of nodes in the tree
  int NumLeaves();   // returns the number of leaves in the tree, i.e. the number of distinct sequences
  int NumBranches(); // returns the total number of branches, over all nodes
};

//===========================================================================
class SeqForest {
public:
  // this structure is an array of N tree nodes, i.e. a tree for each value
  // type
  Array<FlexiTree> trees;
  // this structure records which types of values actually occurred -
  // for efficiency, in case there were actually fewer value types than
  // specified in the config
  Array<int> trees_found;
  SeqForest(int max_trees)
    {trees.Allocate(max_trees); trees_found.Allocate(max_trees); trees_found.Set(0);}
  int IsSeqInForest(const Array<int> &seq, int seq_len) const;
};

//===========================================================================

#endif
34
11/wywolania/Data/stide_v1.1/Seq-code/opt_info.h
Executable file
@@ -0,0 +1,34 @@
#ifndef __OPT_INFO_H
#define __OPT_INFO_H

#include <string>
#include "../Utils/arrays.h"

#define NUM_OPTS 16
#define SHORT_NAME 0
#define LONG_NAME 1

class OptInfo {
public:
  string long_name;    // Long name of this option; used in
                       // configuration file and with the -- marker
                       // on the command line
  string short_name;   // Short name of this option; used with the -
                       // marker on the command line
  int set;             // Flag indicating if this option has already
                       // been set
  char type;           // type of value: legitimate values are f
                       // (flag, i.e., boolean), i (int), s (string)
                       // or h (help)
  union {              // pointer to actual value to be set
    int *flag_val;     // value if type = 'f'
    int *int_val;      // value if type = 'i'
    string *str_val;   // value if type = 's'
  };

  OptInfo() {};
};

#endif

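`OptInfo` pairs a type tag with a pointer to the variable an option should fill. How such a record might be applied to a textual value can be sketched as follows. The `Option` struct and `apply_option` function are hypothetical simplifications (separate pointers instead of the anonymous union), and the `on`/`off` spelling for flags is an assumption taken from sample.config rather than from the STIDE parser.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical simplified mirror of OptInfo: a type tag plus a pointer to
// the variable that receives the parsed value.
struct Option {
  char type;                // 'f' flag, 'i' int, 's' string
  bool set;                 // becomes true once a value has been applied
  int *int_target;          // used when type is 'f' or 'i'
  std::string *str_target;  // used when type is 's'
};

// Converts text according to opt.type and stores it through the pointer.
// Returns false for an unknown type tag.
bool apply_option(Option &opt, const std::string &text) {
  switch (opt.type) {
    case 'f': *opt.int_target = (text == "on") ? 1 : 0; break;  // assumed on/off syntax
    case 'i': *opt.int_target = std::atoi(text.c_str()); break;
    case 's': *opt.str_target = text; break;
    default: return false;
  }
  opt.set = true;
  return true;
}
```

Keeping a pointer in the record lets one table of options drive both the command-line and the config-file readers, which is presumably why `InitOptArray()` stores addresses such as `&seq_len`.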
54
11/wywolania/Data/stide_v1.1/Seq-code/sample.config
Executable file
@@ -0,0 +1,54 @@
#ConfigFileRev: 1
#Sample STIDE configuration file containing default values.

db_name:        default.db      # name of database
seq_len:        6               # length of sequences
max_elements:   500             # maximum number of unique elements in input
max_streams:    100             # maximum number of unique streams in input
pair_offset:    0               # offset for pair number count
add_output_format: \
  "DB Size: %d\tStream: %s\tPair Number: %p\n"
                                # In verbose mode, STIDE will print
                                # this information for every new
                                # sequence added to the database. In
                                # very verbose mode, STIDE will print
                                # this information for every sequence
                                # considered. Possible data:
                                # %d Database Size
                                # %i Pair number of last data element of
                                #    sequence in its particular
                                #    data stream
                                # %p Pair number of last data element of
                                #    sequence in the whole input
                                #    stream
                                # %s Stream Number

compare_output_format: \
  "Pair Number: %p\tStream Number: %s\n"
                                # In verbose mode, STIDE will print
                                # this information for every sequence
                                # which is itself an anomaly or whose
                                # locality frame contains an anomaly.
                                # In very verbose mode, STIDE will
                                # print this information for every
                                # sequence. Possible data:
                                # %a 1 if this sequence is an anomaly, 0
                                #    otherwise
                                # %c locality frame count of this sequence
                                # %h Hamming distance
                                # %i Pair number of last data element of
                                #    its particular data stream
                                # %p Pair number of last data element of
                                #    the entire input
                                # %s Stream Number
lf_size:        1               # 1 causes locality frame counts not
                                # to be computed
add_to_db:      off             # Add this data to the database, or, if there
                                # is no database, create a new one -- do not
                                # do comparisons
output_graph:   off             # Outputs graphing information in Dot
                                # format
compute_hdist:  off             # Compute Hamming distances
write_db_stats: off             # At end, print out statistics about database
verbose:        off             # See add_output_format and compare_output_format
very_verbose:   off             # See add_output_format and compare_output_format
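The simple entries in sample.config follow a `key: value # comment` pattern. A minimal sketch of splitting one such line is given below. It is not the parser in seq_config.cc: `trim` and `parse_config_line` are hypothetical helpers, backslash continuation lines (as used by `add_output_format`) are not handled, and any line starting with `#` is treated as a comment, which may differ from how STIDE reads `#ConfigFileRev`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string trim(const std::string &s) {
  const char *ws = " \t\r\n";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Splits "key: value  # comment" into key and value.
// Returns false for blank lines, comment-only lines, and lines without ':'.
bool parse_config_line(const std::string &line,
                       std::string &key, std::string &value) {
  // Drop everything from the first '#' on (assumes values contain no '#').
  std::string body = line.substr(0, line.find('#'));
  std::size_t colon = body.find(':');
  if (colon == std::string::npos) return false;
  key = trim(body.substr(0, colon));
  value = trim(body.substr(colon + 1));
  return !key.empty();
}
```

For the line `seq_len: 6 # length of sequences` this yields the key `seq_len` and the value `6`; the value would still need converting according to the option's type.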
797
11/wywolania/Data/stide_v1.1/Seq-code/seq_config.cc
Executable file
@@ -0,0 +1,797 @@
#include <stdlib.h>
#include <stdio.h>
#include <fstream.h>
#include <string>
#include "seq_config.h"
#include "opt_info.h"

#define LF_LIM 999
#define SEQ_LEN_LIM 199
#define MAX_ELEM_LIM 999
#define MAX_STREAMS_LIM 9999

/**********************************************************************
 * Config()
 * Reads in configuration information from configuration file, from
 * the command line, and from preset defaults.
 *
 * Input:  int argc:       Number of arguments on command line
 *         char *argv[]:   Array of strings of actual arguments
 *
 * Output: Nothing
 *********************************************************************/

Config::Config(const int argc, const char *argv[])
{
  Array<OptInfo> opt_array;
  InitOptArray(opt_array);

  SetDefaults();

  ReadCommandLine(argc, argv, opt_array);

  ReadConfigFile(opt_array);

  CheckValues();

  InitOutputFormat();

  OuputConfigInfo(opt_array);

}

/*********************************************************************
|
||||||
|
* InitOptArray() *
|
||||||
|
* Sets the values of opt_array so that opt_array contains all the *
|
||||||
|
* information needed about the parameters being set by the config *
|
||||||
|
* file and the command-line arguments. *
|
||||||
|
* *
|
||||||
|
* Input: Array<OptInfo> &opt_array: Array of information about *
|
||||||
|
* options for the program *
|
||||||
|
* *
|
||||||
|
* Output: Nothing *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void Config::InitOptArray(Array<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
opt_array.Allocate(NUM_OPTS);
|
||||||
|
|
||||||
|
opt_array[0].long_name = "db_name";
|
||||||
|
opt_array[0].short_name = "d";
|
||||||
|
opt_array[0].set = 0;
|
||||||
|
opt_array[0].type = 's';
|
||||||
|
opt_array[0].str_val = &db_name;
|
||||||
|
|
||||||
|
opt_array[1].long_name = "seq_len";
|
||||||
|
opt_array[1].short_name = "l";
|
||||||
|
opt_array[1].set = 0;
|
||||||
|
opt_array[1].type = 'i';
|
||||||
|
opt_array[1].int_val = &seq_len;
|
||||||
|
|
||||||
|
opt_array[2].long_name = "max_elements";
|
||||||
|
opt_array[2].short_name = "me";
|
||||||
|
opt_array[2].set = 0;
|
||||||
|
opt_array[2].type = 'i';
|
||||||
|
opt_array[2].int_val = &max_elements;
|
||||||
|
|
||||||
|
opt_array[3].long_name = "max_streams";
|
||||||
|
opt_array[3].short_name = "ms";
|
||||||
|
opt_array[3].set = 0;
|
||||||
|
opt_array[3].type = 'i';
|
||||||
|
opt_array[3].int_val = &max_streams;
|
||||||
|
|
||||||
|
opt_array[4].long_name = "cfg_name";
|
||||||
|
opt_array[4].short_name = "c";
|
||||||
|
opt_array[4].set = 0;
|
||||||
|
opt_array[4].type = 's';
|
||||||
|
opt_array[4].str_val = &cfg_name;
|
||||||
|
|
||||||
|
opt_array[5].long_name = "pair_offset";
|
||||||
|
opt_array[5].short_name = "p";
|
||||||
|
opt_array[5].set = 0;
|
||||||
|
opt_array[5].type = 'i';
|
||||||
|
opt_array[5].int_val = &pair_offset;
|
||||||
|
|
||||||
|
opt_array[6].long_name = "add_output_format";
|
||||||
|
opt_array[6].short_name = "aof";
|
||||||
|
opt_array[6].set = 0;
|
||||||
|
opt_array[6].type = 's';
|
||||||
|
opt_array[6].str_val = &add_output_format;
|
||||||
|
|
||||||
|
opt_array[7].long_name = "compare_output_format";
|
||||||
|
opt_array[7].short_name = "cof";
|
||||||
|
opt_array[7].set = 0;
|
||||||
|
opt_array[7].type = 's';
|
||||||
|
opt_array[7].str_val = &compare_output_format;
|
||||||
|
|
||||||
|
opt_array[8].long_name = "add_to_db";
|
||||||
|
opt_array[8].short_name = "a";
|
||||||
|
opt_array[8].set = 0;
|
||||||
|
opt_array[8].type = 'f';
|
||||||
|
opt_array[8].int_val = &add_to_db;
|
||||||
|
|
||||||
|
opt_array[9].long_name = "output_graph";
|
||||||
|
opt_array[9].short_name = "g";
|
||||||
|
opt_array[9].set = 0;
|
||||||
|
opt_array[9].type = 'f';
|
||||||
|
opt_array[9].int_val = &output_graph;
|
||||||
|
|
||||||
|
opt_array[10].long_name = "compute_hdist";
|
||||||
|
opt_array[10].short_name = "hd";
|
||||||
|
opt_array[10].set = 0;
|
||||||
|
opt_array[10].type = 'f';
|
||||||
|
opt_array[10].int_val = &compute_hdist;
|
||||||
|
|
||||||
|
opt_array[11].long_name = "lf_size";
|
||||||
|
opt_array[11].short_name = "lf";
|
||||||
|
opt_array[11].set = 0;
|
||||||
|
opt_array[11].type = 'i';
|
||||||
|
opt_array[11].int_val = &lf_size;
|
||||||
|
|
||||||
|
opt_array[12].long_name = "write_db_stats";
|
||||||
|
opt_array[12].short_name = "s";
|
||||||
|
opt_array[12].set = 0;
|
||||||
|
opt_array[12].type = 'f';
|
||||||
|
opt_array[12].int_val = &write_db_stats;
|
||||||
|
|
||||||
|
opt_array[13].long_name = "verbose";
|
||||||
|
opt_array[13].short_name = "v";
|
||||||
|
opt_array[13].set = 0;
|
||||||
|
opt_array[13].type = 'f';
|
||||||
|
opt_array[13].int_val = &verbose;
|
||||||
|
|
||||||
|
opt_array[14].long_name = "very_verbose";
|
||||||
|
opt_array[14].short_name = "V";
|
||||||
|
opt_array[14].set = 0;
|
||||||
|
opt_array[14].type = 'f';
|
||||||
|
opt_array[14].int_val = &very_verbose;
|
||||||
|
|
||||||
|
opt_array[15].long_name = "help";
|
||||||
|
opt_array[15].short_name = "h";
|
||||||
|
opt_array[15].set = 0;
|
||||||
|
opt_array[15].type = 'h';
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* SetDefaults() *
|
||||||
|
* Sets configuration variables to their default values *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::SetDefaults()
|
||||||
|
{
|
||||||
|
cfg_name = "stide.config";
|
||||||
|
db_name = "default.db";
|
||||||
|
seq_len = 6;
|
||||||
|
max_elements = 500;
|
||||||
|
max_streams = 100;
|
||||||
|
pair_offset = 0;
|
||||||
|
add_output_format = "DB Size: %d\tStream: %s\tPair Number: %p\n";
|
||||||
|
compare_output_format = "Pair Number: %p\tStream Number: %s\n";
|
||||||
|
lf_size = 1;
|
||||||
|
add_to_db = 0;
|
||||||
|
output_graph = 0;
|
||||||
|
compute_hdist = 0;
|
||||||
|
write_db_stats = 0;
|
||||||
|
verbose = 0;
|
||||||
|
very_verbose = 0;
|
||||||
|
num_fvars = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadCommandLine() *
|
||||||
|
* Parses the command line. Updates configuration variables. *
|
||||||
|
* *
|
||||||
|
* const int argc Number of arguments *
|
||||||
|
* const char *argv[], Array of arguments *
|
||||||
|
* Array<OptInfo> &opt_array Constant array of information about *
|
||||||
|
* the configuration variables *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::ReadCommandLine(const int argc, const char *argv[],
|
||||||
|
Array<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
string var_name; // Name of variable
|
||||||
|
string var_val; // Value of variable
|
||||||
|
int name_type; // LONG_NAME or SHORT_NAME
|
||||||
|
int argv_i = 1; // First index of argv
|
||||||
|
int argv_j = 0; // Second index of argv
|
||||||
|
|
||||||
|
while (argv_i < argc) {
|
||||||
|
if (argv[argv_i][argv_j] != '-') {
|
||||||
|
cerr<< "ERROR: Switches must be preceded by a dash: "<<argv[argv_i]
|
||||||
|
<< endl << " is illegal" << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
argv_j++;
|
||||||
|
if (argv[argv_i][argv_j] == '-') { // Long name
|
||||||
|
argv_j++;
|
||||||
|
name_type = LONG_NAME;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
name_type = SHORT_NAME;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Read name into var_name
|
||||||
|
var_name = argv[argv_i]+argv_j;
|
||||||
|
|
||||||
|
// Now we want to read the value, if there is one.
|
||||||
|
argv_j = 0;
|
||||||
|
if (++argv_i < argc) {
|
||||||
|
if (argv[argv_i][argv_j] != '-') {
|
||||||
|
var_val = argv[argv_i];
|
||||||
|
argv_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// assign value to appropriate variable
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, name_type);
|
||||||
|
// Blank var_name and var_val for next time around
|
||||||
|
var_name.resize(0);
|
||||||
|
var_val.resize(0);
|
||||||
|
}
|
||||||
|
}
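The switch handling in ReadCommandLine() can be looked at in isolation. What follows is a minimal sketch, not STIDE code: `NameKind`, `ParsedSwitch`, `ParseSwitch` and `CheckParseSwitch` are hypothetical names introduced for illustration, and the sketch assumes a standard C++ compiler rather than the pre-standard headers this file includes. It only shows how one leading dash selects the short name and two leading dashes select the long name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for STIDE's SHORT_NAME / LONG_NAME constants.
enum NameKind { kShortName, kLongName, kNotASwitch };

// Hypothetical result type: the kind of switch and its name without dashes.
struct ParsedSwitch {
  NameKind kind;
  std::string name;
};

// Classifies one command-line argument the way ReadCommandLine() does:
// "-x" is a short name, "--xyz" is a long name, anything else is rejected.
ParsedSwitch ParseSwitch(const std::string &arg) {
  ParsedSwitch result;
  if (arg.empty() || arg[0] != '-') {
    result.kind = kNotASwitch;  // STIDE prints an error and exits here
    return result;
  }
  if (arg.size() > 1 && arg[1] == '-') {
    result.kind = kLongName;
    result.name = arg.substr(2);  // drop both dashes
  } else {
    result.kind = kShortName;
    result.name = arg.substr(1);  // drop the single dash
  }
  return result;
}

// Self-check using switch names that appear in InitOptArray().
void CheckParseSwitch() {
  assert(ParseSwitch("-lf").kind == kShortName);
  assert(ParseSwitch("-lf").name == "lf");
  assert(ParseSwitch("--seq_len").kind == kLongName);
  assert(ParseSwitch("--seq_len").name == "seq_len");
  assert(ParseSwitch("6").kind == kNotASwitch);
}
```

Returning a status instead of calling exit() is a choice made for the sketch only, so that the behaviour can be checked without terminating the process.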
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* AssignValToVar() *
|
||||||
|
* Figures out which variable to assign a given value to and does *
|
||||||
|
* so. Updates opt_array, to say that that particular variable *
|
||||||
|
* has been set. *
|
||||||
|
* *
|
||||||
|
* Input: Array<OptInfo> &opt_array Option Information *
|
||||||
|
* const string &var_val Value to be assigned *
|
||||||
|
* const string &var_name Name of variable to be updated *
|
||||||
|
* const int name_type SHORT_NAME or LONG_NAME *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::AssignValToVar(Array<OptInfo> &opt_array, const string
|
||||||
|
&var_val, const string &var_name, const
|
||||||
|
int name_type)
|
||||||
|
{
|
||||||
|
int opt_i;
|
||||||
|
|
||||||
|
for (opt_i = 0; opt_i < NUM_OPTS; opt_i++) {
|
||||||
|
if (((name_type == LONG_NAME) && (opt_array[opt_i].long_name ==
|
||||||
|
var_name)) ||
|
||||||
|
((name_type == SHORT_NAME) && (opt_array[opt_i].short_name ==
|
||||||
|
var_name))) {
|
||||||
|
// If we have already set this variable and shouldn't change it,
|
||||||
|
// don't
|
||||||
|
if (opt_array[opt_i].set == 1) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
switch (opt_array[opt_i].type) {
|
||||||
|
case 'f': // flag
|
||||||
|
if ((var_val.length() == 0) || (var_val == "On") ||
|
||||||
|
(var_val == "ON") || (var_val == "on")) {
|
||||||
|
*(opt_array[opt_i].int_val) = 1; // InitOptArray() stores flag targets in int_val
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
}
|
||||||
|
else if ((var_val != "Off") && (var_val != "off") &&
|
||||||
|
(var_val != "OFF")) {
|
||||||
|
cerr << "ERROR: Illegal value for parameter " << var_name
|
||||||
|
<< ". This parameter is a simple flag," << endl
|
||||||
|
<< "and may be followed by \"on\", \"off\", or nothing "
|
||||||
|
<< "(which turns it on). The current value is "
|
||||||
|
<< var_val << ". Aborting...";
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'i':
|
||||||
|
// If there isn't a value, just use the default
|
||||||
|
if (var_val.length() == 0) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
*(opt_array[opt_i].int_val) = atoi(var_val.c_str());
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
break;
|
||||||
|
case 's':
|
||||||
|
// If there is no string given, just use the default
|
||||||
|
if (var_val.length() == 0) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
*(opt_array[opt_i].str_val) = var_val;
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
break;
|
||||||
|
case 'h':
|
||||||
|
WriteHelpInfo();
|
||||||
|
} // end of switch
|
||||||
|
return; // we've found it, so we're done
|
||||||
|
} // end of if (opt_array[opt_i]...
|
||||||
|
} // end of for (opt_i = 0; ...
|
||||||
|
}
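The rule for flag values in the 'f' case above (nothing or "on" turns the flag on, "off" leaves it off, anything else is an error) can be written as one small function. This is a sketch under assumptions: `ParseFlagValue` and `CheckParseFlagValue` are hypothetical names that do not exist in STIDE, and the return codes are chosen for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns 1 for "on", 0 for "off", -1 for an illegal value.
// An empty value counts as "on", matching the 'f' case in AssignValToVar().
int ParseFlagValue(const std::string &value) {
  if (value.empty() || value == "On" || value == "ON" || value == "on") {
    return 1;
  }
  if (value == "Off" || value == "OFF" || value == "off") {
    return 0;
  }
  return -1;  // STIDE prints an error and aborts in this case
}

// Self-check covering the three outcomes.
void CheckParseFlagValue() {
  assert(ParseFlagValue("") == 1);
  assert(ParseFlagValue("on") == 1);
  assert(ParseFlagValue("OFF") == 0);
  assert(ParseFlagValue("maybe") == -1);
}
```

Only the three spellings listed are accepted, so a mixed-case value such as "oN" is treated as illegal, as in the original comparison chain.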
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadConfigFile() *
|
||||||
|
* Parses the configuration file. Updates configuration *
|
||||||
|
* variables. *
|
||||||
|
* *
|
||||||
|
* Input: Array<OptInfo> &opt_array: Option information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
|
||||||
|
void Config::ReadConfigFile(Array<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
string var_name;
|
||||||
|
string var_val;
|
||||||
|
|
||||||
|
// Set up stream for reading configuration
|
||||||
|
ifstream cfg_file(cfg_name.c_str());
|
||||||
|
string buff;
|
||||||
|
int buff_i = 0; // index for buff
|
||||||
|
int opt_i = 0; // index for opt_array
|
||||||
|
int rev_num; // revision number of configuration file
|
||||||
|
|
||||||
|
if (!cfg_file.is_open()) {
|
||||||
|
cerr<<"WARNING: Cannot open configuration file "<<cfg_name
|
||||||
|
<<". I will continue, using the" <<endl
|
||||||
|
<<"default values and the command line arguments." << endl
|
||||||
|
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||||
|
<< endl;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// First we need to determine if the configuration file is old-style
|
||||||
|
// or new-style, i.e., is there a #ConfigFileRev: in the first
|
||||||
|
// line. We can determine this just by checking the first
|
||||||
|
// character.
|
||||||
|
char c = cfg_file.peek();
|
||||||
|
|
||||||
|
// Config file is empty; just return
|
||||||
|
if (cfg_file.eof()) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If old-style
|
||||||
|
if (c != '#') {
|
||||||
|
cerr << "WARNING: The first line of the configuration file did "
|
||||||
|
<< "not contain the string" << endl
|
||||||
|
<< "\"#ConfigFileRev: " << CFREV << "\"." << endl
|
||||||
|
<< "I will assume that this is an old format configuration "
|
||||||
|
<< "file." << endl
|
||||||
|
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||||
|
<< endl << endl;
|
||||||
|
ReadOldConfigFile(cfg_file, opt_array);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Look for "#ConfigFileRev:"
|
||||||
|
cfg_file >> buff;
|
||||||
|
|
||||||
|
if (buff != "#ConfigFileRev:") {
|
||||||
|
cerr << "ERROR: I expected the first line of the configuration "
|
||||||
|
<< "file to either be \"#ConfigFileRev: \" followed by the "
|
||||||
|
<< "revision number or the beginning of an old-style "
|
||||||
|
<< "configuration file, which does not have a comment in the "
|
||||||
|
<< "first line. I'm confused, so I will abort..."
|
||||||
|
<< endl << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
cfg_file >> rev_num;
|
||||||
|
|
||||||
|
if (rev_num > CFREV) {
|
||||||
|
cerr << "ERROR: This version of STIDE does not know how to deal "
|
||||||
|
<< "with configuration files" << endl
|
||||||
|
<< "more modern than revision " << CFREV << ". Aborting..."
|
||||||
|
<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if (rev_num < CFREV) {
|
||||||
|
cerr << "ERROR: Configuration files must be revision " << CFREV
|
||||||
|
<< " or later, " << "or an old-style" << endl
|
||||||
|
<< "configuration file without a revision number. "
|
||||||
|
<< "Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Now we know everything's as we expect, so we'll parse the file
|
||||||
|
|
||||||
|
while (!cfg_file.eof()) {
|
||||||
|
// Skip white space at the beginning of the line
|
||||||
|
while (isspace(buff[buff_i])) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If buff is empty, move on to next line
|
||||||
|
if (buff.length() <= buff_i) {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If we start with a comment, move on to next line
|
||||||
|
if (buff[buff_i] == '#') {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// Read in variable name, up to the :
|
||||||
|
int start_place = buff_i; // the beginning place of the name
|
||||||
|
while (buff[buff_i] != ':' && (buff_i < buff.length())) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
if (buff_i == buff.length()) {
|
||||||
|
cerr << "ERROR: Variable names in the configuration file must "
|
||||||
|
<< "be followed by a colon. The line " << endl
|
||||||
|
<< buff << endl << "contains a variable name which is not "
|
||||||
|
<< "terminated by a colon. Aborting..." <<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// This assigns the values in buff between start_place and buff_i
|
||||||
|
// to var_name
|
||||||
|
var_name.assign(buff, start_place, buff_i - start_place);
|
||||||
|
|
||||||
|
// Skip colon
|
||||||
|
buff_i++;
|
||||||
|
|
||||||
|
// Skip white space
|
||||||
|
while (isspace(buff[buff_i])) { buff_i++; }
|
||||||
|
|
||||||
|
start_place = buff_i; // the starting place of the value
|
||||||
|
// Find last point in value. If it starts with a quote, it ends
|
||||||
|
// with a quote.
|
||||||
|
if ((buff_i < buff.length()) && (buff[buff_i] == '\"')) {
  // Step past the opening quote; otherwise the scan below stops at once
  buff_i++;
  while ((buff_i < buff.length()) && (buff[buff_i] != '\"')) {
    buff_i++;
  }
|
||||||
|
// Strip off first "
|
||||||
|
start_place++;
|
||||||
|
}
|
||||||
|
// Otherwise, it ends with a space, a # or the end of the line
|
||||||
|
else {
|
||||||
|
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||||
|
(buff[buff_i] != '#')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
var_val.assign(buff, start_place, buff_i - start_place);
|
||||||
|
|
||||||
|
// Now we want to check to see if the line was continued, in which
|
||||||
|
// case we haven't gotten the value of the variable in var_val, so
|
||||||
|
// we still need to do that.
|
||||||
|
if (buff[buff_i-1] == '\\') {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
while (isspace(buff[buff_i])) { buff_i++; }
|
||||||
|
start_place = buff_i;
|
||||||
|
// Find last point in value. If it starts with a quote, it ends with a
|
||||||
|
// quote.
|
||||||
|
if (buff[buff_i] == '\"') {
|
||||||
|
buff_i++;
|
||||||
|
while ((buff[buff_i] != '\"') && (buff_i < buff.length())) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
start_place++; // Strip off first "
|
||||||
|
}
|
||||||
|
// Otherwise, it ends with a space, a # or the end of the line
|
||||||
|
else {
|
||||||
|
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||||
|
(buff[buff_i] != '#')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
var_val.assign(buff, start_place, buff_i - start_place);
|
||||||
|
}
|
||||||
|
|
||||||
|
// assign value to appropriate variable
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
} //end of while (!cfg_file.eof())...
|
||||||
|
|
||||||
|
}
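The per-line rules that ReadConfigFile() applies (skip leading white space, ignore comment lines, read the name up to the colon, then read a bare or quoted value) can be sketched as a standalone function. This is an illustration under assumptions, not the STIDE implementation: `ConfigEntry`, `IsSpace`, `ParseConfigLine` and `CheckParseConfigLine` are hypothetical names, line continuation with a backslash is left out, and a missing colon is reported through `valid` instead of aborting.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for one parsed configuration line.
struct ConfigEntry {
  bool valid;         // false for blank lines, comments, or a missing colon
  std::string name;   // text before the colon
  std::string value;  // bare token, or the text between double quotes
};

static bool IsSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Parses a line of the form `name: value   # comment`.
ConfigEntry ParseConfigLine(const std::string &line) {
  ConfigEntry entry;
  entry.valid = false;
  std::string::size_type i = 0;

  // Skip white space at the beginning of the line.
  while (i < line.size() && IsSpace(line[i])) i++;
  // Blank lines and comment lines carry no setting.
  if (i == line.size() || line[i] == '#') return entry;

  // The variable name runs up to the colon.
  const std::string::size_type colon = line.find(':', i);
  if (colon == std::string::npos) return entry;
  entry.name = line.substr(i, colon - i);

  // Skip the colon and the white space after it.
  i = colon + 1;
  while (i < line.size() && IsSpace(line[i])) i++;

  if (i < line.size() && line[i] == '"') {
    // Quoted value: everything up to the closing quote (or end of line).
    std::string::size_type close = line.find('"', i + 1);
    if (close == std::string::npos) close = line.size();
    entry.value = line.substr(i + 1, close - i - 1);
  } else {
    // Bare value: ends at white space, a '#', or the end of the line.
    std::string::size_type end = i;
    while (end < line.size() && !IsSpace(line[end]) && line[end] != '#') end++;
    entry.value = line.substr(i, end - i);
  }
  entry.valid = true;
  return entry;
}

// Self-check with lines in the style of stide.config.
void CheckParseConfigLine() {
  ConfigEntry plain = ParseConfigLine("  seq_len: 6   # sequence length");
  assert(plain.valid && plain.name == "seq_len" && plain.value == "6");
  ConfigEntry quoted = ParseConfigLine("db_name: \"my db.db\"");
  assert(quoted.valid && quoted.name == "db_name" && quoted.value == "my db.db");
  assert(!ParseConfigLine("# only a comment").valid);
}
```

Every index is compared against the line length before the character is read, which is the main difference from the hand-rolled scanning loops in the function above.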
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadOldConfigFile() *
|
||||||
|
* Reads information from an old-style configuration file. *
|
||||||
|
* Updates configuration variables. *
|
||||||
|
* *
|
||||||
|
* Input: ifstream &cfg_file Configuration file (already opened) *
|
||||||
|
* Array<OptInfo> &opt_array: Option information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::ReadOldConfigFile(ifstream &cfg_file,
|
||||||
|
Array<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
|
||||||
|
string buff;
|
||||||
|
string var_name;
|
||||||
|
string var_val;
|
||||||
|
|
||||||
|
var_name = "max_elements";
|
||||||
|
cfg_file>>var_val;
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
var_name = "max_streams";
|
||||||
|
cfg_file>>var_val;
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
// Next line is hash table size, but we are now figuring that out
|
||||||
|
// dynamically, so just throw it away.
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
// Now read in the format string
|
||||||
|
getline(cfg_file, var_val);
|
||||||
|
// Put the format string in the appropriate place
|
||||||
|
if (add_to_db) {
|
||||||
|
var_name = "add_output_format";
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
var_name = "compare_output_format";
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* CheckValues() *
|
||||||
|
* Checks configuration values that have been read in to make *
|
||||||
|
* sure that they are within the limits. Flags are automatically *
|
||||||
|
* checked while being read in, the output formats are checked *
|
||||||
|
* in InitOutputFormat(), and filenames are checked when they are *
|
||||||
|
* opened, so all that is left is the integer values. *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::CheckValues()
|
||||||
|
{
|
||||||
|
if ((lf_size < 1) || (lf_size > LF_LIM)) {
|
||||||
|
cerr << "ERROR: lf_size must be between 1 and " << LF_LIM
|
||||||
|
<< ". It has been set to " << lf_size << ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((seq_len < 1) || (seq_len > SEQ_LEN_LIM)) {
|
||||||
|
cerr << "ERROR: seq_len must be between 1 and " << SEQ_LEN_LIM
|
||||||
|
<< ". It has been set to " << seq_len << ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((max_elements < 1) || (max_elements > MAX_ELEM_LIM)) {
|
||||||
|
cerr << "ERROR: max_elements must be between 1 and " << MAX_ELEM_LIM
|
||||||
|
<< ". It has been set to " << max_elements
|
||||||
|
<< ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((max_streams < 1) || (max_streams > MAX_STREAMS_LIM)) {
|
||||||
|
cerr << "ERROR: max_streams must be between 1 and " << MAX_STREAMS_LIM
|
||||||
|
<< ". It has been set to " << max_streams
|
||||||
|
<< ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* InitOutputFormat() *
|
||||||
|
* Converts the string add_output_format or compare_output_format *
|
||||||
|
* to information filling fmt_str and num_fvars, which is more *
|
||||||
|
* convenient for output. *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::InitOutputFormat()
|
||||||
|
{
|
||||||
|
// Now we analyze add_output_format or compare_output_format
|
||||||
|
int flag = 0;
|
||||||
|
int f_i = 0;
|
||||||
|
num_fvars = 0;
|
||||||
|
string *buff;
|
||||||
|
|
||||||
|
// If we're not in verbose or very_verbose modes, we're never going
|
||||||
|
// to use this information, so don't waste our time doing this
|
||||||
|
if (!(verbose || very_verbose)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (add_to_db) {
|
||||||
|
buff = &add_output_format;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
buff = &compare_output_format;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (int i = 0; i <(*buff).length(); i++) {
|
||||||
|
switch ((*buff)[i]) {
|
||||||
|
case '\\':
|
||||||
|
i++;
|
||||||
|
switch ((*buff)[i]) {
|
||||||
|
case 't': fmt_str[num_fvars][f_i] = '\t'; break;
|
||||||
|
case 'n': fmt_str[num_fvars][f_i] = '\n'; break;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case '%':
|
||||||
|
fmt_str[num_fvars][f_i] = '%';
|
||||||
|
flag = 1;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
fmt_str[num_fvars][f_i] = (*buff)[i];
|
||||||
|
if (flag) {
|
||||||
|
switch (fmt_str[num_fvars][f_i]) {
|
||||||
|
case 'd': // database size
|
||||||
|
case 'i': // number of last value of sequence in this
|
||||||
|
// data stream
|
||||||
|
case 'p': // number of last value of sequence in entire
|
||||||
|
// input
|
||||||
|
case 's': // external stream ID
|
||||||
|
case 'a': // flag for whether this sequence is anomalous
|
||||||
|
case 'c': // locality frame count of this sequence
|
||||||
|
case 'h': // Hamming distance for this sequence
|
||||||
|
// Record that we must write that val at that position
|
||||||
|
write_val[num_fvars] = fmt_str[num_fvars][f_i];
|
||||||
|
fmt_str[num_fvars][f_i] = 'd';
|
||||||
|
fmt_str[num_fvars][f_i + 1] = '\0';
|
||||||
|
num_fvars++;
|
||||||
|
f_i = -1;
|
||||||
|
flag = 0;
|
||||||
|
break;
|
||||||
|
default: // Unknown flag
|
||||||
|
cerr << "ERROR: Illegal control character in output format."
|
||||||
|
<< " Type stide -h for help." << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} // switch ((*buff)[i ...
|
||||||
|
f_i++;
|
||||||
|
}
|
||||||
|
fmt_str[num_fvars][f_i] = '\0';
|
||||||
|
}
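InitOutputFormat() cuts the format string into fragments, each ending in a `%d`, and records which STIDE variable belongs to each fragment. The sketch below shows the same idea with standard containers. It is an assumption-laden illustration, not the original code: `FormatPiece`, `SplitOutputFormat` and `CheckSplitOutputFormat` are hypothetical names, backslash escapes are not translated, and an unknown control letter is kept as literal text instead of producing an error message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type: one printf-style fragment plus the variable it prints.
struct FormatPiece {
  std::string text;  // ends in "%d" when code is a control letter
  char code;         // one of d, i, p, s, a, c, h; '\0' for trailing text
};

// Splits a STIDE output format into fragments, one per control letter.
std::vector<FormatPiece> SplitOutputFormat(const std::string &format) {
  const std::string known = "dipsach";  // letters handled by InitOutputFormat()
  std::vector<FormatPiece> pieces;
  std::string text;
  for (std::string::size_type i = 0; i < format.size(); i++) {
    if (format[i] == '%' && i + 1 < format.size() &&
        known.find(format[i + 1]) != std::string::npos) {
      // Every STIDE variable is written as an integer, hence "%d".
      text += "%d";
      FormatPiece piece;
      piece.text = text;
      piece.code = format[i + 1];
      pieces.push_back(piece);
      text.clear();
      i++;  // skip the control letter
    } else {
      text += format[i];
    }
  }
  if (!text.empty()) {
    // Text after the last variable is kept as a final fragment.
    FormatPiece tail;
    tail.text = text;
    tail.code = '\0';
    pieces.push_back(tail);
  }
  return pieces;
}

// Self-check with the default compare_output_format from SetDefaults().
void CheckSplitOutputFormat() {
  std::vector<FormatPiece> p =
      SplitOutputFormat("Pair Number: %p\tStream Number: %s\n");
  assert(p.size() == 3);
  assert(p[0].text == "Pair Number: %d" && p[0].code == 'p');
  assert(p[1].text == "\tStream Number: %d" && p[1].code == 's');
  assert(p[2].text == "\n" && p[2].code == '\0');
}
```

A growable vector removes the fixed limits that `fmt_str[10][50]` and `write_val[7]` impose in the class declaration; whether those limits matter in practice depends on the format strings actually used.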
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* OutputConfigInfo() *
|
||||||
|
* Writes information about the final configuration to standard *
|
||||||
|
* output. Does so in a format that could be used as a *
|
||||||
|
* configuration file. Changes no values anywhere. *
|
||||||
|
* *
|
||||||
|
* Input: const Array<OptInfo> &opt_array Option Information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::OuputConfigInfo(const Array<OptInfo> &opt_array) const
|
||||||
|
{
|
||||||
|
cout<<"This run was configured using configuration file "
|
||||||
|
<< cfg_name << " and command" << endl
|
||||||
|
<< "line arguments. The configuration values were as "
|
||||||
|
<< "follows." << endl
|
||||||
|
<<"#ConfigFileRev: " << CFREV << endl;
|
||||||
|
for (int i = 0; i < NUM_OPTS; i++) {
|
||||||
|
if (opt_array[i].type == 'i') {
|
||||||
|
cout << opt_array[i].long_name << ": " << *(opt_array[i].int_val)
|
||||||
|
<< endl;
|
||||||
|
}
|
||||||
|
if ((opt_array[i].type == 's') &&
|
||||||
|
((add_to_db && (opt_array[i].short_name == "aof")) ||
|
||||||
|
(!add_to_db && (opt_array[i].short_name == "cof")))) {
|
||||||
|
cout << opt_array[i].long_name << ": \"" << *(opt_array[i].str_val)
|
||||||
|
<< "\"" << endl;
|
||||||
|
}
|
||||||
|
if (opt_array[i].type == 'f') {
|
||||||
|
if (*(opt_array[i].int_val) == 1) {
|
||||||
|
cout << opt_array[i].long_name << ": On" << endl;
|
||||||
|
}
|
||||||
|
if (*(opt_array[i].int_val) == 0) {
|
||||||
|
cout << opt_array[i].long_name << ": Off" << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << endl << endl;
|
||||||
|
|
||||||
|
// Now print header for verbose modes
|
||||||
|
if (verbose || very_verbose) {
|
||||||
|
cout<<endl<<"Variables in output: "<<endl;
|
||||||
|
for (int j = 0; j < num_fvars; j++) {
|
||||||
|
switch (write_val[j]) {
|
||||||
|
case 's': cout<<"stream #, "; break;
|
||||||
|
case 'i': cout<<"index #, "; break;
|
||||||
|
case 'h': if (compute_hdist) {cout<<"hamming miss, "; } break;
|
||||||
|
case 'c': if (lf_size > 1) {cout<<"lfc, "; } break;
|
||||||
|
case 'p': cout<<"pair #, "; break;
|
||||||
|
case 'd': cout<<"db size, "; break;
|
||||||
|
case 'a': cout<<"is anomalous?, "; break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout<<endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteHelpInfo() *
|
||||||
|
* Writes help information to standard output. Changes no values.*
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
* *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
|
||||||
|
void Config::WriteHelpInfo() const
|
||||||
|
{
|
||||||
|
cout<<"STIDE accepts calls of the form:"<<endl
|
||||||
|
<<" stide -c cfg_name -d db_name -me max_elements"
|
||||||
|
<<" -lf lf_size -l seq_len"<<endl<<" -ms max_num_streams"
|
||||||
|
<<" -p pair_num_offset -aof add_out_format "
|
||||||
|
<< endl << " -cof comp_out_format -a -g -h -hd -s -v -V"
|
||||||
|
<< endl << endl;
|
||||||
|
cout<<"STIDE expects input to come through standard input in"
|
||||||
|
<<" the format of a pair"<<endl
|
||||||
|
<<"of integers per line, where the first integer is a"
|
||||||
|
<<" stream identifier"<<endl
|
||||||
|
<<"and the second is a data element. Command line"
|
||||||
|
<<" arguments override"<<endl
|
||||||
|
<<"specifications in the configuration file. All"
|
||||||
|
<<" parameters are optional"<<endl
|
||||||
|
<<"and can be specified in any order. Parameters"
|
||||||
|
<<" are always preceded by a"<<endl
|
||||||
|
<<"switch. The switches are:"<<endl<<endl;
|
||||||
|
cout<<"-a Add to database; defaults to off"<<endl;
|
||||||
|
cout<<"-c cfg_name The name of file containing the"
|
||||||
|
<<" configuration;"<<endl
|
||||||
|
<<" defaults to \"stide.config\""<<endl;
|
||||||
|
cout<<"-d db_name The name of the file containing"
|
||||||
|
<<" the database;"<<endl
|
||||||
|
<<" defaults to \"default.db\""<<endl;
|
||||||
|
cout<<"-lf lf_size The size of the locality frame;"
|
||||||
|
<<" defaults to 1"<<endl;
|
||||||
|
cout<<"-g Write graphing data in dot format to"
|
||||||
|
<<" db_name.dot;"<<endl
|
||||||
|
<<" defaults to off"<<endl;
|
||||||
|
cout<<"-h Help; displays this information"<<endl;
|
||||||
|
cout<<"-l seq_len Length of sequence; defaults to 6"
|
||||||
|
<<endl;
|
||||||
|
cout<<"-p pair_offset Offset for pair number count;"
|
||||||
|
<<" defaults to 0"<<endl;
|
||||||
|
cout<<"-s Display db stats; defaults to off"
|
||||||
|
<<endl;
|
||||||
|
cout<<"-v Verbose mode on; defaults to off"<<endl;
|
||||||
|
cout<<"-V Very verbose mode on; defaults to off"<<endl;
|
||||||
|
cout<<"-hd Compute Hamming distance measures;"
|
||||||
|
<<" defaults to off"<<endl;
|
||||||
|
cout<<"-me max_elements Maximum number of different"
|
||||||
|
<<" elements"<<endl
|
||||||
|
<<" in the input stream; defaults to"
|
||||||
|
<<" 500" <<endl;
|
||||||
|
cout<<"-ms max_num_streams Maximum number of different"
|
||||||
|
<<" streams in input;"<<endl
|
||||||
|
<<" defaults to 100"<<endl;
|
||||||
|
cout<<"-aof add_out_format Format for output when adding to"
|
||||||
|
<<" database"<<endl
|
||||||
|
<<" in verbose or very_verbose"
|
||||||
|
<<" modes; defaults to"<<endl
|
||||||
|
<<" \"DB Size: %d\\tStream: "
|
||||||
|
<<"%s\\tPair Number: %p\\n\""<<endl;
|
||||||
|
cout<<"-cof compare_out_format Format for output when comparing"
|
||||||
|
<<" with database"<<endl
|
||||||
|
<<" in verbose or very_verbose modes;"
|
||||||
|
<<" defaults to"<<endl
|
||||||
|
<<" \"Pair Number: %p\\tStream"
|
||||||
|
<<" Number: %s\\n\""<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
68
11/wywolania/Data/stide_v1.1/Seq-code/seq_config.h
Executable file
@ -0,0 +1,68 @@
|
|||||||
|
#ifndef __SEQ_CONFIG_H
|
||||||
|
#define __SEQ_CONFIG_H
|
||||||
|
|
||||||
|
#define CFREV 1
|
||||||
|
|
||||||
|
#include <iostream.h>
|
||||||
|
#include <fstream.h>
|
||||||
|
#include <string>
|
||||||
|
#include "opt_info.h"
|
||||||
|
|
||||||
|
class Config {
|
||||||
|
public:
|
||||||
|
Config(const int argc, const char *argv[]); // Constructor; reads
|
||||||
|
// configuration file and command
|
||||||
|
// line arguments
|
||||||
|
string cfg_name; // Name of configuration file
|
||||||
|
string db_name; // Name of database
|
||||||
|
int seq_len; // Sequence Length
|
||||||
|
int max_elements; // Maximum number of different
|
||||||
|
// data elements we may encounter
|
||||||
|
int max_streams; // Maximum number of different
|
||||||
|
// streams we may encounter
|
||||||
|
int pair_offset; // Number by which to offset
|
||||||
|
// num_pairs_read
|
||||||
|
string add_output_format; // Format for verbose-mode output
|
||||||
|
// when adding to database
|
||||||
|
string compare_output_format; // Format for verbose-mode output
|
||||||
|
// when comparing with an
|
||||||
|
// existing database
|
||||||
|
int lf_size; // Size of locality frames: 1
|
||||||
|
// effectively means don't
|
||||||
|
// compute locality frames
|
||||||
|
int add_to_db; // Flag indicating that we should
|
||||||
|
// add to the database rather
|
||||||
|
// than make comparisons
|
||||||
|
int output_graph; // Output graphing information in
|
||||||
|
// Dot format
|
||||||
|
int compute_hdist; // Compute Hamming distance
|
||||||
|
int write_db_stats; // Write statistics about the
|
||||||
|
// database
|
||||||
|
int verbose; // Output information about each
|
||||||
|
// anomaly or each new sequence
|
||||||
|
// added to the database
|
||||||
|
int very_verbose; // Output information about each
|
||||||
|
// sequence encountered
|
||||||
|
char fmt_str[10][50]; // String used for outputting
|
||||||
|
// information in verbose mode
|
||||||
|
char write_val[7]; // Do we write the value? used
|
||||||
|
// with fmt_str
|
||||||
|
int num_fvars; // Number of format variables
|
||||||
|
|
||||||
|
void InitOptArray(Array<OptInfo> &opt_array);
void SetDefaults();
void ReadCommandLine(const int argc, const char *argv[],
                     Array<OptInfo> &opt_array);
void AssignValToVar(Array<OptInfo> &opt_array, const string &var_val,
                    const string &var_name, const int name_type);
void ReadConfigFile(Array<OptInfo> &opt_array);
void ReadOldConfigFile(ifstream &cfg_file, Array<OptInfo> &opt_array);
void InitOutputFormat();
void CheckValues();
void OuputConfigInfo(const Array<OptInfo> &opt_array) const;
void WriteHelpInfo() const;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
358
11/wywolania/Data/stide_v1.1/Seq-code/seq_stream.cc
Executable file
@ -0,0 +1,358 @@
#include <stdlib.h>
#include <string.h>
#include <iostream.h>
#include <fstream.h>
#include <stdio.h>
#include "../Utils/hash.h"
#include "seq_stream.h"

/*********************************************************************
 * Init()                                                            *
 *   Initializes an instance of Stream.                              *
 *                                                                   *
 * Input:  const Config &cfg    Configuration information            *
 *         const int intern_id  internal stream identifier           *
 *         const int extern_id  external stream identifier           *
 * Output: none                                                      *
 *********************************************************************/


void Stream::Init(const Config &cfg,
                  const int intern_id, const int extern_id) {
  // initialize all the arrays
  current_seq.Allocate(cfg.seq_len);
  current_seq.Set(-1); // initialize the array to be empty
  num_in_seq = -1;
  num_pairs_read = 0;
  num_anoms = 0;
  num_seqs_fnd = 0;
  int_sid = intern_id;
  ext_sid = extern_id;
  max_hdist = 0;
  seq_hdist = 0;
  lf.Allocate(cfg.lf_size);
  lf.Set(0);
  seq_lfc = 0;
  max_lfc = 0;
  ready = 0;
  seq_len = cfg.seq_len;
}

/*********************************************************************
 * Append()                                                          *
 *   This function puts the integer given into the current_seq array *
 *   as the last element. It flags ready according to whether        *
 *   current_seq is full. Updates num_in_seq, ready, current_seq,    *
 *   num_seqs_fnd, and num_pairs_read.                               *
 *                                                                   *
 * Input:  const int new_value  The next value to be put into the    *
 *                              current_seq array                    *
 * Output: none                                                      *
 *********************************************************************/

void Stream::Append(const int new_value)
{
  // missing system call - zero the current sequence
  if (new_value == -1) {
    num_in_seq = -1;
    ready = 0;
  }
  else {
    num_pairs_read++;
    if (num_in_seq < seq_len - 1) { // window not yet full
      num_in_seq++;
      current_seq[num_in_seq] = new_value;
      if (num_in_seq == seq_len - 1) {
        ready = 1;
        ++num_seqs_fnd;
      }
    }

    else {
      // Roll over current_seq array
      for (int k = 0; k < num_in_seq; k++) {
        current_seq[k] = current_seq[k + 1];
      }
      current_seq[num_in_seq] = new_value;
      ++num_seqs_fnd;
    }
  }
}
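
The windowing done by Append() can be sketched in isolation as follows. This is a minimal illustration using the standard library rather than stide's own Array class; the name SlidingWindow and its methods are hypothetical and do not appear in stide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the windowing state kept by Stream;
// not part of stide itself.
class SlidingWindow {
public:
  explicit SlidingWindow(std::size_t seq_len) : seq_len_(seq_len) {
    assert(seq_len > 0);
  }

  // Feed one value. Returns true when a full window is available.
  // A value of -1 marks a gap and empties the window, as in Append().
  bool Push(int value) {
    if (value == -1) {
      window_.clear();
      return false;
    }
    if (window_.size() == seq_len_) {
      window_.erase(window_.begin());  // drop the oldest element
    }
    window_.push_back(value);
    return window_.size() == seq_len_;
  }

  const std::vector<int> &Current() const { return window_; }

private:
  std::size_t seq_len_;
  std::vector<int> window_;
};
```

With a window length of 3, the first two values only fill the window; every later value yields a new full sequence until a -1 resets it.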

/*********************************************************************
 * AddToDB()                                                         *
 *                                                                   *
 *   Adds current_seq to the database if it isn't already there;     *
 *   Returns 0 if it is already there, 1 if it is new. Updates       *
 *   normal and db_size.                                             *
 *                                                                   *
 * Input:  SeqForest &normal   Forest of normal sequences            *
 *         int &db_size        Number of unique sequences in the     *
 *                             database                              *
 *         const int total_pairs_read Number of pairs read from the  *
 *                             entire input stream                   *
 *         const Config &cfg   Configuration Information             *
 * Output: 0 if sequence isn't new, 1 if it is                       *
 *********************************************************************/


int Stream::AddToDB(SeqForest &normal, int &db_size, const int
                    total_pairs_read, const Config &cfg) const
{
  int is_new;

  // If there is not a tree with the same root as this sequence has,
  // make a new tree with that root and flag trees_found
  if (!normal.trees_found[current_seq[0]]) {
    normal.trees[current_seq[0]].SetRoot(current_seq[0]);
    normal.trees_found[current_seq[0]] = 1;
  }
  // Try to add the sequence. If it's already there, is_new will be
  // set to 0, otherwise it will be set to 1.
  is_new = normal.trees[current_seq[0]].InsertSeq(current_seq, 0,
                                                  seq_len-1);
  db_size += is_new;
  if ((is_new && cfg.verbose) || cfg.very_verbose) {
    ReportNewSeq(cfg, total_pairs_read, db_size);
  }
  if (is_new)
    return 1;
  else
    return 0;
}


/*********************************************************************
 * CompareSeq()                                                      *
 *   Compares the current sequence in this stream to the database,   *
 *   in the manner indicated by the configuration file. Reports      *
 *   on anomalies if told to by the configuration file. Updates      *
 *   num_anoms, seq_hdist, max_hdist, seq_lfc, and max_lfc.          *
 *                                                                   *
 * Input:  const Config &cfg: Information from configuration file    *
 *         const SeqForest &normal: DB of normal sequences           *
 *         const int total_pairs_read: Number of pairs read from     *
 *                                     all of the streams            *
 * Output: none                                                      *
 *********************************************************************/

void Stream::CompareSeq(const Config &cfg, const SeqForest &normal,
                        const int total_pairs_read)
{
  int is_anom;  // flag to indicate whether current_seq is an anomaly

  is_anom = ComputeMisses(normal);
  if ((is_anom) && (cfg.compute_hdist)) {
    ComputeHDist(normal);
  }
  if (cfg.lf_size > 1) {
    ComputeLF(is_anom, cfg.lf_size);
  }
  // if we're in verbose mode and either current_seq is an anomaly or
  // its locality frame contains an anomaly, report it
  if ((cfg.very_verbose) || (cfg.verbose && (is_anom || seq_lfc))) {
    ReportSeq(cfg, total_pairs_read, is_anom);
  }
}

/*********************************************************************
 * ComputeMisses()                                                   *
 *   Compares the current sequence to the database sequences. If     *
 *   there is an exact match, we return 0. Otherwise we return 1.    *
 *   Updates num_anoms and seq_hdist.                                *
 *                                                                   *
 * Input:  const SeqForest &normal: DB of normal sequences           *
 * Output: 0 if there is an exact match                              *
 *         1 if the sequence is anomalous                            *
 *********************************************************************/


int Stream::ComputeMisses(const SeqForest &normal)
{
  if (normal.IsSeqInForest(current_seq, seq_len)) {
    seq_hdist = 0;
    return(0);
  }

  // We have an anomaly
  ++num_anoms;
  return(1);
}

/*********************************************************************
 * ComputeHDist()                                                    *
 *   Compares the current sequence in this stream to each sequence   *
 *   in the database in turn, adding up the number of mismatches     *
 *   between the two sequences. The smallest difference between      *
 *   the current sequence and the database sequences is the minimum  *
 *   Hamming distance for the current sequence. If this minimum      *
 *   Hamming distance is greater than the largest minimum Hamming    *
 *   distance encountered so far, then the variable max_hdist is     *
 *   updated. Updates seq_hdist and max_hdist.                       *
 *                                                                   *
 * Input:  const SeqForest &normal: DB of normal sequences           *
 *                                                                   *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ComputeHDist(const SeqForest &normal)
{
  int misses_on_this_seq;  // the number of mismatches between
                           // current_seq and the sequence we're
                           // comparing it with at the moment
  seq_hdist = seq_len;     // start with seq_hdist as high as
                           // possible

  // We compare current_seq with each sequence in our database tree
  for (int i = 0; i < normal.trees.Size(); i++) {
    // Have we seen any sequences starting with element i? If not, we
    // can go on to consider sequences starting with element i+1.
    if (normal.trees_found[i]) {
      misses_on_this_seq =
        normal.trees[i].ComputeHDistForTree(current_seq, 0,
                                            seq_len-1);
      if (misses_on_this_seq < seq_hdist) {
        seq_hdist = misses_on_this_seq;
      }
    }
  }

  if (seq_hdist > max_hdist) {
    max_hdist = seq_hdist;
  }
}
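
The quantity computed by ComputeHDist() is the minimum Hamming distance between the current sequence and any database sequence. The sketch below shows that computation over a flat list of sequences instead of stide's SeqForest, whose tree traversal (ComputeHDistForTree) is not shown here. The function names HammingDistance and MinHammingDistance are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions at which two equal-length sequences differ.
int HammingDistance(const std::vector<int> &a, const std::vector<int> &b) {
  assert(a.size() == b.size());
  int misses = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++misses;
  }
  return misses;
}

// Smallest Hamming distance from seq to any database sequence.
// Like ComputeHDist(), it starts from the largest possible value,
// which is the sequence length.
int MinHammingDistance(const std::vector<int> &seq,
                       const std::vector<std::vector<int>> &db) {
  int best = static_cast<int>(seq.size());
  for (const std::vector<int> &known : db) {
    int d = HammingDistance(seq, known);
    if (d < best) best = d;
  }
  return best;
}
```

A flat scan like this costs one full comparison per database sequence; storing the sequences as trees, as stide does, lets shared prefixes be compared once.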


/*********************************************************************
 * ComputeLF()                                                       *
 *   Computes the number of misses in current_seq's locality frame.  *
 *   Updates lf, seq_lfc and max_lfc.                                *
 *                                                                   *
 * Input:  const int is_anom  Flag to indicate whether               *
 *                            current_seq is an anomaly              *
 *         const int lf_size  Size of locality frame                 *
 * Output: none                                                      *
 *********************************************************************/


void Stream::ComputeLF(const int is_anom, const int lf_size)
{
  // While num_seqs_fnd is at most lf_size, the locality frame
  // array still has a free slot for this sequence
  if (num_seqs_fnd <= lf_size) {
    lf[num_seqs_fnd-1] = is_anom;
    seq_lfc += is_anom;
  }
  else {
    // We're about to remove the first element of lf; since seq_lfc is
    // the sum of the elements of lf, we should subtract lf[0] from
    // seq_lfc to remove it from the sum.
    seq_lfc -= lf[0];
    // Now we add is_anom and seq_lfc is the sum of the new locality
    // frame.
    seq_lfc += is_anom;

    // roll over the array
    for (int i = 0; i < lf_size-1; i++) {
      lf[i] = lf[i+1];
    }
    lf[lf_size-1] = is_anom;
  }
  if (seq_lfc > max_lfc) {
    max_lfc = seq_lfc;
  }
}
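
The locality frame count maintained by ComputeLF() is a running sum of the anomaly flags of the most recent lf_size sequences. A minimal sketch of the same bookkeeping, assuming a std::deque in place of stide's Array and a hypothetical class name LocalityFrame:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper, not part of stide: counts anomalies among the
// last lf_size sequences.
class LocalityFrame {
public:
  explicit LocalityFrame(std::size_t lf_size) : lf_size_(lf_size) {
    assert(lf_size > 0);
  }

  // Record whether the newest sequence was anomalous (1) or not (0)
  // and return the updated count for the frame.
  int Add(int is_anom) {
    if (frame_.size() == lf_size_) {
      count_ -= frame_.front();  // oldest flag leaves the frame
      frame_.pop_front();
    }
    frame_.push_back(is_anom);
    count_ += is_anom;
    if (count_ > max_count_) max_count_ = count_;
    return count_;
  }

  int MaxCount() const { return max_count_; }

private:
  std::size_t lf_size_;
  std::deque<int> frame_;
  int count_ = 0;
  int max_count_ = 0;
};
```

Keeping the running sum avoids re-adding the whole frame for every sequence; only the flag that leaves and the flag that enters change the count.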

/*********************************************************************
 * ReportSeq()                                                       *
 *   This function reports data about a sequence. Specifically, it   *
 *   can report the external stream id, a number indicating where    *
 *   the first element of the current sequence occurs in the input,  *
 *   a number indicating how many pairs from this particular data    *
 *   stream have been read prior to the first element of the         *
 *   sequence, the minimum Hamming distance for the current          *
 *   sequence, the locality frame count,                             *
 *   and whether this particular sequence is itself an anomaly (it   *
 *   could be that some other sequence in its locality frame is      *
 *   anomalous). The configuration file determines which of those    *
 *   possible data are reported and in what format. Updates no       *
 *   values.                                                         *
 *                                                                   *
 * Input:  const Config &cfg  Configuration information              *
 *         const int total_pairs_read Total number of pairs read     *
 *                            from the input stream from any data    *
 *                            stream, not just this one              *
 *         const int is_anom  flag for whether the current           *
 *                            sequence is itself an anomaly          *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ReportSeq(const Config &cfg, const int total_pairs_read,
                       const int is_anom) const
{
  for (int i = 0; i < cfg.num_fvars; i++) {
    switch (cfg.write_val[i]) {
    case 'a':
      printf(cfg.fmt_str[i], is_anom); break;
    case 'c':
      if (cfg.lf_size > 1) {
        printf(cfg.fmt_str[i], seq_lfc);
      }
      break;
    case 'h':
      if (cfg.compute_hdist) {
        printf(cfg.fmt_str[i], seq_hdist);
      }
      break;
    case 'i':
      printf(cfg.fmt_str[i], num_pairs_read); break;
    case 'p':
      printf(cfg.fmt_str[i], total_pairs_read); break;
    case 's':
      printf(cfg.fmt_str[i], ext_sid); break;
    }
  }
  printf(cfg.fmt_str[cfg.num_fvars]);
}


/*********************************************************************
 * ReportNewSeq()                                                    *
 *   This function reports on sequences which have been newly added  *
 *   to the database. It can report the external stream              *
 *   identifier, where the first element of the sequence occurs      *
 *   both within the whole input stream and within its own data      *
 *   stream, and the number of unique sequences in the database      *
 *   after this sequence has been added. The configuration file      *
 *   determines which of those possible data are reported and in     *
 *   what format. Updates no values.                                 *
 *                                                                   *
 * Input:  const Config &cfg  Configuration information              *
 *         const int total_pairs_read Total number of pairs read     *
 *                            from the input stream from any data    *
 *                            stream, not just this one              *
 *         const int db_size  Number of unique sequences             *
 *                            in the database                        *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ReportNewSeq(const Config &cfg, const int total_pairs_read,
                          const int db_size) const
{
  for (int i = 0; i < cfg.num_fvars; i++) {
    switch (cfg.write_val[i]) {
    case 'd':
      printf(cfg.fmt_str[i], db_size); break;
    case 'i':
      printf(cfg.fmt_str[i], num_pairs_read); break;
    case 'p':
      printf(cfg.fmt_str[i], total_pairs_read); break;
    case 's':
      printf(cfg.fmt_str[i], ext_sid); break;
    }
  }
  printf(cfg.fmt_str[cfg.num_fvars]);
}
61
11/wywolania/Data/stide_v1.1/Seq-code/seq_stream.h
Executable file
@ -0,0 +1,61 @@
#ifndef __STREAM_H
#define __STREAM_H

#include "../Utils/arrays.h"
#include "seq_config.h"
#include "flexitree.h"

class Stream {
public:
  Stream() {};
  void Init(const Config &cfg, const int intern_id, const int
            extern_id);
  void Append(const int next_value);
  int AddToDB(SeqForest &normal, int &db_size, int total_pairs_read,
              const Config &cfg) const;
  void CompareSeq(const Config &cfg, const SeqForest &normal, const
                  int total_pairs_read);
  int GetMaxHDist(void) {return max_hdist;}
  int GetMaxLFC(void) {return max_lfc;}
  int Ready(void) {return ready;}
  int GetNumAnoms(void) {return num_anoms;}
  int GetNumPairsRead(void) {return num_pairs_read;}
  int GetNumSeqsFnd(void) {return num_seqs_fnd;}
private:
  Array<int> current_seq;  // current sequence being filled or
                           // processed
  int num_in_seq;          // current_seq is full up through
                           // num_in_seq
  int num_pairs_read;      // the number of input pairs belonging to
                           // this stream that have been read so far
  int num_anoms;           // the number of anomalies found so far
  int num_seqs_fnd;        // the number of (not necessarily unique)
                           // sequences belonging to this stream
                           // found so far
  int ext_sid;             // the external stream id
  int int_sid;             // the internal stream id
  int max_hdist;           // the largest minimum Hamming distance
                           // found in this stream
  int seq_hdist;           // the minimum Hamming distance for
                           // current_seq
  Array<int> lf;           // array for locality frame
  int seq_lfc;             // the locality frame count for this
                           // sequence
  int max_lfc;             // the largest locality frame count
                           // encountered so far
  int ready;               // a flag to indicate whether this stream
                           // has a full sequence ready to be
                           // processed. 0 = no, 1 = yes.
  int seq_len;             // sequence length
  int ComputeMisses(const SeqForest &normal);
  void ComputeHDist(const SeqForest &normal);
  void ComputeLF(const int is_anom, const int lf_size);
  void ReportSeq(const Config &cfg, const int total_pairs_read,
                 const int is_anom) const;
  void ReportNewSeq(const Config &cfg, const int total_pairs_read,
                    const int db_size) const;
};

#endif
574
11/wywolania/Data/stide_v1.1/Seq-code/stide.cc
Executable file
@ -0,0 +1,574 @@
/*********************************************************************
 *                                                                   *
 * STIDE: Sequence Time-Delay Embedding v1.1                         *
 *                                                                   *
 * Written by Steve Hofmeyr 7/21/96                                  *
 * Revised by Julie Rehmeyer 3/98                                    *
 *                                                                   *
 * Copyright (C) 1996, 1998 Regents of the University of New Mexico. *
 * All Rights Reserved.                                              *
 *                                                                   *
 * This program is free software; you can redistribute it and/or     *
 * modify it under the terms of the GNU General Public License as    *
 * published by the Free Software Foundation; either version 2 of    *
 * the License, or (at your option) any later version.               *
 *                                                                   *
 * This program is distributed in the hope that it will be useful,   *
 * but WITHOUT ANY WARRANTY; without even the implied warranty of    *
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the      *
 * GNU General Public License for more details.                      *
 *                                                                   *
 * You should have received a copy of the GNU General Public         *
 * License along with this program; if not, write to the Free        *
 * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139,     *
 * USA.                                                              *
 *                                                                   *
 *********************************************************************/

#include <stdlib.h>
#include <string.h>
#include <iostream.h>
#include <fstream.h>
#include "../Utils/arrays.h"
#include "../Utils/hash.h"
#include "seq_config.h"
#include "seq_stream.h"
#include "flexitree.h"

#define DBREV 1

int counter = 0;

Stream *GetReadyStream(Array<Stream> &streams, HashTableInt
                       &sid_table, int &num_streams_fnd, int
                       &total_pairs_read, const Config &cfg);
int ReadDB(SeqForest &db_forest, const string &db_name,
           int &seq_len);
void WriteDB(const SeqForest &db_forest, const string &db_name, const
             int db_size, const int seq_len);
void FinalReport(const Config &cfg, const SeqForest &normal, const int
                 num_streams_fnd, const int num_seqs_added, const
                 Array<Stream> &streams, const int db_size);
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
                  const int db_size);
void OutputGraph(const SeqForest &db_forest, string db_name);
int GetPrimeLargerThan(const int n);

/*********************************************************************
 * main()                                                            *
 * Input:  int argc:     Number of command-line arguments            *
 *         char *argv[]: array of strings containing                 *
 *                       command-line arguments                      *
 * Output: 0 if successful, -1 if unsuccessful                       *
 *********************************************************************/

int main(int argc, char *argv[])

{
  Config cfg((const int) argc, (const char **) argv);
                           // Declare configuration object and do
                           // the configuration on the basis of the
                           // command line arguments and the
                           // configuration file
  Stream *active_stream;   // This will point to the stream that
                           // currently has a sequence to be worked
                           // on (either added to the database or
                           // compared).
  HashTableInt sid_table(GetPrimeLargerThan(cfg.max_streams));
                           // Hash table relating external stream ids to
                           // internal sids; make size of table
                           // smallest prime larger than the number
                           // of streams
  SeqForest normal(cfg.max_elements);     // Uninitialized forest of
                                          // normal sequences
  Array<Stream> streams(cfg.max_streams); // Array of stream objects,
                                          // one for each data stream
                                          // in input, which are
                                          // allocated as needed
  int num_streams_fnd = 0;                // Number of data streams
                                          // encountered to date
  int total_pairs_read = cfg.pair_offset; // Number of pairs read from
                                          // input to date from all
                                          // the data streams combined
                                          // -- can be offset using
                                          // the "-n" switch
  int db_size;                            // Total number of unique
                                          // sequences in the database
  int init_db_size = 0;                   // Number of unique
                                          // sequences in the
                                          // pre-existing database


  // Read database into normal, if database exists
  db_size = init_db_size = ReadDB(normal, cfg.db_name, cfg.seq_len);

  if (cfg.add_to_db) {
    while ((active_stream =
            GetReadyStream(streams, sid_table, num_streams_fnd,
                           total_pairs_read, cfg))
           != NULL) {
      active_stream->AddToDB(normal, db_size, total_pairs_read, cfg);
    }
    WriteDB(normal, cfg.db_name, db_size, cfg.seq_len);
    if (cfg.output_graph) {
      OutputGraph(normal,cfg.db_name);
    }

  }
  else {
    int i = 0;
    while ((active_stream =
            GetReadyStream(streams, sid_table, num_streams_fnd,
                           total_pairs_read, cfg))
           != NULL) {
      active_stream->CompareSeq(cfg, normal, total_pairs_read);
    }
  }
  FinalReport(cfg, normal, num_streams_fnd, db_size - init_db_size,
              streams, db_size);
  return(0);
}

/**********************************************************************
 * GetReadyStream()                                                   *
 *   This function reads a pair from the input, appends the element   *
 *   to the current sequence string in the appropriate data stream,   *
 *   finds out if that data stream has a complete sequence to be      *
 *   processed, continues until it has found such a data stream, and  *
 *   returns a pointer to it. It updates num_streams_fnd,             *
 *   total_pairs_read, sid_table, and streams.                        *
 *                                                                    *
 * Input:  Array<Stream> &streams: the array of streams that we have  *
 *                                 found so far                       *
 *         HashTableInt &sid_table: hash table relating external sids *
 *                                  to internal sids                  *
 *         int &num_streams_fnd: the number of streams found so far   *
 *         int &total_pairs_read: the number of pairs read from the   *
 *                                input stream so far                 *
 *         const Config &cfg: configuration information               *
 *                                                                    *
 * Output: a pointer to the next stream that is ready for processing  *
 *         (NULL once the input is exhausted)                         *
 **********************************************************************/

Stream *GetReadyStream(Array<Stream> &streams, HashTableInt
                       &sid_table, int &num_streams_fnd, int
                       &total_pairs_read, const Config &cfg)

{
  Stream *ready_stream = NULL;
  int ext_sid;
  int int_sid;
  int sval;

  cin >> ext_sid;
  while (!cin.eof()) {
    if (ext_sid == -1) {
      break;
    }
    int_sid = sid_table.ExtToInt(ext_sid, num_streams_fnd);
    cin >> sval;
    ++total_pairs_read;

    // Update num_streams_fnd, if necessary
    if (int_sid >= num_streams_fnd) {
      // streams is created with cfg.max_streams elements, so the
      // largest valid index is cfg.max_streams - 1
      if (int_sid >= cfg.max_streams) {
        cerr<<"ERROR: Too many streams to follow, aborting..."<<endl;
        exit(-1);
      }
      // We need a new stream object
      streams[num_streams_fnd].Init(cfg, int_sid, ext_sid);
      num_streams_fnd = int_sid + 1;
    }
    streams[int_sid].Append(sval);
    if (streams[int_sid].Ready()) {
      ready_stream = &streams[int_sid];
      break;
    }
    cin >> ext_sid;
  }

  return ready_stream;
}

/*********************************************************************
 * ReadDB()                                                          *
 *   Reads the database from a file and returns the number of unique *
 *   sequences in the database. Checks for appropriate revision      *
 *   number. If it is a revision DBREV database, the second line     *
 *   will be "#DBseq_len: " followed by the sequence length. The     *
 *   next line will contain a single number, giving the root of the  *
 *   first tree. The following lines will contain the tree itself.   *
 *   The first seq_len numbers make up the first sequence (so the    *
 *   first number of the second line will be the same as the number  *
 *   on the first line). The next number will be a negative number   *
 *   between -(seq_len-1) and -2, indicating how far to backtrack in *
 *   the first sequence, and the following positive numbers give the *
 *   rest of the second sequence. So, for example, -3 would mean     *
 *   backtrack 3 numbers, take the previous numbers including the    *
 *   one you're on, and append the next two numbers. So after the    *
 *   -3 you would find two positive numbers, followed by a negative  *
 *   number (which you would use the same way as you used the -3, on *
 *   the most recent sequence). Each tree is terminated by the       *
 *   number -1. So the sample input file                             *
 *       3                                                           *
 *       3 4 2 9 10 3 -4 3 9 8 -2 3 -3 4 9 -1                        *
 *       2                                                           *
 *       2 3 4 5 6 7 -3 2 9 -1                                       *
 *   yields the sequences:                                           *
 *       3 4 2 9 10 3                                                *
 *       3 4 2 3 9 8                                                 *
 *       3 4 2 3 9 3                                                 *
 *       3 4 2 3 4 9                                                 *
 *       2 3 4 5 6 7                                                 *
 *       2 3 4 5 2 9                                                 *
 *                                                                   *
 * Input:  SeqForest &db_forest   Forest of sequences                *
 *         const string &db_name  Name of database                   *
 *         int &seq_len           User-specified sequence length     *
 *                                                                   *
 * Output: the number of unique sequences in the database            *
 *                                                                   *
 *********************************************************************/

int ReadDB(SeqForest &db_forest, const string &db_name,
           int &seq_len)
{
  ifstream in_db_file(db_name.c_str()); // file to read the database from
  int db_size = 0;                      // size of the database
  int root;                             // the first element of the sequences
                                        // we are reading in at the moment;
                                        // i.e., the root of this tree
  string buff;
  int db_seq_len;
  int rev_num;

  if (!in_db_file.is_open()) {
    cerr<<"WARNING: Cannot open database file " << db_name
        << " for input"<<endl<<"Creating a new file"<<endl;
    return 0;
  }

  // Check to see if the first line contains "#DBrev:"
  in_db_file>>buff;
  if (buff == "#DBrev:") {
    in_db_file>>rev_num;
    if (rev_num > DBREV) {
      cerr << "ERROR: The revision number is greater than " << DBREV
           << ". This version of STIDE is only capable of dealing "
           << "with databases through DBrev " << DBREV
           << ". Aborting..."<<endl;
      exit(-1);
    }
    if (rev_num < DBREV) {
      cerr << "ERROR: Revision number of database must be >= " << DBREV
           << endl;
      exit(-1);
    }
    // Now we know that it is revision DBREV. Check sequence length of
    // database against user-indicated sequence length
    in_db_file>>buff;
    // Now check to see if next line is "#DBseq_len: " followed by a
    // number
    if (buff != "#DBseq_len:") {
      cerr << "ERROR: The second line of the database does not "
           << "contain the string \"#DBseq_len: \"" << endl
           << "followed by the sequence length of the database, as "
           << "required of revision " << DBREV
           << " databases. Aborting..."<< endl;
      exit(-1);
    }
    in_db_file>>db_seq_len;
    if (db_seq_len != seq_len) {
      cerr << "WARNING: Database sequence length is " << db_seq_len
           << ", which does not match "
           << "sequence length specified" << endl
           << "by user (or by default if no specification was given), "
           << "which is " << seq_len << endl
           << "I will use the database sequence length. If that is "
           << "not what you intended, type Ctrl-C to abort." << endl;
      seq_len = db_seq_len;
    }
    // Read next number into root
    in_db_file >> root;
  }
  // Otherwise, we assume we have an old-style database, and let the
  // user know that that's our assumption
  else {
    cerr << "WARNING: The string \"#DBrev: \" is not in the first "
         << "line of the database." << endl
         << "I'm assuming that it's an older style of database, and "
         << "will read it in" << endl
         << "based on that assumption. If that is not what you want "
         << "me to do, type CTRL-C" << endl << endl;
    // we have just read the first root into buff -- put it in root
    // instead
    root = atoi(buff.c_str());
  }

  while (!in_db_file.eof()) {
    if (root == -1) break;
    db_forest.trees_found[root]++;
    in_db_file>>db_forest.trees[root];
    db_size += db_forest.trees[root].NumLeaves();
    in_db_file>>root;
  }
  in_db_file.close();

  return db_size;
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteDB() *
|
||||||
|
* Writes db_forest to the file db_name, with the format described *
|
||||||
|
* in the header of ReadDB(). Prints database statistics at the *
|
||||||
|
* end of the file. *
|
||||||
|
* *
|
||||||
|
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||||
|
* database *
|
||||||
|
* const string &db_name Name of file in which to *
|
||||||
|
* put database. *
|
||||||
|
* const int db_size Number of unique sequences *
|
||||||
|
* in the database *
|
||||||
|
* const int seq_len Sequence length *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||||
|
int db_size, const int seq_len)
|
||||||
|
{
|
||||||
|
|
||||||
|
ofstream out_db_file(db_name.c_str());
|
||||||
|
|
||||||
|
if (!out_db_file.is_open()) {
|
||||||
|
cerr << "ERROR: Cannot open database file " << db_name
|
||||||
|
<< " for output, aborting..." << endl ;
|
||||||
|
exit(-2);
|
||||||
|
}
|
||||||
|
out_db_file << "#DBrev: " << DBREV << endl;
|
||||||
|
out_db_file << "#DBseq_len: " << seq_len << endl;
|
||||||
|
|
||||||
|
for (int i = 0; i < db_forest.trees.Size(); i++) {
|
||||||
|
if (db_forest.trees_found[i]) {
|
||||||
|
out_db_file<<i<<endl;
|
||||||
|
out_db_file<<db_forest.trees[i]<<endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
out_db_file<<" -1"<<endl;
|
||||||
|
// we can now write anything, so I will write the db stats
|
||||||
|
out_db_file<<"; DB STATS"<<endl;
|
||||||
|
WriteDBStats(db_forest, out_db_file, db_size);
|
||||||
|
out_db_file.close();
|
||||||
|
|
||||||
|
}
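The header that WriteDB() emits can be checked with a few stream reads. The following is only a sketch under stated assumptions: `ReadDBHeader` is a hypothetical helper, not part of stide, and the revision is read as a plain string because the type of `DBREV` is not visible here.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of stide): parses the two header lines
// written by WriteDB(). Returns true only when both tags are present.
bool ReadDBHeader(std::istream &in, std::string &rev, int &seq_len)
{
  std::string tag;
  // First line: "#DBrev: <revision>"
  if (!(in >> tag) || tag != "#DBrev:") return false;
  if (!(in >> rev)) return false;
  // Second line: "#DBseq_len: <sequence length>"
  if (!(in >> tag) || tag != "#DBseq_len:") return false;
  if (!(in >> seq_len)) return false;
  return true;
}
```

An old-style database, which starts directly with a root number, makes the helper return false, so a caller could fall back to the legacy path in that case.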
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* FinalReport() *
|
||||||
|
* Reports data at end of run. The number of streams, the number *
|
||||||
|
* of input pairs, and the number of sequences in the input are *
|
||||||
|
* always reported. If we have done a comparison run, we report *
|
||||||
|
* the number of anomalies, and the percentage of sequences that *
|
||||||
|
* were anomalous. Additionally, if asked for, the Hamming *
|
||||||
|
* distance or locality frame count is reported. If we have added *
|
||||||
|
* to the database, we report having done so and report the number *
|
||||||
|
* of sequences added. If database statistics are asked for, we *
|
||||||
|
* report the number of nodes, the number of unique sequences, the *
|
||||||
|
* number of branches, and the average database branch factor. *
|
||||||
|
* *
|
||||||
|
* Input: const Config &cfg: Configuration information *
|
||||||
|
* const SeqForest &normal: DB of normal sequences *
|
||||||
|
* const int num_streams_fnd: Total number of streams found*
|
||||||
|
* const int num_seqs_added: Number of unique sequences *
|
||||||
|
* added *
|
||||||
|
* const Array<Stream> &streams: Array of data streams *
|
||||||
|
* const int db_size: Number of unique sequences *
|
||||||
|
* in DB *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
* *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||||
|
num_streams_fnd, const int num_seqs_added, const
|
||||||
|
Array<Stream> &streams, const int db_size)
|
||||||
|
{
|
||||||
|
int total_pairs = 0;
|
||||||
|
int total_seqs = 0;
|
||||||
|
int total_anoms = 0;
|
||||||
|
int total_max_lfc = 0;
|
||||||
|
int total_max_hdist = 0;
|
||||||
|
int db_nodes = 0;
|
||||||
|
int db_seqs = 0;
|
||||||
|
int db_branches = 0;
|
||||||
|
int j;
|
||||||
|
|
||||||
|
// Sum up number of pairs input and number of seqs from all the streams
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
total_seqs += streams[j].GetNumSeqsFnd();
|
||||||
|
total_pairs += streams[j].GetNumPairsRead();
|
||||||
|
}
|
||||||
|
|
||||||
|
cout << endl;
|
||||||
|
cout << "Number of different streams in input = "
|
||||||
|
<< num_streams_fnd << endl;
|
||||||
|
cout << "Total number of input pairs = "
|
||||||
|
<< total_pairs << endl;
|
||||||
|
cout << "Total number of sequences in input = "
|
||||||
|
<< total_seqs << endl;
|
||||||
|
|
||||||
|
if (cfg.add_to_db) {
|
||||||
|
cout << "File added to database" << endl;
|
||||||
|
cout << "Number of new sequences added to the database: "
|
||||||
|
<< num_seqs_added << endl;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
cout << "Scan completed" << endl;
|
||||||
|
// Sum up number of anomalies from all the streams
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
total_anoms += streams[j].GetNumAnoms();
|
||||||
|
}
|
||||||
|
|
||||||
|
cout << "Number of anomalies = "
|
||||||
|
<< total_anoms << endl;
|
||||||
|
cout << "Percentage anomalous = "
|
||||||
|
<< ((float)total_anoms * 100.0)/total_seqs << endl;
|
||||||
|
|
||||||
|
// If asked for, compute Hamming distances across streams and report
|
||||||
|
if (cfg.compute_hdist) {
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
if (streams[j].GetMaxHDist() > total_max_hdist) {
|
||||||
|
total_max_hdist = streams[j].GetMaxHDist();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << "Largest minimum Hamming distance = "
|
||||||
|
<< total_max_hdist << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If asked for, compute lfc across streams and report
|
||||||
|
if (cfg.lf_size > 1) {
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
if (streams[j].GetMaxLFC() > total_max_lfc) {
|
||||||
|
total_max_lfc = streams[j].GetMaxLFC();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << "Maximum lfc = " << total_max_lfc << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// If asked for, compute db stats and report
|
||||||
|
if (cfg.write_db_stats) {
|
||||||
|
WriteDBStats(normal, cout, db_size);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteDBStats() *
|
||||||
|
* Computes and writes to out_stream the number of nodes in *
|
||||||
|
* the database, the number of unique sequences, the number of *
|
||||||
|
* branches, and the average database branch factor. *
|
||||||
|
* *
|
||||||
|
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||||
|
* database *
|
||||||
|
* ostream &out_stream Where to write info *
|
||||||
|
* const int db_size Number of unique sequences in the *
|
||||||
|
* database *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||||
|
const int db_size)
|
||||||
|
{
|
||||||
|
int db_nodes = 0;
|
||||||
|
int db_branches = 0;
|
||||||
|
|
||||||
|
for (int i = 0; i < db_forest.trees.Size(); i++) {
|
||||||
|
if (db_forest.trees_found[i]) {
|
||||||
|
db_nodes += db_forest.trees[i].NumNodes();
|
||||||
|
db_branches += db_forest.trees[i].NumBranches();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
out_stream << "Number of DB nodes = " << db_nodes << endl;
|
||||||
|
out_stream << "Number of unique sequences = "<<db_size << endl;
|
||||||
|
out_stream << "Number of branches (edges) = "<<db_branches << endl;
|
||||||
|
out_stream << "Average DB branch factor = "
|
||||||
|
<<((float)db_branches/(db_nodes - db_size))<<endl;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* OutputGraph() *
|
||||||
|
* Writes a file db_name.dot containing input for the program Dot. *
|
||||||
|
* Running Dot on db_name.dot produces a PostScript file *
|
||||||
|
* containing a picture of the whole database tree. *
|
||||||
|
* *
|
||||||
|
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||||
|
* database *
|
||||||
|
* const string db_name Filename to use *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void OutputGraph(const SeqForest &db_forest, const string db_name)
|
||||||
|
{
|
||||||
|
char *dot_filename;
|
||||||
|
dot_filename = new char [strlen(db_name.c_str())+5]; // 4 for ".dot" plus 1 for the terminating NUL
|
||||||
|
strcpy(dot_filename, db_name.c_str());
|
||||||
|
ofstream output_file(strcat(dot_filename,".dot"));
|
||||||
|
|
||||||
|
output_file<<"digraph \""<<db_name<<"\" {"<<endl;
|
||||||
|
output_file<<" ratio=auto;"<<endl;
|
||||||
|
output_file<<" page=\"8.5,11\";"<<endl;
|
||||||
|
for (int i = 0; i < db_forest.trees.Size(); i++) {
|
||||||
|
if (db_forest.trees_found[i])
|
||||||
|
db_forest.trees[i].OutputGraph(output_file);
|
||||||
|
}
|
||||||
|
output_file<<"}"<<endl;
|
||||||
|
output_file.close();
delete[] dot_filename; // release the buffer allocated for the .dot filename
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/****************************************************************************
|
||||||
|
* GetPrimeLargerThan(int n) *
|
||||||
|
* Returns the smallest prime larger than the input integer. *
|
||||||
|
* Changes no values. *
|
||||||
|
* *
|
||||||
|
* Input: const int n *
|
||||||
|
* Output: smallest prime larger than n *
|
||||||
|
***************************************************************************/
|
||||||
|
|
||||||
|
int GetPrimeLargerThan(const int n)
|
||||||
|
{
|
||||||
|
int primes[n];
|
||||||
|
int primes_fnd = 1;
|
||||||
|
int curr_num = 3;
|
||||||
|
int is_prime = 1;
|
||||||
|
|
||||||
|
primes[0] = 2;
|
||||||
|
while(1) {
|
||||||
|
for (int i = 0; i < primes_fnd; i++) {
|
||||||
|
if ((curr_num % primes[i]) == 0) {
|
||||||
|
is_prime = 0;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (is_prime == 1) {
|
||||||
|
primes[primes_fnd++] = curr_num;
|
||||||
|
if (curr_num > n) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
curr_num = curr_num + 2;
|
||||||
|
is_prime = 1;
|
||||||
|
}
|
||||||
|
return curr_num;
|
||||||
|
}
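GetPrimeLargerThan() declares `int primes[n]`, a variable-length array that is a compiler extension rather than standard C++ and that is too small when n is less than 2. A minimal sketch of the same trial-division idea with `std::vector` follows. `NextPrimeAbove` is a hypothetical name, and unlike the original it returns 2 for inputs below 2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standard-C++ variant of GetPrimeLargerThan(): returns the
// smallest prime strictly greater than n, using trial division by the
// primes found so far.
int NextPrimeAbove(int n)
{
  if (n < 2) return 2;
  std::vector<int> primes(1, 2);      // primes discovered so far
  int curr_num = 3;
  while (true) {
    bool is_prime = true;
    for (std::size_t i = 0; i < primes.size(); i++) {
      // Stop once the divisor exceeds the square root of curr_num.
      if (primes[i] > curr_num / primes[i]) break;
      if (curr_num % primes[i] == 0) { is_prime = false; break; }
    }
    if (is_prime) {
      if (curr_num > n) return curr_num;
      primes.push_back(curr_num);
    }
    curr_num += 2;                    // only odd candidates after 2
  }
}
```

The vector grows as primes are found, so no upper bound on their count has to be guessed in advance.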
|
||||||
|
|
||||||
|
|
||||||
|
|
35
11/wywolania/Data/stide_v1.1/Seq-code/template.cc
Executable file
@ -0,0 +1,35 @@
|
|||||||
|
#include "../Utils/arrays.cc"
|
||||||
|
#include "../Utils/tll.cc"
|
||||||
|
#include "../Utils/hash.cc"
|
||||||
|
#include "seq_stream.h"
|
||||||
|
#include "flexitree.h"
|
||||||
|
#include "opt_info.h"
|
||||||
|
|
||||||
|
/*
|
||||||
|
template class List<FlexiTree>;
|
||||||
|
template class LLNode<FlexiTree>;
|
||||||
|
template class LinkedList<FlexiTree>;
|
||||||
|
*/
|
||||||
|
template class Array<FlexiTree>;
|
||||||
|
|
||||||
|
template class List<int>;
|
||||||
|
template class LinkedList<int>;
|
||||||
|
template class Array<int>;
|
||||||
|
template class Array<LinkedList<int> >;
|
||||||
|
|
||||||
|
template class List<HashItem>;
|
||||||
|
template class LLNode<HashItem>;
|
||||||
|
template class LinkedList<HashItem>;
|
||||||
|
template class Array<LinkedList<HashItem> >;
|
||||||
|
|
||||||
|
template class List<HashItemInt>;
|
||||||
|
template class LLNode<HashItemInt>;
|
||||||
|
template class LinkedList<HashItemInt>;
|
||||||
|
template class Array<LinkedList<HashItemInt> >;
|
||||||
|
template class Array<HashItemInt>;
|
||||||
|
|
||||||
|
template class Array<Stream>;
|
||||||
|
template class Array<char*>;
|
||||||
|
template class Array<OptInfo>;
|
||||||
|
|
||||||
|
|
141
11/wywolania/Data/stide_v1.1/Utils/arrays.cc
Executable file
@ -0,0 +1,141 @@
|
|||||||
|
// **********
|
||||||
|
// ARRAYS.CPP
|
||||||
|
// **********
|
||||||
|
|
||||||
|
#include <iostream.h>
|
||||||
|
#include <assert.h>
|
||||||
|
|
||||||
|
#include "arrays.h"
|
||||||
|
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> void Array<T>::Init(const Array<T> &t) {
|
||||||
|
Allocate(t.size);
|
||||||
|
assert(size == t.size);
|
||||||
|
for (int i = 0; i < size; i++)
|
||||||
|
data[i] = t.data[i];
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> void Array<T>::Allocate(int as) {
|
||||||
|
// if previously allocated, delete old dynamic array
|
||||||
|
if (size) delete[] data;
|
||||||
|
size = as;
|
||||||
|
data = new T[size];
|
||||||
|
assert(data);
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> Array<T>::~Array() {
|
||||||
|
delete[] data;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> T &Array<T>::operator[](int i) const {
|
||||||
|
if (i < 0) { cout<<"ERROR in []: "<<i<<" < 0"<<endl; exit(-1); }
|
||||||
|
if (i >= size) { cout<<"ERROR in []: "<<i<<" >= "<<size<<endl; exit(-1); }
|
||||||
|
assert(i >= 0);
|
||||||
|
assert(i < size);
|
||||||
|
return data[i];
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> T &Array<T>::Data(int i) {
|
||||||
|
if (i < 0) { cout<<"ERROR in Data: "<<i<<" < 0"<<endl; exit(-1); }
|
||||||
|
if (i >= size) { cout<<"ERROR in Data: "<<i<<" >= "<<size<<endl; exit(-1); }
|
||||||
|
assert(i >= 0);
|
||||||
|
assert(i < size);
|
||||||
|
return data[i];
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> Array<T> &Array<T>::operator = (const Array<T> &t) {
|
||||||
|
if (!size) // if the object is not yet allocated, do it and then assign
|
||||||
|
Allocate(t.size);
|
||||||
|
assert(size == t.size);
|
||||||
|
for (int i = 0; i < size; i++)
|
||||||
|
data[i] = t.data[i];
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> int Array<T>::Size() const {
|
||||||
|
return size;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> ostream &operator<<(ostream &s, const Array<T> &t) {
|
||||||
|
for (int i =0; i < t.size; i++)
|
||||||
|
s<<t.data[i]<<" ";
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> void Array<T>::Set(T t) {
|
||||||
|
for (int i =0; i < size; i++)
|
||||||
|
data[i] = t;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
//=============================================================================
|
||||||
|
// HeapSort data[0..size-1] DESCENDING
|
||||||
|
template <class T> void SortableArray<T>::Sort() {
|
||||||
|
// build the heap
|
||||||
|
for (int i = Size()-1; i >= 0; i--)
|
||||||
|
Adjust(i, Size()-1);
|
||||||
|
|
||||||
|
for (int i = Size()-1; i >= 1; i--) {
|
||||||
|
// swap data
|
||||||
|
T temp1 = Data(0);
|
||||||
|
Data(0) = Data(i);
|
||||||
|
Data(i) = temp1;
|
||||||
|
|
||||||
|
Adjust(0, i-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T> void SortableArray<T>::Adjust(int root, int last) {
|
||||||
|
if (2*root <= last) {
|
||||||
|
int child = 2*root;
|
||||||
|
if ((child+1) <= last) {
|
||||||
|
if (Data(child+1) < Data(child))
|
||||||
|
child++;
|
||||||
|
}
|
||||||
|
if (Data(child) < Data(root)) {
|
||||||
|
T temp = Data(root);
|
||||||
|
Data(root) = Data(child);
|
||||||
|
Data(child) = temp;
|
||||||
|
|
||||||
|
Adjust(child, last);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
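The child indexing in Adjust() (children of `root` at `2*root` and `2*root+1` on a zero-based array) looks unusual, since index 0 ends up with the single child 1. A stand-alone sketch on `std::vector<int>` suggests the scheme still yields a descending sort. `AdjustSketch` and `SortDescending` are hypothetical names that mirror the logic above rather than the class itself.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sift-down with the same child indexing as Adjust(): the children of
// root are 2*root and 2*root+1, so index 0 has the single child 1.
void AdjustSketch(std::vector<int> &data, int root, int last)
{
  if (2 * root <= last) {
    int child = 2 * root;
    if (child + 1 <= last && data[child + 1] < data[child]) child++;
    if (data[child] < data[root]) {
      std::swap(data[root], data[child]);
      AdjustSketch(data, child, last);
    }
  }
}

// Min-heap based heap sort: the smallest remaining element is moved to
// the end on each pass, which leaves the data in descending order.
void SortDescending(std::vector<int> &data)
{
  int sz = static_cast<int>(data.size());
  for (int i = sz - 1; i >= 0; i--) AdjustSketch(data, i, sz - 1);
  for (int i = sz - 1; i >= 1; i--) {
    std::swap(data[0], data[i]);
    AdjustSketch(data, 0, i - 1);
  }
}
```

Every node's children have larger indices than the node itself, so the highest index is always a leaf and can be swapped out of the heap safely.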
|
||||||
|
//=============================================================================
|
||||||
|
//=============================================================================
|
||||||
|
// HeapSort data[0..size-1] DESCENDING
|
||||||
|
template <class T, class C> void CompSortableArray<T, C>::Sort() {
|
||||||
|
// build the heap
|
||||||
|
int sz = Size();
|
||||||
|
for (int i = sz-1; i >= 0; i--) {
|
||||||
|
Adjust(i, sz-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (int i = sz-1; i >= 1; i--) {
|
||||||
|
// do the swap
|
||||||
|
T temp1 = Data(0);
|
||||||
|
Data(0) = Data(i);
|
||||||
|
Data(i) = temp1;
|
||||||
|
Adjust(0, i-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class T, class C> void CompSortableArray<T, C>::Adjust(int root,
|
||||||
|
int last) {
|
||||||
|
if (2*root <= last) {
|
||||||
|
int child = 2*root;
|
||||||
|
if ((child+1) <= last) {
|
||||||
|
if (comp_ptr->Compare(Data(child+1), Data(child)) == -1) {
|
||||||
|
child++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (comp_ptr->Compare(Data(child), Data(root)) == -1) {
|
||||||
|
T temp = Data(root);
|
||||||
|
Data(root) = Data(child);
|
||||||
|
Data(child) = temp;
|
||||||
|
|
||||||
|
Adjust(child, last);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
|
109
11/wywolania/Data/stide_v1.1/Utils/arrays.h
Executable file
@ -0,0 +1,109 @@
|
|||||||
|
// ********
|
||||||
|
// ARRAYS.H
|
||||||
|
// ********
|
||||||
|
|
||||||
|
#ifndef ARRAYS_H
|
||||||
|
#define ARRAYS_H
|
||||||
|
|
||||||
|
//#define PC
|
||||||
|
|
||||||
|
#include <iostream.h>
|
||||||
|
#include "errors.h"
|
||||||
|
|
||||||
|
// this is a template for all classes which use an array of objects
|
||||||
|
// it is dynamically allocated
|
||||||
|
template <class T> class Array {
|
||||||
|
public:
|
||||||
|
Array() {Init();}
|
||||||
|
Array(const Array<T> &t) {Init(t);} // the copy constructor
|
||||||
|
Array(int asize) {Init(); Allocate(asize);} // creates an array of size "asize"
|
||||||
|
~Array(); // the destructor deletes all internal
|
||||||
|
// objects
|
||||||
|
void Allocate(int asize); // allocates asize objects
|
||||||
|
// if data was already allocated,
|
||||||
|
// deletes and re-allocates
|
||||||
|
T &operator[](int i) const;
|
||||||
|
Array<T> &operator = (const Array<T> &t);
|
||||||
|
// copies one array to another,
|
||||||
|
// requires that the assignment
|
||||||
|
// operator be defined for array
|
||||||
|
// elements
|
||||||
|
void Set(T t); // sets all elements to t
|
||||||
|
int Size() const; // returns the size of the array
|
||||||
|
friend ostream &operator<<(ostream &s, const Array<T> &t);
|
||||||
|
protected:
|
||||||
|
T &Data(int i); // method derived class can use for
|
||||||
|
// accessing data
|
||||||
|
void Init() {size = 0; data = NULL;}// default initializer
|
||||||
|
void Init(const Array<T> &t); // implements copy constructor
|
||||||
|
private:
|
||||||
|
int size; // the size of the array
|
||||||
|
T *data; // ptr to the array of objects
|
||||||
|
};
|
||||||
|
|
||||||
|
// this is a template for sortable arrays of objects, i.e. the objects provide
|
||||||
|
// a less than comparison operator, which is used in the Sort method to perform
|
||||||
|
// a heap sort
|
||||||
|
template <class T> class SortableArray : public Array<T> {
|
||||||
|
public:
|
||||||
|
SortableArray() {Init();}
|
||||||
|
SortableArray(const SortableArray<T> &t) {Init(t);}
|
||||||
|
SortableArray(int asize) {Allocate(asize);}
|
||||||
|
void Sort(); // performs a heapsort on the data,
|
||||||
|
// using the < operator
|
||||||
|
protected:
|
||||||
|
void Adjust(int root, int last); // for the heap sort
|
||||||
|
};
|
||||||
|
|
||||||
|
// this is a template for sortable arrays of objects, but the comparison
|
||||||
|
// operator is provided by another class C
|
||||||
|
template <class T, class C> class CompSortableArray : public Array<T> {
|
||||||
|
public:
|
||||||
|
CompSortableArray() {Init(); comp_ptr = NULL;}
|
||||||
|
CompSortableArray(int asize, C *c_ptr) {Allocate(asize); comp_ptr = c_ptr;}
|
||||||
|
void Sort(); // performs a heapsort on the data,
|
||||||
|
// using comp_ptr->Compare
|
||||||
|
protected:
|
||||||
|
C *comp_ptr; // a ptr to the object with the Compare
|
||||||
|
// method
|
||||||
|
void Adjust(int root, int last);
|
||||||
|
};
|
||||||
|
|
||||||
|
// this is a template for a multidimensional array of one type of object
|
||||||
|
// when declaring this one must specify the number of dimensions first,
|
||||||
|
// followed by the size for each array dimension
|
||||||
|
/*
|
||||||
|
template <class T> class MultiArray {
|
||||||
|
public:
|
||||||
|
MultiArray() {Init();}
|
||||||
|
MultiArray(const MultiArray<T> &t) {Init(t);} // the copy constructor
|
||||||
|
MultiArray(int dims, int x, ...); // a variable number of parameters
|
||||||
|
{Init(); Allocate(xsize, ysize);}
|
||||||
|
~Array2D(); // the destructor deletes all internal
|
||||||
|
// objects
|
||||||
|
void Allocate(int xsize, ...); // allocates x, y, ... size array
|
||||||
|
// if data was already allocated,
|
||||||
|
// deletes and re-allocates
|
||||||
|
T Data(int x, int y); // returns object in x,y location
|
||||||
|
Array2D<T> &operator = (const Array2D<T> &t);
|
||||||
|
// copies one array to another,
|
||||||
|
// requires that the assignment
|
||||||
|
// operator be defined for array
|
||||||
|
// elements
|
||||||
|
void Set(T t); // sets all elements to t
|
||||||
|
int XSize(); // returns the x size of the array
|
||||||
|
int YSize(); // returns the y size of the array
|
||||||
|
friend ostream &operator<<(ostream &s, const Array2D<T> &t);
|
||||||
|
protected:
|
||||||
|
T &Data(int i); // method derived class can use for
|
||||||
|
// accessing data
|
||||||
|
void Init() {size = 0; data = NULL;}// default initializer
|
||||||
|
void Init(const Array<T> &t); // implements copy constructor
|
||||||
|
protected:
|
||||||
|
Array<Array<T> > data;
|
||||||
|
};
|
||||||
|
*/
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
|
16
11/wywolania/Data/stide_v1.1/Utils/cstrings.h
Executable file
@ -0,0 +1,16 @@
|
|||||||
|
/*
|
||||||
|
This class implements strings. It is meant to offer all the functionality
|
||||||
|
of strings in C, so whenever a C function is needed that manipulates strings,
|
||||||
|
it must be coded into this.
|
||||||
|
*/
|
||||||
|
|
||||||
|
class String {
|
||||||
|
public:
|
||||||
|
String(void) {data = NULL; dsize = 0;}
|
||||||
|
String(char *init) {dsize = strlen(init) + 1; data = new char[dsize]; strcpy(data, init);} // +1 for the terminating NUL
|
||||||
|
~String(void) {if (dsize) delete[] data;}
|
||||||
|
|
||||||
|
private:
|
||||||
|
char *data;
|
||||||
|
int dsize;
|
||||||
|
};
|
20
11/wywolania/Data/stide_v1.1/Utils/errors.cc
Executable file
@ -0,0 +1,20 @@
|
|||||||
|
// **********
|
||||||
|
// ERRORS.CPP
|
||||||
|
// **********
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <iostream.h>
|
||||||
|
#include "errors.h"
|
||||||
|
|
||||||
|
void Error(const char *msg, ...) {
|
||||||
|
char buffer[150];
|
||||||
|
va_list ap;
|
||||||
|
va_start(ap, msg);
|
||||||
|
vsnprintf(buffer, sizeof(buffer), msg, ap); // bounded write: long messages are truncated, not overflowed
|
||||||
|
cout<<endl<<buffer<<endl;
|
||||||
|
va_end(ap);
|
||||||
|
exit(-1);
|
||||||
|
}
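Formatting a printf-style message into a fixed-size buffer is safest with a bounded call. The sketch below shows that pattern using `std::vsnprintf`. `FormatErrorText` is a hypothetical helper, and the 150-byte size simply mirrors the buffer in Error().

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats like printf into a fixed buffer. Messages
// longer than the buffer are truncated instead of overflowing it.
std::string FormatErrorText(const char *msg, ...)
{
  char buffer[150];
  va_list ap;
  va_start(ap, msg);
  std::vsnprintf(buffer, sizeof(buffer), msg, ap);  // always NUL-terminates
  va_end(ap);
  return std::string(buffer);
}
```

Returning the text instead of exiting keeps the formatting step testable on its own. A caller can still print the result and call `exit(-1)` afterwards.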
|
||||||
|
|
||||||
|
|
16
11/wywolania/Data/stide_v1.1/Utils/errors.h
Executable file
@ -0,0 +1,16 @@
|
|||||||
|
// ********
|
||||||
|
// ERRORS.H
|
||||||
|
// ********
|
||||||
|
|
||||||
|
#ifndef ERRORS_H
|
||||||
|
#define ERRORS_H
|
||||||
|
|
||||||
|
#include <stdarg.h>
|
||||||
|
#include <assert.h>
|
||||||
|
|
||||||
|
// this function takes a formatted character string and params like printf,
|
||||||
|
// prints a formatted message, and then aborts the program. It's used for
|
||||||
|
// trapping errors and halting execution.
|
||||||
|
void Error(const char *msg, ...);
|
||||||
|
|
||||||
|
#endif
|
182
11/wywolania/Data/stide_v1.1/Utils/hash.cc
Executable file
@ -0,0 +1,182 @@
|
|||||||
|
// hash.cpp
|
||||||
|
|
||||||
|
#include "hash.h"
|
||||||
|
|
||||||
|
//===========================================================================
|
||||||
|
HashItem::HashItem(char *s, int v) {
|
||||||
|
if (strlen(s) >= STR_LEN) { // str holds STR_LEN bytes including the terminating NUL
|
||||||
|
cout<<endl<<"Hash item string too long";
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
strcpy(str, s); value = v;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
void HashItem::Set(char *s, int v) {
|
||||||
|
if (strlen(s) >= STR_LEN) { // str holds STR_LEN bytes including the terminating NUL
|
||||||
|
cout<<endl<<"Hash item string too long";
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
strcpy(str, s); value = v;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
void HashTable::Insert(HashItem &h_item) {
|
||||||
|
int hash_index = HashFunc(h_item.str);
|
||||||
|
#ifdef DBG
|
||||||
|
cout<<hash_index;cout.flush();
|
||||||
|
#endif
|
||||||
|
data[hash_index].Insert(h_item);
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
int HashTable::Retrieve(HashItem &h_item) {
|
||||||
|
int hash_index = HashFunc(h_item.str);
|
||||||
|
HashItem *temp_item_ptr = data[hash_index].Search(h_item);
|
||||||
|
if (!temp_item_ptr) return 0;
|
||||||
|
else {
|
||||||
|
h_item = *temp_item_ptr;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
unsigned HashTable::HashFunc(char *str) {
|
||||||
|
unsigned k = 0;
|
||||||
|
for (int i = 0; i < strlen(str); i++) {
|
||||||
|
k += (unsigned)str[i] << ((i % 4) * 8); // wrap the shift so it stays below 32 bits
|
||||||
|
}
|
||||||
|
return (k % data.Size());
|
||||||
|
}
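A stand-alone sketch of the byte-folding string hash used by HashFunc(), written so that the shift count never reaches the width of a 32-bit `unsigned`. `HashString` and `table_size` are hypothetical names, and `table_size` stands in for `data.Size()`. The cast through `unsigned char` is an addition of this sketch so that bytes above 0x7F are not sign-extended.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-alone version of the string hash. table_size must be
// greater than zero. Assumes unsigned is at least 32 bits wide.
unsigned HashString(const char *str, unsigned table_size)
{
  unsigned k = 0;
  std::size_t len = std::strlen(str);   // computed once, not per iteration
  for (std::size_t i = 0; i < len; i++) {
    // Place each byte in one of four byte lanes, wrapping every four
    // characters so the shift is at most 24 bits.
    k += static_cast<unsigned>(static_cast<unsigned char>(str[i]))
         << ((i % 4) * 8);
  }
  return k % table_size;
}
```

Hoisting `strlen` out of the loop condition also avoids rescanning the string on every iteration.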
|
||||||
|
//=============================================================================
|
||||||
|
ostream &operator<<(ostream &s, HashTable &ht) {
|
||||||
|
for (int i = 0; i < ht.data.Size(); i++) {
|
||||||
|
if (!ht.data[i].Empty()) {
|
||||||
|
ht.data[i].Write(s);
|
||||||
|
s<<endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
// for int hash tables
|
||||||
|
//===========================================================================
|
||||||
|
HashItemInt &HashItemInt::operator = (const HashItemInt &h_item) {
|
||||||
|
key = h_item.key;
|
||||||
|
value = h_item.value;
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
void HashTableInt::Insert(HashItemInt &h_item) {
|
||||||
|
int hash_index = HashFunc(h_item.key);
|
||||||
|
data[hash_index].Insert(h_item);
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
int HashTableInt::Retrieve(HashItemInt &h_item) {
|
||||||
|
int hash_index = HashFunc(h_item.key);
|
||||||
|
HashItemInt *temp_item_ptr;
|
||||||
|
temp_item_ptr = data[hash_index].Search(h_item);
|
||||||
|
if (!temp_item_ptr) return 0;
|
||||||
|
else {
|
||||||
|
h_item = *temp_item_ptr;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
unsigned HashTableInt::HashFunc(int key) {
|
||||||
|
return ((unsigned)key % (unsigned)data.Size()); // unsigned arithmetic keeps the index in range for negative keys
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
ostream &operator<<(ostream &s, HashTableInt &ht) {
|
||||||
|
for (int i = 0; i < ht.data.Size(); i++) {
|
||||||
|
if (!ht.data[i].Empty()) {
|
||||||
|
ht.data[i].Write(s);
|
||||||
|
s<<endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
void HashTableInt::PutInArray(Array<HashItemInt> &h_array, int &num_items) {
|
||||||
|
num_items = 0;
|
||||||
|
HashItemInt h_item;
|
||||||
|
for (int i = 0; i < data.Size(); i++) {
|
||||||
|
if (!data[i].Empty()) { // now iterate through the linked list
|
||||||
|
int start = 1;
|
||||||
|
while (data[i].GetNext(h_item, start)) {
|
||||||
|
h_array[num_items].Set(h_item.key, h_item.value);
|
||||||
|
start = 0;
|
||||||
|
num_items++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
int HashTableInt::ExtToInt(int key, int next_value)
|
||||||
|
{
|
||||||
|
HashItemInt h_item(key, 0);
|
||||||
|
|
||||||
|
// Check to see if we know this one. 0 matches any number. If we
|
||||||
|
// do know this one, h_item.value gets set to what we knew it to be.
|
||||||
|
if (!Retrieve(h_item)) {
|
||||||
|
h_item.Set(key, next_value);
|
||||||
|
Insert(h_item);
|
||||||
|
}
|
||||||
|
return (h_item.value);
|
||||||
|
}
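ExtToInt() maps an external key to an internal value, assigning `next_value` the first time a key is seen. The same lookup-or-insert behavior can be sketched with `std::map`. `ExtToIntSketch` is a hypothetical free function, and `std::map` is swapped in for the chained hash table purely for illustration.

```cpp
#include <map>
#include <utility>

// Hypothetical stand-in for HashTableInt::ExtToInt(): returns the value
// already associated with key, or records next_value and returns it.
int ExtToIntSketch(std::map<int, int> &table, int key, int next_value)
{
  std::map<int, int>::iterator it = table.find(key);
  if (it == table.end()) {
    // First time this key is seen: remember the value offered by the caller.
    it = table.insert(std::make_pair(key, next_value)).first;
  }
  return it->second;
}
```

A caller would typically pass a running counter as `next_value` and advance it only when the returned value equals the counter, that is, when a new key was inserted.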
|
||||||
|
|
||||||
|
/*
|
||||||
|
// to test out the hash table
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <fstream.h>
|
||||||
|
|
||||||
|
#define MAX_SYS_CALLS 255
|
||||||
|
//============================================================================
|
||||||
|
int GetCalls(HashTable &ht) {
|
||||||
|
ifstream calls_file("calls.txt");
|
||||||
|
char buff[255];
|
||||||
|
int buff_len;
|
||||||
|
int num_sys_calls = 0;
|
||||||
|
HashItem h_item;
|
||||||
|
while (!calls_file.eof() && num_sys_calls < MAX_SYS_CALLS) {
|
||||||
|
calls_file.getline(buff, 254);
|
||||||
|
buff_len = strlen(buff);
|
||||||
|
if (buff_len) {
|
||||||
|
// cat on a parenth to make sure only calls are matched
|
||||||
|
strcat(buff, "(");
|
||||||
|
#ifdef DBG
|
||||||
|
cout<<endl<<buff; cout.flush();
|
||||||
|
#endif
|
||||||
|
h_item.Set(buff, num_sys_calls);
|
||||||
|
ht.Insert(h_item);
|
||||||
|
num_sys_calls++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
calls_file.close();
|
||||||
|
if (num_sys_calls == MAX_SYS_CALLS) return 0;
|
||||||
|
else return 1;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
void main(void) {
|
||||||
|
HashTable hashtable(701);
|
||||||
|
HashItem h_item("unlink(", 0);
|
||||||
|
if (GetCalls(hashtable)) {
|
||||||
|
cout<<endl<<hashtable;
|
||||||
|
h_item.Set("unlink(", 0);
|
||||||
|
if (hashtable.Retrieve(h_item))
|
||||||
|
cout<<endl<<" unlink found, index = "<<h_item.value;
|
||||||
|
else cout<<endl<<" unlink not found";
|
||||||
|
h_item.Set("get_kernel_syms(", 0);
|
||||||
|
if (hashtable.Retrieve(h_item))
|
||||||
|
cout<<endl<<" get_kernel_syms found, index = "<<h_item.value;
|
||||||
|
else cout<<endl<<" get_kernel_syms not found";
|
||||||
|
h_item.Set("hello(", 0);
|
||||||
|
if (hashtable.Retrieve(h_item))
|
||||||
|
cout<<endl<<" hello found, index = "<<h_item.value;
|
||||||
|
else cout<<endl<<" hello not found";
|
||||||
|
h_item.Set("setsockopt(", 0);
|
||||||
|
if (hashtable.Retrieve(h_item))
|
||||||
|
cout<<endl<<" setsockopt found, index = "<<h_item.value;
|
||||||
|
else cout<<endl<<" setsockopt not found";
|
||||||
|
}
|
||||||
|
}
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
71
11/wywolania/Data/stide_v1.1/Utils/hash.h
Executable file
@ -0,0 +1,71 @@
|
|||||||
|
#ifndef __HASH_H
|
||||||
|
#define __HASH_H
|
||||||
|
|
||||||
|
#define STR_LEN 100
|
||||||
|
#include <string.h>
|
||||||
|
#include "arrays.h"
|
||||||
|
#include "tll.h"
|
||||||
|
|
||||||
|
//#define DBG
|
||||||
|
|
||||||
|
class HashItem {
|
||||||
|
public:
|
||||||
|
HashItem(void) {strcpy(str, ""); value = 0;}
|
||||||
|
HashItem(char *s, int v);
|
||||||
|
void Set(char *s, int v);
|
||||||
|
int operator == (const HashItem &h_item) {return !strcmp(str, h_item.str);}
|
||||||
|
friend ostream &operator<<(ostream &s, HashItem &h_item) {
|
||||||
|
s<<h_item.str<<":"<<h_item.value; return s;}
|
||||||
|
int value;
|
||||||
|
char str[STR_LEN];
|
||||||
|
};
|
||||||
|
|
||||||
|
class HashTable {
|
||||||
|
public:
|
||||||
|
HashTable(int size) {data.Allocate(size);}
|
||||||
|
void Insert(HashItem &h_item); // we insert a complete item, i.e. the
|
||||||
|
// str and its assoc.
|
||||||
|
int Retrieve(HashItem &h_item); // returns 0 if item is not found
|
||||||
|
// we retrieve a complete item, the value
|
||||||
|
// of assoc is not specified beforehand,
|
||||||
|
// and is returned in h_item
|
||||||
|
friend ostream &operator<<(ostream &s, HashTable &ht);
|
||||||
|
private:
|
||||||
|
Array<LinkedList<HashItem> > data;
|
||||||
|
unsigned HashFunc(char *str);
|
||||||
|
};
|
||||||
|
|
||||||
|
// these store ints, not strings
|
||||||
|
class HashItemInt {
|
||||||
|
public:
|
||||||
|
HashItemInt(void) {key = 0; value = 0;}
|
||||||
|
HashItemInt(int k, int v) {key = k; value = v;}
|
||||||
|
// the copy constructor
|
||||||
|
HashItemInt(const HashItemInt &h_item) {key = h_item.key; value = h_item.value;}
|
||||||
|
void Set(int k, int v) {key = k; value = v;}
|
||||||
|
int operator == (const HashItemInt &h_item)
|
||||||
|
{return ((key == h_item.key) ? 1 : 0);}
|
||||||
|
HashItemInt &operator = (const HashItemInt &h_item);
|
||||||
|
friend ostream &operator<<(ostream &s, HashItemInt &h_item) {
|
||||||
|
s<<h_item.key<<":"<<h_item.value; return s;}
|
||||||
|
int value, key;
|
||||||
|
};
|
||||||
|
|
||||||
|
class HashTableInt {
|
||||||
|
public:
|
||||||
|
HashTableInt(int size) {data.Allocate(size);}
|
||||||
|
void Insert(HashItemInt &h_item); // we insert a complete item, i.e. the
|
||||||
|
// key and its assoc.
|
||||||
|
int Retrieve(HashItemInt &h_item); // returns 0 if item is not found
|
||||||
|
// we retrieve a complete item, the value
|
||||||
|
// of assoc is not specified beforehand,
|
||||||
|
// and is returned in h_item
|
||||||
|
friend ostream &operator<<(ostream &s, HashTableInt &ht);
|
||||||
|
void PutInArray(Array<HashItemInt> &h_array, int &num_items); // puts it into a linear array
|
||||||
|
int ExtToInt(int key, int next_value);
|
||||||
|
private:
|
||||||
|
Array<LinkedList<HashItemInt> > data;
|
||||||
|
unsigned HashFunc(int key);
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
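Illustrative sketch, not part of the stide sources: hash.h declares `HashTable` as an array of linked lists of `HashItem`, with equality defined on the string alone. The same separate-chaining idea can be written with the standard library roughly as follows. The names `ChainedTable`, `insert`, and `retrieve` are hypothetical, and `std::hash` stands in for the original `HashFunc`, whose definition is not shown in this header.

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HashTable: one singly linked chain per bucket.
// The bucket count must be greater than zero.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t buckets) : data_(buckets) {}

    // Inserts (key, value); an existing key has its value overwritten.
    void insert(const std::string &key, int value) {
        auto &chain = data_[bucket(key)];
        for (auto &entry : chain) {
            if (entry.first == key) { entry.second = value; return; }
        }
        chain.emplace_front(key, value);
    }

    // Returns true and fills `value` if the key is present, false otherwise.
    bool retrieve(const std::string &key, int &value) const {
        for (const auto &entry : data_[bucket(key)]) {
            if (entry.first == key) { value = entry.second; return true; }
        }
        return false;
    }

private:
    // Assumption: std::hash replaces the original HashFunc.
    std::size_t bucket(const std::string &key) const {
        return std::hash<std::string>{}(key) % data_.size();
    }
    std::vector<std::forward_list<std::pair<std::string, int>>> data_;
};
```

As in the commented-out test in hash.cc, a table of 701 buckets could map system-call names such as "unlink(" to their indices and report unknown names as not found.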
87
11/wywolania/Data/stide_v1.1/Utils/krand.cc
Executable file
@ -0,0 +1,87 @@
|
|||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <iostream.h>
|
||||||
|
#include "krand.h"
|
||||||
|
|
||||||
|
#include <time.h>
|
||||||
|
|
||||||
|
#define MBIG 1000000000L
|
||||||
|
#define MSEED 161803398L
|
||||||
|
#define FAC (1.0 / MBIG)
|
||||||
|
|
||||||
|
static int inext;
|
||||||
|
static int inextp;
|
||||||
|
static long ma[56];
|
||||||
|
|
||||||
|
double knuth_random(void) {
|
||||||
|
long mj;
|
||||||
|
if (++inext == 56) inext = 1;
|
||||||
|
if (++inextp == 56) inextp = 1;
|
||||||
|
mj = ma[inext] - ma[inextp];
|
||||||
|
if (mj < 0) mj += MBIG;
|
||||||
|
ma[inext] = mj;
|
||||||
|
return mj * FAC;
|
||||||
|
}
|
||||||
|
|
||||||
|
long seed_random(long seed) {
|
||||||
|
long mj, mk;
|
||||||
|
register int i, k;
|
||||||
|
|
||||||
|
if (seed < 0) {
|
||||||
|
time_t tp;
|
||||||
|
seed = time(&tp);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (seed >= MBIG) {
|
||||||
|
cerr<<"Seed value too big (>= "<<MBIG<<") in seed_random().";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
ma[55] = mj = seed;
|
||||||
|
mk = 1;
|
||||||
|
|
||||||
|
for (i = 1; i <= 54; i++) {
|
||||||
|
register int ii = (21 * i) % 55;
|
||||||
|
ma[ii] = mk;
|
||||||
|
mk = mj - mk;
|
||||||
|
if (mk < 0) mk += MBIG;
|
||||||
|
mj = ma[ii];
|
||||||
|
}
|
||||||
|
|
||||||
|
for (k = 0; k < 4; k++) {
|
||||||
|
for (i = 1; i <= 55; i++) {
|
||||||
|
ma[i] -= ma[1 + (i + 30) % 55];
|
||||||
|
if (ma[i] < 0) ma[i] += MBIG;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
inext = 0;
|
||||||
|
inextp = 31;
|
||||||
|
|
||||||
|
return seed;
|
||||||
|
}
|
||||||
|
|
||||||
|
int krandom(int max) {
|
||||||
|
int retval = (int)(knuth_random() * max);
|
||||||
|
if (retval < 0 || retval >= max) {
|
||||||
|
cout<<"ERROR: random num generator out of bounds!"<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
return retval;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
//for testing
|
||||||
|
void main(void) {
|
||||||
|
seed_random(100);
|
||||||
|
for (int i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||||
|
cin.get();
|
||||||
|
seed_random(200);
|
||||||
|
for (i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||||
|
cin.get();
|
||||||
|
seed_random(100);
|
||||||
|
for (i = 0; i < 24; i++) cout<<i<<" "<<knuth_random()<<" "<<krandom(100)<<endl;
|
||||||
|
cin.get();
|
||||||
|
}
|
||||||
|
*/
|
13
11/wywolania/Data/stide_v1.1/Utils/krand.h
Executable file
@ -0,0 +1,13 @@
|
|||||||
|
#ifndef __KRAND_H
|
||||||
|
#define __KRAND_H
|
||||||
|
/*
|
||||||
|
knuth-random.h
|
||||||
|
declarations for krand.cc
|
||||||
|
*/
|
||||||
|
|
||||||
|
double knuth_random(void);
|
||||||
|
long seed_random(long);
|
||||||
|
int krandom(int);
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
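Illustrative sketch, not part of the stide sources: krand.cc keeps the generator state in file-level statics (`ma`, `inext`, `inextp`), so only one sequence can exist per program. The same subtractive recurrence can be wrapped in an object, as sketched below. The class name `KnuthRng` and its members `next` and `below` are hypothetical; the arithmetic is copied from `seed_random`, `knuth_random`, and `krandom`, except that a negative seed is rejected instead of being replaced by the current time.

```cpp
#include <array>
#include <cassert>

// Hypothetical wrapper: same recurrence as krand.cc, state held per object.
class KnuthRng {
public:
    // Seed must satisfy 0 <= seed < kBig (krand.cc exits on larger seeds).
    explicit KnuthRng(long seed) {
        assert(seed >= 0 && seed < kBig);
        long mj = seed, mk = 1;
        ma_[55] = mj;
        for (int i = 1; i <= 54; i++) {
            int ii = (21 * i) % 55;
            ma_[ii] = mk;
            mk = mj - mk;
            if (mk < 0) mk += kBig;
            mj = ma_[ii];
        }
        // Warm-up passes, as in seed_random().
        for (int k = 0; k < 4; k++) {
            for (int i = 1; i <= 55; i++) {
                ma_[i] -= ma_[1 + (i + 30) % 55];
                if (ma_[i] < 0) ma_[i] += kBig;
            }
        }
    }

    // Next value in [0, 1).
    double next() {
        if (++inext_ == 56) inext_ = 1;
        if (++inextp_ == 56) inextp_ = 1;
        long mj = ma_[inext_] - ma_[inextp_];
        if (mj < 0) mj += kBig;
        ma_[inext_] = mj;
        return mj * (1.0 / kBig);
    }

    // Integer in [0, max), mirroring krandom(); max must be positive.
    int below(int max) { return static_cast<int>(next() * max); }

private:
    static constexpr long kBig = 1000000000L;
    std::array<long, 56> ma_{};
    int inext_ = 0;
    int inextp_ = 31;
};
```

Two objects constructed with the same seed produce the same sequence, which is the property the commented-out test in krand.cc checks by reseeding with 100.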
132
11/wywolania/Data/stide_v1.1/Utils/linklist.cc
Executable file
@ -0,0 +1,132 @@
|
|||||||
|
// linklist.cpp
|
||||||
|
|
||||||
|
#include "linklist.h"
|
||||||
|
|
||||||
|
// data structures:
|
||||||
|
// node for a linked list
|
||||||
|
class LLNode {
|
||||||
|
public:
|
||||||
|
int val; // the value at this node
|
||||||
|
LLNode *next; // pointer to the next node
|
||||||
|
};
|
||||||
|
//===========================================================================
|
||||||
|
LinkedList::LinkedList(void) {
|
||||||
|
root = NULL;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
LinkedList::LinkedList(const LinkedList &llist) { // the copy constructor
|
||||||
|
root = NULL;
|
||||||
|
LLNode *temp_ptr = llist.root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
Insert(temp_ptr->val);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
LinkedList &LinkedList::operator = (const LinkedList &llist) {
|
||||||
|
root = NULL;
|
||||||
|
LLNode *temp_ptr = llist.root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
Insert(temp_ptr->val);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
LinkedList::~LinkedList(void) {
|
||||||
|
if (root) {
|
||||||
|
LLNode *temp_ptr = root->next, *next_temp_ptr;
|
||||||
|
delete root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
next_temp_ptr = temp_ptr->next;
|
||||||
|
delete temp_ptr;
|
||||||
|
temp_ptr = next_temp_ptr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
// returns 1 if there was no copy, 0 otherwise
|
||||||
|
int LinkedList::Insert(int val) {
|
||||||
|
if (!root) {
|
||||||
|
root = new LLNode;
|
||||||
|
root->val = val;
|
||||||
|
root->next = NULL;
|
||||||
|
return 1;
|
||||||
|
} else {
|
||||||
|
if (!Search(val)) { // only put in if it is not already in - this is ineff.
|
||||||
|
LLNode temp_node;
|
||||||
|
temp_node.val = root->val;
|
||||||
|
temp_node.next = root->next;
|
||||||
|
root->val = val;
|
||||||
|
root->next = new LLNode;
|
||||||
|
root->next->val = temp_node.val;
|
||||||
|
root->next->next = temp_node.next;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
int LinkedList::Search(int val) {
|
||||||
|
LLNode *curr_ptr = root;
|
||||||
|
while (curr_ptr) {
|
||||||
|
if (curr_ptr->val == val) return 1;
|
||||||
|
else curr_ptr = curr_ptr->next;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
void LinkedList::Write(ostream &s) {
|
||||||
|
LLNode *curr_ptr = root;
|
||||||
|
while (curr_ptr) {
|
||||||
|
s<<curr_ptr->val<<" ";
|
||||||
|
curr_ptr = curr_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
ostream &operator<<(ostream &s, LinkedList &ll) {
|
||||||
|
ll.Write(s);
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
|
||||||
|
/*
|
||||||
|
// this is for testing the linked list
|
||||||
|
// something similar should be done for all data structures
|
||||||
|
// using the Test func on the base class, we can test all descendants with
|
||||||
|
// those methods
|
||||||
|
#include "arrays.h"
|
||||||
|
|
||||||
|
void Test(LinkedList &list) {
|
||||||
|
list.Insert(10);
|
||||||
|
list.Insert(5);
|
||||||
|
list.Insert(3);
|
||||||
|
list.Insert(5);
|
||||||
|
list.Insert(7);
|
||||||
|
list.Insert(26);
|
||||||
|
list.Insert(13);
|
||||||
|
list.Insert(26);
|
||||||
|
cout<<endl;
|
||||||
|
list.Write(cout);
|
||||||
|
cout<<endl<<list.Search(5);
|
||||||
|
cout<<endl<<list.Search(0);
|
||||||
|
cout<<endl<<list.Search(13);
|
||||||
|
}
|
||||||
|
|
||||||
|
void TestArray(Array<LinkedList> &larray) {
|
||||||
|
for (int i = 0; i < larray.Size(); i++)
|
||||||
|
Test(larray[i]);
|
||||||
|
}
|
||||||
|
|
||||||
|
#include <fstream.h>
|
||||||
|
|
||||||
|
void main(void) {
|
||||||
|
ifstream inf;
|
||||||
|
Array<LinkedList> ll(200);
|
||||||
|
TestArray(ll);
|
||||||
|
ll[0].Insert(999);
|
||||||
|
cout<<endl<<ll[0];
|
||||||
|
ll[3] = ll[0];
|
||||||
|
cout<<endl<<ll[3]<<" = "<<ll[0];
|
||||||
|
}
|
||||||
|
*/
|
22
11/wywolania/Data/stide_v1.1/Utils/linklist.h
Executable file
@ -0,0 +1,22 @@
|
|||||||
|
// linklist.h
|
||||||
|
#include <iostream.h>
|
||||||
|
#include "list.h"
|
||||||
|
|
||||||
|
class LLNode;
|
||||||
|
class LinkedList : public List {
|
||||||
|
LLNode *root;
|
||||||
|
public:
|
||||||
|
LinkedList(void);
|
||||||
|
~LinkedList(void);
|
||||||
|
LinkedList(const LinkedList &llist); // the copy constructor
|
||||||
|
LinkedList &operator = (const LinkedList &llist);
|
||||||
|
int Insert(int val); // this does not insert if val already exists
|
||||||
|
// returns 1 if it could insert
|
||||||
|
// could later on also do frequency counts
|
||||||
|
int Search(int val);
|
||||||
|
void Write(ostream &s);
|
||||||
|
friend ostream &operator<<(ostream &s, LinkedList &ll);
|
||||||
|
int Empty(void) {return (root ? 0 : 1);}
|
||||||
|
};
|
||||||
|
|
||||||
|
//typedef LinkedList *LinkedListPtr;
|
12
11/wywolania/Data/stide_v1.1/Utils/list.h
Executable file
@ -0,0 +1,12 @@
|
|||||||
|
// list.h
|
||||||
|
|
||||||
|
class List {
|
||||||
|
public:
|
||||||
|
virtual int Insert(int val) {return 0;}
|
||||||
|
virtual int Search(int val) {return 0;}
|
||||||
|
virtual void Write(ostream &s) {;}
|
||||||
|
virtual int Empty(void) {return 0;}
|
||||||
|
};
|
||||||
|
|
||||||
|
//typedef List *ListPtr;
|
||||||
|
|
71
11/wywolania/Data/stide_v1.1/Utils/random.cc
Executable file
@ -0,0 +1,71 @@
|
|||||||
|
#include <string.h>
|
||||||
|
#include <math.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include "random.h"
|
||||||
|
|
||||||
|
#define PCRAND
|
||||||
|
#ifdef PCRAND
|
||||||
|
#define RANDOM_MAX RAND_MAX
|
||||||
|
#else
|
||||||
|
#define RANDOM_MAX (pow(2, 31)-1)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* this random function returns 1 if a random toss is within pfactor, 0<pfactor<1*/
|
||||||
|
int Probability(float pfactor)
|
||||||
|
{
|
||||||
|
return (pfactor>Random1());
|
||||||
|
/* if a random value from 0 to 1 is less than pfactor, return true*/
|
||||||
|
}
|
||||||
|
/* -----------------------------------------------------------------------------------------------------------------*/
|
||||||
|
unsigned Random(unsigned num) /* returns a random word between 0 and num-1*/
|
||||||
|
{
|
||||||
|
float ratio, temp;
|
||||||
|
|
||||||
|
ratio=num;
|
||||||
|
temp=RANDOM_MAX;
|
||||||
|
ratio=ratio/temp;
|
||||||
|
#ifdef PCRAND
|
||||||
|
temp=rand();
|
||||||
|
#else
|
||||||
|
temp=random();
|
||||||
|
#endif
|
||||||
|
if (temp*ratio>num-1) return num-1;
|
||||||
|
else return (temp*ratio);
|
||||||
|
}
|
||||||
|
/* ------------------------------------------------------------------------------------------------------------*/
|
||||||
|
/* returns a value between 0 and 1*/
|
||||||
|
float Random1(void)
|
||||||
|
{
|
||||||
|
float ratio,temp;
|
||||||
|
|
||||||
|
#ifdef PCRAND
|
||||||
|
ratio=rand();
|
||||||
|
#else
|
||||||
|
ratio=random();
|
||||||
|
#endif
|
||||||
|
temp=RANDOM_MAX;
|
||||||
|
ratio=ratio/temp;
|
||||||
|
if (ratio<=0) ratio=0.0001;
|
||||||
|
if (ratio>=0.9999) ratio=0.9999;
|
||||||
|
return ratio;
|
||||||
|
}
|
||||||
|
/* -------------------------------------------------------------------------------------------------------------------*/
|
||||||
|
/* initializes random generator*/
|
||||||
|
void InitRandom(int seed)
|
||||||
|
{
|
||||||
|
#ifdef PCRAND
|
||||||
|
srand(seed);
|
||||||
|
#else
|
||||||
|
static char state[64];
|
||||||
|
|
||||||
|
initstate(seed,state,64);
|
||||||
|
setstate(state);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
13
11/wywolania/Data/stide_v1.1/Utils/random.h
Executable file
@ -0,0 +1,13 @@
|
|||||||
|
#ifndef __RANDOM_H
|
||||||
|
#define __RANDOM_H
|
||||||
|
|
||||||
|
/* these are routines to generate random nos in commonly used formats. These routines all
|
||||||
|
use the random function and so are very random !
|
||||||
|
*/
|
||||||
|
|
||||||
|
void InitRandom(int seed); /* initializes the random system to seed - uses internal state buffers*/
|
||||||
|
int Probability(float pfactor); /* returns 1 if a random toss is within pfactor, 0 otherwise*/
|
||||||
|
unsigned Random(unsigned num); /* returns an unsigned from 0 to num-1*/
|
||||||
|
float Random1(void); /* returns a random floating pt between 0 and 1, i.e over interval (0,1)*/
|
||||||
|
|
||||||
|
#endif
|
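Illustrative sketch, not part of the stide sources: random.cc scales `rand()` by hand to obtain a value in [0, num-1] and a coin toss with a given probability. With a C++11 compiler the same two operations could be expressed with `<random>`, as below. This swaps in `std::mt19937` and the standard distributions for the original `rand()`/`random()` calls; the function names `random_below` and `probability` are hypothetical. Unlike `Random1`, no clamping to [0.0001, 0.9999] is applied.

```cpp
#include <cassert>
#include <random>

// Hypothetical counterpart of Random(): unsigned value in [0, num - 1].
// num must be positive.
unsigned random_below(unsigned num, std::mt19937 &gen) {
    assert(num > 0);
    std::uniform_int_distribution<unsigned> dist(0, num - 1);
    return dist(gen);
}

// Hypothetical counterpart of Probability(): true with probability pfactor,
// where 0 <= pfactor <= 1.
bool probability(double pfactor, std::mt19937 &gen) {
    std::bernoulli_distribution toss(pfactor);
    return toss(gen);
}
```

Passing the generator explicitly plays the role of `InitRandom(seed)`: constructing `std::mt19937 gen(seed)` fixes the sequence.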
27
11/wywolania/Data/stide_v1.1/Utils/tlist.h
Executable file
@ -0,0 +1,27 @@
|
|||||||
|
#ifndef __TLIST_H
|
||||||
|
#define __TLIST_H
|
||||||
|
// tlist.h
|
||||||
|
// this is a base template class for lists
|
||||||
|
// it is for a list of elements. An element can be of any class, but it must
|
||||||
|
// have the operators == and = defined, so that the list can be searched
|
||||||
|
// also the operator << must be defined for write
|
||||||
|
|
||||||
|
template <class Elem> class List {
|
||||||
|
public:
|
||||||
|
virtual Elem *Insert(const Elem &elem) {return NULL;}
|
||||||
|
// insert elem into the list
|
||||||
|
// returns a ptr to elem if elem inserted, NULL if elem was already there
|
||||||
|
// i.e. doesn't put in duplicates and returns NULL for duplicates
|
||||||
|
virtual Elem *Search(const Elem &elem) {return NULL;}
|
||||||
|
// finds the element that matches elem and returns it. This allows assoc
|
||||||
|
// retrieval
|
||||||
|
// returns NULL if the elem is not found
|
||||||
|
virtual void Write(ostream &s) {;}
|
||||||
|
// writes out the list of elements. Requires that the element overload the
|
||||||
|
// stream output operator
|
||||||
|
virtual int Empty(void) {return 0;}
|
||||||
|
// returns true if the list is empty
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
136
11/wywolania/Data/stide_v1.1/Utils/tll.cc
Executable file
@ -0,0 +1,136 @@
|
|||||||
|
// tll.cpp
|
||||||
|
|
||||||
|
#include "tll.h"
|
||||||
|
|
||||||
|
// data structures:
|
||||||
|
// node for a linked list
|
||||||
|
template <class Elem> class LLNode {
|
||||||
|
public:
|
||||||
|
Elem elem; // the element at this node
|
||||||
|
LLNode<Elem> *next; // pointer to the next node
|
||||||
|
};
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> LinkedList<Elem>::LinkedList(void) {
|
||||||
|
root = NULL;
|
||||||
|
length = 0;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> LinkedList<Elem>::LinkedList(const LinkedList<Elem> &llist) {
|
||||||
|
root = NULL; length = 0; // length must start at 0 before Insert() increments it
|
||||||
|
LLNode<Elem> *temp_ptr = llist.root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
Insert(temp_ptr->elem);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> LinkedList<Elem> &LinkedList<Elem>::operator = (
|
||||||
|
const LinkedList<Elem> &llist) {
|
||||||
|
if (this == &llist) return *this; Clear(); // free old nodes, reset length
|
||||||
|
LLNode<Elem> *temp_ptr = llist.root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
Insert(temp_ptr->elem);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
template <class Elem> LinkedList<Elem>::~LinkedList(void) {
|
||||||
|
if (root) {
|
||||||
|
LLNode<Elem> *temp_ptr = root->next, *next_temp_ptr;
|
||||||
|
delete root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
next_temp_ptr = temp_ptr->next;
|
||||||
|
delete temp_ptr;
|
||||||
|
temp_ptr = next_temp_ptr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
template <class Elem> void LinkedList<Elem>::Clear(void) {
|
||||||
|
if (root) {
|
||||||
|
LLNode<Elem> *temp_ptr = root->next, *next_temp_ptr;
|
||||||
|
delete root;
|
||||||
|
while (temp_ptr) {
|
||||||
|
next_temp_ptr = temp_ptr->next;
|
||||||
|
delete temp_ptr;
|
||||||
|
temp_ptr = next_temp_ptr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
root = NULL;
|
||||||
|
length = 0;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> Elem *LinkedList<Elem>::Insert(const Elem &elem) {
|
||||||
|
if (!root) {
|
||||||
|
root = new LLNode<Elem>;
|
||||||
|
root->elem = elem;
|
||||||
|
root->next = NULL;
|
||||||
|
length++;
|
||||||
|
return &(root->elem);
|
||||||
|
} else {
|
||||||
|
if (!Search(elem)) { // only put in if it is not already in - this is ineff.
|
||||||
|
LLNode<Elem> temp_node;
|
||||||
|
temp_node.elem = root->elem;
|
||||||
|
temp_node.next = root->next;
|
||||||
|
root->elem = elem;
|
||||||
|
root->next = new LLNode<Elem>;
|
||||||
|
root->next->elem = temp_node.elem;
|
||||||
|
root->next->next = temp_node.next;
|
||||||
|
length++;
|
||||||
|
return &(root->elem);
|
||||||
|
} else { // put the elem back in the same place
|
||||||
|
Elem *found = Search(elem); *found = elem; // overwrite the matching node, not the root
|
||||||
|
return found;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> Elem *LinkedList<Elem>::Search(const Elem &elem) {
|
||||||
|
LLNode<Elem> *curr_ptr = root;
|
||||||
|
while (curr_ptr) {
|
||||||
|
if (curr_ptr->elem == elem)
|
||||||
|
return &(curr_ptr->elem);
|
||||||
|
// this is very important, because they may not be completely the same,
|
||||||
|
// since the comparison could be done on a key only
|
||||||
|
else curr_ptr = curr_ptr->next;
|
||||||
|
}
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> void LinkedList<Elem>::Write(ostream &s) {
|
||||||
|
LLNode<Elem> *curr_ptr = root;
|
||||||
|
while (curr_ptr) {
|
||||||
|
s<<(curr_ptr->elem)<<" ";
|
||||||
|
curr_ptr = curr_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
template <class Elem> ostream &operator<<(ostream &s, LinkedList<Elem> &ll) {
|
||||||
|
ll.Write(s);
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> int LinkedList<Elem>::DeleteNext(Elem &elem) {
|
||||||
|
if (!root) return 0;
|
||||||
|
elem = root->elem;
|
||||||
|
LLNode<Elem> *kill_ptr = root;
|
||||||
|
root = root->next;
|
||||||
|
delete kill_ptr;
|
||||||
|
length--;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
template <class Elem> int LinkedList<Elem>::GetNext(Elem &elem, int start) {
|
||||||
|
if (start) get_next_ptr = root;
|
||||||
|
if (get_next_ptr) {
|
||||||
|
elem = get_next_ptr->elem;
|
||||||
|
get_next_ptr = get_next_ptr->next;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
42
11/wywolania/Data/stide_v1.1/Utils/tll.h
Executable file
@ -0,0 +1,42 @@
|
|||||||
|
#ifndef __TLL_H
|
||||||
|
#define __TLL_H
|
||||||
|
|
||||||
|
// tll.h
|
||||||
|
|
||||||
|
/* this implements a template class linklist, descended from tlist.h
|
||||||
|
one can create an assoc array out of this by creating an elem class in
|
||||||
|
which the comparison operator depends on the key alone. Then search will
|
||||||
|
return the full elem and one can check the associated value to the key
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <iostream.h>
|
||||||
|
#include "tlist.h"
|
||||||
|
|
||||||
|
|
||||||
|
template <class Elem> class LLNode;
|
||||||
|
template <class Elem> class LinkedList : public List<Elem> {
|
||||||
|
LLNode<Elem> *root;
|
||||||
|
public:
|
||||||
|
LinkedList(void);
|
||||||
|
~LinkedList(void);
|
||||||
|
LinkedList(const LinkedList<Elem> &llist); // the copy constructor
|
||||||
|
LinkedList &operator = (const LinkedList<Elem> &llist);
|
||||||
|
Elem *Insert(const Elem &elem); // this does not insert if val already exists
|
||||||
|
// returns ptr to elem in list if it could insert
|
||||||
|
void Clear(void);
|
||||||
|
int DeleteNext(Elem &elem); // deletes first elem in list and returns it
|
||||||
|
int GetNext(Elem &elem, int start); // returns the next element in the list, if start is set then returns
|
||||||
|
// the first one, returns 0 if the list is now empty
|
||||||
|
Elem *Search(const Elem &elem); // assumes the == operator defined on elem
|
||||||
|
void Write(ostream &s);
|
||||||
|
friend ostream &operator<<(ostream &s, LinkedList<Elem> &ll);
|
||||||
|
int Empty(void) {return (root ? 0 : 1);}
|
||||||
|
int Size(void) {return length;}
|
||||||
|
private:
|
||||||
|
int length;
|
||||||
|
LLNode<Elem> *get_next_ptr; // because the next one is ongoing
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
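Illustrative sketch, not part of the stide sources: the comment in tll.h describes building an associative lookup by giving the element class an equality operator that depends on the key alone, so that a search returns the full stored element. A minimal version of that idea with the standard library might look like this. The names `Entry` and `search` are hypothetical, and `std::forward_list` stands in for the template `LinkedList`.

```cpp
#include <algorithm>
#include <forward_list>

// Hypothetical element type: equality looks at the key only,
// in the same spirit as HashItemInt in hash.h.
struct Entry {
    int key;
    int value;
    bool operator==(const Entry &other) const { return key == other.key; }
};

// Returns a pointer to the stored element whose key matches probe.key,
// or nullptr if there is none. The probe's value field is ignored.
Entry *search(std::forward_list<Entry> &list, const Entry &probe) {
    auto it = std::find(list.begin(), list.end(), probe);
    return it == list.end() ? nullptr : &*it;
}
```

The caller fills in only the key of the probe and reads the associated value from the element that comes back, which is how `Retrieve` is documented to behave in hash.h.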
339
11/wywolania/Data/stide_v1.2/COPYING
Normal file
@ -0,0 +1,339 @@
|
|||||||
|
GNU GENERAL PUBLIC LICENSE
|
||||||
|
Version 2, June 1991
|
||||||
|
|
||||||
|
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
||||||
|
675 Mass Ave, Cambridge, MA 02139, USA
|
||||||
|
Everyone is permitted to copy and distribute verbatim copies
|
||||||
|
of this license document, but changing it is not allowed.
|
||||||
|
|
||||||
|
Preamble
|
||||||
|
|
||||||
|
The licenses for most software are designed to take away your
|
||||||
|
freedom to share and change it. By contrast, the GNU General Public
|
||||||
|
License is intended to guarantee your freedom to share and change free
|
||||||
|
software--to make sure the software is free for all its users. This
|
||||||
|
General Public License applies to most of the Free Software
|
||||||
|
Foundation's software and to any other program whose authors commit to
|
||||||
|
using it. (Some other Free Software Foundation software is covered by
|
||||||
|
the GNU Library General Public License instead.) You can apply it to
|
||||||
|
your programs, too.
|
||||||
|
|
||||||
|
When we speak of free software, we are referring to freedom, not
|
||||||
|
price. Our General Public Licenses are designed to make sure that you
|
||||||
|
have the freedom to distribute copies of free software (and charge for
|
||||||
|
this service if you wish), that you receive source code or can get it
|
||||||
|
if you want it, that you can change the software or use pieces of it
|
||||||
|
in new free programs; and that you know you can do these things.
|
||||||
|
|
||||||
|
To protect your rights, we need to make restrictions that forbid
|
||||||
|
anyone to deny you these rights or to ask you to surrender the rights.
|
||||||
|
These restrictions translate to certain responsibilities for you if you
|
||||||
|
distribute copies of the software, or if you modify it.
|
||||||
|
|
||||||
|
For example, if you distribute copies of such a program, whether
|
||||||
|
gratis or for a fee, you must give the recipients all the rights that
|
||||||
|
you have. You must make sure that they, too, receive or can get the
|
||||||
|
source code. And you must show them these terms so they know their
|
||||||
|
rights.
|
||||||
|
|
||||||
|
We protect your rights with two steps: (1) copyright the software, and
|
||||||
|
(2) offer you this license which gives you legal permission to copy,
|
||||||
|
distribute and/or modify the software.
|
||||||
|
|
||||||
|
Also, for each author's protection and ours, we want to make certain
|
||||||
|
that everyone understands that there is no warranty for this free
|
||||||
|
software. If the software is modified by someone else and passed on, we
|
||||||
|
want its recipients to know that what they have is not the original, so
|
||||||
|
that any problems introduced by others will not reflect on the original
|
||||||
|
authors' reputations.
|
||||||
|
|
||||||
|
Finally, any free program is threatened constantly by software
|
||||||
|
patents. We wish to avoid the danger that redistributors of a free
|
||||||
|
program will individually obtain patent licenses, in effect making the
|
||||||
|
program proprietary. To prevent this, we have made it clear that any
|
||||||
|
patent must be licensed for everyone's free use or not licensed at all.
|
||||||
|
|
||||||
|
The precise terms and conditions for copying, distribution and
|
||||||
|
modification follow.
|
||||||
|
|
||||||
|
GNU GENERAL PUBLIC LICENSE
|
||||||
|
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
||||||
|
|
||||||
|
0. This License applies to any program or other work which contains
|
||||||
|
a notice placed by the copyright holder saying it may be distributed
|
||||||
|
under the terms of this General Public License. The "Program", below,
|
||||||
|
refers to any such program or work, and a "work based on the Program"
|
||||||
|
means either the Program or any derivative work under copyright law:
|
||||||
|
that is to say, a work containing the Program or a portion of it,
|
||||||
|
either verbatim or with modifications and/or translated into another
|
||||||
|
language. (Hereinafter, translation is included without limitation in
|
||||||
|
the term "modification".) Each licensee is addressed as "you".
|
||||||
|
|
||||||
|
Activities other than copying, distribution and modification are not
|
||||||
|
covered by this License; they are outside its scope. The act of
|
||||||
|
running the Program is not restricted, and the output from the Program
|
||||||
|
is covered only if its contents constitute a work based on the
|
||||||
|
Program (independent of having been made by running the Program).
|
||||||
|
Whether that is true depends on what the Program does.
|
||||||
|
|
||||||
|
1. You may copy and distribute verbatim copies of the Program's
|
||||||
|
source code as you receive it, in any medium, provided that you
|
||||||
|
conspicuously and appropriately publish on each copy an appropriate
|
||||||
|
copyright notice and disclaimer of warranty; keep intact all the
|
||||||
|
notices that refer to this License and to the absence of any warranty;
|
||||||
|
and give any other recipients of the Program a copy of this License
|
||||||
|
along with the Program.
|
||||||
|
|
||||||
|
You may charge a fee for the physical act of transferring a copy, and
|
||||||
|
you may at your option offer warranty protection in exchange for a fee.
|
||||||
|
|
||||||
|
2. You may modify your copy or copies of the Program or any portion
|
||||||
|
of it, thus forming a work based on the Program, and copy and
|
||||||
|
distribute such modifications or work under the terms of Section 1
|
||||||
|
above, provided that you also meet all of these conditions:
|
||||||
|
|
||||||
|
a) You must cause the modified files to carry prominent notices
|
||||||
|
stating that you changed the files and the date of any change.
|
||||||
|
|
||||||
|
b) You must cause any work that you distribute or publish, that in
|
||||||
|
whole or in part contains or is derived from the Program or any
|
||||||
|
part thereof, to be licensed as a whole at no charge to all third
|
||||||
|
parties under the terms of this License.
|
||||||
|
|
||||||
|
c) If the modified program normally reads commands interactively
|
||||||
|
when run, you must cause it, when started running for such
|
||||||
|
interactive use in the most ordinary way, to print or display an
|
||||||
|
announcement including an appropriate copyright notice and a
|
||||||
|
notice that there is no warranty (or else, saying that you provide
|
||||||
|
a warranty) and that users may redistribute the program under
|
||||||
|
these conditions, and telling the user how to view a copy of this
|
||||||
|
License. (Exception: if the Program itself is interactive but
|
||||||
|
does not normally print such an announcement, your work based on
|
||||||
|
the Program is not required to print an announcement.)
|
||||||
|
|
||||||
|
These requirements apply to the modified work as a whole. If
|
||||||
|
identifiable sections of that work are not derived from the Program,
|
||||||
|
and can be reasonably considered independent and separate works in
|
||||||
|
themselves, then this License, and its terms, do not apply to those
|
||||||
|
sections when you distribute them as separate works. But when you
|
||||||
|
distribute the same sections as part of a whole which is a work based
|
||||||
|
on the Program, the distribution of the whole must be on the terms of
|
||||||
|
this License, whose permissions for other licensees extend to the
|
||||||
|
entire whole, and thus to each and every part regardless of who wrote it.
|
||||||
|
|
||||||
|
Thus, it is not the intent of this section to claim rights or contest
|
||||||
|
your rights to work written entirely by you; rather, the intent is to
|
||||||
|
exercise the right to control the distribution of derivative or
|
||||||
|
collective works based on the Program.
|
||||||
|
|
||||||
|
In addition, mere aggregation of another work not based on the Program
|
||||||
|
with the Program (or with a work based on the Program) on a volume of
|
||||||
|
a storage or distribution medium does not bring the other work under
|
||||||
|
the scope of this License.
|
||||||
|
|
||||||
|
3. You may copy and distribute the Program (or a work based on it,
|
||||||
|
under Section 2) in object code or executable form under the terms of
|
||||||
|
Sections 1 and 2 above provided that you also do one of the following:
|
||||||
|
|
||||||
|
a) Accompany it with the complete corresponding machine-readable
|
||||||
|
source code, which must be distributed under the terms of Sections
|
||||||
|
1 and 2 above on a medium customarily used for software interchange; or,
|
||||||
|
|
||||||
|
b) Accompany it with a written offer, valid for at least three
|
||||||
|
years, to give any third party, for a charge no more than your
|
||||||
|
cost of physically performing source distribution, a complete
|
||||||
|
machine-readable copy of the corresponding source code, to be
|
||||||
|
distributed under the terms of Sections 1 and 2 above on a medium
|
||||||
|
customarily used for software interchange; or,
|
||||||
|
|
||||||
|
c) Accompany it with the information you received as to the offer
|
||||||
|
to distribute corresponding source code. (This alternative is
|
||||||
|
allowed only for noncommercial distribution and only if you
|
||||||
|
received the program in object code or executable form with such
|
||||||
|
an offer, in accord with Subsection b above.)
|
||||||
|
|
||||||
|
The source code for a work means the preferred form of the work for
|
||||||
|
making modifications to it. For an executable work, complete source
|
||||||
|
code means all the source code for all modules it contains, plus any
|
||||||
|
associated interface definition files, plus the scripts used to
|
||||||
|
control compilation and installation of the executable. However, as a
|
||||||
|
special exception, the source code distributed need not include
|
||||||
|
anything that is normally distributed (in either source or binary
|
||||||
|
form) with the major components (compiler, kernel, and so on) of the
|
||||||
|
operating system on which the executable runs, unless that component
|
||||||
|
itself accompanies the executable.
|
||||||
|
|
||||||
|
If distribution of executable or object code is made by offering
|
||||||
|
access to copy from a designated place, then offering equivalent
|
||||||
|
access to copy the source code from the same place counts as
|
||||||
|
distribution of the source code, even though third parties are not
|
||||||
|
compelled to copy the source along with the object code.
|
||||||
|
|
||||||
|
4. You may not copy, modify, sublicense, or distribute the Program
|
||||||
|
except as expressly provided under this License. Any attempt
|
||||||
|
otherwise to copy, modify, sublicense or distribute the Program is
|
||||||
|
void, and will automatically terminate your rights under this License.
|
||||||
|
However, parties who have received copies, or rights, from you under
|
||||||
|
this License will not have their licenses terminated so long as such
|
||||||
|
parties remain in full compliance.
|
||||||
|
|
||||||
|
5. You are not required to accept this License, since you have not
|
||||||
|
signed it. However, nothing else grants you permission to modify or
|
||||||
|
distribute the Program or its derivative works. These actions are
|
||||||
|
prohibited by law if you do not accept this License. Therefore, by
|
||||||
|
modifying or distributing the Program (or any work based on the
|
||||||
|
Program), you indicate your acceptance of this License to do so, and
|
||||||
|
all its terms and conditions for copying, distributing or modifying
|
||||||
|
the Program or works based on it.
|
||||||
|
|
||||||
|
6. Each time you redistribute the Program (or any work based on the
|
||||||
|
Program), the recipient automatically receives a license from the
|
||||||
|
original licensor to copy, distribute or modify the Program subject to
|
||||||
|
these terms and conditions. You may not impose any further
|
||||||
|
restrictions on the recipients' exercise of the rights granted herein.
|
||||||
|
You are not responsible for enforcing compliance by third parties to
|
||||||
|
this License.
|
||||||
|
|
||||||
|
7. If, as a consequence of a court judgment or allegation of patent
|
||||||
|
infringement or for any other reason (not limited to patent issues),
|
||||||
|
conditions are imposed on you (whether by court order, agreement or
|
||||||
|
otherwise) that contradict the conditions of this License, they do not
|
||||||
|
excuse you from the conditions of this License. If you cannot
|
||||||
|
distribute so as to satisfy simultaneously your obligations under this
|
||||||
|
License and any other pertinent obligations, then as a consequence you
|
||||||
|
may not distribute the Program at all. For example, if a patent
|
||||||
|
license would not permit royalty-free redistribution of the Program by
|
||||||
|
all those who receive copies directly or indirectly through you, then
|
||||||
|
the only way you could satisfy both it and this License would be to
|
||||||
|
refrain entirely from distribution of the Program.
|
||||||
|
|
||||||
|
If any portion of this section is held invalid or unenforceable under
|
||||||
|
any particular circumstance, the balance of the section is intended to
|
||||||
|
apply and the section as a whole is intended to apply in other
|
||||||
|
circumstances.
|
||||||
|
|
||||||
|
It is not the purpose of this section to induce you to infringe any
|
||||||
|
patents or other property right claims or to contest validity of any
|
||||||
|
such claims; this section has the sole purpose of protecting the
|
||||||
|
integrity of the free software distribution system, which is
|
||||||
|
implemented by public license practices. Many people have made
|
||||||
|
generous contributions to the wide range of software distributed
|
||||||
|
through that system in reliance on consistent application of that
|
||||||
|
system; it is up to the author/donor to decide if he or she is willing
|
||||||
|
to distribute software through any other system and a licensee cannot
|
||||||
|
impose that choice.
|
||||||
|
|
||||||
|
This section is intended to make thoroughly clear what is believed to
|
||||||
|
be a consequence of the rest of this License.
|
||||||
|
|
||||||
|
8. If the distribution and/or use of the Program is restricted in
|
||||||
|
certain countries either by patents or by copyrighted interfaces, the
|
||||||
|
original copyright holder who places the Program under this License
|
||||||
|
may add an explicit geographical distribution limitation excluding
|
||||||
|
those countries, so that distribution is permitted only in or among
|
||||||
|
countries not thus excluded. In such case, this License incorporates
|
||||||
|
the limitation as if written in the body of this License.
|
||||||
|
|
||||||
|
9. The Free Software Foundation may publish revised and/or new versions
|
||||||
|
of the General Public License from time to time. Such new versions will
|
||||||
|
be similar in spirit to the present version, but may differ in detail to
|
||||||
|
address new problems or concerns.
|
||||||
|
|
||||||
|
Each version is given a distinguishing version number. If the Program
|
||||||
|
specifies a version number of this License which applies to it and "any
|
||||||
|
later version", you have the option of following the terms and conditions
|
||||||
|
either of that version or of any later version published by the Free
|
||||||
|
Software Foundation. If the Program does not specify a version number of
|
||||||
|
this License, you may choose any version ever published by the Free Software
|
||||||
|
Foundation.
|
||||||
|
|
||||||
|
10. If you wish to incorporate parts of the Program into other free
|
||||||
|
programs whose distribution conditions are different, write to the author
|
||||||
|
to ask for permission. For software which is copyrighted by the Free
|
||||||
|
Software Foundation, write to the Free Software Foundation; we sometimes
|
||||||
|
make exceptions for this. Our decision will be guided by the two goals
|
||||||
|
of preserving the free status of all derivatives of our free software and
|
||||||
|
of promoting the sharing and reuse of software generally.
|
||||||
|
|
||||||
|
NO WARRANTY
|
||||||
|
|
||||||
|
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
|
||||||
|
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
|
||||||
|
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
|
||||||
|
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
|
||||||
|
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
||||||
|
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
|
||||||
|
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
||||||
|
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
|
||||||
|
REPAIR OR CORRECTION.
|
||||||
|
|
||||||
|
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||||
|
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
|
||||||
|
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
|
||||||
|
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
|
||||||
|
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
|
||||||
|
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
|
||||||
|
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
|
||||||
|
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
|
||||||
|
POSSIBILITY OF SUCH DAMAGES.
|
||||||
|
|
||||||
|
END OF TERMS AND CONDITIONS
|
||||||
|
|
||||||
|
Appendix: How to Apply These Terms to Your New Programs
|
||||||
|
|
||||||
|
If you develop a new program, and you want it to be of the greatest
|
||||||
|
possible use to the public, the best way to achieve this is to make it
|
||||||
|
free software which everyone can redistribute and change under these terms.
|
||||||
|
|
||||||
|
To do so, attach the following notices to the program. It is safest
|
||||||
|
to attach them to the start of each source file to most effectively
|
||||||
|
convey the exclusion of warranty; and each file should have at least
|
||||||
|
the "copyright" line and a pointer to where the full notice is found.
|
||||||
|
|
||||||
|
<one line to give the program's name and a brief idea of what it does.>
|
||||||
|
Copyright (C) 19yy <name of author>
|
||||||
|
|
||||||
|
This program is free software; you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation; either version 2 of the License, or
|
||||||
|
(at your option) any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU General Public License
|
||||||
|
along with this program; if not, write to the Free Software
|
||||||
|
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||||
|
|
||||||
|
Also add information on how to contact you by electronic and paper mail.
|
||||||
|
|
||||||
|
If the program is interactive, make it output a short notice like this
|
||||||
|
when it starts in an interactive mode:
|
||||||
|
|
||||||
|
Gnomovision version 69, Copyright (C) 19yy name of author
|
||||||
|
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||||
|
This is free software, and you are welcome to redistribute it
|
||||||
|
under certain conditions; type `show c' for details.
|
||||||
|
|
||||||
|
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||||
|
parts of the General Public License. Of course, the commands you use may
|
||||||
|
be called something other than `show w' and `show c'; they could even be
|
||||||
|
mouse-clicks or menu items--whatever suits your program.
|
||||||
|
|
||||||
|
You should also get your employer (if you work as a programmer) or your
|
||||||
|
school, if any, to sign a "copyright disclaimer" for the program, if
|
||||||
|
necessary. Here is a sample; alter the names:
|
||||||
|
|
||||||
|
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
|
||||||
|
`Gnomovision' (which makes passes at compilers) written by James Hacker.
|
||||||
|
|
||||||
|
<signature of Ty Coon>, 1 April 1989
|
||||||
|
Ty Coon, President of Vice
|
||||||
|
|
||||||
|
This General Public License does not permit incorporating your program into
|
||||||
|
proprietary programs. If your program is a subroutine library, you may
|
||||||
|
consider it more useful to permit linking proprietary applications with the
|
||||||
|
library. If this is what you want to do, use the GNU Library General
|
||||||
|
Public License instead of this License.
|
21
11/wywolania/Data/stide_v1.2/Makefile
Normal file
@ -0,0 +1,21 @@
|
|||||||
|
STIDE_OBJECTS = config.o flexitree.o stide.o stream.o
|
||||||
|
|
||||||
|
STIDE_HEADERS = config.h flexitree.h opt_info.h stream.h
|
||||||
|
|
||||||
|
FLAGS = -g
|
||||||
|
|
||||||
|
stide: $(STIDE_OBJECTS)
|
||||||
|
g++ $(FLAGS) $(STIDE_OBJECTS) -o stide
|
||||||
|
|
||||||
|
config.o: config.C config.h
|
||||||
|
g++ -c $(FLAGS) config.C
|
||||||
|
|
||||||
|
flexitree.o: flexitree.C flexitree.h
|
||||||
|
g++ -c $(FLAGS) flexitree.C
|
||||||
|
|
||||||
|
stream.o: stream.C stream.h
|
||||||
|
g++ -c $(FLAGS) stream.C
|
||||||
|
|
||||||
|
stide.o: stide.C $(STIDE_HEADERS)
|
||||||
|
g++ -c $(FLAGS) stide.C
|
||||||
|
|
13
11/wywolania/Data/stide_v1.2/README
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
STIDE version 1.2
|
||||||
|
|
||||||
|
Copyright (C) 1996, 1998 The Regents of the University of New Mexico.
|
||||||
|
Copyright (C) 2006 Hajime Inoue.
|
||||||
|
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
STIDE v1.2 should work identically to v1.1. Modern GCCs will not compile v1.1.
|
||||||
|
STIDE v1.2 was ported to STL and current C++ conventions. Please report
|
||||||
|
any bugs to hinoue@ccsl.carleton.ca.
|
||||||
|
|
||||||
|
For usage information invoke stide with the --help option. More detailed
|
||||||
|
documentation can be found in the UserDoc directory.
|
803
11/wywolania/Data/stide_v1.2/config.C
Normal file
@ -0,0 +1,803 @@
|
|||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdio.h>
#include <ctype.h>   // isspace(), used by the configuration file parser
|
||||||
|
#include <fstream>
|
||||||
|
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
#include "config.h"
|
||||||
|
#include "opt_info.h"
|
||||||
|
|
||||||
|
#define LF_LIM 999
|
||||||
|
#define SEQ_LEN_LIM 199
|
||||||
|
#define MAX_ELEM_LIM 999
|
||||||
|
#define MAX_STREAMS_LIM 9999
|
||||||
|
|
||||||
|
using std::vector;
|
||||||
|
using std::cout;
|
||||||
|
using std::cerr;
|
||||||
|
using std::endl;
|
||||||
|
|
||||||
|
/**********************************************************************
|
||||||
|
* Config() *
|
||||||
|
* Reads in configuration information from configuration file, from *
|
||||||
|
* the command line, and from preset defaults. *
|
||||||
|
* *
|
||||||
|
* Input: int argc: Number of arguments on command line *
|
||||||
|
* char *argv[]: Array of strings of actual arguments *
|
||||||
|
* *
|
||||||
|
* Output: Nothing *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
Config::Config(const int argc, const char *argv[])
|
||||||
|
{
|
||||||
|
vector<OptInfo> opt_array(NUM_OPTS);
|
||||||
|
InitOptArray(opt_array);
|
||||||
|
|
||||||
|
SetDefaults();
|
||||||
|
|
||||||
|
ReadCommandLine(argc, argv, opt_array);
|
||||||
|
|
||||||
|
ReadConfigFile(opt_array);
|
||||||
|
|
||||||
|
CheckValues();
|
||||||
|
|
||||||
|
InitOutputFormat();
|
||||||
|
|
||||||
|
OuputConfigInfo(opt_array);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* InitOptArray() *
|
||||||
|
* Sets the values of opt_array so that opt_array contains all the *
|
||||||
|
* information needed about the parameters being set by the config *
|
||||||
|
* file and the command-line arguments. *
|
||||||
|
* *
|
||||||
|
* Input: vector<OptInfo> &opt_array: Array of information about *
|
||||||
|
* options for the program *
|
||||||
|
* *
|
||||||
|
* Output: Nothing *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void Config::InitOptArray(vector<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
// opt_array.reserve(NUM_OPTS);
|
||||||
|
|
||||||
|
opt_array[0].long_name = "db_name";
|
||||||
|
opt_array[0].short_name = "d";
|
||||||
|
opt_array[0].set = 0;
|
||||||
|
opt_array[0].type = 's';
|
||||||
|
opt_array[0].str_val = &db_name;
|
||||||
|
|
||||||
|
opt_array[1].long_name = "seq_len";
|
||||||
|
opt_array[1].short_name = "l";
|
||||||
|
opt_array[1].set = 0;
|
||||||
|
opt_array[1].type = 'i';
|
||||||
|
opt_array[1].int_val = &seq_len;
|
||||||
|
|
||||||
|
opt_array[2].long_name = "max_elements";
|
||||||
|
opt_array[2].short_name = "me";
|
||||||
|
opt_array[2].set = 0;
|
||||||
|
opt_array[2].type = 'i';
|
||||||
|
opt_array[2].int_val = &max_elements;
|
||||||
|
|
||||||
|
opt_array[3].long_name = "max_streams";
|
||||||
|
opt_array[3].short_name = "ms";
|
||||||
|
opt_array[3].set = 0;
|
||||||
|
opt_array[3].type = 'i';
|
||||||
|
opt_array[3].int_val = &max_streams;
|
||||||
|
|
||||||
|
opt_array[4].long_name = "cfg_name";
|
||||||
|
opt_array[4].short_name = "c";
|
||||||
|
opt_array[4].set = 0;
|
||||||
|
opt_array[4].type = 's';
|
||||||
|
opt_array[4].str_val = &cfg_name;
|
||||||
|
|
||||||
|
opt_array[5].long_name = "pair_offset";
|
||||||
|
opt_array[5].short_name = "p";
|
||||||
|
opt_array[5].set = 0;
|
||||||
|
opt_array[5].type = 'i';
|
||||||
|
opt_array[5].int_val = &pair_offset;
|
||||||
|
|
||||||
|
opt_array[6].long_name = "add_output_format";
|
||||||
|
opt_array[6].short_name = "aof";
|
||||||
|
opt_array[6].set = 0;
|
||||||
|
opt_array[6].type = 's';
|
||||||
|
opt_array[6].str_val = &add_output_format;
|
||||||
|
|
||||||
|
opt_array[7].long_name = "compare_output_format";
|
||||||
|
opt_array[7].short_name = "cof";
|
||||||
|
opt_array[7].set = 0;
|
||||||
|
opt_array[7].type = 's';
|
||||||
|
opt_array[7].str_val = &compare_output_format;
|
||||||
|
|
||||||
|
opt_array[8].long_name = "add_to_db";
|
||||||
|
opt_array[8].short_name = "a";
|
||||||
|
opt_array[8].set = 0;
|
||||||
|
opt_array[8].type = 'f';
|
||||||
|
opt_array[8].int_val = &add_to_db;
|
||||||
|
|
||||||
|
opt_array[9].long_name = "output_graph";
|
||||||
|
opt_array[9].short_name = "g";
|
||||||
|
opt_array[9].set = 0;
|
||||||
|
opt_array[9].type = 'f';
|
||||||
|
opt_array[9].int_val = &output_graph;
|
||||||
|
|
||||||
|
opt_array[10].long_name = "compute_hdist";
|
||||||
|
opt_array[10].short_name = "hd";
|
||||||
|
opt_array[10].set = 0;
|
||||||
|
opt_array[10].type = 'f';
|
||||||
|
opt_array[10].int_val = &compute_hdist;
|
||||||
|
|
||||||
|
opt_array[11].long_name = "lf_size";
|
||||||
|
opt_array[11].short_name = "lf";
|
||||||
|
opt_array[11].set = 0;
|
||||||
|
opt_array[11].type = 'i';
|
||||||
|
opt_array[11].int_val = &lf_size;
|
||||||
|
|
||||||
|
opt_array[12].long_name = "write_db_stats";
|
||||||
|
opt_array[12].short_name = "s";
|
||||||
|
opt_array[12].set = 0;
|
||||||
|
opt_array[12].type = 'f';
|
||||||
|
opt_array[12].int_val = &write_db_stats;
|
||||||
|
|
||||||
|
opt_array[13].long_name = "verbose";
|
||||||
|
opt_array[13].short_name = "v";
|
||||||
|
opt_array[13].set = 0;
|
||||||
|
opt_array[13].type = 'f';
|
||||||
|
opt_array[13].int_val = &verbose;
|
||||||
|
|
||||||
|
opt_array[14].long_name = "very_verbose";
|
||||||
|
opt_array[14].short_name = "V";
|
||||||
|
opt_array[14].set = 0;
|
||||||
|
opt_array[14].type = 'f';
|
||||||
|
opt_array[14].int_val = &very_verbose;
|
||||||
|
|
||||||
|
opt_array[15].long_name = "help";
|
||||||
|
opt_array[15].short_name = "h";
|
||||||
|
opt_array[15].set = 0;
|
||||||
|
opt_array[15].type = 'h';
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* SetDefaults() *
|
||||||
|
* Sets configuration variables to their default values *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::SetDefaults()
|
||||||
|
{
|
||||||
|
cfg_name = "stide.config";
|
||||||
|
db_name = "default.db";
|
||||||
|
seq_len = 6;
|
||||||
|
max_elements = 500;
|
||||||
|
max_streams = 500;
|
||||||
|
pair_offset = 0;
|
||||||
|
add_output_format = "DB Size: %d\tStream: %s\tPair Number: %p\n";
|
||||||
|
compare_output_format = "Pair Number: %p\tStream Number: %s\n";
|
||||||
|
lf_size = 1;
|
||||||
|
add_to_db = 0;
|
||||||
|
output_graph = 0;
|
||||||
|
compute_hdist = 0;
|
||||||
|
write_db_stats = 0;
|
||||||
|
verbose = 0;
|
||||||
|
very_verbose = 0;
|
||||||
|
num_fvars = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadCommandLine() *
|
||||||
|
* Parses the command line. Updates configuration variables. *
|
||||||
|
* *
|
||||||
|
* const int argc Number of arguments *
|
||||||
|
* const char *argv[], Array of arguments *
|
||||||
|
* vector<OptInfo> &opt_array Constant array of information about *
|
||||||
|
* the configuration variables *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::ReadCommandLine(const int argc, const char *argv[],
|
||||||
|
vector<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
string var_name; // Name of variable
|
||||||
|
string var_val; // Value of variable
|
||||||
|
int name_type; // LONG_NAME or SHORT_NAME
|
||||||
|
int argv_i = 1; // First index of argv
|
||||||
|
int argv_j = 0; // Second index of argv
|
||||||
|
|
||||||
|
while (argv_i < argc) {
|
||||||
|
if (argv[argv_i][argv_j] != '-') {
|
||||||
|
cerr<< "ERROR: Switches must be preceded by a dash: "<<argv[argv_i]
|
||||||
|
<< endl << " is illegal" << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
argv_j++;
|
||||||
|
if (argv[argv_i][argv_j] == '-') { // Long name
|
||||||
|
argv_j++;
|
||||||
|
name_type = LONG_NAME;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
name_type = SHORT_NAME;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Read name into var_name
|
||||||
|
var_name = argv[argv_i]+argv_j;
|
||||||
|
|
||||||
|
// Now we want to read the value, if there is one.
|
||||||
|
argv_j = 0;
|
||||||
|
if (++argv_i < argc) {
|
||||||
|
if (argv[argv_i][argv_j] != '-') {
|
||||||
|
var_val = argv[argv_i];
|
||||||
|
argv_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// assign value to appropriate variable
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, name_type);
|
||||||
|
// Blank var_name and var_val for next time around
|
||||||
|
var_name.resize(0);
|
||||||
|
var_val.resize(0);
|
||||||
|
}
|
||||||
|
}
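
The switch/value pairing that ReadCommandLine() performs can be sketched on its own, outside the Config class. This is a minimal illustration only: `ParsedSwitch` and `ParseSwitches` are hypothetical names that do not exist in STIDE, and the sketch collects the parsed switches into a vector instead of calling AssignValToVar().

```cpp
#include <string>
#include <vector>

// Hypothetical record for illustration; this is not STIDE's OptInfo.
struct ParsedSwitch {
  std::string name;   // switch name with the leading dashes removed
  bool is_long;       // true for --name, false for -name
  std::string value;  // empty if no value followed the switch
};

// Walks the arguments the way ReadCommandLine() does: every switch must
// start with a dash, and the next argument is taken as its value unless
// that argument itself starts with a dash.
// Returns false if an argument in switch position lacks a leading dash.
bool ParseSwitches(const std::vector<std::string> &args,
                   std::vector<ParsedSwitch> &out)
{
  std::vector<std::string>::size_type i = 0;
  while (i < args.size()) {
    const std::string &arg = args[i];
    if (arg.empty() || arg[0] != '-') {
      return false;
    }
    ParsedSwitch sw;
    sw.is_long = (arg.length() > 1 && arg[1] == '-');
    sw.name = arg.substr(sw.is_long ? 2 : 1);
    i++;
    // An argument that does not start with a dash is this switch's value
    if (i < args.size() && (args[i].empty() || args[i][0] != '-')) {
      sw.value = args[i];
      i++;
    }
    out.push_back(sw);
  }
  return true;
}
```

One consequence of this scheme, which also holds for the original loop, is that a value beginning with a dash (a negative number, for instance) is read as the next switch rather than as a value.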
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* AssignValToVar() *
|
||||||
|
* Figures out which variable to assign a given value to and does *
|
||||||
|
* so. Updates opt_array, to say that that particular variable *
|
||||||
|
* has been set. *
|
||||||
|
* *
|
||||||
|
* Input: vector<OptInfo> &opt_array Option Information *
|
||||||
|
* const string &var_val Value to be assigned *
|
||||||
|
* const string &var_name Name of variable to be updated *
|
||||||
|
* const int name_type SHORT_NAME or LONG_NAME *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::AssignValToVar(vector<OptInfo> &opt_array, const string
|
||||||
|
&var_val, const string &var_name, const
|
||||||
|
int name_type)
|
||||||
|
{
|
||||||
|
int opt_i;
|
||||||
|
|
||||||
|
for (opt_i = 0; opt_i < NUM_OPTS; opt_i++) {
|
||||||
|
if (((name_type == LONG_NAME) && (opt_array[opt_i].long_name ==
|
||||||
|
var_name)) ||
|
||||||
|
((name_type == SHORT_NAME) && (opt_array[opt_i].short_name ==
|
||||||
|
var_name))) {
|
||||||
|
// If we have already set this variable and shouldn't change it,
|
||||||
|
// don't
|
||||||
|
if (opt_array[opt_i].set == 1) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
switch (opt_array[opt_i].type) {
|
||||||
|
case 'f': // flag
|
||||||
|
if ((var_val.length() == 0) || (var_val == "On") ||
|
||||||
|
(var_val == "ON") || (var_val == "on")) {
|
||||||
|
*(opt_array[opt_i].flag_val) = 1;
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
}
|
||||||
|
else if ((var_val != "Off") && (var_val != "off") &&
|
||||||
|
(var_val != "OFF")) {
|
||||||
|
cerr << "ERROR: Illegal value for parameter " << var_name
|
||||||
|
<< ". This parameter is a simple flag," << endl
|
||||||
|
<< "and may be followed by \"on\", \"off\", or nothing "
|
||||||
|
<< "(which turns it on). The current value is "
|
||||||
|
<< var_val << ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'i':
|
||||||
|
// If there isn't a value, just use the default
|
||||||
|
if (var_val.length() == 0) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
*(opt_array[opt_i].int_val) = atoi(var_val.c_str());
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
break;
|
||||||
|
case 's':
|
||||||
|
// If there is no string given, just use the default
|
||||||
|
if (var_val.length() == 0) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
*(opt_array[opt_i].str_val) = var_val;
|
||||||
|
opt_array[opt_i].set = 1;
|
||||||
|
break;
|
||||||
|
case 'h':
|
||||||
|
WriteHelpInfo();
|
||||||
|
} // end of switch
|
||||||
|
return; // we've found it, so we're done
|
||||||
|
} // end of if (opt_array[opt_i]...
|
||||||
|
} // end of for (opt_i = 0; ...
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadConfigFile() *
|
||||||
|
* Parses the configuration file. Updates configuration *
|
||||||
|
* variables. *
|
||||||
|
* *
|
||||||
|
* Input: vector<OptInfo> &opt_array: Option information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
|
||||||
|
void Config::ReadConfigFile(vector<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
string var_name;
|
||||||
|
string var_val;
|
||||||
|
|
||||||
|
// Set up stream for reading configuration
|
||||||
|
ifstream cfg_file(cfg_name.c_str());
|
||||||
|
string buff;
|
||||||
|
int buff_i = 0; // index for buff
|
||||||
|
int opt_i = 0; // index for opt_array
|
||||||
|
int rev_num; // revision number of configuration file
|
||||||
|
|
||||||
|
if (!cfg_file.is_open()) {
|
||||||
|
cerr<<"WARNING: Cannot open configuration file "<<cfg_name
|
||||||
|
<<". I will continue, using the" <<endl
|
||||||
|
<<"default values and the command line arguments." << endl
|
||||||
|
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||||
|
<< endl;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// First we need to determine if the configuration file is old-style
|
||||||
|
// or new-style, i.e., is there a #ConfigFileRev: in the first
|
||||||
|
// line. We can determine this just by checking the first
|
||||||
|
// character.
|
||||||
|
char c = cfg_file.peek();
|
||||||
|
|
||||||
|
// Config file is empty; just return
|
||||||
|
if (cfg_file.eof()) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If old-style
|
||||||
|
if (c != '#') {
|
||||||
|
cerr << "WARNING: The first line of the configuration file did "
|
||||||
|
<< "not contain the string" << endl
|
||||||
|
<< "\"#ConfigFileRev: " << CFREV << "\"." << endl
|
||||||
|
<< "I will assume that this is an old format configuration "
|
||||||
|
<< "file." << endl
|
||||||
|
<<"If that isn't what you wanted, type Ctrl-C now to abort."
|
||||||
|
<< endl << endl;
|
||||||
|
ReadOldConfigFile(cfg_file, opt_array);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Look for "#ConfigFileRev:"
|
||||||
|
cfg_file >> buff;
|
||||||
|
|
||||||
|
if (buff != "#ConfigFileRev:") {
|
||||||
|
cerr << "ERROR: I expected the first line of the configuration "
|
||||||
|
<< "file to either be \"#ConfigFileRev: \" followed by the "
|
||||||
|
<< "revision number or the beginning of an old-style "
|
||||||
|
<< "configuration file, which does not have a comment in the "
|
||||||
|
<< "first line. I'm confused, so I will abort..."
|
||||||
|
<< endl << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
cfg_file >> rev_num;
|
||||||
|
|
||||||
|
if (rev_num > CFREV) {
|
||||||
|
cerr << "ERROR: This version of STIDE does not know how to deal "
|
||||||
|
<< "with configuration files" << endl
|
||||||
|
<< "more modern than revision " << CFREV << ". Aborting..."
|
||||||
|
<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if (rev_num < CFREV) {
|
||||||
|
cerr << "ERROR: Configuration files must be revision " << CFREV
|
||||||
|
<< " or later, " << "or an old-style" << endl
|
||||||
|
<< "configuration file without a revision number. "
|
||||||
|
<< "Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Now we know everything's as we expect, so we'll parse the file
|
||||||
|
|
||||||
|
while (!cfg_file.eof()) {
|
||||||
|
// Skip white space at the beginning of the line
|
||||||
|
while (isspace(buff[buff_i])) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If buff is empty, move on to next line
|
||||||
|
if (buff.length() <= buff_i) {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If we start with a comment, move on to next line
|
||||||
|
if (buff[buff_i] == '#') {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// Read in variable name, up to the :
|
||||||
|
int start_place = buff_i; // the beginning place of the name
|
||||||
|
while ((buff_i < buff.length()) && (buff[buff_i] != ':')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
if (buff_i == buff.length()) {
|
||||||
|
cerr << "ERROR: Variable names in the configuration file must "
|
||||||
|
<< "be followed by a colon. The line " << endl
|
||||||
|
<< buff << endl << "contains a variable name which is not "
|
||||||
|
<< "terminated by a colon. Aborting..." <<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// This assigns the values in buff between start_place and buff_i
|
||||||
|
// to var_name
|
||||||
|
var_name.assign(buff, start_place, buff_i - start_place);
|
||||||
|
|
||||||
|
// Skip colon
|
||||||
|
buff_i++;
|
||||||
|
|
||||||
|
// Skip white space
|
||||||
|
while (isspace(buff[buff_i])) { buff_i++; }
|
||||||
|
|
||||||
|
start_place = buff_i; // the starting place of the value
|
||||||
|
// Find last point in value. If it starts with a quote, it ends
|
||||||
|
// with a quote.
|
||||||
|
if ((buff[buff_i] == '\"') && (buff_i < buff.length())) {
|
||||||
|
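buff_i++; // Step past the opening quote so the scan can find the closing one
while ((buff_i < buff.length()) && (buff[buff_i] != '\"')) {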
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
// Strip off first "
|
||||||
|
start_place++;
|
||||||
|
}
|
||||||
|
// Otherwise, it ends with a space, a # or the end of the line
|
||||||
|
else {
|
||||||
|
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||||
|
(buff[buff_i] != '#')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
var_val.assign(buff, start_place, buff_i - start_place);
|
||||||
|
|
||||||
|
// Now we want to check to see if the line was continued, in which
|
||||||
|
// case we haven't gotten the value of the variable in var_val, so
|
||||||
|
// we still need to do that.
|
||||||
|
if (buff[buff_i-1] == '\\') {
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
while (isspace(buff[buff_i])) { buff_i++; }
|
||||||
|
start_place = buff_i;
|
||||||
|
// Find last point in value. If it starts with a quote, it ends with a
|
||||||
|
// quote.
|
||||||
|
if (buff[buff_i] == '\"') {
|
||||||
|
buff_i++;
|
||||||
|
while ((buff_i < buff.length()) && (buff[buff_i] != '\"')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
start_place++; // Strip off first "
|
||||||
|
}
|
||||||
|
// Otherwise, it ends with a space, a # or the end of the line
|
||||||
|
else {
|
||||||
|
while ((buff_i < buff.length()) && (!isspace(buff[buff_i])) &&
|
||||||
|
(buff[buff_i] != '#')) {
|
||||||
|
buff_i++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
var_val.assign(buff, start_place, buff_i - start_place);
|
||||||
|
}
|
||||||
|
|
||||||
|
// assign value to appropriate variable
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
buff_i = 0;
|
||||||
|
} //end of while (!cfg_file.eof())...
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadOldConfigFile() *
|
||||||
|
* Reads information from an old-style configuration file. *
|
||||||
|
* Updates configuration variables. *
|
||||||
|
* *
|
||||||
|
* Input: ifstream &cfg_file Configuration file (already opened) *
|
||||||
|
* vector<OptInfo> &opt_array: Option information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::ReadOldConfigFile(ifstream &cfg_file,
|
||||||
|
vector<OptInfo> &opt_array)
|
||||||
|
{
|
||||||
|
|
||||||
|
string buff;
|
||||||
|
string var_name;
|
||||||
|
string var_val;
|
||||||
|
|
||||||
|
var_name = "max_elements";
|
||||||
|
cfg_file>>var_val;
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
var_name = "max_streams";
|
||||||
|
cfg_file>>var_val;
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
// Next line is hash table size, but we are now figuring that out
|
||||||
|
// dynamically, so just throw it away.
|
||||||
|
getline(cfg_file, buff);
|
||||||
|
|
||||||
|
// Now read in the format string
|
||||||
|
getline(cfg_file, var_val);
|
||||||
|
// Put the format string in the appropriate place
|
||||||
|
if (add_to_db) {
|
||||||
|
var_name = "add_output_format";
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
var_name = "compare_output_format";
|
||||||
|
AssignValToVar(opt_array, var_val, var_name, LONG_NAME);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* CheckValues() *
|
||||||
|
* Checks configuration values that have been read in to make *
|
||||||
|
* sure that they are within the limits. Flags are automatically *
|
||||||
|
* checked while being read in, the output formats are checked *
|
||||||
|
* in InitOutputFormat(), and filenames are checked when they are *
|
||||||
|
* opened, so all that is left is the integer values. *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::CheckValues()
|
||||||
|
{
|
||||||
|
if ((lf_size < 1) || (lf_size > LF_LIM)) {
|
||||||
|
cerr << "ERROR: lf_size must be between 1 and " << LF_LIM
|
||||||
|
<< ". It has been set to " << lf_size << ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((seq_len < 1) || (seq_len > SEQ_LEN_LIM)) {
|
||||||
|
cerr << "ERROR: seq_len must be between 1 and " << SEQ_LEN_LIM
|
||||||
|
<< ". It has been set to " << seq_len << ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((max_elements < 1) || (max_elements > MAX_ELEM_LIM)) {
|
||||||
|
cerr << "ERROR: max_elements must be between 1 and " << MAX_ELEM_LIM
|
||||||
|
<< ". It has been set to " << max_elements
|
||||||
|
<< ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if ((max_streams < 1) || (max_streams > MAX_STREAMS_LIM)) {
|
||||||
|
cerr << "ERROR: max_streams must be between 1 and " << MAX_STREAMS_LIM
|
||||||
|
<< ". It has been set to " << max_streams
|
||||||
|
<< ". Aborting..." << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* InitOutputFormat() *
|
||||||
|
* Converts the string add_output_format or compare_output_format *
|
||||||
|
* to information filling fmt_str and num_fvars, which is more *
|
||||||
|
* convenient for output. *
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::InitOutputFormat()
|
||||||
|
{
|
||||||
|
// Now we analyze add_output_format or compare_output_format
|
||||||
|
int flag = 0;
|
||||||
|
int f_i = 0;
|
||||||
|
num_fvars = 0;
|
||||||
|
string *buff;
|
||||||
|
|
||||||
|
// If we're not in verbose or very_verbose modes, we're never going
|
||||||
|
// to use this information, so don't waste our time doing this
|
||||||
|
if (!(verbose || very_verbose)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (add_to_db) {
|
||||||
|
buff = &add_output_format;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
buff = &compare_output_format;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (int i = 0; i <(*buff).length(); i++) {
|
||||||
|
switch ((*buff)[i]) {
|
||||||
|
case '\\':
|
||||||
|
i++;
|
||||||
|
switch ((*buff)[i]) {
|
||||||
|
case 't': fmt_str[num_fvars][f_i] = '\t'; break;
|
||||||
|
case 'n': fmt_str[num_fvars][f_i] = '\n'; break;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case '%':
|
||||||
|
fmt_str[num_fvars][f_i] = '%';
|
||||||
|
flag = 1;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
fmt_str[num_fvars][f_i] = (*buff)[i];
|
||||||
|
if (flag) {
|
||||||
|
switch (fmt_str[num_fvars][f_i]) {
|
||||||
|
case 'd': // database size
|
||||||
|
case 'i': // number of last value of sequence in this
|
||||||
|
// data stream
|
||||||
|
case 'p': // number of last value of sequence in entire
|
||||||
|
// input
|
||||||
|
case 's': // external stream ID
|
||||||
|
case 'a': // flag for whether this sequence is anomalous
|
||||||
|
case 'c': // locality frame count of this sequence
|
||||||
|
case 'h': // Hamming distance for this sequence
|
||||||
|
// Record that we must write that val at that position
|
||||||
|
write_val[num_fvars] = fmt_str[num_fvars][f_i];
|
||||||
|
fmt_str[num_fvars][f_i] = 'd';
|
||||||
|
fmt_str[num_fvars][f_i + 1] = '\0';
|
||||||
|
num_fvars++;
|
||||||
|
f_i = -1;
|
||||||
|
flag = 0;
|
||||||
|
break;
|
||||||
|
default: // Unknown flag
|
||||||
|
cerr << "ERROR: Illegal control character in output format."
|
||||||
|
<< " Type stide -h for help." << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} // switch ((*buff)[i ...
|
||||||
|
f_i++;
|
||||||
|
}
|
||||||
|
fmt_str[num_fvars][f_i] = '\0';
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* OutputConfigInfo() *
|
||||||
|
* Writes information about the final configuration to standard *
|
||||||
|
* output. Does so in a format that could be used as a *
|
||||||
|
* configuration file. Changes no values anywhere. *
|
||||||
|
* *
|
||||||
|
* Input: const vector<OptInfo> &opt_array Option Information *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void Config::OuputConfigInfo(const vector<OptInfo> &opt_array) const
|
||||||
|
{
|
||||||
|
cout<<"This run was configured using configuration file "
|
||||||
|
<< cfg_name << " and command" << endl
|
||||||
|
<< "line arguments. The configuration values were as "
|
||||||
|
<< "follows." << endl
|
||||||
|
<<"#ConfigFileRev: " << CFREV << endl;
|
||||||
|
for (int i = 0; i < NUM_OPTS; i++) {
|
||||||
|
if (opt_array[i].type == 'i') {
|
||||||
|
cout << opt_array[i].long_name << ": " << *(opt_array[i].int_val)
|
||||||
|
<< endl;
|
||||||
|
}
|
||||||
|
if ((opt_array[i].type == 's') &&
|
||||||
|
((add_to_db && (opt_array[i].short_name == "aof")) ||
|
||||||
|
(!add_to_db && (opt_array[i].short_name == "cof")))) {
|
||||||
|
cout << opt_array[i].long_name << ": \"" << *(opt_array[i].str_val)
|
||||||
|
<< "\"" << endl;
|
||||||
|
}
|
||||||
|
if (opt_array[i].type == 'f') {
|
||||||
|
if (*(opt_array[i].int_val) == 1) {
|
||||||
|
cout << opt_array[i].long_name << ": On" << endl;
|
||||||
|
}
|
||||||
|
if (*(opt_array[i].int_val) == 0) {
|
||||||
|
cout << opt_array[i].long_name << ": Off" << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << endl << endl;
|
||||||
|
|
||||||
|
// Now print header for verbose modes
|
||||||
|
if (verbose || very_verbose) {
|
||||||
|
cout<<endl<<"Variables in output: "<<endl;
|
||||||
|
for (int j = 0; j < num_fvars; j++) {
|
||||||
|
switch (write_val[j]) {
|
||||||
|
case 's': cout<<"stream #, "; break;
|
||||||
|
case 'i': cout<<"index #, "; break;
|
||||||
|
case 'h': if (compute_hdist) {cout<<"hamming miss, "; } break;
|
||||||
|
case 'c': if (lf_size > 1) {cout<<"lfc, "; } break;
|
||||||
|
case 'p': cout<<"pair #, "; break;
|
||||||
|
case 'd': cout<<"db size, "; break;
|
||||||
|
case 'a': cout<<"is anomalous?, "; break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout<<endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteHelpInfo() *
|
||||||
|
* Writes help information to standard output. Changes no values.*
|
||||||
|
* *
|
||||||
|
* Input: None *
|
||||||
|
* *
|
||||||
|
* Output: None *
|
||||||
|
* *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
|
||||||
|
void Config::WriteHelpInfo() const
|
||||||
|
{
|
||||||
|
cout<<"STIDE accepts calls of the form:"<<endl
|
||||||
|
<<" stide -c cfg_name -d db_name -e max_num_elements"
|
||||||
|
<<" -lf lf_size -l seq_len"<<endl<<" -n max_num_streams"
|
||||||
|
<<" -p pair_num_offset -aof add_out_format "
|
||||||
|
<< endl << " -cof comp_out_format -a -g -h -m -s -v -V"
|
||||||
|
<< endl << endl;
|
||||||
|
cout<<"STIDE expects input to come through standard input in"
|
||||||
|
<<" the format of a pair"<<endl
|
||||||
|
<<"of integers per line, where the first integer is a"
|
||||||
|
<<" stream identifier"<<endl
|
||||||
|
<<"and the second is a data element. Command line"
|
||||||
|
<<" arguments override"<<endl
|
||||||
|
<<"specifications in the configuration file. All"
|
||||||
|
<<" parameters are optional"<<endl
|
||||||
|
<<"and can be specified in any order. Parameters"
|
||||||
|
<<" are always preceded by a"<<endl
|
||||||
|
<<"switch. The switches are:"<<endl<<endl;
|
||||||
|
cout<<"-a Add to database; defaults to off"<<endl;
|
||||||
|
cout<<"-c cfg_name The name of file containing the"
|
||||||
|
<<" configuration;"<<endl
|
||||||
|
<<" defaults to \"stide.config\""<<endl;
|
||||||
|
cout<<"-d db_name The name of the file containing"
|
||||||
|
<<" the database;"<<endl
|
||||||
|
<<" defaults to \"default.db\""<<endl;
|
||||||
|
cout<<"-lf lf_size The size of the locality frame;"
|
||||||
|
<<" defaults to 1"<<endl;
|
||||||
|
cout<<"-g Write graphing data in dot format to"
|
||||||
|
<<" db_name.dot;"<<endl
|
||||||
|
<<" defaults to off"<<endl;
|
||||||
|
cout<<"-h Help; displays this information"<<endl;
|
||||||
|
cout<<"-l seq_len Length of sequence; defaults to 6"
|
||||||
|
<<endl;
|
||||||
|
cout<<"-p pair_offset Offset for pair number count;"
|
||||||
|
<<" defaults to 0"<<endl;
|
||||||
|
cout<<"-s Display db stats; defaults to off"
|
||||||
|
<<endl;
|
||||||
|
cout<<"-v Verbose mode on; defaults to off"<<endl;
|
||||||
|
cout<<"-V Very verbose mode on; defaults to off"<<endl;
|
||||||
|
cout<<"-hd Compute Hamming distance measures;"
|
||||||
|
<<" defaults to off"<<endl;
|
||||||
|
cout<<"-me max_elements Maximum number of different"
|
||||||
|
<<" elements"<<endl
|
||||||
|
<<" in the input stream; defaults to"
|
||||||
|
<<" 500" <<endl;
|
||||||
|
cout<<"-ms max_num_streams Maximum number of different"
|
||||||
|
<<" streams in input;"<<endl
|
||||||
|
<<" defaults to 100"<<endl;
|
||||||
|
cout<<"-aof add_out_format Format for output when adding to"
|
||||||
|
<<" database"<<endl
|
||||||
|
<<" in verbose or very_verbose"
|
||||||
|
<<" modes; defaults to"<<endl
|
||||||
|
<<" \"DB Size: %d\\tStream: "
|
||||||
|
<<"%s\\tPair Number: %p\\n\""<<endl;
|
||||||
|
cout<<"-cof compare_out_format Format for output when comparing"
|
||||||
|
<<" with database"<<endl
|
||||||
|
<<" in verbose or very_verbose modes;"
|
||||||
|
<<" defaults to"<<endl
|
||||||
|
<<" \"Pair Number: %p\\tStream"
|
||||||
|
<<" Number: %s\\n\""<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
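The line handling in ReadConfigFile() above comes down to three steps: take the name up to the colon, skip white space, then read either a quoted value or a bare value that ends at white space or a `#`. The following is a minimal sketch of that idea for a single line, not the stide implementation. `ParseConfigLine` is a hypothetical helper introduced only for illustration, it uses `std::string::find` instead of the hand-written scan, and it ignores backslash line continuation. Every index is bounds-checked before the character is read.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of stide): parses one "name: value" line.
// Returns false for empty lines, comment lines and lines without a colon.
bool ParseConfigLine(const std::string &line, std::string &name,
                     std::string &value)
{
  if (line.empty() || line[0] == '#') return false;

  // The variable name runs up to the first colon.
  std::string::size_type colon = line.find(':');
  if (colon == std::string::npos) return false;
  name = line.substr(0, colon);

  // Skip white space after the colon; test the bound before indexing.
  std::string::size_type i = colon + 1;
  while (i < line.length() &&
         std::isspace(static_cast<unsigned char>(line[i]))) {
    i++;
  }

  std::string::size_type start = i;
  if (i < line.length() && line[i] == '"') {
    // Quoted value: starts after the opening quote and runs up to the
    // closing quote (or the end of the line if the quote is missing).
    start = ++i;
    while (i < line.length() && line[i] != '"') i++;
  } else {
    // Bare value: ends at white space, a '#', or the end of the line.
    while (i < line.length() &&
           !std::isspace(static_cast<unsigned char>(line[i])) &&
           line[i] != '#') {
      i++;
    }
  }
  value = line.substr(start, i - start);
  return true;
}
```

Called on a line such as `seq_len: 6   # window size`, the helper would fill `name` with `seq_len` and `value` with `6`. A quoted value may itself contain colons, because only the first colon on the line ends the name.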
72
11/wywolania/Data/stide_v1.2/config.h
Normal file
@ -0,0 +1,72 @@
|
|||||||
|
#ifndef __SEQ_CONFIG_H
|
||||||
|
#define __SEQ_CONFIG_H
|
||||||
|
|
||||||
|
#define CFREV 1
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
#include "opt_info.h"
|
||||||
|
|
||||||
|
using std::vector;
|
||||||
|
using std::ifstream;
|
||||||
|
|
||||||
|
class Config {
|
||||||
|
public:
|
||||||
|
Config(const int argc, const char *argv[]); // Constructor; reads
|
||||||
|
// configuration file and command
|
||||||
|
// line arguments
|
||||||
|
string cfg_name; // Name of configuration file
|
||||||
|
string db_name; // Name of database
|
||||||
|
int seq_len; // Sequence Length
|
||||||
|
int max_elements; // Maximum number of different
|
||||||
|
// data elements we may encounter
|
||||||
|
int max_streams; // Maximum number of different
|
||||||
|
// streams we may encounter
|
||||||
|
int pair_offset; // Number by which to offset
|
||||||
|
// num_pairs_read
|
||||||
|
string add_output_format; // Format for verbose-mode output
|
||||||
|
// when adding to database
|
||||||
|
string compare_output_format; // Format for verbose-mode output
|
||||||
|
// when comparing with an
|
||||||
|
// existing database
|
||||||
|
int lf_size; // Size of locality frames: 1
|
||||||
|
// effectively means don't
|
||||||
|
// compute locality frames
|
||||||
|
int add_to_db; // Flag indicating that we should
|
||||||
|
// add to the database rather
|
||||||
|
// than make comparisons
|
||||||
|
int output_graph; // Output graphing information in
|
||||||
|
// Dot format
|
||||||
|
int compute_hdist; // Compute Hamming distance
|
||||||
|
int write_db_stats; // Write statistics about the
|
||||||
|
// database
|
||||||
|
int verbose; // Output information about each
|
||||||
|
// anomaly or each new sequence
|
||||||
|
// added to the database
|
||||||
|
int very_verbose; // Output information about each
|
||||||
|
// sequence encountered
|
||||||
|
char fmt_str[10][50]; // String used for outputting
|
||||||
|
// information in verbose mode
|
||||||
|
char write_val[7]; // Do we write the value? used
|
||||||
|
// with fmt_str
|
||||||
|
int num_fvars; // Number of format variables
|
||||||
|
|
||||||
|
void InitOptArray(vector<OptInfo> &opt_array);
|
||||||
|
void SetDefaults();
|
||||||
|
void ReadCommandLine(const int argc, const char *argv[],
|
||||||
|
vector<OptInfo> &opt_array);
|
||||||
|
void AssignValToVar(vector<OptInfo> &opt_array, const
|
||||||
|
string &var_val, const string
|
||||||
|
&var_name, const int name_type);
|
||||||
|
void ReadConfigFile(vector<OptInfo> &opt_array);
|
||||||
|
void ReadOldConfigFile(ifstream &cfg_file,
|
||||||
|
vector<OptInfo> &opt_array);
|
||||||
|
void InitOutputFormat();
|
||||||
|
void CheckValues();
|
||||||
|
void OuputConfigInfo(const vector<OptInfo> &opt_array) const;
|
||||||
|
void WriteHelpInfo() const;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
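The `seq_len` and `compute_hdist` settings declared in config.h control how stide stores fixed-length windows and compares them; flexitree.C keeps those windows as paths in a prefix tree. The sketch below illustrates the three core operations (insert, exact lookup, minimum Hamming distance) under simplifying assumptions. It is not the FlexiTree class: `SeqNode`, `MinHDist` and the free-function signatures are hypothetical, children are kept in a `std::map` rather than a linked list, and the root is an empty node rather than the first element of the sequence.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical illustration (not the stide FlexiTree class): one node of a
// prefix tree, with children keyed by the next element of the sequence.
struct SeqNode {
  std::map<int, std::unique_ptr<SeqNode> > children;
};

// Inserts seq below root. Returns true if at least one node was created,
// i.e. the sequence was not already stored.
bool InsertSeq(SeqNode &root, const std::vector<int> &seq)
{
  SeqNode *node = &root;
  bool added = false;
  for (int value : seq) {
    std::unique_ptr<SeqNode> &child = node->children[value];
    if (!child) {
      child.reset(new SeqNode());
      added = true;
    }
    node = child.get();
  }
  return added;
}

// Returns true if seq matches a stored path (or a prefix of one).
bool IsSeqInTree(const SeqNode &root, const std::vector<int> &seq)
{
  const SeqNode *node = &root;
  for (int value : seq) {
    auto it = node->children.find(value);
    if (it == node->children.end()) return false;
    node = it->second.get();
  }
  return true;
}

// Minimum number of mismatching positions between seq[pos..] and any path
// below node. Every path is followed, so this is the expensive operation.
int MinHDist(const SeqNode &node, const std::vector<int> &seq,
             std::size_t pos = 0)
{
  if (pos == seq.size()) return 0;
  // Worst case: every remaining position is a mismatch.
  int best = static_cast<int>(seq.size() - pos);
  for (const auto &entry : node.children) {
    int misses = (entry.first != seq[pos] ? 1 : 0) +
                 MinHDist(*entry.second, seq, pos + 1);
    if (misses < best) best = misses;
  }
  return best;
}
```

As in the original code, lookup follows a single path while the Hamming computation has to visit every path, which is why it is switched off by default. Using a map trades the linked-list scan over siblings for a logarithmic lookup; that is a design choice of this sketch, not something the source does.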
461
11/wywolania/Data/stide_v1.2/flexitree.C
Normal file
@ -0,0 +1,461 @@
|
|||||||
|
// flexitree.C
|
||||||
|
#include "flexitree.h"
|
||||||
|
|
||||||
|
#include<iostream>
|
||||||
|
#include<ostream>
|
||||||
|
|
||||||
|
extern int counter;
|
||||||
|
|
||||||
|
using std::endl;
|
||||||
|
using std::cerr;
|
||||||
|
|
||||||
|
// data structures:
|
||||||
|
// node for a linked list
|
||||||
|
class FlexiTreeNode {
|
||||||
|
public:
|
||||||
|
FlexiTree *tree; // the element at this node
|
||||||
|
FlexiTreeNode *next; // pointer to the next node
|
||||||
|
FlexiTreeNode(int root) {tree = new FlexiTree(root); next = NULL;}
|
||||||
|
};
|
||||||
|
//===========================================================================
|
||||||
|
FlexiTree::FlexiTree(void) {
|
||||||
|
children = NULL;
|
||||||
|
root = -1;
|
||||||
|
id = counter;
|
||||||
|
counter++;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
FlexiTree::FlexiTree(int d) {
|
||||||
|
children = NULL;
|
||||||
|
root = d;
|
||||||
|
id = counter;
|
||||||
|
counter++;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
FlexiTree::~FlexiTree(void) {
|
||||||
|
if (children) {
|
||||||
|
FlexiTreeNode *temp_ptr = children->next, *next_temp_ptr;
|
||||||
|
if (children->tree) delete children->tree;
|
||||||
|
delete children;
|
||||||
|
while (temp_ptr) {
|
||||||
|
next_temp_ptr = temp_ptr->next;
|
||||||
|
if (temp_ptr->tree) delete temp_ptr->tree;
|
||||||
|
delete temp_ptr;
|
||||||
|
temp_ptr = next_temp_ptr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
//============================================================================
|
||||||
|
int FlexiTree::NumNodes(void) const {
|
||||||
|
int size = 1;
|
||||||
|
if (children) {
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (temp_ptr) {
|
||||||
|
size += temp_ptr->tree->NumNodes();
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return size;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
int FlexiTree::NumLeaves(void) const {
|
||||||
|
int size;
|
||||||
|
if (children) {
|
||||||
|
size = 0;
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (temp_ptr) {
|
||||||
|
size += temp_ptr->tree->NumLeaves();
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
} else size = 1;
|
||||||
|
return size;
|
||||||
|
}
|
||||||
|
//============================================================================
|
||||||
|
int FlexiTree::NumBranches(void) const {
|
||||||
|
int branches = 0;
|
||||||
|
if (children) {
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (temp_ptr) {
|
||||||
|
branches += (temp_ptr->tree->NumBranches() + 1);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return branches;
|
||||||
|
}
|
||||||
|
/**********************************************************************
|
||||||
|
* InsertSeq() *
|
||||||
|
* Inserts a sequence in this tree and returns 1 if the sequence *
|
||||||
|
* begins with the root of this tree and the sequence isn't already *
|
||||||
|
* in this tree. It returns -1 if the sequence doesn't begin with *
|
||||||
|
* the root of this tree. It returns 0 if the sequence was already *
|
||||||
|
* in this tree. This function is recursive and only compares the *
|
||||||
|
* portion of the sequence lying between the argument first and the *
|
||||||
|
* argument last. *
|
||||||
|
* *
|
||||||
|
* *
|
||||||
|
* Input: const vector<int> &seq Current sequence *
|
||||||
|
* int first The first element of the sequence *
|
||||||
|
* to consider *
|
||||||
|
* int last Index of the last element to consider *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
int FlexiTree::InsertSeq(const vector<int> &seq, int first, int last)
|
||||||
|
{
|
||||||
|
// If the root of this tree isn't the same as the first element of
|
||||||
|
// the sequence, return -1 to indicate that
|
||||||
|
if (root != seq[first]) {
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
first++; // shift the seq forward
|
||||||
|
// If we have reached the end of the sequence now, we haven't added
|
||||||
|
// anything to the tree, so we return 0 to indicate that it was
|
||||||
|
// already there
|
||||||
|
if (first > last) {
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If there are no children, create some with the correct root,
|
||||||
|
// insert the sequence and return 1.
|
||||||
|
if (!children) {
|
||||||
|
children = new FlexiTreeNode(seq[first]);
|
||||||
|
children->tree->InsertSeq(seq, first, last);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
// The root agrees, we're not at the end, and there are children.
|
||||||
|
// Now we want to know if the sequence is already in the children,
|
||||||
|
// and if not, we want to find out and add it.
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
int flag;
|
||||||
|
while (1) {
|
||||||
|
flag = temp_ptr->tree->InsertSeq(seq, first, last);
|
||||||
|
// If the sequence is new and gets added, return 1
|
||||||
|
if (flag == 1) return 1;
|
||||||
|
// If the sequence is old, return 0
|
||||||
|
if (flag == 0) return 0;
|
||||||
|
// Otherwise the new root of the sequence isn't the same as the
|
||||||
|
// root of this child tree, so we will try the next one. But
|
||||||
|
// first, if this is the last child, we know it isn't in here, so
|
||||||
|
// we will add it in and return 1
|
||||||
|
if (temp_ptr->next == NULL) {
|
||||||
|
temp_ptr->next = new FlexiTreeNode(seq[first]);
|
||||||
|
temp_ptr->next->tree->InsertSeq(seq, first, last);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* IsSeqInTree() *
|
||||||
|
* Returns 1 if the sequence has a match within this tree and *
|
||||||
|
* returns 0 otherwise. This function is recursive and only *
|
||||||
|
* compares the portion of the sequence lying between the argument *
|
||||||
|
* first and the argument last. *
|
||||||
|
* *
|
||||||
|
* *
|
||||||
|
* Input: vector<int> &seq Current sequence *
|
||||||
|
* int first The first element of the sequence to *
|
||||||
|
* consider *
|
||||||
|
* int last Index of the last element to consider *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
int FlexiTree::IsSeqInTree(const vector<int> &seq, int first, int last) const
|
||||||
|
{
|
||||||
|
// If the first element of the sequence isn't the same as the root
|
||||||
|
// of this tree, then we know already that there isn't a match here,
|
||||||
|
// so return 0.
|
||||||
|
if (root != seq[first]) {
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
first++; // shift the seq forward
|
||||||
|
|
||||||
|
// If we have reached the end of the sequence, then we have
|
||||||
|
// found matches all the way along, so return 1 saying that this is
|
||||||
|
// a match.
|
||||||
|
if (first > last) {
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Now we want to find out if there is a match in any of the
|
||||||
|
// subtrees below this tree. The subtrees are contained in the
|
||||||
|
// linked list children->next->next->...
|
||||||
|
FlexiTreeNode *next_node = children;
|
||||||
|
while (next_node != NULL) {
|
||||||
|
if (next_node->tree->IsSeqInTree(seq, first, last)) {
|
||||||
|
return 1; //Found it!
|
||||||
|
}
|
||||||
|
next_node = next_node->next;
|
||||||
|
}
|
||||||
|
// Now we've been through all of the subtrees without finding a
|
||||||
|
// match, so there aren't any matches.
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
/*********************************************************************
|
||||||
|
* ComputeHDistForTree() *
|
||||||
|
* Reports the minimum number of mismatches with any sequence on *
|
||||||
|
* this tree. This is a highly compute-intensive method, because *
|
||||||
|
* every path down the tree is followed. This function is *
|
||||||
|
* recursive, and only compares the portion of the sequence lying *
|
||||||
|
* between the argument first and the argument last. *
|
||||||
|
* *
|
||||||
|
* *
|
||||||
|
* Input: vector<int> &seq Current sequence *
|
||||||
|
* int first The first element of the sequence to *
|
||||||
|
* consider *
|
||||||
|
* int last Index of the last element to consider *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
int FlexiTree::ComputeHDistForTree(vector<int> &seq, int first, int
|
||||||
|
last) const
|
||||||
|
{
|
||||||
|
|
||||||
|
int tot_misses = 0;
|
||||||
|
|
||||||
|
// If the first element of the sequence isn't the same as the root
|
||||||
|
// of this tree, then every sequence on this tree will disagree with
|
||||||
|
// the sequence here, so we increment tot_misses
|
||||||
|
if (root != seq[first]) {
|
||||||
|
tot_misses++;
|
||||||
|
}
|
||||||
|
|
||||||
|
first++; // shift the seq forward
|
||||||
|
if (first > last) { // reached the end of the seq
|
||||||
|
return tot_misses; // nothing left to compare: report the misses counted so far (0 or 1)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Now we want to add to tot_misses the smallest number of
|
||||||
|
// mismatches with any of this tree's subtrees. This tree's
|
||||||
|
// subtrees are in the linked list children->next->next->
|
||||||
|
FlexiTreeNode *next_node = children;
|
||||||
|
// last is the last element of the sequence, which is one less than
|
||||||
|
// the number of elements in the sequence. The most misses possible
|
||||||
|
// is the number of elements in the sequence.
|
||||||
|
int min_misses = last + 1;
|
||||||
|
int misses;
|
||||||
|
while (next_node != NULL) {
|
||||||
|
misses = next_node->tree->ComputeHDistForTree(seq, first, last);
|
||||||
|
if (misses < min_misses) {
|
||||||
|
min_misses = misses;
|
||||||
|
}
|
||||||
|
next_node = next_node->next;
|
||||||
|
}
|
||||||
|
return (tot_misses + min_misses);
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
// Format for writing out: the tree is written depth-first; each path is terminated by a negative number,
|
||||||
|
// which is -(the required backtrack length)-1. depth should start out as 0.
|
||||||
|
// the tree writing out will end with -1.
|
||||||
|
void FlexiTree::Write(ostream &s, int &depth) const {
|
||||||
|
s<<root<<" ";
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (temp_ptr) {
|
||||||
|
depth = 0;
|
||||||
|
temp_ptr->tree->Write(s, depth);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
if (temp_ptr) s<<"-"<<(depth + 1)<<" ";
|
||||||
|
}
|
||||||
|
depth++; // now incr the count
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
ostream &operator<<(ostream &s, const FlexiTree &tree) {
|
||||||
|
int depth = 0;
|
||||||
|
tree.Write(s, depth);
|
||||||
|
s<<" -1"; // we terminate with a -1
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
// returns 0 if we have reached the end of the file, 1 otherwise
|
||||||
|
int FlexiTree::Read(istream &s, int &depth) {
|
||||||
|
int next_num;
|
||||||
|
if (s.eof()) return 0;
|
||||||
|
if (!(s>>next_num)) return 0; // stop on end of file or unreadable input
|
||||||
|
|
||||||
|
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||||
|
|
||||||
|
if (next_num >= 0) {
|
||||||
|
children = new FlexiTreeNode(next_num);
|
||||||
|
if (!children->tree->Read(s, depth)) return 0;
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (depth == 0) {
|
||||||
|
if (s.eof()) return 0;
|
||||||
|
if (!(s>>next_num)) return 0; // stop on end of file or unreadable input
|
||||||
|
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||||
|
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||||
|
}
|
||||||
|
} else depth = (-1 * next_num) - 1;
|
||||||
|
if (depth) depth--;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
//=============================================================================
|
||||||
|
istream &operator>>(istream &s, FlexiTree &tree) {
|
||||||
|
int next_num, depth = 0;
|
||||||
|
s>>next_num;
|
||||||
|
tree.SetRoot(next_num);
|
||||||
|
tree.Read(s, depth);
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
//===========================================================================
|
||||||
|
// writes out in the format that dot uses for dags
|
||||||
|
int FlexiTree::OutputGraph(ostream &s) const {
|
||||||
|
// first write out the name of the tree
|
||||||
|
s<<" "<<id<<" [label=\""<<root<<"\",shape=plaintext];"<<endl;
|
||||||
|
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
int childid;
|
||||||
|
while (temp_ptr) {
|
||||||
|
childid = temp_ptr->tree->OutputGraph(s);
|
||||||
|
s<<" "<<id<<" -> "<<childid<<";"<<endl;
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
}
|
||||||
|
return id;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* IsSeqInForest() *
|
||||||
|
* Searches through database forest to locate sequence. Returns 1 *
|
||||||
|
* if it finds it, 0 otherwise *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
int SeqForest::IsSeqInForest(const vector<int> &seq, int seq_len) const
|
||||||
|
{
|
||||||
|
// Have we ever seen a sequence starting with the same root?
|
||||||
|
if (trees_found[seq[0]]) {
|
||||||
|
// Have we seen this precise sequence?
|
||||||
|
return trees[seq[0]].IsSeqInTree(seq, 0, seq_len-1);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
SeqForest::SeqForest(int max_trees)
|
||||||
|
{
|
||||||
|
trees = vector<FlexiTree>(max_trees);
|
||||||
|
trees_found = vector<int>(max_trees, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
#include "fstream.h"
|
||||||
|
|
||||||
|
// for test purposes
|
||||||
|
void main(void) {
|
||||||
|
FlexiTree tree(1);
|
||||||
|
vector<int> seq(10);
|
||||||
|
|
||||||
|
// try out insert and write
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 2; seq[3] = 3;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1123:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1134:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 3;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1223:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 3;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1233:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1234:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 4;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1234:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||||
|
tree.SeqInsert(seq, 0, 3);
|
||||||
|
cout<<"1214:"<<tree<<endl;
|
||||||
|
|
||||||
|
// now try out search
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 1; seq[3] = 4;
|
||||||
|
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1214"<<endl;
|
||||||
|
else cout<<"could not find 1214"<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 2; seq[3] = 4;
|
||||||
|
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1224"<<endl;
|
||||||
|
else cout<<"could not find 1224"<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 4; seq[3] = 4;
|
||||||
|
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1244"<<endl;
|
||||||
|
else cout<<"could not find 1244"<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 3; seq[3] = 5;
|
||||||
|
if (tree.SeqSearch(seq, 0, 3)) cout<<"found 1135"<<endl;
|
||||||
|
else cout<<"could not find 1135"<<endl;
|
||||||
|
|
||||||
|
// try out insert and write with shorter and longer sequences
|
||||||
|
seq[0] = 1; seq[1] = 3;
|
||||||
|
tree.SeqInsert(seq, 0, 1);
|
||||||
|
cout<<"13:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 4;
|
||||||
|
tree.SeqInsert(seq, 0, 2);
|
||||||
|
cout<<"114:"<<tree<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||||
|
tree.SeqInsert(seq, 0, 7);
|
||||||
|
cout<<"12311214:"<<tree<<endl;
|
||||||
|
|
||||||
|
if (tree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||||
|
else cout<<"could not find 12311214"<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||||
|
if (tree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||||
|
else cout<<"could not find 115"<<endl;
|
||||||
|
if (tree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||||
|
else cout<<"could not find 11"<<endl;
|
||||||
|
|
||||||
|
ofstream outf("test.out");
|
||||||
|
outf<<tree;
|
||||||
|
outf.close();
|
||||||
|
|
||||||
|
//counter = 0;
|
||||||
|
|
||||||
|
FlexiTree intree;
|
||||||
|
ifstream inf("test.out");
|
||||||
|
inf>>intree;
|
||||||
|
inf.close();
|
||||||
|
|
||||||
|
cout<<endl<<intree;
|
||||||
|
|
||||||
|
seq[0] = 1; seq[1] = 2; seq[2] = 3; seq[3] = 1; seq[4] = 1; seq[5] = 2; seq[6] = 1; seq[7] = 4;
|
||||||
|
if (intree.SeqSearch(seq, 0, 7)) cout<<"found 12311214"<<endl;
|
||||||
|
else cout<<"could not find 12311214"<<endl;
|
||||||
|
seq[0] = 1; seq[1] = 1; seq[2] = 5;
|
||||||
|
if (intree.SeqSearch(seq, 0, 2)) cout<<"found 115"<<endl;
|
||||||
|
else cout<<"could not find 115"<<endl;
|
||||||
|
if (intree.SeqSearch(seq, 0, 1)) cout<<"found 11"<<endl;
|
||||||
|
else cout<<"could not find 11"<<endl;
|
||||||
|
|
||||||
|
}
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
int FlexiTree::Read(istream &s, int &depth) {
|
||||||
|
int next_num, depth_decr = 0;
|
||||||
|
s>>next_num;
|
||||||
|
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||||
|
if (next_num >= 0) {
|
||||||
|
children = new FlexiTreeNode(next_num);
|
||||||
|
if (!children->tree->Read(s, depth)) return 0;
|
||||||
|
if (depth) {
|
||||||
|
depth--;
|
||||||
|
depth_decr = 1;
|
||||||
|
}
|
||||||
|
FlexiTreeNode *temp_ptr = children;
|
||||||
|
while (depth == 0) {
|
||||||
|
depth_decr = 0;
|
||||||
|
s>>next_num;
|
||||||
|
if (next_num == -1) return 0; // we have reached the end of the tree
|
||||||
|
temp_ptr->next = new FlexiTreeNode(next_num);
|
||||||
|
temp_ptr = temp_ptr->next;
|
||||||
|
if (!temp_ptr->tree->Read(s, depth)) return 0;
|
||||||
|
if (depth) {
|
||||||
|
depth--;
|
||||||
|
depth_decr = 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!depth_decr && depth) depth--;
|
||||||
|
} else
|
||||||
|
depth = (-1 * next_num) - 1;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
*/
|
||||||
|
|
54
11/wywolania/Data/stide_v1.2/flexitree.h
Normal file
@ -0,0 +1,54 @@
|
|||||||
|
#ifndef __FLEXITREE_H
|
||||||
|
#define __FLEXITREE_H
|
||||||
|
|
||||||
|
#include<vector>
|
||||||
|
#include<iostream>
|
||||||
|
#include<ostream>
|
||||||
|
|
||||||
|
using std::ostream;
|
||||||
|
using std::istream;
|
||||||
|
using std::vector;
|
||||||
|
|
||||||
|
class FlexiTreeNode;
|
||||||
|
class FlexiTree {
|
||||||
|
private:
|
||||||
|
FlexiTreeNode *children;
|
||||||
|
int root;
|
||||||
|
int id;
|
||||||
|
public:
|
||||||
|
void Write(ostream &s, int &depth) const;
|
||||||
|
int Read(istream &s, int &depth);
|
||||||
|
int OutputGraph(ostream &s) const;
|
||||||
|
FlexiTree();
|
||||||
|
FlexiTree(int d);
|
||||||
|
// FlexiTree(const FlexiTree& ft);
|
||||||
|
~FlexiTree();
|
||||||
|
void SetRoot(int d) {root = d;}
|
||||||
|
int InsertSeq(const vector<int> &seq, int first, int last);
|
||||||
|
int IsSeqInTree(const vector<int> &seq, int first, int last) const;
|
||||||
|
int ComputeHDistForTree(vector<int> &seq, int first, int last) const;
|
||||||
|
friend ostream &operator<<(ostream &s, const FlexiTree &tn);
|
||||||
|
friend istream &operator>>(istream &s, FlexiTree &tn);
|
||||||
|
int NumNodes() const; // returns the number of nodes in the tree
|
||||||
|
int NumLeaves() const; // returns the number of leaves in the tree, i.e num of distinct seqs
|
||||||
|
int NumBranches() const; // returns the total # of branches, of all nodes
|
||||||
|
};
|
||||||
|
|
||||||
|
//===========================================================================
|
||||||
|
class SeqForest {
|
||||||
|
public:
|
||||||
|
// this structure is an array of N trees, i.e. one tree for each value
|
||||||
|
// type
|
||||||
|
vector<FlexiTree> trees;
|
||||||
|
// this structure records which types of values actually occurred -
|
||||||
|
// for efficiency, if there were actually fewer value types than
|
||||||
|
// specified in the config
|
||||||
|
vector<int> trees_found;
|
||||||
|
SeqForest(int max_trees);
|
||||||
|
int IsSeqInForest(const vector<int> &seq, int seq_len) const;
|
||||||
|
};
|
||||||
|
|
||||||
|
//===========================================================================
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
34
11/wywolania/Data/stide_v1.2/opt_info.h
Normal file
@ -0,0 +1,34 @@
|
|||||||
|
#ifndef __OPT_INFO_H
|
||||||
|
#define __OPT_INFO_H
|
||||||
|
|
||||||
|
#include <string>
|
||||||
|
|
||||||
|
#define NUM_OPTS 16
|
||||||
|
#define SHORT_NAME 0
|
||||||
|
#define LONG_NAME 1
|
||||||
|
|
||||||
|
using std::string;
|
||||||
|
|
||||||
|
class OptInfo {
|
||||||
|
public:
|
||||||
|
string long_name; // Long name of this option; used in
|
||||||
|
// configuration file and with the -- marker
|
||||||
|
// on the command line
|
||||||
|
string short_name; // Short name of this option; used with the -
|
||||||
|
// marker on the command line
|
||||||
|
int set; // Flag indicating if this option has already
|
||||||
|
// been set
|
||||||
|
char type; // type of value: legitimate values are f
|
||||||
|
// (flag, i.e., boolean), i (int), s (string)
|
||||||
|
// or h (help)
|
||||||
|
union { // pointer to actual value to be set
|
||||||
|
int *flag_val; // value if type = 'f'
|
||||||
|
int *int_val; // value if type = 'i'
|
||||||
|
string *str_val; // value if type = 's'
|
||||||
|
};
|
||||||
|
|
||||||
|
OptInfo() {};
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
54
11/wywolania/Data/stide_v1.2/sample.config
Normal file
@ -0,0 +1,54 @@
|
|||||||
|
#ConfigFileRev: 1
|
||||||
|
#Sample STIDE configuration file containing default values.
|
||||||
|
|
||||||
|
db_name: default.db # name of database
|
||||||
|
seq_len: 6 # length of sequences
|
||||||
|
max_elements: 500 # maximum number of unique elements in input
|
||||||
|
max_streams: 100 # maximum number of unique streams in input
|
||||||
|
pair_offset: 0 # offset for pair number count
|
||||||
|
add_output_format: \
|
||||||
|
"DB Size: %d\tStream: %s\tPair Number: %p\n"
|
||||||
|
# In verbose mode, STIDE will print
|
||||||
|
# this information for every new
|
||||||
|
# sequence added to the database. In
|
||||||
|
# very verbose mode, STIDE will print
|
||||||
|
# this information for every sequence
|
||||||
|
# considered. Possible data:
|
||||||
|
# %d Database Size
|
||||||
|
# %i Pair number of last data element of
|
||||||
|
# sequence in its particular
|
||||||
|
# data stream
|
||||||
|
# %p Pair number of last data element of
|
||||||
|
# sequence in the whole input
|
||||||
|
# stream
|
||||||
|
# %s Stream Number
|
||||||
|
|
||||||
|
compare_output_format: \
|
||||||
|
"Pair Number: %p\tStream Number: %s\n"
|
||||||
|
# In verbose mode, STIDE will print
|
||||||
|
# this information for every sequence
|
||||||
|
# which is itself an anomaly or whose
|
||||||
|
# locality frame contains an anomaly.
|
||||||
|
# In very verbose mode, STIDE will
|
||||||
|
# print this information for every
|
||||||
|
# sequence. Possible data:
|
||||||
|
# %a 1 if this sequence is an anomaly, 0
|
||||||
|
# otherwise
|
||||||
|
# %c locality frame count of this sequence
|
||||||
|
# %h Hamming distance
|
||||||
|
# %i Pair number of last data element of
|
||||||
|
# its particular data stream
|
||||||
|
# %p Pair number of last data element of
|
||||||
|
# the entire input
|
||||||
|
# %s Stream Number
|
||||||
|
lf_size: 1 # 1 causes locality frame counts not
|
||||||
|
# to be computed
|
||||||
|
add_to_db: off # Add this data to the database, or, if there
|
||||||
|
# is no database, create a new one -- do not
|
||||||
|
# do comparisons
|
||||||
|
output_graph: off # Outputs graphing information in Dot
|
||||||
|
# format
|
||||||
|
compute_hdist: off # Compute Hamming distances
|
||||||
|
write_db_stats: off # At end, print out statistics about database
|
||||||
|
verbose: off # See add_output_format and compare_output_format
|
||||||
|
very_verbose: off # See add_output_format and compare_output_format
|
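The simple entries in sample.config above follow a `key: value # comment` layout. The sketch below shows one way such a line could be split into a key and a value. It is not STIDE's own parser (config.C is not shown here); `Trim` and `ParseSimpleOption` are hypothetical names, and the sketch deliberately ignores the backslash continuation lines and quoted format strings used by `add_output_format` and `compare_output_format`.

```cpp
#include <string>

// Hypothetical helper (not part of STIDE): strip leading and trailing whitespace.
static std::string Trim(const std::string &s)
{
  const char *ws = " \t\r\n";
  std::string::size_type b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::string::size_type e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Hypothetical helper (not part of STIDE): parse one "key: value # comment"
// line. Returns true and fills key/value on success; returns false for blank
// lines, comment-only lines, and lines without a ':' separator.
bool ParseSimpleOption(const std::string &line, std::string &key,
                       std::string &value)
{
  // Everything from the first '#' onward is treated as a comment.
  // (If there is no '#', find() returns npos and substr keeps the whole line.)
  std::string body = line.substr(0, line.find('#'));
  std::string::size_type colon = body.find(':');
  if (colon == std::string::npos) return false;
  key = Trim(body.substr(0, colon));
  value = Trim(body.substr(colon + 1));
  return !key.empty();
}
```

Under these assumptions, a line such as `seq_len: 6 # length of sequences` would give the key `seq_len` and the value `6`, while the `#ConfigFileRev: 1` marker line is rejected because it begins with `#` and would need separate handling.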
576
11/wywolania/Data/stide_v1.2/stide.C
Normal file
@ -0,0 +1,576 @@
|
|||||||
|
/*********************************************************************
|
||||||
|
* *
|
||||||
|
* STIDE: Sequence Time-Delay Embedding v1.2 *
|
||||||
|
* *
|
||||||
|
* Written by Steve Hofmeyr 7/21/1996 *
|
||||||
|
* Revised by Julie Rehmeyer 3/1998 *
|
||||||
|
* Revised by Hajime Inoue 11/2006 *
|
||||||
|
* *
|
||||||
|
* Copyright (C) 1996, 1998 Regents of the University of New Mexico. *
|
||||||
|
* Copyright (C) 2006 Hajime Inoue. *
|
||||||
|
* All Rights Reserved. *
|
||||||
|
* *
|
||||||
|
* This program is free software; you can redistribute it and/or *
|
||||||
|
* modify it under the terms of the GNU General Public License as *
|
||||||
|
* published by the Free Software Foundation; either version 2 of *
|
||||||
|
* the License, or (at your option) any later version. *
|
||||||
|
* *
|
||||||
|
* This program is distributed in the hope that it will be useful, *
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||||
|
* GNU General Public License for more details. *
|
||||||
|
* *
|
||||||
|
* You should have received a copy of the GNU General Public *
|
||||||
|
* License along with this program; if not, write to the Free *
|
||||||
|
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, *
|
||||||
|
* USA. *
|
||||||
|
* *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
#include <stdlib.h>
#include <string.h>
|
||||||
|
#include <string>
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <vector>
|
||||||
|
#include <map>
|
||||||
|
#include "config.h"
|
||||||
|
#include "stream.h"
|
||||||
|
#include "flexitree.h"
|
||||||
|
|
||||||
|
#define DBREV 1
|
||||||
|
|
||||||
|
using std::vector;
|
||||||
|
|
||||||
|
using std::cin;
|
||||||
|
using std::cerr;
|
||||||
|
using std::cout;
|
||||||
|
using std::endl;
|
||||||
|
using std::ofstream;
using std::ifstream;
|
||||||
|
|
||||||
|
typedef std::map<int, int> HashTableInt;
|
||||||
|
|
||||||
|
int counter = 0;
|
||||||
|
|
||||||
|
Stream *GetReadyStream(vector<Stream> &streams, HashTableInt
|
||||||
|
&sid_table, int &num_streams_fnd, int
|
||||||
|
&total_pairs_read, const Config &cfg);
|
||||||
|
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||||
|
int &seq_len);
|
||||||
|
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||||
|
int db_size, const int seq_len);
|
||||||
|
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||||
|
num_streams_fnd, const int num_seqs_added, const
|
||||||
|
vector<Stream> &streams, const int db_size);
|
||||||
|
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||||
|
const int db_size);
|
||||||
|
void OutputGraph(const SeqForest &db_forest, string db_name);
|
||||||
|
int GetPrimeLargerThan(const int n);
|
||||||
|
|
||||||
|
|
||||||
|
int ExtToInt(HashTableInt &sid_table, int key, int next_value)
|
||||||
|
{
|
||||||
|
if(sid_table.find(key) == sid_table.end())
|
||||||
|
sid_table[key] = next_value;
|
||||||
|
|
||||||
|
return sid_table[key];
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* main() *
|
||||||
|
* Input: int argc: Number of command-line arguments *
|
||||||
|
* char *argv[]: array of strings containing *
|
||||||
|
* command-line arguments *
|
||||||
|
* Output: 0 if successful, -1 if unsuccessful *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
int main(int argc, char *argv[])
|
||||||
|
|
||||||
|
{
|
||||||
|
Config cfg((const int) argc, (const char **) argv);
|
||||||
|
// Declare configuration object and do
|
||||||
|
// the configuration on the basis of the
|
||||||
|
// command line arguments and the
|
||||||
|
// configuration file
|
||||||
|
Stream *active_stream; // This will point to the stream that
|
||||||
|
// currently has a sequence to be worked
|
||||||
|
// on (either added to the database or
|
||||||
|
// compared).
|
||||||
|
HashTableInt sid_table;
|
||||||
|
// Hash table relating external stream ids to
|
||||||
|
// internal sids; make size of table
|
||||||
|
// smallest prime larger than the number
|
||||||
|
// of streams
|
||||||
|
SeqForest normal(cfg.max_elements); // Uninitialized forest of
|
||||||
|
// normal sequences
|
||||||
|
vector<Stream> streams(cfg.max_streams); // Array of stream objects,
|
||||||
|
// one for each data stream
|
||||||
|
// in input, which are
|
||||||
|
// allocated as needed
|
||||||
|
int num_streams_fnd = 0; // Number of data streams
|
||||||
|
// encountered to date
|
||||||
|
int total_pairs_read = cfg.pair_offset; // Number of pairs read from
|
||||||
|
// input to date from all
|
||||||
|
// the data streams combined
|
||||||
|
// -- can be offset using
|
||||||
|
// the "-n" switch
|
||||||
|
int db_size; // Total number of unique
|
||||||
|
// sequences in the database
|
||||||
|
int init_db_size = 0; // Number of unique
|
||||||
|
// sequences in the
|
||||||
|
// pre-existing database
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
// Read database into normal, if database exists
|
||||||
|
db_size = init_db_size = ReadDB(normal, cfg.db_name, cfg.seq_len);
|
||||||
|
|
||||||
|
if (cfg.add_to_db)
|
||||||
|
{
|
||||||
|
while ((active_stream =
|
||||||
|
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||||
|
total_pairs_read, cfg)) != NULL)
|
||||||
|
{
|
||||||
|
active_stream->AddToDB(normal, db_size, total_pairs_read, cfg);
|
||||||
|
}
|
||||||
|
WriteDB(normal, cfg.db_name, db_size, cfg.seq_len);
|
||||||
|
if (cfg.output_graph)
|
||||||
|
{
|
||||||
|
OutputGraph(normal,cfg.db_name);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
int i = 0;
|
||||||
|
while ((active_stream =
|
||||||
|
GetReadyStream(streams, sid_table, num_streams_fnd,
|
||||||
|
total_pairs_read, cfg)) != NULL)
|
||||||
|
{
|
||||||
|
active_stream->CompareSeq(cfg, normal, total_pairs_read);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
FinalReport(cfg, normal, num_streams_fnd, db_size - init_db_size,
|
||||||
|
streams, db_size);
|
||||||
|
|
||||||
|
return(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**********************************************************************
|
||||||
|
* GetReadyStream() *
|
||||||
|
* This function reads a pair from the input, appends the element *
|
||||||
|
* to the current sequence string in the appropriate data stream, *
|
||||||
|
* finds out if that data stream has a complete sequence to be *
|
||||||
|
* processed, continues until it has found such a data stream, and *
|
||||||
|
* returns a pointer to it. It updates num_streams_fnd, *
|
||||||
|
* total_pairs_read, sid_table, and streams. *
|
||||||
|
* *
|
||||||
|
* Input: vector<Stream> &streams: the array of streams that we have *
|
||||||
|
* found so far *
|
||||||
|
* HashTableInt &sid_table: hash table relating external sids *
|
||||||
|
* to internal sids *
|
||||||
|
* int &num_streams_fnd: the number of streams found so far; *
|
||||||
|
* int &total_pairs_read: the number of pairs read from the *
|
||||||
|
* input stream so far *
|
||||||
|
* const Config &cfg: configuration information *
|
||||||
|
* *
|
||||||
|
* Output: a pointer to the next stream that is ready for processing *
|
||||||
|
**********************************************************************/
|
||||||
|
|
||||||
|
Stream *GetReadyStream(vector<Stream> &streams, HashTableInt
|
||||||
|
&sid_table, int &num_streams_fnd, int
|
||||||
|
&total_pairs_read, const Config &cfg)
|
||||||
|
|
||||||
|
{
|
||||||
|
Stream *ready_stream = NULL;
|
||||||
|
int ext_sid;
|
||||||
|
int int_sid;
|
||||||
|
int sval;
|
||||||
|
|
||||||
|
cin >> ext_sid;
|
||||||
|
while (!cin.eof()) {
|
||||||
|
if (ext_sid == -1) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
// int_sid = sid_table.ExtToInt(ext_sid, num_streams_fnd);
|
||||||
|
int_sid = ExtToInt(sid_table, ext_sid, num_streams_fnd);
|
||||||
|
cin >> sval;
|
||||||
|
++total_pairs_read;
|
||||||
|
|
||||||
|
// Update num_streams_fnd, if necessary
|
||||||
|
if (int_sid >= num_streams_fnd) {
|
||||||
|
if (int_sid >= cfg.max_streams) {
|
||||||
|
cerr<<"ERROR: Too many streams to follow, aborting..."<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// We need a new stream object
|
||||||
|
if(num_streams_fnd == streams.size())
|
||||||
|
{
|
||||||
|
cerr << "WRITING OVER THE END OF THE ARRAY" << endl;
|
||||||
|
cerr << "num_streams_fnd: " << num_streams_fnd << endl;
|
||||||
|
cerr << "cfg.max_streams: " << cfg.max_streams << endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
streams[num_streams_fnd].Init(cfg, int_sid, ext_sid);
|
||||||
|
num_streams_fnd = int_sid + 1;
|
||||||
|
}
|
||||||
|
streams[int_sid].Append(sval);
|
||||||
|
if (streams[int_sid].Ready()) {
|
||||||
|
ready_stream = &streams[int_sid];
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
cin >> ext_sid;
|
||||||
|
}
|
||||||
|
|
||||||
|
return ready_stream;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* ReadDB() *
|
||||||
|
* Reads the database from a file and returns the number of unique *
|
||||||
|
* sequences in the database. Checks for appropriate revision *
|
||||||
|
* number. If it is a revision DBREV database, the second line *
|
||||||
|
* will be "#DBseq_len: " followed by the sequence length. The *
|
||||||
|
* next line will contain a single number, giving the root of the *
|
||||||
|
* first tree. The following lines will contain the tree itself. *
|
||||||
|
* The first seq_len numbers make up the first sequence (so the *
|
||||||
|
* first number of the second line will be the same as the number *
|
||||||
|
* on the first line). The next number will be a negative number *
|
||||||
|
* between -(seq_len-1) and -2, indicating how far to backtrack in *
|
||||||
|
* the first sequence, and the following positive numbers give the *
|
||||||
|
* rest of the second sequence. So, for example, -3 would mean *
|
||||||
|
* backtrack 3 numbers, take the previous numbers including the *
|
||||||
|
* one you're on, and append the next two numbers. So after the *
|
||||||
|
* -3 you would find two positive numbers, followed by a negative *
|
||||||
|
* number (which you would use the same way as you used the -3, on *
|
||||||
|
* the most recent sequence). Each tree is terminated by the *
|
||||||
|
* number -1. So the sample input file *
|
||||||
|
* 3 *
|
||||||
|
* 3 4 2 9 10 3 -4 3 9 8 -2 3 -3 4 9 -1 *
|
||||||
|
* 2 *
|
||||||
|
* 2 3 4 5 6 7 -3 2 9 -1 *
|
||||||
|
* yields the sequences: *
|
||||||
|
* 3 4 2 9 10 3 *
|
||||||
|
* 3 4 2 3 9 8 *
|
||||||
|
* 3 4 2 3 9 3 *
|
||||||
|
* 3 4 2 3 4 9 *
|
||||||
|
* 2 3 4 5 6 7 *
|
||||||
|
* 2 3 4 5 2 9 *
|
||||||
|
* *
|
||||||
|
* Input: SeqForest &db_forest Forest of sequences *
|
||||||
|
* const string &db_name Name of database *
|
||||||
|
* int &seq_len User-specified sequence length *
|
||||||
|
* *
|
||||||
|
* Output: the number of unique sequences in the database *
|
||||||
|
* *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
int ReadDB(SeqForest &db_forest, const string &db_name,
|
||||||
|
int &seq_len)
|
||||||
|
{
|
||||||
|
ifstream in_db_file(db_name.c_str()); // file to read the database from
|
||||||
|
int db_size = 0; // size of the database
|
||||||
|
int root; // the first element of the sequences
|
||||||
|
// we are reading in at the moment;
|
||||||
|
// i.e., the root of this tree
|
||||||
|
string buff;
|
||||||
|
int db_seq_len;
|
||||||
|
int rev_num;
|
||||||
|
|
||||||
|
if (!in_db_file.is_open()) {
|
||||||
|
cerr<<"WARNING: Cannot open database file " << db_name
|
||||||
|
<< " for input"<<endl<<"Creating a new file"<<endl;
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check to see if the first line contains "#DBrev:"
|
||||||
|
in_db_file>>buff;
|
||||||
|
if (buff == "#DBrev:") {
|
||||||
|
in_db_file>>rev_num;
|
||||||
|
if (rev_num > DBREV) {
|
||||||
|
cerr << "ERROR: The revision number is greater than " << DBREV
|
||||||
|
<< ". This version of STIDE is only capable of dealing "
|
||||||
|
<< "with databases through DBrev " << DBREV
|
||||||
|
<< ". Aborting..."<<endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
if (rev_num < DBREV) {
|
||||||
|
cerr << "ERROR: Revision number of database must be >= " << DBREV
|
||||||
|
<< endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
// Now we know that it is revision DBREV. Check sequence length of
|
||||||
|
// database against user-indicated sequence length
|
||||||
|
in_db_file>>buff;
|
||||||
|
// Now check to see if next line is "#DBseq_len: " followed by a
|
||||||
|
// number
|
||||||
|
if (buff != "#DBseq_len:") {
|
||||||
|
cerr << "ERROR: The second line of the database does not "
|
||||||
|
<< "contain the string \"#DBseq_len: \"" << endl
|
||||||
|
<< "followed by the sequence length of the database, as "
|
||||||
|
<< "required of revision " << DBREV
|
||||||
|
<< " databases. Aborting..."<< endl;
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
in_db_file>>db_seq_len;
|
||||||
|
if (db_seq_len != seq_len) {
|
||||||
|
cerr << "WARNING: Database sequence length is " << db_seq_len
|
||||||
|
<< ", which does not match "
|
||||||
|
<< "sequence length specified" << endl
|
||||||
|
<< "by user (or by default if no specification was given), "
|
||||||
|
<< "which is " << seq_len << endl
|
||||||
|
<< "I will use the database sequence length. If that is "
|
||||||
|
<< "not what you intended, type Ctrl-C to abort." << endl;
|
||||||
|
seq_len = db_seq_len;
|
||||||
|
}
|
||||||
|
// Read next number into root
|
||||||
|
in_db_file >> root;
|
||||||
|
}
|
||||||
|
// Otherwise, we assume we have an old-style database, and let the
|
||||||
|
// user know that that's our assumption
|
||||||
|
else {
|
||||||
|
cerr << "WARNING: The string \"#DBrev:\" is not in the first "
|
||||||
|
<< "line of the database." << endl
|
||||||
|
<< "I'm assuming that it's an older style of database, and "
|
||||||
|
<< "will read it in" << endl
|
||||||
|
<< "based on that assumption. If that is not what you want "
|
||||||
|
<< "me to do, type CTRL-C" << endl << endl;
|
||||||
|
// we have just read the first root into buff -- put it in root
|
||||||
|
// instead
|
||||||
|
root = atoi(buff.c_str());
|
||||||
|
}
|
||||||
|
|
||||||
|
while (!in_db_file.eof()) {
|
||||||
|
if (root == -1) break;
|
||||||
|
db_forest.trees_found[root]++;
|
||||||
|
in_db_file>>db_forest.trees[root];
|
||||||
|
db_size += db_forest.trees[root].NumLeaves();
|
||||||
|
in_db_file>>root;
|
||||||
|
}
|
||||||
|
in_db_file.close();
|
||||||
|
|
||||||
|
return db_size;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteDB() *
|
||||||
|
* Writes db_forest to the file db_name, with the format described *
|
||||||
|
* in the header of ReadDB(). Prints database statistics at the *
|
||||||
|
* end of the file. *
|
||||||
|
* *
|
||||||
|
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||||
|
* database *
|
||||||
|
* const string &db_name Name of file in which to *
|
||||||
|
* put database. *
|
||||||
|
* const int db_size Number of unique sequences *
|
||||||
|
* in the database *
|
||||||
|
* const int seq_len Sequence length *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
********************************************************************/
|
||||||
|
|
||||||
|
void WriteDB(const SeqForest &db_forest, const string &db_name, const
|
||||||
|
int db_size, const int seq_len)
|
||||||
|
{
|
||||||
|
ofstream out_db_file(db_name.c_str());
|
||||||
|
|
||||||
|
if (!out_db_file.is_open())
|
||||||
|
{
|
||||||
|
cerr << "ERROR: Cannot open database file " << db_name
|
||||||
|
<< " for output, aborting..." << endl ;
|
||||||
|
exit(-2);
|
||||||
|
}
|
||||||
|
|
||||||
|
out_db_file << "#DBrev: " << DBREV << endl;
|
||||||
|
out_db_file << "#DBseq_len: " << seq_len << endl;
|
||||||
|
|
||||||
|
for (int i = 0; i < db_forest.trees.size(); i++)
|
||||||
|
{
|
||||||
|
if (db_forest.trees_found[i])
|
||||||
|
{
|
||||||
|
out_db_file<<i<<endl;
|
||||||
|
out_db_file << db_forest.trees[i] << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
out_db_file<<" -1"<<endl;
|
||||||
|
// we can now write anything, so I will write the db stats
|
||||||
|
out_db_file<<"; DB STATS"<<endl;
|
||||||
|
WriteDBStats(db_forest, out_db_file, db_size);
|
||||||
|
out_db_file.close();
|
||||||
|
}
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* FinalReport() *
|
||||||
|
* Reports data at end of run. The number of streams, the number *
|
||||||
|
* of input pairs, and the number of sequences in the input are *
|
||||||
|
* always reported. If we have done a comparison run, we report *
|
||||||
|
* the number of anomalies, and the percentage of sequences that *
|
||||||
|
* were anomalous. Additionally, if asked for, the Hamming *
|
||||||
|
* distance or locality frame count is reported. If we have added *
|
||||||
|
* to the database, we report having done so and report the number *
|
||||||
|
* of sequences added. If database statistics are asked for, we *
|
||||||
|
* report the number of nodes, the number of unique sequences, the *
|
||||||
|
* number of branches, and the average database branch factor. *
|
||||||
|
* *
|
||||||
|
* Input: const Config &cfg: Configuration information *
|
||||||
|
* const SeqForest &normal: DB of normal sequences *
|
||||||
|
* const int num_streams_fnd: Total number of streams found*
|
||||||
|
* const int num_seqs_added: Number of unique sequences *
|
||||||
|
* added *
|
||||||
|
* const vector<Stream> &streams: Array of data streams *
|
||||||
|
* const int db_size: Number of unique sequences *
|
||||||
|
* in DB *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
* *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void FinalReport(const Config &cfg, const SeqForest &normal, const int
|
||||||
|
num_streams_fnd, const int num_seqs_added, const
|
||||||
|
vector<Stream> &streams, const int db_size)
|
||||||
|
{
|
||||||
|
int total_pairs = 0;
|
||||||
|
int total_seqs = 0;
|
||||||
|
int total_anoms = 0;
|
||||||
|
int total_max_lfc = 0;
|
||||||
|
int total_max_hdist = 0;
|
||||||
|
int db_nodes = 0;
|
||||||
|
int db_seqs = 0;
|
||||||
|
int db_branches = 0;
|
||||||
|
int j;
|
||||||
|
|
||||||
|
// Sum up number of pairs input and number of seqs from all the streams
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
total_seqs += streams[j].GetNumSeqsFnd();
|
||||||
|
total_pairs += streams[j].GetNumPairsRead();
|
||||||
|
}
|
||||||
|
|
||||||
|
cout << endl;
|
||||||
|
cout << "Number of different streams in input = "
|
||||||
|
<< num_streams_fnd << endl;
|
||||||
|
cout << "Total number of input pairs = "
|
||||||
|
<< total_pairs << endl;
|
||||||
|
cout << "Total number of sequences in input = "
|
||||||
|
<< total_seqs << endl;
|
||||||
|
|
||||||
|
if (cfg.add_to_db) {
|
||||||
|
cout << "File added to database" << endl;
|
||||||
|
cout << "Number of new sequences added to the database: "
|
||||||
|
<< num_seqs_added << endl;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
cout << "Scan completed" << endl;
|
||||||
|
// Sum up number of anomalies from all the streams
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
total_anoms += streams[j].GetNumAnoms();
|
||||||
|
}
|
||||||
|
|
||||||
|
cout << "Number of anomalies = "
|
||||||
|
<< total_anoms << endl;
|
||||||
|
cout << "Percentage anomalous = "
|
||||||
|
<< ((float)total_anoms * 100.0)/total_seqs << endl;
|
||||||
|
|
||||||
|
// If asked for, compute Hamming distances across streams and report
|
||||||
|
if (cfg.compute_hdist) {
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
if (streams[j].GetMaxHDist() > total_max_hdist) {
|
||||||
|
total_max_hdist = streams[j].GetMaxHDist();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << "Largest minimum Hamming distance = "
|
||||||
|
<< total_max_hdist << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
// If asked for, compute lfc across streams and report
|
||||||
|
if (cfg.lf_size > 1) {
|
||||||
|
for (j = 0; j < num_streams_fnd; j++) {
|
||||||
|
if (streams[j].GetMaxLFC() > total_max_lfc) {
|
||||||
|
total_max_lfc = streams[j].GetMaxLFC();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
cout << "Maximum lfc = " << total_max_lfc << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// If asked for, compute db stats and report
|
||||||
|
if (cfg.write_db_stats) {
|
||||||
|
WriteDBStats(normal, cout, db_size);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
|
||||||
|
* WriteDBStats() *
|
||||||
|
* Computes and writes to out_stream the number of nodes in *
|
||||||
|
* the database, the number of unique sequences, the number of *
|
||||||
|
* branches, and the average database branch factor. *
|
||||||
|
* *
|
||||||
|
* Input: const SeqForest &db_forest Forest of sequences in *
|
||||||
|
* database *
|
||||||
|
* ostream &out_stream Where to write info *
|
||||||
|
* const int db_size Number of unique sequences in the *
|
||||||
|
* database *
|
||||||
|
* *
|
||||||
|
* Output: none *
|
||||||
|
*********************************************************************/
|
||||||
|
|
||||||
|
void WriteDBStats(const SeqForest &db_forest, ostream &out_stream,
|
||||||
|
const int db_size)
|
||||||
|
{
|
||||||
|
int db_nodes = 0;
|
||||||
|
int db_branches = 0;
|
||||||
|
|
||||||
|
for (int i = 0; i < db_forest.trees.size(); i++)
|
||||||
|
{
|
||||||
|
if (db_forest.trees_found[i])
|
||||||
|
{
|
||||||
|
db_nodes += db_forest.trees[i].NumNodes();
|
||||||
|
db_branches += db_forest.trees[i].NumBranches();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
out_stream << "Number of DB nodes = " << db_nodes << endl;
|
||||||
|
out_stream << "Number of unique sequences = "<<db_size << endl;
|
||||||
|
out_stream << "Number of branches (edges) = "<<db_branches << endl;
|
||||||
|
out_stream << "Average DB branch factor = "
|
||||||
|
<<((float)db_branches/(db_nodes - db_size))<<endl;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*********************************************************************
 * OutputGraph()                                                     *
 * Writes a file db_name.dot containing input for the program Dot.   *
 * Running Dot on db_name.dot produces a PostScript file             *
 * containing a picture of the whole database tree.                  *
 *                                                                   *
 * Input: const SeqForest &db_forest  Forest of sequences in         *
 *                                    database                       *
 *        const string db_name  Filename to use                      *
 *                                                                   *
 * Output: none                                                      *
 *********************************************************************/

void OutputGraph(const SeqForest &db_forest, const string db_name)
{
  // Build the filename with std::string. The original char buffer of
  // strlen(db_name)+4 bytes was one byte too small to hold ".dot" plus
  // the terminating NUL, and it was never freed.
  const string dot_filename = db_name + ".dot";
  ofstream output_file(dot_filename.c_str());

  output_file << "digraph \"" << db_name << "\" {" << endl;
  output_file << " ratio=auto;" << endl;
  output_file << " page=\"8.5,11\";" << endl;
  for (int i = 0; i < db_forest.trees.size(); i++) {
    if (db_forest.trees_found[i])
      db_forest.trees[i].OutputGraph(output_file);
  }
  output_file << "}" << endl;
  output_file.close();
}
|
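The average branch factor printed by WriteDBStats() divides the edge count by db_nodes - db_size, which is zero when the database is empty. A minimal sketch of a guarded version of that ratio follows. The helper name AverageBranchFactor and the choice to report 0 for an empty or degenerate database are assumptions made for illustration; neither is part of stide.

```cpp
#include <cassert>

// Hypothetical helper, not part of stide: the same ratio WriteDBStats()
// prints, but guarded against a zero (or negative) denominator.
double AverageBranchFactor(int db_branches, int db_nodes, int db_size) {
  assert(db_branches >= 0 && db_nodes >= 0 && db_size >= 0);
  // WriteDBStats() divides the number of branches by (db_nodes - db_size).
  const int denominator = db_nodes - db_size;
  if (denominator <= 0) {
    return 0.0;  // assumption: report 0 rather than dividing by zero
  }
  return static_cast<double>(db_branches) / denominator;
}
```

Calling AverageBranchFactor(db_branches, db_nodes, db_size) in place of the inline division would keep the printed value unchanged for non-empty databases.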
367
11/wywolania/Data/stide_v1.2/stream.C
Normal file
@ -0,0 +1,367 @@
#include <stdlib.h>
#include <stdio.h>
#include <string>
#include <iostream>
#include <fstream>
#include "stream.h"

/*********************************************************************
 * Init()                                                            *
 * Initializes an instance of Stream.                                *
 *                                                                   *
 * Input: const Config &cfg  Configuration information               *
 *        const int intern_id  internal stream identifier            *
 *        const int extern_id  external stream identifier            *
 * Output: none                                                      *
 *********************************************************************/

using std::cerr;
using std::endl;

void Stream::Init(const Config &cfg,
                  const int intern_id, const int extern_id) {
  // Initialize all the arrays. assign() both sizes and fills each
  // vector; reserve() alone leaves size() at 0, so writing through
  // operator[] afterwards would be undefined behaviour.
  current_seq.assign(cfg.seq_len, -1);

  num_in_seq = -1;
  num_pairs_read = 0;
  num_anoms = 0;
  num_seqs_fnd = 0;
  int_sid = intern_id;
  ext_sid = extern_id;
  max_hdist = 0;
  seq_hdist = 0;
  lf.assign(cfg.lf_size, 0);
  seq_lfc = 0;
  max_lfc = 0;
  ready = 0;
  seq_len = cfg.seq_len;
}

/*********************************************************************
 * Append()                                                          *
 * This function puts the integer given into the current_seq array   *
 * as the last element. It flags ready according to whether          *
 * current_seq is full. Updates num_in_seq, ready, current_seq,      *
 * num_seqs_fnd, and num_pairs_read.                                 *
 *                                                                   *
 * Input: const int new_value  The next value to be put into the     *
 *                             current_seq array                     *
 * Output: none                                                      *
 *********************************************************************/

void Stream::Append(const int new_value)
{
  // missing system call - zero the current sequence
  if (new_value == -1) {
    num_in_seq = -1;
    ready = 0;
  }
  else {
    num_pairs_read++;
    if (num_in_seq < seq_len - 1) { // window not yet full
      num_in_seq++;
      current_seq[num_in_seq] = new_value;
      if (num_in_seq == seq_len - 1) {
        ready = 1;
        ++num_seqs_fnd;
      }
    }

    else {
      // Roll over current_seq array
      for (int k = 0; k < num_in_seq; k++) {
        current_seq[k] = current_seq[k + 1];
      }
      current_seq[num_in_seq] = new_value;
      ++num_seqs_fnd;
    }
  }
}

/*********************************************************************
 * AddToDB()                                                         *
 *                                                                   *
 * Adds current_seq to the database if it isn't already there.       *
 * Returns 0 if it is already there, 1 if it is new. Updates         *
 * normal and db_size.                                               *
 *                                                                   *
 * Input: SeqForest &normal  Forest of normal sequences              *
 *        int &db_size  Number of unique sequences in the            *
 *                      database                                     *
 *        const int total_pairs_read  Number of pairs read from the  *
 *                                    entire input stream            *
 *        const Config &cfg  Configuration information               *
 * Output: 0 if sequence isn't new, 1 if it is                       *
 *********************************************************************/


int Stream::AddToDB(SeqForest &normal, int &db_size, const int
                    total_pairs_read, const Config &cfg) const
{
  int is_new;

  // If there is not a tree with the same root as this sequence has,
  // make a new tree with that root and flag trees_found
  if (!normal.trees_found[current_seq[0]])
    {
      normal.trees[current_seq[0]].SetRoot(current_seq[0]);
      normal.trees_found[current_seq[0]] = 1;
    }

  // Try to add the sequence. If it's already there, is_new will be
  // set to 0, otherwise it will be set to 1.
  is_new = normal.trees[current_seq[0]].InsertSeq(current_seq, 0, seq_len-1);
  db_size += is_new;

  if ((is_new && cfg.verbose) || cfg.very_verbose)
    {
      ReportNewSeq(cfg, total_pairs_read, db_size);
    }

  if (is_new)
    return 1;
  else
    return 0;
}


/*********************************************************************
 * CompareSeq()                                                      *
 * Compares the current sequence in this stream to the database,     *
 * in the manner indicated by the configuration file. Reports        *
 * on anomalies if told to by the configuration file. Updates        *
 * num_anoms, seq_hdist, max_hdist, seq_lfc, and max_lfc.            *
 *                                                                   *
 * Input: const Config &cfg: Information from configuration file     *
 *        const SeqForest &normal: DB of normal sequences            *
 *        const int total_pairs_read: Number of pairs read from      *
 *                                    all of the streams             *
 * Output: none                                                      *
 *********************************************************************/

void Stream::CompareSeq(const Config &cfg, const SeqForest &normal,
                        const int total_pairs_read)
{
  int is_anom; // flag to indicate whether current_seq is an anomaly

  is_anom = ComputeMisses(normal);
  if ((is_anom) && (cfg.compute_hdist)) {
    ComputeHDist(normal);
  }
  if (cfg.lf_size > 1) {
    ComputeLF(is_anom, cfg.lf_size);
  }
  // if we're in verbose mode and either current_seq is an anomaly or
  // its locality frame contains an anomaly, report it
  if ((cfg.very_verbose) || (cfg.verbose && (is_anom || seq_lfc))) {
    ReportSeq(cfg, total_pairs_read, is_anom);
  }
}

/*********************************************************************
 * ComputeMisses()                                                   *
 * Compares the current sequence to the database sequences. If       *
 * there is an exact match, we return 0. Otherwise we return 1.      *
 * Updates num_anoms and seq_hdist.                                  *
 *                                                                   *
 * Input: const SeqForest &normal: DB of normal sequences            *
 * Output: 0 if there is an exact match                              *
 *         1 if the sequence is anomalous                            *
 *********************************************************************/


int Stream::ComputeMisses(const SeqForest &normal)
{
  if (normal.IsSeqInForest(current_seq, seq_len)) {
    seq_hdist = 0;
    return(0);
  }

  // We have an anomaly
  ++num_anoms;
  return(1);
}

/*********************************************************************
 * ComputeHDist()                                                    *
 * Compares the current sequence in this stream to each sequence     *
 * in the database in turn, adding up the number of mismatches       *
 * between the two sequences. The smallest difference between        *
 * the current sequence and the database sequences is the minimum    *
 * Hamming distance for the current sequence. If this minimum        *
 * Hamming distance is greater than the largest minimum Hamming      *
 * distance encountered so far, then the variable max_hdist is       *
 * updated. Updates seq_hdist and max_hdist.                         *
 *                                                                   *
 * Input: const SeqForest &normal: DB of normal sequences            *
 *                                                                   *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ComputeHDist(const SeqForest &normal)
{
  int misses_on_this_seq; // the number of mismatches between
                          // current_seq and the sequence we're
                          // comparing it with at the moment
  seq_hdist = seq_len;    // start with seq_hdist as high as
                          // possible

  // We compare current_seq with each sequence in our database tree
  for (int i = 0; i < normal.trees.size(); i++) {
    // Have we seen any sequences starting with element i? If not, we
    // can go on to consider sequences starting with element i+1.
    if (normal.trees_found[i]) {
      misses_on_this_seq =
        normal.trees[i].ComputeHDistForTree(current_seq, 0, seq_len-1);
      if (misses_on_this_seq < seq_hdist) {
        seq_hdist = misses_on_this_seq;
      }
    }
  }

  if (seq_hdist > max_hdist) {
    max_hdist = seq_hdist;
  }
}


/*********************************************************************
 * ComputeLF()                                                       *
 * Computes the number of misses in current_seq's locality frame.    *
 * Updates lf, seq_lfc and max_lfc.                                  *
 *                                                                   *
 * Input: const int is_anom  Flag to indicate whether                *
 *                           current_seq is an anomaly               *
 *        const int lf_size  Size of locality frame                  *
 * Output: none                                                      *
 *********************************************************************/


void Stream::ComputeLF(const int is_anom, const int lf_size)
{
  // While num_seqs_fnd is no greater than lf_size, the locality frame
  // array still has a free slot for this sequence
  if (num_seqs_fnd <= lf_size) {
    lf[num_seqs_fnd-1] = is_anom;
    seq_lfc += is_anom;
  }
  else {
    // We're about to remove the first element of lf; since seq_lfc is
    // the sum of the elements of lf, we should subtract lf[0] from
    // seq_lfc to remove it from the sum.
    seq_lfc -= lf[0];
    // Now we add is_anom and seq_lfc is the sum of the new locality
    // frame.
    seq_lfc += is_anom;

    // roll over the array
    for (int i = 0; i < lf_size-1; i++) {
      lf[i] = lf[i+1];
    }
    lf[lf_size-1] = is_anom;
  }
  if (seq_lfc > max_lfc) {
    max_lfc = seq_lfc;
  }
}

/*********************************************************************
 * ReportSeq()                                                       *
 * This function reports data about a sequence. Specifically, it     *
 * can report the external stream id, a number indicating where      *
 * the first element of the current sequence occurs in the input,    *
 * a number indicating how many pairs from this particular data      *
 * stream have been read prior to the first element of the           *
 * sequence, the minimum Hamming distance for the current            *
 * sequence, the locality frame count,                               *
 * and whether this particular sequence is itself an anomaly (it     *
 * could be that some other sequence in its locality frame is        *
 * anomalous). The configuration file determines which of those      *
 * possible data are reported and in what format. Updates no         *
 * values.                                                           *
 *                                                                   *
 * Input: const Config &cfg  Configuration information               *
 *        const int total_pairs_read  Total number of pairs read     *
 *                     from the input stream from any data           *
 *                     stream, not just this one                     *
 *        const int is_anom  flag for whether the current            *
 *                     sequence is itself an anomaly                 *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ReportSeq(const Config &cfg, const int total_pairs_read,
                       const int is_anom) const
{
  for (int i = 0; i < cfg.num_fvars; i++) {
    switch (cfg.write_val[i]) {
    case 'a':
      printf(cfg.fmt_str[i], is_anom); break;
    case 'c':
      if (cfg.lf_size > 1) {
        printf(cfg.fmt_str[i], seq_lfc);
      }
      break;
    case 'h':
      if (cfg.compute_hdist) {
        printf(cfg.fmt_str[i], seq_hdist);
      }
      break;
    case 'i':
      printf(cfg.fmt_str[i], num_pairs_read); break;
    case 'p':
      printf(cfg.fmt_str[i], total_pairs_read); break;
    case 's':
      printf(cfg.fmt_str[i], ext_sid); break;
    }
  }
  printf(cfg.fmt_str[cfg.num_fvars]);
}


/*********************************************************************
 * ReportNewSeq()                                                    *
 * This function reports on sequences which have been newly added    *
 * to the database. It can report the external stream                *
 * identifier, where the first element of the sequence occurs        *
 * both within the whole input stream and within its own data        *
 * stream, and the number of unique sequences in the database        *
 * after this sequence has been added. The configuration file        *
 * determines which of those possible data are reported and in       *
 * what format. Updates no values.                                   *
 *                                                                   *
 * Input: const Config &cfg  Configuration information               *
 *        const int total_pairs_read  Total number of pairs read     *
 *                     from the input stream from any data           *
 *                     stream, not just this one                     *
 *        const int db_size  Number of unique sequences              *
 *                     in the database                               *
 * Output: none                                                      *
 *********************************************************************/

void Stream::ReportNewSeq(const Config &cfg, const int total_pairs_read,
                          const int db_size) const
{
  for (int i = 0; i < cfg.num_fvars; i++) {
    switch (cfg.write_val[i]) {
    case 'd':
      printf(cfg.fmt_str[i], db_size); break;
    case 'i':
      printf(cfg.fmt_str[i], num_pairs_read); break;
    case 'p':
      printf(cfg.fmt_str[i], total_pairs_read); break;
    case 's':
      printf(cfg.fmt_str[i], ext_sid); break;
    }
  }
  printf(cfg.fmt_str[cfg.num_fvars]);
}

|
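The sliding window kept by Append() and the locality frame count kept by ComputeLF() can be tried out in isolation. The sketch below is only an illustration under stated assumptions: it replaces stide's SeqForest with a flat std::set of windows, uses std::deque instead of rolling arrays, and ignores the -1 "missing system call" reset. The names Window and MaxLocalityFrameCount are hypothetical and do not appear in stide.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-in for one database entry: a window of seq_len values.
using Window = std::vector<int>;

// Scans `trace` with a sliding window of seq_len values, flags each full
// window that is absent from `normal` as an anomaly, and returns the
// largest number of anomalies seen among any lf_size consecutive windows.
int MaxLocalityFrameCount(const std::set<Window>& normal,
                          const std::vector<int>& trace,
                          std::size_t seq_len, std::size_t lf_size) {
  assert(seq_len > 0 && lf_size > 0);
  std::deque<int> window;  // the most recent seq_len values
  std::deque<int> frame;   // anomaly flags (0 or 1) of recent windows
  int frame_count = 0;     // sum of the flags currently in `frame`
  int max_count = 0;

  for (int value : trace) {
    window.push_back(value);
    if (window.size() > seq_len) window.pop_front();
    if (window.size() < seq_len) continue;  // window not yet full

    const Window current(window.begin(), window.end());
    const int is_anom = normal.count(current) ? 0 : 1;

    // Add the new flag, then drop the oldest one once the frame overflows.
    frame.push_back(is_anom);
    frame_count += is_anom;
    if (frame.size() > lf_size) {
      frame_count -= frame.front();
      frame.pop_front();
    }
    if (frame_count > max_count) max_count = frame_count;
  }
  return max_count;
}
```

A flat set makes each lookup cost O(seq_len log n); the tree-based forest used by stide shares common prefixes instead, which is why this sketch should be read as a model of the bookkeeping rather than of the storage.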
63
11/wywolania/Data/stide_v1.2/stream.h
Normal file
@ -0,0 +1,63 @@
#ifndef __STREAM_H
#define __STREAM_H

#include <vector>
#include "config.h"
#include "flexitree.h"

using std::vector;

class Stream {
public:
  Stream() {};
  void Init(const Config &cfg, const int intern_id, const int
            extern_id);
  void Append(const int next_value);
  int AddToDB(SeqForest &normal, int &db_size, int total_pairs_read,
              const Config &cfg) const;
  void CompareSeq(const Config &cfg, const SeqForest &normal, const
                  int total_pairs_read);
  int GetMaxHDist(void) const {return max_hdist;}
  int GetMaxLFC(void) const {return max_lfc;}
  int Ready(void) const {return ready;}
  int GetNumAnoms(void) const {return num_anoms;}
  int GetNumPairsRead(void) const {return num_pairs_read;}
  int GetNumSeqsFnd(void) const {return num_seqs_fnd;}
private:
  vector<int> current_seq; // current sequence being filled or
                           // processed
  int num_in_seq;          // current_seq is full up through
                           // num_in_seq
  int num_pairs_read;      // the number of input pairs belonging to
                           // this stream that have been read so far
  int num_anoms;           // the number of anomalies found so far
  int num_seqs_fnd;        // the number of (not necessarily unique)
                           // sequences belonging to this stream
                           // found so far
  int ext_sid;             // the external stream id
  int int_sid;             // the internal stream id
  int max_hdist;           // the largest minimum Hamming distance
                           // found in this stream
  int seq_hdist;           // the minimum Hamming distance for
                           // current_seq
  vector<int> lf;          // array for locality frame
  int seq_lfc;             // the locality frame count for this
                           // sequence
  int max_lfc;             // the largest locality frame count
                           // encountered so far
  int ready;               // a flag to indicate whether this stream
                           // has a full sequence ready to be
                           // processed. 0 = no, 1 = yes.
  int seq_len;             // sequence length
  int ComputeMisses(const SeqForest &normal);
  void ComputeHDist(const SeqForest &normal);
  void ComputeLF(const int is_anom, const int lf_size);
  void ReportSeq(const Config &cfg, const int total_pairs_read,
                 const int is_anom) const;
  void ReportNewSeq(const Config &cfg, const int total_pairs_read,
                    const int db_size) const;
};

#endif

|
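The seq_hdist member holds the minimum Hamming distance between the current sequence and any sequence in the database, and max_hdist holds the largest such minimum seen in the stream. A minimal sketch of that quantity follows, assuming the database is a flat list of equal-length sequences rather than stide's forest of trees. The function names HammingDistance and MinHammingDistance are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions at which two equal-length sequences differ.
int HammingDistance(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size());
  int misses = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++misses;
  }
  return misses;
}

// Smallest Hamming distance from `current` to any sequence in `normal`.
// Starts at the sequence length, the highest possible value, so an empty
// database yields current.size().
int MinHammingDistance(const std::vector<std::vector<int>>& normal,
                       const std::vector<int>& current) {
  int best = static_cast<int>(current.size());
  for (const std::vector<int>& seq : normal) {
    const int d = HammingDistance(seq, current);
    if (d < best) best = d;
  }
  return best;
}
```

This linear scan costs O(n * seq_len) per query; it is meant to make the definition concrete, not to match the performance of the tree traversal in ComputeHDistForTree().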
0
11/wywolania/main.R
Normal file
BIN
426254-l9.tb2
Normal file
Binary file not shown.