Arkadiusz Hypki 2024-05-10 10:41:58 +02:00
commit dfc612c819

@ -6,12 +6,19 @@ https://en.wikipedia.org/wiki/FASTA_format
The FASTA format with one sequence for an elephant looks like this:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>BTBSCRYR
tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttca
aggccgccactatgacagcgattgcgactgtgcagatttccacatgtacctgagccgctg
caactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgg
gtacatgtacatcctaccccggggcgagtatcctgagtaccagcactggatgggcctcaa
cgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttca
gatctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttc
catcatggagcagttccacatgcgggaggtccactcctgtaaggtgctggagggcgcctg
gatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagta
ccggaagcccgtcgactggggtgcagcttccccagctgtccagtctttccgccgcattgt
ggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattat
aaataaaacaattggcatgc
The FASTA format has two separate parts:
@ -20,7 +27,7 @@ The FASTA format has two separate parts:
2. sequence
Thus, the exercise for reading a FASTA file briefly looks like this:
Thus, the exercise for reading a FASTA file, in C, briefly looks like this:
1. Select a sequence and save it to a file, e.g. seq.fasta
@ -29,3 +36,19 @@ Thus, the exercise for reading a FASTA file briefly looks like this:
https://cplusplus.com/reference/cstdio/fopen/
Tip: Use the correct mode.
3. Read the file, line by line, using e.g. the snippet of code from here:
https://cplusplus.com/forum/beginner/22558/
4. Check whether the line starts with ';' or '>'; if it does, skip that line.
If the line is OK, count how many A, C, T, and G characters it contains and
save that information.
To check whether the line starts with a given character, you can simply
index into the string: line[0] is the first character of the array line.
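The steps above can be sketched in C roughly as follows. This is only one possible solution, not the reference one. The names base_counts, is_header_or_comment, count_bases and count_fasta_file are made up for this illustration. The sketch assumes that every line fits in a 1024-byte buffer and that the file holds nucleotide sequences; for a protein sequence, the amino-acid letters A, C, G and T would be counted as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four nucleotide tallies. */
struct base_counts {
    unsigned long a, c, g, t;
};

/* Returns 1 if the line is a header ('>') or a comment (';'), 0 otherwise.
   line[0] is the first character of the line. */
int is_header_or_comment(const char *line)
{
    return line[0] == '>' || line[0] == ';';
}

/* Adds the A, C, G, T occurrences found in one line to *counts.
   Case is ignored; any other character (including '\n') is skipped. */
void count_bases(const char *line, struct base_counts *counts)
{
    assert(line != NULL && counts != NULL);
    for (const char *p = line; *p != '\0'; ++p) {
        switch (toupper((unsigned char)*p)) {
        case 'A': counts->a++; break;
        case 'C': counts->c++; break;
        case 'G': counts->g++; break;
        case 'T': counts->t++; break;
        default: break;
        }
    }
}

/* Reads the file at path line by line and fills *counts.
   Returns 0 on success, -1 if the file cannot be opened.
   Assumption: no line is longer than the buffer; a longer header line
   would be split by fgets and its tail counted as sequence data. */
int count_fasta_file(const char *path, struct base_counts *counts)
{
    char line[1024];
    FILE *fp = fopen(path, "r");   /* "r" opens the file for reading */

    if (fp == NULL)
        return -1;

    memset(counts, 0, sizeof *counts);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (is_header_or_comment(line))
            continue;              /* skip '>' and ';' lines */
        count_bases(line, counts);
    }
    fclose(fp);
    return 0;
}
```

To try it, call count_fasta_file("seq.fasta", &counts) from main with a struct base_counts variable, check that the return value is 0, and print the four fields with printf and the %lu format.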
The end.