bioinf-2023-2024-introducti.../ex1-reading-fasta/description

2024-04-26 08:54:47 +02:00
The task is to read a FASTA file and count how many times each of the letters A, C, T and G occurs in an example sequence.
FASTA files can be obtained from the internet. For example, the Wikipedia page on the format provides a sample sequence:
https://en.wikipedia.org/wiki/FASTA_format
A FASTA file with a single sequence (here a protein, cytochrome b, from an elephant) looks like this:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
The FASTA format has two separate parts:
1. header (a single line starting with '>')
2. sequence (the lines that follow the header)
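The split into header and sequence can be checked line by line. Below is a minimal sketch in C, assuming each line is already available as a string; the function name is_header_line is hypothetical and not part of the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: in FASTA, a header line starts with '>'.
   Every other line is treated as part of the sequence. */
int is_header_line(const char *line)
{
    assert(line != NULL);
    return line[0] == '>';
}
```

For the elephant example, the line starting with ">gi|5524211|" would be classified as the header, and a line such as "IENY" as sequence.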
Thus, the exercise for reading a FASTA file briefly looks like this:
1. Select a sequence and save it to a file, e.g. seq.fasta
2. Open the file seq.fasta in C with fopen:
https://cplusplus.com/reference/cstdio/fopen/
Tip: Use the correct mode.
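One possible way to combine opening the file with the counting is sketched below. It is only an illustration under a few assumptions: the file is opened in read mode ("r"), the header line is skipped, and lower-case letters are counted as well. The function name count_acgt_in_file and the layout of the counts array are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts A, C, G, T in the sequence part of a FASTA file.
   counts[0..3] receive the totals for A, C, G and T, in that order.
   Returns 0 on success and -1 if the file cannot be opened. */
int count_acgt_in_file(const char *path, long counts[4])
{
    assert(path != NULL && counts != NULL);
    counts[0] = counts[1] = counts[2] = counts[3] = 0;

    FILE *fp = fopen(path, "r");  /* "r": open an existing file for reading */
    if (fp == NULL)
        return -1;

    int c;
    int at_line_start = 1;  /* true when the next character begins a line */
    int in_header = 0;      /* true while reading a line that began with '>' */
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start)
            in_header = (c == '>');
        at_line_start = (c == '\n');
        if (in_header)
            continue;  /* header text is not part of the sequence */
        switch (toupper(c)) {
        case 'A': counts[0]++; break;
        case 'C': counts[1]++; break;
        case 'G': counts[2]++; break;
        case 'T': counts[3]++; break;
        default: break;  /* other letters and newlines are ignored */
        }
    }
    fclose(fp);
    return 0;
}
```

A caller would pass "seq.fasta" and an array of four long values, then print the four totals. Checking the return value of fopen matters here, because a wrong file name or mode is the most likely failure in this step.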