'Adding desc for FASTA reading'

2024-04-26 08:54:47 +02:00 · 4e75e8a9c6
commit 4e75e8a9c6
parent 890d6186a5
1 changed files with 31 additions and 0 deletions
--- a/ex1-reading-fasta/description
+++ b/ex1-reading-fasta/description
@@ -0,0 +1,31 @@
 The task is to read a FASTA file and count how many A, C, T, and G letters occur in an example sequence.
 A FASTA file can be obtained from the internet. For example, the Wikipedia page on the format provides one sequence:
 https://en.wikipedia.org/wiki/FASTA_format
 A FASTA file with one sequence, here from an elephant, looks like this:
 >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
 LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
 EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
 LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
 GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
 IENY
 The FASTA format has two separate parts:
 1. header
 2. sequence
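The two parts can be told apart by the first character of a line: a header line starts with '>', and every other line belongs to the sequence. Below is a minimal sketch of that check. The function name is_header_line is hypothetical and not part of the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the line is a FASTA header
   (its first character is '>'), and 0 for any other line,
   including an empty one. */
int is_header_line(const char *line)
{
    assert(line != NULL);
    return line[0] == '>';
}
```

For the elephant example above, the line beginning with ">gi|5524211|" would be classified as the header and the remaining lines as sequence.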
 Thus, the exercise for reading a FASTA file looks briefly like this:
 1. Select a sequence and save it to a file, e.g. seq.fasta
 2. Open the file seq.fasta in C with fopen:
 https://cplusplus.com/reference/cstdio/fopen/
 Tip: Use the correct mode.
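One possible way to open the file and count the letters is sketched below. It assumes that read-only text mode ("r") is the mode the tip refers to, since the file is only read. The function name count_acgt and the layout of the counts array are hypothetical. The sketch reads the file character by character, so header and sequence lines of any length are handled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the letters A, C, T, G in the sequence
   lines of a FASTA file. On return, counts[0..3] hold the totals for
   A, C, T, G in that order. Returns 0 on success and -1 if the file
   cannot be opened. */
int count_acgt(const char *path, long counts[4])
{
    assert(path != NULL && counts != NULL);
    counts[0] = counts[1] = counts[2] = counts[3] = 0;

    /* "r" opens an existing file for reading (assumed to be the
       intended mode for this exercise). */
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int c;
    int at_line_start = 1; /* true for the first character of each line */
    int in_header = 0;     /* true while inside a line starting with '>' */

    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start)
            in_header = (c == '>');
        at_line_start = (c == '\n');
        if (in_header)
            continue; /* header text is not part of the sequence */

        switch (c) {
        case 'A': counts[0]++; break;
        case 'C': counts[1]++; break;
        case 'T': counts[2]++; break;
        case 'G': counts[3]++; break;
        default: break; /* other letters and newlines are ignored */
        }
    }

    fclose(fp);
    return 0;
}
```

A caller would pass the file name from step 1, for example count_acgt("seq.fasta", counts), and check the return value before printing the four totals.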