From fb89c938822fafbe1860dc7d14c71dea74726b1b Mon Sep 17 00:00:00 2001
From: ahypki
Date: Fri, 26 Apr 2024 09:21:33 +0200
Subject: [PATCH 1/2] Update ex1-reading-fasta/description

---
 ex1-reading-fasta/description | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/ex1-reading-fasta/description b/ex1-reading-fasta/description
index 6d7824d..724fe00 100644
--- a/ex1-reading-fasta/description
+++ b/ex1-reading-fasta/description
@@ -20,7 +20,7 @@ The FASTA format has to separate parts:
 
 2. sequence
 
-Thus, the excercise for reading FASTA file briefly looks like this:
+Thus, the excercise for reading FASTA file, in C, briefly looks like this:
 
 1. Select sequence and save it to a file, e.g. seq.fasta
 
@@ -28,4 +28,20 @@ Thus, the excercise for reading FASTA file briefly looks like this:
 
 https://cplusplus.com/reference/cstdio/fopen/
 
-Tip: Use correct mode.
\ No newline at end of file
+Tip: Use correct mode.
+
+3. Read the file, line by line, using e.g. snippet of code from here:
+
+https://cplusplus.com/forum/beginner/22558/
+
+4. Check if the lines starts with ';' or with '>', then skip such a line
+
+If the line is OK, then count how many A, C, T, G are in the given line and
+save that information.
+
+In order to check if the line starts with a given character you can
+simply use index of a string (line[0] - this is first character in the
+array line).
+
+The end.
+

From 76635f4f25fb070c6896561857aa20e9aeba093d Mon Sep 17 00:00:00 2001
From: ahypki
Date: Fri, 26 Apr 2024 09:29:08 +0200
Subject: [PATCH 2/2] Update ex1-reading-fasta/description

---
 ex1-reading-fasta/description | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/ex1-reading-fasta/description b/ex1-reading-fasta/description
index 724fe00..557cde4 100644
--- a/ex1-reading-fasta/description
+++ b/ex1-reading-fasta/description
@@ -6,12 +6,19 @@ https://en.wikipedia.org/wiki/FASTA_format
 
 The FASTA format with one sequence for an elephant looks like this:
 
->gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
-LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
-EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
-LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
-GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
-IENY
+>BTBSCRYR
+tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttca
+aggccgccactatgacagcgattgcgactgtgcagatttccacatgtacctgagccgctg
+caactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgg
+gtacatgtacatcctaccccggggcgagtatcctgagtaccagcactggatgggcctcaa
+cgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttca
+gatctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttc
+catcatggagcagttccacatgcgggaggtccactcctgtaaggtgctggagggcgcctg
+gatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagta
+ccggaagcccgtcgactggggtgcagcttccccagctgtccagtctttccgccgcattgt
+ggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattat
+aaataaaacaattggcatgc
+
 
 The FASTA format has to separate parts:
 
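
Reviewer note: the steps the patched description lists (open the file with fopen, read it line by line, skip lines starting with ';' or '>', count A, C, T, G) could be sketched in C roughly as below. This is only one possible shape for a solution, not the author's reference answer. The names BaseCounts, is_header_or_comment, count_bases_in_line and count_fasta_file are hypothetical, the 1024-byte buffer is an arbitrary choice, and counting case-insensitively is an assumption made because the new example sequence is lowercase.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: running totals for the four bases. */
typedef struct {
    long a, c, g, t;
} BaseCounts;

/* Returns 1 if the line is a FASTA header ('>') or a comment (';'). */
int is_header_or_comment(const char *line)
{
    return line[0] == '>' || line[0] == ';';
}

/* Adds the bases found in one piece of sequence text to the totals.
 * Assumption: upper and lower case letters are counted together. */
void count_bases_in_line(const char *line, BaseCounts *counts)
{
    assert(line != NULL && counts != NULL);
    for (const char *p = line; *p != '\0'; p++) {
        switch (toupper((unsigned char)*p)) {
        case 'A': counts->a++; break;
        case 'C': counts->c++; break;
        case 'G': counts->g++; break;
        case 'T': counts->t++; break;
        default: break; /* newline and any other symbol are ignored */
        }
    }
}

/* Reads the file line by line and accumulates the base counts.
 * Returns 0 on success, -1 if the file cannot be opened. */
int count_fasta_file(const char *path, BaseCounts *counts)
{
    char line[1024];       /* arbitrary buffer size for this sketch */
    int at_line_start = 1; /* is the next chunk the start of a line? */
    int skip = 0;          /* is the current line a header/comment? */
    FILE *fp = fopen(path, "r"); /* "r": open an existing file for reading */

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Only the first chunk of a physical line decides whether to skip,
         * so an over-long header is not mistaken for sequence data. */
        if (at_line_start)
            skip = is_header_or_comment(line);
        if (!skip)
            count_bases_in_line(line, counts);
        at_line_start = strchr(line, '\n') != NULL;
    }

    fclose(fp);
    return 0;
}
```

To complete the exercise, a main function would zero-initialise a BaseCounts, call count_fasta_file("seq.fasta", &counts), and print the four fields with printf when the return value is 0.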