bioinf-2023-2024-introducti.../ex1-reading-fasta/description

The task is to read a FASTA file and count how many A, C, T, and G letters occur in an example sequence.

FASTA files are available on the internet. For example, the Wikipedia page on the format provides one sequence:

https://en.wikipedia.org/wiki/FASTA_format

A FASTA file with one sequence, here the cytochrome b protein of an elephant, looks like this:

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY


The FASTA format has two separate parts:

1. header
2. sequence
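
In the example above, the header is the line that begins with '>' and every other line belongs to the sequence. As a minimal sketch of that distinction, the helper below tells the two parts apart by looking at the first character. The name is_header_line is hypothetical and not part of the exercise:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the line is a FASTA header
   (it starts with '>'), and 0 for any other line. */
int is_header_line(const char *line)
{
    assert(line != NULL);
    return line[0] == '>';
}
```

Any line for which this returns 0 would be treated as sequence data, apart from the ';' lines that the exercise also asks to skip.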


Thus, the exercise of reading a FASTA file in C briefly looks like this:

1. Select a sequence and save it to a file, e.g. seq.fasta

2. Open the file seq.fasta in C with fopen:

https://cplusplus.com/reference/cstdio/fopen/

Tip: Use the correct mode ("r" opens a file for reading).

3. Read the file line by line, e.g. using the snippet of code from here:

https://cplusplus.com/forum/beginner/22558/

4. Check whether each line starts with ';' or with '>'; if it does, skip that line.

Otherwise, count how many A, C, T, and G letters the line contains and
save those counts.

To check whether a line starts with a given character, you can
simply index into the string: line[0] is the first character of the
array line.
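
Steps 2 to 4 can be sketched roughly as follows. This is one possible outline, not a reference solution. The names acgt_counts, count_line and count_file are hypothetical, and fgets is used here in place of the forum snippet. The sketch also assumes that no line is longer than the 1024-byte buffer and that only uppercase letters are counted.

```c
#include <assert.h>
#include <stdio.h>

/* Running totals for the four letters (hypothetical struct name). */
struct acgt_counts {
    long a, c, t, g;
};

/* Add the A, C, T, G letters found in one line to the totals.
   Lines starting with ';' or '>' are skipped. */
void count_line(const char *line, struct acgt_counts *counts)
{
    assert(line != NULL && counts != NULL);

    /* line[0] is the first character of the line. */
    if (line[0] == ';' || line[0] == '>')
        return;

    for (size_t i = 0; line[i] != '\0'; i++) {
        switch (line[i]) {
        case 'A': counts->a++; break;
        case 'C': counts->c++; break;
        case 'T': counts->t++; break;
        case 'G': counts->g++; break;
        default: break; /* other letters and the newline are ignored */
        }
    }
}

/* Open the file in read mode and process it line by line.
   Returns 0 on success, -1 if the file cannot be opened. */
int count_file(const char *path, struct acgt_counts *counts)
{
    char line[1024]; /* assumed to be longer than any line in the file */
    FILE *fp;

    assert(path != NULL && counts != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL)
        count_line(line, counts);

    fclose(fp);
    return 0;
}
```

A main function could zero-initialize a struct acgt_counts, call count_file("seq.fasta", &counts), check the return value, and print the four totals with printf.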

The end.
'Adding desc for FASTA reading' 2024-04-26 08:54:47 +02:00
Update ex1-reading-fasta/description 2024-04-26 09:21:33 +02:00