Bioinformatics Programming Exercises
These exercises accompany the Intro to Bioinformatics Programming module. They alternate between hands-on practice and knowledge checks.
Exercises developed by Kristine Lacek and Logan Fink
Exercise 1 — Pseudo Code Practical
Open a new .sh file for each of these problems and write out logical pseudo code as comments (#). This is to practice approaching a coding problem step by step, so there is no need to worry about syntax.
Think about which variables you might need to initialize and which conditional logic to use.
#1. If you had a file that was a list of numbers, determine the mean
Possible Solution
- Count the number of entries in the file and store that number as a variable (e.g., total_entries).
- Use a function to iterate through each number and add them line by line until the end of the file, then store that sum as a variable (sum_entries).
- Divide sum_entries by total_entries, and capture the final value as a float so it is not rounded to a whole integer.
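The steps above can be sketched in shell as follows. This is only one possible translation of the pseudo code: the sample file, its values, and the function name mean_of_file are made up for illustration, and the sketch assumes one whole number per line with no blank lines.

```shell
# Hypothetical sample data: one integer per line.
numbers_file=$(mktemp)
printf '%s\n' 4 8 15 16 23 42 > "$numbers_file"

# mean_of_file FILE: print the mean of the integers in FILE.
mean_of_file() {
    total_entries=0
    sum_entries=0
    # Read the file line by line, counting entries and adding them up.
    while read -r value; do
        total_entries=$((total_entries + 1))
        sum_entries=$((sum_entries + value))
    done < "$1"
    # Shell arithmetic is integer-only, so the division is handed to awk
    # to keep the decimal places.
    awk -v s="$sum_entries" -v n="$total_entries" 'BEGIN { printf "%.2f\n", s / n }'
}

mean_of_file "$numbers_file"
```

With the sample values the sum is 108 over 6 entries, so the function prints 18.00.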
#2. Using the same file from question 1, calculate the percentage of numbers that are below 6
Possible Solution
- Instantiate two variables, x and y, and set both to zero (x = count of values below 6, y = count of values 6 or above).
- Write a function to determine whether a value is less than 6.
- Iterate through the list and use each value as input to the function.
- If the condition is true, increment x by 1; if false, increment y by 1.
- Create a new variable z as x + y (total values), then divide x by z, multiply by 100 to express the result as a percentage, and capture it as a float.
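A minimal shell sketch of this counting approach is shown below. The sample values and the function names is_below_six and percent_below_six are illustrative assumptions, and the input is again assumed to hold one whole number per line.

```shell
# Hypothetical sample data: one integer per line.
numbers_file=$(mktemp)
printf '%s\n' 2 9 5 6 1 > "$numbers_file"

# Succeeds (exit status 0) when the value is less than 6.
is_below_six() {
    [ "$1" -lt 6 ]
}

# percent_below_six FILE: print the percentage of values in FILE below 6.
percent_below_six() {
    x=0   # count of values below 6
    y=0   # count of values 6 or above
    while read -r value; do
        if is_below_six "$value"; then
            x=$((x + 1))
        else
            y=$((y + 1))
        fi
    done < "$1"
    z=$((x + y))   # total number of values
    # awk does the floating-point division that shell arithmetic cannot.
    awk -v x="$x" -v z="$z" 'BEGIN { printf "%.1f\n", 100 * x / z }'
}

percent_below_six "$numbers_file"
```

Three of the five sample values (2, 5, and 1) are below 6, so this prints 60.0.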
#3. If you had a file containing all of the flu samples tested over the course of a year, determine how many different flu subtypes appear in that list
Possible Solution
- Create an empty list to hold unique subtype values (e.g., unique_list).
- Iterate through the subtype list and check whether each subtype is already in unique_list.
- If it is not present, add it to unique_list.
- Alternative: sort the list and condense to unique values (pipeline approach, e.g., sort/uniq).
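Both approaches can be sketched in shell. The subtype names in the sample file and the function names are hypothetical, and the loop version assumes subtype names contain no spaces, because it stores the unique list as a space-separated string.

```shell
# Hypothetical sample data: one subtype per line, with repeats.
subtypes_file=$(mktemp)
printf '%s\n' H3N2 H1N1 H3N2 B/Victoria H1N1 H3N2 > "$subtypes_file"

# Loop approach: build unique_list one entry at a time.
count_unique_loop() {
    unique_list=""
    unique_count=0
    while read -r subtype; do
        case " $unique_list " in
            *" $subtype "*) ;;   # already in unique_list, nothing to do
            *)
                unique_list="$unique_list $subtype"
                unique_count=$((unique_count + 1))
                ;;
        esac
    done < "$1"
    echo "$unique_count"
}

# Pipeline approach: sort, collapse adjacent duplicates, count the lines left.
count_unique_pipeline() {
    sort "$1" | uniq | wc -l
}

count_unique_loop "$subtypes_file"
count_unique_pipeline "$subtypes_file"
```

Both functions report 3 for the sample data. The pipeline version is shorter, while the loop version mirrors the pseudo code step by step.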
#4. Translate a DNA sequence into its possible protein sequences (keeping in mind reading frames, both coding and non-coding)
Possible Solution
- Obtain a codon table for converting DNA codons into amino acids.
- Starting at nucleotide 1, iterate in steps of 3 and translate codons to build a protein sequence.
- Repeat from nucleotide 2 (frame 2).
- Repeat from nucleotide 3 (frame 3).
- Convert the DNA to its reverse complement.
- Repeat translation for the three reverse-complement frames.
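One way to sketch the six-frame translation in shell is shown below. It is an illustration under stated assumptions rather than a reference implementation: the function names are made up, the codon table is the standard genetic code written as a 64-letter string (codons ordered TTT, TTC, TTA, TTG, TCT, and so on through GGG), codons containing anything other than A, C, G, or T are reported as X, and the per-codon lookup is delegated to awk.

```shell
# translate_frame SEQ OFFSET: translate SEQ starting OFFSET (0, 1, or 2) bases in.
translate_frame() {
    printf '%s\n' "$1" | awk -v offset="$2" '
    BEGIN {
        bases = "TCAG"
        # Standard genetic code; position = 16*first + 4*second + third base index.
        aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    }
    {
        seq = toupper($0)
        protein = ""
        # Step through the sequence three bases at a time; trailing bases are dropped.
        for (i = 1 + offset; i + 2 <= length(seq); i += 3) {
            b1 = index(bases, substr(seq, i, 1))
            b2 = index(bases, substr(seq, i + 1, 1))
            b3 = index(bases, substr(seq, i + 2, 1))
            if (b1 && b2 && b3) {
                protein = protein substr(aa, (b1 - 1) * 16 + (b2 - 1) * 4 + b3, 1)
            } else {
                protein = protein "X"   # codon with an ambiguous base such as N
            }
        }
        print protein
    }'
}

# reverse_complement SEQ: reverse the sequence, then swap A<->T and C<->G.
reverse_complement() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# print_frames LABEL SEQ: print the three frames of SEQ, prefixed with + or -.
print_frames() {
    for offset in 0 1 2; do
        printf '%s%d %s\n' "$1" "$((offset + 1))" "$(translate_frame "$2" "$offset")"
    done
}

six_frame_translation() {
    print_frames + "$1"
    print_frames - "$(reverse_complement "$1")"
}

six_frame_translation "ATGGCCTAA"
```

For the sample sequence ATGGCCTAA, frame +1 translates to MA* (where * marks a stop codon) and the reverse complement is TTAGGCCAT.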
Exercise 2 — Logic and Variables Practical
Retrieve the ordinal number file from GitHub (ordinal_check.sh) to complete the following exercises:
wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/logic_and_variable_practical/ordinal_check.sh
- Change the ordinal statement to execute as true if the number is greater than 50 and less than 100
- Change the ordinal statement to execute as true if the number is less than 25 or greater than 75
- Change the ordinal statement to execute as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100
Possible Solution
- Change the ordinal statement to execute as true if the number is greater than 50 and less than 100.

    if [ "$n" -gt 50 ] && [ "$n" -lt 100 ]; then
        # amend echo statements to reflect exercise instructions
    fi

- Change the ordinal statement to execute as true if the number is less than 25 or greater than 75.

    if [ "$n" -lt 25 ] || [ "$n" -gt 75 ]; then
        # amend echo statements to reflect exercise instructions
    fi

- Change the ordinal statement to execute as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100.

    if { [ "$n" -gt 1 ] && [ "$n" -lt 10 ]; } || { [ "$n" -ge 90 ] && [ "$n" -lt 100 ]; }; then
        # amend echo statements to reflect exercise instructions
    fi
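To check the boundary behavior of these three conditions without editing ordinal_check.sh, they can be wrapped in small functions and run against a few values. The function names and the test values below are illustrative assumptions and are not taken from ordinal_check.sh.

```shell
# Each function succeeds (exit status 0) when its condition holds for $1.
in_upper_half()     { [ "$1" -gt 50 ] && [ "$1" -lt 100 ]; }
in_outer_quarters() { [ "$1" -lt 25 ] || [ "$1" -gt 75 ]; }
in_edge_bands()     { { [ "$1" -gt 1 ] && [ "$1" -lt 10 ]; } || { [ "$1" -ge 90 ] && [ "$1" -lt 100 ]; }; }

# Run a condition and print yes or no depending on its exit status.
describe() {
    if "$@"; then printf 'yes'; else printf 'no'; fi
}

# Values chosen to sit on or near the boundaries of each condition.
for n in 1 5 50 75 90 100; do
    printf '%s: %s %s %s\n' "$n" \
        "$(describe in_upper_half "$n")" \
        "$(describe in_outer_quarters "$n")" \
        "$(describe in_edge_bands "$n")"
done
```

Because -gt and -lt are strict comparisons, 50 fails the first condition and 75 fails the second, while 90 passes all three thanks to the -ge in the third condition.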
Exercise 3 — Loops Practical
You’ll need to make a directory that contains a series of files for the next exercise. Use the commands below to create the directory and download the files.
mkdir loops_practical && cd loops_practical; for ((i=1;i<=99;i++)); do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/loops_practical/cat${i}; done
- For every file in the loops_practical directory, if the file is not empty, print the name of the file to stdout. (wc -c < filename prints the size of a file in bytes.)
- For each file in the directory, check whether its first line is a shebang line; if so, print the filename to stdout.
- Find the sick cat! (Hint: execute the files with shebangs!)
Question:
What is the number of the sick cat?
Possible Solution
- Print each non-empty file name in the loops_practical directory.

    for i in *; do
        bytesize=$(wc -c < "$i")
        if [ "$bytesize" -gt 0 ]; then
            echo "$i"
        fi
    done

- Find the “sick cat” by executing only the non-empty files with bash and printing the names of the files that run successfully.

    for i in *; do
        bytesize=$(wc -c < "$i")
        if [ "$bytesize" -gt 0 ]; then
            if bash "$i" 2>/dev/null; then
                echo "$i"
            fi
        fi
    done

  A less elegant way to find the sick cat would be to execute every file in the directory and scroll through the outputs.

    for i in *; do
        echo "$i"
        bash "$i"
    done
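The shebang check from the second task can be sketched in the same style. The demo directory, its three files, and the function names below are hypothetical stand-ins for the downloaded cat files; the check itself reads only the first line of each file and tests whether it starts with #!.

```shell
# Hypothetical demo files standing in for the downloaded cat files.
demo_dir=$(mktemp -d)
printf '#!/bin/bash\necho "meow"\n' > "$demo_dir/cat1"   # has a shebang
printf 'just some text\n' > "$demo_dir/cat2"             # no shebang
: > "$demo_dir/cat3"                                      # empty file

# has_shebang FILE: succeed when the first line of FILE starts with #!
has_shebang() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# list_shebang_files DIR: print the name of every regular file in DIR with a shebang.
list_shebang_files() {
    for f in "$1"/*; do
        if [ -f "$f" ] && has_shebang "$f"; then
            basename "$f"
        fi
    done
}

list_shebang_files "$demo_dir"
```

For the demo directory this prints only cat1. Inside loops_practical, the same loop could be run over * to narrow down which files are worth executing.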
Exercise 4 — Pipeline practical
Download the files “flu_types.txt”, “decode_the_secret_message.txt”, and “secret_message_key.txt” into a directory called “pipeline_practical” using the following instructions.
mkdir pipeline_practical && cd pipeline_practical; for i in decode_the_secret_message.txt flu_types.txt secret_message_key.txt; do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/pipeline_practical/${i} ; done
- List the contents of a directory and pipe that output to wc (word count) to find how many files there are (man wc shows what wc can do).
- In one line, sort the contents of flu_types.txt and output a list of the unique values.
- Using the following instructions, decode the secret message (see if you can do it in a one-liner): 1%76q#948^4q5@23q2q492q07&/@i5#q#76
  - Convert all the numbers and symbols into letters using the provided key (secret_message_key.txt)
  - Reverse the order of the sequence
  - Cut using “/” as delimiter, take the second field
  - Convert all the letters into uppercase (Hint: you can translate [:lower:] into [:upper:])
Possible Solution
- Count files in a directory using a pipe to wc. The -l option counts lines, and ls prints one entry per line when its output goes to a pipe.

    ls | wc -l

- Sort a file and return unique values (bonus: count unique values).

    cat flu_types.txt | sort | uniq
    cat flu_types.txt | sort | uniq | wc -l

- Decode the secret message by converting characters, reversing, cutting field 2 by /, then uppercasing. The first tr maps the coded characters (first set) to the letters in the key (second set); this assumes secret_message_key.txt contains the replacement letters used in the one-line example.

    secret_message="$(cat decode_the_secret_message.txt)"
    secret_key="$(cat secret_message_key.txt)"
    echo "$secret_message" | tr '123456789@#0%^&q=' "$secret_key" | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'

- One-line direct decode example.

    echo "1%76q#948^4q5@23q2q492q07&/@i5#q#76" | tr '123456789@#0%^&q=' '!abehnoprstuwxy_' | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'
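When a one-liner like this does not behave as expected, it can help to run the pipeline one stage at a time and look at each intermediate result. The sketch below does that with hypothetical variable names step1 through step4. It also repeats the final underscore in the second tr set so that both sets have the same length, because how tr pads a shorter second set can differ between implementations.

```shell
secret_message='1%76q#948^4q5@23q2q492q07&/@i5#q#76'

# Stage 1: map each coded character to its letter (both sets are 17 characters long).
step1=$(printf '%s\n' "$secret_message" | tr '123456789@#0%^&q=' '!abehnoprstuwxy__')
# Stage 2: reverse the whole line.
step2=$(printf '%s\n' "$step1" | rev)
# Stage 3: split on "/" and keep the second field.
step3=$(printf '%s\n' "$step2" | cut -d '/' -f 2)
# Stage 4: convert lowercase letters to uppercase.
step4=$(printf '%s\n' "$step3" | tr '[:lower:]' '[:upper:]')

# Print every intermediate result, one per line.
printf '%s\n' "$step1" "$step2" "$step3" "$step4"
```

Each printed line shows the message after one more stage, which makes it easy to spot the stage where a set, delimiter, or field number is wrong.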