Bioinformatics Programming Exercises
These exercises accompany the Intro to Bioinformatics Programming module. They alternate between hands-on practice and knowledge checks.
Exercises developed by Kristine Lacek and Logan Fink
Exercise 1 — Pseudo Code Practical
#1. From a list of numbers, determine the mean
Possible Solution
- Count the number of entries in the file and store that number as a variable (e.g., total_entries).
- Use a function to iterate through each number and add them line by line until the end of the file, then store that sum as a variable (sum_entries).
- Divide sum_entries by total_entries, and capture the final value as a float so it is not rounded to a whole integer.
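The steps above can be sketched in bash as follows. This is only a minimal sketch: it assumes the input has one number per line, and the function name mean_of_file and the file name numbers.txt are hypothetical. The division is handed to awk because bash arithmetic works on integers only.

```shell
#!/usr/bin/env bash
# Sketch: mean of a file containing one number per line.
# mean_of_file is a hypothetical helper name, not part of the workshop files.
mean_of_file() {
    awk '
        # Skip blank lines; add each value to the sum and count it.
        NF { sum_entries += $1; total_entries++ }
        END {
            # Print as a float with two decimals; print nothing for empty input.
            if (total_entries > 0)
                printf "%.2f\n", sum_entries / total_entries
        }
    ' "$1"
}

# Usage (numbers.txt is a hypothetical input file):
# mean_of_file numbers.txt
```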
#2. Calculate the percentage of numbers in the file that are below 6
Possible Solution
- Instantiate two variables, x and y, and set both to zero (x = count of values below 6, y = count of values 6 or above).
- Write a function to determine whether a value is less than 6.
- Iterate through the list and use each value as input to the function.
- If the condition is true, increment x by 1; if false, increment y by 1.
- Create a new variable z as x + y (total values), then divide x by z, multiply by 100 to get a percentage, and capture the result as a float.
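One possible bash rendering of this pseudocode is shown below. It is a sketch under stated assumptions: the values are whole numbers, one per line (the -lt test only compares integers), and the function name percent_below and the file name numbers.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: percentage of values in a file that are below a threshold.
# percent_below is a hypothetical helper; it assumes one integer per line.
percent_below() {
    local file=$1 threshold=$2
    local x=0 y=0 z value
    # The "|| [ -n ... ]" part also picks up a last line without a trailing newline.
    while read -r value || [ -n "$value" ]; do
        [ -z "$value" ] && continue        # ignore blank lines
        if [ "$value" -lt "$threshold" ]; then
            x=$((x + 1))                   # value is below the threshold
        else
            y=$((y + 1))                   # value is at or above the threshold
        fi
    done < "$file"
    z=$((x + y))
    [ "$z" -gt 0 ] || return 1             # avoid dividing by zero
    # awk does the floating-point division that bash cannot do.
    awk -v x="$x" -v z="$z" 'BEGIN { printf "%.1f\n", 100 * x / z }'
}

# Usage (numbers.txt is a hypothetical input file):
# percent_below numbers.txt 6
```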
#3. How many different flu subtypes appear in a list?
Possible Solution
- Create an empty list to hold unique subtype values (e.g., unique_list).
- Iterate through the subtype list and check whether each subtype is already in unique_list.
- If it is not present, add it to unique_list.
- Alternative: sort the list and condense to unique values (pipeline approach, e.g., sort/uniq).
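The list-based approach might look like this in bash. This is a sketch that assumes one subtype per line; the function name count_unique_subtypes and the file name subtypes.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count distinct subtypes by building a list of values seen so far.
# count_unique_subtypes is a hypothetical helper; it assumes one subtype per line.
count_unique_subtypes() {
    local file=$1 subtype known already_seen
    local -a unique_list=()
    while read -r subtype || [ -n "$subtype" ]; do
        [ -z "$subtype" ] && continue      # ignore blank lines
        already_seen=0
        for known in "${unique_list[@]}"; do
            if [ "$known" = "$subtype" ]; then
                already_seen=1
                break
            fi
        done
        if [ "$already_seen" -eq 0 ]; then
            unique_list+=("$subtype")      # first time this subtype appears
        fi
    done < "$file"
    echo "${#unique_list[@]}"              # number of unique subtypes
}

# Usage (subtypes.txt is a hypothetical input file):
# count_unique_subtypes subtypes.txt
# Pipeline alternative from the last bullet:
# sort subtypes.txt | uniq | wc -l
```

The loop version checks every new value against the whole list, so the sort/uniq pipeline is usually the better choice for long files.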
#4. Translate this DNA into its possible protein sequences (keeping in mind frames, coding and non-coding)
Possible Solution
- Obtain a codon table for converting DNA codons into amino acids.
- Starting at nucleotide 1, iterate in steps of 3 and translate codons to build a protein sequence.
- Repeat from nucleotide 2 (frame 2).
- Repeat from nucleotide 3 (frame 3).
- Convert the DNA to its reverse complement.
- Repeat translation for the three reverse-complement frames.
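A six-frame translation along these lines can be sketched in bash 4 or later (associative arrays are required). The sketch assumes the standard genetic code, uppercases the input, prints stop codons as * and any codon it does not recognize (for example one containing N) as X. All function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: six-frame translation using the standard genetic code (bash 4+).
# Build the codon table from the conventional T, C, A, G ordering.
declare -A CODON_TABLE
bases=(T C A G)
amino_acids="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
idx=0
for b1 in "${bases[@]}"; do
    for b2 in "${bases[@]}"; do
        for b3 in "${bases[@]}"; do
            CODON_TABLE[$b1$b2$b3]=${amino_acids:idx:1}
            idx=$((idx + 1))
        done
    done
done

# Translate one frame; offset is 0, 1, or 2. Trailing partial codons are dropped.
translate_frame() {
    local seq=$1 offset=$2 protein="" codon i
    for ((i = offset; i + 3 <= ${#seq}; i += 3)); do
        codon=${seq:i:3}
        protein+=${CODON_TABLE[$codon]:-X}   # X marks an unrecognized codon
    done
    printf '%s\n' "$protein"
}

# Reverse the sequence, then swap each base for its complement.
reverse_complement() {
    printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Print all six frames: three forward, three on the reverse complement.
six_frames() {
    local dna rc frame
    dna=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
    rc=$(reverse_complement "$dna")
    for frame in 0 1 2; do
        printf 'forward_%d\t%s\n' "$((frame + 1))" "$(translate_frame "$dna" "$frame")"
    done
    for frame in 0 1 2; do
        printf 'reverse_%d\t%s\n' "$((frame + 1))" "$(translate_frame "$rc" "$frame")"
    done
}

# Usage:
# six_frames ATGGCCTAA
```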
Exercise 2 — Logic and Variables Practical
- Using the ordinal number file on GitHub (ordinal_check.sh):
wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/logic_and_variable_practical/ordinal_check.sh
- Exercise 1: change the ordinal statement to execute as true if the number is greater than 50 and less than 100
- Exercise 2: change the ordinal statement to execute as true if the number is less than 25 or greater than 75
- Exercise 3: change the ordinal statement to execute as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100
- Set a variable to be the current working directory. Change directory to the top level directory. Navigate back to the directory you were in using the variable
- Using the random number generator file on GitHub (random_number_generator.sh)
Possible Solution
- Exercise 1: change the ordinal statement to execute as true if the number is greater than 50 and less than 100.
if [ "$n" -gt 50 ] && [ "$n" -lt 100 ]; then
    # amend echo statements to reflect exercise instructions
    :  # no-op placeholder so the block stays valid bash
fi
- Exercise 2: change the ordinal statement to execute as true if the number is less than 25 or greater than 75.
if [ "$n" -lt 25 ] || [ "$n" -gt 75 ]; then
    # amend echo statements to reflect exercise instructions
    :  # no-op placeholder so the block stays valid bash
fi
- Exercise 3: change the ordinal statement to execute as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100.
if { [ "$n" -gt 1 ] && [ "$n" -lt 10 ]; } || { [ "$n" -ge 90 ] && [ "$n" -lt 100 ]; }; then
    # amend echo statements to reflect exercise instructions
    :  # no-op placeholder so the block stays valid bash
fi
- Set a variable to the current working directory, change to the top-level directory, then return using the variable.
thispath="$(pwd)"
cd    # with no arguments, cd goes to your home directory
cd "$thispath"
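Compound conditions like the one in Exercise 3 are easy to get wrong at the boundaries, so it can help to wrap the test in a function and run it over a few edge values. The following is a sketch only; the function name matches_exercise3 and the sample values are hypothetical, and it does not depend on the contents of ordinal_check.sh.

```shell
#!/usr/bin/env bash
# Sketch: check the Exercise 3 condition against boundary values.
# matches_exercise3 is a hypothetical helper name.
matches_exercise3() {
    local n=$1
    # True when 1 < n < 10, or when 90 <= n < 100.
    { [ "$n" -gt 1 ] && [ "$n" -lt 10 ]; } || { [ "$n" -ge 90 ] && [ "$n" -lt 100 ]; }
}

for n in 1 2 9 10 89 90 99 100; do
    if matches_exercise3 "$n"; then
        echo "$n: true"
    else
        echo "$n: false"
    fi
done
```

The braces group each pair of tests so that the || applies to the two ranges as a whole.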
Exercise 3 — Loops Practical
mkdir loops_practical && cd loops_practical; for ((i=1;i<=99;i++)); do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/loops_practical/gato${i}; done
- For every file in the loops_practical directory, if the file is not empty, print the name of the file to stdout. ("wc --bytes < filename" can be used to give the size of a file.)
- For each file in a directory, find out whether the file contains a shebang line as its first line; if so, print the filename to stdout.
- Find the sick cat! (Hint: execute the files with shebangs!)
Possible Solution
- Print each non-empty file name in the loops_practical directory.
for i in *; do
    bytesize=$(wc --bytes < "$i")
    if [ "$bytesize" -gt 0 ]; then
        echo "$i"
    fi
done
- Find the "sick cat" by executing only non-empty bash-readable files and printing successful file names.
for i in *; do
    bytesize=$(wc --bytes < "$i")
    if [ "$bytesize" -gt 0 ]; then
        if bash "$i" 2>/dev/null; then
            echo "$i"
        fi
    fi
done
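The shebang task can be approached by looking at the first two bytes of each file. This is a sketch, not the workshop's own solution: the function names has_shebang and list_shebang_files are hypothetical, and it treats any file that starts with #! as having a shebang line.

```shell
#!/usr/bin/env bash
# Sketch: print the names of files whose first line starts with a shebang (#!).
# has_shebang and list_shebang_files are hypothetical helper names.
has_shebang() {
    # head -c 2 returns the first two bytes; empty files yield an empty string.
    [ "$(head -c 2 "$1")" = '#!' ]
}

list_shebang_files() {
    local dir=${1:-.} f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip directories and unmatched globs
        if has_shebang "$f"; then
            basename "$f"
        fi
    done
}

# Usage, from inside the loops_practical directory:
# list_shebang_files .
```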
Exercise 4 — Pipeline practical
mkdir pipeline_practical && cd pipeline_practical; for i in decode_the_secret_message.txt flu_types.txt secret_message_key.txt; do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/pipeline_practical/${i}; done
- List the contents of the /usr/bin directory, pipe that output to word count to find how many files there are (it may be helpful to use "man wc" to find out what wc can do)
- List the contents of a file, sort the contents and find a list of the unique values (bonus: count the number of unique values and output to screen)
- Decode the secret message: 1%76q#948^4q5@23q2q492q07&/@i5#q#76
- Convert all the numbers into letters using the provided conversion
- Reverse the order of the sequence
- Cut using "/" as delimiter, take the second field
- Convert all the letters into uppercase
Possible Solution
- Count files in a directory using a pipe to wc.
ls | wc -l
- Sort a file and return unique values (bonus: count unique values).
cat flu_types.txt | sort | uniq
cat flu_types.txt | sort | uniq | wc -l
- Decode the secret message by converting characters, reversing, cutting field 2 by /, then uppercasing.
secret_message="$(cat decode_the_secret_message.txt)"
secret_key="$(cat secret_message_key.txt)"
# assumes secret_message_key.txt holds the replacement letters, in the same order as the first set
echo "$secret_message" | tr '123456789@#0%^&q=' "$secret_key" | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'
- One-line direct decode example.
echo "1%76q#948^4q5@23q2q492q07&/@i5#q#76" | tr '123456789@#0%^&q=' '!abehnoprstuwxy_' | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'
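When a pipeline gives an unexpected result, it can help to run it one stage at a time and look at each intermediate value. The sketch below does this for the decode pipeline, using the same character sets as the one-line example; the function name decode_stages is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the decode pipeline one stage at a time to inspect each step.
# decode_stages is a hypothetical helper name.
decode_stages() {
    local message=$1 step
    # Stage 1: swap each cipher character for its letter.
    step=$(printf '%s\n' "$message" | tr '123456789@#0%^&q=' '!abehnoprstuwxy_')
    echo "1. translated: $step"
    # Stage 2: reverse the character order.
    step=$(printf '%s\n' "$step" | rev)
    echo "2. reversed:   $step"
    # Stage 3: keep the second "/"-delimited field.
    step=$(printf '%s\n' "$step" | cut -d"/" -f 2)
    echo "3. field 2:    $step"
    # Stage 4: convert lowercase letters to uppercase.
    step=$(printf '%s\n' "$step" | tr '[:lower:]' '[:upper:]')
    echo "4. uppercase:  $step"
}

# Usage:
# decode_stages '1%76q#948^4q5@23q2q492q07&/@i5#q#76'
```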