Bioinformatics Programming Exercises

These exercises accompany the Intro to Bioinformatics Programming module. They alternate between hands-on practice and knowledge checks.


Exercises developed by Kristine Lacek and Logan Fink

Exercise 1 — Pseudo Code Practical

#1. From a list of numbers, determine the mean

Possible Solution
  1. Count the number of entries in the file and store that count in a variable (e.g., total_entries).
  2. Use a function to iterate through the numbers line by line, adding each one to a running total until the end of the file, then store the sum in a variable (e.g., sum_entries).
  3. Divide sum_entries by total_entries and keep the result as a float, so it is not truncated to a whole number.
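The three steps above can be sketched in bash as a minimal example. It assumes the file holds one whole number per line (bash arithmetic is integer-only, so awk performs the final floating-point division). The function name mean_of_file is hypothetical, and it reads standard input when no file name is given.

```shell
#!/usr/bin/env bash
# mean_of_file [FILE] (hypothetical name): print the mean of the whole numbers
# in FILE, one per line. Reads standard input when FILE is omitted.
mean_of_file() {
  local total_entries=0 sum_entries=0 value
  # Steps 1 and 2: count the entries and add them up, line by line.
  # The "|| [ -n ... ]" test keeps a final line that has no trailing newline.
  while read -r value || [ -n "$value" ]; do
    [ -z "$value" ] && continue              # skip blank lines
    total_entries=$((total_entries + 1))
    sum_entries=$((sum_entries + value))     # bash arithmetic: integers only
  done < "${1:-/dev/stdin}"

  if [ "$total_entries" -eq 0 ]; then
    echo "mean_of_file: no numbers found" >&2
    return 1
  fi

  # Step 3: divide as a float; bash's own $((7 / 2)) would truncate to 3.
  awk -v s="$sum_entries" -v n="$total_entries" 'BEGIN { printf "%.2f\n", s / n }'
}
```

For example, with a hypothetical numbers.txt containing 4, 7 and 10 on separate lines, mean_of_file numbers.txt prints 7.00. The sketch counts and sums in a single pass rather than two separate steps; the result is the same.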

#2. Calculate the percentage of numbers in the file that are below 6

Possible Solution
  1. Initialize two variables, x and y, to zero (x = count of values below 6, y = count of values 6 or above).
  2. Write a function to determine whether a value is less than 6.
  3. Iterate through the list and use each value as input to the function.
  4. If the condition is true, increment x by 1; if false, increment y by 1.
  5. Create a new variable z = x + y (the total number of values), then divide x by z as a float and multiply by 100 to get the percentage.
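A sketch of these steps in bash, assuming one whole number per line (the -lt test compares integers only). is_below_six plays the role of the function in step 2, and percent_below_six keeps the x, y and z variables from the pseudocode; both names are hypothetical. awk handles the float division and the multiplication by 100.

```shell
#!/usr/bin/env bash
# is_below_six N (hypothetical helper): exit status 0 if N < 6.
is_below_six() {
  [ "$1" -lt 6 ]
}

# percent_below_six [FILE] (hypothetical name): print the percentage of values
# below 6, reading one whole number per line from FILE or standard input.
percent_below_six() {
  local x=0 y=0 z value            # x = below 6, y = 6 or above
  while read -r value || [ -n "$value" ]; do
    [ -z "$value" ] && continue    # skip blank lines
    if is_below_six "$value"; then
      x=$((x + 1))
    else
      y=$((y + 1))
    fi
  done < "${1:-/dev/stdin}"

  z=$((x + y))                     # total number of values
  if [ "$z" -eq 0 ]; then
    echo "percent_below_six: no numbers found" >&2
    return 1
  fi
  # bash arithmetic is integer-only, so awk does the float division
  awk -v x="$x" -v z="$z" 'BEGIN { printf "%.1f\n", 100 * x / z }'
}
```

For example, printf '1\n5\n6\n9\n' | percent_below_six prints 50.0, since two of the four values are below 6.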

#3. How many different flu subtypes appear in a list?

Possible Solution
  1. Create an empty list to hold unique subtype values (e.g., unique_list).
  2. Iterate through the subtype list and check whether each subtype is already in unique_list.
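  3. If it is not present, add it to unique_list.
  4. When the iteration finishes, the number of entries in unique_list is the number of different subtypes.
  5. Alternative: sort the list, condense it to unique values, and count them (pipeline approach, e.g., sort | uniq | wc -l).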
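The list-based steps above can be sketched with a bash array standing in for unique_list. This is a minimal sketch assuming one subtype per line (from a file argument or standard input); count_unique_subtypes is a hypothetical name. Checking membership by looping over the array mirrors the pseudocode, but it gets slower as the list grows, which the sort/uniq pipeline avoids.

```shell
#!/usr/bin/env bash
# count_unique_subtypes [FILE] (hypothetical name): print how many different
# subtypes appear, one subtype per line, in FILE or standard input.
count_unique_subtypes() {
  local -a unique_list=()
  local subtype seen found
  while read -r subtype || [ -n "$subtype" ]; do
    [ -z "$subtype" ] && continue           # skip blank lines
    found=0
    for seen in "${unique_list[@]}"; do     # is it already in unique_list?
      if [ "$seen" = "$subtype" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      unique_list+=("$subtype")             # first occurrence: add it
    fi
  done < "${1:-/dev/stdin}"
  echo "${#unique_list[@]}"                 # number of entries in unique_list
}
```

On the flu_types.txt file from Exercise 4 (assuming one subtype per line), count_unique_subtypes flu_types.txt should normally print the same number as sort flu_types.txt | uniq | wc -l; the function skips blank lines, whereas the pipeline counts an empty line as one more value.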

#4. Translate this DNA into its possible protein sequences (keeping in mind the reading frames and both the coding and non-coding strands)

Possible Solution
  1. Obtain a codon table for converting DNA codons into amino acids.
  2. Starting at nucleotide 1, iterate in steps of 3 and translate codons to build a protein sequence.
  3. Repeat from nucleotide 2 (frame 2).
  4. Repeat from nucleotide 3 (frame 3).
  5. Convert the DNA to its reverse complement.
  6. Repeat translation for the three reverse-complement frames.
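The six-frame procedure above can be sketched in bash. This is a minimal sketch assuming the standard genetic code and a sequence of A/C/G/T (upper or lower case) passed as an argument; any codon containing another character, such as N, is translated as X, and stop codons are printed as *. The function names translate_frame, reverse_complement and six_frame_translate are hypothetical. Instead of a lookup file, the codon table is packed into one 64-character string indexed by base (T=0, C=1, A=2, G=3), which avoids needing bash 4 associative arrays.

```shell
#!/usr/bin/env bash
# Standard genetic code packed into 64 characters. Codons are ordered with
# bases indexed T=0, C=1, A=2, G=3, so index = 16*first + 4*second + third
# (TTT=0 -> F, ..., GGG=63 -> G). "*" marks a stop codon.
CODON_TABLE="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# translate_frame SEQ OFFSET: translate SEQ from 0-based OFFSET in steps of 3.
# Incomplete trailing codons are dropped; codons with a non-ACGT base give "X".
translate_frame() {
  local seq="$1" offset="$2" protein="" codon i b v idx
  for (( i = offset; i + 3 <= ${#seq}; i += 3 )); do
    codon="${seq:i:3}"
    idx=0
    for (( b = 0; b < 3; b++ )); do
      case "${codon:b:1}" in
        T) v=0 ;; C) v=1 ;; A) v=2 ;; G) v=3 ;;
        *) v=-1 ;;
      esac
      if (( v < 0 )); then idx=-1; break; fi
      idx=$(( idx * 4 + v ))
    done
    if (( idx < 0 )); then
      protein+="X"
    else
      protein+="${CODON_TABLE:idx:1}"
    fi
  done
  printf '%s\n' "$protein"
}

# reverse_complement SEQ: reverse the sequence, then swap A<->T and C<->G.
reverse_complement() {
  local seq="$1" reversed="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    reversed+="${seq:i:1}"
  done
  printf '%s\n' "$reversed" | tr 'ACGT' 'TGCA'
}

# six_frame_translate SEQ: print frames +1..+3 (forward strand) and
# -1..-3 (reverse complement), one "frame protein" pair per line.
six_frame_translate() {
  local seq rc f
  seq=$(printf '%s' "$1" | tr 'acgt' 'ACGT')   # accept lowercase input
  rc=$(reverse_complement "$seq")
  for f in 0 1 2; do
    printf '%s %s\n' "+$((f + 1))" "$(translate_frame "$seq" "$f")"
  done
  for f in 0 1 2; do
    printf '%s %s\n' "-$((f + 1))" "$(translate_frame "$rc" "$f")"
  done
}
```

For example, six_frame_translate ATGGCCTAA prints +1 MA*, +2 WP, +3 GL, -1 LGH, -2 *A and -3 RP on separate lines; bases left over at the end of a frame are ignored.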

Exercise 2 — Logic and Variables Practical

Using the ordinal number file on GitHub (ordinal_check.sh):
wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/logic_and_variable_practical/ordinal_check.sh

  1. Exercise 1: change the ordinal statement so it evaluates as true if the number is greater than 50 and less than 100.
  2. Exercise 2: change the ordinal statement so it evaluates as true if the number is less than 25 or greater than 75.
  3. Exercise 3: change the ordinal statement so it evaluates as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100.
  4. Set a variable to the current working directory. Change to the top-level directory, then navigate back to the directory you were in using the variable.
  5. Using the random number generator file on GitHub (random_number_generator.sh)
Possible Solution
  1. Exercise 1: change the ordinal statement so it evaluates as true if the number is greater than 50 and less than 100.
    if [ "$n" -gt 50 ] && [ "$n" -lt 100 ]; then
      # amend echo statements to reflect exercise instructions
      echo "$n is greater than 50 and less than 100"
    fi
  2. Exercise 2: change the ordinal statement so it evaluates as true if the number is less than 25 or greater than 75.
    if [ "$n" -lt 25 ] || [ "$n" -gt 75 ]; then
      # amend echo statements to reflect exercise instructions
      echo "$n is less than 25 or greater than 75"
    fi
  3. Exercise 3: change the ordinal statement so it evaluates as true if the number is greater than 1 and less than 10, or greater than or equal to 90 and less than 100.
    if { [ "$n" -gt 1 ] && [ "$n" -lt 10 ]; } || { [ "$n" -ge 90 ] && [ "$n" -lt 100 ]; }; then
      # amend echo statements to reflect exercise instructions
      echo "$n is greater than 1 and less than 10, or at least 90 and less than 100"
    fi
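  4. Set a variable to the current working directory, change to the top-level directory, then return using the variable.
    thispath="$(pwd)"   # save the current working directory
    cd /                # top-level (root) directory; a plain "cd" goes to $HOME instead
    cd "$thispath"      # return to the saved directory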
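As an alternative to chaining [ ] tests with && and ||, bash arithmetic evaluation with (( )) can express the Exercise 3 condition as a single expression. The sketch below wraps it in a hypothetical helper, in_exercise3_range, so the condition can be tried on boundary values without editing ordinal_check.sh; it assumes the number being checked is a whole number.

```shell
#!/usr/bin/env bash
# in_exercise3_range N (hypothetical helper): exit status 0 when N satisfies
# (N > 1 and N < 10) or (N >= 90 and N < 100), the Exercise 3 condition.
# Assumes N is a whole number; (( )) works with integers only.
in_exercise3_range() {
  local n="$1"
  (( (n > 1 && n < 10) || (n >= 90 && n < 100) ))
}

# Try the values on either side of each boundary.
for n in 1 2 9 10 89 90 99 100; do
  if in_exercise3_range "$n"; then
    echo "$n: condition met"
  else
    echo "$n: condition not met"
  fi
done
```

Inside (( )), variables do not need a $ prefix and the comparison operators are the familiar <, <=, > and >=, which some find easier to read than -lt and -ge.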

Exercise 3 — Loops Practical

mkdir loops_practical && cd loops_practical; for ((i=1;i<=99;i++)); do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/loops_practical/gato${i}; done
  1. For every file in the loops_practical directory, if the file is not empty, print the name of the file to stdout. (wc -c < filename prints the size of a file in bytes; on GNU systems, wc --bytes does the same.)
  2. For each file in the directory, check whether its first line is a shebang line (starting with #!); if so, print the file name to stdout.
  3. Find the sick cat! (Hint: execute the files with shebangs!)
Possible Solution
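  1. Print the name of each non-empty file in the loops_practical directory.
    for i in *; do
      [ -f "$i" ] || continue          # skip directories and other non-regular files
      bytesize=$(wc -c < "$i")         # size in bytes ([ -s "$i" ] is an equivalent test)
      if [ "$bytesize" -gt 0 ]; then
        echo "$i"
      fi
    done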
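  2. Find the "sick cat" (task 3) by running each non-empty file with bash, hiding error messages, and printing the names of the files that run successfully.
    for i in *; do
      [ -f "$i" ] || continue
      bytesize=$(wc -c < "$i")
      if [ "$bytesize" -gt 0 ]; then
        if bash "$i" 2>/dev/null; then   # files that are not working scripts usually exit with an error
          echo "$i"
        fi
      fi
    done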
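Task 2 (the shebang check) is not covered by the possible solutions above. One way to approach it, sketched under the assumption that the files are ordinary text files in a single directory, is to read only the first line of each file with head -n 1 and test whether it starts with #!. The function name print_shebang_files is hypothetical.

```shell
#!/usr/bin/env bash
# print_shebang_files [DIR] (hypothetical name): print the name of every
# regular file in DIR (default: current directory) whose first line starts
# with "#!". Hidden files are skipped, as with a plain "*" glob.
print_shebang_files() {
  local dir="${1:-.}" f first_line
  for f in "$dir"/*; do
    [ -f "$f" ] || continue             # skip directories, and the literal "*" if DIR is empty
    first_line=$(head -n 1 "$f")        # only the first line matters
    case "$first_line" in
      '#!'*) echo "${f##*/}" ;;         # print the name without the directory part
    esac
  done
}
```

Running print_shebang_files inside loops_practical lists the candidate scripts; executing each of those with bash is one way to follow the hint for finding the sick cat.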

Exercise 4 — Pipeline practical

mkdir pipeline_practical && cd pipeline_practical; for i in decode_the_secret_message.txt flu_types.txt secret_message_key.txt; do wget https://raw.githubusercontent.com/CDCgov/id-bioifx-workshop/refs/heads/main/practical/bash_practical_exercises/pipeline_practical/${i}; done

  1. List the contents of a directory and pipe that output to word count to find how many files there are (it may help to run "man wc" to see what wc can do).
  2. List the contents of the /usr/bin directory and pipe that output to word count to find how many files there are.
  3. List the contents of a file, sort the contents, and find the list of unique values (bonus: count the number of unique values and print it to the screen).
  4. Decode the secret message: 1%76q#948^4q5@23q2q492q07&/@i5#q#76
  5. Convert all the numbers and symbols into letters using the provided key (secret_message_key.txt).
  6. Reverse the order of the sequence.
  7. Cut using "/" as the delimiter and take the second field.
  8. Convert all the letters into uppercase.
Possible Solution
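  1. Count the files in a directory by piping ls to wc; -l makes wc print only the line count, and ls prints one name per line when its output is piped.
    ls | wc -l
    ls /usr/bin | wc -l
  2. Sort a file and return its unique values (bonus: count the unique values).
    cat flu_types.txt | sort | uniq
    cat flu_types.txt | sort | uniq | wc -l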
  3. Decode the secret message: translate its characters with the key, reverse the string, keep field 2 using "/" as the delimiter, then convert to uppercase.
    secret_message="$(cat decode_the_secret_message.txt)"
    # assumes secret_message_key.txt holds the replacement letters (e.g. !abehnoprstuwxy_)
    secret_key="$(cat secret_message_key.txt)"
    echo "$secret_message" | tr '123456789@#0%^&q=' "$secret_key" | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'
  4. The same decode as a one-liner, with the key written out directly (it prints YOU_ARE_A_BASH_EXPERT_NOW!).
    echo '1%76q#948^4q5@23q2q492q07&/@i5#q#76' | tr '123456789@#0%^&q=' '!abehnoprstuwxy_' | rev | cut -d"/" -f 2 | tr '[:lower:]' '[:upper:]'
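For the file-counting tasks, ls | wc -l counts every name ls prints, including subdirectories, and it would overcount any file name that contains a newline. If only regular files should be counted, a loop with the -f test is one option. The sketch below uses a hypothetical function name, count_regular_files, and like plain ls it skips hidden files.

```shell
#!/usr/bin/env bash
# count_regular_files [DIR] (hypothetical name): print how many regular files
# DIR (default: current directory) contains. Unlike "ls | wc -l", directories
# are not counted; hidden files are skipped, as with plain ls.
count_regular_files() {
  local dir="${1:-.}" f count=0
  for f in "$dir"/*; do
    if [ -f "$f" ]; then        # true for regular files (and symlinks to them)
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

For example, count_regular_files /usr/bin can be compared with ls /usr/bin | wc -l; the two differ only if /usr/bin contains subdirectories or other non-regular entries.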