Selected scripts and code used in the detection of Cyclospora cayetanensis from 18S and COX3 amplicon sequencing. The test data sets included in this repository are a subset of samples from COX3 sequencing.
Uses ncbi-blast+ to BLAST FASTA files against a selected database with a 90% identity and coverage cutoff, and returns results in tab-delimited format with the following fields: queryID, accession, taxID, title, length, percentID, query coverage, mismatch, gapopen, query start, query end, subject start, subject end, evalue, bitscore.
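As a rough illustration of the search described above, the sketch below only assembles a command line; it does not run BLAST. The choice of `blastn`, the mapping of the listed fields onto BLAST+ format specifiers (for example `qcovs` for query coverage), the function name `build_blastn_command`, and the file names are all assumptions, not taken from the repository's script.

```python
import shlex

# Assumed mapping of the README's field list onto BLAST+ outfmt specifiers.
OUTFMT_FIELDS = [
    "qseqid", "sacc", "staxid", "stitle", "length", "pident", "qcovs",
    "mismatch", "gapopen", "qstart", "qend", "sstart", "send",
    "evalue", "bitscore",
]


def build_blastn_command(query_fasta, database, out_path,
                         min_identity=90, min_coverage=90):
    """Return a blastn argument list (hypothetical helper, not the repo script)."""
    outfmt = "6 " + " ".join(OUTFMT_FIELDS)  # 6 = tab-delimited output
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-perc_identity", str(min_identity),   # percent identity cutoff
        "-qcov_hsp_perc", str(min_coverage),   # per-HSP query coverage cutoff
        "-outfmt", outfmt,
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Placeholder file and database names, for illustration only.
    cmd = build_blastn_command("sample.fasta", "nt", "sample_blast-hits-nt.txt")
    print(shlex.join(cmd))
```

Note that reporting `staxid` requires a database built with taxonomy information.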
Extracts BAM alignments from indexed BAM files based on genome coordinates. Requires two coordinates. Test with Test_dataset_GENESLICE.
Example Usage:
sh "NW_020312409:1749-1750" "NW_020312409:1874-1875"
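The two arguments follow the `contig:start-end` region syntax used by samtools. The sketch below shows how such strings can be parsed and combined into one overall span. The helper names `parse_region` and `slice_span` are hypothetical, and the assumption that the two coordinates mark the left and right edges of the slice is inferred from the example, not confirmed by the script.

```python
import re

# contig:start-end, with 1-based inclusive coordinates (samtools-style region).
REGION_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split 'contig:start-end' into (contig, start, end)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not a valid region string: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end in {region!r}")
    return match["contig"], start, end


def slice_span(left, right):
    """Return the span covered by two regions on the same contig (assumed usage)."""
    contig_l, start_l, end_l = parse_region(left)
    contig_r, start_r, end_r = parse_region(right)
    if contig_l != contig_r:
        raise ValueError("both regions must lie on the same contig")
    return contig_l, min(start_l, start_r), max(end_l, end_r)


if __name__ == "__main__":
    print(slice_span("NW_020312409:1749-1750", "NW_020312409:1874-1875"))
```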
Trims overhangs from extracted BAM alignments. Requires a BED file with the designated trimming locations. Test with Test_dataset_GENESLICE following coordinate extraction.
Example Usage:
sh 1749-1875-bed
Parses BLAST outputs and duplicate-sequence count data from seqkit rmdup and returns a table of combined top-hit counts, based on maximum bitscore, for all samples. If several hits tie for the maximum bitscore, one of them is selected at random. Note: tied hits may include hits with the same taxID; the script does not distinguish between them. Requires a BLAST output file and count data for each sample. Use with Test_dataset_BLAST.
An optional taxa target (-target) may also be given to retrieve all max-bitscore hits based on taxa ID. Note: with the target option, hits are considered tied only when the max-bitscore hits belong to different taxIDs (i.e., tied hits within the same taxID are treated as unique, not tied). ** Warning: inconsistent counts may occur when using the string option if taxa have different names or are missing labels for the same taxID! **
General Parameters:
-d directory containing files
-s1 file suffix for blast output results
-s2 file suffix for duplicate count files
-o prefix for output files (optional)
-target character string or taxID (optional)
Example Usage:
python -d . -s1 _blast-hits-nt.txt -s2 _duplicated.detail.txt
Example Usage with Targets:
python -d . -s1 _blast-hits-nt.txt -s2 _duplicated.detail.txt -o Cyclo_ -target 88456
python -d . -s1 _blast-hits-nt.txt -s2 _duplicated.detail.txt -o Eimeria_ -target Eimeria
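The top-hit selection described for this script can be sketched as follows. This is a minimal illustration, not the script itself: the column names, the helper names `top_hits` and `combine_counts`, and the shape of the count data (a mapping from query ID to read count) are assumptions.

```python
import random
from collections import defaultdict

# Assumed column order, matching the tab-delimited BLAST fields listed earlier.
COLUMNS = [
    "qseqid", "sacc", "staxid", "stitle", "length", "pident", "qcovs",
    "mismatch", "gapopen", "qstart", "qend", "sstart", "send",
    "evalue", "bitscore",
]


def top_hits(blast_text, rng=None):
    """Pick one max-bitscore hit per query, breaking ties at random."""
    rng = rng or random.Random()
    by_query = defaultdict(list)
    for line in blast_text.splitlines():
        if not line.strip():
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["bitscore"] = float(hit["bitscore"])
        by_query[hit["qseqid"]].append(hit)
    best = {}
    for query_id, hits in by_query.items():
        top_score = max(h["bitscore"] for h in hits)
        tied = [h for h in hits if h["bitscore"] == top_score]
        best[query_id] = rng.choice(tied)  # random pick among tied hits
    return best


def combine_counts(best, counts):
    """Sum read counts per (taxID, title) over the chosen top hits."""
    totals = defaultdict(int)
    for query_id, hit in best.items():
        # Queries missing from the count data are assumed to occur once.
        totals[(hit["staxid"], hit["stitle"])] += counts.get(query_id, 1)
    return dict(totals)
```

Passing a seeded `random.Random` as `rng` makes the tie-breaking reproducible between runs.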
Creates a list of representative sequence IDs for each sample based on a community percent threshold. Use with Test_dataset_FASTQLIST.
-d directory containing files
-s file suffix of duplicate count files
-p threshold percentage (given as a decimal)
Example Usage:
python -d . -s _duplicated.detail.txt -p 0.25
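A minimal sketch of the thresholding idea, under several assumptions: each line of the duplicate count file holds a count, a tab, and a comma-separated list of sequence IDs; the first ID on a line is the representative; the proportion is taken relative to the total count listed in that file; and a sequence is kept when its proportion is at or above the threshold. The function name `representative_ids` is hypothetical.

```python
def representative_ids(detail_text, threshold):
    """Return representative IDs whose share of the total count meets the threshold."""
    records = []
    for line in detail_text.splitlines():
        if not line.strip():
            continue
        count, ids = line.split("\t", 1)
        first_id = ids.split(",")[0].strip()  # assumed representative sequence
        records.append((first_id, int(count)))
    total = sum(count for _, count in records)
    if total == 0:
        return []
    # Keep sequences whose fraction of the sample is at or above the threshold.
    return [seq_id for seq_id, count in records if count / total >= threshold]


if __name__ == "__main__":
    example = "6\tread_a, read_b\n3\tread_c, read_d\n1\tread_e\n"
    print(representative_ids(example, 0.25))
```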
R Markdown file that creates a dendrogram plot of the most abundant taxa found in pond and sludge samples, based on 18S gene-sliced reads mapped to the C. cayetanensis 18S gene region (655-807 bp).