Tutorial¶
Once you have the dependencies installed, it’s time to actually annotate something! This guide will take you through a short example on some test data.
Data¶
First let’s download some test data. We’ll start small and use a Schizosaccharomyces pombe transcriptome. Make a working directory and move there, and then download the file:
mkdir dammit_test
cd dammit_test
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/FASTA/cdna_nointrons_utrs.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/FASTA/pep.fa.gz
Decompress the file with gunzip:
gunzip cdna_nointrons_utrs.fa.gz pep.fa.gz
Databases¶
If you’re just starting, you probably haven’t downloaded the databases yet. Here we’ll install the main databases, as well as the eukaryota BUSCO database for our yeast dataset. This could take a while, so consider walking away and getting yourself a cup of coffee. If you installed dammit into a virtual environment, be sure to activate it first:
dammit databases --install --busco-group eukaryota
Alternatively, if you happen to have downloaded many of these databases before, you can follow the directions in the databases guide.
While the initial download takes a while, once its done, you won’t need to do it again –
dammit keeps track of the database state and won’t repeat work its already completed,
even if you accidentally rerun with the --install
flag.
Annotation¶
Now we’ll do a simple run of the annotator. We’ll use pep.fa as a user database;
this is a toy example, seeing as these proteins came from the same set of
transcripts as we’re annotating, but they illustrate the usage nicely enough.
We’ll also specify a non-default BUSCO group. You can replace the argument to
--n_threads
with however many cores are available on your system in order to
speed it up.:
dammit annotate cdna_nointrons_utrs.fa --user-databases pep.fa --busco-group eukaryota --n_threads 1
This will take a bit, so go get another cup of coffee…