Usage

If you’re looking for a quick start, head over to the tutorial. This page has more complete usage information and a more detailed breakdown of the functionality.

Dependencies

dammit has three components, each run as a subcommand. The first, dependencies, checks whether the required dependencies are installed correctly and warns you if any are not. It is run with:

dammit dependencies

There isn’t much to this command; either you have the dependencies or you don’t. If you don’t, there are instructions for getting them on the installation page.

Databases

The next component is the databases subcommand. This handles all of dammit’s external data; the documentation can be found here.

Annotation

The annotate command runs the BUSCO assessment, assembly stats, and homology searches, aggregates the results, and outputs a GFF3 file and an annotation report. It takes the --full, --database-dir, and --busco-group options in the same manner as the databases command. Additionally, it accepts an optional output directory, the number of threads to use with threaded subprograms like HMMER, and a list of user-supplied protein databases in FASTA format. A simple invocation with the default databases would look like:

dammit annotate <transcriptome.fasta>

A more complex invocation might look like:

dammit annotate <transcriptome.fasta> --database-dir /path/to/dbs --busco-group vertebrata --n_threads 4 --user-databases whale.pep.fasta dolphin.pep.fasta
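
For scripted runs, the same invocation can be assembled from variables. The following is a minimal sketch, not part of dammit itself: the function name build_annotate_cmd is hypothetical, and it uses only the flags shown in the example above. It prints the command instead of running it, so you can inspect the result first. Paths containing spaces are not quoted in the printed output.

```shell
# Hypothetical helper: print a dammit annotate command built from arguments.
# Usage: build_annotate_cmd <transcriptome> <db_dir> <busco_group> <threads> [user_db ...]
build_annotate_cmd() {
    transcriptome=$1
    db_dir=$2
    busco_group=$3
    threads=$4
    shift 4

    cmd="dammit annotate $transcriptome --database-dir $db_dir --busco-group $busco_group --n_threads $threads"

    # Any remaining arguments are user-supplied protein databases (FASTA).
    if [ "$#" -gt 0 ]; then
        cmd="$cmd --user-databases $*"
    fi

    printf '%s\n' "$cmd"
}

# Reproduces the complex invocation shown above.
build_annotate_cmd transcriptome.fasta /path/to/dbs vertebrata 4 whale.pep.fasta dolphin.pep.fasta
```

Once the printed command looks right, it can be pasted into a shell or run from the script directly.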

User databases will be searched with Conditional Reciprocal Best BLAST (CRBB); this runs blastx, so if you supply ridiculously huge databases, it will take a long time. Future versions will use LAST for all searches to improve performance, but for now, we’re stuck with the NCBI’s dinosaur. Also note that the information from the deflines in your databases will be used to construct the GFF3 file, so if your databases lack useful IDs, your annotations will too.
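
Because the deflines end up in your annotations, it can be worth looking at them before a long run. The following is a rough sketch using standard Unix tools; the function name show_deflines is hypothetical and is not a dammit command. It reports how many sequences a FASTA file contains and prints the first few deflines, so you can judge whether the IDs are informative.

```shell
# Hypothetical helper: summarize the deflines of a protein FASTA file.
# Usage: show_deflines <file.fasta> [number_of_deflines_to_show]
show_deflines() {
    fasta=$1
    n=${2:-5}

    # Every FASTA record starts with a '>' defline, so counting those
    # lines gives the number of sequences.
    count=$(grep -c '^>' "$fasta")
    echo "$fasta: $count sequences"

    # Print the first n deflines with the leading '>' removed.
    grep '^>' "$fasta" | head -n "$n" | sed 's/^>//'
}

# Example (file name taken from the invocation above):
#   show_deflines whale.pep.fasta 3
```

If the deflines turn out to be bare numeric identifiers or otherwise uninformative, consider renaming the records before using the file as a user database.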