dammit.tasks package

Submodules

dammit.tasks.busco module

class dammit.tasks.busco.BuscoTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(input_filename, output_name, busco_db_dir, input_type='tran', n_threads=1, config_file=None, params=None)[source]

Get a task to run BUSCO on the given FASTA file.

Parameters:
  • input_filename (str) – The FASTA file to run BUSCO on.
  • output_name (str) – Base name for the BUSCO output directory.
  • busco_db_dir (str) – Directory with the BUSCO databases.
  • input_type (str) – By default, tran for transcriptome.
  • n_threads (int) – Number of threads to use.
  • params (list) – Extra parameters to pass to the executable.
Returns:

A doit task.

Return type:

dict

dammit.tasks.busco.busco_to_df(fn_list, dbs=['metazoa', 'vertebrata'])[source]

Given a list of BUSCO results from different databases, produce an appropriately multi-indexed DataFrame of the results.

Parameters:
  • fn_list (list) – The BUSCO summary files.
  • dbs (list) – The BUSCO databases used for these runs.
Returns:

The BUSCO results.

Return type:

DataFrame

dammit.tasks.busco.parse_busco_full(fn)[source]

Parses a BUSCO full result table into a Pandas DataFrame.

Parameters: fn (str) – The results file.
Returns: The results DataFrame.
Return type: DataFrame

dammit.tasks.busco.parse_busco_multiple(fn_list, dbs=['metazoa', 'vertebrata'])[source]

Parses multiple BUSCO results summaries into an appropriately indexed DataFrame.

Parameters:
  • fn_list (list) – List of paths to results files.
  • dbs (list) – List of BUSCO database names.
Returns:

The formatted DataFrame.

Return type:

DataFrame

dammit.tasks.busco.parse_busco_summary(fn)[source]

Parses a BUSCO summary file into a JSON compatible dictionary.

Parameters: fn (str) – The summary results file.
Returns: The BUSCO results.
Return type: dict
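
As an illustration of what "JSON compatible dictionary" can mean here, the following is a minimal sketch that parses a one-line BUSCO summary in the C/S/D/F/M/n notation. The exact file layout differs between BUSCO versions, so the regular expression, the function name parse_summary_line, and the returned keys are assumptions for this example, not the behavior of parse_busco_summary.

```python
import re

# Assumed one-line notation, e.g. "C:85.0%[S:80.0%,D:5.0%],F:5.0%,M:10.0%,n:978".
# Real summary files vary between BUSCO versions.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary_line(line):
    """Return a JSON-compatible dict of BUSCO values, or None if no match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # Percentages become floats; the total number of BUSCOs stays an int.
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(match.group("n"))
    return result
```

The result contains only floats and an int, so it can be passed to json.dumps without a custom encoder.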

dammit.tasks.fastx module

dammit.tasks.fastx.get_rename_transcriptome_task(transcriptome_fn, output_fn, names_fn, transcript_basename, split_regex=None)[source]

Create a doit task to copy a FASTA file and rename the headers.

Parameters:
  • transcriptome_fn (str) – The FASTA file.
  • output_fn (str) – Destination to copy to.
  • names_fn (str) – Destination to store the mapping from old to new names.
  • transcript_basename (str) – String to construct new names from.
  • split_regex (regex) – Regex to split the input names with; must contain a name field.
Returns:

A doit task.

Return type:

dict
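
The renaming step can be sketched in plain Python as follows. This is not the function's implementation: the helper name rename_fasta_headers, the "<basename>_<i>" naming scheme, and the in-memory list interface are assumptions made for illustration. The only detail taken from the description above is that split_regex must contain a named group called name.

```python
import re


def rename_fasta_headers(lines, basename, split_regex=None):
    """Rename FASTA headers to '<basename>_<i>'.

    Returns (new_lines, name_map), where name_map maps old names to new ones.
    The naming scheme is an assumption made for this sketch.
    """
    pattern = re.compile(split_regex) if split_regex else None
    new_lines, name_map = [], {}
    counter = 0
    for line in lines:
        if line.startswith(">"):
            old = line[1:].strip()
            if pattern is not None:
                match = pattern.search(old)
                if match is None:
                    raise ValueError("split_regex did not match: " + old)
                # The regex must define a group called 'name'.
                old = match.group("name")
            new = "{0}_{1}".format(basename, counter)
            counter += 1
            name_map[old] = new
            new_lines.append(">" + new)
        else:
            new_lines.append(line.rstrip("\n"))
    return new_lines, name_map
```

For example, passing r"(?P<name>\S+)" as split_regex keeps only the first whitespace-delimited token of each header as the old name.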

dammit.tasks.fastx.get_transcriptome_stats_task(transcriptome, output_fn)[source]

Create a doit task to run basic metrics on a transcriptome.

Parameters:
  • transcriptome (str) – The input FASTA file.
  • output_fn (str) – File to store the results.
Returns:

A doit task.

Return type:

dict
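
The documentation does not list which metrics are computed. As a rough sketch of what "basic metrics" on a transcriptome typically look like, the hypothetical helper below computes sequence count, total bases, N50, and GC fraction from a list of sequences. The function name and the chosen metrics are assumptions.

```python
def transcriptome_stats(sequences):
    """Compute a few common assembly metrics from a list of sequence strings."""
    lengths = sorted((len(seq) for seq in sequences), reverse=True)
    total = sum(lengths)

    # N50: length of the sequence at which the running sum of lengths,
    # taken from longest to shortest, first reaches half of the total.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return {
        "n_sequences": len(lengths),
        "total_bases": total,
        "n50": n50,
        "gc_fraction": gc / total if total else 0.0,
    }
```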

dammit.tasks.fastx.strip_seq_extension(fn)[source]

dammit.tasks.gff module

dammit.tasks.gff.get_cmscan_gff3_task(input_filename, output_filename, database)[source]

Given raw input from Infernal’s cmscan, convert it to GFF3 and save the results.

Parameters:
  • input_filename (str) – The input CSV.
  • output_filename (str) – Destination for GFF3 output.
  • database (str) – Tag to use in the GFF3 Dbxref field.
Returns:

A doit task.

Return type:

dict

dammit.tasks.gff.get_gff3_merge_task(gff3_filenames, output_filename)[source]

Given a list of GFF3 files, merge them all together.

Parameters:
  • gff3_filenames (list) – Paths to the GFF3 files.
  • output_filename (str) – Path to write the merged results to.
Returns:

A doit task.

Return type:

dict
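
One plausible way to merge GFF3 files is to concatenate their feature lines under a single version pragma. The sketch below works on strings rather than files, and the helper name merge_gff3 is hypothetical. Whether the real task sorts or deduplicates features is not stated here, so this sketch does neither.

```python
def merge_gff3(gff3_texts):
    """Merge GFF3 documents given as strings, keeping one version pragma."""
    merged = ["##gff-version 3"]
    for text in gff3_texts:
        for line in text.splitlines():
            # Drop blank lines and the per-file version pragma.
            if not line.strip() or line.startswith("##gff-version"):
                continue
            merged.append(line)
    return "\n".join(merged) + "\n"
```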

dammit.tasks.gff.get_hmmscan_gff3_task(input_filename, output_filename, database)[source]

Given HMMER output converted to CSV, convert it to GFF3 and save the results. The CSV is generated from the DataFrame(s) returned by the HMMerParser.

Parameters:
  • input_filename (str) – The input CSV.
  • output_filename (str) – Destination for GFF3 output.
  • database (str) – Tag to use in the GFF3 Dbxref field.
Returns:

A doit task.

Return type:

dict

dammit.tasks.gff.get_maf_best_hits_task(maf_fn, output_fn)[source]

Doit task to get the best hits from a lastal MAF file.

Parameters:
  • maf_fn (str) – Path to the MAF file.
  • output_fn (str) – Path to store resulting CSV file.
Returns:

A doit task.

Return type:

dict
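
The selection criterion for "best" is not documented here. A minimal sketch, assuming each alignment is a dict with hypothetical q_name and E (e-value) fields and that the best hit is the one with the lowest e-value per query:

```python
def best_hits(hits):
    """Keep the lowest e-value hit per query; on ties, keep the first seen.

    The field names 'q_name' and 'E' are assumptions for this sketch.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["q_name"])
        if current is None or hit["E"] < current["E"]:
            best[hit["q_name"]] = hit
    return list(best.values())
```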

dammit.tasks.gff.get_maf_gff3_task(input_filename, output_filename, database)[source]

Given either a raw MAF file or a CSV file with the proper MAF columns, convert it to GFF3 and save the results.

Parameters:
  • input_filename (str) – The input MAF or CSV.
  • output_filename (str) – Destination for GFF3 output.
  • database (str) – Tag to use in the GFF3 Dbxref field.
Returns:

A doit task.

Return type:

dict

dammit.tasks.gff.get_shmlast_gff3_task(input_filename, output_filename, database)[source]

Given the CSV output from shmlast, convert it to GFF3 and save the results.

Parameters:
  • input_filename (str) – The input CSV.
  • output_filename (str) – Destination for GFF3 output.
  • database (str) – Tag to use in the GFF3 Dbxref field.
Returns:

A doit task.

Return type:

dict

dammit.tasks.hmmer module

class dammit.tasks.hmmer.HMMPressTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(db_filename, params=None, task_dep=None)[source]

Run hmmpress on a profile HMM database.

Parameters:
  • db_filename (str) – The database to run on.
  • params (list) – Extra parameters to pass to executable.
  • task_dep (str) – Task dep to add to doit task.
Returns:

A doit task.

Return type:

dict

class dammit.tasks.hmmer.HMMScanTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(input_filename, output_filename, db_filename, cutoff=1e-05, n_threads=1, sshloginfile=None, params=None)[source]

Run HMMER’s hmmscan with the given database on the given FASTA file.

Parameters:
  • input_filename (str) – The path to the input FASTA.
  • output_filename (str) – Path to save the results.
  • db_filename (str) – Path to the formatted database.
  • cutoff (float) – The e-value cutoff to filter with.
  • n_threads (int) – Number of threads to use.
  • sshloginfile (str) – If given, passed to gnu-parallel to run on a cluster.
  • params (list) – Extra parameters to pass to executable.
Returns:

A doit task.

Return type:

dict

dammit.tasks.hmmer.get_remap_hmmer_task(hmmer_filename, remap_gff_filename, output_filename, transcript_basename='Transcript')[source]

Given an hmmscan result from the ORFs generated by TransDecoder.LongOrfs and TransDecoder’s GFF3, remap the HMMER results so that they refer to the original nucleotide coordinates rather than the translated ORF coordinates. Produces a CSV file with columns matching those in HMMerParser.

Parameters:
  • hmmer_filename (str) – Path to the hmmscan results.
  • remap_gff_filename (str) – The GFF3 produced by TransDecoder.LongOrfs.
  • output_filename (str) – Path to store remapped results.
Returns:

A doit task.

Return type:

dict
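
The coordinate arithmetic behind such a remapping can be sketched as below. It assumes 1-based inclusive coordinates, an ORF described by its start, end, and strand on the transcript, and three nucleotides per amino acid. The function name and argument layout are hypothetical, and the actual task may handle details such as partial ORFs differently.

```python
def remap_to_nucleotide(orf_start, orf_end, strand, aa_start, aa_end):
    """Map a 1-based inclusive amino acid interval on an ORF to the
    1-based inclusive nucleotide interval on the parent transcript."""
    if strand == "+":
        # Amino acid p covers nucleotides orf_start + 3(p-1) .. orf_start + 3p - 1.
        return orf_start + 3 * (aa_start - 1), orf_start + 3 * aa_end - 1
    if strand == "-":
        # On the minus strand translation starts at orf_end and runs backwards.
        return orf_end - 3 * aa_end + 1, orf_end - 3 * (aa_start - 1)
    raise ValueError("strand must be '+' or '-'")
```

For an ORF spanning nucleotides 101 to 400 on the plus strand, amino acids 11 to 20 map to nucleotides 131 to 160.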

dammit.tasks.infernal module

class dammit.tasks.infernal.CMPressTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(db_filename, params=None, task_dep=None)[source]

Run Infernal’s cmpress on a covariance model database.

Parameters:
  • db_filename (str) – Path to the covariance model database.
  • params (list) – Extra parameters to pass to the executable.
  • task_dep (str) – Task dep to give doit task.
Returns:

A doit task.

Return type:

dict

class dammit.tasks.infernal.CMScanTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(input_filename, output_filename, db_filename, cutoff=1e-05, n_threads=1, sshloginfile=None, params=None)[source]

Run Infernal’s cmscan on the given FASTA and covariance model database.

Parameters:
  • input_filename (str) – Path to the input FASTA.
  • output_filename (str) – Path to store results.
  • db_filename (str) – Path to formatted covariance model database.
  • cutoff (float) – e-value cutoff to filter by.
  • n_threads (int) – Number of threads to run with via gnu-parallel.
  • sshloginfile (str) – If given, passed to gnu-parallel for running on a cluster.
  • params (list) – Extra parameters to pass to executable.
Returns:

A doit task.

Return type:

dict

dammit.tasks.report module

dammit.tasks.report.generate_sequence_name(original_name, sequence, annotation_df)[source]
dammit.tasks.report.generate_sequence_summary(original_name, sequence, annotation_df)[source]

Given a FASTA sequence’s original name, the sequence itself, and a DataFrame with its corresponding GFF3 annotations, generate a summary line of the annotations in key=value format.

Parameters:
  • original_name (str) – Original name of the sequence.
  • sequence (str) – The sequence itself.
  • annotation_df (DataFrame) – DataFrame with GFF3 format annotations.
Returns:

The new summary header.

Return type:

str
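
To make the key=value format concrete, here is a small sketch that builds such a line from a list of annotation dicts instead of a DataFrame. The field names feature_type and Name, the len key, and the helper name summary_line are illustrative assumptions. The real header layout is not specified in this reference.

```python
def summary_line(original_name, sequence, annotations):
    """Build '<name> len=<n> <feature_type>=<Name> ...' from annotation dicts.

    'feature_type' and 'Name' are assumed field names for this sketch.
    """
    fields = ["len={0}".format(len(sequence))]
    for annotation in annotations:
        fields.append("{0}={1}".format(annotation["feature_type"],
                                       annotation["Name"]))
    return "{0} {1}".format(original_name, " ".join(fields))
```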

dammit.tasks.report.get_annotate_fasta_task(transcriptome_fn, gff3_fn, output_fn)[source]

Annotate the headers in a FASTA file using its corresponding GFF3 file.

Parameters:
  • transcriptome_fn (str) – Path to the FASTA file.
  • gff3_fn (str) – Path to the GFF3 annotations.
  • output_fn (str) – Path to store the resulting annotated FASTA.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell module

dammit.tasks.shell.check_hash(target_fn, expected)[source]
dammit.tasks.shell.get_cat_task(file_list, target_fn)[source]

Create a doit task to cat together the given files and pipe the result to the given target.

Parameters:
  • file_list (list) – The files to cat.
  • target_fn (str) – The target file.
Returns:

A doit task.

Return type:

dict
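
Every function in this package returns a doit task, which is a plain dict. As a sketch of what such a dict might look like for the cat case, assuming a shell command action and a hypothetical task name prefix (the keys name, actions, file_dep, targets, and clean are standard doit task keys, but the values chosen here are assumptions):

```python
import shlex


def cat_task_sketch(file_list, target_fn):
    """Return a doit-style task dict that concatenates files into target_fn."""
    sources = " ".join(shlex.quote(fn) for fn in file_list)
    cmd = "cat {0} > {1}".format(sources, shlex.quote(target_fn))
    return {
        "name": "cat:" + target_fn,   # hypothetical naming scheme
        "actions": [cmd],             # a string action is run in a shell
        "file_dep": list(file_list),  # rerun when any input changes
        "targets": [target_fn],
        "clean": True,                # let doit remove the targets on clean
    }
```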

dammit.tasks.shell.get_download_and_gunzip_task(url, target_fn)[source]

Create a doit task which downloads and gunzips a file.

Parameters:
  • url (str) – URL to download.
  • target_fn (str) – Target file for the download.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell.get_download_and_untar_task(url, target_dir, label=None)[source]

Create a doit task to download a file and untar it in the given directory.

Parameters:
  • url (str) – URL to download.
  • target_dir (str) – Directory to put the untarred folder in.
  • label (str) – Optional label to resolve doit name conflicts when putting multiple results in the same folder.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell.get_download_task(url, target_fn, md5=None, metalink=None)[source]

Creates a doit task to download the given URL.

Parameters:
  • url (str) – URL to download.
  • target_fn (str) – Target for the download.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell.get_gunzip_task(archive_fn, target_fn)[source]

Create a doit task to gunzip a gzip archive.

Parameters:
  • archive_fn (str) – The gzip file.
  • target_fn (str) – Output filename.
Returns:

A doit task.

Return type:

dict
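
A gunzip task can also be expressed with a Python action instead of a shell command. The sketch below is one possible shape, not the package's implementation. The helper names and the task name prefix are hypothetical, and the action uses doit's (callable, args) tuple form.

```python
import gzip
import shutil


def gunzip(archive_fn, target_fn):
    """Decompress a gzip archive to target_fn, streaming in chunks."""
    with gzip.open(archive_fn, "rb") as src, open(target_fn, "wb") as dst:
        shutil.copyfileobj(src, dst)


def gunzip_task_sketch(archive_fn, target_fn):
    """Return a doit-style task dict wrapping the gunzip helper."""
    return {
        "name": "gunzip:" + target_fn,  # hypothetical naming scheme
        "actions": [(gunzip, [archive_fn, target_fn])],
        "file_dep": [archive_fn],
        "targets": [target_fn],
        "clean": True,
    }
```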

Soft-link file to the current directory, or to the destination target if given.

Parameters:
  • src (str) – The file to link.
  • dst (str) – The destination; by default, the current directory.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell.get_untargz_task(archive_fn, target_dir, label=None)[source]

Create a doit task to untar and gunzip a *.tar.gz archive.

Parameters:
  • archive_fn (str) – The .tar.gz file.
  • target_dir (str) – The folder to untar into.
  • label (str) – Optional label to resolve doit task name conflicts.
Returns:

A doit task.

Return type:

dict

dammit.tasks.shell.hashfile(path, hasher=None, blocksize=65536)[source]

A function to hash files.

See: http://stackoverflow.com/questions/3431825
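
The signature suggests block-wise hashing, which avoids reading a large download into memory at once. A minimal sketch with hashlib follows. The MD5 default and the hex-digest return value are assumptions (the documentation states neither), and the _sketch names are hypothetical.

```python
import hashlib


def hashfile_sketch(path, hasher=None, blocksize=65536):
    """Hash a file in fixed-size blocks and return the hex digest."""
    if hasher is None:
        hasher = hashlib.md5()  # assumed default, not confirmed by the docs
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for block in iter(lambda: handle.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


def check_hash_sketch(target_fn, expected):
    """Return True if the file's digest equals the expected hex string."""
    return hashfile_sketch(target_fn) == expected
```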

dammit.tasks.transdecoder module

class dammit.tasks.transdecoder.TransDecoderLongOrfsTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(input_filename, params=None)[source]

Get a task to run TransDecoder.LongOrfs.

Parameters:
  • input_filename (str) – FASTA file to analyze.
  • params (list) – Extra parameters to pass to the executable.
Returns:

A doit task.

Return type:

dict

class dammit.tasks.transdecoder.TransDecoderPredictTask(logger=None)[source]

Bases: dammit.tasks.utils.DependentTask

deps()[source]
task(input_filename, pfam_filename=None, params=None)[source]

Get a task to run TransDecoder.Predict.

Parameters:
  • input_filename (str) – The FASTA file to analyze.
  • pfam_filename (str) – If HMMER has been run against Pfam, pass this file name to --retain_pfam_hits.
  • params (list) – Extra parameters to pass to the executable.
Returns:

A doit task.

Return type:

dict

dammit.tasks.utils module

class dammit.tasks.utils.DependentTask(logger=None)[source]

Bases: object

deps()[source]
task(*args, **kwargs)[source]
exception dammit.tasks.utils.InstallationError[source]

Bases: RuntimeError

dammit.tasks.utils.clean_folder(target)[source]

Function for doit task’s clean parameter to remove a folder.

Parameters: target (str) – The folder to remove.

dammit.tasks.utils.get_group_task(group_name, tasks)[source]

Create a task group from the given tasks.

Parameters:
  • group_name (str) – The name to give the group.
  • tasks (list) – List of Task objects to add to group.
Returns:

A doit task for the group.

Return type:

dict
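
In doit, a group is conventionally a task with no actions whose task_dep lists its members. The sketch below shows that shape using task dicts with a name key. The real function is documented as taking Task objects, so the dict-based interface and the _sketch name are assumptions.

```python
def group_task_sketch(group_name, tasks):
    """Return a doit-style group task that depends on the given tasks."""
    return {
        "name": group_name,
        "actions": None,  # a group runs nothing itself
        "task_dep": [task["name"] for task in tasks],
    }
```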

Module contents