dammit.tasks package
Submodules
dammit.tasks.busco module
class dammit.tasks.busco.BuscoTask(logger=None)
Bases: dammit.tasks.utils.DependentTask
task(input_filename, output_name, busco_db_dir, input_type='tran', n_threads=1, params=None)
Get a task to run BUSCO on the given FASTA file.
Parameters: - input_filename (str) – The FASTA file to run BUSCO on.
- output_name (str) – Base name for the BUSCO output directory.
- busco_db_dir (str) – Directory with the BUSCO databases.
- input_type (str) – The BUSCO input type; by default, tran for transcriptome.
- n_threads (int) – Number of threads to use.
- params (list) – Extra parameters to pass to the executable.
Returns: A doit task.
Return type: dict
dammit.tasks.busco.busco_to_df(fn_list, dbs=['metazoa', 'vertebrata'])
Given a list of BUSCO results from different databases, produce an appropriately multi-indexed DataFrame of the results.
Parameters: - fn_list (list) – The BUSCO summary files.
- dbs (list) – The BUSCO databases used for these runs.
Returns: The BUSCO results.
Return type: DataFrame
dammit.tasks.busco.parse_busco_full(fn)
Parses a BUSCO full result table into a pandas DataFrame.
Parameters: - fn (str) – The results file.
Returns: The results DataFrame.
Return type: DataFrame
dammit.tasks.busco.parse_busco_multiple(fn_list, dbs=['metazoa', 'vertebrata'])
Parses multiple BUSCO results summaries into an appropriately indexed DataFrame.
Parameters: - fn_list (list) – List of paths to results files.
- dbs (list) – List of BUSCO database names.
Returns: The formatted DataFrame.
Return type: DataFrame
dammit.tasks.fastx module
dammit.tasks.fastx.get_rename_transcriptome_task(transcriptome_fn, output_fn, names_fn, transcript_basename, split_regex=None)
Create a doit task to copy a FASTA file and rename the headers.
Parameters: - transcriptome_fn (str) – The FASTA file.
- output_fn (str) – Destination to copy to.
- names_fn (str) – Destination to store the mapping from old to new names.
- transcript_basename (str) – String to construct new names from.
- split_regex (regex) – Regex to split the input names with; must contain a name field.
Returns: A doit task.
Return type: dict
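The renaming step can be sketched in plain Python. This is a minimal illustration and not dammit's implementation: the helper name rename_headers, the "basename_N" numbering scheme, and the reading of "name field" as a regex group called name are all assumptions made for the example.

```python
import re


def rename_headers(headers, transcript_basename, split_regex=None):
    """Map each original FASTA header to a new, numbered name.

    Hypothetical helper for illustration only; not part of dammit.
    """
    pattern = re.compile(split_regex) if split_regex else None
    mapping = {}
    for index, header in enumerate(headers):
        old_name = header
        if pattern is not None:
            match = pattern.search(header)
            if match is None:
                raise ValueError("header does not match split_regex: %r" % header)
            # Assumption: the regex exposes the original name as a group called "name".
            old_name = match.group("name")
        # Assumption: new names are the basename plus a running index.
        mapping[old_name] = "{}_{}".format(transcript_basename, index)
    return mapping


headers = ["TRINITY_DN1_c0_g1_i1 len=300", "TRINITY_DN2_c0_g1_i1 len=450"]
mapping = rename_headers(headers, "Transcript", split_regex=r"(?P<name>\S+)")
```

The resulting mapping is the kind of old-to-new table that names_fn is documented to store.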
dammit.tasks.gff module
dammit.tasks.gff.get_cmscan_gff3_task(input_filename, output_filename, database)
Given raw input from Infernal’s cmscan, convert it to GFF3 and save the results.
Parameters: - input_filename (str) – The input CSV.
- output_filename (str) – Destination for GFF3 output.
- database (str) – Tag to use in the GFF3 Dbxref field.
Returns: A doit task.
Return type: dict
dammit.tasks.gff.get_gff3_merge_task(gff3_filenames, output_filename)
Given a list of GFF3 files, merge them all together.
Parameters: - gff3_filenames (list) – Paths to the GFF3 files.
- output_filename (str) – Path to pipe the results.
Returns: A doit task.
Return type: dict
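One plausible way to merge GFF3 content is to concatenate the records and keep a single version pragma. The sketch below works on in-memory strings; the function name merge_gff3 and the sample records are hypothetical, and the task itself may merge files differently.

```python
def merge_gff3(documents):
    """Merge GFF3 texts into one, emitting the version pragma once.

    Hypothetical helper for illustration only; not part of dammit.
    """
    merged = ["##gff-version 3"]
    for text in documents:
        for line in text.splitlines():
            # Skip blank lines and repeated version pragmas.
            if not line.strip() or line.startswith("##gff-version"):
                continue
            merged.append(line)
    return "\n".join(merged) + "\n"


# Illustrative records only; the source and type columns are made up.
first = "##gff-version 3\nTranscript_0\tHMMER\tprotein_hmm_match\t10\t90\t1e-10\t+\t.\tID=h1\n"
second = "##gff-version 3\nTranscript_1\tLAST\ttranslated_nucleotide_match\t5\t50\t1e-8\t+\t.\tID=h2\n"
merged = merge_gff3([first, second])
```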
dammit.tasks.gff.get_hmmscan_gff3_task(input_filename, output_filename, database)
Given HMMER output converted to CSV, convert it to GFF3 and save the results. The CSV is generated from the DataFrame(s) returned by the HMMerParser.
Parameters: - input_filename (str) – The input CSV.
- output_filename (str) – Destination for GFF3 output.
- database (str) – Tag to use in the GFF3 Dbxref field.
Returns: A doit task.
Return type: dict
dammit.tasks.gff.get_maf_best_hits_task(maf_fn, output_fn)
Doit task to get the best hits from a lastal MAF file.
Parameters: - maf_fn (str) – Path to the MAF file.
- output_fn (str) – Path to store resulting CSV file.
Returns: A doit task.
Return type: dict
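As a rough illustration of what "best hits" can mean, the sketch below keeps one alignment per query. It assumes that "best" means the lowest e-value and that each hit is a plain dict with query, subject, and evalue keys; the criterion dammit actually applies is not stated here and may differ (for example, ranking by score).

```python
def best_hits(hits):
    """Return the lowest e-value hit for each query.

    Hypothetical helper for illustration only; not part of dammit.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["query"])
        # Assumption: a smaller e-value means a better hit.
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["query"]] = hit
    return best


hits = [
    {"query": "Transcript_0", "subject": "P1", "evalue": 1e-20},
    {"query": "Transcript_0", "subject": "P2", "evalue": 1e-35},
    {"query": "Transcript_1", "subject": "P3", "evalue": 1e-8},
]
best = best_hits(hits)
```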
dammit.tasks.gff.get_maf_gff3_task(input_filename, output_filename, database)
Given either a raw MAF file or a CSV file with the proper MAF columns, convert it to GFF3 and save the results.
Parameters: - input_filename (str) – The input MAF or CSV.
- output_filename (str) – Destination for GFF3 output.
- database (str) – Tag to use in the GFF3 Dbxref field.
Returns: A doit task.
Return type: dict
dammit.tasks.gff.get_shmlast_gff3_task(input_filename, output_filename, database)
Given the CSV output from shmlast, convert it to GFF3 and save the results.
Parameters: - input_filename (str) – The input CSV.
- output_filename (str) – Destination for GFF3 output.
- database (str) – Tag to use in the GFF3 Dbxref field.
Returns: A doit task.
Return type: dict
dammit.tasks.hmmer module
class dammit.tasks.hmmer.HMMScanTask(logger=None)
Bases: dammit.tasks.utils.DependentTask
task(input_filename, output_filename, db_filename, cutoff=1e-05, n_threads=1, sshloginfile=None, params=None)
Run HMMER’s hmmscan with the given database on the given FASTA file.
Parameters: - input_filename (str) – The path to the input FASTA.
- output_filename (str) – Path to save the results.
- db_filename (str) – Path to the formatted database.
- cutoff (float) – The e-value cutoff to filter with.
- n_threads (int) – Number of threads to use.
- sshloginfile (str) – If given, an SSH login file passed to gnu-parallel to run on a cluster.
- params (list) – Extra parameters to pass to executable.
Returns: A doit task.
Return type: dict
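The effect of the cutoff parameter can be pictured as a simple filter on e-values. The sketch below is only illustrative: the function name, the dict-based hit records, and the inclusive comparison are assumptions, and whether the cutoff is applied by hmmscan itself or in a later step is not stated here.

```python
def filter_by_evalue(hits, cutoff=1e-05):
    """Keep hits whose e-value does not exceed the cutoff.

    Hypothetical helper for illustration only; not part of dammit.
    """
    # Assumption: the comparison is inclusive (evalue <= cutoff).
    return [hit for hit in hits if hit["evalue"] <= cutoff]


hits = [
    {"target": "PF00001", "evalue": 1e-12},
    {"target": "PF00002", "evalue": 0.5},
    {"target": "PF00003", "evalue": 1e-05},
]
kept = filter_by_evalue(hits)
```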
dammit.tasks.hmmer.get_remap_hmmer_task(hmmer_filename, remap_gff_filename, output_filename, transcript_basename='Transcript')
Given an hmmscan result from the ORFs generated by TransDecoder.LongOrfs and TransDecoder’s GFF3, remap the HMMER results so that they refer to the original nucleotide coordinates rather than the translated ORF coordinates. Produces a CSV file with columns matching those in HMMerParser.
Parameters: - hmmer_filename (str) – Path to the hmmscan results.
- remap_gff_filename (str) – The GFF3 produced by TransDecoder.LongOrfs.
- output_filename (str) – Path to store remapped results.
Returns: A doit task.
Return type: dict
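The arithmetic behind remapping from translated ORF coordinates to nucleotide coordinates can be sketched as follows. This covers only the simplest case, a forward-strand ORF with 1-based inclusive coordinates; the function and argument names are hypothetical, and the actual task may also handle reverse-strand ORFs and other details not shown.

```python
def orf_to_transcript_coords(orf_start, aa_start, aa_end):
    """Convert an amino-acid range within an ORF to transcript coordinates.

    All coordinates are 1-based and inclusive. Assumes a forward-strand ORF
    whose first codon begins at nucleotide orf_start on the transcript.
    Hypothetical helper for illustration only; not part of dammit.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    # Residue k occupies nucleotides orf_start + 3*(k-1) .. orf_start + 3*k - 1.
    nt_start = orf_start + 3 * (aa_start - 1)
    nt_end = orf_start + 3 * aa_end - 1
    return nt_start, nt_end


remapped = orf_to_transcript_coords(orf_start=101, aa_start=5, aa_end=20)
```

Here a hit spanning residues 5 to 20 of an ORF that starts at nucleotide 101 maps to nucleotides 113 to 160, a span of 48 bases for 16 residues.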
dammit.tasks.infernal module
class dammit.tasks.infernal.CMPressTask(logger=None)
Bases: dammit.tasks.utils.DependentTask
task(db_filename, params=None, task_dep=None)
Run Infernal’s cmpress on a covariance model database.
Parameters: - db_filename (str) – Path to the covariance model database.
- params (list) – Extra parameters to pass to the executable.
- task_dep (str) – Task dep to give doit task.
Returns: A doit task.
Return type: dict
class dammit.tasks.infernal.CMScanTask(logger=None)
Bases: dammit.tasks.utils.DependentTask
task(input_filename, output_filename, db_filename, cutoff=1e-05, n_threads=1, sshloginfile=None, params=None)
Run Infernal’s cmscan on the given FASTA and covariance model database.
Parameters: - input_filename (str) – Path to the input FASTA.
- output_filename (str) – Path to store results.
- db_filename (str) – Path to formatted covariance model database.
- cutoff (float) – e-value cutoff to filter by.
- n_threads (int) – Number of threads to run with via gnu-parallel.
- sshloginfile (str) – If given, an SSH login file passed to gnu-parallel for running on a cluster.
- params (list) – Extra parameters to pass to executable.
Returns: A doit task.
Return type: dict
dammit.tasks.report module
dammit.tasks.report.generate_sequence_summary(original_name, sequence, annotation_df)
Given a FASTA sequence’s original name, the sequence itself, and a DataFrame with its corresponding GFF3 annotations, generate a summary line of the annotations in key=value format.
Parameters: - original_name (str) – Original name of the sequence.
- sequence (str) – The sequence itself.
- annotation_df (DataFrame) – DataFrame with GFF3 format annotations.
Returns: The new summary header.
Return type: str
dammit.tasks.report.get_annotate_fasta_task(transcriptome_fn, gff3_fn, output_fn)
Annotate the headers in a FASTA file using its corresponding GFF3 file.
Parameters: - transcriptome_fn (str) – Path to the FASTA file.
- gff3_fn (str) – Path to the GFF3 annotations.
- output_fn (str) – Path to store the resulting annotated FASTA.
Returns: A doit task.
Return type: dict
dammit.tasks.shell module
dammit.tasks.shell.get_cat_task(file_list, target_fn)
Create a doit task to cat together the given files and pipe the result to the given target.
Parameters: - file_list (list) – The files to cat.
- target_fn (str) – The target file.
Returns: A doit task.
Return type: dict
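For readers unfamiliar with doit, a task is a plain dict with keys such as name, actions, file_dep, and targets. The sketch below shows what a cat task of this kind might look like; the function name cat_task and the exact keys and naming are assumptions, and the dict that get_cat_task actually returns is not shown in this reference.

```python
import shlex


def cat_task(file_list, target_fn):
    """Build a doit-style task dict that concatenates files into target_fn.

    Hypothetical sketch for illustration only; not dammit's implementation.
    """
    sources = " ".join(shlex.quote(fn) for fn in file_list)
    cmd = "cat {} > {}".format(sources, shlex.quote(target_fn))
    return {
        "name": "cat:" + target_fn,   # assumed naming scheme
        "actions": [cmd],             # a string action is run through the shell
        "file_dep": list(file_list),  # rerun when any input changes
        "targets": [target_fn],       # the file this task produces
    }


task = cat_task(["a.fa", "b.fa"], "all.fa")
```

Declaring file_dep and targets is what lets doit skip the task when the inputs are unchanged and the target already exists.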
dammit.tasks.shell.get_download_and_gunzip_task(url, target_fn)
Create a doit task which downloads and gunzips a file.
Parameters: - url (str) – URL to download.
- target_fn (str) – Target file for the download.
Returns: doit task.
Return type: dict
dammit.tasks.shell.get_download_and_untar_task(url, target_dir, label=None)
Create a doit task to download a file and untar it in the given directory.
Parameters: - url (str) – URL to download.
- target_dir (str) – Directory to put the untarred folder in.
- label (str) – Optional label to resolve doit name conflicts when putting multiple results in the same folder.
Returns: doit task.
Return type: dict
dammit.tasks.shell.get_download_task(url, target_fn, md5=None, metalink=None)
Creates a doit task to download the given URL.
Parameters: - url (str) – URL to download.
- target_fn (str) – Target for the download.
Returns: doit task.
Return type: dict
dammit.tasks.shell.get_gunzip_task(archive_fn, target_fn)
Create a doit task to gunzip a gzip archive.
Parameters: - archive_fn (str) – The gzip file.
- target_fn (str) – Output filename.
Returns: doit task.
Return type: dict
dammit.tasks.shell.get_link_file_task(src, dst='')
Soft-link file to the current directory, or to the destination target if given.
Parameters: - src (str) – The file to link.
- dst (str) – The destination; by default, the current directory.
Returns: A doit task.
Return type: dict
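The default-destination behaviour can be illustrated with a doit task whose action is a Python callable rather than a shell command. This is a sketch under assumptions: link_file and link_file_task are hypothetical names, and the real task may build its action and target differently.

```python
import os


def link_file(src, dst=""):
    """Create a symlink to src at dst, or in the current directory if dst is empty."""
    link_path = dst or os.path.join(os.getcwd(), os.path.basename(src))
    os.symlink(os.path.abspath(src), link_path)


def link_file_task(src, dst=""):
    """Build a doit-style task dict that soft-links src.

    Hypothetical sketch for illustration only; not dammit's implementation.
    """
    # With no destination, the link takes the source's base name in the cwd.
    target = dst or os.path.basename(src)
    return {
        "name": "link:" + os.path.basename(src),  # assumed naming scheme
        # A doit Python action: (callable, positional arguments).
        "actions": [(link_file, [src, dst])],
        "file_dep": [src],
        "targets": [target],
    }


task = link_file_task("/data/transcripts.fa")
```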
dammit.tasks.shell.get_untargz_task(archive_fn, target_dir, label=None)
Create a doit task to untar and gunzip a *.tar.gz archive.
Parameters: - archive_fn (str) – The .tar.gz file.
- target_dir (str) – The folder to untar into.
- label (str) – Optional label to resolve doit task name conflicts.
Returns: doit task.
Return type: dict
dammit.tasks.transdecoder module
class dammit.tasks.transdecoder.TransDecoderPredictTask(logger=None)
Bases: dammit.tasks.utils.DependentTask
task(input_filename, pfam_filename=None, params=None)
Get a task to run TransDecoder.Predict.
Parameters: - input_filename (str) – The FASTA file to analyze.
- pfam_filename (str) – If HMMER has been run against Pfam, pass this file name to --retain_pfam_hits.
- params (list) – Extra parameters to pass to the executable.
Returns: A doit task.
Return type: dict