API Docs

Submodules

dammit.annotate module

dammit.annotate.build_default_pipeline(handler, config, databases)[source]

Register tasks for the default dammit pipeline.

This includes all of the main tasks, excluding the lastal search against uniref90.

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
Returns:

The handler passed in.

Return type:

handler.TaskHandler

dammit.annotate.build_full_pipeline(handler, config, databases)[source]

Register tasks for the full dammit pipeline (with uniref90).

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
Returns:

The handler passed in.

Return type:

handler.TaskHandler

dammit.annotate.build_quick_pipeline(handler, config, databases)[source]

Register tasks for the quick annotation pipeline.

Leaves out the Pfam search (and so does not pass these hits to TransDecoder.Predict), the Rfam search, and the lastal searches against OrthoDB and uniref90. Best suited for users who have built their own protein databases and would just like to annotate off them.

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
Returns:

The handler passed in.

Return type:

handler.TaskHandler

dammit.annotate.get_handler(config, databases)[source]

Build the TaskHandler for the annotation pipelines. The handler will not have registered tasks when returned.

Parameters:
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
Returns:

A constructed TaskHandler.

Return type:

handler.TaskHandler

dammit.annotate.register_annotate_tasks(handler, config, databases)[source]

Register tasks for aggregating the annotations into one GFF3 file and writing out summary descriptions in a new FASTA file.

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
dammit.annotate.register_busco_task(handler, config, databases)[source]

Register tasks for BUSCO. Note that this expects a proper dammit config dictionary.

dammit.annotate.register_lastal_tasks(handler, config, databases, include_uniref=False)[source]

Register tasks for lastal searches. By default, this will just align the transcriptome against OrthoDB; if requested, it will align against uniref90 as well, which takes considerably longer.

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from a database TaskHandler.
  • include_uniref (bool) – If True, add tasks for searching uniref90.
dammit.annotate.register_rfam_tasks(handler, config, databases)[source]

Registers tasks for Infernal’s cmscan against Rfam. Rfam is an RNA secondary structure database comprising covariance models for many known RNAs. This is a relatively slow step. A proper dammit config dictionary is required.

dammit.annotate.register_stats_task(handler)[source]

Register the tasks for basic transcriptome metrics.

dammit.annotate.register_transdecoder_tasks(handler, config, databases, include_hmmer=True)[source]

Register tasks for TransDecoder. TransDecoder first finds long ORFs with TransDecoder.LongOrfs, which are output as a FASTA file of protein sequences. These sequences are then used to search against Pfam-A for conserved domains, and the coordinates from the resulting matches are mapped back relative to the original transcripts. TransDecoder.Predict then builds the final gene models based on the training data provided by TransDecoder.LongOrfs, optionally using the Pfam-A results to keep ORFs which otherwise don’t fit the model closely enough. Once again, note that a proper dammit config dictionary is required.

dammit.annotate.register_user_db_tasks(handler, config, databases)[source]

Run conditional reciprocal best hits LAST (CRBL) against the user-supplied databases.

dammit.annotate.run_annotation(handler)[source]

Run the annotation pipeline from the given handler.

Prints the appropriate output and exits if the pipeline is already complete.

Parameters:handler (handler.TaskHandler) – Handler with tasks for the pipeline.
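
Taken together, get_handler, a build_*_pipeline function, and run_annotation form the module’s entry point: construct a handler, register a pipeline on it, then run. A minimal sketch of that flow, using a hypothetical stand-in (StubHandler and the task names below are illustrative only, not dammit’s API):

```python
# Hypothetical stand-in for dammit's TaskHandler; the class and the
# registered task names are illustrative, not part of dammit's API.

class StubHandler:
    def __init__(self):
        self.tasks = {}

    def register_task(self, name, task, files=None):
        self.tasks[name] = task

def build_stub_pipeline(handler, config, databases):
    # Register some tasks, then return the handler passed in -- the same
    # contract the build_*_pipeline functions above document.
    handler.register_task('stats', {'name': 'stats'})
    handler.register_task('busco', {'name': 'busco'})
    return handler

handler = build_stub_pipeline(StubHandler(), config={}, databases={})
print(sorted(handler.tasks))  # ['busco', 'stats']
```

Returning the handler is what allows the pipeline builders to be chained or swapped while the caller keeps a single reference.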

dammit.app module

class dammit.app.DammitApp(arg_src=sys.argv[1:])[source]

Bases: object

description()[source]
epilog()[source]
get_parser()[source]

Build the main parser.

handle_annotate()[source]
handle_databases()[source]
handle_migrate()[source]
run()[source]

dammit.databases module

dammit.databases.build_default_pipeline(handler, config, databases, with_uniref=False)[source]

Register tasks for dammit’s builtin database prep pipeline.

Parameters:
  • handler (handler.TaskHandler) – The task handler to register on.
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The dictionary of files from databases.json.
  • with_uniref (bool) – If True, download and install the uniref90 database. Note that this will take 16+ GB of RAM and a long time to prepare with lastdb.
Returns:

The handler passed in.

Return type:

handler.TaskHandler

dammit.databases.build_quick_pipeline(handler, config, databases)[source]
dammit.databases.check_or_fail(handler)[source]

Check that the handler’s tasks are complete, and if not, exit with status 2.

dammit.databases.default_database_dir(logger)[source]

Get the default database directory: checks the environment for a DAMMIT_DB_DIR variable, and if it is not found, returns the default location of $HOME/.dammit/databases.

Parameters:logger (logging.logger) – Logger to write to.
Returns:Path to the database directory.
Return type:str
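
The lookup described above can be sketched as follows (the environment variable name and fallback path come from the description; the logger argument is omitted here for brevity):

```python
import os

def default_database_dir(environ=os.environ):
    # Prefer an explicit DAMMIT_DB_DIR; otherwise fall back to
    # $HOME/.dammit/databases, as described above.
    try:
        return environ['DAMMIT_DB_DIR']
    except KeyError:
        return os.path.join(os.path.expanduser('~'), '.dammit', 'databases')

# With the variable set, it wins:
print(default_database_dir({'DAMMIT_DB_DIR': '/data/db'}))  # /data/db
```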
dammit.databases.get_handler(config)[source]

Build the TaskHandler for the database prep pipeline. The handler will not have registered tasks when returned.

Parameters:
  • config (dict) – Config dictionary, which contains the command line arguments and the entries from the config file.
  • databases (dict) – The database dictionary from databases.json.
Returns:

A constructed TaskHandler.

Return type:

handler.TaskHandler

dammit.databases.install(handler)[source]

Run the database prep pipeline from the given handler.

dammit.databases.print_meta(handler)[source]

Print metadata about the database pipeline.

Parameters:handler (handler.TaskHandler) – The database task handler.
dammit.databases.register_busco_tasks(handler, config, databases)[source]
dammit.databases.register_orthodb_tasks(handler, params, databases)[source]
dammit.databases.register_pfam_tasks(handler, params, databases)[source]
dammit.databases.register_rfam_tasks(handler, params, databases)[source]
dammit.databases.register_uniref90_tasks(handler, params, databases)[source]

dammit.handler module

class dammit.handler.TaskHandler(directory, logger, files=None, profile=False, db=None, n_threads=1, **doit_config_kwds)[source]

Bases: doit.cmd_base.TaskLoader

check_uptodate()[source]

Check if all tasks are up-to-date, i.e., if the pipeline is complete. Note that this moves to the handler’s directory to lessen issues with relative versus absolute paths.

Returns:True if all are up to date.
Return type:bool
clear_tasks()[source]

Empty the task dictionary.

get_status(task, move=False)[source]

Get the up-to-date status of a single task.

Parameters:
  • task (str) – The task name to look up.
  • move (bool) – If True, move to the handler’s directory before checking. Whether this is necessary depends mostly on whether the task uses relative or absolute paths.
Returns:

The string representation of the status. Either “run” or “uptodate”.

Return type:

str

load_tasks(cmd, opt_values, pos_args)[source]

Internal to doit – triggered by the TaskLoader.

print_statuses(uptodate_msg='All tasks up-to-date!', outofdate_msg='Some tasks out of date!')[source]

Print the up-to-date status of all tasks.

Parameters:
  • uptodate_msg (str) – The message to print if all tasks are up to date.
  • outofdate_msg (str) – The message to print if any tasks are out of date.
Returns:

A bool (True if all up to date) and a dictionary of statuses.

Return type:

tuple

register_task(name, task, files=None)[source]

Register a new task and its files with the handler.

It may seem redundant or confusing to give the tasks a name different than their internal doit name. I do this because doit tasks need to have names as unique as possible, so that they can be reused in different projects. A particular TaskHandler instance is only used for one pipeline run, and allowing different names makes it easier to reference tasks from elsewhere.

Parameters:
  • name (str) – Name of the task. Does not have to correspond to doit’s internal task name.
  • task (dict or doit.task.Task) – Either a task dictionary or a Task object.
  • files (dict) – Dictionary of files used.
run(doit_args=None, verbose=True)[source]

Run the pipeline. Moves to the handler’s directory, loads the tasks into doit, and executes the tasks that are not up-to-date.

Parameters:
  • doit_args (list) – Args that would be passed to the doit shell command. By default, just run.
  • verbose (bool) – If True, print UI stuff.
Returns:

Exit status of the doit command.

Return type:

int

dammit.log module

dammit.log.init_default_logger()[source]
dammit.log.start_logging(filename=None, test=False)

dammit.meta module

Program metadata: the version, install path, description, and default config.

dammit.meta.get_config()[source]

Parse the default JSON config files and return them as dictionaries.

Returns:The config and databases dictionaries.
Return type:tuple

dammit.parallel module

dammit.parallel.check_parallel(logger=None)[source]
dammit.parallel.parallel_fasta(input_filename, output_filename, command, n_jobs, sshloginfile=None, check_dep=True, logger=None)[source]

Given an input FASTA source, target, shell command, and number of jobs, construct a gnu-parallel command to act on the sequences.

Parameters:
  • input_filename (str) – The source FASTA.
  • output_filename (str) – The target.
  • command (list) – The shell command (in subprocess format).
  • n_jobs (int) – Number of cores or nodes to split to.
  • sshloginfile (str) – Path to file with node addresses.
  • check_dep (bool) – If True, check for the gnu-parallel executable.
  • logger (logging.Logger) – A logger to use.
Returns:

The constructed shell command.

Return type:

str
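
The shape of the generated command can be sketched roughly as below. Treat this as an illustration of the string being assembled, not the real implementation: the exact flags dammit emits may differ, and the function name here is a hypothetical stand-in.

```python
def parallel_fasta_sketch(input_filename, output_filename, command, n_jobs,
                          sshloginfile=None):
    """Assemble a GNU parallel invocation that splits a FASTA stream on
    record boundaries and pipes each chunk through the given command."""
    cmd = ['cat', input_filename, '|',
           'parallel', '--pipe', '-j', str(n_jobs),
           '--recstart', "'>'"]          # FASTA records start with '>'
    if sshloginfile is not None:
        cmd.extend(['--sshloginfile', sshloginfile])
    cmd.extend(command)
    cmd.extend(['>', output_filename])
    return ' '.join(cmd)

print(parallel_fasta_sketch('txome.fa', 'out.maf', ['lastal', 'db'], 4))
```

The key idea is `--recstart '>'`, which tells GNU parallel to split only at FASTA record boundaries so no sequence is cut in half.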

dammit.profile module

class dammit.profile.Profiler[source]

Bases: object

Thread-safe performance profiler.

start_profiler(filename=None, blockname='__main__')[source]

Start the profiler, with results stored in the given filename.

Parameters:
  • filename (str) – Path to store profiling results. If not given, uses a representation of the current time.
  • blockname (str) – Name assigned to the main block.
stop_profiler()[source]

Shut down the profiler and write the final elapsed time.

write_result(task_name, start_time, end_time, elapsed_time)[source]

Write results to the file, using the given task name as the name for the results block.

Parameters:
  • task_name (str) – ID for the result row (the block profiled).
  • start_time (float) – Time of block start.
  • end_time (float) – Time of block end.
  • elapsed_time (float) – Total time.
dammit.profile.StartProfiler(filename=None, blockname='__main__')
class dammit.profile.Timer[source]

Bases: object

Simple timer class.

start()[source]

Start the timer.

stop()[source]

Stop the timer and return the elapsed time.
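
One plausible implementation matching this description (a sketch; the attribute names are assumptions):

```python
import time

class Timer:
    """Simple timer: start() records a timestamp, stop() returns the
    elapsed seconds."""

    def start(self):
        self.start_time = time.time()

    def stop(self):
        self.end_time = time.time()
        return self.end_time - self.start_time

t = Timer()
t.start()
time.sleep(0.01)
elapsed = t.stop()
```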

dammit.profile.add_profile_actions(task)
dammit.profile.profile_task(task_func)
dammit.profile.setup_profiler()[source]

Returns a context manager, a function to add profiling actions to doit tasks, and a decorator to apply that function to task functions.

The profiling function adds new actions to the beginning and end of the given task’s action list, which start and stop the profiler and record the results. The task decorator applies this function. The actions only record data if the profiler is running when they are called, and they are removed from doit’s execution output to reduce clutter.

The context manager starts the profiler in its block, storing data in the given file.

Yes, this is a function function function which creates six different functions at seven different function scopes. Written in honor of javascript programmers everywhere, and to baffle and irritate @ryneches.
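
In the spirit of that description, a stripped-down sketch of a setup function returning a context manager, an action-adding function, and a decorator, all closing over shared state (the names here are illustrative, not dammit’s; the real version writes results to a file rather than collecting them in a list):

```python
from contextlib import contextmanager

def setup_profiler_sketch():
    state = {'running': False}
    events = []  # exposed here only so the sketch is inspectable

    @contextmanager
    def profiling():
        # The context manager starts the profiler for its block.
        state['running'] = True
        try:
            yield
        finally:
            state['running'] = False

    def add_profile_actions(task):
        # Prepend an action that records only while the profiler runs.
        def record():
            if state['running']:
                events.append(task['name'])
        task['actions'].insert(0, record)
        return task

    def profile_task(task_func):
        # Decorator applying add_profile_actions to a task-producing function.
        def wrapper(*args, **kwargs):
            return add_profile_actions(task_func(*args, **kwargs))
        return wrapper

    return profiling, add_profile_actions, profile_task, events

profiling, add_actions, profile_task, events = setup_profiler_sketch()

@profile_task
def make_task():
    return {'name': 'stats', 'actions': []}

task = make_task()
with profiling():
    for action in task['actions']:
        action()
print(events)  # ['stats']
```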

dammit.profile.title_without_profile_actions(task)[source]

Generate the task title without the profiling actions.

dammit.ui module

class dammit.ui.GithubMarkdownReporter(outstream, options)[source]

Bases: doit.reporter.ConsoleReporter

Specialized doit reporter to make task output Github Markdown compliant.

execute_task(task)[source]

Called when task execution starts.

skip_ignore(task)[source]

Called when an ignored task is skipped.

skip_uptodate(task)[source]

Called when an up-to-date task is skipped.

dammit.ui.checkbox(msg, checked=False)[source]

Generate a Github markdown checkbox for the message.

dammit.ui.header(msg, level=1)[source]

Standardize output headers for submodules.

This doesn’t need to be logged, but it’s nice for the user.

dammit.ui.listing(d)[source]

Generate a markdown list.

dammit.ui.paragraph(msg, wrap=80)[source]

Generate a wrapped paragraph.
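
Plausible implementations of these helpers, assuming standard GitHub-flavored Markdown conventions (the real output may differ in details such as trailing newlines and wrapping):

```python
import textwrap

def checkbox(msg, checked=False):
    # GitHub task-list item: "- [x] msg" when checked, "- [ ] msg" otherwise.
    return '- [{0}] {1}'.format('x' if checked else ' ', msg)

def listing(d):
    # One bullet per key, sorted for stable output.
    return '\n'.join('* {0}: {1}'.format(k, v) for k, v in sorted(d.items()))

def paragraph(msg, wrap=80):
    return '\n'.join(textwrap.wrap(msg, width=wrap))

print(checkbox('Rfam search', checked=True))  # - [x] Rfam search
```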

dammit.utils module

class dammit.utils.DammitTask(name, actions, file_dep=(), targets=(), task_dep=(), uptodate=(), calc_dep=(), setup=(), clean=(), teardown=(), subtask_of=None, has_subtask=False, doc=None, params=(), pos_arg=None, verbosity=None, title=None, getargs=None, watch=(), loader=None)[source]

Bases: doit.task.Task

Subclass doit.task.Task for dammit. Updates the string __repr__ and adds a uniform updated title function.

title()[source]
class dammit.utils.Move(target, create=False, verbose=False)[source]

Bases: object

Context manager to change current working directory.

dammit.utils.cleaned_actions(actions)[source]

Get a cleaned-up list of actions: Python actions have their <locals> portion stripped, which otherwise clutters up PythonActions that are closures.

dammit.utils.dict_to_task(task_dict)[source]

Given a doit task dict, return a DammitTask.

Parameters:task_dict (dict) – A doit task dict.
Returns:Subclassed doit task.
Return type:DammitTask
dammit.utils.doit_task(task_dict_func)[source]

Wrapper to decorate functions returning pydoit Task dictionaries and have them return pydoit Task objects instead.

dammit.utils.touch(filename)[source]

Perform the equivalent of bash’s touch on the file.

Parameters:filename (str) – File path to touch.
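
An implementation of the described behavior:

```python
import os
import tempfile

def touch(filename):
    # Open in append mode so existing contents are preserved, then update
    # the access and modification times, mirroring the shell's `touch`.
    with open(filename, 'a'):
        os.utime(filename, None)

path = os.path.join(tempfile.mkdtemp(), 'example.txt')
touch(path)                  # creates the file if it doesn't exist
print(os.path.exists(path))  # True
```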
dammit.utils.which(program)[source]

Checks whether the given program (or program path) is valid and executable.

NOTE: Sometimes copypasta is okay! This function came from Stack Overflow.

Parameters:program (str) – Either a program name or full path to a program.
Returns:The path to the executable, or None if not found.
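
The widely copied Stack Overflow recipe this refers to is essentially:

```python
import os
import sys

def which(program):
    """Return the path to an executable, or None if it is not found."""
    def is_exe(fpath):
        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

    fpath, _ = os.path.split(program)
    if fpath:
        # A full or relative path was given: check it directly.
        if is_exe(program):
            return program
    else:
        # A bare name: search each directory on PATH.
        for path in os.environ.get('PATH', '').split(os.pathsep):
            candidate = os.path.join(path, program)
            if is_exe(candidate):
                return candidate
    return None

print(which(sys.executable) == sys.executable)  # True
print(which('no-such-program-xyz'))             # None
```

(Modern Python has shutil.which in the standard library, which does the same job.)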

Module contents