dammit.fileio package

Submodules

dammit.fileio.base module

class dammit.fileio.base.BaseParser(filename)[source]

Bases: object

raise_empty()[source]

class dammit.fileio.base.ChunkParser(filename, chunksize=10000)[source]

Bases: dammit.fileio.base.BaseParser

empty()[source]

Get an empty DataFrame with the appropriate columns.

read()[source]

Read the entire file at once and return as a single DataFrame.
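
As a rough illustration of the chunked-reading idea behind ChunkParser, the sketch below splits an iterable of records into lists of at most chunksize items and then joins them back together, which mirrors the relationship between chunk-wise parsing and read(). The helper names iter_chunks and read_all are hypothetical and not part of dammit; the real class works with pandas DataFrames rather than plain lists.

```python
from itertools import islice


def iter_chunks(lines, chunksize=10000):
    """Yield successive lists of at most `chunksize` items (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def read_all(lines, chunksize=10000):
    """Join every chunk into one list, analogous to reading the file at once."""
    result = []
    for chunk in iter_chunks(lines, chunksize):
        result.extend(chunk)
    return result


# 25 made-up records, read 10 at a time.
records = ['row{}'.format(i) for i in range(25)]
chunk_sizes = [len(chunk) for chunk in iter_chunks(records, chunksize=10)]
```

Reading in chunks keeps memory bounded for large files, while a read()-style call trades that for the convenience of a single result.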

exception dammit.fileio.base.EmptyFile[source]

Bases: Exception

dammit.fileio.base.convert_dtypes(df, dtypes)[source]

Convert the columns of a DataFrame to the types specified in the given dictionary, in place.

Parameters:
  • df (DataFrame) – The DataFrame to convert.
  • dtypes (dict) – Dictionary mapping columns to types.
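
One way such an in-place conversion could look is sketched below. This is an assumption about the behavior, not the library's actual implementation: the function convert_dtypes_sketch is hypothetical, and the sample column names are made up for the example.

```python
import pandas as pd


def convert_dtypes_sketch(df, dtypes):
    """Cast each listed column of `df` to its target type, modifying `df` itself.

    Hypothetical stand-in for convert_dtypes; the real function may differ.
    """
    for column, dtype in dtypes.items():
        df[column] = df[column].astype(dtype)


# Columns read as text are converted to numeric types.
df = pd.DataFrame({'start': ['1', '20'], 'score': ['0.5', '1e-10']})
convert_dtypes_sketch(df, {'start': int, 'score': float})
```

Because the function assigns back into df, the caller's DataFrame is changed and nothing is returned.
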
dammit.fileio.base.next_or_raise(fp)[source]

Get the next line and raise an exception if it is empty.
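
A minimal sketch of this pattern follows, assuming that "empty" covers both a file with no remaining lines and a line that contains only whitespace. EmptyFileSketch and next_or_raise_sketch are hypothetical stand-ins for EmptyFile and next_or_raise; the real function may define emptiness differently.

```python
import io


class EmptyFileSketch(Exception):
    """Hypothetical stand-in for dammit.fileio.base.EmptyFile."""


def next_or_raise_sketch(fp):
    """Return the next line of `fp`; raise if there is none or it is blank."""
    try:
        line = next(fp)
    except StopIteration:
        raise EmptyFileSketch('no more lines to read')
    if not line.strip():
        raise EmptyFileSketch('next line is empty')
    return line


# A file-like object with content yields its first line.
first = next_or_raise_sketch(io.StringIO('# header\ndata\n'))
```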

dammit.fileio.base.warn_empty(msg)[source]

Warn that a file is empty.

dammit.fileio.gff3 module

class dammit.fileio.gff3.GFF3Parser(filename, **kwargs)[source]

Bases: dammit.fileio.base.ChunkParser

columns = [('seqid', <class 'str'>), ('source', <class 'str'>), ('type', <class 'str'>), ('start', <class 'int'>), ('end', <class 'int'>), ('score', <class 'float'>), ('strand', <class 'str'>), ('phase', <class 'float'>), ('attributes', <class 'str'>)]

static decompose_attr_column(col)[source]

empty()[source]

Get an empty DataFrame with the appropriate columns.
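
To show how the columns specification maps onto a GFF3 record, here is a dependency-free sketch that types one tab-separated line and splits its attributes field into key/value pairs. The functions parse_gff3_line and decompose_attributes are hypothetical; in particular, treating "." in a float column as NaN and splitting attributes on ";" and "=" are assumptions about what GFF3Parser and decompose_attr_column do, and the sample line is made up.

```python
import math

# Same (name, type) pairs as the `columns` attribute above.
GFF3_COLUMNS = [('seqid', str), ('source', str), ('type', str),
                ('start', int), ('end', int), ('score', float),
                ('strand', str), ('phase', float), ('attributes', str)]


def parse_gff3_line(line):
    """Split one GFF3 line on tabs and cast each field (hypothetical helper)."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError('expected {} fields'.format(len(GFF3_COLUMNS)))
    record = {}
    for (name, typ), raw in zip(GFF3_COLUMNS, fields):
        # GFF3 uses "." for undefined values; assume NaN for float columns.
        if typ is float and raw == '.':
            record[name] = math.nan
        else:
            record[name] = typ(raw)
    return record


def decompose_attributes(attributes):
    """Turn 'key=value;key=value' into a dict (hypothetical helper)."""
    pairs = (item.split('=', 1) for item in attributes.split(';') if '=' in item)
    return {key: value for key, value in pairs}


line = ('Transcript_1\tLAST\ttranslated_nucleotide_match\t1\t300\t.\t+\t.\t'
        'ID=homology:1;Name=example_protein;database=example_db')
record = parse_gff3_line(line)
attrs = decompose_attributes(record['attributes'])
```

Using float for score and phase lets missing values be represented as NaN, which an integer column could not hold.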

class dammit.fileio.gff3.GFF3Writer(filename=None, converter=None, **converter_kwds)[source]

Bases: object

convert(data_df)[source]

static mangle_coordinates(gff3_df)[source]

Although 1-based fully closed intervals are of the Beast, we will respect the convention in the interests of peace between worlds and compatibility.

Parameters:
  • gff3_df (DataFrame) – The DataFrame to “fix”.

version_line = '##gff-version 3.2.1'

write(data_df, version_line=True)[source]

Write the given data to a GFF3 file, using the converter if given.

Generates an empty file if given an empty DataFrame.

Parameters:
  • version_line (bool) – If True, write the GFF3 version line at the top of the file. Note that this will cause an existing file to be overwritten, but the version line will only be added in the first call to write.
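
The sketch below illustrates one reading of the GFF3Writer behavior described above: coordinates are shifted to GFF3's 1-based, fully closed convention, the first call to write overwrites the file and emits the version line, and later calls append. SketchGFF3Writer and to_one_based_closed are hypothetical names, the input is assumed to use 0-based, half-open intervals, and the real class operates on DataFrames rather than lists of fields.

```python
import os
import tempfile

VERSION_LINE = '##gff-version 3.2.1'


def to_one_based_closed(start, end):
    """Map a 0-based, half-open interval onto a 1-based, closed one (assumed input)."""
    return start + 1, end


class SketchGFF3Writer:
    """Hypothetical stand-in for GFF3Writer; not the library's implementation."""

    def __init__(self, filename):
        self.filename = filename
        self.version_line_written = False

    def write(self, rows, version_line=True):
        # Only the first versioned call truncates the file and adds the header.
        first_call = version_line and not self.version_line_written
        mode = 'w' if first_call else 'a'
        with open(self.filename, mode) as fp:
            if first_call:
                fp.write(VERSION_LINE + '\n')
                self.version_line_written = True
            for row in rows:
                fields = list(row)
                fields[3], fields[4] = to_one_based_closed(fields[3], fields[4])
                fp.write('\t'.join(str(field) for field in fields) + '\n')


path = os.path.join(tempfile.mkdtemp(), 'example.gff3')
writer = SketchGFF3Writer(path)
row = ['Transcript_1', 'LAST', 'translated_nucleotide_match',
       0, 300, 1e-50, '+', '.', 'ID=homology:1']
writer.write([row])
writer.write([row])  # second call appends without repeating the version line
with open(path) as fp:
    written = fp.read().splitlines()
```

Under these assumptions a half-open interval [0, 300) becomes the closed interval 1 to 300, so only the start coordinate changes.
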
dammit.fileio.gff3.cmscan_to_gff3(cmscan_df, tag='', database='')[source]

dammit.fileio.gff3.hmmscan_to_gff3(hmmscan_df, tag='', database='')[source]

dammit.fileio.gff3.id_gen_wrapper()[source]

dammit.fileio.gff3.maf_to_gff3(maf_df, tag='', database='', ftype='translated_nucleotide_match')[source]

Convert a MAF DataFrame to a GFF3 DataFrame ready to be written to disk.

Parameters:
  • maf_df (pandas.DataFrame) – The MAF DataFrame. See dammit.fileio.maf.MafParser for column specs.
  • tag (str) – Extra tag to add to the source column.
  • database (str) – For the database entry in the attributes column.
  • ftype (str) – The feature type; GMOD compliant if possible.
Returns:

The GFF3 compliant DataFrame.

Return type:

pandas.DataFrame
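
For orientation, the following sketch converts a single MAF-style record (a plain dict) into a GFF3-style record. The field mapping is an assumption made for illustration, as are the "homology" ID prefix, the way tag is combined with the source name, and the function name maf_record_to_gff3; the real maf_to_gff3 works on whole DataFrames and may map columns differently.

```python
from itertools import count

# Stand-in for the module's ID generator; the numbering scheme is assumed.
_ids = count(1)


def maf_record_to_gff3(rec, tag='', database='',
                       ftype='translated_nucleotide_match'):
    """Build one GFF3-style dict from one MAF-style dict (hypothetical helper)."""
    source = '{}.LAST'.format(tag) if tag else 'LAST'
    attrs = ['ID=homology:{}'.format(next(_ids)),
             'Name={}'.format(rec['s_name'])]
    if database:
        attrs.append('database={}'.format(database))
    return {
        'seqid': rec['q_name'],
        'source': source,
        'type': ftype,
        'start': rec['q_start'],
        # Assumed: the feature spans the aligned part of the query.
        'end': rec['q_start'] + rec['q_aln_len'],
        'score': rec['E'],
        'strand': rec['q_strand'],
        'phase': '.',
        'attributes': ';'.join(attrs),
    }


# Made-up alignment record using column names from MafParser.columns.
maf_rec = {'q_name': 'Transcript_1', 'q_start': 10, 'q_aln_len': 90,
           'q_strand': '+', 's_name': 'example_protein', 'E': 1e-30}
gff3_rec = maf_record_to_gff3(maf_rec, tag='demo', database='example_db')
```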

dammit.fileio.gff3.next_ID()

dammit.fileio.gff3.shmlast_to_gff3(df, database='')[source]

dammit.fileio.hmmer module

class dammit.fileio.hmmer.HMMerParser(filename, query_regex=None, query_basename='Transcript', **kwargs)[source]

Bases: dammit.fileio.base.ChunkParser

columns = [('target_name', <class 'str'>), ('target_accession', <class 'str'>), ('tlen', <class 'int'>), ('query_name', <class 'str'>), ('query_accession', <class 'str'>), ('query_len', <class 'int'>), ('full_evalue', <class 'float'>), ('full_score', <class 'float'>), ('full_bias', <class 'float'>), ('domain_num', <class 'int'>), ('domain_total', <class 'int'>), ('domain_c_evalue', <class 'float'>), ('domain_i_evalue', <class 'float'>), ('domain_score', <class 'float'>), ('domain_bias', <class 'float'>), ('hmm_coord_from', <class 'int'>), ('hmm_coord_to', <class 'int'>), ('ali_coord_from', <class 'int'>), ('ali_coord_to', <class 'int'>), ('env_coord_from', <class 'int'>), ('env_coord_to', <class 'int'>), ('accuracy', <class 'float'>), ('description', <class 'str'>)]
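
The columns list follows the layout of HMMER's per-domain table output, where the first 22 fields are whitespace-separated and the final description may itself contain spaces. The sketch below applies that list to a single line; parse_domtblout_line is a hypothetical helper, the sample line is made up, and the real HMMerParser additionally handles comment lines, chunking, and query-name processing.

```python
# Same (name, type) pairs as the `columns` attribute above.
HMMER_COLUMNS = [
    ('target_name', str), ('target_accession', str), ('tlen', int),
    ('query_name', str), ('query_accession', str), ('query_len', int),
    ('full_evalue', float), ('full_score', float), ('full_bias', float),
    ('domain_num', int), ('domain_total', int),
    ('domain_c_evalue', float), ('domain_i_evalue', float),
    ('domain_score', float), ('domain_bias', float),
    ('hmm_coord_from', int), ('hmm_coord_to', int),
    ('ali_coord_from', int), ('ali_coord_to', int),
    ('env_coord_from', int), ('env_coord_to', int),
    ('accuracy', float), ('description', str),
]


def parse_domtblout_line(line):
    """Split one table line and cast each field (hypothetical helper)."""
    # Limit the number of splits so the free-text description stays in one piece.
    fields = line.rstrip('\n').split(None, len(HMMER_COLUMNS) - 1)
    if len(fields) != len(HMMER_COLUMNS):
        raise ValueError('expected {} fields'.format(len(HMMER_COLUMNS)))
    return {name: typ(raw) for (name, typ), raw in zip(HMMER_COLUMNS, fields)}


# Made-up example line, not real search output.
line = ('Pkinase PF00069.25 264 Transcript_1 - 1200 1.2e-50 170.3 0.1 1 1 '
        '3.4e-54 5.6e-50 168.0 0.1 2 263 101 880 100 882 0.95 '
        'Protein kinase domain')
record = parse_domtblout_line(line)
```

The same approach would apply to the InfernalParser column list, with the split limit adjusted to its 18 fields.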

dammit.fileio.infernal module

class dammit.fileio.infernal.InfernalParser(filename, **kwargs)[source]

Bases: dammit.fileio.base.ChunkParser

columns = [('target_name', <class 'str'>), ('target_accession', <class 'str'>), ('query_name', <class 'str'>), ('query_accession', <class 'str'>), ('mdl', <class 'str'>), ('mdl_from', <class 'int'>), ('mdl_to', <class 'int'>), ('seq_from', <class 'int'>), ('seq_to', <class 'int'>), ('strand', <class 'str'>), ('trunc', <class 'str'>), ('pass', <class 'str'>), ('gc', <class 'float'>), ('bias', <class 'float'>), ('score', <class 'float'>), ('e_value', <class 'float'>), ('inc', <class 'str'>), ('description', <class 'str'>)]

dammit.fileio.maf module

class dammit.fileio.maf.MafParser(filename, aln_strings=False, chunksize=10000, **kwargs)[source]

Bases: dammit.fileio.base.ChunkParser

columns = [('E', <class 'float'>), ('EG2', <class 'float'>), ('q_aln_len', <class 'int'>), ('q_len', <class 'int'>), ('q_name', <class 'str'>), ('q_start', <class 'int'>), ('q_strand', <class 'str'>), ('s_aln_len', <class 'int'>), ('s_len', <class 'int'>), ('s_name', <class 'str'>), ('s_start', <class 'int'>), ('s_strand', <class 'str'>), ('score', <class 'float'>), ('bitscore', <class 'float'>)]

Module contents