ProteInfer

Predicting the functional properties of protein sequences using deep neural networks.


Important

This file type requires the parsomics-plugin-proteinfer plugin.

File naming

File names must match one of the following patterns:

  • <MAG-name>_ProteInfer_out.tsv
  • <MAG-name>.tsv
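The two patterns above can be checked with a single regular expression. The following is a minimal sketch, not the plugin's own code: the helper name mag_name_from_file_name is hypothetical, and it assumes that matching is case-sensitive and that the MAG name is everything before the recognized suffix.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not part of the plugin's API.
# Accepts "<MAG-name>_ProteInfer_out.tsv" or "<MAG-name>.tsv".
_FILE_NAME = re.compile(r"(?P<mag>.+?)(?:_ProteInfer_out)?\.tsv")


def mag_name_from_file_name(file_name: str) -> Optional[str]:
    """Return the MAG name if the file name matches a pattern, else None."""
    match = _FILE_NAME.fullmatch(file_name)
    return match.group("mag") if match else None


print(mag_name_from_file_name("bin1_ProteInfer_out.tsv"))  # bin1
print(mag_name_from_file_name("bin1.tsv"))                 # bin1
print(mag_name_from_file_name("bin1.txt"))                 # None
```

The MAG-name group is non-greedy so that the optional _ProteInfer_out suffix is excluded from the MAG name when it is present.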

File format

The file must include a header row (i.e. column names on the first line). It must have the following columns:

Column name      Column obligatoriness  Data type  Data nullability
sequence_name    Mandatory              String     Not nullable
predicted_label  Mandatory              String     Nullable
confidence       Mandatory              String     Nullable
description      Optional               String     Nullable
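The column requirements above could be checked before handing a file to the plugin. The sketch below uses only Python's standard csv module; the function name read_proteinfer_rows is hypothetical, and treating an empty cell as a null value is an assumption rather than documented plugin behavior.

```python
import csv
import io

MANDATORY = ("sequence_name", "predicted_label", "confidence")
OPTIONAL = ("description",)


def read_proteinfer_rows(handle):
    """Read a tab-separated file and check it against the column table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in MANDATORY if column not in header]
    if missing:
        raise ValueError(f"missing mandatory column(s): {', '.join(missing)}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        # Assumption: an empty or absent cell represents a null value.
        row = {column: (raw.get(column) or None) for column in MANDATORY + OPTIONAL}
        if row["sequence_name"] is None:
            raise ValueError(f"line {line_no}: sequence_name must not be empty")
        rows.append(row)
    return rows


sample = (
    "sequence_name\tpredicted_label\tconfidence\n"
    "protein_1\tPfam:CL0023\t0.98\n"
)
print(read_proteinfer_rows(io.StringIO(sample)))
```

Because description is optional, a file without that column is accepted and the field is filled with None.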

Mapping to database

ProteinAnnotationFile

Original data             ProteinAnnotationFile field
ProteInfer TSV file path  path

ProteinAnnotationEntry

Original data    ProteinAnnotationEntry field
sequence_name    protein_key [1]
predicted_label  accession and annotation_type [2]
confidence       score
description      description

Footnotes

  1. The protein name in the sequence_name column of the ProteInfer TSV file is used to look up the primary key of the corresponding protein in the database.

  2. The predicted_label column in ProteInfer TSV files follows the format <Annotation-type>:<Accession>. For example, given Pfam:CL0023, this plugin sets annotation_type to "PFAM" and accession to "CL0023".
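
The split described in footnote 2 can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the function name split_predicted_label is hypothetical, the label is split on its first colon, and the upper-casing of the annotation type is inferred from the Pfam to "PFAM" example.

```python
from typing import Tuple


def split_predicted_label(label: str) -> Tuple[str, str]:
    """Split "<Annotation-type>:<Accession>" into (annotation_type, accession)."""
    # Assumption: only the first colon separates the two parts.
    annotation_type, separator, accession = label.partition(":")
    if not separator or not annotation_type or not accession:
        raise ValueError(f"expected <Annotation-type>:<Accession>, got {label!r}")
    # Assumption: the type is upper-cased, as in the Pfam -> "PFAM" example.
    return annotation_type.upper(), accession


print(split_predicted_label("Pfam:CL0023"))  # ('PFAM', 'CL0023')
```

Since predicted_label is nullable, null values would need to be filtered out before calling a helper like this one.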