dRep
Rapid and accurate comparison and de-replication of microbial genomes
File naming
The dRep format is a special case that requires two files:
Cdb.csv
Wdb.csv
The files must be named exactly as shown above.
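As a minimal sketch of the naming rule, the check below reports which of the two required files are absent from a directory. The function name `missing_drep_files` is hypothetical and is not part of dRep or of this tool. File names are compared as exact strings, so `cdb.csv` would not count as `Cdb.csv`, even on a case-insensitive file system.

```python
from pathlib import Path

# The two file names required by the dRep format, spelled exactly.
REQUIRED_FILES = ("Cdb.csv", "Wdb.csv")


def missing_drep_files(directory):
    """Return the required file names that are absent from `directory`.

    Hypothetical helper for illustration only.
    """
    # Collect the literal names of regular files so the comparison is exact.
    present = {p.name for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in REQUIRED_FILES if name not in present]
```

An empty list means both files are present under the expected names.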
File format
Tip: For more information on the dRep output files, see the dRep documentation.
Cdb.csv
This file lists the cluster to which each MAG (metagenome-assembled genome) belongs.
The file must follow the Comma-Separated Values (CSV) format, as written by dRep. It must start with a header row and contain columns representing the following data, in this order:
Column name | Column obligatoriness | Data type | Data nullability |
---|---|---|---|
genome | Mandatory | String | Not nullable |
secondary_cluster | Mandatory | String | Nullable |
threshold | Optional (ignored) | N/A | N/A |
cluster_method | Optional (ignored) | N/A | N/A |
comparison_algorithm | Optional (ignored) | N/A | N/A |
primary_cluster | Optional (ignored) | N/A | N/A |
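The table above can be read as a small parsing routine. The sketch below is one possible interpretation, not the tool's actual loader: `read_cdb` is a hypothetical name, the sample rows are invented, and an empty `secondary_cluster` cell is treated as null. The delimiter is left as a parameter; it defaults to a comma, matching the `.csv` extension, and `"\t"` can be passed for tab-separated input.

```python
import csv
import io


def read_cdb(handle, delimiter=","):
    """Map each genome to its secondary cluster (None when the cell is empty).

    Hypothetical helper: checks that the two mandatory columns come first
    and ignores every other column.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    if header[:2] != ["genome", "secondary_cluster"]:
        raise ValueError("header must start with: genome, secondary_cluster")

    clusters = {}
    for row in reader:
        genome = row["genome"]
        if not genome:
            # genome is declared "Not nullable" in the table.
            raise ValueError(f"line {reader.line_num}: genome must not be empty")
        # secondary_cluster is nullable: an empty cell becomes None.
        clusters[genome] = row["secondary_cluster"] or None
    return clusters


# Invented sample data, for illustration only.
sample = io.StringIO(
    "genome,secondary_cluster,threshold,cluster_method,"
    "comparison_algorithm,primary_cluster\n"
    "A.fa,1_1,0.05,average,ANImf,1\n"
    "B.fa,,0.05,average,ANImf,1\n"
)
print(read_cdb(sample))
```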
Wdb.csv
This file lists the "winners" (i.e. the best representatives) of each cluster.
The file must follow the Comma-Separated Values (CSV) format, as written by dRep. It must start with a header row and contain columns representing the following data, in this order:
Column name | Column obligatoriness | Data type | Data nullability |
---|---|---|---|
genome | Mandatory | String | Not nullable |
score | Optional (ignored) | N/A | N/A |
cluster | Optional (ignored) | N/A | N/A |
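Since only the `genome` column is used, reading this file reduces to collecting one column. The sketch below assumes that reading; `read_winners` is a hypothetical name and the sample row is invented. As before, the delimiter is a parameter that defaults to a comma.

```python
import csv
import io


def read_winners(handle, delimiter=","):
    """Return the set of genome names listed as cluster winners.

    Hypothetical helper: only the first column (genome) is read.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if not header or header[0] != "genome":
        raise ValueError("first column must be named 'genome'")

    winners = set()
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not row[0]:
            raise ValueError(f"line {reader.line_num}: genome must not be empty")
        winners.add(row[0])
    return winners


# Invented sample data, for illustration only.
sample = io.StringIO("genome,score,cluster\nA.fa,98.5,1_1\n")
print(read_winners(sample))
```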
Mapping to database
DrepDirectory
Original data | DrepDirectory field |
---|---|
dRep directory path | path |
DrepEntry
Original data | DrepEntry field |
---|---|
genome column of Wdb.csv | winner |
genome column of Cdb.csv | genome_name |
secondary_cluster column of Cdb.csv | genome_cluster_name |
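The mapping above can be illustrated by combining the two files into one record per genome. This is a sketch under stated assumptions: the `DrepEntry` dataclass below only mirrors the field names in the table and is not the real database model, `build_entries` is a hypothetical helper, and `winner` is modelled as a boolean that is true when the genome appears in Wdb.csv, which the table does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class DrepEntry:
    """Illustrative stand-in for the DrepEntry fields listed above."""
    genome_name: str                    # genome column of Cdb.csv
    genome_cluster_name: Optional[str]  # secondary_cluster column of Cdb.csv
    winner: bool                        # assumption: genome is listed in Wdb.csv


def build_entries(clusters: Dict[str, Optional[str]],
                  winners: Set[str]) -> List[DrepEntry]:
    """Create one entry per Cdb.csv genome, flagging those found in Wdb.csv."""
    return [
        DrepEntry(genome_name=genome,
                  genome_cluster_name=cluster,
                  winner=genome in winners)
        for genome, cluster in clusters.items()
    ]


# Invented sample data, for illustration only.
entries = build_entries({"A.fa": "1_1", "B.fa": "1_1"}, {"A.fa"})
for entry in entries:
    print(entry)
```

Keeping the cluster assignment and the winner flag on the same record means a single query can answer both "which cluster is this MAG in?" and "is it the representative?".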