Skip to main content

Querying

Using SQL

To query the parsomics database using SQL, feel free to use your SQL client of choice. Make sure you follow check out the information at Database credentials.

Again, the parsomics team recommends using DBeaver. Not only only does it improve the readability of query results using tables, but also allows its users to save their SQL queries for later usage. Check out the section on Setting up DBeaver for a full tutorial on using DBeaver with your local parsomics database.

Using Python

You can query the parsomics database with Python, using SQLModel, the Object–Relational Mapping (ORM) library that powers parsomics-core. SQLModel cleverly uses Python's typing system to create "pythonic" abstractions for interacting with relational databases.

One of the best features of SQLModel when it comes to querying is that it can infer which attributes are used for joining tables, when not ambiguous. You can read more about it in SQLModel's documentation.

Using the REST API

Finally, you can also query the parsomics database using the querying endpoint of the API. However, beware that this endpoint only supports queries with SELECT, JOIN, and WHERE statements .

tip

Make sure you follow the instructions for starting the REST API server before you proceed.

Certainly. Here's a concise and clear API spec you can include in your project documentation:

Examples

Get the name and the sequence of a certain contig

In this example, 42 is the primary key of the contig.

SELECT FASTAEntry.sequence_name, FASTAEntry.sequence
FROM FASTAEntry
JOIN Contig ON Contig.fasta_entry_key = FASTAEntry.key
WHERE Contig.key = 42;

Get all protein annotations produced by CLEAN

In this example, clean is the name of the tool.

SELECT proteinannotationentry.*
FROM proteinannotationentry
JOIN proteinannotationfile ON proteinannotationentry.file_key = proteinannotationfile.key
JOIN run ON proteinannotationfile.run_key = run.key
JOIN tool ON run.tool_key = tool.key
WHERE tool.name = 'clean'

Get the name of the sample on which a gene was found

In this example, 7 is the primary key of the gene.

SELECT sample.name
FROM sample
JOIN genome ON genome.sample_key = sample.key
JOIN contig ON contig.genome_key = genome.key
JOIN gene ON gene.contig_key = contig.key
WHERE gene.key = 7

Get the project that a contig belongs to

In this example, 66 is the primary key of the contig.

SELECT project.*
FROM project
JOIN assembly ON assembly.project_key = project.key
JOIN run ON run.assembly_key = assembly.key
JOIN fastafile ON fastafile.run_key = run.key
JOIN fastaentry ON fastaentry.file_key = fastafile.key
JOIN contig ON contig.fasta_entry_key = fastaentry.key
WHERE contig.key = 66

Get detailed information on all protein annotations of a contig

In this example, 13 is the primary key of the contig.

SELECT
proteinannotationentry.key,
proteinannotationentry.description,
proteinannotationentry.coord_start as annotation_coord_start,
proteinannotationentry.coord_stop as annotation_coord_stop,
gffentry.coord_start as cds_coord_start,
gffentry.coord_stop as cds_coord_stop,
tool.key as tool_key,
tool.name as tool_name,
source.key as source_key,
source.name as source_name
FROM proteinannotationentry
JOIN protein ON proteinannotationentry.protein_key = protein.key
JOIN fragmentproteinlink ON protein.key = fragmentproteinlink.protein_key
JOIN fragment ON fragmentproteinlink.fragment_key = fragment.key
JOIN gene ON fragment.gene_key = gene.key
JOIN gffentry ON fragment.gff_entry_key = gffentry.key
LEFT JOIN source ON proteinannotationentry.source_key = source.key
JOIN proteinannotationfile ON proteinannotationentry.file_key = proteinannotationfile.key
JOIN run ON proteinannotationfile.run_key = run.key
JOIN tool ON run.tool_key = tool.key
WHERE
gene.contig_key = 13 AND gffentry.fragment_type = 'CDS'