Incenp.org’s utilities for computational biology.
Damien Goutte-Gattat 0e801b94ea Switch to Black-enforced code style. 1 year ago

README.md

Incenp.Bioutils - Incenp.org’s utilities for computational biology

Incenp.Bioutils is a set of command line utilities and helper Python modules to deal with some computational biology tasks.

Utilities

SeqTool

SeqTool is intended as a “Swiss Army knife” to perform various operations on biological sequence files.

Among other things, it provides wrappers for the blast and dotter alignment tools. The main benefit of those wrappers is that they accept sequences in any format supported by Biopython’s SeqIO module, whereas the original tools only read files in the FASTA format.

SeqVault

SeqVault is a command-line interface to BioSQL databases.

SeqVault is intended to be used with a slightly modified version of the BioSQL database schema (provided in the biosql directory), where every biodatabase is associated with a three-letter prefix. That prefix is then used to automatically assign accession numbers (of the form PRE_xxxxxx, where PRE is the prefix) when importing sequences into the database. However, SeqVault can also be used with pristine BioSQL databases.
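As an illustration of the PRE_xxxxxx accession scheme, here is a minimal sketch. The helper `format_accession` is hypothetical and is not part of SeqVault. It assumes that the six `x` placeholders stand for a zero-padded decimal serial number, which the text above does not state explicitly. In SeqVault itself, the assignment happens when sequences are imported into the database.

```python
def format_accession(prefix: str, serial: int, width: int = 6) -> str:
    """Build an accession of the form PRE_xxxxxx (hypothetical helper).

    Assumption: the 'x' placeholders are a zero-padded decimal serial.
    """
    if len(prefix) != 3 or not prefix.isalnum():
        raise ValueError("prefix must be exactly three alphanumeric characters")
    if serial < 0 or serial >= 10 ** width:
        raise ValueError("serial does not fit in the requested width")
    # Zero-pad the serial to the requested width and join with an underscore.
    return f"{prefix}_{serial:0{width}d}"


print(format_accession("PLM", 42))  # prints PLM_000042
```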

Quickstart (example with PostgreSQL)

Create a new PostgreSQL user account and a new database:

# createuser <username>
# createdb -O <username> <dbname>

Initialize the newly created database by running the provided biosql/biosqldb-pg.sql script:

$ psql -h localhost -U <username> <dbname> < biosql/biosqldb-pg.sql

Next, create a plain-text file at $XDG_CONFIG_HOME/bioutils/databases.ini and fill it in as follows:

[Server]
host: localhost
user: <username>
password: <PostgreSQL user password>
database: <dbname>

SeqVault is now ready to be used.
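To check that the configuration file is where it is expected and parses cleanly, a sketch along the following lines may help. It uses only the standard-library `configparser` module. The function names `default_config_path` and `load_server` are hypothetical, and the fallback to `~/.config` when `XDG_CONFIG_HOME` is unset follows the XDG Base Directory convention rather than anything confirmed about SeqVault's own lookup logic.

```python
import configparser
import os
from pathlib import Path


def default_config_path() -> Path:
    """Return the expected location of databases.ini (hypothetical helper)."""
    # Assumption: fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "bioutils" / "databases.ini"


def load_server(path: Path, section: str = "Server") -> dict:
    """Read one section of the INI file into a plain dictionary."""
    # Disable interpolation so that a '%' in the password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return dict(parser[section])


print(default_config_path())
```

Calling `load_server(default_config_path())` should then return a dictionary with the `host`, `user`, `password` and `database` keys shown above. A `KeyError` indicates that the `[Server]` section is missing.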

cc3d-runner

CC3d-Runner is a small wrapper around the command-line interface of CompuCell3D, to run headless simulations.

Helper modules

The main helper module is the incenp.bio.seq.usa module, which provides an implementation of the Uniform Sequence Address format.

A Uniform Sequence Address or USA is a way to refer to a biological sequence and to fetch that sequence from its source.

The following example shows how to read a sequence specified by a USA:

from incenp.bio.seq import usa

records = usa.read_usa('genbank::file.gb:acc:X65923')

The returned records object is an array of Bio.SeqRecord.SeqRecord objects.
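To make the shape of the USA in that example a little more concrete, the sketch below separates the leading `format::` part from the rest of the address. This is only an illustration: `split_usa_format` is a hypothetical helper, not the parser used by incenp.bio.seq.usa, and it deliberately leaves the remainder of the address uninterpreted.

```python
from typing import Optional, Tuple


def split_usa_format(usa: str) -> Tuple[Optional[str], str]:
    """Split an optional leading 'format::' from a USA (hypothetical helper)."""
    fmt, sep, rest = usa.partition("::")
    if not sep:
        # No explicit format given; the whole string is the address.
        return None, usa
    return fmt, rest


print(split_usa_format("genbank::file.gb:acc:X65923"))
# prints ('genbank', 'file.gb:acc:X65923')
```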

Prerequisites

The following Python packages are necessary:

ReportLab is required to run the plasmm subcommand of the seqtool utility.

The seqvault utility requires a SQL adapter for the database server you plan to use it with, such as psycopg2 for PostgreSQL.

Copying

Incenp.Bioutils is distributed under the terms of the GNU General Public License, version 3 or higher. The full license is included in the COPYING file of the source distribution.

Homepage and repository

The project is located at https://incenp.org/dvlpt/bioutils.html. The source code is available in a Git repository at https://git.incenp.org/damien/bioutils.