
GLOSSARY

List of terms used in this documentation.

ABAQUS:

Finite Element Analysis software for modeling, visualization and best-in-class implicit and explicit dynamics FEA.

ABRicate:

Mass screening of contigs for antimicrobial and virulence genes

ABySS:

Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler

ACTC:

ACTC converts independent triangles into triangle strips or fans.

AGAT:

Suite of tools to handle gene annotations in any GTF/GFF format.

AGDR:

The Aotearoa Genomic Data Repository provides secure within-nation storage, management and sharing of non-human genomic data generated from biological and environmental samples originating in Aotearoa New Zealand.

AGE:

Alignment of sequences with structural variants.

AMOS:

Collection of tools for genome assembly

AMRFinderPlus:

NCBI Antimicrobial Resistance Gene Finder Plus

ANIcalculator:

Calculate the bidirectional average nucleotide identity (gANI) and Alignment Fraction (AF) between two genomes.
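
The two quantities can be illustrated with a simplified sketch. It assumes the genes have already been paired as bidirectional best hits (the `GeneHit` record and `gani_and_af` function are hypothetical), and computes gANI as the alignment-length-weighted mean identity and AF as the aligned fraction of the query genome's total gene length. ANIcalculator's actual procedure, and its reporting of both comparison directions, may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    """A bidirectional best-hit gene pair (hypothetical input record)."""
    identity: float      # percent identity of the alignment, 0-100
    aligned_length: int  # aligned bases in the query gene


def gani_and_af(hits, total_gene_length):
    """Return (gANI, AF) for one direction of a genome comparison.

    gANI: alignment-length-weighted mean percent identity over the hits.
    AF:   aligned bases divided by the total gene length of the query genome.
    """
    aligned = sum(h.aligned_length for h in hits)
    if aligned == 0 or total_gene_length <= 0:
        return 0.0, 0.0
    gani = sum(h.identity * h.aligned_length for h in hits) / aligned
    return gani, aligned / total_gene_length


hits = [GeneHit(98.0, 900), GeneHit(96.0, 300)]
print(gani_and_af(hits, 1500))  # (97.5, 0.8)
```

Weighting by alignment length keeps a few short, highly similar genes from dominating the average, while AF reports how much of the genome that average actually covers.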

ANNOVAR:

Efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.

ANSYS:

A bundle of computer-aided engineering software including Fluent and CFX.

ANTLR:

ANother Tool for Language Recognition

ANTs:

ANTs extracts information from complex datasets that include imaging. ANTs is useful for managing, interpreting and visualizing multidimensional data.

AOCC:

AMD Optimized C/C++ & Fortran compilers (AOCC) based on LLVM 13.0

AOCL-BLIS:

Optimized version of BLIS for AMD EPYC family of processors.

AOCL-FFTW:

Optimized version of FFTW for AMD EPYC family of processors.

AOCL-ScaLAPACK:

Optimized version of ScaLAPACK for AMD EPYC family of processors.

APR:

Apache Portable Runtime (APR) libraries.

APR-util:

Apache Portable Runtime (APR) util libraries.

ARIBA:

Antimicrobial Resistance Identification By Assembly

ASAGI:

a pArallel Server for Adaptive GeoInformation

ATK:

ATK provides the set of accessibility interfaces that are implemented by other toolkits and applications.

AUGUSTUS:

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences

Abseil:

Collection of C++ library code designed to augment the C++ standard library.

AdapterRemoval:

Searches for and removes remnant adapter sequences from High-Throughput Sequencing data.

AdaptiveCpp:

AdaptiveCpp (formerly hipSYCL) is a SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing toolchains such as CUDA or HIP

Advisor:

Intel Advisor is a vectorization optimization and thread prototyping tool. It guides you in vectorizing and threading code, and helps you prioritize, prototype and predict the performance gain of changes.

AlphaFold:

AlphaFold can predict protein structures with atomic accuracy even where no similar structure is known

AlphaFold2DB:

AlphaFold2 databases

AlwaysIntelMKL:

Overrides the MKL internal utility function mkl_serv_intel_cpu_true so that AVX2 optimised kernels will be used, even when running on an AMD CPU.

Anaconda3:

Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.

IMPORTANT: This version of Anaconda Python comes with Intel MKL support to speed up certain types of mathematical computations, such as linear algebra or FFT. The module sets MKL_NUM_THREADS=1, so MKL runs on a single thread by default and does not accidentally oversubscribe cores. The number of threads can be increased for large problems; refer to the Intel MKL documentation for guidance.
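
One way to raise the thread count in a Python script is sketched below. It assumes the job runs under Slurm, where SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested; the helper `mkl_threads_for_job` is hypothetical. The variable generally has to be set before numpy (and therefore MKL) is loaded.

```python
import os


def mkl_threads_for_job(env=None, default=1):
    """Pick an MKL thread count from the job's CPU allocation.

    Assumes a Slurm job, where SLURM_CPUS_PER_TASK is set when
    --cpus-per-task is requested; otherwise keep the module default of 1.
    """
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    if value is None:
        return default
    try:
        return max(1, int(value))
    except ValueError:
        return default


# MKL typically reads MKL_NUM_THREADS when it initialises, so set it
# before importing numpy or anything else that loads MKL.
os.environ["MKL_NUM_THREADS"] = str(mkl_threads_for_job())
# import numpy as np  # import only after the variable is set
```

Matching the MKL thread count to the CPUs actually allocated to the job keeps the benefit of multithreading without oversubscribing cores.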

Apptainer:

Apptainer is a portable application stack packaging and runtime utility.

Armadillo:

C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use.

Arrow:

Apache Arrow, a cross-language development platform for in-memory data.

Aspera-CLI:

IBM Aspera Command-Line Interface (the Aspera CLI) is a collection of Aspera tools for performing high-speed, secure data transfers from the command line. The Aspera CLI is for users and organizations who want to automate their transfer workflows.

AutoDock-GPU:

OpenCL and CUDA accelerated version of AutoDock. It leverages its embarrassingly parallelizable Lamarckian Genetic Algorithm (LGA) by processing ligand-receptor poses in parallel over multiple compute units.

AutoDock_Vina:

AutoDock Vina is an open-source program for doing molecular docking.

Autoconf-archive:

A collection of more than 500 macros for GNU Autoconf

BBMap:

BBMap short read aligner, and other bioinformatic tools.

BCFtools:

Manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

BCL-Convert:

Converts per cycle binary data output by Illumina sequencers containing basecall files and quality scores to per read FASTQ files

BEAST:

Bayesian MCMC phylogenetic analysis of molecular sequences for reconstructing phylogenies and testing evolutionary hypotheses.

BEDOPS:

BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale.

BEDTools:

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.

BEEF:

BEEF is a library implementing the Bayesian Error Estimation Functional, a description of which can be found here:

http://dx.doi.org/10.1103/PhysRevB.85.235149

BGC-Bayesian-genomic-clines:

Collection of code for Bayesian genomic cline analyses.

BLAST:

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

BLASTDB:

BLAST databases downloaded from NCBI.

BLAT:

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.

BLIS:

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries.

BOLT-LMM:

The BOLT-LMM algorithm for mixed model association testing, and the BOLT-REML algorithm for variance components analysis

BRAKER:

BRAKER is a pipeline for fully automated prediction of protein coding genes with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes.

BUSCO:

Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs

BWA:

Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.

BamTools:

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

Bandage:

Bandage is a program for visualising de novo assembly graphs

Basilisk:

Basilisk is a Free Software program for the solution of partial differential equations on adaptive Cartesian meshes.

BayPass:

Genome-Wide Scan for Adaptive Differentiation and Association Analysis with population-specific covariables

BayeScan:

Identify candidate loci under natural selection from genetic data, using differences in allele frequencies between populations.

BayesAss:

Program for inference of recent immigration rates between populations using unlinked multilocus genotypes

Bazel:

Bazel is a build tool that builds code quickly and reliably. It is used to build the majority of Google's software.

Beagle:

Package for phasing genotypes and for imputing ungenotyped markers.

BiG-SCAPE:

Constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs).

Bifrost:

Highly parallel construction, indexing and querying of colored and compacted de Bruijn graphs.
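
The underlying data structure can be sketched in a few lines. This is a plain node-centric de Bruijn graph over the forward strand only (the function `de_bruijn_graph` is hypothetical). Bifrost additionally colours k-mers by the input they came from, compacts non-branching paths into unitigs, and works with canonical k-mers; the sketch omits all three.

```python
def de_bruijn_graph(reads, k):
    """Build a plain (uncoloured, uncompacted) node-centric de Bruijn graph.

    Nodes are the distinct k-mers in the reads; an edge joins two k-mers
    when the last k-1 bases of the first equal the first k-1 of the second.
    Only the forward strand is considered.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = {}
    for kmer in kmers:
        # Candidate successors share a (k-1)-base overlap with this k-mer.
        successors = {kmer[1:] + base for base in "ACGT"} & kmers
        if successors:
            graph[kmer] = successors
    return graph


print(de_bruijn_graph(["ACGA", "ACGT"], 3))
```

In this example ACG has two successors (CGA and CGT), a branching node; compaction would merge only the runs of nodes between such branches.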

Bio-DB-BigFile:

Read BigWig and BigBed genome feature databases

Bio-DB-HTS:

Read files using HTSlib including BAM/CRAM, Tabix and BCF database files

BioPP:

Bio++ is a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Bio++ is Object Oriented and is designed to be both easy to use and computationally efficient. Bio++ intends to help programmers write computationally expensive programs by providing them with a set of re-usable tools.

Bismark:

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

Bison:

Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables.

BlenderPy:

Blender provides a pipeline for 3D modeling, rigging, animation, simulation, rendering, compositing, motion tracking, video editing and 2D animation.
This particular build of Blender provides a Python package 'bpy' rather than the stand-alone application.

Boost:

Boost provides free peer-reviewed portable C++ source libraries.

Bowtie:

Ultrafast, memory-efficient short read aligner.

Bowtie2:

Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Bpipe:

A platform for running big bioinformatics jobs that consist of a series of processing stages

Bracken:

Highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

BreakSeq2:

Nucleotide-resolution analysis of structural variants

CCL:

Clozure CL (often called CCL for short) is a free Common Lisp implementation

CD-HIT:

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

CDO:

CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.

CFITSIO:

CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.

CGAL:

The goal of the CGAL Open Source Project is to provide easy access to efficient and reliable geometric algorithms in the form of a C++ library.

CMake:

CMake, the cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.

CNVnator:

Copy Number Variation discovery and genotyping from depth of read mapping.

CNVpytor:

Python package and command line tool for CNV/CNA analysis from depth-of-coverage of mapped reads.

COMSOL:

COMSOL is a multiphysics solver that provides a unified workflow for electrical, mechanical, fluid, and chemical applications.

CONCOCT:

Program for unsupervised binning of metagenomic contigs by using nucleotide composition, coverage data in multiple samples and linkage data from paired end reads.

CP2K:

CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.

CPMD:

The CPMD code is a parallelized plane wave / pseudopotential implementation of DFT, particularly designed for ab-initio molecular dynamics.

CPU:

Electronic circuitry that executes instructions of a computer program.

CRABS:

Creating Reference databases for Amplicon-Based Sequencing.

CRAMINO:

A tool for quick quality assessment of CRAM and BAM files, intended for long-read sequencing.

CTPL:

C++ Thread Pool Library

CUDA:

CUDA (formerly Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.

CUnit:

Automated testing framework for C.

Canu:

Sequence assembler designed for high-noise single-molecule sequencing.

CapnProto:

Fast data interchange format and capability-based RPC system.

Catch2:

A modern, C++-native, header-only, test framework for unit-tests, TDD and BDD - using C++11, C++14, C++17 and later (or C++03 on the Catch1.x branch)

CellRanger:

Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

Centrifuge:

Classifier for metagenomic sequences

Cereal:

C++11 serialization library

CheckM:

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

CheckM2:

Rapid assessment of genome bin quality using machine learning

CheckV:

Assess the quality of metagenome-assembled viral genomes.

Circlator:

A tool to circularize genome assemblies

Circos:

Package for visualizing data in a circular layout - this makes Circos ideal for exploring relationships between objects or positions.

Clair3:

Symphonizing pileup and full-alignment for high-performance long-read variant calling.

Clang:

C, C++, Objective-C compiler, based on LLVM. Does not include C++ standard library -- use libstdc++ from GCC.

Clustal-Omega:

Clustal Omega is a multiple sequence alignment program for proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be viewed as cladograms or phylograms.

ClustalW2:

ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins.

Corset:

Clusters contigs and counts reads from de novo assembled transcriptomes.

CoverM:

DNA read coverage and relative abundance calculator focused on metagenomics applications

CppUnit:

C++ port of the JUnit framework for unit testing.

CubeLib:

Cube general purpose C++ library component and command-line tools.

CubeWriter:

Cube high-performance C writer library component.

Transcript assembly, differential expression, and differential regulation for RNA-Seq

Cytoscape:

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

D-Genies:

D-Genies can also display dot plots produced by other aligners from uploaded PAF or MAF alignment files.

DAS_Tool:

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

DB:

Berkeley DB enables the development of custom data management solutions, without the overhead traditionally associated with such custom projects.

DBus:

D-Bus is a message bus system, a simple way for applications to talk to one another. In addition to interprocess communication, D-Bus helps coordinate process lifecycle; it makes it simple and reliable to code a "single instance" application or daemon, and to launch applications and daemons on demand when their services are needed.

DFT-D4:

Generally Applicable Atomic-Charge Dependent London Dispersion Correction.

DIAMOND:

Sequence aligner for protein and translated DNA searches

DISCOVARdenovo:

Assembler suitable for large genomes based on Illumina reads of length 250 or longer.

DOI:

A unique identifier that identifies digital objects. The object may change physical locations, but the DOI assigned to that object will never change.

DRAM:

Tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs.

DaliLite:

Tool for comparing protein structures, based on the Dali structural alignment method.

DeconSeq:

A tool that can be used to automatically detect and efficiently remove sequence contaminations from genomic and metagenomic datasets.

DeepLabCut:

Efficient method for 3D markerless pose estimation based on transfer learning with deep neural networks.

Delft3D:

Integrated simulation of sediment transport and morphology, waves, water quality and ecology.

Delft3D_FM:

3D modeling suite to investigate hydrodynamics, sediment transport and morphology and water quality for fluvial, estuarine and coastal environments

Delly:

Structural variant discovery by integrated paired-end and split-read analysis

Dorado:

High-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.

Doxygen:

Doxygen is a documentation system for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D.

Dsuite:

Fast calculation of the ABBA-BABA statistics across many populations/species

EDTA:

Automated whole-genome de-novo TE annotation and benchmarking the annotation performance of TE libraries.

EIGENSOFT:

The EIGENSOFT package combines functionality from the population genetics methods of Patterson et al. (2006) and the EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.

ELPA:

Eigenvalue SoLvers for Petaflop-Applications.

EMAN2:

Greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes

EMBOSS:

EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.

ENMTML:

R package for integrated construction of Ecological Niche Models.

ESMF:

The Earth System Modeling Framework (ESMF) is software for building and coupling weather, climate, and related models.

ETE:

A Python framework for the analysis and visualization of phylogenetic trees

Eigen:

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

Embree:

Embree is a collection of high-performance ray tracing kernels, developed at Intel. The target users of Embree are graphics application engineers who want to improve the performance of their photo-realistic rendering application by leveraging Embree's performance-optimized ray tracing kernels.

EukRep-EukCC:

Completeness and contamination estimator for metagenomic assembled microbial eukaryotic genomes. Also contains smetana, carveme and memote.

ExaBayes:

Bayesian tree inference, particularly suitable for large-scale analyses.

ExaML:

Exascale Maximum Likelihood for phylogenetic inference using MPI.

ExpansionHunter:

Tool for estimating repeat sizes

Extrae:

Extrae is capable of instrumenting applications based on MPI, OpenMP, pthreads, CUDA, OpenCL, and StarSs using different instrumentation approaches.

FALCON:

Falcon: a set of tools for fast aligning long reads for consensus and assembly

FASTX-Toolkit:

Tools for Short-Reads FASTA/FASTQ files preprocessing.

FCM:

FCM Build - A powerful build system for modern Fortran software applications. FCM Version Control - Wrappers to the Subversion version control system, usage conventions and processes for scientific software development.

FDS:

Fire Dynamics Simulator (FDS) is a large-eddy simulation (LES) code for low-speed flows, with an emphasis on smoke and heat transport from fires.

FFTW:

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.
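
For reference, the quantity FFTW's forward transform computes is the unnormalized DFT, $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$. The direct $O(N^2)$ evaluation below (the function `naive_dft` is hypothetical) is only meant to show that definition; FFTW produces the same values with fast $O(N \log N)$ algorithms.

```python
import cmath


def naive_dft(x):
    """Unnormalized forward DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).

    Direct O(N^2) reference implementation, for illustration only.
    """
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


# A constant signal puts all of its energy in the k = 0 (DC) bin.
print([round(abs(v), 6) for v in naive_dft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

Because the transform is unnormalized, a forward transform followed by a backward one returns the input scaled by N.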

FFTW.MPI:

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.

FFmpeg:

A complete, cross-platform solution to record, convert and stream audio and video.

FIGARO:

An efficient and objective tool for optimizing microbiome rRNA gene trimming parameters.

FLTK:

FLTK is a cross-platform C++ GUI toolkit for UNIX/Linux (X11), Microsoft Windows, and MacOS X. FLTK provides modern GUI functionality without the bloat and supports 3D graphics via OpenGL and its built-in GLUT emulation.

FTGL:

FTGL is a free open source library to enable developers to use arbitrary fonts in their OpenGL (www.opengl.org) applications.

FastANI:

Tool for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI).

FastME:

FastME: a comprehensive, accurate and fast distance-based phylogeny inference program.

FastQC:

A quality control tool for high-throughput sequence data.

FastQ_Screen:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

FastTree:

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.

File-Rename:

A Perl version of the rename utility, with support for regular expressions.

FileSender:

Send large files quickly and securely using REANNZ FileSender.

Filtlong:

Tool for filtering long reads by quality.

FimTyper:

Identifies the FimH type in total or partial sequenced isolates of E. coli.

FlexiBLAS:

FlexiBLAS is a wrapper library that enables the exchange of the BLAS and LAPACK implementation used by a program without recompiling or relinking it.

Flye:

Flye is a de novo assembler for long and noisy reads, such as those produced by PacBio and Oxford Nanopore Technologies.

FragGeneScan:

FragGeneScan is an application for finding (fragmented) genes in short reads.

FreeBayes:

Genetic variant detector designed to find polymorphisms smaller than the length of a short-read sequencing alignment.

FreeSurfer:

FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data. FreeSurfer contains a fully automatic structural imaging stream for processing cross sectional and longitudinal data.

FreeXL:

FreeXL is an open source library to extract valid data from within an Excel (.xls) spreadsheet.

FriBidi:

Free Implementation of the Unicode Bidirectional Algorithm.

GATK:

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GCC:

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).

GCCcore:

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).

GD:

Interface to Gd Graphics Library

GDAL:

GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing.

NOTE: By default, the GDAL IO cache uses 5% of total memory, which is usually more than necessary. This module sets GDAL_CACHEMAX=256 (256 MB), which should have no performance impact. Feel free to change it if necessary, using 'export GDAL_CACHEMAX=xxx' in your job script after loading the GDAL module.
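
To see why the 5% default is generous, the arithmetic below uses a hypothetical node with 512 GB of memory (the helper `gdal_default_cache_mb` is also hypothetical). It then shows the Python equivalent of the `export` in a job script, assuming the variable is set before GDAL is first used; GDAL treats small GDAL_CACHEMAX values like this one as megabytes.

```python
import os


def gdal_default_cache_mb(total_memory_mb):
    """Size of GDAL's default IO cache: 5% of total memory."""
    return 0.05 * total_memory_mb


# Hypothetical node with 512 GB (524288 MB) of memory.
print(round(gdal_default_cache_mb(524288)))  # 26214 MB, i.e. about 25.6 GB

# Python equivalent of 'export GDAL_CACHEMAX=512'; set before GDAL is used.
os.environ["GDAL_CACHEMAX"] = "512"
```

On such a node the default cache would reserve roughly 100 times the 256 MB that the module configures.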

GEMMA:

Genome-wide Efficient Mixed Model Association

GEOS:

GEOS (Geometry Engine - Open Source) is a C++ port of the Java Topology Suite (JTS)

GLM:

OpenGL Mathematics (GLM) is a header only C++ mathematics library for graphics software based on the OpenGL Shading Language (GLSL) specifications.

GLPK:

GNU Linear Programming Kit is intended for solving large-scale linear programming (LP), mixed integer programming (MIP), and other related problems.

GLib:

GLib is one of the base libraries of the GTK+ project

GMAP-GSNAP:

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences. GSNAP: Genomic Short-read Nucleotide Alignment Program.

GMP:

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.

GOLD:

A genetic algorithm for docking flexible ligands into protein binding sites

GObject-Introspection:

GObject introspection is a middleware layer between C libraries (using GObject) and language bindings. The C library can be scanned at compile time and generate a metadata file, in addition to the actual native C library. Then at runtime, language bindings can read this metadata and automatically provide bindings to call into the C library.

GPAW:

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). It uses real-space uniform grids and multigrid methods or atom-centered basis-functions.

GPFS:

High-performance clustered file system software developed by IBM.

GRASS:

The Geographic Resources Analysis Support System - used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization

GRIDSS:

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

GROMACS:

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

This is a GPU enabled build, containing both MPI and threadMPI binaries.

GSL:

The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting.

GST-plugins-base:

GStreamer plug-ins and elements.

GStreamer:

Library for constructing graphs of media-handling components.

GTDB-Tk:

A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.

GTK+:

GTK+ is the primary library used to construct user interfaces in GNOME.

GTS:

GTS stands for the GNU Triangulated Surface Library. It is an Open Source Free Software Library intended to provide a set of useful functions to deal with 3D surfaces meshed with interconnected triangles.

GUI:

A digital interface in which a user interacts with graphical components such as icons, buttons, and menus.

GUSHR:

Assembly-free construction of UTRs from short read RNA-Seq data on the basis of coding sequence annotation.

Gdk-Pixbuf:

The Gdk Pixbuf is a toolkit for image loading and pixel buffer manipulation. It is used by GTK+ 2 and GTK+ 3 to load and manipulate images. In the past it was distributed as part of GTK+ 2 but it was split off into a separate package in preparation for the change to GTK+ 3.

GeneMark-ES:

Eukaryotic gene prediction suite with automatic training

GenoVi:

Generates circular genome representations for complete, draft, and multiple bacterial and archaeal genomes.

GenomeThreader:

GenomeThreader is a software tool to compute gene structure predictions.

GetOrganelle:

Toolkit to assemble organelle genomes from genome skimming data.

GlimmerHMM:

Gene finder based on a Generalized Hidden Markov Model.

Go:

An open source programming language

Graphviz:

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.

Gubbins:

Genealogies Unbiased By recomBinations In Nucleotide Sequences

HDF:

HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines.

HDF5:

HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.

HISAT2:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).

HMMER:

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.

HMMER2:

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.

HOPS:

Pipeline which focuses on screening MALT data for the presence of a user-specified list of target species.

HPC:

Like a regular computer, but larger. Primarily used for heating data centers.

HTSeq:

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSlib:

A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix

HarfBuzz:

HarfBuzz is an OpenType text shaping engine.

HpcGridRunner:

HPC GridRunner is a simple command-line interface to high throughput computing using a variety of different grid computing platforms, including LSF, SGE, SLURM, and PBS.

Humann:

Pipeline for efficiently and accurately determining the coverage and abundance of microbial pathways in a community from metagenomic data.

HybPiper:

Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment.

Hypre:

Hypre is a library for solving large, sparse linear systems of equations on massively parallel computers. The problems of interest arise in the simulation codes being developed at LLNL and elsewhere to study physical phenomena in the defense, environmental, energy, and biological sciences.

ICU:

C/C++ and Java libraries providing Unicode and Globalization support for software applications.

IDBA-UD:

IDBA-UD is an iterative De Bruijn Graph De Novo Assembler for Short Reads Sequencing data with Highly Uneven Sequencing Depth.

IGV:

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data

IMPUTE:

Genotype imputation and haplotype phasing.

IQ-TREE:

Efficient phylogenomic software by maximum likelihood

IRkernel:

R packages providing an R kernel for Jupyter.

ISA-L:

Intelligent Storage Acceleration Library

ImageMagick:

Create, edit, compose, or convert bitmap images

Infernal:

Infernal ('INFERence of RNA ALignment') is for searching DNA sequence databases for RNA structure and sequence similarities.

Inspector:

Intel Inspector XE is an easy-to-use memory error checker and thread checker for serial and parallel applications.

InterProScan:

Sequence analysis application (nucleotide and protein sequences) that combines different protein signature recognition methods into one resource.

JAGS:

Just Another Gibbs Sampler - a program for the statistical analysis of Bayesian hierarchical models by Markov Chain Monte Carlo.

JUnit:

A programmer-oriented testing framework for Java.

JasPer:

The JasPer Project is an open-source initiative to provide a free software-based reference implementation of the codec specified in the JPEG-2000 Part-1 standard.

Java:

Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers.

Jellyfish:

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
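
To show what k-mer counting means, here is a minimal Python sketch. It is not Jellyfish's implementation. It counts every length-k substring of a DNA string with an ordinary in-memory `Counter`. It can optionally merge each k-mer with its reverse complement ("canonical" k-mers, similar in spirit to Jellyfish's `-C` option). Jellyfish itself uses a multithreaded, memory-efficient hash table. The function names `canonical` and `count_kmers` are hypothetical. The input is assumed to be an uppercase ACGT string in which `N` marks ambiguous bases.

```python
from collections import Counter

# Complement table for the four DNA bases (assumption: uppercase ACGT input, N = ambiguous).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq: str, k: int, canonical_only: bool = True) -> Counter:
    """Count every length-k substring of seq, optionally merging reverse complements."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if "N" in kmer:  # skip k-mers that span an ambiguous base
            continue
        counts[canonical(kmer) if canonical_only else kmer] += 1
    return counts


if __name__ == "__main__":
    # ACG/CGT and GTA/TAC are reverse-complement pairs, so they are merged.
    print(count_kmers("ACGTACGT", 3))
```

Counting canonical k-mers makes the result independent of which DNA strand a read came from. That is usually what is wanted for reads from double-stranded DNA.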

JsonCpp:

JsonCpp is a C++ library that allows manipulating JSON values, including serialization and deserialization to and from strings. It can also preserve existing comments during deserialization and serialization, making it a convenient format to store user input files.

JupyterLab:

An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and Architecture.

KAT:

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.

KEALib:

KEALib provides an implementation of the GDAL data model. The format supports raster attribute tables, image pyramids, meta-data and in-built statistics while also handling very large files and compression throughout. Based on the HDF5 standard, it also provides a base from which other formats can be derived and is a good choice for long term data archiving. An independent software library (libkea) provides complete access to the KEA image format and a GDAL driver allowing KEA images to be used from any GDAL supported software.

KMC:

Disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.

Kaiju:

Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing experiments

Kent_tools:

Collection of tools used by the UCSC genome browser.

KmerGenie:

KmerGenie estimates the best k-mer length for genome de novo assembly.

KorfSNAP:

Semi-HMM-based Nucleic Acid Parser

Kraken2:

Taxonomic sequence classifier.

KronaTools:

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

KyotoCabinet:

Library of routines for managing a database.

LAME:

LAME is a high quality MPEG Audio Layer III (MP3) encoder licensed under the LGPL.

LAMMPS:

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

LAST:

LAST finds similar regions between sequences.

LASTZ:

LASTZ is a pairwise aligner for DNA sequences. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS technologies such as Roche 454.

LDC:

D programming language compiler

LEfSe:

Determines the features most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance

LINKS:

Alignment-free scaffolding of genome assembly drafts with long reads

LLVM:

The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones!) These libraries are built around a well specified code representation known as the LLVM intermediate representation ("LLVM IR"). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator.

LMDB:

LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance of a pure in-memory database while retaining the persistence of standard disk-based databases.

LSD2:

Least-squares methods to estimate rates and dates from phylogenies

LTR_retriever:

Highly accurate and sensitive program for identification of LTR retrotransposons; the LTR Assembly Index (LAI) is also included in this package.

LUMPY:

A probabilistic framework for structural variant discovery.

LZO:

Portable lossless data compression library

LibTIFF:

Library and tools for reading and writing TIFF data files

Libint:

The Libint library is used to evaluate the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.

Liftoff:

Tool that accurately maps annotations in GFF or GTF between assemblies of the same, or closely-related species.

LittleCMS:

Color management engine.

LongStitch:

A genome assembly correction and scaffolding pipeline using long reads

M4:

GNU M4 is an implementation of the traditional Unix macro processor. It is mostly SVR4 compatible although it has some extensions (for example, handling more than 9 positional parameters to macros). GNU M4 also has built-in functions for including files, running shell commands, doing arithmetic, etc.

MAFFT:

Multiple sequence alignment program offering a range of methods.

MAGMA:

Tool for gene analysis and generalized gene-set analysis of GWAS data.

MAKER:

Genome annotation pipeline

MATIO:

matio is a C library for reading and writing MATLAB MAT files.

MATLAB:

A high-level language and interactive environment for numerical computing.

MCL:

The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
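
MCL's simulated flow is driven by two alternating operations, which can be sketched in plain Python. This is a minimal, unoptimised illustration under stated assumptions, not the `mcl` program's implementation:

- the graph is a small symmetric adjacency matrix, and self-loops are added;
- expansion is a matrix power;
- inflation is an element-wise power followed by column renormalisation.

The function names and the defaults (`expansion=2`, `inflation=2.0`) are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column so it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Cluster an undirected graph given as a symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn edge weights into transition probabilities.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        previous = m
        # Expansion: raise the matrix to the given power, so flow spreads along paths.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: element-wise power, then renormalise, so strong flow gets stronger.
        m = normalize_columns([[v ** inflation for v in row] for row in expanded])
        if max(abs(m[i][j] - previous[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Each non-zero row of the converged matrix lists the nodes that flow into it.
    clusters: List[List[int]] = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    two_triangles = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

A higher inflation value generally yields more, smaller clusters. Overlapping clusters, which can occur in edge cases, are not resolved by this sketch.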

MCR:

The Matlab Compiler Runtime is required for running compiled MATLAB executables without MATLAB itself.

MEGAHIT:

An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

METABOLIC:

Metabolic And Biogeochemistry anaLyses In microbes

METIS:

METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.

MMseqs2:

Ultra-fast and sensitive search and clustering suite.

MODFLOW:

MODFLOW is the U.S. Geological Survey modular finite-difference flow model, which is a computer code that solves the groundwater flow equation. The program is used by hydrogeologists to simulate the flow of groundwater through aquifers.

MPFR:

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.

MPI:

A standardised message-passing interface designed to function on parallel computing architectures.

MSMC:

Multiple Sequentially Markovian Coalescent, infers population size and gene flow from multiple genome sequences

MUMPS:

A parallel sparse direct solver

MUMmer:

MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.

MUSCLE:

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options is provided that give you the choice of optimizing accuracy, speed, or some compromise between the two.

MUST:

MUST detects usage errors of the Message Passing Interface (MPI) and reports them to the user.

MaSuRCA:

MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454, PacBio and Nanopore).

Magma:

Magma is a large, well-supported software package designed for computations in algebra, number theory, algebraic geometry and algebraic combinatorics. It provides a mathematically rigorous environment for defining and working with structures such as groups, rings, fields, modules, algebras, schemes, curves, graphs, designs, codes and many others. Magma also supports a number of databases designed to aid computational research in those areas of mathematics which are algebraic in nature.

Homepage: http://magma.maths.usyd.edu.au/magma/

Mamba:

Mamba is a fast, robust, and cross-platform package manager.

MarkerMiner:

Workflow for effective discovery of single-copy nuclear (SCN) loci in flowering plants (angiosperms).

Mash:

Fast genome and metagenome distance estimation using MinHash
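
Mash's distance estimate can be sketched in Python as follows. The sketch assumes bottom-s MinHash sketches of plain (non-canonical) k-mers. Mash itself hashes canonical k-mers with MurmurHash3; this sketch swaps in BLAKE2b from the Python standard library, and all function names are hypothetical. The distance formula D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index, is the one described for Mash. The defaults k = 21 and sketch size 1000 mirror Mash's defaults.

```python
import hashlib
import math
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings (no reverse-complement merging in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer: str) -> int:
    """64-bit hash of a k-mer. BLAKE2b is a stand-in; Mash itself uses MurmurHash3."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq: str, k: int = 21, size: int = 1000) -> List[int]:
    """Bottom-s MinHash sketch: the `size` smallest distinct hash values."""
    return sorted({hash64(km) for km in kmer_set(seq, k)})[:size]


def jaccard_estimate(s1: List[int], s2: List[int], size: int = 1000) -> float:
    """Estimate the Jaccard index: the fraction of the merged bottom-s hashes found in both sketches."""
    union = sorted(set(s1) | set(s2))[:size]
    if not union:
        raise ValueError("both sketches are empty")
    shared = set(s1) & set(s2)
    return sum(1 for h in union if h in shared) / len(union)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); returns 1.0 by convention when j == 0."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Comparing two genomes then means sketching each once and comparing the small sketches. The full k-mer sets are never compared directly, which is what makes the approach fast.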

MashMap:

Implements a fast and approximate algorithm for computing local alignment boundaries between long DNA sequences

Mashtree:

Create a tree using Mash distances.

Maven:

Binary Maven install. Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

MaxBin:

MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm.

Merqury:

Evaluate genome assemblies with k-mers and more

Mesa:

Mesa is an open-source implementation of the OpenGL specification - a system for rendering interactive 3D graphics.

Note that this build enables CPU-based rendering with OpenSWR and LLVM. The module is intended to be used with visualisation software, such as ParaView, on nodes where no GPU hardware is available.

Both on-screen and off-screen rendering are supported.

Meson:

Meson is a cross-platform build system designed to be both as fast and as user friendly as possible.

MetaBAT:

An efficient tool for accurately reconstructing single genomes from complex microbial communities

MetaEuk:

MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs.

MetaGeneAnnotator:

MetaGeneAnnotator is a gene-finding program for prokaryotes and phages.

MetaPhlAn:

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) at species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.

MetaPhlAn2:

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) at species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.

MetaSV:

Structural-variant caller

Metaxa2:

Taxonomic classification of rRNA.

MiMiC:

MiMiC: A Framework for Multiscale Modeling in Computational Chemistry

This package includes mimicpy

MiMiC-CommLib:

The MiMiC communication library (MCL) enables communication between external programs coupled through the MiMiC framework.

Miniconda3:

A platform for Python-based data analytics

Miniforge3:

Community-led recipes, infrastructure and distributions for conda.

Minimac3:

Low-memory, more computationally efficient implementation of genotype imputation algorithms.

Minimac4:

Low-memory, more computationally efficient implementation of genotype imputation algorithms.

MitoZ:

Toolkit that automatically filters paired-end raw data, assembles the genome, searches the assembly for mitogenome sequences, annotates the mitogenome, and visualizes the result.

ModDotPlot:

Novel dot plot visualization tool used to view tandem repeats

ModelTest-NG:

Tool for selecting the best-fit model of evolution for DNA and protein alignments.

Molcas:

Molcas is an ab initio quantum chemistry software package developed by scientists to be used by scientists. The basic philosophy is to be able to treat general electronic structures for molecules consisting of atoms from most of the periodic table. As such, the primary focus of the package is on multiconfigurational methods with applications typically connected to the treatment of highly degenerate states.

Molpro:

Molpro is a complete system of ab initio programs for molecular electronic structure calculations.

Mono:

An open source, cross-platform implementation of C# and the CLR that is binary compatible with Microsoft .NET.

Monocle3:

An analysis toolkit for single-cell RNA-seq.

Mothur:

Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.

MrBayes:

MrBayes is a program for the Bayesian estimation of phylogeny.

MultiQC:

Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

NAMD:

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.

NASM:

NASM: General-purpose x86 assembler

NCCL:

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.

NCL:

NCL is an interpreted language designed specifically for scientific data analysis and visualization.

NCO:

Manipulates and analyzes data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5

NECAT:

Error correction and de novo assembly tool for Nanopore long noisy reads

NGS:

NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.

NIWA:

NIWA (National Institute of Water and Atmospheric Research) is a Crown Research Institute that conducts research across a broad range of disciplines in the environmental sciences.

NLopt:

NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms

NSPR:

Netscape Portable Runtime (NSPR) provides a platform-neutral API for system level and libc-like functions.

NSS:

Network Security Services (NSS) is a set of libraries designed to support cross-platform development of security-enabled client and server applications.

NVHPC:

C, C++ and Fortran compilers included with the NVIDIA HPC SDK (previously: PGI)

NWChem:

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters. NWChem software can handle: biomolecules, nanostructures, and solid-state; from quantum to classical, and all combinations; Gaussian basis functions or plane-waves; scaling from one to thousands of processors; properties and relativity.

NanoComp:

Comparing runs of Oxford Nanopore sequencing data and alignments

NanoLyse:

Removing reads mapping to the lambda genome.

NanoPlot:

Plotting suite for Oxford Nanopore sequencing data and alignments.

NanoStat:

Calculates statistics for Oxford Nanopore sequencing data and alignments.

NeSI:

New Zealand national high performance computing platform.

NewHybrids:

Implements a Gibbs sampler to estimate the posterior probability that genetically sampled individuals fall into each of a set of user-defined hybrid categories.

Newton-X:

Newton-X (NX) is a general-purpose program package for simulating the dynamics of electronically excited molecules and molecular assemblies.

NextGenMap:

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime.

NextPolish2:

A fast and efficient genome polishing tool for long-read assemblies

Nextflow:

Nextflow is a reactive workflow framework and a programming DSL that eases writing computational pipelines with complex data

Nim:

Nim is a systems and applications programming language.

Ninja:

Ninja is a small build system with a focus on speed.

Nsight-Compute:

NVIDIA® Nsight™ Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool.

Nsight-Systems:

NVIDIA® Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithm, help you select the largest opportunities to optimize, and tune to scale efficiently across any quantity of CPUs and GPUs

OBITools:

Manipulate various data and sequence files.

OMA:

The Orthologous MAtrix (OMA) project is a method and database for the inference of orthologs among complete genomes.

OPARI2:

Source-to-source instrumentation tool for OpenMP and hybrid codes. It surrounds OpenMP directives and runtime library calls with calls to the POMP2 measurement interface.

ORCA:

ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum chemistry with specific emphasis on spectroscopic properties of open-shell molecules. It features a wide variety of standard quantum chemical methods ranging from semiempirical methods to DFT to single- and multireference correlated ab initio methods. It can also treat environmental and relativistic effects.

ORCID:

A nonproprietary alphanumeric code to uniquely identify authors and contributors of scholarly communication, bibliographic output and other user-supplied pieces of information.

OSPRay:

OSPRay features interactive CPU rendering capabilities geared towards Scientific Visualization applications. Advanced shading effects such as Ambient Occlusion, shadows, and transparency can be rendered interactively, enabling new insights into data exploration.

OSU-Micro-Benchmarks:

OSU Micro-Benchmarks for MPI

OTF2:

The Open Trace Format 2 is a highly scalable, memory efficient event trace data format plus support library

OTP:

An automatically generated numeric code that authenticates a user for a single login.
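
Authenticator apps commonly generate one-time passwords with the time-based TOTP algorithm (RFC 6238). TOTP applies the HMAC-based HOTP algorithm (RFC 4226) to a counter derived from the current time. The minimal Python sketch below assumes that variant, with HMAC-SHA1, 30-second steps and 6 digits by default. The glossary does not say which OTP scheme a given service uses, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP with counter = floor(unix_time / step)."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC test secret; real secrets come from enrolment
    print(totp(secret))  # changes every 30 seconds
```

Because both sides derive the counter from the clock, the server and the app agree on the code without communicating. This is also why the device clock must be roughly correct.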

OpenBLAS:

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

OpenBabel:

Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

OpenCV:

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.

OpenFAST:

Wind turbine multiphysics simulation tool

OpenFOAM:

OpenFOAM is a free, open source CFD software package. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.

OpenJPEG:

An open-source JPEG 2000 codec written in C

OpenMPI:

The Open MPI Project is an open source MPI-3 implementation.

OpenSSL:

The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolkit implementing the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols as well as a full-strength general purpose cryptography library.

OpenSees:

Simulates the performance of structural and geotechnical systems subjected to earthquakes.

OpenSeesPy:

Python wrapper for OpenSees. An OpenSees module must also be loaded.

OpenSlide:

OpenSlide is a C library that provides a simple interface to read whole-slide images (also known as virtual slides).

OrfM:

A simple and not slow open reading frame (ORF) caller.

OrthoFiller:

Identifies missing annotations for evolutionarily conserved genes.

OrthoFinder:

OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics

OrthoMCL:

Genome-scale algorithm for grouping orthologous protein sequences.

PALEOMIX:

Pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data.

PAML:

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood.

PAPI:

PAPI provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events. In addition, Component PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.

PCRE:

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.

PCRE2:

PCRE2 is the revised and current version of the PCRE library, a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.

PDT:

Program Database Toolkit (PDT) is a framework for analyzing source code written in several programming languages and for making rich program knowledge accessible to developers of static and dynamic analysis tools.

PEAR:

Memory-efficient, fully parallelized and highly accurate paired-end read merger.

PEST++:

PEST++ is a software suite aimed at supporting complex numerical models in the decision-support context. Much focus has been devoted to supporting environmental models (groundwater, surface water, etc) but these tools are readily applicable to any computer model.

PETSc:

PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.

PHASIUS:

A tool to visualize phase block structure from (many) BAM or CRAM files together with BED annotation

PLINK:

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLUMED:

PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state of the art methods such as metadynamics, umbrella sampling and Jarzynski-equation based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.

POSIX:

A set of standard operating system interfaces based on the Unix operating system

PRANK:

Probabilistic multiple alignment program for DNA, codon and amino-acid sequences.

PROJ:

Program proj is a standard Unix filter function which converts geographic longitude and latitude coordinates into Cartesian coordinates.

PSpaMM:

Generates inline assembly for sparse matrix multiplication.

PUMI:

Parallel unstructured mesh infrastructure API

Pango:

Pango is a library for laying out and rendering of text, with an emphasis on internationalization. Pango can be used anywhere that text layout is needed, though most of the work on Pango so far has been done in the context of the GTK+ widget toolkit. Pango forms the core of text and font handling for GTK+-2.x.

ParMETIS:

ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes.

ParaView:

ParaView is a scientific parallel visualizer.

This version supports CPU-only rendering without an X context using the OSMesa library; it does not support GPU rendering.

Use the GALLIUM_DRIVER environment variable to choose a software renderer. For best performance, it is recommended to set

GALLIUM_DRIVER=swr

Ray tracing using the OSPRay library is also supported.

Parallel:

Build and execute shell commands in parallel

ParallelIO:

A high-level Parallel I/O Library for structured grid applications

Peregrine:

Genome assembler for long reads (length > 10kb, accuracy > 99%). Based on Sparse HIerarchical MiniMizER (SHIMMER) for fast read-to-read overlapping.

Perl:

Larry Wall's Practical Extraction and Report Language

PhyML:

Phylogenetic estimation using Maximum Likelihood

PhyloPhlAn:

Integrated pipeline for large-scale phylogenetic profiling of genomes and metagenomes.

Pilon:

Pilon is an automated genome assembly improvement and variant detection tool

PnetCDF:

Parallel netCDF: A Parallel I/O Library for NetCDF File Access

Porechop:

Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity

Porechop_ABI:

Extension of Porechop whose purpose is to process adapter sequences in ONT reads

PostgreSQL:

Object-relational database system.

Prodigal:

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.

ProtHint:

Pipeline for predicting and scoring hints (in the form of introns, start and stop codons) in the genome of interest by mapping and spliced aligning predicted genes to a database of reference protein sequences.

Proteinortho:

Proteinortho is a tool to detect orthologous genes within different species.

PyOpenGL:

PyOpenGL is the most common cross platform Python binding to OpenGL and related APIs.

PyQt:

PyQt5 is a set of Python bindings for v5 of the Qt application framework from The Qt Company. This bundle includes PyQtWebEngine, a set of Python bindings for The Qt Company’s Qt WebEngine framework.

PyTorch:

Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.

Python:

Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Python-Geo:

GDAL, pyModis, RIOS, Fiona, Shapely, descartes and pygrib - Python packages for geospatial data I/O, mostly based on the OSGEO libraries GDAL and OGR

QIIME2:

An open-source bioinformatics pipeline for microbiome analysis from raw DNA sequencing data.

QUAST:

Evaluates genome assemblies

Qt5:

Qt is a comprehensive cross-platform C++ application framework.

QuantumESPRESSO:

Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft).

QuickTree:

Efficient implementation of the Neighbor-Joining algorithm, capable of reconstructing phylogenies from huge alignments.

R:

R is a free software environment for statistical computing and graphics.

R-Geo:

R packages for Geometric and Geospatial data which depend on GEOS and/or GDAL.

R-bundle-Bioconductor:

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.

RAxML:

RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.

RAxML-NG:

RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to quickly navigate to the best-known ML tree.

RDP-Classifier:

The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic assignments from domain to genus, with confidence estimates for each assignment.

RE2:

Fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

RECON:

De novo identification and classification of repeat sequence families from genomic sequences

REViewer:

Tool for visualizing alignments of reads in regions containing tandem repeats

RFPlasmid:

Predicting plasmid contigs from assemblies

RFdiffusion:

Protein structure generation, with or without conditional information (a motif, target, etc.). It can tackle a wide range of protein design challenges, as outlined in the RFdiffusion paper.

RMBlast:

RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program. These include support for custom matrices (without KA-Statistics), support for cross_match-like complexity-adjusted scoring (cross_match is Phil Green's seeded Smith-Waterman search algorithm), and support for cross_match-like masklevel filtering.

RNAmmer:

Consistent and rapid annotation of ribosomal RNA genes.

ROCm:

Platform for GPU Enabled HPC and UltraScale Computing

ROOT:

The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way.

RSEM:

Estimates gene and isoform expression levels from RNA-Seq data

RSGISLib:

The Remote Sensing and GIS software library (RSGISLib) is a collection of tools for processing remote sensing and GIS datasets. The tools are accessed using Python bindings or an XML interface.

RStudio-Server:

RStudio-Server for OpenOnDemand.

Racon:

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.

Ragout:

Tool for chromosome assembly using multiple references.

RapidNJ:

An algorithmic engineered implementation of canonical neighbour-joining.
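
Canonical neighbour-joining repeatedly joins the pair of nodes that minimises the Q criterion and replaces them with a new internal node. The following Python sketch is the plain textbook algorithm with an exhaustive O(n^3) search. RapidNJ's accelerated search for the minimising pair is not reproduced here. The function name and return format are hypothetical, at least two taxa are assumed, and ties are broken by taking the first pair encountered.

```python
from typing import Dict, List, Sequence, Tuple

# (left, right, left_branch_length, right_branch_length, new_node_name)
Join = Tuple[str, str, float, float, str]


def neighbour_joining(names: Sequence[str], dist: Sequence[Sequence[float]]
                      ) -> Tuple[List[Join], Tuple[str, str, float]]:
    """Textbook neighbour-joining on a symmetric distance matrix.

    Returns the list of joins and the final edge connecting the last two nodes.
    """
    d: Dict[Tuple[str, str], float] = {
        (a, b): float(dist[i][j]) for i, a in enumerate(names) for j, b in enumerate(names)
    }
    nodes = list(names)
    joins: List[Join] = []
    while len(nodes) > 2:
        n = len(nodes)
        row_sum = {a: sum(d[(a, b)] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - R(a) - R(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[(a, b)] - row_sum[a] - row_sum[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * d[(a, b)] + (row_sum[a] - row_sum[b]) / (2 * (n - 2))
        len_b = d[(a, b)] - len_a
        u = f"node{len(joins) + 1}"
        d[(u, u)] = 0.0
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[(u, c)] = d[(c, u)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b, u))
    last_a, last_b = nodes
    return joins, (last_a, last_b, d[(last_a, last_b)])


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    joins, final_edge = neighbour_joining(taxa, matrix)
    for join in joins:
        print(join)
    print(final_edge)
```

Every iteration rescans all pairs, which is the cost that faster implementations such as RapidNJ aim to reduce on large inputs.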

Ratatosk:

Phased hybrid error correction of long reads using colored de Bruijn graphs

Raven:

De novo genome assembler for long uncorrected reads.

Rcorrector:

K-mer-based error correction method for RNA-seq data.

Relion:

RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).

RepeatMasker:

Screens DNA sequences for interspersed repeats and low-complexity DNA sequences.
For licensing reasons, RepBase is not included; instead, you must set LIBDIR to point to a directory that contains your copy of it.

RepeatModeler:

De novo transposable element (TE) family identification and modeling package.

RepeatScout:

De novo identification of repeat families in large genomes

Riskscape:

RiskScape is an open-source spatial data processing application used for multi-hazard risk analysis. RiskScape is highly customisable, letting modellers tailor the risk analysis to suit the problem domain and input data being modelled.

Roary:

Rapid large-scale prokaryote pan genome analysis

Rosetta:

Rosetta is the premier software suite for modeling macromolecular structures. As a flexible, multi-purpose application, it includes tools for structure prediction, design, and remodeling of proteins and nucleic acids.

Ruby:

Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.

Rust:

Systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.

SAGE:

Package containing programs for use in the genetic analysis of family, pedigree and individual data.

SAMtools:

SAMtools is a suite of programs for interacting with high-throughput sequencing data, including reading, writing, editing, indexing and viewing files in SAM/BAM/CRAM format.

SAS:

SAS is a statistical software suite developed by SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics. - Homepage: https://www.sas.com/en_nz/home.html/

SCOTCH:

Software package and libraries for sequential and parallel graph partitioning, static mapping, and sparse matrix block ordering, and sequential mesh and hypergraph partitioning.

SCP:

Means of securely transferring files between hosts over an SSH connection.

SDL2:

Simple DirectMedia Layer, a cross-platform multimedia library

SEPP:

SATe-enabled Phylogenetic Placement. Phylogenetic placement of short reads into reference alignments and trees.

SHAPEIT4:

Estimation of haplotypes (aka phasing) for SNP array and high coverage sequencing data.

SIONlib:

Scalable I/O library for parallel access to task-local files.

SIP:

SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries.

SKESA:

SKESA is a de novo sequence read assembler for cultured single-isolate genomes, based on de Bruijn graphs.

PacBio’s open-source software suite is designed for use with Single Molecule, Real-Time (SMRT) Sequencing data.

SNVoter-NanoMethPhase:

SNVoter: a top-up tool to enhance SNV calling from Nanopore sequencing data. NanoMethPhase: phases long reads and CpG methylation from Oxford Nanopore Technologies sequencing data.

SOCI:

Database access library for C++ that creates the illusion of embedding SQL queries in regular C++ code, while staying entirely within Standard C++.

SPAdes:

Genome assembler for single-cell and isolate data sets.

SPIDER:

System for Processing Image Data from Electron microscopy and Related fields

SQLite:

SQLite: SQL Database Engine in a C Library

SSAHA2:

Pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.

SSH:

A cryptographic network protocol that enables two computers to communicate securely over an unsecured network.

STAR:

Fast universal RNA-seq aligner

STAR-Fusion:

Processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set

SUNDIALS:

SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers

SURVIVOR:

Tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.

SWIG:

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.

Salmon:

Salmon is a wicked-fast program that produces highly accurate, transcript-level quantification estimates from RNA-seq data.

Sambamba:

Tools for working with SAM/BAM data

ScaLAPACK:

The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers.

SeisSol:

SeisSol is a software package for simulating wave propagation and dynamic rupture based on the arbitrary high-order accurate derivative discontinuous Galerkin method (ADER-DG).

SeqAn:

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.

SeqAn3:

C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.

SeqKit:

Ultrafast toolkit for FASTA/Q file manipulation

SiBELia:

A comparative genomics tool for analysing genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt to different environments.

Siesta:

SIESTA is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.

SignalP:

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms

Singularity:

Singularity is a portable application stack packaging and runtime utility.

Sniffles:

A fast structural variant caller for long-read sequencing.

SortMeRNA:

SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.

SourceTracker:

SourceTracker is a Bayesian approach to estimating the proportion of a novel community that comes from a set of source environments.

Spack:

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.

Spark:

Apache Spark is a cluster computing framework that performs Hadoop MapReduce-style processing in memory.

SpectrA:

C++ library for large scale eigenvalue problems, built on top of Eigen, an open source linear algebra library.

Spectrum Scale:

High-performance clustered file system software developed by IBM.

SqueezeMeta:

Fully automated metagenomics pipeline, from reads to bins.

Stacks:

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

StringTie:

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.

Structure:

The program structure is a free software package for using multi-locus genotype data to investigate population structure.

Subread:

High performance read alignment, quantification and mutation discovery

Subversion:

Subversion is an open source version control system. Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.

SuiteSparse:

SuiteSparse is a collection of libraries for manipulating sparse matrices.

SuperLU:

Solution of large, sparse, nonsymmetric systems of linear equations.

Supernova:

Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a single whole-genome library from an individual DNA source

Szip:

Szip compression software, providing lossless compression of scientific data

TEtranscripts:

Takes RNA-seq (and similar data) and annotates reads to both genes & transposable elements.

TMHMM:

Prediction of transmembrane helices in proteins

TOGA:

Implements a novel machine learning based paradigm to infer orthologous genes between related species and to accurately distinguish orthologs from paralogs or processed pseudogenes.

TSEBRA:

Transcript Selector for BRAKER

TURBOMOLE:

Program Package For Electronic Structure Calculations.

TWL-NINJA:

Nearly Infinite Neighbor Joining Application.

Tcl:

Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more.

TensorFlow:

An open-source software library for Machine Intelligence

TensorRT:

NVIDIA TensorRT is a platform for high-performance deep learning inference

Theano:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Tk:

Tk is an open source, cross-platform widget toolkit that provides a library of basic elements for building a graphical user interface (GUI) in many different programming languages.

TransDecoder:

TransDecoder identifies candidate coding regions within transcript sequences.

TreeMix:

TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.

TrimGalore:

A wrapper around cutadapt and FastQC that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files.

Trimmomatic:

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line.

Trinity:

Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads.

Trinotate:

Comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes.

Trycycler:

Tool for generating consensus long-read assemblies for bacterial genomes.

TuiView:

TuiView is a lightweight raster GIS with powerful raster attribute table manipulation abilities.

TurboVNC:

TurboVNC is a derivative of VNC (Virtual Network Computing) that is tuned to provide peak performance for 3D and video workloads.

UCC:

UCC (Unified Collective Communication) is a collective communication operations API and library that is flexible, complete, and feature-rich for current and emerging programming models and runtimes.

UCX:

Unified Communication X: an open-source, production-grade communication framework for data-centric and high-performance applications.

UDUNITS:

UDUNITS supports conversion of unit specifications between formatted and binary forms, arithmetic manipulation of units, and conversion of values between compatible scales of measurement.

USEARCH:

USEARCH is a unique sequence analysis tool which offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

Unicycler:

Assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets where it functions as a SPAdes-optimiser.

VASP:

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VCF-kit:

VCF-kit is a command-line based collection of utilities for performing analysis on Variant Call Format (VCF) files.

VCFtools:

The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing and calculating some basic population genetic statistics.

VEP:

Variant Effect Predictor (VEP) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Includes EnsEMBL-XS, which provides pre-compiled replacements for frequently used routines in VEP.

VIBRANT:

Virus Identification By iteRative ANnoTation

VMD:

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

VPN:

Method of extending access to a private network.

VSEARCH:

An open source alternative to the metagenomics tool USEARCH.

Performs chimera detection, clustering, full-length and prefix dereplication, rereplication, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.

VTK:

The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including: scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques such as: implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.

VTune:

Intel VTune Amplifier XE is the premier performance profiler for C, C++, C#, Fortran, Assembly and Java.

Valgrind:

Valgrind: Debugging and profiling tools

VarScan:

Variant calling and somatic mutation/CNV detection for next-generation sequencing data

Velvet:

Sequence assembler for very short reads

VelvetOptimiser:

Perl script for optimising the three primary parameter options of the Velvet de novo sequence assembler.

ViennaRNA:

The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

Vim:

Vim is an advanced text editor that seeks to provide the power of the de-facto Unix editor 'Vi', with a more complete feature set.

VirHostMatcher:

Tools for computing various oligonucleotide frequency (ONF) based distance/dissimilarity measures.

VirSorter:

VirSorter: mining viral signal from microbial genomic data.

VirtualGL:

VirtualGL is an open source toolkit that gives any Linux or Unix remote display software the ability to run OpenGL applications with full hardware acceleration.

WAAFLE:

Workflow to Annotate Assemblies and Find LGT Events.

WhatsHap:

Tool for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly.

Winnowmap:

Winnowmap is a long-read mapping algorithm that resulted from its developers' exploration of superior minimizer sampling techniques.

Wise2:

Aligning proteins or protein HMMs to DNA

XHMM:

Calls copy number variation (CNV) from normalized read-depth data from exome capture or other targeted sequencing experiments.

XMDS2:

Fast integrator of stochastic partial differential equations.

XSD:

CodeSynthesis XSD is an open-source, cross-platform W3C XML Schema to C++ data binding compiler. Provided with an XML instance specification (XML Schema), it generates C++ classes that represent the given vocabulary as well as XML parsing and serialization code. You can then access the data stored in XML using types and functions that semantically correspond to your application domain rather than dealing with the intricacies of reading and writing XML

XVFB:

A display server implementing the X11 display server protocol, XVFB performs all graphical operations in virtual memory without showing any screen output. This allows applications that 'require' a GUI to run in a command line environment. Can be invoked with xvfb-run.

XZ:

xz: XZ utilities

Xerces-C++:

Xerces-C++ is a validating XML parser written in a portable subset of C++. Xerces-C++ makes it easy to give your application the ability to read and write XML data. A shared library is provided for parsing, generating, manipulating, and validating XML documents using the DOM, SAX, and SAX2 APIs.

YAXT:

Yet Another eXchange Tool

Yasm:

Yasm: Complete rewrite of the NASM assembler with BSD license

Z3:

A theorem prover from Microsoft Research.

ZeroMQ:

ZeroMQ looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fanout, pub-sub, task distribution, and request-reply.

Zip:

Zip is a compression and file packaging/archive utility. Although highly compatible both with PKWARE's PKZIP and PKUNZIP utilities for MS-DOS and with Info-ZIP's own UnZip, Zip's primary objectives have been portability and functionality beyond MS-DOS.

abritamr:

AMR gene detection pipeline that runs AMRFinderPlus on a single isolate or a list of isolates.

angsd:

Program for analysing NGS data.

ant:

Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications.

antiSMASH:

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.

any2fasta:

Convert various sequence formats to FASTA
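Converting FASTQ to FASTA, one of the conversions any2fasta handles, amounts to dropping the quality lines and swapping the '@' header marker for '>'. A minimal Python sketch, assuming well-formed four-line FASTQ records with no line wrapping (any2fasta itself accepts many more input formats); the function name fastq_to_fasta is hypothetical.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert four-line FASTQ records to FASTA, discarding quality scores."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected complete four-line FASTQ records")
    fasta = []
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        fasta.append(">" + header[1:])  # keep the read ID and description
        fasta.append(seq)
    return "\n".join(fasta) + "\n"


example = "@read1 sample=A\nACGTTGCA\n+\nIIIIHHHH\n@read2\nTTGACCA\n+\nIIIIIII\n"
print(fastq_to_fasta(example), end="")
```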

argtable:

Argtable is an ANSI C library for parsing GNU style command line options with a minimum of fuss.

aria2:

aria2 is a lightweight multi-protocol & multi-source command-line download utility.

arpack-ng:

ARPACK is a collection of Fortran77 subroutines designed to solve large scale eigenvalue problems.

at-spi2-atk:

AT-SPI 2 toolkit bridge

at-spi2-core:

Assistive Technology Service Provider Interface.

attr:

Commands for Manipulating Filesystem Extended Attributes

azul-zulu:

Java Development Kit (JDK) and compliant implementation of the Java Standard Edition (SE) specification.

bamUtil:

Repository that contains several programs that perform operations on SAM/BAM files.

barrnap:

Barrnap predicts the location of ribosomal RNA genes in genomes.

bcl2fastq2:

bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.

beagle-lib:

beagle-lib is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.

best:

Bam Error Stats Tool (best): analysis of error types in aligned reads

binutils:

binutils: GNU binary utilities

bioawk:

An extension to awk, adding the support of several common biological data formats

breseq:

breseq is a computational pipeline for the analysis of short-read re-sequencing data

bsddb3:

bsddb3 is a nearly complete Python binding of the Oracle/Sleepycat C API for the Database Environment, Database, Cursor, Log Cursor, Sequence and Transaction objects.

bzip2:

bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

c-ares:

c-ares is a C library for asynchronous DNS requests (including name resolves)

cURL:

libcurl is a free and easy-to-use client-side URL transfer library, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. libcurl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos), file transfer resume, http proxy tunneling and more.

cairo:

Cairo is a 2D graphics library with support for multiple output devices. Currently supported output targets include the X Window System (via both Xlib and XCB), Quartz, Win32, image buffers, PostScript, PDF, and SVG file output. Experimental backends include OpenGL, BeOS, OS/2, and DirectFB

cdbfasta:

FASTA file indexing and retrieval tool.

chainforge:

Nvidia and AMD GPU utility for SeisSol.

chewBBACA:

A complete suite for gene-by-gene schema creation and strain identification.

chopper:

Rust implementation of NanoFilt+NanoLyse

code-server:

code-server for OpenOnDemand

compleasm:

Faster and more accurate reimplementation of BUSCO.

cromwell:

Workflow Management System geared towards scientific workflows.

ctags:

Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility.

ctffind:

ctffind is a program for finding CTFs of electron micrographs

cuDNN:

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

cutadapt:

cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
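A minimal Python sketch of the idea behind 3' adapter trimming, assuming exact matching only (cutadapt itself tolerates sequencing errors via alignment and offers many more options). It handles both a full adapter copy after a short insert and a partial adapter running off the end of the read. The function name trim_3prime_adapter and its min_overlap parameter are hypothetical; AGATCGGAAGAGC is the common Illumina adapter prefix, used here only as an example.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching only.

    `min_overlap` (>= 1) is the shortest adapter prefix that will be trimmed
    when only part of the adapter fits at the end of the read.
    """
    # Case 1: the whole adapter occurs in the read; cut at its start
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only a prefix of the adapter fits at the end of the read;
    # try the longest possible overlap first
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAGAGC"  # example Illumina adapter prefix
mirna_read = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "ACACGT"
print(trim_3prime_adapter(mirna_read, ADAPTER))  # TGAGGTAGTAGGTTGTATAGTT
```

The microRNA example reflects the case described above: the 22-nt molecule is shorter than the read, so the sequencer reads on into the adapter.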

cuteSV:

Fast and scalable long-read-based SV detection

cwltool:

Common Workflow Language tool description reference implementation

cyvcf2:

cython + htslib == fast VCF and BCF processing

dadi:

Diffusion Approximation for Demographic Inference

dammit:

De novo transcriptome annotator.

datasets:

Tool to gather data from across NCBI databases

deepTools:

deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.

devtools:

R functions that simplify and expedite common tasks in package development.

double-conversion:

Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.

drep:

Rapid and accurate comparison and de-replication of microbial genomes

dtcmp:

DTCMP Library provides pre-defined and user-defined comparison operations to compare the values of two items which can be arbitrary MPI datatypes.

duphold:

uphold your DUP and DEL calls

duplex-tools:

Range of tools to support operations on Duplex Sequencing read pairs.

eDNA:

A suite of tools to conduct metabarcoding analyses targeting any group of organisms. Includes utilities for preprocessing raw data and building your own custom reference database.

easi:

easi is a library for the Easy Initialization of models in three (or less or more) dimensional domains.

ecCodes:

ecCodes is a package developed by ECMWF which provides an application programming interface and a set of tools for decoding and encoding messages in the following formats: WMO FM-92 GRIB edition 1 and edition 2, WMO FM-94 BUFR edition 3 and edition 4, WMO GTS abbreviated header (only decoding).

ectyper:

Standalone versatile serotyping module for Escherichia coli.

edlib:

Lightweight, super fast library for sequence alignment using edit (Levenshtein) distance.
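The edit (Levenshtein) distance that edlib computes is the minimum number of single-character insertions, deletions and substitutions needed to turn one sequence into another. The classic dynamic-programming recurrence below illustrates the definition; it is a minimal sketch for clarity only, edlib's own implementation is far faster, and the function name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between `a` and `b` via the O(len(a) * len(b)) DP."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```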

eggnog-mapper:

Tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments

emmtyper:

Tool for emm-typing of Streptococcus pyogenes using a de novo or complete assembly

ensmallen:

C++ header-only library for numerical optimization

entrez-direct:

An advanced method for accessing NCBI's set of interconnected databases, such as publication, sequence, structure, gene, variation and expression.

exonerate:

Generic tool for pairwise sequence comparison

expat:

Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags)

fastStructure:

fastStructure is an algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python2.x.

fastp:

A tool designed to provide fast all-in-one preprocessing for FastQ files.

fastq-tools:

A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.

fcGENE:

Format converting tool for genotype Data.

fgbio:

A set of tools to analyze genomic data with a focus on Next Generation Sequencing.

fineRADstructure:

A package for population structure inference from RAD-seq data

fineSTRUCTURE:

Population assignment using large numbers of densely sampled genomes, including both SNP chips and sequence data.

flatbuffers:

FlatBuffers: Memory Efficient Serialization Library

flex:

Flex (Fast Lexical Analyzer) is a tool for generating scanners. A scanner, sometimes called a tokenizer, is a program which recognizes lexical patterns in text.

fmlrc:

Tool for performing hybrid correction of long read sequencing using the BWT and FM-index of short-read sequencing data

fmt:

Formatting library providing a fast and safe alternative to C stdio and C++ iostreams.

fontconfig:

Fontconfig is a library designed to provide system-wide font configuration, customization and application access.

forge:

Arm Forge combines Arm DDT, the leading debugger for time-saving high performance application debugging, and Arm MAP, the trusted performance profiler for invaluable optimization advice.

foss:

GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.

freetype:

FreeType 2 is a software font engine that is designed to be small, efficient, highly customizable, and portable while capable of producing high-quality output (glyph images). It can be used in graphics libraries, display servers, font conversion tools, text image generation tools, and many other products as well.

funcx-endpoint:

funcX is a distributed Function as a Service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS platforms, funcX allows users to execute functions on heterogeneous remote computers, from laptops to campus clusters, clouds, and supercomputers. A funcX endpoint is a persistent service launched by the user on a compute system to serve as a conduit for executing functions on that computer.

fxtract:

Extract sequences from a fastx (fasta or fastq) file given a subsequence.

g2clib:

Library contains GRIB2 encoder/decoder ('C' version).

g2lib:

Library contains GRIB2 encoder/decoder and search/indexing routines.

ga4gh:

A reference implementation of the GA4GH API

gcloud:

Libraries and tools for interacting with Google Cloud products and services.

gemmforge:

GPU-GEMM generator for the Discontinuous Galerkin method.

genometools:

GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations.

gettext:

GNU 'gettext' is an important step for the GNU Translation Project, as it is an asset on which many other steps can be built. The package offers programmers, translators and even users a well-integrated set of tools and documentation.

gfastats:

Single, fast and exhaustive tool for summary statistics and simultaneous genome assembly file manipulation in FASTA, FASTQ and GFA formats (optionally gzipped).

gffread:

GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.

giflib:

giflib is a library for reading and writing gif images. It is API and ABI compatible with libungif which was in wide use while the LZW compression algorithm was patented.

gimkl:

GNU Compiler Collection (GCC) based compiler toolchain with Intel MPI and MKL

gimpi:

GNU Compiler Collection (GCC) based compiler toolchain with Intel MPI.

git:

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

globus-automate-client:

Client for the Globus Flows service.

globus-compute-endpoint:

Globus Compute is a distributed Function as a Service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS platforms, Globus Compute allows users to execute functions on heterogeneous remote computers, from laptops to campus clusters, clouds, and supercomputers. A Globus Compute endpoint is a persistent service launched by the user on a compute system to serve as a conduit for executing functions on that computer.

gmsh:

Gmsh is a 3D finite element grid generator with a built-in CAD engine and post-processor.

gnuplot:

Portable, interactive function plotting utility.

gompi:

GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.

google-sparsehash:

An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed.

googletest:

Google's C++ test framework

gperf:

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
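The idea of a perfect hash for a fixed key set can be sketched in Python by searching for a hash seed under which no two keys collide; each lookup then needs one table probe and a single string comparison. This is an illustrative brute-force search, not gperf's actual algorithm (gperf uses its own search and emits C or C++ code), and all names here (make_hash, find_perfect_hash, is_keyword) are hypothetical.

```python
from typing import Callable, Optional


def make_hash(seed: int, size: int) -> Callable[[str], int]:
    """Return a simple polynomial string hash parameterised by `seed`."""
    def h(s: str) -> int:
        value = 0
        for ch in s:
            value = (value * seed + ord(ch)) % 1_000_003
        return value % size
    return h


def find_perfect_hash(keys: list[str], size: int, max_seed: int = 100_000):
    """Try seeds until every key maps to a distinct slot (no collisions)."""
    for seed in range(1, max_seed):
        h = make_hash(seed, size)
        if len({h(k) for k in keys}) == len(keys):
            table: list[Optional[str]] = [None] * size
            for k in keys:
                table[h(k)] = k
            return h, table
    raise ValueError("no collision-free seed found; try a larger table")


KEYWORDS = ["if", "else", "while", "for", "return", "break", "continue"]
h, table = find_perfect_hash(KEYWORDS, size=2 * len(KEYWORDS))


def is_keyword(word: str) -> bool:
    # One table probe and exactly one string comparison per lookup
    return table[h(word)] == word


print(is_keyword("while"), is_keyword("whilst"))  # True False
```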

grive2:

Command line tool for Google Drive.

gsort:

Tool to sort genomic files according to a genome file.

h5pp:

A simple C++17 wrapper for HDF5.

haplocheck:

Detects in-sample contamination in mtDNA or WGS sequencing studies by analysing the mitochondrial content.

help2man:

help2man produces simple manual pages from the '--help' and '--version' output of other commands.

hifiasm:

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads.

hunspell:

Spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding.

hwloc:

The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.

hypothesis:

Hypothesis is an advanced testing library for Python. It lets you write tests which are parametrized by a source of examples, and then generates simple and comprehensible examples that make your tests fail. This lets you find more bugs in your code with less work.

icc:

Intel C and C++ compilers

iccifort:

Intel C, C++ & Fortran compilers

ifort:

Intel Fortran compiler

iimpi:

Intel C/C++ and Fortran compilers, alongside Intel MPI.

imkl:

Intel Math Kernel Library is a library of highly optimized, extensively threaded math routines for science, engineering, and financial applications that require maximum performance. Core math functions include BLAS, LAPACK, ScaLAPACK, Sparse Solvers, Fast Fourier Transforms, Vector Math, and more.

imkl-FFTW:

FFTW interfaces using Intel oneAPI Math Kernel Library

impalajit:

A lightweight JIT compiler for flexible data access in simulation applications

impi:

The Intel(R) MPI Library for Linux* OS is a multi-fabric message passing library based on ANL MPICH2 and OSU MVAPICH2. The Intel MPI Library for Linux OS implements the Message Passing Interface, version 2 (MPI-2) specification. - Homepage: http://software.intel.com/en-us/intel-mpi-library/

intel:

Intel Cluster Toolkit Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MPI & Intel MKL.

intel-compilers:

Intel C, C++ & Fortran compilers (classic and oneAPI)

iofbf:

Intel-based compiler toolchain, including OpenMPI for MPI support, FlexiBLAS (defaulting to OpenBLAS), FFTW and ScaLAPACK.

iompi:

Intel C/C++ and Fortran compilers, alongside OpenMPI for MPI support.

ipyrad:

ipyrad is an interactive toolkit for assembly and analysis of restriction-site associated genomic data sets (e.g., RAD, ddRAD, GBS) for population genetic and phylogenetic studies.

ispc:

Intel SPMD Program Compiler (ispc): an open-source compiler for high-performance SIMD programming on the CPU. ispc compiles a variant of the C programming language with extensions for 'single program, multiple data' (SPMD) programming. Under the SPMD model, the programmer writes what looks like a regular serial program, but at execution time a number of program instances run in parallel on the hardware.

jbigkit:

JBIG-KIT is a software implementation of the JBIG1 data compression standard

jcvi:

Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

jemalloc:

jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.

jq:

Lightweight and flexible command-line JSON processor.

json-c:

JSON-C implements a reference counting object model that allows you to easily construct JSON objects in C, output them as JSON formatted strings and parse JSON formatted strings back into the C representation of JSON objects.

jvarkit:

Java utilities for Bioinformatics

kalign2:

Kalign is a fast multiple sequence alignment program for biological sequences.

kallisto:

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

kineto:

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters

kma:

KMA is a mapping method designed to map raw reads directly against redundant databases, in an ultra-fast manner using seed and extend.

libFLAME:

libFLAME is a portable library for dense matrix computations, providing much of the functionality present in LAPACK.

libGLU:

The OpenGL Utility Library (GLU) is a computer graphics library for OpenGL.

libKML:

Reference implementation of OGC KML 2.2

libStatGen:

Set of classes for creating statistical genetic programs.

libaec:

Libaec provides fast lossless compression of signed or unsigned integer samples from 1 to 32 bits wide.

libarchive:

Multi-format archive and compression library

libcircle:

API for distributing embarrassingly parallel workloads using self-stabilization.

libdeflate:

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression.

libdrm:

Direct Rendering Manager runtime library.

libdwarf:

libdwarf is a C library for reading and writing DWARF debugging information, a format of interest to programmers working on compilers and debuggers.

libepoxy:

Library for handling OpenGL function pointer management

libevent:

The libevent API provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. libevent also supports callbacks triggered by signals or regular timeouts.

libffi:

The libffi library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run-time.
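
CPython's standard `ctypes` module is built on libffi, so it gives a quick way to see the idea described above: describe a C function's call interface at run time, then call it. A minimal sketch, assuming a POSIX system whose C library exports `strlen`; `c_strlen` is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Locate the C runtime library (e.g. "libc.so.6" on Linux). If lookup fails,
# find_library returns None and CDLL(None) loads the running process's symbols,
# which on POSIX systems include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe the call interface of: size_t strlen(const char *s)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(s: bytes) -> int:
    """Call the C strlen function through the foreign-function interface."""
    return strlen(s)


print(c_strlen(b"hello"))  # 5
```

Setting `argtypes` and `restype` is the "call interface description" step; without it ctypes falls back to default conversions, which is fragile for non-`int` return types such as `size_t`.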

libgcrypt:

Libgcrypt is a general-purpose cryptographic library originally based on code from GnuPG.

libgd:

GD is an open source code library for the dynamic creation of images by programmers.

libgeotiff:

Library for reading and writing coordinate system information from/to GeoTIFF files

libgit2:

libgit2 is a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language which supports C bindings.

libglvnd:

libglvnd is a vendor-neutral dispatch layer for arbitrating OpenGL API calls between multiple vendors.

libgpg-error:

Libgpg-error is a small library that defines common error values for all GnuPG components.

libgpuarray:

Arrays on GPU device memory, for Theano

libgtextutils:

libgtextutils is a dependency of fastx-toolkit and is provided via the same upstream.

libiconv:

Libiconv converts from one character encoding to another through Unicode conversion

libjpeg-turbo:

libjpeg-turbo is a fork of the original IJG libjpeg which uses SIMD to accelerate baseline JPEG compression and decompression. libjpeg is a library that implements JPEG image encoding, decoding and transcoding.

libpciaccess:

Generic PCI access library.

libpng:

libpng is the official PNG reference library

libreadline:

The GNU Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. Both Emacs and vi editing modes are available. The Readline library includes additional functions to maintain a list of previously-entered command lines, to recall and perhaps reedit those lines, and perform csh-like history expansion on previous commands.

libsodium:

Library for encryption, decryption, signatures, password hashing and more.

libspatialite:

SpatiaLite is an open source library intended to extend the SQLite core to support fully fledged Spatial SQL capabilities.

libtool:

GNU libtool is a generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface.

libunistring:

This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard.

libunwind:

Defines a portable and efficient C programming API to determine the call chain of a program.

libvdwxc:

libvdwxc is a general library for evaluating energy and potential for exchange-correlation (XC) functionals from the vdW-DF family that can be used with various density functional theory (DFT) codes.

libxc:

Libxc is a library of exchange-correlation functionals for density-functional theory. The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.

libxml2:

Libxml2 is the XML C parser and toolchain developed for the GNOME project (but usable outside of the GNOME platform).

libxslt:

Libxslt is the XSLT C library developed for the GNOME project (but usable outside of the GNOME platform).

libxsmm:

LIBXSMM is a library for small dense and small sparse matrix-matrix multiplications targeting Intel Architecture (x86).

libzstd:

Fast lossless compression algorithm.

lighttpd:

A lightweight, high-performance web server.

likwid:

Command-line tools for Linux to support programmers in developing high-performance multithreaded programs.

lp_solve:

Mixed Integer Linear Programming (MILP) solver.

lwgrp:

The light-weight group library defines data structures and collective operations to group MPI processes as an ordered set.

lz4:

LZ4 is a lossless compression algorithm, providing compression speeds of 400 MB/s per core. It features an extremely fast decoder, with speeds of multiple GB/s per core.

maf_stream:

Collection of utilities to manipulate multiple alignments in the Multiple Alignment Format

magma:

The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current Multicore+GPU systems.

manta:

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads.

mapDamage:

Tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

matlab-proxy:

Python package which enables you to launch MATLAB and access it from a web browser.

meRanTK:

High performance toolkit for complete analysis of methylated RNA data.

medaka:

Medaka is a tool to create a consensus sequence from nanopore sequencing data.

megalodon:

Tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

metaWRAP:

Flexible pipeline for genome-resolved metagenomic data analysis.

miRDeep2:

Completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs

mimalloc:

mimalloc is a general purpose allocator with excellent performance characteristics.

miniBUSCO:

Faster and more accurate reimplementation of BUSCO.

miniasm:

Fast OLC-based de novo assembler for noisy long reads.

minimap2:

Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR). At present, it works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate of ~15%.

miniprot:

Aligns a protein sequence against a genome with affine gap penalties, splicing and frameshifts.

mlpack:

Fast, flexible C++ machine learning library with bindings to other languages.

modbam2bed:

A program to aggregate modified base counts stored in a modified-base BAM file to a bedMethyl file.

modkit:

Tool for working with modified bases from Oxford Nanopore

mosdepth:

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
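
One common way to compute per-base depth is a difference array: add 1 where each alignment starts, subtract 1 where it ends, then take a running sum. The sketch below assumes alignments have already been reduced to 0-based, half-open `(start, end)` intervals on a single chromosome; reading BAM/CRAM files is out of scope, and `per_base_depth` is a hypothetical helper, not part of mosdepth.

```python
from itertools import accumulate


def per_base_depth(intervals, length):
    """Per-base depth from 0-based, half-open (start, end) alignment intervals."""
    delta = [0] * (length + 1)
    for start, end in intervals:
        delta[start] += 1  # coverage begins at start
        delta[end] -= 1    # and stops just before end
    # The running sum of the +1/-1 events is the depth at each position.
    return list(accumulate(delta[:length]))


print(per_base_depth([(0, 5), (3, 8)], 10))  # [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

The cost is linear in the number of reads plus the chromosome length, regardless of how deeply the reads pile up.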

mpcci:

MpCCI is a vendor neutral and application independent interface for co-simulation. MpCCI offers advanced and proven features for multiphysics modelling.

mpifileutils:

MPI-Based File Utilities For Distributed Systems

muParser:

muParser is an extensible high performance math expression parser library written in C++. It works by transforming a mathematical expression into bytecode and precalculating constant parts of the expression.

nanoQC:

Create fastQC-like plots for Oxford Nanopore sequencing data.

nanofilt:

Filtering and trimming of long read sequencing data.

nanoget:

Functions to extract information from Oxford Nanopore sequencing data and alignments

nanomath:

A few simple math functions for other Oxford Nanopore processing scripts.

nanopolish:

Software package for signal-level analysis of Oxford Nanopore sequencing data.

ncbi-vdb:

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

ncurses:

The Ncurses (new curses) library is a free software emulation of curses in System V Release 4.0, and more. It uses the terminfo format; supports pads, color, multiple highlights, forms characters and function-key mapping; and has all the other SYSV-curses enhancements over BSD curses.

ncview:

Visual browser for netCDF format files.

ne:

ne is a free (GPL'd) text editor based on the POSIX standard that runs (we hope) on almost any UN*X machine. ne is easy to use for the beginner, but powerful and fully configurable for the wizard, and most sparing in its resource usage.

netCDF:

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-C++:

Legacy C++ interface to NetCDF (network Common Data Form), a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-C++4:

C++ interface to NetCDF (network Common Data Form) supporting the netCDF-4 data model. NetCDF is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-Fortran:

Fortran interface to NetCDF (network Common Data Form), a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

nodejs:

Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

nseg:

Used to mask nucleic acid sequences

nsync:

nsync is a C library that exports various synchronization primitives, such as mutexes

nullarbor:

Reads to report pipeline for bacterial isolate NGS data.

numactl:

The numactl program allows you to run your application program on specific CPUs and memory nodes. It does this by supplying a NUMA memory policy to the operating system before running your program. The libnuma library provides convenient ways for you to add NUMA memory policies into your own program.

ont-guppy-gpu:

Data processing toolkit that contains the Oxford Nanopore Technologies' basecalling algorithms, and several bioinformatic post-processing features

padloc:

Prokaryotic Antiviral Defence LOCator

pairtools:

CLI tools to process mapped Hi-C data

panaroo:

A pangenome analysis pipeline.

pandoc:

Almost universal document converter

parallel-fastq-dump:

Parallel fastq-dump wrapper.

parasail:

parasail is a SIMD C (C99) library containing implementations of the Smith-Waterman (local), Needleman-Wunsch (global), and semi-global pairwise sequence alignment algorithms.
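
For reference, the scalar sketch below shows the Smith-Waterman (local) recurrence that libraries like this accelerate. It is a plain Python illustration, not parasail's API: parasail itself uses SIMD and affine gap penalties, while this version uses a simple linear gap penalty, and the scores (match 2, mismatch -1, gap -1) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned against a gap
            left = curr[j - 1] + gap   # b[j-1] aligned against a gap
            # Local alignment: floor at zero so an alignment can restart anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACG"))  # 6
```

Dropping the zero floor and reading the score from the bottom-right cell instead of the maximum turns this into Needleman-Wunsch (global) alignment.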

patchelf:

PatchELF is a small utility to modify the dynamic linker and RPATH of ELF executables.

pauvre:

Tools for plotting Oxford Nanopore and other long-read data.

pggb:

PanGenome Graph Builder (pggb)

pgge:

Pangenome graph evaluator.

phonopy:

Phonopy is an open source package of phonon calculations based on the supercell approach.

phyx:

phyx performs phylogenetics analyses on trees and sequences.

picard:

A set of tools (in Java) for working with next generation sequencing data in the BAM format.

pigz:

Parallel implementation of gzip.

pixman:

Pixman is a low-level software library for pixel manipulation, providing features such as image compositing and trapezoid rasterization. Important users of pixman are the cairo graphics library and the X server.

pod5:

File format for storing nanopore DNA data in an easily accessible way.

pplacer:

Places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment

preseq:

Software for predicting library complexity and genome coverage in high-throughput sequencing.

prodigal:

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.

prodigal-gv:

A fork of Prodigal meant to improve gene calling for giant viruses and viruses that use alternative genetic codes.

prokka:

Prokka is a software tool for the rapid annotation of prokaryotic genomes.

protobuf:

Google Protocol Buffers.

psmc:

Infers population size history from a diploid sequence using the PSMC model.

pstoedit:

pstoedit translates PostScript and PDF graphics into other vector formats

pullseq:

Utility program for extracting sequences from a fasta/fastq file

purge_dups:

Purges haplotigs and overlaps in an assembly based on read depth.

purge_haplotigs:

Pipeline to help with curating heterozygous diploid genome assemblies

pv:

Monitors the progress of data through a Unix pipeline.

pyani:

Whole-genome classification using Average Nucleotide Identity

pycoQC:

Computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data.

pymol-open-source:

PyMOL (open source version) molecular visualization system.

pyspoa:

Python bindings to spoa.

qcat:

Command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files

rDock:

rDock is a fast and versatile open-source docking program that can be used to dock small molecules against proteins and nucleic acids. It is designed for High Throughput Virtual Screening (HTVS) campaigns and binding mode prediction studies. rDock is written mainly in C++; its accessory scripts and programs are written in C++, Perl or Python.

randfold:

Minimum free energy of folding randomization test software

rasusa:

Randomly subsample sequencing reads to a specified coverage.
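
Subsampling to a target coverage comes down to a base count: target bases = coverage × genome size. A minimal sketch, assuming read lengths are already known and reads are taken in a random order until the target is met; `subsample_to_coverage` is a hypothetical helper and not rasusa's code.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Pick random reads until their total length reaches coverage * genome_size bases.

    Returns (indices of chosen reads, total bases chosen). If the input holds
    fewer bases than the target, every read is returned.
    """
    target = coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # reproducible random order
    chosen, total = [], 0
    for idx in order:
        if total >= target:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return chosen, total


# 100 reads of 1 kb, 10 kb genome, 5x coverage -> 50 kb target.
reads, bases = subsample_to_coverage([1000] * 100, genome_size=10_000, coverage=5)
print(len(reads), bases)  # 50 50000
```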

razers3:

Tool for mapping millions of short genomic reads onto a reference genome.

rclone:

Rclone is a command line program to sync files and directories to and from a variety of online storage services

re2c:

re2c is a free and open-source lexer generator for C and C++. Its main goal is generating fast lexers: at least as fast as their reasonably optimized hand-coded counterparts. Instead of using the traditional table-driven approach, re2c encodes the generated finite-state automata directly in the form of conditional jumps and comparisons.

rkcommon:

A common set of C++ infrastructure and CMake utilities used by various components of Intel® oneAPI Rendering Toolkit.

rnaQUAST:

Tool for evaluating RNA-Seq assemblies using reference genome and gene database

rust-fmlrc:

FM-index Long Read Corrector (Rust implementation)

samblaster:

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
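
The memory figure above supports a quick back-of-the-envelope estimate before requesting resources for a job. The hypothetical helper below simply applies the approximate 20 MB per million read pairs rule; actual usage will vary with the data.

```python
def samblaster_memory_mb(read_pairs, mb_per_million_pairs=20.0):
    """Rough duplicate-marking memory estimate from the ~20 MB per 1M read pairs rule."""
    return read_pairs / 1_000_000 * mb_per_million_pairs


# 300 million read pairs -> about 6000 MB (~6 GB).
print(samblaster_memory_mb(300_000_000))  # 6000.0
```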

samclip:

Filter SAM file for soft and hard clipped alignments.

savvy:

Interface to various variant calling formats.

sbt:

sbt is a build tool for Scala, Java, and more.

sc-RNA:

Bioconductor bundle for single-cell RNA-Seq Data analysis

screen_assembly:

Pipeline that screens for presence of genes of interest (GOI) in bacterial assemblies.

seqmagick:

Seqmagick is a utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments.

seqtk:

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

shrinkwrap:

A std::streambuf wrapper for compression formats.

simuG:

A general-purpose genome simulator

sismonr:

Simulation of In Silico Multi-Omic Networks R package.

skani:

Accurate, fast nucleotide identity calculation for MAGs, genomes, and databases.

slow5tools:

Toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.

smafa:

Smafa attempts to align or cluster pre-aligned biological sequences, handling sequences which are all the same length.

smoove:

Simplifies and speeds up calling and genotyping SVs for short reads.

snakemake:

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.

snaphu:

SNAPHU is an implementation of the Statistical-cost, Network-flow Algorithm for Phase Unwrapping proposed by Chen and Zebker

snappy:

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.

snp-sites:

Finds SNP sites from a multi-FASTA alignment file.
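
The core idea, keeping only the alignment columns that vary, fits in a few lines. This simplified sketch assumes an in-memory list of equal-length aligned sequences and ignores gaps and ambiguity codes; snp-sites itself reads multi-FASTA alignment files directly, and `snp_columns` is a hypothetical helper.

```python
def snp_columns(alignment):
    """0-based columns where aligned sequences carry more than one distinct A/C/G/T base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in an alignment must have the same length")
    sites = []
    for col in range(length):
        # Keep only unambiguous bases; gaps ('-') and codes such as 'N' are ignored.
        bases = {seq[col].upper() for seq in alignment} & set("ACGT")
        if len(bases) > 1:
            sites.append(col)
    return sites


print(snp_columns(["ACGT", "ACGA", "TC-T"]))  # [0, 3]
```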

snpEff:

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes).

somalier:

Extracts informative sites, evaluates relatedness, and performs quality control on BAM/CRAM/BCF/VCF/GVCF files.

spaln:

Stand-alone program that maps and aligns a set of cDNA or protein sequences onto a whole genomic sequence in a single job.

spdlog:

Fast C++ logging library.

spoa:

C++ implementation of the partial order alignment (POA) algorithm, which is used to generate consensus sequences.

sratoolkit:

The SRA Toolkit and the source-code SRA System Development Kit (SDK) allow you to programmatically access data housed within SRA and convert it out of the SRA format.

supercomputer:

Like a regular computer, but larger. Primarily used for heating data centers.

supercomputing:

The use of a supercomputer. See supercomputer.

swarm:

A robust and fast clustering method for amplicon-based studies. The purpose of swarm is to provide a novel clustering algorithm that handles massive sets of amplicons. Results of traditional clustering algorithms are strongly input-order dependent, and rely on an arbitrary global clustering threshold. swarm results are resilient to input-order changes and rely on a small local linking threshold d, representing the maximum number of differences between two amplicons.
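
The linking rule can be illustrated with a simplified, hypothetical re-implementation (not swarm's code): starting from the most abundant unclustered amplicon, a cluster grows by repeatedly adding any amplicon within d differences of a current member. Differences are counted here as Levenshtein edit distance, abundances are assumed to be supplied as a `{sequence: abundance}` dict, and refinements of the real tool (such as its optional fastidious mode) are omitted.

```python
from collections import deque


def differences(a, b):
    """Levenshtein distance: substitutions, insertions and deletions each count as one."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def swarm_like_clusters(amplicons, d=1):
    """Grow clusters by linking any amplicon within d differences of a cluster member."""
    # Seeds are taken in decreasing abundance order (ties broken by sequence).
    unassigned = sorted(amplicons, key=lambda s: (-amplicons[s], s))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        cluster, queue = [seed], deque([seed])
        while queue:  # breadth-first expansion through chains of small differences
            current = queue.popleft()
            linked = [s for s in unassigned if differences(current, s) <= d]
            for s in linked:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


counts = {"AAAA": 10, "AAAT": 3, "AATT": 1, "GGGG": 5}
print(swarm_like_clusters(counts, d=1))  # [['AAAA', 'AAAT', 'AATT'], ['GGGG']]
```

AATT joins the first cluster through AAAT even though it is two differences from the seed: membership follows chains of small local links rather than a global threshold, so the same clusters come out however the input is ordered.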

swissknife:

Perl module for reading and writing UniProtKB data in plain text format.

tRNAscan-SE:

Transfer RNA detection

tabix:

Generic indexer for TAB-delimited genome position files

tabixpp:

C++ wrapper to tabix indexer

tbb:

Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability.

tbl2asn:

Command-line program that automates the creation of sequence records for submission to GenBank

tmux:

tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.

trf:

Locates tandem repeats in DNA sequences.

trimAl:

Tool for automated alignment trimming in large-scale phylogenetic analyses

unimap:

Fork of minimap2 optimized for assembly-to-reference alignment.

unrar:

Command-line utility for extracting, testing and listing the contents of RAR archives.

util-linux:

Set of Linux utilities

vcflib:

C++ library and collection of command-line tools for parsing and manipulating VCF (Variant Call Format) files.

verkko:

Hybrid genome assembly pipeline developed for telomere-to-telomere assembly of PacBio HiFi and Oxford Nanopore reads

vg:

Variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods.

wgsim:

Wgsim is a small tool for simulating sequence reads from a reference genome.

wheel:

A built-package format for Python.

wtdbg:

De novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies.

wxWidgets:

Widget toolkit and tools library for creating graphical user interfaces for cross-platform applications.

x264:

x264 is a free software library and application for encoding video streams into the H.264/MPEG-4 AVC compression format, and is released under the terms of the GNU GPL.

x265:

x265 is a free software library and application for encoding video streams into the H.265/MPEG-H HEVC compression format, and is released under the terms of the GNU GPL.

xkbcommon:

Keyboard keymap compiler and support library.

xtb:

xtb - An extended tight-binding semi-empirical program package.

yacrd:

Chimeric Read Detector for long reads

yajl:

Yet Another JSON Library. Why does the world need another C library for parsing JSON? Good question.

yak:

Yet another k-mer analyzer

yaml-cpp:

YAML parser and emitter in C++

zlib:

zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system.
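
A compress/decompress round trip with Python's standard `zlib` module, which wraps this library, shows the lossless guarantee in practice. The repetitive sample input is an arbitrary choice for illustration.

```python
import zlib

original = b"ACGT" * 1000  # highly repetitive input compresses well

compressed = zlib.compress(original, level=9)  # DEFLATE stream with a zlib header
restored = zlib.decompress(compressed)

# Lossless: decompression returns exactly the original bytes.
assert restored == original
print(len(original), len(compressed))
```

Level 9 favours compression ratio over speed; `zlib.compress` defaults to level 6 (`Z_DEFAULT_COMPRESSION`), a middle ground between the two.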

zstd:

Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression/speed trade-off, while being backed by a very fast decoder. It also offers a special mode for small data, called dictionary compression, and can create dictionaries from any sample set.