GLOSSARY

List of terms used in this documentation.

ABAQUS:¶

Finite Element Analysis software for modeling, visualization and best-in-class implicit and explicit dynamics FEA.

ABRicate:¶

Mass screening of contigs for antimicrobial and virulence genes

ABySS:¶

Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler

ACTC:¶

ACTC converts independent triangles into triangle strips or fans.

AGAT:¶

Suite of tools to handle gene annotations in any GTF/GFF format.

AGDR:¶

The Aotearoa Genomic Data Repository provides secure within-nation storage, management and sharing of non-human genomic data generated from biological and environmental samples originating in Aotearoa New Zealand.

AGE:¶

Alignment of sequences with structural variants.

AMOS:¶

Collection of tools for genome assembly

AMRFinderPlus:¶

NCBI Antimicrobial Resistance Gene Finder Plus

ANIcalculator:¶

Calculate the bidirectional average nucleotide identity (gANI) and Alignment Fraction (AF) between two genomes.

ANNOVAR:¶

Efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes.

ANSYS:¶

A bundle of computer-aided engineering software including Fluent and CFX.

ANTs:¶

ANTs extracts information from complex datasets that include imaging. ANTs is useful for managing, interpreting and visualizing multidimensional data.

AOCC:¶

AMD Optimized C/C++ & Fortran compilers (AOCC) based on LLVM 13.0

AOCL-BLIS:¶

Optimized version of BLIS for AMD EPYC family of processors.

AOCL-FFTW:¶

Optimized version of FFTW for AMD EPYC family of processors.

AOCL-ScaLAPACK:¶

Optimized version of ScaLAPACK for AMD EPYC family of processors.

APR:¶

Apache Portable Runtime (APR) libraries.

APR-util:¶

Apache Portable Runtime (APR) util libraries.

ARIBA:¶

Antimicrobial Resistance Identification By Assembly

ASAGI:¶

a pArallel Server for Adaptive GeoInformation

ATK:¶

ATK provides the set of accessibility interfaces that are implemented by other toolkits and applications.

AUGUSTUS:¶

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences

Abseil:¶

Collection of C++ library code designed to augment the C++ standard library.

AdapterRemoval:¶

Searches for and removes remnant adapter sequences from High-Throughput Sequencing data.

AdaptiveCpp:¶

AdaptiveCpp (formerly hipSYCL) is a SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing toolchains such as CUDA or HIP

Advisor:¶

Vectorization Optimization and Thread Prototyping - Vectorize & thread code or performance “dies” - Easy workflow + data + tips = faster code faster - Prioritize, Prototype & Predict performance gain

AlphaFold:¶

AlphaFold can predict protein structures with atomic accuracy even where no similar structure is known

AlphaFold2DB:¶

AlphaFold2 databases

AlphaFold3DB:¶

AlphaFold3 databases

AlwaysIntelMKL:¶

Overrides the MKL internal utility function mkl_serv_intel_cpu_true so that AVX2 optimised kernels will be used, even when running on an AMD CPU.

Anaconda3:¶

Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.

IMPORTANT: This version of Anaconda Python comes with Intel MKL support to speed up certain types of mathematical computations, such as linear algebra or FFT. The module sets

       MKL_NUM_THREADS=1

to run MKL on a single thread by default, avoiding accidental oversubscription of cores. The number of threads can be increased for large problems; please refer to the Intel MKL documentation for guidance.
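As a minimal sketch of raising the thread count, the fragment below overrides the module default from within a job script. It assumes the job runs under a scheduler that exports `SLURM_CPUS_PER_TASK` (an assumption, not something stated above); when that variable is unset, it falls back to the single-thread default.

```shell
# Hypothetical job-script fragment: run it after loading the Anaconda3 module,
# because the module itself sets MKL_NUM_THREADS=1.
# SLURM_CPUS_PER_TASK is an assumption about the scheduler environment;
# if it is unset, keep the single-thread default.
export MKL_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "MKL will use ${MKL_NUM_THREADS} thread(s)"
```

Matching the thread count to the cores actually allocated to the job keeps the oversubscription protection that the default is meant to provide.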

Armadillo:¶

C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use.

Arrow:¶

Apache Arrow, a cross-language development platform for in-memory data.

Aspera-CLI:¶

IBM Aspera Command-Line Interface (the Aspera CLI) is a collection of Aspera tools for performing high-speed, secure data transfers from the command line. The Aspera CLI is for users and organizations who want to automate their transfer workflows.

AutoDock-GPU:¶

OpenCL and CUDA accelerated version of AutoDock. It leverages its embarrassingly parallelizable LGA by processing ligand-receptor poses in parallel over multiple compute units.

AutoDock_Vina:¶

AutoDock Vina is an open-source program for doing molecular docking.

BBMap:¶

BBMap short read aligner, and other bioinformatic tools.

BCFtools:¶

Manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

BCL-Convert:¶

Converts per cycle binary data output by Illumina sequencers containing basecall files and quality scores to per read FASTQ files

BEAST:¶

Bayesian MCMC phylogenetic analysis of molecular sequences for reconstructing phylogenies and testing evolutionary hypotheses.

BEDOPS:¶

BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale.

BEDTools:¶

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.

BEEF:¶

BEEF is a library implementing the Bayesian Error Estimation Functional, a description of which can be found here:

http://dx.doi.org/10.1103/PhysRevB.85.235149

BGC-Bayesian-genomic-clines:¶

Collection of code for Bayesian genomic cline analyses.

BLAST:¶

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

BLASTDB:¶

BLAST databases downloaded from NCBI.

BLAT:¶

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.

BLIS:¶

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries.

BOLT-LMM:¶

The BOLT-LMM algorithm computes statistics for testing association between phenotype and genotypes using a linear mixed model (LMM) [1]. By default, BOLT-LMM assumes a Bayesian mixture-of-normals prior for the random effect attributed to SNPs other than the one being tested. This model generalizes the standard "infinitesimal" mixed model used by previous mixed model association methods (e.g., EMMAX, FaST-LMM, GEMMA, GRAMMAR-Gamma, GCTA-LOCO), providing an opportunity for increased power to detect associations while controlling false positives. Additionally, BOLT-LMM applies algorithmic advances to compute mixed model association statistics much faster than eigendecomposition-based methods, both when using the Bayesian mixture model and when specialized to standard mixed model association.

BRAKER:¶

Pipeline for fully automated prediction of protein coding genes with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes.

BUSCO:¶

Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs

BWA:¶

Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.

BamTools:¶

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

Bandage:¶

Bandage is a program for visualising de novo assembly graphs

Basilisk:¶

Basilisk is a Free Software program for the solution of partial differential equations on adaptive Cartesian meshes.

BayPass:¶

Genome-Wide Scan for Adaptive Differentiation and Association Analysis with population-specific covariables

BayeScan:¶

Identify candidate loci under natural selection from genetic data, using differences in allele frequencies between populations.

BayesAss:¶

Program for inference of recent immigration rates between populations using unlinked multilocus genotypes

Beagle:¶

Package for phasing genotypes and for imputing ungenotyped markers.

BiG-SCAPE:¶

Constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs).

Bifrost:¶

Highly parallel construction, indexing and querying of colored and compacted de Bruijn graphs.

Bio-DB-BigFile:¶

Read BigWig and BigBed genome feature databases

Bio-DB-HTS:¶

Read files using HTSlib including BAM/CRAM, Tabix and BCF database files

BioPP:¶

Bio++ is a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Bio++ is Object Oriented and is designed to be both easy to use and computationally efficient. Bio++ intends to help programmers write computationally expensive programs by providing them with a set of re-usable tools.

Bismark:¶

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

Bison:¶

Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables.

Boost:¶

Boost provides free peer-reviewed portable C++ source libraries.

Bowtie:¶

Ultrafast, memory-efficient short read aligner.

Bowtie2:¶

Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Bracken:¶

Highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

CD-HIT:¶

CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

CDO:¶

CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.

CFITSIO:¶

CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.

CGAL:¶

The goal of the CGAL Open Source Project is to provide easy access to efficient and reliable geometric algorithms in the form of a C++ library.

CMake:¶

CMake, the cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.

CNVnator:¶

Copy Number Variation discovery and genotyping from depth of read mapping.

CNVpytor:¶

Python package and command line tool for CNV/CNA analysis from depth-of-coverage by mapped reads

COMSOL:¶

COMSOL is a multiphysics solver that provides a unified workflow for electrical, mechanical, fluid, and chemical applications.

CONCOCT:¶

Program for unsupervised binning of metagenomic contigs by using nucleotide composition, coverage data in multiple samples and linkage data from paired end reads.

CP2K:¶

CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.

CPMD:¶

The CPMD code is a parallelized plane wave / pseudopotential implementation of DFT, particularly designed for ab-initio molecular dynamics.

CPU:¶

Electronic circuitry that executes instructions of a computer program.

CRABS:¶

Creating Reference databases for Amplicon-Based Sequencing.

CRAMINO:¶

A tool for quick quality assessment of CRAM and BAM files, intended for long read sequencing

CREST:¶

CREST is a utility/driver program for the xtb program. Originally it was designed as a conformer sampling program, hence the abbreviation Conformer-Rotamer Ensemble Sampling Tool, but it now also offers some utility functions for calculations with the GFNn-xTB methods. Generally the program functions as an IO based OMP scheduler (i.e., calculations are performed by the xtb program) and a tool for the creation and analysis of structure ensembles.

CUDA:¶

CUDA (formerly Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs.

Canu:¶

Sequence assembler designed for high-noise single-molecule sequencing.

CapnProto:¶

Fast data interchange format and capability-based RPC system.

CellRanger:¶

Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

Centrifuge:¶

Classifier for metagenomic sequences

Cereal:¶

C++11 serialization library

CheckM:¶

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.

CheckM2:¶

Rapid assessment of genome bin quality using machine learning

CheckV:¶

Assess the quality of metagenome-assembled viral genomes.

Circlator:¶

A tool to circularize genome assemblies

Circos:¶

Package for visualizing data in a circular layout - this makes Circos ideal for exploring relationships between objects or positions.

Clair3:¶

Symphonizing pileup and full-alignment for high-performance long-read variant calling.

Clang:¶

C, C++, Objective-C compiler, based on LLVM. Does not include C++ standard library -- use libstdc++ from GCC.

Clustal-Omega:¶

Clustal Omega is a multiple sequence alignment program for proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be seen via viewing Cladograms or Phylograms

ClustalW2:¶

ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins.

CoverM:¶

DNA read coverage and relative abundance calculator focused on metagenomics applications

CppUnit:¶

C++ port of the JUnit framework for unit testing.

CubeLib:¶

Cube general purpose C++ library component and command-line tools.

CubeWriter:¶

Cube high-performance C writer library component.

Cytoscape:¶

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

D-Genies:¶

D-Genies can also display dot plots from other aligners when their PAF or MAF alignment file is uploaded.

DAS_Tool:¶

DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

DB:¶

Berkeley DB enables the development of custom data management solutions, without the overhead traditionally associated with such custom projects.

DBus:¶

D-Bus is a message bus system, a simple way for applications to talk to one another. In addition to interprocess communication, D-Bus helps coordinate process lifecycle; it makes it simple and reliable to code a "single instance" application or daemon, and to launch applications and daemons on demand when their services are needed.

DFT-D4:¶

Generally Applicable Atomic-Charge Dependent London Dispersion Correction.

DIAMOND:¶

Sequence aligner for protein and translated DNA searches

DOI:¶

A unique identifier that identifies digital objects. The object may change physical locations, but the DOI assigned to that object will never change.

DRAM:¶

Tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs.

DeconSeq:¶

A tool that can be used to automatically detect and efficiently remove sequence contaminations from genomic and metagenomic datasets.

DeePMD-plugin:¶

Deep learning-based models of interatomic potential energy and force field, as a LAMMPS plugin.

DeepLabCut:¶

Efficient method for 3D markerless pose estimation based on transfer learning with deep neural networks.

Delft3D:¶

Integrated simulation of sediment transport and morphology, waves, water quality and ecology.

Delft3D_FM:¶

3D modeling suite to investigate hydrodynamics, sediment transport and morphology and water quality for fluvial, estuarine and coastal environments

Delly:¶

Structural variant discovery by integrated paired-end and split-read analysis

Dorado:¶

High-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.

Doxygen:¶

Doxygen is a documentation system for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D.

Dsuite:¶

Fast calculation of the ABBA-BABA statistics across many populations/species

EDTA:¶

Automated whole-genome de-novo TE annotation and benchmarking the annotation performance of TE libraries.

EIGENSOFT:¶

The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.

ELPA:¶

Eigenvalue SoLvers for Petaflop-Applications.

EMAN2:¶

Greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes

EMBOSS:¶

EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.

ETE:¶

A Python framework for the analysis and visualization of phylogenetic trees

EasyBuild:¶

EasyBuild is a software build and installation framework written in Python that allows you to install software in a structured, repeatable and robust way.

Eigen:¶

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

Embree:¶

Embree is a collection of high-performance ray tracing kernels, developed at Intel. The target users of Embree are graphics application engineers who want to improve the performance of their photo-realistic rendering application by leveraging Embree's performance-optimized ray tracing kernels.

Emu:¶

Species-level taxonomic abundance for full-length 16S reads.

EukRep-EukCC:¶

Completeness and contamination estimator for metagenomic assembled microbial eukaryotic genomes. Also contains smetana, carveme and memote.

ExaBayes:¶

Bayesian tree inference, particularly suitable for large-scale analyses.

ExaML:¶

Exascale Maximum Likelihood for phylogenetic inference using MPI.

ExpansionHunter:¶

Tool for estimating repeat sizes

FASTX-Toolkit:¶

Tools for Short-Reads FASTA/FASTQ files preprocessing.

FCM:¶

FCM Build - A powerful build system for modern Fortran software applications. FCM Version Control - Wrappers to the Subversion version control system, usage conventions and processes for scientific software development.

FDS:¶

Fire Dynamics Simulator (FDS) is a large-eddy simulation (LES) code for low-speed flows, with an emphasis on smoke and heat transport from fires.

FFTW:¶

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.

FFTW.MPI:¶

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.

FFmpeg:¶

A complete, cross-platform solution to record, convert and stream audio and video.

FIGARO:¶

An efficient and objective tool for optimizing microbiome rRNA gene trimming parameters.

FLTK:¶

FLTK is a cross-platform C++ GUI toolkit for UNIX/Linux (X11), Microsoft Windows, and MacOS X. FLTK provides modern GUI functionality without the bloat and supports 3D graphics via OpenGL and its built-in GLUT emulation.

Faiss:¶

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed primarily at Meta's Fundamental AI Research group.

FastANI:¶

Tool for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI).

FastME:¶

FastME: a comprehensive, accurate and fast distance-based phylogeny inference program.

FastQC:¶

A quality control application for high-throughput sequence data.

FastQ_Screen:¶

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

FastTree:¶

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.

Fastsimcoal2:¶

While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continuous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present and ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedures. fastsimcoal can handle very complex evolutionary scenarios including an arbitrary migration matrix between samples, historical events allowing for population resize, population fusion and fission, admixture events, changes in migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations.

File-Rename:¶

A Perl version of the rename utility, with support for regular expressions.

FileSender:¶

Send large files quickly and securely using REANNZ FileSender.

Filtlong:¶

Tool for filtering long reads by quality.

FimTyper:¶

Identifies the FimH type in total or partial sequenced isolates of E. coli.

FlexiBLAS:¶

FlexiBLAS is a wrapper library that enables the exchange of the BLAS and LAPACK implementation used by a program without recompiling or relinking it.

Flye:¶

Flye is a de novo assembler for long and noisy reads, such as those produced by PacBio and Oxford Nanopore Technologies.

Foldseek:¶

Foldseek enables fast and sensitive comparisons of large protein structure sets, supporting monomer and multimer searches, as well as clustering.

FragGeneScan:¶

FragGeneScan is an application for finding (fragmented) genes in short reads.

FreeBayes:¶

Genetic variant detector designed to find polymorphisms smaller than the length of a short-read sequencing alignment.

FreeFEM:¶

FreeFEM offers a fast interpolation algorithm and a language for the manipulation of data on multiple meshes.

FreeSurfer:¶

FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data. FreeSurfer contains a fully automatic structural imaging stream for processing cross sectional and longitudinal data.

FreeXL:¶

FreeXL is an open source library to extract valid data from within an Excel (.xls) spreadsheet.

FriBidi:¶

Free Implementation of the Unicode Bidirectional Algorithm.

GATK:¶

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GCC:¶

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).

GCCcore:¶

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).

GD:¶

Interface to Gd Graphics Library

GDAL:¶

GDAL is a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing. NOTE: By default, the GDAL I/O cache uses 5% of total memory, which does not seem necessary. This module sets GDAL_CACHEMAX=256 (256 MB), which should have no performance impact. Feel free to change it if necessary, using 'export GDAL_CACHEMAX=xxx' (in your job script) after loading the GDAL module.
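As a minimal sketch of the override described in the note, the fragment below raises the cache size in a job script. The value 512 is an arbitrary illustration rather than a recommendation; like the module's 256, a small number such as this is read by GDAL as megabytes.

```shell
# Hypothetical job-script fragment: place it after the GDAL module is loaded,
# otherwise the module's own setting (256) would overwrite this value.
# 512 is an arbitrary illustrative value, not a recommendation.
export GDAL_CACHEMAX=512
echo "GDAL_CACHEMAX=${GDAL_CACHEMAX}"
```

Any memory given to the cache should fit comfortably within the memory requested for the job.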

GEMMA:¶

Genome-wide Efficient Mixed Model Association

GEOS:¶

GEOS (Geometry Engine - Open Source) is a C++ port of the Java Topology Suite (JTS)

GLib:¶

GLib is one of the base libraries of the GTK+ project

GMAP-GSNAP:¶

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences. GSNAP: Genomic Short-read Nucleotide Alignment Program.

GMP:¶

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.

GMT:¶

GMT is an open source collection of about 80 command-line tools for manipulating geographic and Cartesian data sets (including filtering, trend fitting, gridding, projecting, etc.) and producing PostScript illustrations ranging from simple x-y plots via contour maps to artificially illuminated surfaces and 3D perspective views; the GMT supplements add another 40 more specialized and discipline-specific tools.

GOLD:¶

A genetic algorithm for docking flexible ligands into protein binding sites

GPAW:¶

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). It uses real-space uniform grids and multigrid methods or atom-centered basis-functions.

GPFS:¶

High-performance clustered file system software developed by IBM.

GRASS:¶

The Geographic Resources Analysis Support System - used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization

GRIDSS:¶

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.

GROMACS:¶

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

This is a GPU enabled build, containing both MPI and threadMPI binaries.

GSL:¶

The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting.

GST-plugins-base:¶

GStreamer plug-ins and elements.

GStreamer:¶

Library for constructing graphs of media-handling components.

GTDB-Tk:¶

A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.

GTK+:¶

GTK+ is the primary library used to construct user interfaces in GNOME.

GTS:¶

GTS stands for the GNU Triangulated Surface Library. It is an Open Source Free Software Library intended to provide a set of useful functions to deal with 3D surfaces meshed with interconnected triangles.

GUI:¶

A digital interface in which a user interacts with graphical components such as icons, buttons, and menus.

GUSHR:¶

Assembly-free construction of UTRs from short read RNA-Seq data on the basis of coding sequence annotation.

Gdk-Pixbuf:¶

The Gdk Pixbuf is a toolkit for image loading and pixel buffer manipulation. It is used by GTK+ 2 and GTK+ 3 to load and manipulate images. In the past it was distributed as part of GTK+ 2 but it was split off into a separate package in preparation for the change to GTK+ 3.

GeneMark-ES:¶

Eukaryotic gene prediction suite with automatic training

GenoVi:¶

Generates circular genome representations for complete, draft, and multiple bacterial and archaeal genomes.

GenomeThreader:¶

GenomeThreader is a software tool to compute gene structure predictions.

GetOrganelle:¶

Toolkit to assemble organelle genome from genomic skimming data.

GlimmerHMM:¶

Gene finder based on a Generalized Hidden Markov Model.

Globus-CLI:¶

A Command Line Wrapper over the Globus SDK for Python, which provides an interface to Globus services from the shell, and is suited to both interactive and simple scripting use cases.

Go:¶

An open source programming language

Graphviz:¶

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.

Gubbins:¶

Genealogies Unbiased By recomBinations In Nucleotide Sequences

HDF:¶

HDF (also known as HDF4) is a library and multi-object file format for storing and managing data between machines.

HDF5:¶

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.

HISAT2:¶

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).

HMMER:¶

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.

HMMER2:¶

HPC:¶

Like a regular computer, but larger. Primarily used for heating data centers.

HTSeq:¶

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSlib:¶

A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix

HarfBuzz:¶

HarfBuzz is an OpenType text shaping engine.

HpcGridRunner:¶

HPC GridRunner is a simple command-line interface to high throughput computing using a variety of different grid computing platforms, including LSF, SGE, SLURM, and PBS.

Humann:¶

Pipeline for efficiently and accurately determining the coverage and abundance of microbial pathways in a community from metagenomic data.

HybPiper:¶

Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment.

Hypre:¶

Hypre is a library for solving large, sparse linear systems of equations on massively parallel computers. The problems of interest arise in the simulation codes being developed at LLNL and elsewhere to study physical phenomena in the defense, environmental, energy, and biological sciences.

ICU:¶

C/C++ and Java libraries providing Unicode and Globalization support for software applications.

IDBA-UD:¶

IDBA-UD is an iterative De Bruijn Graph De Novo Assembler for Short Reads Sequencing data with Highly Uneven Sequencing Depth.

IGV:¶

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data

IQ-TREE:¶

Efficient phylogenomic software by maximum likelihood

IRkernel:¶

R package providing an R kernel for Jupyter.

ISA-L:¶

Intelligent Storage Acceleration Library

ImageMagick:¶

Create, edit, compose, or convert bitmap images

Infernal:¶

Infernal ('INFERence of RNA ALignment') is for searching DNA sequence databases for RNA structure and sequence similarities.

InterProScan:¶

Sequence analysis application (nucleotide and protein sequences) that combines different protein signature recognition methods into one resource.

JAGS:¶

Just Another Gibbs Sampler - a program for the statistical analysis of Bayesian hierarchical models by Markov Chain Monte Carlo.

JasPer:¶

The JasPer Project is an open-source initiative to provide a free software-based reference implementation of the codec specified in the JPEG-2000 Part-1 standard.

Java:¶

Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers.

Jellyfish:¶

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
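As a rough illustration of what k-mer counting means, the sketch below counts every overlapping substring of length k in a single DNA string. It is a naive in-memory version written for this glossary, not Jellyfish's implementation, and the function name `count_kmers` is a hypothetical helper.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (naive, in-memory)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # Slide a window of width k along the sequence, one base at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

Dedicated k-mer counters exist because this dictionary-based approach does not scale to whole-genome read sets in either time or memory.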

JsonCpp:¶

JsonCpp is a C++ library that allows manipulating JSON values, including serialization and deserialization to and from strings. It can also preserve existing comments in deserialization/serialization steps, making it a convenient format to store user input files.

Julia:¶

A high-level, high-performance dynamic language for technical computing.

This version was compiled from source with USE_INTEL_JITEVENTS=1 to enable profiling with VTune.

JupyterLab:¶

An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and its architecture.

KAT:¶

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.

KEALib:¶

KEALib provides an implementation of the GDAL data model. The format supports raster attribute tables, image pyramids, meta-data and in-built statistics while also handling very large files and compression throughout. Based on the HDF5 standard, it also provides a base from which other formats can be derived and is a good choice for long term data archiving. An independent software library (libkea) provides complete access to the KEA image format and a GDAL driver allowing KEA images to be used from any GDAL supported software.

KMC:¶

Disk-based program for counting k-mers from (possibly gzipped) FASTQ/FASTA files.

Kaiju:¶

Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing experiments

Kent_tools:¶

Collection of tools used by the UCSC genome browser.

KmerGenie:¶

KmerGenie estimates the best k-mer length for genome de novo assembly.

KorfSNAP:¶

Semi-HMM-based Nucleic Acid Parser

Kraken2:¶

Taxonomic sequence classifier.

KronaTools:¶

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

LAME:¶

LAME is a high quality MPEG Audio Layer III (MP3) encoder licensed under the LGPL.

LAMMPS:¶

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

LAST:¶

LAST finds similar regions between sequences.

LASTZ:¶

LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS sequencing technologies such as Roche 454.

LDC:¶

D programming language compiler

LEfSe:¶

Determines the features most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance

LINKS:¶

Alignment-free scaffolding of genome assembly drafts with long reads

LLVM:¶

The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones). These libraries are built around a well-specified code representation known as the LLVM intermediate representation ("LLVM IR"). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator.

LMDB:¶

LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance of a pure in-memory database while retaining the persistence of standard disk-based databases.

LSD2:¶

Least-squares methods to estimate rates and dates from phylogenies

LTR_retriever:¶

Highly accurate and sensitive program for identification of LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.

LUMPY:¶

A probabilistic framework for structural variant discovery.

LZO:¶

Portable lossless data compression library

LibTIFF:¶

tiff: Library and tools for reading and writing TIFF data files

Libint:¶

Libint library is used to evaluate the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.

Liftoff:¶

Tool that accurately maps annotations in GFF or GTF between assemblies of the same, or closely-related species.

LittleCMS:¶

Color management engine.

LongStitch:¶

A genome assembly correction and scaffolding pipeline using long reads

M4:¶

GNU M4 is an implementation of the traditional Unix macro processor. It is mostly SVR4 compatible although it has some extensions (for example, handling more than 9 positional parameters to macros). GNU M4 also has built-in functions for including files, running shell commands, doing arithmetic, etc.

MAFFT:¶

Multiple sequence alignment program offering a range of methods.

MAKER:¶

Genome annotation pipeline

MATLAB:¶

A high-level language and interactive environment for numerical computing.

MCL:¶

The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.

MCR:¶

The MATLAB Compiler Runtime is required for running compiled MATLAB executables without MATLAB itself.

MDI:¶

The MolSSI Driver Interface (MDI) project provides a standardized API for fast, on-the-fly communication between computational chemistry codes. This greatly simplifies the process of implementing methods that require the cooperation of multiple software packages and enables developers to write a single implementation that works across many different codes. The API is sufficiently general to support a wide variety of techniques, including QM/MM, ab initio MD, machine learning, advanced sampling, and path integral MD, while also being straightforwardly extensible. Communication between codes is handled by the MDI Library, which enables tight coupling between codes using either the MPI or TCP/IP methods.

MEGAHIT:¶

An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

METABOLIC:¶

Metabolic And Biogeochemistry anaLyses In microbes

METIS:¶

METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.

MMseqs2:¶

MMseqs2: ultra fast and sensitive search and clustering suite

MPFR:¶

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.

MPI:¶

A message-passing standard designed to function on parallel computing architectures.

MUMPS:¶

A parallel sparse direct solver

MUMmer:¶

MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.

MUSCLE:¶

MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. A range of options is provided that give you the choice of optimizing accuracy, speed, or some compromise between the two.

MaSuRCA:¶

MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454, Pacbio and Nanopore).

Mamba:¶

Mamba is a fast, robust, and cross-platform package manager.

Mash:¶

Fast genome and metagenome distance estimation using MinHash

MashMap:¶

Implements a fast and approximate algorithm for computing local alignment boundaries between long DNA sequences

Mashtree:¶

Create a tree using Mash distances.

Maven:¶

Binary install of Apache Maven, a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

MaxBin:¶

MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm.

Merqury:¶

Evaluate genome assemblies with k-mers and more

Mesa:¶

Mesa is an open-source implementation of the OpenGL specification - a system for rendering interactive 3D graphics.

Note that this build enables CPU-based rendering with OpenSWR and LLVM. The module is intended to be used with visualisation software, such as ParaView, on nodes where no GPU hardware is available.

Both on-screen and off-screen rendering are supported.

Meson:¶

Meson is a cross-platform build system designed to be both as fast and as user friendly as possible.

MetaBAT:¶

An efficient tool for accurately reconstructing single genomes from complex microbial communities

MetaEuk:¶

MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs.

MetaGeneAnnotator:¶

MetaGeneAnnotator is a gene-finding program for prokaryote and phage.

MetaPhlAn:¶

MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.

MetaPhlAn2:¶

MetaSV:¶

Structural-variant caller

Metaxa2:¶

Taxonomic classification of rRNA.

MiMiC:¶

MiMiC: A Framework for Multiscale Modeling in Computational Chemistry

This package includes mimicpy

MiMiC-CommLib:¶

The MiMiC communication library (MCL) enables communication between external programs coupled through the MiMiC framework.

Miniconda3:¶

A platform for Python-based data analytics

Miniforge3:¶

Community-led recipes, infrastructure and distributions for conda.

Minimac3:¶

Low memory and more computationally efficient implementation of the genotype imputation algorithms.

Minimac4:¶

Low memory and more computationally efficient implementation of the genotype imputation algorithms.

MitoZ:¶

Toolkit that aims to automatically filter paired-end raw data, assemble the genome, search the assembly result for mitogenome sequences, annotate the mitogenome, and visualize it.

Mmg:¶

Mmg is open source software for simplicial remeshing. It provides 3 applications and 4 libraries: the mmg2d application and the libmmg2d library, for adaptation and optimization of a two-dimensional triangulation and generation of a triangulation from a set of points or from given boundary edges; the mmgs application and the libmmgs library, for adaptation and optimization of a surface triangulation and isovalue discretization; the mmg3d application and the libmmg3d library, for adaptation and optimization of a tetrahedral mesh and implicit domain meshing; and the libmmg library, which gathers the libmmg2d, libmmgs and libmmg3d libraries.

ModDotPlot:¶

Novel dot plot visualization tool used to view tandem repeats

ModelTest-NG:¶

Tool for selecting the best-fit model of evolution for DNA and protein alignments.

Molpro:¶

Molpro is a complete system of ab initio programs for molecular electronic structure calculations.

Mono:¶

An open source, cross-platform implementation of C# and the CLR that is binary compatible with Microsoft .NET.

Monocle3:¶

An analysis toolkit for single-cell RNA-seq.

Mothur:¶

Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.

MrBayes:¶

MrBayes is a program for the Bayesian estimation of phylogeny.

MultiQC:¶

Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.

NAMD:¶

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.

NASM:¶

NASM: General-purpose x86 assembler

NCCL:¶

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.

NCL:¶

NCL is an interpreted language designed specifically for scientific data analysis and visualization.

NCO:¶

Manipulates and analyzes data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5.

NECAT:¶

Error correction and de novo assembly tool for noisy Nanopore long reads

NGS:¶

NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.

NIWA:¶

Crown Research Institute, conducts research across a broad range of disciplines in the environmental sciences.

NLopt:¶

NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms

NSPR:¶

Netscape Portable Runtime (NSPR) provides a platform-neutral API for system level and libc-like functions.

NSS:¶

Network Security Services (NSS) is a set of libraries designed to support cross-platform development of security-enabled client and server applications.

NVHPC:¶

C, C++ and Fortran compilers included with the NVIDIA HPC SDK (previously: PGI)

NWChem:¶

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters. NWChem software can handle: biomolecules, nanostructures, and solid-state; from quantum to classical, and all combinations; Gaussian basis functions or plane-waves; scaling from one to thousands of processors; properties and relativity.

NanoComp:¶

Comparing runs of Oxford Nanopore sequencing data and alignments

NanoLyse:¶

Removing reads mapping to the lambda genome.

NanoPlot:¶

Plotting suite for Oxford Nanopore sequencing data and alignments.

NanoStat:¶

Calculates summary statistics for long-read sequencing datasets, such as those produced by Oxford Nanopore sequencing.

NeSI:¶

New Zealand national high performance computing platform.

NewHybrids:¶

This implements a Gibbs sampler to estimate the posterior probability that genetically sampled individuals fall into each of a set of user-defined hybrid categories.

Newton-X:¶

NX is a general-purpose program package for simulating the dynamics of electronically excited molecules and molecular assemblies.

NextPolish2:¶

A fast and efficient genome polishing tool for long-read assembly

Nextflow:¶

Nextflow is a reactive workflow framework and a programming DSL that eases writing computational pipelines with complex data

Nim:¶

Nim is a systems and applications programming language.

Ninja:¶

Ninja is a small build system with a focus on speed.

Nsight-Compute:¶

NVIDIA® Nsight™ Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool.

Nsight-Systems:¶

NVIDIA® Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithm, help you select the largest opportunities to optimize, and tune to scale efficiently across any quantity of CPUs and GPUs

OBITools:¶

Manipulate various data and sequence files.

OPARI2:¶

Source-to-source instrumentation tool for OpenMP and hybrid codes. It surrounds OpenMP directives and runtime library calls with calls to the POMP2 measurement interface.

ORCA:¶

ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum chemistry with specific emphasis on spectroscopic properties of open-shell molecules. It features a wide variety of standard quantum chemical methods ranging from semiempirical methods to DFT to single- and multireference correlated ab initio methods. It can also treat environmental and relativistic effects.

ORCID:¶

A nonproprietary alphanumeric code to uniquely identify authors and contributors of scholarly communication, bibliographic output and other user-supplied pieces of information.

OSPRay:¶

OSPRay features interactive CPU rendering capabilities geared towards Scientific Visualization applications. Advanced shading effects such as Ambient Occlusion, shadows, and transparency can be rendered interactively, enabling new insights into data exploration.

OSU-Micro-Benchmarks:¶

OSU Micro-Benchmarks for MPI

OTF2:¶

The Open Trace Format 2 is a highly scalable, memory efficient event trace data format plus support library

OTP:¶

An automatically generated numeric code that authenticates a user for a single login.

OpenBLAS:¶

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

OpenBabel:¶

Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

OpenCV:¶

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products.

OpenFOAM:¶

OpenFOAM is a free, open source CFD software package. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.

OpenJPEG:¶

An open-source JPEG 2000 codec written in C

OpenMPI:¶

The Open MPI Project is an open source MPI-3 implementation.

OpenSSL:¶

The OpenSSL Project is a collaborative effort to develop a robust, commercial-grade, full-featured, and Open Source toolchain implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols as well as a full-strength general purpose cryptography library.

OpenSees:¶

OpenSees is a software framework for developing applications to simulate the performance of structural and geotechnical systems subjected to earthquakes.

OpenSeesPy:¶

Python wrapper for OpenSees. Load an OpenSees module as well when using it.

OpenSlide:¶

OpenSlide is a C library that provides a simple interface to read whole-slide images (also known as virtual slides).

OrthoFinder:¶

OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics

PALEOMIX:¶

Pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data.

PAML:¶

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood.

PAPI:¶

PAPI provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events. In addition, Component PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.

PCRE:¶

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.

PCRE2:¶

PCRE2 is the revised API for the PCRE library, a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.

PDAL:¶

PDAL is the Point Data Abstraction Library. It is a C/C++ open source library and set of applications for translating and processing point cloud data. It is not limited to LiDAR data, although the focus and impetus for many of the tools in the library have their origins in LiDAR.

PEAR:¶

Memory-efficient, fully parallelized and highly accurate paired-end read merger.

PETSc:¶

PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.

PLINK:¶

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLUMED:¶

PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state of the art methods such as metadynamics, umbrella sampling and Jarzynski-equation based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.

POSIX:¶

A set of standard operating system interfaces based on the Unix operating system

PRANK:¶

Probabilistic multiple alignment program for DNA, codon and amino-acid sequences.

PROJ:¶

Program proj is a standard Unix filter function which converts geographic longitude and latitude coordinates into Cartesian coordinates.

PSpaMM:¶

Generates inline-Assembly for sparse Matrix Multiplication.

PUMI:¶

Parallel Unstructured Mesh Infrastructure API

Pango:¶

Pango is a library for laying out and rendering of text, with an emphasis on internationalization. Pango can be used anywhere that text layout is needed, though most of the work on Pango so far has been done in the context of the GTK+ widget toolkit. Pango forms the core of text and font handling for GTK+-2.x.

ParMETIS:¶

ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations. The algorithms implemented in ParMETIS are based on the parallel multilevel k-way graph-partitioning, adaptive repartitioning, and parallel multi-constrained partitioning schemes.

ParaView:¶

Parallel scientific visualizer.

Parallel:¶

parallel: Build and execute shell commands in parallel

ParallelIO:¶

A high-level Parallel I/O Library for structured grid applications

Perl:¶

Larry Wall's Practical Extraction and Report Language

PhyML:¶

Phylogenetic estimation using Maximum Likelihood

PhyloPhlAn:¶

Integrated pipeline for large-scale phylogenetic profiling of genomes and metagenomes.

Pilon:¶

Pilon is an automated genome assembly improvement and variant detection tool

PnetCDF:¶

Parallel netCDF: A Parallel I/O Library for NetCDF File Access

Porechop:¶

Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity
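The trimming and splitting behaviour described above can be sketched in a few lines. This is an illustration only: it uses exact string matching, whereas Porechop relies on alignments that tolerate low sequence identity, and the function name `chop_read` is a hypothetical helper rather than part of Porechop.

```python
def chop_read(read: str, adapter: str) -> list[str]:
    """Remove exact adapter occurrences from a read.

    An adapter at either end is trimmed off; an adapter in the middle
    splits the read into separate pieces (the chimeric case).
    """
    if not adapter:
        raise ValueError("adapter must be a non-empty string")
    # Splitting on the adapter handles both cases at once: end adapters
    # leave empty pieces, which are dropped, and internal adapters
    # produce two or more non-empty pieces.
    return [piece for piece in read.split(adapter) if piece]


if __name__ == "__main__":
    adapter = "AAA"
    print(chop_read("AAATTTTGGGG", adapter))  # adapter at the start
    print(chop_read("TTTTAAAGGGG", adapter))  # adapter in the middle
```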

Porechop_ABI:¶

Extension of Porechop whose purpose is to process adapter sequences in ONT reads

PostgreSQL:¶

Object-relational database system.

Prodigal:¶

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.

ProtHint:¶

Pipeline for predicting and scoring hints (in the form of introns, start and stop codons) in the genome of interest by mapping and spliced aligning predicted genes to a database of reference protein sequences.

Proteinortho:¶

Proteinortho is a tool to detect orthologous genes within different species.

PyQt:¶

PyQt5 is a set of Python bindings for v5 of the Qt application framework from The Qt Company. This bundle includes PyQtWebEngine, a set of Python bindings for The Qt Company’s Qt WebEngine framework.

PyTorch:¶

Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.

Python:¶

Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Python-Geo:¶

Python packages for geospatial data I/O, mostly based on the OSGEO libraries GDAL and OGR

QIIME2:¶

An open-source bioinformatics pipeline for microbiome analysis from raw DNA sequencing data.

QUAST:¶

Evaluates genome assemblies

Qt5:¶

Qt is a comprehensive cross-platform C++ application framework.

QuantumESPRESSO:¶

Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft).

QuickTree:¶

Efficient implementation of the Neighbor-Joining algorithm, capable of reconstructing phylogenies from huge alignments.

R:¶

R is a free software environment for statistical computing and graphics.

R-Geo:¶

R packages for Geometric and Geospatial data which depend on GEOS and/or GDAL.

R-bundle-Bioconductor:¶

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.

RAxML:¶

RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.

RAxML-NG:¶

RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to quickly navigate to the best-known ML tree.

RDP-Classifier:¶

The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic assignments from domain to genus, with confidence estimates for each assignment.

RE2:¶

Fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

RECON:¶

De novo identification and classification of repeat sequence families from genomic sequences

REViewer:¶

Tool for visualizing alignments of reads in regions containing tandem repeats

RFPlasmid:¶

Predicting plasmid contigs from assemblies

RFdiffusion:¶

Structure generation, with or without conditional information (a motif, target, etc.). It can perform a whole range of protein design challenges, as outlined in the RFdiffusion paper.

RJMCMC:¶

This library provides routines for running Reversible Jump Monte-Carlo Markov chains for 1-D and 2-D spatial regression problems.

RMBlast:¶

RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program. These include: support for custom matrices (without KA-Statistics); support for cross_match-like complexity adjusted scoring (cross_match is Phil Green's seeded Smith-Waterman search algorithm); and support for cross_match-like masklevel filtering.

RNAmmer:¶

Consistent and rapid annotation of ribosomal RNA genes.

ROOT:¶

The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way.

RSEM:¶

Estimates gene and isoform expression levels from RNA-Seq data

RSGISLib:¶

The Remote Sensing and GIS software library (RSGISLib) is a collection of tools for processing remote sensing and GIS datasets. The tools are accessed using Python bindings or an XML interface.

RStudio-Server:¶

RStudio-Server for OpenOnDemand.

Racon:¶

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.

Ragout:¶

Tool for chromosome assembly using multiple references.

RapidNJ:¶

An algorithmic engineered implementation of canonical neighbour-joining.

Ratatosk:¶

Phased hybrid error correction of long reads using colored de Bruijn graphs

Raven:¶

De novo genome assembler for long uncorrected reads.

Rcorrector:¶

kmer-based error correction method for RNA-seq data.

Relion:¶

RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).

RepeatMasker:¶

Screens DNA sequences for interspersed repeats and low complexity DNA sequences. For licensing reasons RepBase is not included; instead, you must set LIBDIR to point to a directory that contains your copy of it.

RepeatModeler:¶

De novo transposable element (TE) family identification and modeling package.

RepeatScout:¶

De novo identification of repeat families in large genomes

Riskscape:¶

RiskScape is an open-source spatial data processing application used for multi-hazard risk analysis. RiskScape is highly customisable, letting modellers tailor the risk analysis to suit the problem domain and input data being modelled.

Roary:¶

Rapid large-scale prokaryote pan genome analysis

Rosetta:¶

Rosetta is the premier software suite for modeling macromolecular structures. As a flexible, multi-purpose application, it includes tools for structure prediction, design, and remodeling of proteins and nucleic acids.

Ruby:¶

Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.

Rust:¶

Systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.

SAGE:¶

Package containing programs for use in the genetic analysis of family, pedigree and individual data.

SAMtools:¶

SAMtools is a suite of programs for interacting with high-throughput sequencing data: reading, writing, editing, indexing and viewing files in SAM/BAM/CRAM format.

SCOTCH:¶

Software package and libraries for sequential and parallel graph partitioning, static mapping, and sparse matrix block ordering, and sequential mesh and hypergraph partitioning.

SCP:¶

Means of securely transferring files between computers over an SSH connection.

SCons:¶

SCons is a software construction tool.

SDL2:¶

Simple DirectMedia Layer, a cross-platform multimedia library

SEPP:¶

SATe-enabled Phylogenetic Placement. Phylogenetic placement of short reads into reference alignments and trees.

SHAPEIT4:¶

Estimation of haplotypes (aka phasing) for SNP array and high coverage sequencing data.

SIONlib:¶

Scalable I/O library for parallel access to task-local files.

SIP:¶

SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries.

SKESA:¶

SKESA is a de novo sequence read assembler for cultured single isolate genomes based on de Bruijn graphs.

SLEPc:¶

SLEPc (Scalable Library for Eigenvalue Problem Computations) is a software library for the solution of large scale sparse eigenvalue problems on parallel computers. It is an extension of PETSc and can be used for either standard or generalized eigenproblems, with real or complex arithmetic. It can also be used for computing a partial SVD of a large, sparse, rectangular matrix, and to solve quadratic eigenvalue problems.

SMRT-Link:¶

PacBio’s open-source software suite is designed for use with Single Molecule, Real-Time (SMRT) Sequencing data.

SNVoter-NanoMethPhase:¶

SNVoter - A top up tool to enhance SNV calling from Nanopore sequencing data & NanoMethPhase - Phase long reads and CpG methylations from Oxford Nanopore Technologies.

SPAdes:¶

Genome assembler for single-cell and isolates data sets

SQLite:¶

SQLite: SQL Database Engine in a C Library

SSAHA2:¶

Pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.

SSH:¶

A network communication protocol that enables two computers to communicate securely over an unsecured network.

STAR:¶

Fast universal RNA-seq aligner

SUNDIALS:¶

SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers

SURVIVOR:¶

Tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.

SWIG:¶

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.

Salmon:¶

Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.

Sambamba:¶

Tools for working with SAM/BAM data

ScaLAPACK:¶

The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers.

SeisSol:¶

SeisSol is a software package for simulating wave propagation and dynamic rupture based on the arbitrary high-order accurate derivative discontinuous Galerkin method (ADER-DG).

SeqAn:¶

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.

SeqAn3:¶

C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.

SeqKit:¶

Ultrafast toolkit for FASTA/Q file manipulation

SiBELia:¶

A comparative genomics tool for analysing genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt in different environments.

Siesta:¶

SIESTA is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.

SignalP:¶

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms

Sniffles:¶

A fast structural variant caller for long-read sequencing.

SortMeRNA:¶

SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.

SourceTracker:¶

SourceTracker is a Bayesian approach to estimating the proportion of a novel community that comes from a set of source environments.

Spack:¶

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.

Spark:¶

Spark is Hadoop MapReduce done in memory

SpectrA:¶

C++ library for large scale eigenvalue problems, built on top of Eigen, an open source linear algebra library.

Spectrum Scale:¶

High-performance clustered file system software developed by IBM.

SqueezeMeta:¶

Fully automated metagenomics pipeline, from reads to bins.

Stacks:¶

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

StringTie:¶

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.

Structure:¶

The program structure is a free software package for using multi-locus genotype data to investigate population structure.

Subread:¶

High performance read alignment, quantification and mutation discovery

Subversion:¶

Subversion is an open source version control system. Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.

SuiteSparse:¶

SuiteSparse is a collection of libraries to manipulate sparse matrices.

SuperLU:¶

Solution of large, sparse, nonsymmetric systems of linear equations.

Supernova:¶

Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a single whole-genome library from an individual DNA source

Szip:¶

Szip compression software, providing lossless compression of scientific data

TEtranscripts:¶

Takes RNA-seq (and similar data) and annotates reads to both genes & transposable elements.

TMHMM:¶

Prediction of transmembrane helices in proteins

TOGA:¶

Implements a novel machine learning based paradigm to infer orthologous genes between related species and to accurately distinguish orthologs from paralogs or processed pseudogenes.

TSEBRA:¶

Transcript Selector for BRAKER

TURBOMOLE:¶

Program Package For Electronic Structure Calculations.

TWL-NINJA:¶

Nearly Infinite Neighbor Joining Application.

Tcl:¶

Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more.

TensorFlow:¶

An open-source software library for Machine Intelligence

TensorRT:¶

NVIDIA TensorRT is a platform for high-performance deep learning inference

Tk:¶

Tk is an open source, cross-platform widget toolkit that provides a library of basic elements for building a graphical user interface (GUI) in many different programming languages.

TransDecoder:¶

TransDecoder identifies candidate coding regions within transcript sequences.

TreeMix:¶

TreeMix is a method for inferring the patterns of population splits and mixtures in the history of a set of populations.

TrimGalore:¶

A wrapper of FastQC and cutadapt to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files

Trimmomatic:¶

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

Trinity:¶

Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads.

Trinotate:¶

Comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.

Trycycler:¶

Tool for generating consensus long-read assemblies for bacterial genomes.

TuiView:¶

TuiView is a lightweight raster GIS with powerful raster attribute table manipulation abilities.

TurboVNC:¶

TurboVNC is a derivative of VNC (Virtual Network Computing) that is tuned to provide peak performance for 3D and video workloads.

UCC:¶

UCC (Unified Collective Communication) is a collective communication operations API and library that is flexible, complete, and feature-rich for current and emerging programming models and runtimes.

UCC-CUDA:¶

UCC (Unified Collective Communication) is a collective communication operations API and library that is flexible, complete, and feature-rich for current and emerging programming models and runtimes.

This module adds the UCC CUDA support.

UCX:¶

Unified Communication X: an open-source, production-grade communication framework for data-centric and high-performance applications

UCX-CUDA:¶

Unified Communication X: an open-source, production-grade communication framework for data-centric and high-performance applications

This module adds the UCX CUDA support.

UDUNITS:¶

UDUNITS supports conversion of unit specifications between formatted and binary forms, arithmetic manipulation of units, and conversion of values between compatible scales of measurement.

USEARCH:¶

USEARCH is a unique sequence analysis tool which offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

Unicycler:¶

Assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets where it functions as a SPAdes-optimiser.

VASP:¶

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VBZ-Compression:¶

VBZ compression HDF5 plugin for nanopolish

VCF-kit:¶

VCF-kit is a command-line based collection of utilities for performing analysis on Variant Call Format (VCF) files.

VCFtools:¶

The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing and calculating some basic population genetic statistics.

VEP:¶

Variant Effect Predictor (VEP) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Includes EnsEMBL-XS, which provides pre-compiled replacements for frequently used routines in VEP.

VIBRANT:¶

Virus Identification By iteRative ANnoTation

VMD:¶

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

VPN:¶

Method of extending access to a private network.

VSEARCH:¶

An open source alternative to the metagenomics tool USEARCH.

Performs chimera detection, clustering, full-length and prefix dereplication, rereplication, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.

VTune:¶

Intel VTune Amplifier XE is the premier performance profiler for C, C++, C#, Fortran, Assembly and Java.

Valgrind:¶

Valgrind: Debugging and profiling tools

VarScan:¶

Variant calling and somatic mutation/CNV detection for next-generation sequencing data

Velvet:¶

Sequence assembler for very short reads

VelvetOptimiser:¶

Perl script for optimising the three primary parameter options of the Velvet de novo sequence assembler.

ViennaRNA:¶

The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

Vim:¶

Vim is an advanced text editor that seeks to provide the power of the de-facto Unix editor 'Vi', with a more complete feature set.

VirHostMatcher:¶

Tools for computing various oligonucleotide frequency (ONF) based distance/dissimilarity measures.

VirSorter:¶

VirSorter: mining viral signal from microbial genomic data.

Voro++:¶

Voro++ is a software library for carrying out three-dimensional computations of the Voronoi tessellation. A distinguishing feature of the Voro++ library is that it carries out cell-based calculations, computing the Voronoi cell for each particle individually. It is particularly well-suited for applications that rely on cell-based statistics, where features of Voronoi cells (e.g. volume, centroid, number of faces) can be used to analyze a system of particles.

WAAFLE:¶

Workflow to Annotate Assemblies and Find LGT Events.

WhatsHap:¶

Tool for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly.

Winnowmap:¶

Winnowmap is a long-read mapping algorithm, and a result of our exploration into superior minimizer sampling techniques.

Wise2:¶

Aligning proteins or protein HMMs to DNA

XVFB:¶

A display server implementing the X11 display server protocol, XVFB performs all graphical operations in virtual memory without showing any screen output. This allows applications that 'require' a GUI to run in a command line environment. Can be invoked with xvfb-run.

XZ:¶

xz: XZ utilities

Xerces-C++:¶

Xerces-C++ is a validating XML parser written in a portable subset of C++. Xerces-C++ makes it easy to give your application the ability to read and write XML data. A shared library is provided for parsing, generating, manipulating, and validating XML documents using the DOM, SAX, and SAX2 APIs.

YAXT:¶

Yet Another eXchange Tool

Z3:¶

A theorem prover from Microsoft Research.

ZeroMQ:¶

ZeroMQ looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fanout, pub-sub, task distribution, and request-reply.

Zip:¶

Zip is a compression and file packaging/archive utility. Although highly compatible both with PKWARE's PKZIP and PKUNZIP utilities for MS-DOS and with Info-ZIP's own UnZip, our primary objectives have been portability and other-than-MSDOS functionality

abritamr:¶

AMR gene detection pipeline that runs AMRFinderPlus on a single isolate (or a list of isolates).

angsd:¶

Program for analysing NGS data.

ant:¶

Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications.

antiSMASH:¶

antiSMASH allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes.

any2fasta:¶

Convert various sequence formats to FASTA

argtable:¶

Argtable is an ANSI C library for parsing GNU style command line options with a minimum of fuss.

aria2:¶

aria2 is a lightweight multi-protocol & multi-source command-line download utility.

arpack-ng:¶

ARPACK is a collection of Fortran77 subroutines designed to solve large scale eigenvalue problems.

at-spi2-atk:¶

AT-SPI 2 toolkit bridge

at-spi2-core:¶

Assistive Technology Service Provider Interface.

attr:¶

Commands for Manipulating Filesystem Extended Attributes

azul-zulu:¶

Java Development Kit (JDK), and a compliant implementation of the Java Standard Edition (SE) specification.

bamUtil:¶

Repository that contains several programs that perform operations on SAM/BAM files.

barrnap:¶

Barrnap predicts the location of ribosomal RNA genes in genomes.

bcl2fastq2:¶

bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.

beagle-lib:¶

beagle-lib is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.

best:¶

Bam Error Stats Tool (best): analysis of error types in aligned reads

binutils:¶

binutils: GNU binary utilities

bioawk:¶

An extension to awk, adding the support of several common biological data formats

breseq:¶

breseq is a computational pipeline for the analysis of short-read re-sequencing data

bzip2:¶

bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

c-ares:¶

c-ares is a C library for asynchronous DNS requests (including name resolves)

cURL:¶

libcurl is a free and easy-to-use client-side URL transfer library, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. libcurl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos), file transfer resume, http proxy tunneling and more.

cairo:¶

Cairo is a 2D graphics library with support for multiple output devices. Currently supported output targets include the X Window System (via both Xlib and XCB), Quartz, Win32, image buffers, PostScript, PDF, and SVG file output. Experimental backends include OpenGL, BeOS, OS/2, and DirectFB

cdbfasta:¶

Fasta file indexing and retrieval tool

chainforge:¶

Nvidia and AMD GPU utility for SeisSol.

chewBBACA:¶

A complete suite for gene-by-gene schema creation and strain identification.

chopper:¶

Rust implementation of NanoFilt+NanoLyse

cimfomfa:¶

library supports both MCL, a cluster algorithm for graphs, and zoem, a macro/DSL language

code-server:¶

code-server for OpenOnDemand

compleasm:¶

faster and more accurate reimplementation of BUSCO.

cromwell:¶

Workflow Management System geared towards scientific workflows.

csvtk:¶

A cross-platform, efficient and practical CSV/TSV toolkit

ctags:¶

Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility.

ctffind:¶

ctffind is a program for finding CTFs of electron micrographs

cuDNN:¶

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

cutadapt:¶

cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

cuteSV:¶

Fast and scalable long-read-based SV detection

cyvcf2:¶

cython + htslib == fast VCF and BCF processing

dadi:¶

Diffusion Approximation for Demographic Inference

datasets:¶

Tool to gather data from across NCBI databases

deepTools:¶

deepTools is a suite of Python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.

devtools:¶

R functions that simplify and expedite common tasks in package development.

double-conversion:¶

Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.

drep:¶

Rapid and accurate comparison and de-replication of microbial genomes

dtcmp:¶

DTCMP Library provides pre-defined and user-defined comparison operations to compare the values of two items which can be arbitrary MPI datatypes.

duphold:¶

uphold your DUP and DEL calls

duplex-tools:¶

Range of tools to support operations on Duplex Sequencing read pairs.

eDNA:¶

A suite of tools to conduct metabarcoding analyses targeting any group of organisms. Includes utilities for preprocessing raw data and building your own custom reference database.

easi:¶

easi is a library for the Easy Initialization of models in three (or less or more) dimensional domains.

ecCodes:¶

ecCodes is a package developed by ECMWF which provides an application programming interface and a set of tools for decoding and encoding messages in the following formats: WMO FM-92 GRIB edition 1 and edition 2, WMO FM-94 BUFR edition 3 and edition 4, WMO GTS abbreviated header (only decoding).

edlib:¶

Lightweight, super fast library for sequence alignment using edit (Levenshtein) distance.

eggnog-mapper:¶

Tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments

emmtyper:¶

Tool for emm-typing of Streptococcus pyogenes using a de novo or complete assembly

ensmallen:¶

C++ header-only library for numerical optimization

entrez-direct:¶

an advanced method for accessing the NCBI's set of interconnected databases such as publication, sequence, structure, gene, variation, expression, etc.

exonerate:¶

Generic tool for pairwise sequence comparison

expat:¶

Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags)

fastStructure:¶

fastStructure is an algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.

fastp:¶

A tool designed to provide fast all-in-one preprocessing for FastQ files.

fgbio:¶

A set of tools to analyze genomic data with a focus on Next Generation Sequencing.

fineRADstructure:¶

A package for population structure inference from RAD-seq data

fineSTRUCTURE:¶

Population assignment using large numbers of densely sampled genomes, including both SNP chips and sequence data

flatbuffers:¶

FlatBuffers: Memory Efficient Serialization Library

flex:¶

Flex (Fast Lexical Analyzer) is a tool for generating scanners. A scanner, sometimes called a tokenizer, is a program which recognizes lexical patterns in text.

fmt:¶

fmt (formerly cppformat) is an open-source formatting library.

fontconfig:¶

Fontconfig is a library designed to provide system-wide font configuration, customization and application access.

foss:¶

GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.

freetype:¶

FreeType 2 is a software font engine that is designed to be small, efficient, highly customizable, and portable while capable of producing high-quality output (glyph images). It can be used in graphics libraries, display servers, font conversion tools, text image generation tools, and many other products as well.

funcx-endpoint:¶

funcX is a distributed Function as a Service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS platforms, funcX allows users to execute functions on heterogeneous remote computers, from laptops to campus clusters, clouds, and supercomputers. A funcX endpoint is a persistent service launched by the user on a compute system to serve as a conduit for executing functions on that computer.

g2clib:¶

Library contains GRIB2 encoder/decoder ('C' version).

gcloud:¶

Libraries and tools for interacting with Google Cloud products and services.

gemmforge:¶

GPU-GEMM generator for the Discontinuous Galerkin method.

genometools:¶

GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations.

gettext:¶

GNU 'gettext' is an important step for the GNU Translation Project, as it is an asset on which we may build many other steps. This package offers to programmers, translators, and even users, a well integrated set of tools and documentation

gfastats:¶

single fast and exhaustive tool for summary statistics and simultaneous fa (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

gfatools:¶

Tools for manipulating sequence graphs in the GFA and rGFA formats

gffread:¶

GFF/GTF parsing utility providing format conversions, region filtering, FASTA sequence extraction and more.

giflib:¶

giflib is a library for reading and writing gif images. It is API and ABI compatible with libungif which was in wide use while the LZW compression algorithm was patented.

gimkl:¶

GNU Compiler Collection (GCC) based compiler toolchain with Intel MPI and MKL

gimpi:¶

GNU Compiler Collection (GCC) based compiler toolchain with Intel MPI.

git:¶

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

globus-compute-endpoint:¶

Globus Compute is a distributed Function as a Service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS platforms, Globus Compute allows users to execute functions on heterogeneous remote computers, from laptops to campus clusters, clouds, and supercomputers. A Globus Compute endpoint is a persistent service launched by the user on a compute system to serve as a conduit for executing functions on that computer.

gmsh:¶

Gmsh is a 3D finite element grid generator with a built-in CAD engine and post-processor.

gompi:¶

GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.

google-sparsehash:¶

An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed.

googletest:¶

Google's C++ test framework

gperf:¶

Perfect hash function generator.

h5pp:¶

A simple C++17 wrapper for HDF5.

haplocheck:¶

Detects in-sample contamination in mtDNA or WGS sequencing studies by analyzing the mitochondrial content

hifiasm:¶

Hifiasm: a haplotype-resolved assembler for accurate HiFi reads.

hwloc:¶

The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.

iccifort:¶

Intel C, C++ & Fortran compilers

iimpi:¶

Intel C/C++ and Fortran compilers, alongside Intel MPI.

imkl:¶

Intel oneAPI Math Kernel Library

imkl-FFTW:¶

FFTW interfaces using Intel oneAPI Math Kernel Library

impalajit:¶

A lightweight JIT compiler for flexible data access in simulation applications

impi:¶

Intel MPI Library, compatible with MPICH ABI

intel:¶

Compiler toolchain including Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).

intel-compilers:¶

Intel C, C++ & Fortran compilers (classic and oneAPI)

iofbf:¶

Intel based compiler toolchain, including OpenMPI for MPI support, FlexiBLAS (Defaulting to OpenBLAS), FFTW and ScaLAPACK.

iompi:¶

Intel C/C++ and Fortran compilers, alongside OpenMPI.

ipyrad:¶

ipyrad is an interactive toolkit for assembly and analysis of restriction-site associated genomic data sets (e.g., RAD, ddRAD, GBS) for population genetic and phylogenetic studies.

jbigkit:¶

JBIG-KIT is a software implementation of the JBIG1 data compression standard

jcvi:¶

Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

jemalloc:¶

A general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.

jq:¶

Lightweight and flexible command-line JSON processor.

json-c:¶

JSON-C implements a reference counting object model that allows you to easily construct JSON objects in C, output them as JSON formatted strings and parse JSON formatted strings back into the C representation of JSON objects.

json-fortran:¶

JSON-Fortran: A Modern Fortran JSON API

jvarkit:¶

Java utilities for Bioinformatics

kalign2:¶

Kalign is a fast multiple sequence alignment program for biological sequences.

kallisto:¶

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

kineto:¶

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters

kma:¶

KMA is a mapping method designed to map raw reads directly against redundant databases, in an ultra-fast manner using seed and extend.

libFLAME:¶

libFLAME is a portable library for dense matrix computations, providing much of the functionality present in LAPACK.

libGLU:¶

The OpenGL Utility Library (GLU) is a computer graphics library for OpenGL.

libKML:¶

Reference implementation of OGC KML 2.2

libStatGen:¶

Set of classes for creating statistical genetic programs.

libaec:¶

Libaec provides fast lossless compression of 1 up to 32 bit wide signed or unsigned integers (samples)

libarchive:¶

Multi-format archive and compression library

libcircle:¶

API for distributing embarrassingly parallel workloads using self-stabilization.

libdeepmd:¶

Deep learning-based models of interatomic potential energy and force field

libdeflate:¶

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression.

libdrm:¶

Direct Rendering Manager runtime library.

libdwarf:¶

The DWARF Debugging Information Format is of interest to programmers working on compilers and debuggers

libepoxy:¶

Library for handling OpenGL function pointer management

libevent:¶

The libevent API provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. Furthermore, libevent also supports callbacks due to signals or regular timeouts.

libffi:¶

The libffi library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run-time.

libgcrypt:¶

Libgcrypt is a general purpose cryptographic library originally based on code from GnuPG.

libgd:¶

GD is an open source code library for the dynamic creation of images by programmers.

libgeotiff:¶

Library for reading and writing coordinate system information from/to GeoTIFF files

libgit2:¶

libgit2 is a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language which supports C bindings.

libglvnd:¶

libglvnd is a vendor-neutral dispatch layer for arbitrating OpenGL API calls between multiple vendors.

libgpg-error:¶

Libgpg-error is a small library that defines common error values for all GnuPG components.

libgpuarray:¶

Arrays on GPU device memory, for Theano

libgtextutils:¶

libgtextutils is a dependency of fastx-toolkit

libiconv:¶

Libiconv converts from one character encoding to another through Unicode conversion

libjpeg-turbo:¶

libjpeg-turbo is a fork of the original IJG libjpeg which uses SIMD to accelerate baseline JPEG compression and decompression. libjpeg is a library that implements JPEG image encoding, decoding and transcoding.

libpciaccess:¶

Generic PCI access library.

libpng:¶

libpng is the official PNG reference library

libreadline:¶

The GNU Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. Both Emacs and vi editing modes are available. The Readline library includes additional functions to maintain a list of previously-entered command lines, to recall and perhaps reedit those lines, and perform csh-like history expansion on previous commands.

libsodium:¶

library for encryption, decryption, signatures, password hashing and more.

libspatialite:¶

SpatiaLite is an open source library intended to extend the SQLite core to support fully fledged Spatial SQL capabilities.

libtool:¶

GNU libtool is a generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface.

libunwind:¶

Define a portable and efficient C programming API to determine the call-chain of a program.

libvdwxc:¶

libvdwxc is a general library for evaluating energy and potential for exchange-correlation (XC) functionals from the vdW-DF family that can be used with various density functional theory (DFT) codes.

libxc:¶

Libxc is a library of exchange-correlation functionals for density-functional theory. The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.

libxml2:¶

Libxml2 is the XML C parser and toolchain developed for the GNOME project (but usable outside of the GNOME platform).

libxslt:¶

Libxslt is the XSLT C library developed for the GNOME project (but usable outside of the GNOME platform).

libxsmm:¶

LIBXSMM is a library for small dense and small sparse matrix-matrix multiplications targeting Intel Architecture (x86).

libzstd:¶

Fast lossless compression algorithm.

lighttpd:¶

A web server.

likwid:¶

Command line tools for Linux to support programmers in developing high-performance multi-threaded programs.

lp_solve:¶

Mixed Integer Linear Programming (MILP) solver.

lwgrp:¶

The light-weight group library defines data structures and collective operations to group MPI processes as an ordered set.

lz4:¶

LZ4 is a lossless compression algorithm, providing compression speed at 400 MB/s per core. It features an extremely fast decoder, with speed in multiple GB/s per core.

maf_stream:¶

Collection of utilities to manipulate multiple alignments in the Multiple Alignment Format

magma:¶

The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current Multicore+GPU systems.

manta:¶

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads.

mapDamage:¶

tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

matlab-proxy:¶

Python package which enables you to launch MATLAB and access it from a web browser.

mctc-lib:¶

Common tool chain for working with molecular structure data in various applications. This library provides a unified way to perform operations on molecular structure data, like reading and writing to common geometry file formats.

medaka:¶

Medaka is a tool to create a consensus sequence from nanopore sequencing data.

megalodon:¶

Tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.

metaWRAP:¶

Flexible pipeline for genome-resolved metagenomic data analysis.

miRDeep2:¶

Completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs

miniBUSCO:¶

faster and more accurate reimplementation of BUSCO.

miniasm:¶

Fast OLC-based de novo assembler for noisy long reads.

minigraph:¶

Sequence-to-graph mapper and graph generator

minimap2:¶

Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome optionally with detailed alignment (i.e. CIGAR). At present, it works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate of ~15%.

miniprot:¶

Aligns a protein sequence against a genome with affine gap penalty, splicing and frameshift.

modbam2bed:¶

A program to aggregate modified base counts stored in a modified-base BAM file to a bedMethyl file.

modkit:¶

Tool for working with modified bases from Oxford Nanopore

mosdepth:¶

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

mpcci:¶

MpCCI is a vendor neutral and application independent interface for co-simulation. MpCCI offers advanced and proven features for multiphysics modelling.

muParser:¶

muParser is an extensible high performance math expression parser library written in C++. It works by transforming a mathematical expression into bytecode and precalculating constant parts of the expression.
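The idea of precalculating constant parts of an expression can be illustrated with a minimal Python sketch. It uses Python's standard `ast` module rather than muParser's API or bytecode, and the names `FoldConstants` and `fold` are hypothetical.

```python
import ast
import operator

# Arithmetic operators this sketch knows how to evaluate ahead of time.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


class FoldConstants(ast.NodeTransformer):
    """Replace binary operations on two constants with their computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = OPS.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
        ):
            value = op(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(expression: str) -> str:
    """Return the expression with its constant sub-expressions precalculated."""
    tree = FoldConstants().visit(ast.parse(expression, mode="eval"))
    return ast.unparse(tree)


print(fold("x * (2 + 3) + 4 * 5"))
```

Only the parts that do not depend on the variable `x` are folded; the variable itself is left for evaluation time.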

nanoQC:¶

Create fastQC-like plots for Oxford Nanopore sequencing data.

nanofilt:¶

Filtering and trimming of long read sequencing data.

nanoget:¶

Functions to extract information from Oxford Nanopore sequencing data and alignments

nanomath:¶

A few simple math functions for other Oxford Nanopore processing scripts

nanopolish:¶

Software package for signal-level analysis of Oxford Nanopore sequencing data.

ncbi-vdb:¶

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

ncurses:¶

The Ncurses (new curses) library is a free software emulation of curses in System V Release 4.0, and more. It uses Terminfo format, supports pads and color and multiple highlights and forms characters and function-key mapping, and has all the other SYSV-curses enhancements over BSD Curses.

ncview:¶

Ncview is a visual browser for netCDF format files. Typically you would use ncview to get a quick and easy, push-button look at your netCDF files. You can view simple movies of the data, view along various dimensions, take a look at the actual data values, change color maps, invert the data, etc.

ne:¶

ne is a free (GPL'd) text editor based on the POSIX standard that runs (we hope) on almost any UN*X machine. ne is easy to use for the beginner, but powerful and fully configurable for the wizard, and most sparing in its resource usage.

netCDF:¶

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-C++:¶

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-C++4:¶

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

netCDF-Fortran:¶

NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

nodejs:¶

Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

nseg:¶

Used to mask nucleic acid sequences

nsync:¶

nsync is a C library that exports various synchronization primitives, such as mutexes

nullarbor:¶

Reads to report pipeline for bacterial isolate NGS data.

numactl:¶

The numactl program allows you to run your application program on specific CPUs and memory nodes. It does this by supplying a NUMA memory policy to the operating system before running your program. The libnuma library provides convenient ways for you to add NUMA memory policies into your own program.

ont-guppy-gpu:¶

Data processing toolkit that contains the Oxford Nanopore Technologies' basecalling algorithms, and several bioinformatic post-processing features

padloc:¶

Prokaryotic Antiviral Defence LOCator

pairtools:¶

CLI tools to process mapped Hi-C data

panaroo:¶

A pangenome analysis pipeline.

pandoc:¶

Almost universal document converter

parallel-fastq-dump:¶

parallel fastq-dump wrapper

patchelf:¶

PatchELF is a small utility to modify the dynamic linker and RPATH of ELF executables.

pauvre:¶

Tools for plotting Oxford Nanopore and other long-read data.

pggb:¶

PanGenome Graph Builder (pggb)

pgge:¶

pangenome graph evaluator

phonopy:¶

Phonopy is an open source package of phonon calculations based on the supercell approach.

phyx:¶

phyx performs phylogenetics analyses on trees and sequences.

picard:¶

A set of tools (in Java) for working with next generation sequencing data in the BAM format.

pigz:¶

Parallel implementation of gzip.

pixman:¶

Pixman is a low-level software library for pixel manipulation, providing features such as image compositing and trapezoid rasterization. Important users of pixman are the cairo graphics library and the X server.

pod5:¶

File format for storing nanopore DNA data in an easily accessible way.

pplacer:¶

Places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment

preseq:¶

Software for predicting library complexity and genome coverage in high-throughput sequencing.

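prodigal:¶

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a gene prediction program for prokaryotic genomes.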

prodigal-gv:¶

A fork of Prodigal meant to improve gene calling for giant viruses and viruses that use alternative genetic codes.

prokka:¶

Prokka is a software tool for the rapid annotation of prokaryotic genomes.

protobuf:¶

Google Protocol Buffers.

psmc:¶

Infers population size history from a diploid sequence using the PSMC model.

pstoedit:¶

pstoedit translates PostScript and PDF graphics into other vector formats

pullseq:¶

Utility program for extracting sequences from a fasta/fastq file

purge_dups:¶

purge haplotigs and overlaps in an assembly based on read depth

purge_haplotigs:¶

Pipeline to help with curating heterozygous diploid genome assemblies

pv:¶

Monitors the progress of data through a unix pipeline.

pyani:¶

Whole-genome classification using Average Nucleotide Identity

pycoQC:¶

Computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data.

pymol-open-source:¶

PyMOL (open source version) molecular visualization system.

qcat:¶

Command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files

randfold:¶

Minimum free energy of folding randomization test software

rasusa:¶

Randomly subsample sequencing reads to a specified coverage.
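The relationship between coverage, genome size and read lengths can be sketched as follows. This is a simplified illustration that assumes coverage is total bases divided by genome size; it is not rasusa's actual implementation, and the function name and parameters are hypothetical.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Pick reads at random until their total length reaches coverage * genome_size.

    Returns the indices of the reads that were kept.
    """
    target_bases = coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the sketch reproducible

    kept, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        kept.append(index)
        total += read_lengths[index]
    return kept


# 100 reads of 1,000 bases each, subsampled to 10x of a 5,000-base genome.
read_lengths = [1000] * 100
kept = subsample_to_coverage(read_lengths, genome_size=5000, coverage=10)
print(len(kept), sum(read_lengths[i] for i in kept))
```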

rclone:¶

Rclone is a command line program to sync files and directories to and from a variety of online storage services

re2c:¶

re2c is a free and open-source lexer generator for C and C++. Its main goal is generating fast lexers: at least as fast as their reasonably optimized hand-coded counterparts. Instead of using traditional table-driven approach, re2c encodes the generated finite state automata directly in the form of conditional jumps and comparisons.

rnaQUAST:¶

Tool for evaluating RNA-Seq assemblies using reference genome and gene database

samblaster:¶

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
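As a rough worked example of the memory figure above, the following sketch scales the quoted 20MB per 1M read pairs linearly. The function name is hypothetical and the linear scaling is an assumption; actual usage may differ.

```python
def estimate_samblaster_memory_mb(read_pairs, mb_per_million_pairs=20.0):
    """Linear estimate from the approximate 20 MB per 1M read pairs figure."""
    return read_pairs / 1_000_000 * mb_per_million_pairs


# 400 million read pairs would need roughly this many MB when marking duplicates.
print(estimate_samblaster_memory_mb(400_000_000))
```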

sbt:¶

sbt is a build tool for Scala, Java, and more.

sc-RNA:¶

Bioconductor bundle for single-cell RNA-Seq Data analysis

screen_assembly:¶

Pipeline that screens for presence of genes of interest (GOI) in bacterial assemblies.

seqmagick:¶

Seqmagick is a utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments.

seqtk:¶

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

shrinkwrap:¶

A std::streambuf wrapper for compression formats.

simuG:¶

A general-purpose genome simulator

sismonr:¶

Simulation of In Silico Multi-Omic Networks R package.

skani:¶

accurate, fast nucleotide identity calculation for MAGs, genomes, and databases

slow5tools:¶

Toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.

smoove:¶

simplifies and speeds calling and genotyping SVs for short reads.

snakemake:¶

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.

snappy:¶

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.

snp-sites:¶

Finds SNP sites from a multi-FASTA alignment file.
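A minimal sketch of the underlying idea, assuming the alignment has already been read into a dictionary of equal-length sequences. It is not the snp-sites implementation; the function name is hypothetical, and ignoring gaps and `N` is a choice made for this illustration.

```python
def snp_columns(aligned):
    """Return 0-based alignment columns where more than one base is observed."""
    sequences = list(aligned.values())
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences in an alignment must have equal length")

    ignored = {"-", "N"}  # assumption: gaps and unknown bases are not variants
    sites = []
    for index, column in enumerate(zip(*sequences)):
        bases = {base.upper() for base in column} - ignored
        if len(bases) > 1:
            sites.append(index)
    return sites


alignment = {"s1": "ACGTA", "s2": "ACCTA", "s3": "AC-TT"}
print(snp_columns(alignment))
```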

snpEff:¶

SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes).

somalier:¶

extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF

spaln:¶

Stand-alone program that maps and aligns a set of cDNA or protein sequences onto a whole genomic sequence in a single job.

spdlog:¶

Fast C++ logging library.

spoa:¶

C++ implementation of the partial order alignment (POA) algorithm which is used to generate consensus sequences

sratoolkit:¶

The SRA Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format

supercomputer:¶

Like a regular computer, but larger. Primarily used for heating data centers.

supercomputing:¶

Like a regular computer, but larger. Primarily used for heating data centers.

swarm:¶

A robust and fast clustering method for amplicon-based studies. The purpose of swarm is to provide a novel clustering algorithm that handles massive sets of amplicons. Results of traditional clustering algorithms are strongly input-order dependent, and rely on an arbitrary global clustering threshold. swarm results are resilient to input-order changes and rely on a small local linking threshold d, representing the maximum number of differences between two amplicons.
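The effect of a local linking threshold d can be illustrated with a small sketch: amplicons that differ by at most d positions are linked, and clusters are the connected groups of linked amplicons, so the result does not depend on input order. This is a simplification, not swarm's algorithm; it only counts mismatches between equal-length sequences, and all names are hypothetical.

```python
from itertools import combinations


def differences(a, b):
    """Count mismatching positions (this sketch assumes equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_local_threshold(amplicons, d=1):
    """Group amplicons into connected components of the 'at most d differences' graph."""
    parent = {seq: seq for seq in amplicons}

    def find(seq):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    for a, b in combinations(parent, 2):
        if differences(a, b) <= d:
            parent[find(a)] = find(b)

    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), set()).add(seq)
    return list(clusters.values())


amplicons = ["ACGT", "ACGA", "ACCA", "TTTT"]
clusters = cluster_by_local_threshold(amplicons, d=1)
shuffled = cluster_by_local_threshold(list(reversed(amplicons)), d=1)
print(sorted(sorted(c) for c in clusters))
```

Note that `ACGT` and `ACCA` differ at two positions but still end up together, because each is within one difference of `ACGA`.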

tRNAscan-SE:¶

Transfer RNA detection

tabix:¶

Generic indexer for TAB-delimited genome position files

tabixpp:¶

C++ wrapper to tabix indexer

tbb:¶

Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability.

tbl2asn:¶

Command-line program that automates the creation of sequence records for submission to GenBank

tmux:¶

tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.

tomo:¶

This code computes 2D Travel Time Tomography using the Reversible Jump algorithm with a Voronoi cell parameterisation.

trf:¶

Locates tandem repeats in DNA sequences.

trimAl:¶

Tool for automated alignment trimming in large-scale phylogenetic analyses

unimap:¶

Fork of minimap2 optimized for assembly-to-reference alignment.

unrar:¶

RAR is a powerful archive manager.

util-linux:¶

Set of Linux utilities

vcflib:¶

C++ library and command-line tools for parsing and manipulating VCF (Variant Call Format) files.

verkko:¶

Hybrid genome assembly pipeline developed for telomere-to-telomere assembly of PacBio HiFi and Oxford Nanopore reads

vg:¶

variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods

wannier90:¶

Wannier90 is an open-source code for generating maximally-localized Wannier functions and using them to compute advanced electronic properties of materials with high efficiency and accuracy.

wgsim:¶

Wgsim is a small tool for simulating sequence reads from a reference genome.

wheel:¶

A built-package format for Python.

wtdbg:¶

de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies.

wxWidgets:¶

widget toolkit and tools library for creating graphical user interfaces for cross-platform applications.

x264:¶

x264 is a free software library and application for encoding video streams into the H.264/MPEG-4 AVC compression format, and is released under the terms of the GNU GPL.

x265:¶

x265 is a free software library and application for encoding video streams into the H.265/MPEG-H HEVC compression format, and is released under the terms of the GNU GPL.

xPore:¶

A Python package for identification and quantification of differential RNA modifications from direct RNA sequencing

xkbcommon:¶

keyboard keymap compiler and support library

xtb:¶

xtb - An extended tight-binding semi-empirical program package.

yacrd:¶

Chimeric Read Detector for long reads

yajl:¶

Yet Another JSON Library. Why does the world need another C library for parsing JSON? Good question.

yak:¶

Yet another k-mer analyzer

yaml-cpp:¶

YAML parser and emitter in C++

zlib:¶

zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system.
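A minimal sketch of lossless round-trip compression, using Python's standard `zlib` module (a binding to this library). The sample data is made up for illustration.

```python
import zlib

# Highly repetitive input compresses well.
data = b"ACGT" * 1000

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

# Lossless: decompression returns exactly the original bytes.
print(len(data), len(compressed), restored == data)
```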

zstd:¶

Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression/speed trade-off, while being backed by a very fast decoder. It also offers a special mode for small data, called dictionary compression, and can create dictionaries from any sample set.