MPI4All
MPI is the predominant programming model in HPC. The standard only provides bindings for the low-level programming languages C, C++, and Fortran. Efforts exist to offer MPI bindings for other programming languages, but their support may be limited, which can result in functionality gaps, performance overhead, and compatibility problems. To address these issues, we introduce MPI4All, a novel tool that simplifies the creation of efficient MPI bindings for any programming language. MPI4All does not depend on the MPI implementation, and adding support for new languages does not require significant effort. The current version of MPI4All includes binding generators for the Java and Go programming languages.
Citation:
César Piñeiro, Álvaro Vázquez and Juan C. Pichel. Towards universal MPI bindings for enhanced new language support.
Journal of Computational Science, Vol. 87, 2025.
César Piñeiro, Álvaro Vázquez and Juan C. Pichel. MPI4All: universal binding generation for MPI parallel programming.
24th Int. Conf. on Computational Science (ICCS), 2024.
BigSeqKit
BigSeqKit is a parallel toolkit to manipulate FASTA/Q files at scale with speed and scalability at its core. BigSeqKit takes advantage of an HPC-Big Data framework (IgnisHPC) to parallelize and optimize the commands included in seqkit. In this way, in most cases, it is from tens to hundreds of times faster than other state-of-the-art tools such as seqkit, samtools, and pyfastx.
At the same time, our tool is easy to use and install on any kind of hardware platform (single server or cluster). Routines in BigSeqKit can be used as a bioinformatics library or from the command line.
To improve usability and ease adoption, BigSeqKit implements the same command interface as seqkit.
Citation:
César Piñeiro and Juan C. Pichel. BigSeqKit: a parallel Big Data toolkit to process FASTA and FASTQ files at scale.
GigaScience, Vol. 12, 2023.
PyPlexity
This package provides a simple interface to apply perplexity filters to any text document.
A possible use case for this technology could be the removal of boilerplate
(sentences with a high perplexity score): ads, incomplete or noisy text, and remnants of the
navigation structure, such as menus or navigation bars. Furthermore, it provides a rough HTML tag
cleaner and a WARC and HTML bulk processor, with distributed capabilities.
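The filtering idea described above can be illustrated with a minimal sketch. It does not use PyPlexity's API: a toy add-one-smoothed unigram model stands in for the language model, and the function names, the reference corpus, and the threshold of 10.0 are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies in a list of reference sentences."""
    counts = Counter(w for sentence in corpus for w in sentence.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Per-word perplexity under an add-one-smoothed unigram model (toy stand-in)."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def filter_boilerplate(sentences, counts, total, threshold):
    """Keep only sentences whose perplexity is at or below the threshold."""
    return [s for s in sentences if perplexity(s, counts, total) <= threshold]


# Hypothetical reference corpus and inputs, for illustration only.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts, total = train_unigram(corpus)
clean = "the cat sat on the rug"
noisy = "login | share | subscribe | menu"
kept = filter_boilerplate([clean, noisy], counts, total, threshold=10.0)
print(kept)  # the navigation-like line scores higher perplexity and is dropped
```

In practice the threshold would be tuned on held-out data, and a stronger language model would replace the unigram counts; the structure of the filter (score each sentence, drop the high-perplexity ones) stays the same.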
Citation:
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, and Pablo Gamallo.
An Unsupervised Perplexity-based Method for Boilerplate Removal.
Natural Language Engineering, Vol. 30, 2024.
IgnisHPC
IgnisHPC is a framework whose main objective is to unify the execution of Big Data and HPC workloads in the same computing engine. IgnisHPC has native support for multi-language applications using JVM and non-JVM-based languages. Currently, it supports C, C++, Python, Go, and Java. Because MPI is its backbone technology, IgnisHPC allows MPI applications and libraries to be executed directly and efficiently within the framework. The experimental evaluation demonstrates the benefits of our proposal in terms of performance and productivity over other frameworks such as Spark. For example, on a 12-node cluster with 2 × Intel Xeon E5-2630v4 (2.2 GHz, 10 cores) per node, the experimental results show:
Application | No. times faster than Spark |
Minebench | 3.87x [Python & C++], 1.26x [Python] |
TeraSort | 1.76x [C++], 1.35x [Python] |
K-Means | 1.94x [Python & C++] |
PageRank | 1.10x [Python] |
Transitive Closure | 1.12x [Python] |
IgnisHPC is publicly available for the Big Data and HPC research community.
Citation:
César Piñeiro and Juan C. Pichel. A Unified Framework to Improve the Interoperability between HPC and Big Data Languages and Programming Models.
Future Generation Computer Systems, Vol. 134, 2022.
VeryFastTree
VeryFastTree is a new tool designed for efficient phylogenetic tree inference, specifically tailored to handle massive taxonomic datasets.
It is a highly tuned implementation based on the FastTree-2 tool that takes advantage of parallelization and vectorization strategies to speed up the inference of phylogenies for huge alignments. For example, VeryFastTree (v4.0 - July 2023) can construct a tree from an ultra-large one-million-taxa alignment in just 36 hours on one server (two 32-core Intel Xeon Ice Lake 8352Y processors) using single-precision arithmetic. In contrast, VeryFastTree-3.0 and FastTree-2 require over 5 days for the same task.
That means VeryFastTree-4.0 is over 3x faster than its previous version and FastTree-2.
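The speedup figure follows directly from the two wall-clock times quoted above. The short check below treats "over 5 days" as a lower bound of exactly 5 days, which is an assumption made only for this arithmetic; the variable names are illustrative.

```python
# Wall-clock times quoted in the text.
new_hours = 36        # VeryFastTree 4.0
old_hours = 5 * 24    # "over 5 days", taken as a lower bound of 120 hours

# Because the older tools need at least old_hours, this is a lower bound.
speedup_lower_bound = old_hours / new_hours
print(f"speedup > {speedup_lower_bound:.2f}x")  # speedup > 3.33x
```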
VeryFastTree is available as a package in Bioconda, MacPorts, and Debian Linux distributions.
It also provides Python bindings.
Citations:
César Piñeiro and Juan C. Pichel. Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze 1 million taxa.
GigaScience, Vol. 13, pages 1-12, 2024.
César Piñeiro, José M. Abuín, and Juan C. Pichel. VeryFastTree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies.
Bioinformatics, Vol. 36, Issue 17, pages 4658-4659, 2020.
PASTASpark
PASTASpark is a tool that uses the Big Data engine Apache Spark to boost the performance of the alignment phase of
PASTA (Practical Alignments using SATé and TrAnsitivity). PASTASpark guarantees scalability and fault tolerance,
allowing users to obtain Multiple Sequence Alignments (MSAs) from very large datasets in a reasonable time.
Citation:
José M. Abuín, Tomás F. Pena, and Juan C. Pichel.
PASTASpark: multiple sequence alignment meets Big Data.
Bioinformatics, Vol. 33, Issue 18, pp. 2948-2950, 2017.
SparkBWA
SparkBWA is a tool that exploits the capabilities of a Big Data technology, Apache Spark,
to boost the performance of one of the most widely adopted DNA sequence aligners,
the Burrows-Wheeler Aligner (BWA).
Citation:
José M. Abuín, Juan C. Pichel, Tomás F. Pena, and Jorge Amigo.
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.
PLoS ONE, Vol. 11, Issue 5, pp. 1-21, 2016.
BigBWA
Citation:
José M. Abuín, Juan C. Pichel, Tomás F. Pena, and Jorge Amigo.
BigBWA: Approaching the Burrows-Wheeler Aligner to Big Data Technologies.
Bioinformatics, Vol. 31, Issue 24, pp. 4003-4005, 2015.