IgnisHPC
IgnisHPC is a framework whose main objective is to unify the execution of Big Data and HPC workloads in the same computing engine. It natively supports multi-language applications written in both JVM and non-JVM languages; C, C++, Python and Java are currently supported. Because MPI is its backbone technology, IgnisHPC can run MPI applications and libraries directly and efficiently inside the framework. As a result, users can combine HPC tasks (using MPI) with Big Data tasks (using MapReduce operations) in the same multi-language code. The experimental evaluation demonstrates the benefits of our proposal in terms of performance and productivity with respect to other frameworks such as Spark. For example, on a 12-node cluster with 2 × Intel Xeon E5-2630v4 (2.2 GHz, 10 cores) per node, the experimental results show that:
Application | No. times faster than Spark
Minebench | 3.87x [Python & C++], 1.26x [Python]
TeraSort | 1.76x [C++], 1.35x [Python]
K-Means | 1.94x [Python & C++]
PageRank | 1.10x [Python]
Transitive Closure | 1.12x [Python]
IgnisHPC is publicly available for the Big Data and HPC research community.
Citation: César Piñeiro and Juan C. Pichel. A Unified Framework to Improve the Interoperability between HPC and Big Data Languages and Programming Models. Future Generation Computer Systems, Vol. 134, 2022.
VeryFastTree
VeryFastTree is a highly tuned implementation of the FastTree-2 tool that uses parallelization and vectorization strategies to speed up the inference of phylogenies for huge alignments. VeryFastTree leaves unchanged the phases, methods and heuristics that FastTree-2 uses to estimate the phylogenetic tree, so it produces trees with the same topological accuracy as FastTree-2. In addition, unlike the parallel version of FastTree-2, VeryFastTree is deterministic. In terms of performance, VeryFastTree can, for example, construct a tree from an ultra-large 330k alignment in only 4.5 hours on a standard server (12-core Intel Xeon E5-2680v3 processor and 128 GiB of memory) using double-precision arithmetic, which is 7.8× and 3.5× faster than the sequential and best parallel FastTree-2 times, respectively.
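To put the quoted speedups in absolute terms, the short sketch below works backwards from the 4.5-hour VeryFastTree run to the FastTree-2 runtimes it implies. It assumes that "N× faster" means the baseline time divided by the VeryFastTree time; the function and variable names are illustrative only and are not part of VeryFastTree.

```python
# Back-of-the-envelope check of the timings quoted above.
# Assumption: "N x faster" means baseline_time / new_time == N.

def baseline_hours(new_hours: float, speedup: float) -> float:
    """Return the baseline runtime implied by a speedup factor."""
    return new_hours * speedup

veryfasttree_hours = 4.5   # reported runtime for the 330k alignment
sequential_speedup = 7.8   # versus sequential FastTree-2
parallel_speedup = 3.5     # versus the best parallel FastTree-2 run

sequential_hours = baseline_hours(veryfasttree_hours, sequential_speedup)
parallel_hours = baseline_hours(veryfasttree_hours, parallel_speedup)

print(f"Implied sequential FastTree-2 time: {sequential_hours:.2f} h")
print(f"Implied parallel FastTree-2 time:   {parallel_hours:.2f} h")
```

Under that assumption, the sequential FastTree-2 run would take roughly 35 hours and the best parallel run just under 16 hours on the same server.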
Citation: César Piñeiro, José M. Abuín and Juan C. Pichel. VeryFastTree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies. Bioinformatics, Vol. 36, Issue 17, pages 4658-4659, 2020.
Perldoop2
The most relevant Big Data frameworks do not natively support the Perl language. To take advantage of these Big Data engines, Perl programmers must either port their applications to Java or Scala, which requires a huge effort, or use utilities such as Hadoop Streaming, with the corresponding degradation in performance. For this reason we introduce Perldoop2, a Big Data-oriented Perl-Java source-to-source compiler. The compiler can generate Java code from Perl applications for sequential execution, and also for running on clusters using the Hadoop, Spark and Storm engines.
Citation: César Piñeiro, José M. Abuín and Juan C. Pichel. Perldoop2: a Big Data-oriented source-to-source Perl-Java compiler. IEEE Int. Conference on Big Data Intelligence and Computing (DataCom), pp. 933-940, 2017.
PASTASpark
PASTASpark is a tool that uses the Big Data engine Apache Spark to boost the performance of the alignment phase of PASTA (Practical Alignments using SATé and TrAnsitivity). PASTASpark guarantees scalability and fault tolerance, and makes it possible to obtain multiple sequence alignments (MSAs) from very large datasets in a reasonable time.
Citation: José M. Abuín, Tomás F. Pena and Juan C. Pichel. PASTASpark: multiple sequence alignment meets Big Data. Bioinformatics, Vol. 33, Issue 18, pp. 2948-2950, 2017.
SparkBWA
SparkBWA is a tool that exploits the capabilities of a Big Data technology such as Apache Spark to boost the performance of one of the most widely adopted DNA sequence aligners, the Burrows-Wheeler Aligner (BWA).
Citation: José M. Abuín, Juan C. Pichel, Tomás F. Pena and Jorge Amigo. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data. PLoS ONE, Vol. 11, Issue 5, pp. 1-21, 2016.
BigBWA
Citation: José M. Abuín, Juan C. Pichel, Tomás F. Pena and Jorge Amigo. BigBWA: Approaching the Burrows-Wheeler Aligner to Big Data Technologies. Bioinformatics, Vol. 31, Issue 24, pp. 4003-4005, 2015.