Juan C. Pichel

Bio

I am currently an associate professor at CITIUS, University of Santiago de Compostela (Galicia, Spain). My research interests include parallel and distributed computing, Big Data technologies, programming models, and software optimization techniques for emerging architectures. I received a B.Sc. in physics and a Ph.D. in computer science (2006) from the University of Santiago de Compostela (Spain). I was a visiting postdoctoral researcher at University Carlos III de Madrid (Spain) and the University of Illinois at Urbana-Champaign (USA), and I also worked as a researcher and project manager at the Galicia Supercomputing Center (Spain).

Publications

Journals

NetQIR: An extension of QIR for distributed quantum computing
F. Javier Cardama, Jorge Vázquez-Pérez, César Piñeiro, Natalia Costas, Tomás F. Pena, Juan C. Pichel and Andrés Gómez
Future Generation Computer Systems, Vol. 174, 2025.
Review of Distributed Quantum Computing: From single QPU to High Performance Quantum Computing
David Barral, F. Javier Cardama, Guillermo Díaz-Camacho, Daniel Faílde, Iago F. Llovo, Mariamo Mussa-Juane, Jorge Vázquez-Pérez, Juan Villasuso, César Piñeiro, Natalia Costas, Juan C. Pichel, Tomás F. Pena and Andrés Gómez
Computer Science Review, Vol. 57, 2025.
Evaluating search engines and large language models for answering health questions
Marcos Fernández-Pichel, Juan C. Pichel and David E. Losada.
npj Digital Medicine, Vol. 8, 2025.
Towards universal MPI bindings for enhanced new language support
César Piñeiro, Álvaro Vázquez and Juan C. Pichel.
Journal of Computational Science, Vol. 87, 2025.
InQASM: InQuIR compiler to NetQASM
Jorge Vázquez-Pérez, F. Javier Cardama, César Piñeiro, Juan C. Pichel, Tomás F. Pena and Andrés Gómez
Journal of Supercomputing, Vol. 81, 2025.
Review of Intermediate Representations for Quantum Computing
F. Javier Cardama, Jorge Vázquez-Pérez, César Piñeiro, Juan C. Pichel, Tomás F. Pena and Andrés Gómez
Journal of Supercomputing, Vol. 81, 2025.
Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze 1 million taxa
César Piñeiro and Juan C. Pichel.
GigaScience, Vol. 13, pages 1-12, 2024.
QPU integration in OpenCL for heterogeneous programming
Jorge Vázquez-Pérez, César Piñeiro, Juan C. Pichel, Tomás F. Pena and Andrés Gómez.
Journal of Supercomputing, Vol. 80, 2024.
An Unsupervised Perplexity-based Method for Boilerplate Removal
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel and Pablo Gamallo.
Natural Language Engineering, Vol. 30, pages 132-149, 2024.
BigSeqKit: a parallel Big Data toolkit to process FASTA and FASTQ files at scale
César Piñeiro and Juan C. Pichel.
GigaScience, Vol. 12, pages 1-12, 2023.
A machine learning approach to model the impact of line edge roughness on gate-all-around nanowire FETs while reducing the carbon footprint
Antonio García-Loureiro, Natalia Seoane, Julián G. Fernández, Enrique Comesaña and Juan C. Pichel.
PLoS ONE, Vol. 18, Issue 7, pages 1-17, 2023.
An Accurate Machine Learning Model to Study the Impact of Realistic Metal Grain Granularity on Nanosheet FETs
Julián G. Fernández, Natalia Seoane, Enrique Comesaña, Juan C. Pichel and Antonio García-Loureiro
Solid-State Electronics, article 108710, 2023.
A Multistage Retrieval System for Health-related Misinformation Detection
Marcos Fernández-Pichel, David E. Losada and Juan C. Pichel.
Engineering Applications of Artificial Intelligence, Vol. 115, pages 1-17, 2022.
A Unified Framework to Improve the Interoperability between HPC and Big Data Languages and Programming Models
César Piñeiro and Juan C. Pichel.
Future Generation Computer Systems, Vol. 134, pages 123-139, 2022.
Real-Time Focused Extraction of Social Media Users
Rodrigo Martínez-Castaño, David E. Losada and Juan C. Pichel.
IEEE Access, Vol. 10, pages 42607-42622, 2022.
A Big Data Platform for Real Time Analysis of Signs of Depression in Social Media
Rodrigo Martínez-Castaño, Juan C. Pichel and David E. Losada.
Int. Journal of Environmental Research and Public Health, Vol. 17 (3), 2020.
VeryFastTree: Speeding Up the Estimation of Phylogenies for Large Alignments through Parallelization and Vectorization Strategies
César Piñeiro, José M. Abuín and Juan C. Pichel.
Bioinformatics, Vol. 36, Issue 17, pages 4658-4659, 2020.
A Big Data Approach to Metagenomics for All-food-sequencing
Robin Kobus, José M. Abuín, André Müller, Sören Lukas Hellmann, Juan C. Pichel, Tomás F. Pena, Andreas Hildebrandt, Thomas Hankeln and Bertil Schmidt.
BMC Bioinformatics, Vol. 21 (102), 2020.
Ignis: An efficient and scalable multi-language Big Data framework
César Piñeiro, Rodrigo Martínez-Castaño and Juan C. Pichel.
Future Generation Computer Systems, Vol. 105, pages 705-716, 2020.
Sparse Matrix Classification on Imbalanced Datasets using Convolutional Neural Networks
Juan C. Pichel and Beatriz Pateiro-López.
IEEE Access, Vol. 7, pages 82377-82389, 2019.
PASTASpark: multiple sequence alignment meets Big Data
José M. Abuín, Tomás F. Pena and Juan C. Pichel.
Bioinformatics, Vol. 33, Issue 18, pages 2948-2950, 2017.
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data
José M. Abuín, Juan C. Pichel, Tomás F. Pena and Jorge Amigo.
PLoS ONE, Vol. 11, Issue 5, pages 1-21, 2016.
Boosting Performance of a Statistical Machine Translation System Using Dynamic Parallelism
M. Fernández, Juan C. Pichel, José C. Cabaleiro and Tomás F. Pena.
Journal of Computational Science, Vol. 13, pages 37-48, 2016.
BigBWA: Approaching the Burrows-Wheeler Aligner to Big Data Technologies
José M. Abuín, Juan C. Pichel, Tomás F. Pena and Jorge Amigo.
Bioinformatics, Vol. 31, Issue 24, pages 4003-4005, 2015.
Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel, F.F. Rivera and D. S. Nikolopoulos.
Annals of Multicore and GPU Programming, Vol. 2, Issue 1, pages 55-65, 2015.
Análisis Morfosintáctico y Clasificación de Entidades Nombradas en un Entorno Big Data
Pablo Gamallo, Juan C. Pichel, Marcos García, José M. Abuín and Tomás F. Pena.
Procesamiento del Lenguaje Natural, Vol. 53, pages 17-24, 2014.
Using an Extended Roofline Model to Understand Data and Thread Affinities on NUMA Systems
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel and Francisco F. Rivera.
Annals of Multicore and GPU Programming, Vol. 1, Issue 1, pages 56-67, 2014.
A Hardware Counter-Based Toolkit for the Analysis of Memory Accesses in SMPs
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel, Juan A. Lorenzo and Francisco F. Rivera.
Concurrency and Computation: Practice and Experience, Vol. 26, Issue 6, pages 1328-1341, 2014.
Using Sampled Information, Is It Enough for the SpMV Locality Optimization?
Juan C. Pichel, Juan A. Lorenzo, Dora B. Heras, Francisco F. Rivera and Tomás F. Pena.
Concurrency and Computation: Practice and Experience, Vol. 26, Issue 1, pages 98-117, 2014.
3DyRM: A Dynamic Roofline Model Including Memory Latency Information
Oscar G. Lorenzo, Tomás F. Pena, Juan C. Pichel, José C. Cabaleiro and Francisco F. Rivera.
Journal of Supercomputing, Vol. 70, Issue 2, pages 696-708, 2014.
Sparse Matrix–Vector Multiplication on the Single-Chip Cloud Computer Many-Core Processor
Juan C. Pichel and Francisco F. Rivera.
Journal of Parallel and Distributed Computing, Vol. 73, Issue 12, pages 1539-1550, 2013.
A Flexible and Dynamic Page Migration Infrastructure Based on Hardware Counters
Juan A. Lorenzo, Juan C. Pichel, Francisco F. Rivera, José C. Cabaleiro and Tomás F. Pena.
Journal of Supercomputing, Vol. 65, Issue 2, pages 930-948, 2013.
Optimization of Sparse Matrix-Vector Multiplication Using Reordering Techniques on GPUs
Juan C. Pichel, Francisco F. Rivera, Marcos Fernández and Aurelio Rodríguez.
Microprocessors and Microsystems, Vol. 36, Issue 2, pages 65-77, 2012.
Analyzing the Execution of Sparse Matrix-Vector Product on the Finisterrae SMP-NUMA System
Juan C. Pichel, Juan A. Lorenzo, Dora B. Heras, José C. Cabaleiro and Tomás F. Pena.
Journal of Supercomputing, Vol. 58, Issue 2, pages 195-205, 2011.
Increasing the Locality of Iterative Methods and its Application to the Simulation of Semiconductor Devices
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro, A. J. Garcia-Loureiro and Francisco F. Rivera.
Int. Journal of High Performance Computing Applications, Vol. 24, Issue 2, pages 136-153, 2010.
Increasing Data Reuse of Sparse Algebra Codes on Simultaneous Multithreading Architectures
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro and Francisco F. Rivera.
Concurrency and Computation: Practice and Experience, Vol. 21, Issue 15, pages 1838-1856, 2009.
A Collective I/O Implementation Based on Inspector-Executor Paradigm
David E. Singh, Florin Isaila, Juan C. Pichel and Jesús Carretero.
Journal of Supercomputing, Vol. 47, Issue 1, pages 53-75, 2009.
Image Segmentation Based on Merging of Sub-Optimal Segmentations
Juan C. Pichel, David E. Singh and Francisco F. Rivera.
Pattern Recognition Letters, Vol. 27, Issue 10, pages 1105-1116, 2006.
Performance Optimization of Irregular Codes Based on the Combination of Reordering and Blocking Techniques
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro and Francisco F. Rivera.
Parallel Computing, Vol. 31, Issue 8-9, pages 858-876, 2005.

Conferences

Quantum Compilation Process: A Survey
F. Javier Cardama, Jorge Vázquez-Pérez, Tomás F. Pena, Juan C. Pichel and Andrés Gómez
European Conference on Parallel and Distributed Computing (Euro-Par). Madrid, Spain, 2024.
MPI4All: Universal Binding Generation for MPI Parallel Programming
César Piñeiro, Álvaro Vázquez and Juan C. Pichel.
24th International Conference on Computational Science (ICCS). Málaga, Spain, 2024.
Large Language Models for Binary Health-Related Question Answering: A Zero- and Few-Shot Evaluation
Marcos Fernández-Pichel, David E. Losada and Juan C. Pichel.
24th International Conference on Computational Science (ICCS). Málaga, Spain, 2024.
An Accurate Neural Network Model to Study Threshold Voltage Variability due to Metal Grain Granularity in Nanosheet FETs
Julián G. Fernández, Enrique Comesaña, Natalia Seoane, Juan C. Pichel and Antonio García-Loureiro.
Joint International EuroSOI Workshop. Tarragona, Spain, 2023.
CiTIUS at the TREC 2022 Health Misinformation Track
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada and Juan C. Pichel.
Text Retrieval Conference (TREC). 2022.
Social Minder: a tool for Social Media monitoring and its use for detecting COVID-19 misinformation
Marcos Fernández-Pichel, David E. Losada and Juan C. Pichel.
Joint Conference of the Information Retrieval Communities in Europe (CIRCLE). Toulouse, France, 2022.
CiTIUS at the TREC 2021 Health Misinformation Track
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel and Pablo Gamallo.
Text Retrieval Conference (TREC). 2021.
Comparing Traditional and Neural Approaches for detecting Health-related Misinformation
Marcos Fernández-Pichel, David E. Losada, Juan C. Pichel and David Elsweiler.
Conference and Lab of the Evaluation Forum (CLEF). Bucharest, Romania, 2021.
Reliability Prediction for Health-related Content: A Replicability Study
Marcos Fernández-Pichel, David E. Losada, Juan C. Pichel and David Elsweiler.
European Conference on Information Retrieval (ECIR). Lucca, Italy, 2021.
Colaboración entre docentes de una universidad alemana y una española para el desarrollo de seminarios prácticos acerca de la credibilidad de la información
Marcos Fernández-Pichel, David Elsweiler, David E. Losada and Juan C. Pichel.
XXVII Jornadas sobre la Enseñanza Universitaria de la Informática (JENUI). Valencia, Spain, 2021.
CiTIUS at the TREC 2020 Health Misinformation Track
Marcos Fernández-Pichel, David E. Losada, Juan C. Pichel and David Elsweiler.
Text Retrieval Conference (TREC). Gaithersburg, USA, 2020.
eXtream: a System for Real-time Monitoring of Dynamic Web Sources
Marcos Fernández-Pichel, Rodrigo Martínez-Castaño, David E. Losada and Juan C. Pichel.
Joint Conference of the Information Retrieval Communities in Europe (CIRCLE). Samatan, France, July 2020.
Dataflow Execution of Hierarchically Tiled Arrays
Chih-Chieh Yang, Juan C. Pichel and David A. Padua.
European Conference on Parallel and Distributed Computing (Euro-Par). Göttingen, Germany, August 2019.
LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction
Pablo Gamallo, Marcos García, César Piñeiro, Rodrigo Martínez-Castaño and Juan C. Pichel.
Int. Workshop on Advances in Natural Language Processing (ANLP). Valencia, Spain, October 2018.
A New Approach for Sparse Matrix Classification Based on Deep Learning Techniques
Juan C. Pichel and Beatriz Pateiro-López.
IEEE Cluster (CLUSTER). Belfast, UK, September 2018.
Towards a Big Data Multi-language Framework using Docker Containers
César Piñeiro, Rodrigo Martínez-Castaño and Juan C. Pichel.
Jornadas Sarteco (JP). Teruel, Spain, September 2018.
Building Python-Based Topologies for Massive Processing of Social Media Data in Real Time
Rodrigo Martínez-Castaño, Juan C. Pichel and David E. Losada.
5th Spanish Conference in Information Retrieval (CERI). Zaragoza, Spain, June 2018.
A Micromodule Approach for Building Real-Time Systems with Python-Based Models: Application to Early Risk Detection of Depression on Social Media
Rodrigo Martínez-Castaño, Juan C. Pichel, David E. Losada and Fabio Crestani.
40th European Conference on Information Retrieval (ECIR). Grenoble, France, March 2018.
Perldoop2: a Big Data-oriented source-to-source Perl-Java compiler
César Piñeiro, José M. Abuín and Juan C. Pichel.
IEEE Int. Conference on Big Data Intelligence and Computing (DataCom). Orlando, USA, November 2017.
Sentiment Analysis on Multilingual Tweets using Big Data Technologies
Rodrigo Martínez-Castaño, Juan C. Pichel and Pablo Gamallo.
Jornadas Sarteco (JP). Salamanca, Spain, September 2016.
Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel, F.F. Rivera and D. S. Nikolopoulos.
2nd Congress on Multicore and GPU Programming. Cáceres, Spain, March 2015.
Perldoop: Efficient Execution of Perl Scripts on Hadoop Clusters
José M. Abuín, Juan C. Pichel, Tomás F. Pena, Pablo Gamallo and Marcos García.
IEEE Int. Conference on Big Data (IEEE Big Data). Washington D.C., USA, October 2014.
Thread Migration Techniques Based on Dynamic Roofline Models and Latency Information
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel and F.F. Rivera.
XXV Jornadas de Paralelismo. Valladolid, Spain, September 2014.
Multiobjective Optimization Technique Based on Monitoring Information to Increase the Performance of Thread Migration on Multicores
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel and F.F. Rivera
IEEE Cluster (CLUSTER). Madrid, Spain, September 2014.
Hierarchically Tiled Array as a High-Level Abstraction for Codelets
Chih-Chieh Yang, Juan C. Pichel, Adam R. Smith and David A. Padua.
4th Int. Workshop on Data-Flow Models for Extreme Scale Computing (DFM). Edmonton, Alberta, Canada, August 2014.
DyRM: A Dynamic Roofline Model Based on Runtime Information
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel and Francisco F. Rivera.
13th Int. Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE). Almería, Spain, June 2013.
Hardware Counters Based Analysis of Memory Accesses in SMPs
Oscar G. Lorenzo, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel, Juan A. Lorenzo and Francisco F. Rivera.
10th IEEE Int. Symposium on Parallel and Distributed Processing with Applications (ISPA). Leganés, Spain, July 2012.
A Graphical Tool for Performance Analysis of Multicore Systems Based on the Roofline Model
Francisco F. Rivera, R. Iglesias, Juan A. Lorenzo, Juan C. Pichel, Tomás F. Pena and José C. Cabaleiro.
10th IEEE Int. Symposium on Parallel and Distributed Processing with Applications (ISPA). Leganés, Spain, July 2012.
Experiences with the Sparse Matrix-Vector Multiplication on a Many-Core Processor
Juan C. Pichel and Francisco F. Rivera.
21st Int. Heterogeneity in Computing Workshop (HCW, together with IPDPS). Shanghai, China, May 2012.
Herramientas para la Monitorización de los Accesos a Memoria de Códigos Paralelos Mediante Contadores Hardware
Oscar G. Lorenzo, Juan A. Lorenzo, Dora B. Heras, Juan C. Pichel and Francisco F. Rivera.
XXII Jornadas de Paralelismo. La Laguna, Spain, September 2011.
Study of Performance Issues on a SMP-NUMA System Using the Roofline Model
Juan A. Lorenzo, Juan C. Pichel, Tomás F. Pena, Marcos Suarez and Francisco F. Rivera.
Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA). Las Vegas, USA, July 2011.
A Study of Memory Access Patterns in Irregular Parallel Codes Using Hardware Counter-Based Tools
Oscar G. Lorenzo, Juan A. Lorenzo, José C. Cabaleiro, Dora B. Heras, Marcos Suarez, and Juan C. Pichel.
Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA). Las Vegas, USA, July 2011.
Lessons Learnt Porting Parallelisation Techniques for Irregular Codes to NUMA Systems
Juan A. Lorenzo, Juan C. Pichel, David LaFrance-Linden, Francisco F. Rivera and David E. Singh.
18th Euromicro Conference on Parallel, Distributed and Network based Processing (PDP). Pisa, Italy, February 2010.
On the Influence of Thread Allocation for Irregular Codes in NUMA Systems
Juan A. Lorenzo, Francisco F. Rivera, Petr Tuma and Juan C. Pichel.
10th Int. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT). Hiroshima, Japan, December 2009.
Thread Allocation Issues for Irregular Codes in the Finisterrae System
Juan A. Lorenzo, Francisco F. Rivera, Dora B. Heras, José C. Cabaleiro, Tomás F. Pena, Juan C. Pichel and David E. Singh.
XX Jornadas de Paralelismo. A Coruña, Galicia, Spain, September 2009.
Evaluating Sparse Matrix-Vector Product on the FinisTerrae Supercomputer
Juan C. Pichel, Juan A. Lorenzo, Dora B. Heras and José C. Cabaleiro.
9th Int. Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE). Gijón, Spain, June 2009.
Exploiting Data Compression in Collective I/O Techniques
Rosa Filgueira, David E. Singh, Juan C. Pichel and Jesús Carretero.
IEEE Int. Conference on Cluster Computing. Tsukuba, Japan, September 2008.
Reordering Algorithms for Increasing Locality on Multicore Processors
Juan C. Pichel, David E. Singh and Jesús Carretero.
10th IEEE Int. Conference on High Performance Computing and Communications (HPCC). Dalian, China, September 2008.
Data Locality Aware Strategy for Two-Phase Collective I/O
Rosa Filgueira, David E. Singh, Juan C. Pichel, Florin Isaila and Jesús Carretero.
Int. Meeting High Performance Computing for Computational Science (VECPAR). Toulouse, France, June 2008.
A Collective I/O Implementation Based on Inspector-Executor Paradigm
David E. Singh, Florin Isaila, Juan C. Pichel and Jesús Carretero.
Int. Workshop on Scalable Data Management Applications and Systems (SDMAS). Las Vegas, USA, June 2007.
A New Technique to Reduce False Sharing in Irregular Codes Based on Distance Functions
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro and Francisco F. Rivera.
8th Int. Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN). pp. 306-311. Las Vegas, USA, December 2005.
Mejora de la Localidad en SMPs: el Producto Matriz Dispersa-Vector como Caso de Estudio
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro, Marcos Boullón, David E. Singh and Francisco F. Rivera.
XV Jornadas de Paralelismo. Almería, Spain, September 2004.
Improving the Locality of the Sparse Matrix-Vector product on Shared Memory Multiprocessors
Juan C. Pichel, Dora B. Heras, José C. Cabaleiro and Francisco F. Rivera.
12th Euromicro Conference on Parallel, Distributed and Network based Processing (PDP). A Coruña, Galicia, Spain, February 2004.
Algoritmo Paralelo de Segmentación de Imágenes Basado en el Crecimiento Desacoplado de Regiones
Juan C. Pichel, David E. Singh and Francisco F. Rivera.
Conferencia Iberoamericana en Sistemas, Cibernética e Informática (CISCI). pp. 134-139. Orlando, USA, July 2002.

Preprints

Evaluating Search Engines and Large Language Models for Answering Health Questions
Marcos Fernández-Pichel, Juan C. Pichel and David E. Losada
arXiv:2407.12468v3, 2025.
OMP4Py: a pure Python implementation of OpenMP
César Piñeiro and Juan C. Pichel
arXiv:2411.14887, 2024.
NetQIR: An Extension of QIR for Distributed Quantum Computing
Jorge Vázquez-Pérez, F. Javier Cardama, César Piñeiro, Tomás F. Pena, Juan C. Pichel and Andrés Gómez
arXiv:2408.03712, 2024.
Review of Distributed Quantum Computing. From single QPU to High Performance Quantum Computing
David Barral, F. Javier Cardama, Guillermo Díaz, Daniel Faílde, Iago F. Llovo, Mariamo Mussa Juane, Jorge Vázquez-Pérez, Juan Villasuso, César Piñeiro, Natalia Costas, Juan C. Pichel, Tomás F. Pena and Andrés Gómez
arXiv:2404.01265, 2024.
A unified framework to improve the interoperability between HPC and Big Data languages and programming models
César Piñeiro and Juan C. Pichel
arXiv:2112.00467, 2021.
Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis
Rodrigo Martínez-Castaño, Juan C. Pichel and Pablo Gamallo
arXiv:1801.03710, 2018.

Book chapters

A Parallel Framework for Image Segmentation Using Region Based Techniques
Juan C. Pichel, David E. Singh and Francisco F. Rivera
Vision Systems: Segmentation and Pattern Recognition, edited by Goro Obinata and Ashish Dutta, 2007.

Software

MPI4All

MPI is the predominant and most extensively utilized programming model in the HPC area. The standard only provides bindings for the low-level programming languages C, C++, and Fortran. While efforts are being made to offer MPI bindings for other programming languages, the support provided may be limited, potentially resulting in functionality gaps, performance overhead, and compatibility problems. To deal with those issues, we introduce MPI4All, a novel tool aimed at simplifying the process of creating efficient MPI bindings for any programming language. MPI4All is not dependent on the MPI implementation, and adding support for new languages does not require significant effort. The current version of MPI4All includes binding generators for Java and Go programming languages.

Citation:
César Piñeiro, Álvaro Vázquez and Juan C. Pichel. Towards universal MPI bindings for enhanced new language support. Journal of Computational Science, Vol. 87, 2025.
César Piñeiro, Álvaro Vázquez and Juan C. Pichel. MPI4All: universal binding generation for MPI parallel programming. 24th Int. Conf. on Computational Science (ICCS), 2024.

BigSeqKit

BigSeqKit is a parallel toolkit to manipulate FASTA/Q files at scale with speed and scalability at its core. BigSeqKit takes advantage of an HPC-Big Data framework (IgnisHPC) to parallelize and optimize the commands included in seqkit. In this way, in most cases, it is from tens to hundreds of times faster than other state-of-the-art tools such as seqkit, samtools, and pyfastx. At the same time, our tool is easy to use and install on any kind of hardware platform (single server or cluster). Routines in BigSeqKit can be used as a bioinformatics library or from the command line. In order to improve usability and facilitate the adoption of BigSeqKit, it implements the same command interface as seqkit.

Citation:
César Piñeiro and Juan C. Pichel. BigSeqKit: a parallel Big Data toolkit to process FASTA and FASTQ files at scale. GigaScience, Vol. 12, 2023.

PyPlexity

This package provides a simple interface to apply Perplexity filters to any text document. A possible use case for this technology could be the removal of boilerplate (sentences with a high perplexity score): ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. Furthermore, it provides a rough HTML tag cleaner and a WARC and HTML bulk processor, with distributed capabilities.
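The idea of filtering by perplexity can be illustrated with a minimal sketch. This is not PyPlexity's interface: the function names, the add-one-smoothed unigram model, and the threshold value are all assumptions chosen for brevity, and a real deployment would use a much stronger language model. The sketch only shows the principle that sentences scoring a high perplexity under a model of clean text are treated as boilerplate.

```python
import math
from collections import Counter


def train_unigram(clean_sentences):
    """Count word frequencies in a reference corpus of clean text (toy model)."""
    counts = Counter(w for s in clean_sentences for w in s.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Perplexity of a sentence under an add-one-smoothed unigram model."""
    words = sentence.lower().split()
    if not words:
        return float("inf")  # empty text is treated as maximally noisy
    vocab = len(counts) + 1  # one extra slot reserved for unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def filter_boilerplate(sentences, counts, total, threshold):
    """Keep only sentences whose perplexity does not exceed the threshold."""
    return [s for s in sentences if perplexity(s, counts, total) <= threshold]


if __name__ == "__main__":
    counts, total = train_unigram(
        ["the cat sat on the mat", "the dog sat on the rug"]
    )
    page = ["the cat sat on the rug", "login share subscribe"]
    # The threshold of 10.0 is a hypothetical value picked for this toy corpus.
    print(filter_boilerplate(page, counts, total, threshold=10.0))
```

In this toy setting the navigation-like line consists only of words unseen in the reference corpus, so it receives a higher perplexity than the ordinary sentence and is dropped.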

Citation:
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, and Pablo Gamallo. An Unsupervised Perplexity-based Method for Boilerplate Removal. Natural Language Engineering, Vol. 30, 2024.

IgnisHPC

IgnisHPC is a framework whose main objective is to unify the execution of Big Data and HPC workloads in the same computing engine. IgnisHPC has native support for multi-language applications using both JVM-based and non-JVM-based languages. Currently, it supports C, C++, Python, Go, and Java. Since MPI is used as its backbone technology, IgnisHPC allows MPI applications and libraries to be executed directly and efficiently within the framework. The experimental evaluation demonstrates the benefits of our proposal in terms of performance and productivity over other frameworks such as Spark. For example, on a 12-node cluster with 2 × Intel Xeon E5-2630v4 (2.2 GHz, 10 cores) per node, the experimental results show:

Application	No. times faster than Spark
Minebench	3.87x [Python & C++], 1.26x [Python]
TeraSort	1.76x [C++], 1.35x [Python]
K-Means	1.94x [Python & C++]
PageRank	1.10x [Python]
Transitive Closure	1.12x [Python]

IgnisHPC is publicly available for the Big Data and HPC research community.

Citation:
César Piñeiro and Juan C. Pichel. A Unified Framework to Improve the Interoperability between HPC and Big Data Languages and Programming Models. Future Generation Computer Systems, Vol. 134, 2022.

VeryFastTree

VeryFastTree is a new tool designed for efficient phylogenetic tree inference, specifically tailored to handle massive taxonomic datasets. It is a highly tuned implementation based on the FastTree-2 tool that takes advantage of parallelization and vectorization strategies to speed up the inference of phylogenies for huge alignments. For example, VeryFastTree (v4.0 - July 2023) can construct a tree on one server (two 32-core Intel Xeon Ice Lake 8352Y processors) using single-precision arithmetic from an ultra-large one-million taxa alignment in just 36 hours. In contrast, VeryFastTree-3.0 and FastTree-2 require over 5 days for the same task. That means VeryFastTree-4.0 is over 3x faster than both its previous version and FastTree-2.

VeryFastTree is available as a package in: Bioconda, MacPorts, and Debian Linux distributions. It also provides Python bindings.

Citations:
César Piñeiro and Juan C. Pichel. Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze 1 million taxa. GigaScience, Vol. 13, pages 1-12, 2024.
César Piñeiro, José M. Abuín, and Juan C. Pichel. VeryFastTree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies. Bioinformatics, Vol. 36, Issue 17, pages 4658-4659, 2020.

PASTASpark

PASTASpark is a tool that uses the Big Data engine Apache Spark to boost the performance of the alignment phase of PASTA (Practical Alignments using SATé and TrAnsitivity). PASTASpark guarantees scalability and fault tolerance, allowing users to obtain Multiple Sequence Alignments (MSAs) from very large datasets in reasonable time.

Citation:
José M. Abuín, Tomás F. Pena, and Juan C. Pichel. PASTASpark: multiple sequence alignment meets Big Data. Bioinformatics, Vol. 33, Issue 18, pp. 2948-2950, 2017.

SparkBWA

SparkBWA is a tool that exploits the capabilities of Big Data technologies such as Apache Spark to boost the performance of one of the most widely adopted DNA sequence aligners, the Burrows-Wheeler Aligner (BWA).

Citation:
José M. Abuín, Juan C. Pichel, Tomás F. Pena, and Jorge Amigo. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data. PLoS ONE, Vol. 11, Issue 5, pp. 1-21, 2016.

BigBWA

BigBWA allows executing the Burrows-Wheeler Aligner (BWA) on an Apache Hadoop cluster, bringing the power of Big Data technologies to high-throughput DNA sequencing.

Citation:
José M. Abuín, Juan C. Pichel, Tomás F. Pena, and Jorge Amigo. BigBWA: Approaching the Burrows-Wheeler Aligner to Big Data Technologies. Bioinformatics, Vol. 31, Issue 24, pp. 4003-4005, 2015.

Projects

Here you can find a list of some of the most recent research projects I am or have been involved in:

C3HS: Content curation for consumer health search - Search and misinformation detection
Funded by Ministerio de Economía y Competitividad (PID2022-137061OB-C22)
Period: Sep 2023 - Aug 2026

HYBRIDS: Hybrid Intelligence to monitor, promote and analyse transformations in good democracy practices
Funded by Horizon Europe, Marie Skłodowska-Curie Actions (MSCA), Doctoral Networks, European Union (101073351)
Period: Jan 2023 - Dec 2026

Big-eRisk: Early Prediction of Personal Risks on Massive Data
Funded by Ministerio de Economía y Competitividad (PLEC2021-007662)
Period: Nov 2021 - Nov 2024

eRISK: Technologies for the early prediction of signs related with psychological disorders
Funded by Ministerio de Economía, Industria y Competitividad (RTI2018-093336-B-C21)
Period: Jan 2019 - Dec 2021

Contact

	Juan Carlos Pichel
	CITIUS (Universidade de Santiago de Compostela) Rúa de Jenaro de la Fuente 15782 Santiago de Compostela (Spain)
	juancarlos.pichel@usc.es
	+34 881816437