If you prefer to download a single file with all LARCC references, you can find it at this link. You can also follow new publications via RSS.
Additionally, the publications are available on the LARCC profile on Google Scholar.
2024
@article{MENCAGLI:JPDC:24,
  title     = {General-purpose data stream processing on heterogeneous architectures with WindFlow},
  author    = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Alessandra Fais and Marco Danelutto},
  url       = {https://www.sciencedirect.com/science/article/pii/S0743731523001521},
  doi       = {10.1016/j.jpdc.2023.104782},
  year      = {2024},
  date      = {2024-02-01},
  journal   = {Journal of Parallel and Distributed Computing},
  volume    = {184},
  pages     = {104782},
  publisher = {Elsevier},
  abstract  = {Many emerging applications analyze data streams by running graphs of communicating tasks called operators. To develop and deploy such applications, Stream Processing Systems (SPSs) like Apache Storm and Flink have been made available to researchers and practitioners. They exhibit imperative or declarative programming interfaces to develop operators running arbitrary algorithms working on structured or unstructured data streams. In this context, the interest in leveraging hardware acceleration with GPUs has become more pronounced in high-throughput use cases. Unfortunately, GPU acceleration has been studied for relational operators working on structured streams only, while non-relational operators have often been overlooked. This paper presents WindFlow, a library supporting the seamless GPU offloading of general partitioned-stateful operators, extending the range of operators that benefit from hardware acceleration. Its design provides high throughput still exposing a high-level API to users compared with the raw utilization of GPUs in Apache Flink.},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {article}
}
2023
@inproceedings{LEONARCZYK:Euro-ParW:23,
  title     = {Evaluation of Adaptive Micro-batching Techniques for GPU-accelerated Stream Processing},
  author    = {Ricardo Leonarczyk and Dalvan Griebler and Gabriele Mencagli and Marco Danelutto},
  url       = {https://doi.org/},
  year      = {2023},
  date      = {2023-08-01},
  booktitle = {Euro-ParW 2023: Parallel Processing Workshops},
  pages     = {1-8},
  publisher = {Springer},
  address   = {Limassol},
  series    = {Euro-ParW'23},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@inproceedings{LEONARCZYK:ERAD:23,
  title     = {Avaliação da Auto-Adaptação de Micro-Lote para aplicação de Processamento de Streaming em GPUs},
  author    = {Ricardo Leonarczyk and Dalvan Griebler},
  url       = {https://doi.org/10.5753/eradrs.2023.229267},
  doi       = {10.5753/eradrs.2023.229267},
  year      = {2023},
  date      = {2023-05-01},
  booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
  pages     = {123-124},
  publisher = {Sociedade Brasileira de Computação},
  address   = {Porto Alegre, Brazil},
  abstract  = {Este artigo apresenta uma avaliação de algoritmos para regular a latência através da auto-adaptação de micro-lote em sistemas de processamento de streaming acelerados por GPU. Os resultados demonstraram que o algoritmo com o fator de adaptação fixo conseguiu ficar por mais tempo na região de latência especificada para a aplicação.},
  keywords  = {GPGPU, Self-adaptation, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@article{ARAUJO:SPE:23,
  title     = {NAS Parallel Benchmarks with CUDA and Beyond},
  author    = {Gabriell Araujo and Dalvan Griebler and Dinei A Rockenbach and Marco Danelutto and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.1002/spe.3056},
  doi       = {10.1002/spe.3056},
  year      = {2023},
  date      = {2023-01-01},
  journal   = {Software: Practice and Experience},
  volume    = {53},
  number    = {1},
  pages     = {53-80},
  publisher = {Wiley},
  abstract  = {NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide ease of use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies for helping to find a better number of threads per block configuration. The results have revealed relevant performance improvement solely by changing the number of threads per block, showing performance improvements from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, code refactoring required, and parallelism implementations. The performance results have shown up to 267% improvements over the best benchmarks versions available. We also observe the best and worst design choices, concerning code size and the performance trade-off. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior.},
  keywords  = {Benchmark, GPGPU, Parallel programming},
  pubstate  = {published},
  tppubtype = {article}
}
2022
@inproceedings{ROCKENBACH:SBLP:22,
  title     = {High-Level Stream and Data Parallelism in C++ for GPUs},
  author    = {Dinei A Rockenbach and Júnior Löff and Gabriell Araujo and Dalvan Griebler and Luiz G Fernandes},
  url       = {https://doi.org/10.1145/3561320.3561327},
  doi       = {10.1145/3561320.3561327},
  year      = {2022},
  date      = {2022-10-01},
  booktitle = {XXVI Brazilian Symposium on Programming Languages (SBLP)},
  pages     = {41-49},
  publisher = {ACM},
  address   = {Uberlândia, Brazil},
  series    = {SBLP'22},
  abstract  = {GPUs are massively parallel processors that allow solving problems that are not viable to traditional processors like CPUs. However, implementing applications for GPUs is challenging to programmers as it requires parallel programming to efficiently exploit the GPU resources. In this sense, parallel programming abstractions, notably domain-specific languages, are fundamental for improving programmability. SPar is a high-level Domain-Specific Language (DSL) that allows expressing stream and data parallelism in the serial code through annotations using C++ attributes. This work elaborates on a methodology and tool for GPU code generation by introducing new attributes to SPar language and transformation rules to SPar compiler. These new contributions, besides the gains in simplicity and code reduction compared to CUDA and OpenCL, enabled SPar achieve of higher throughput when exploring combined CPU and GPU parallelism, and when using batching.},
  keywords  = {GPGPU, Parallel programming, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@inproceedings{MENCAGLI:PDP:22,
  title     = {Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices},
  author    = {Gabriele Mencagli and Dalvan Griebler and Marco Danelutto},
  url       = {https://doi.org/10.1109/PDP55904.2022.00014},
  doi       = {10.1109/PDP55904.2022.00014},
  year      = {2022},
  date      = {2022-04-01},
  booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages     = {34-38},
  publisher = {IEEE},
  address   = {Valladolid, Spain},
  series    = {PDP'22},
  abstract  = {Data Stream Processing is a pervasive computing paradigm with a wide spectrum of applications. Traditional streaming systems exploit the processing capabilities provided by homogeneous Clusters and Clouds. Due to the transition to streaming systems suitable for IoT/Edge environments, there has been the urgent need of new streaming frameworks and tools tailored for embedded platforms, often available as System-on-Chips composed of a small multicore CPU and an integrated on-chip GPU. Exploiting this hybrid hardware requires special care in the runtime system design. In this paper, we discuss the support provided by the WindFlow library, showing its design principles and its effectiveness on the NVIDIA Jetson Nano board.},
  keywords  = {GPGPU, IoT, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@inproceedings{SCHEER:ERAD:22,
  title     = {Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning},
  author    = {Claudio Scheer and Gabriell Araujo and Dalvan Griebler and Felipe Meneguzzi and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.5753/eradrs.2022.19191},
  doi       = {10.5753/eradrs.2022.19191},
  year      = {2022},
  date      = {2022-04-01},
  booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
  pages     = {119-120},
  publisher = {Sociedade Brasileira de Computação},
  address   = {Curitiba, Brazil},
  abstract  = {Este trabalho apresenta um novo método que utiliza aprendizado de máquina para prever a melhor configuração de threads por bloco para aplicações de GPUs. Os resultados foram similares a estratégias manuais.},
  keywords  = {Benchmark, Deep learning, GPGPU},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
2020
@article{STEIN:CCPE:20,
  title     = {Latency-aware adaptive micro-batching techniques for streamed data compression on graphics processing units},
  author    = {Charles M Stein and Dinei A Rockenbach and Dalvan Griebler and Massimo Torquati and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.1002/cpe.5786},
  doi       = {10.1002/cpe.5786},
  year      = {2020},
  date      = {2020-05-01},
  journal   = {Concurrency and Computation: Practice and Experience},
  volume    = {na},
  number    = {na},
  pages     = {e5786},
  publisher = {Wiley Online Library},
  abstract  = {Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on the GPU leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements that need to find the best size of the microbatches and to adapt it dynamically based on the workload conditions as well as according to the characteristics of the underlying device and network. In this work, we aim at implementing latency-aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel-Ziv-Storer-Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {article}
}
@inproceedings{ARAUJO:PDP:20,
  title     = {Efficient NAS Parallel Benchmark Kernels with CUDA},
  author    = {Gabriell {de Araujo} and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.1109/PDP50117.2020.00009},
  doi       = {10.1109/PDP50117.2020.00009},
  year      = {2020},
  date      = {2020-03-01},
  booktitle = {28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages     = {9-16},
  publisher = {IEEE},
  address   = {Västerås, Sweden},
  series    = {PDP'20},
  abstract  = {NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. There are many research efforts trying to provide different parallel versions apart from the original OpenMP and MPI. Concerning GPU accelerators, there are only the OpenCL and OpenACC available as consolidated versions. Our goal is to provide an efficient parallel implementation of the five NPB kernels with CUDA. Our contribution covers different aspects. First, best parallel programming practices were followed to implement NPB kernels using CUDA. Second, the support of larger workloads (class B and C) allow to stress and investigate the memory of robust GPUs. Third, we show that it is possible to make NPB efficient and suitable for GPUs although the benchmarks were designed for CPUs in the past. We succeed in achieving double performance with respect to the state-of-the-art in some cases as well as implementing efficient memory usage. Fourth, we discuss new experiments comparing performance and memory usage against OpenACC and OpenCL state-of-the-art versions using a relative new GPU architecture. The experimental results also revealed that our version is the best one for all the NPB kernels compared to OpenACC and OpenCL. The greatest differences were observed for the FT and EP kernels.},
  keywords  = {Benchmark, GPGPU},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
2019
@inproceedings{ROCKENBACH:PARCO:19,
  title     = {High-Level Stream Parallelism Abstractions with SPar Targeting GPUs},
  author    = {Dinei A Rockenbach and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.3233/APC200083},
  doi       = {10.3233/APC200083},
  year      = {2019},
  date      = {2019-09-01},
  booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo)},
  volume    = {36},
  pages     = {543-552},
  publisher = {IOS Press},
  address   = {Prague, Czech Republic},
  series    = {ParCo'19},
  abstract  = {The combined exploitation of stream and data parallelism is demonstrating encouraging performance results in the literature for heterogeneous architectures, which are present on every computer systems today. However, provide parallel software efficiently targeting those architectures requires significant programming effort and expertise. The SPar domain-specific language already represents a solution to this problem providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language adding support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. Our experiments revealed that these transformations rules are able to improve performance while the high-level programming abstractions are maintained.},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@inproceedings{ROCKENBACH:stream-multigpus:IPDPSW:19,
  title     = {Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges},
  author    = {Dinei A Rockenbach and Charles Michael Stein and Dalvan Griebler and Gabriele Mencagli and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
  url       = {https://doi.org/10.1109/IPDPSW.2019.00137},
  doi       = {10.1109/IPDPSW.2019.00137},
  year      = {2019},
  date      = {2019-05-01},
  booktitle = {International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  pages     = {834-841},
  publisher = {IEEE},
  address   = {Rio de Janeiro, Brazil},
  series    = {IPDPSW'19},
  abstract  = {The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPUs and GPUs' parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use-cases (the Mandelbrot Streaming benchmark and the PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing like the one offered by SPar can be used to reduce the programming effort still offering a good level of performance if compared with state-of-the-art programming models.},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
@inproceedings{larcc:mandelbrot_multicore_GPU:ERAD:19,
  title     = {Mandelbrot Streaming para Sistemas Multi-core com GPUs},
  author    = {Charles M Stein and Joao V Stein and Leonardo Boz and Dinei A Rockenbach and Dalvan Griebler},
  url       = {http://larcc.setrem.com.br/wp-content/uploads/2019/04/192109.pdf},
  year      = {2019},
  date      = {2019-04-01},
  booktitle = {19th Escola Regional de Alto Desempenho da Região Sul (ERAD/RS)},
  publisher = {Sociedade Brasileira de Computação},
  address   = {Três de Maio, RS, Brazil},
  abstract  = {Este trabalho visa explorar o paralelismo na aplicação Mandelbrot Streaming para arquiteturas multi-core com GPUs, usando as bibliotecas FastFlow, TBB e SPar com CUDA. A implementação do paralelismo foi baseada no padrão farm, alcançando speedup de 16x no sistema multi-core e de 77x em um ambiente multi-core com duas GPUs. Os resultados evidenciam um melhor desempenho no uso de GPUs embora tenham sido identificadas futuras melhorias.},
  keywords  = {GPGPU, Stream processing},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Stein, Charles M; Rockenbach, Dinei A; Griebler, Dalvan Paralelização do Dedup para Sistemas Multi-core com GPUs Inproceedings 19th Escola Regional de Alto Desempenho da Região Sul (ERAD/RS), Sociedade Brasileira de Computação, Três de Maio, RS, Brazil, 2019. Abstract | Links | BibTeX | Tags: GPGPU @inproceedings{larcc:paralelizacao_multicore_GPU:ERAD:19, title = {Paralelização do Dedup para Sistemas Multi-core com GPUs}, author = {Charles M Stein and Dinei A Rockenbach and Dalvan Griebler}, url = {http://larcc.setrem.com.br/wp-content/uploads/2019/04/192087.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {19th Escola Regional de Alto Desempenho da Região Sul (ERAD/RS)}, publisher = {Sociedade Brasileira de Computação}, address = {Três de Maio, RS, Brazil}, abstract = {O maior volume de dados gerado, trafegado e processado aumenta a demanda por mais poder de processamento e por algoritmos de compressão eficientes. Este trabalho tem como objetivo explorar o paralelismo de stream para arquiteturas multi-core com GPUs na aplicação Dedup, usando SPar com CUDA e OpenCL. Apesar do desempenho não ser o esperado, o artigo contribui com uma análise detalhada dos resultados e sugestões futuras de melhorias.}, keywords = {GPGPU}, pubstate = {published}, tppubtype = {inproceedings} } The growing volume of data generated, transferred, and processed increases the demand for more processing power and for efficient compression algorithms. This work aims to exploit stream parallelism on multi-core architectures with GPUs in the Dedup application, using SPar with CUDA and OpenCL. Although the performance was not as expected, the paper contributes a detailed analysis of the results and suggestions for future improvements. |
Stein, Charles Michael; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs Inproceedings doi 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 247-251, IEEE, Pavia, Italy, 2019. Abstract | Links | BibTeX | Tags: GPGPU, Stream processing @inproceedings{STEIN:LZSS-multigpu:PDP:19, title = {Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs}, author = {Charles Michael Stein and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/EMPDP.2019.8671624}, doi = {10.1109/EMPDP.2019.8671624}, year = {2019}, date = {2019-02-01}, booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {247-251}, publisher = {IEEE}, address = {Pavia, Italy}, series = {PDP'19}, abstract = {GPUs have been used to accelerate different data parallel applications. The challenge consists in using GPUs to accelerate stream processing applications. Our goal is to investigate and evaluate whether stream parallel applications may benefit from parallel execution on both CPU and GPU cores. In this paper, we introduce new parallel algorithms for the Lempel-Ziv-Storer-Szymanski (LZSS) data compression application. We implemented the algorithms targeting both CPUs and GPUs. GPUs have been used with CUDA and OpenCL to exploit inner algorithm data parallelism. Outer stream parallelism has been exploited using CPU cores through SPar. The parallel implementation of LZSS achieved 135 fold speedup using a multi-core CPU and two GPUs. 
We also observed speedups in applications where we were not expecting to get them using the same combined data-stream parallel exploitation techniques.}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
2018 |
Stein, Charles Programação Paralela para GPU em Aplicações de Processamento Stream Undergraduate Thesis, 2018. Abstract | Links | BibTeX | Tags: GPGPU, Stream processing @misc{larcc:charles_stein:TCC:18, title = {Programação Paralela para GPU em Aplicações de Processamento Stream}, author = {Charles Stein}, url = {http://larcc.setrem.com.br/wp-content/uploads/2018/11/TCC_SETREM__Charles_Stein_1.pdf}, year = {2018}, date = {2018-06-01}, address = {Três de Maio, RS, Brazil}, school = {Sociedade Educacional Três de Maio (SETREM)}, abstract = {Stream processing applications are used in many areas. They usually require real-time processing and have a high computational load. The parallelization of this type of application is therefore necessary. The use of GPUs can hypothetically increase the performance of these stream processing applications. This work presents the study and parallel software implementation for GPU on stream processing applications. Applications from different areas were chosen and parallelized for CPU and GPU. A set of experiments was conducted and the results achieved were analyzed. The Sobel, LZSS, Dedup, and Black-Scholes applications were parallelized. The Sobel filter did not gain performance, while LZSS, Dedup, and Black-Scholes obtained speedups of 36x, 13x, and 6.9x, respectively. In addition to performance, the source lines of code of the implementations with the CUDA and OpenCL libraries were measured in order to analyze the code intrusion. The tests performed showed that in some applications the use of GPU is advantageous, while in other applications there are no significant gains when compared to the parallel versions on CPU.}, howpublished = {Undergraduate Thesis}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {misc} } |
Stein, Charles M; Griebler, Dalvan Explorando o Paralelismo de Stream em CPU e de Dados em GPU na Aplicação de Filtro Sobel Inproceedings 18th Escola Regional de Alto Desempenho do Estado do Rio Grande do Sul (ERAD/RS), pp. 137-140, Sociedade Brasileira de Computação, Porto Alegre, RS, Brazil, 2018. Abstract | Links | BibTeX | Tags: GPGPU, Stream processing @inproceedings{larcc:stream_gpu_cuda:ERAD:18, title = {Explorando o Paralelismo de Stream em CPU e de Dados em GPU na Aplicação de Filtro Sobel}, author = {Charles M Stein and Dalvan Griebler}, url = {http://larcc.setrem.com.br/wp-content/uploads/2018/04/LARCC_ERAD_IC_Stein_2018.pdf}, year = {2018}, date = {2018-04-01}, booktitle = {18th Escola Regional de Alto Desempenho do Estado do Rio Grande do Sul (ERAD/RS)}, pages = {137-140}, publisher = {Sociedade Brasileira de Computação}, address = {Porto Alegre, RS, Brazil}, abstract = {O objetivo deste estudo é a paralelização combinada do stream em CPU e dos dados em GPU usando uma aplicação de filtro Sobel. Foi realizada uma avaliação do desempenho de OpenCL, OpenACC e CUDA com o algoritmo de multiplicação de matrizes para escolha da ferramenta a ser usada com a SPar. Concluiu-se que, apesar de a GPU apresentar um speedup de 11.81x com CUDA, o uso exclusivo da CPU com a SPar é mais vantajoso nesta aplicação.}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } The goal of this study is the combined parallelization of the stream on the CPU and of the data on the GPU using a Sobel filter application. The performance of OpenCL, OpenACC, and CUDA was evaluated with a matrix multiplication algorithm in order to choose the tool to be used with SPar. It was concluded that, although the GPU achieves an 11.81x speedup with CUDA, exclusive use of the CPU with SPar is more advantageous in this application. |