If you prefer to download a single file with all LARCC references, you can find it at this link. You can also follow new publications via RSS. Additionally, the publications are available on the LARCC profile on Google Scholar.
2023
Dopke, Luan; Griebler, Dalvan. Estudo Sobre Spark nas Aplicações de Processamento de Log e Análise de Cliques. In: Anais da XXIII Escola Regional de Alto Desempenho da Região Sul, pp. 85-88, Sociedade Brasileira de Computação, Porto Alegre, Brazil, 2023. doi: 10.5753/eradrs.2023.229298. Tags: Benchmark, Stream processing.
Abstract: The use of stream processing applications keeps growing. Given this, the present study measures the performance of the Apache Spark Structured Streaming framework against the Apache Storm framework on two stream processing applications: log processing and click analysis. The results show better performance for Apache Storm in both applications.
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; García, José Daniel; Muñoz, Javier Fernández; Fernandes, Luiz Gustavo. A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores. In: 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'23), pp. 164-168, IEEE, Naples, Italy, 2023. doi: 10.1109/PDP59025.2023.00033. Tags: Benchmark, Stream processing.
Abstract: Several solutions aim to simplify the burdensome task of parallel programming. The GrPPI library is one of them. It allows users to implement parallel code for multiple backends through a unified, abstract, and generic layer while promising minimal performance overhead. A thorough evaluation of GrPPI for stream parallelism with representative metrics for this domain, such as throughput and latency, had not yet been done. In this work, we evaluate GrPPI focused on stream processing. We evaluate performance, memory usage, and programming effort and compare them against handwritten parallel code. For this, we use the benchmarking framework SPBench to build custom GrPPI benchmarks. The benchmarks are based on real applications, such as Lane Detection, Bzip2, Face Recognizer, and Ferret. Experiments show that while performance is competitive with handwritten code in some cases, in other cases the infeasibility of fine-tuning GrPPI is a crucial drawback. Despite this, programmability experiments estimate that GrPPI has the potential to reduce the development time of parallel applications by about three times.
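The abstraction GrPPI offers looks roughly like the sketch below. This is a minimal sketch, assuming the pipeline/farm interface and header layout from GrPPI's public samples (the optional type returned by the generator varies across library versions); it is not code from the paper.

```cpp
#include <grppi/grppi.h>
#include <iostream>
#include <optional>

int main() {
  grppi::parallel_execution_native ex;  // multi-threaded native backend

  int n = 0;
  grppi::pipeline(ex,
      // source stage: an empty optional signals end-of-stream
      [&n]() -> std::optional<int> {
        if (n < 100) return n++;
        return {};
      },
      // replicated middle stage (farm of 4 workers)
      grppi::farm(4, [](int x) { return x * x; }),
      // sink stage
      [](int x) { std::cout << x << "\n"; });
}
```

Swapping `parallel_execution_native` for another execution policy is how GrPPI retargets the same pipeline to a different backend, which is the portability claim the paper evaluates.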
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. SPBench: a framework for creating benchmarks of stream processing applications. Journal article in Computing, 105(5), pp. 1077-1099, Springer, 2023. doi: 10.1007/s00607-021-01025-6. Tags: Benchmark, Stream processing.
Abstract: In a fast-changing, data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The increasing volume of data we produce, such as audio, video, image, and text, demands quick and efficient computation. Stream parallelism allows accelerating this computation for real-time processing, but it is still a challenging task, mostly reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It aims to support users with a set of real-world stream processing applications, which are made accessible through an Application Programming Interface (API) and executable via a Command Line Interface (CLI) to create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and demonstrated the feasibility of the proposed framework in terms of usage, customization, and performance analysis. SPBench proved to be a high-level, reusable, extensible, and easy-to-use abstraction for building parallel stream processing benchmarks on multi-core architectures.
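The abstract describes SPBench's workflow: real applications exposed through an API so users can assemble custom benchmarks. The sketch below only illustrates that source-operators-sink shape in plain C++; all names here are hypothetical and are not SPBench's actual API.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical item and stage types illustrating the source -> operators ->
// sink structure that SPBench-style stream benchmarks share.
struct Item { std::string payload; };
using Stage = std::function<void(Item&)>;

void run_benchmark(std::function<bool(Item&)> source,
                   const std::vector<Stage>& operators,
                   std::function<void(const Item&)> sink) {
  Item item;
  while (source(item)) {                          // read the next stream item
    for (const auto& op : operators) op(item);    // apply each operator in order
    sink(item);                                   // emit result, collect metrics
  }
}

int main() {
  int remaining = 3;
  run_benchmark(
      [&](Item& it) {
        if (remaining == 0) return false;         // end of input
        it.payload = "frame" + std::to_string(remaining--);
        return true;
      },
      {[](Item& it) { it.payload += ":processed"; }},
      [](const Item& it) { std::cout << it.payload << "\n"; });
}
```

In a framework like this, the user supplies only the operator bodies; the surrounding loop is where a parallel runtime (TBB, FastFlow, SPar) would be plugged in.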
Araujo, Gabriell; Griebler, Dalvan; Rockenbach, Dinei A; Danelutto, Marco; Fernandes, Luiz Gustavo. NAS Parallel Benchmarks with CUDA and Beyond. Journal article in Software: Practice and Experience, 53(1), pp. 53-80, Wiley, 2023. doi: 10.1002/spe.3056. Tags: Benchmark, GPGPU, Parallel programming.
Abstract: NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide easy-to-use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies for helping to find a better threads-per-block configuration. The results revealed relevant performance improvements solely by changing the number of threads per block, from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, required code refactoring, and parallelism implementations. The performance results show up to 267% improvement over the best benchmark versions available. We also observe the best and worst design choices concerning code size and the performance trade-off. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior.
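The threads-per-block parametrization studied here boils down to launch-configuration arithmetic: for a fixed amount of work, varying the block size changes the grid size and, with it, occupancy. A minimal sketch of that arithmetic; the kernel launch in the comment is illustrative, not the suite's actual code.

```cpp
#include <cstdio>

// Grid-size arithmetic behind a threads-per-block (tpb) parameter: for n
// work items and a chosen block size, launch ceil(n / tpb) blocks.
inline unsigned blocks_for(unsigned n, unsigned tpb) {
  return (n + tpb - 1) / tpb;  // ceiling division
}

int main() {
  const unsigned n = 1u << 20;  // one million work items
  // Sweep the usual candidate configurations (warp multiples up to the
  // 1024-thread hardware limit), the space the paper's strategies search.
  for (unsigned tpb = 32; tpb <= 1024; tpb *= 2) {
    std::printf("tpb=%4u -> grid=%6u blocks\n", tpb, blocks_for(n, tpb));
    // In CUDA this would drive a launch such as:
    //   kernel<<<blocks_for(n, tpb), tpb>>>(...);
  }
}
```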
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. Micro-batch and data frequency for stream processing on multi-cores. Journal article in The Journal of Supercomputing, 79(8), pp. 9206-9244, Springer, 2023. doi: 10.1007/s11227-022-05024-y. Tags: Benchmark, Self-adaptation, Stream processing.
Abstract: Latency and throughput are often critical performance metrics in stream processing. Applications' performance can fluctuate depending on the input stream. This unpredictability is due to the variety in data arrival frequency and size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generate the frequency patterns most commonly used for benchmarking stream processing in related work, allowing the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated that our test cases did not benefit from micro-batches on multi-cores. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow assured higher throughput in shorter pipelines.
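Micro-batching, as evaluated here, amounts to grouping stream items before forwarding them downstream. A minimal sketch under simple assumptions (fixed batch size, synchronous flush); SPBench additionally varies the batch size and input frequency dynamically.

```cpp
#include <iostream>
#include <vector>

// Minimal micro-batcher: items accumulate until the batch is full, then the
// whole batch is handed downstream in one call, trading latency for
// per-item overhead amortization.
template <typename T, typename F>
class MicroBatcher {
 public:
  MicroBatcher(std::size_t batch_size, F flush)
      : batch_size_(batch_size), flush_(std::move(flush)) {}

  void push(T item) {
    batch_.push_back(std::move(item));
    if (batch_.size() >= batch_size_) {  // emit a full micro-batch
      flush_(batch_);
      batch_.clear();
    }
  }

  ~MicroBatcher() {                      // drain the partial batch at shutdown
    if (!batch_.empty()) flush_(batch_);
  }

 private:
  std::size_t batch_size_;
  F flush_;
  std::vector<T> batch_;
};

int main() {
  auto flush = [](const std::vector<int>& b) {
    std::cout << "flushed batch of " << b.size() << " items\n";
  };
  MicroBatcher<int, decltype(flush)> mb(4, flush);
  for (int i = 0; i < 10; ++i) mb.push(i);  // flushes at 4 and 8; dtor drains 2
}
```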
2022
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores. In: 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'22), pp. 10-17, IEEE, Valladolid, Spain, 2022. doi: 10.1109/PDP55904.2022.00011. Tags: Benchmark, Stream processing.
Abstract: In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools to help researchers investigate the implications of micro-batching and data stream frequency. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects of two parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.
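One of the common data-stream frequency patterns in related work is a wave. A sketch of a sinusoidal input-rate controller follows; the rates and pattern shape are illustrative, not the framework's actual generator.

```cpp
#include <chrono>
#include <cmath>
#include <iostream>
#include <thread>

// Emit items following a sinusoidal "wave" frequency pattern: the target
// rate oscillates around base_hz, and emissions are spaced 1/rate apart.
int main() {
  using namespace std::chrono;
  constexpr double kPi = 3.141592653589793;
  const double base_hz = 50.0, amplitude_hz = 40.0, period_s = 10.0;

  auto start = steady_clock::now();
  for (int i = 0; i < 500; ++i) {
    double t = duration<double>(steady_clock::now() - start).count();
    double rate = base_hz + amplitude_hz * std::sin(2 * kPi * t / period_s);
    std::this_thread::sleep_for(duration<double>(1.0 / rate));  // pace the stream
    if (i % 50 == 0)
      std::cout << "item " << i << " emitted at ~" << rate << " items/s\n";
  }
}
```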
Scheer, Claudio; Araujo, Gabriell; Griebler, Dalvan; Meneguzzi, Felipe; Fernandes, Luiz Gustavo. Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning. In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 119-120, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. doi: 10.5753/eradrs.2022.19191. Tags: Benchmark, Deep learning, GPGPU.
Abstract: This work presents a new method that uses machine learning to predict the best threads-per-block configuration for GPU applications. The results were similar to those of manual strategies.
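The tabular Q-learning update this line of work builds on is Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)). A minimal sketch where the states, actions, and reward are placeholders (actions could index candidate threads-per-block values); this is not the paper's actual formulation.

```cpp
#include <algorithm>
#include <cstdio>

// Tiny tabular Q-learning: kActions could index candidate threads-per-block
// values (e.g., 32, 64, ..., 1024); the reward could be a speedup proxy.
constexpr int kStates = 8, kActions = 6;
double Q[kStates][kActions] = {};

void update(int s, int a, double reward, int s_next,
            double alpha = 0.1, double gamma = 0.9) {
  double best_next = *std::max_element(Q[s_next], Q[s_next] + kActions);
  // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
  Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a]);
}

int main() {
  // One illustrative step: action 3 (say, 256 threads/block) yielded a
  // speedup-proxy reward of 1.7 and moved the agent from state 0 to state 1.
  update(0, 3, 1.7, 1);
  std::printf("Q[0][3] = %.3f\n", Q[0][3]);
}
```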
2021
Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; de Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo. The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures. Journal article in Future Generation Computer Systems, 125, pp. 743-757, Elsevier, 2021. doi: 10.1016/j.future.2021.07.021. Tags: Benchmark, Parallel programming.
Abstract: The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardware component/sub-system overload. Parallel programming APIs, libraries, and frameworks written in C++, as well as new optimizations and parallel processing techniques, can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications, developed using the OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of a composition of a few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power, and AMD) with different C++ compilers (gcc, icc, and clang), discussing the performance differences in order to give researchers useful insights for choosing the best parallel programming framework for a given type of problem.
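A MapReduce construct of the kind NPB-CPP composes can be expressed with oneTBB's parallel_reduce. A minimal sketch follows; the header names assume oneTBB ≥ 2021, and the sum of squares is illustrative, not an NPB kernel.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <functional>
#include <iostream>
#include <vector>

int main() {
  std::vector<double> v(1 << 20, 0.5);

  double sumsq = tbb::parallel_reduce(
      tbb::blocked_range<std::size_t>(0, v.size()), 0.0,
      // Map + local reduce over one chunk of the iteration space.
      [&](const tbb::blocked_range<std::size_t>& r, double acc) {
        for (std::size_t i = r.begin(); i != r.end(); ++i) acc += v[i] * v[i];
        return acc;
      },
      std::plus<double>());  // combine the partial results

  std::cout << "sum of squares = " << sumsq << "\n";
}
```

The same shape (a parallel Map with an associative combine step) is what OpenMP expresses with a `reduction` clause and FastFlow with its ParallelForReduce pattern.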
Leonarczyk, Ricardo; Griebler, Dalvan. Implementação MPIC++ e HPX dos Kernels NPB. In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 81-84, Sociedade Brasileira de Computação, Joinville, RS, Brazil, 2021. doi: 10.5753/eradrs.2021.14780. Tags: Benchmark, Parallel programming.
Abstract: This paper presents parallel implementations of the five kernels from the NAS Parallel Benchmarks (NPB) in MPI C++ and HPX for execution on cluster architectures. The results showed that the HPX programming model can be more efficient than MPI C++ for algorithms such as the fast Fourier transform, sorting, and Conjugate Gradient.
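The CG kernel mentioned in the abstract is dominated by distributed dot products, which in MPI reduce to an MPI_Allreduce. A minimal sketch of that communication pattern, not the paper's code:

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

// Distributed dot product, the core communication step of a CG iteration:
// each rank reduces its local slice, then MPI_Allreduce combines the
// partial sums so every rank holds the global result.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const std::size_t local_n = 1 << 18;  // each rank owns a slice of the vectors
  std::vector<double> x(local_n, 1.0), y(local_n, 2.0);

  double local = 0.0;
  for (std::size_t i = 0; i < local_n; ++i) local += x[i] * y[i];

  double global = 0.0;
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  if (rank == 0) std::printf("dot = %.1f across %d ranks\n", global, size);
  MPI_Finalize();
}
```

In HPX the same step would be expressed with futures and a collective, which is where the asynchronous model can overlap communication with computation.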
2020
Bordin, Maycon Viana; Griebler, Dalvan; Mencagli, Gabriele; Geyer, Claudio F R; Fernandes, Luiz Gustavo. DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems. Journal article in IEEE Access, 8, pp. 222900-222917, IEEE, 2020. doi: 10.1109/ACCESS.2020.3043948. Tags: Benchmark, Stream processing.
Abstract: Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distributed computing environments like clusters and clouds. Several systems of this kind have been released and are currently maintained as open source projects, like Apache Storm and Spark Streaming. Benchmark applications have often been used by the scientific community to test and evaluate new techniques for improving the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads from the wide set of application domains that can leverage the benefits offered by the stream processing paradigm in terms of near real-time performance. The goal of this paper is to present a new benchmark suite composed of 15 applications from areas like finance, telecommunications, sensor networks, and social networks. This paper describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size, and overall memory occupation. In addition, it exemplifies the usefulness of our benchmark suite for comparing real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.
Leonarczyk, Ricardo; Griebler, Dalvan. Implementação MPIC++ dos kernels NPB EP, IS e CG. In: 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 101-104, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. doi: 10.5753/eradrs.2020.10766. Tags: Benchmark.
Abstract: This work contributes to previous efforts to make the NAS Parallel Benchmarks available in C++, focusing on the distributed-memory aspect with MPI. It presents implementations of CG, EP, and IS ported from the original MPI version of the NPB. The experiments show that the proposed version of the benchmarks achieved performance close to the original.
Maliszewski, Anderson M; Roloff, Eduardo; Griebler, Dalvan; Navaux, Philippe O A. Avaliando o Impacto da Rede no Desempenho e Custo de Execução de Aplicações HPC. In: 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 159-160, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. doi: 10.5753/eradrs.2020.10786. Tags: Benchmark, Cloud computing.
Abstract: The performance of HPC applications depends on two main components: processing power and network interconnection. This paper evaluates the impact that the network interconnection has on parallel programs in a homogeneous cluster, with respect to performance and estimated execution cost.
de Araujo, Gabriell; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. Efficient NAS Parallel Benchmark Kernels with CUDA. In: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'20), pp. 9-16, IEEE, Västerås, Sweden, 2020. doi: 10.1109/PDP50117.2020.00009. Tags: Benchmark, GPGPU.
Abstract: The NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. There are many research efforts trying to provide different parallel versions apart from the original OpenMP and MPI ones. Concerning GPU accelerators, only OpenCL and OpenACC are available as consolidated versions. Our goal is to provide an efficient parallel implementation of the five NPB kernels with CUDA. Our contribution covers different aspects. First, best parallel programming practices were followed to implement the NPB kernels using CUDA. Second, the support of larger workloads (classes B and C) allows stressing and investigating the memory of robust GPUs. Third, we show that it is possible to make NPB efficient and suitable for GPUs, although the benchmarks were originally designed for CPUs. We succeed in achieving double the performance of the state-of-the-art in some cases, as well as implementing efficient memory usage. Fourth, we discuss new experiments comparing performance and memory usage against the state-of-the-art OpenACC and OpenCL versions on a relatively new GPU architecture. The experimental results also revealed that our version is the best one for all the NPB kernels compared to OpenACC and OpenCL. The greatest differences were observed for the FT and EP kernels.
2019
Maliszewski, Anderson M; Fim, Gabriel R; Maron, Carlos A F; Vogel, Adriano; Griebler, Dalvan. Avaliação de Desempenho em Contêineres LXD com Aplicações Científicas na Nuvem OpenNebula. In: 19th Escola Regional de Alto Desempenho da Região Sul (ERAD/RS), Sociedade Brasileira de Computação, Três de Maio, RS, Brazil, 2019. url: http://larcc.setrem.com.br/wp-content/uploads/2019/04/192099.pdf. Tags: Benchmark, Cloud computing.
Abstract: Private IaaS clouds can provide an attractive environment for scientific applications. However, since there are several deployment and configuration models, evaluating the performance of these applications is a challenge. This paper aims to evaluate the performance of LXD containers managed by OpenNebula, using the benchmarks of the NPB-MPI suite. The results show that LXD does not introduce large performance overheads.
Maron, Carlos A F; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo. Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup. In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'19), pp. 217-221, IEEE, Pavia, Italy, 2019. doi: 10.1109/EMPDP.2019.8671592. Tags: Benchmark.
Abstract: Parallel applications from the same domain can present similar patterns of behavior and characteristics. Characterizing common application behaviors can help in understanding performance aspects in real-world scenarios. One way to better understand and evaluate applications' characteristics is to use customizable/parametric benchmarks that enable users to represent important characteristics at run-time. We observed that parameterization techniques should be better exploited in the available benchmarks, especially in the stream processing domain. For instance, although widely used, the stream processing benchmarks available in PARSEC do not support the simulation and evaluation of relevant and modern characteristics. Therefore, our goal is to identify the stream parallelism characteristics present in PARSEC. We also implemented ready-to-use parameterization support and evaluated the application behaviors considering relevant performance metrics for stream parallelism (service time, throughput, latency). We chose Dedup as our case study. The experimental results show performance improvements with our parameterization support for Dedup. Moreover, this support increased the customization space for benchmark users and is simple to use. In the future, our solution can be explored on different parallel architectures and parallel programming frameworks.
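The metrics named above have simple operational definitions: service time and latency are per-item timings, and throughput is completed items per unit of time. A minimal measurement sketch with a stand-in workload (not Dedup itself):

```cpp
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  using clock = std::chrono::steady_clock;
  std::vector<double> latencies_ms;
  const int items = 1000;

  auto run_start = clock::now();
  for (int i = 0; i < items; ++i) {
    auto t0 = clock::now();
    volatile double sink = 0;                        // stand-in per-item work
    for (int k = 0; k < 10000; ++k) sink = sink + k * 0.5;
    // Latency: source-to-sink time of this one item.
    latencies_ms.push_back(
        std::chrono::duration<double, std::milli>(clock::now() - t0).count());
  }
  double elapsed_s =
      std::chrono::duration<double>(clock::now() - run_start).count();

  double avg =
      std::accumulate(latencies_ms.begin(), latencies_ms.end(), 0.0) / items;
  // Throughput: items completed per second over the whole run.
  std::cout << "avg latency: " << avg << " ms, "
            << "throughput: " << items / elapsed_s << " items/s\n";
}
```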
Maliszewski, Anderson M; Griebler, Dalvan. Avaliação de Desempenho da Agregação de Interfaces de Rede em Ambientes de Nuvem Privada (HiPerfCloud: High Performance in Cloud). Technical report, Laboratory of Advanced Research on Cloud Computing (LARCC), 2019. doi: 10.13140/RG.2.2.14800.87044. url: http://larcc.setrem.com.br/wp-content/uploads/2019/12/LARCC_HiPerfCloud_RT5_2019.pdf. Tags: Benchmark, Cloud computing.
2018
Klein, Maikel; Maliszewski, Anderson Mattheus; Griebler, Dalvan. Avaliação do Desempenho do Protocolo Bonding em Máquinas Virtuais LXC e KVM. In: 15th Escola Regional de Redes de Computadores (ERRC), pp. 1-8, Sociedade Brasileira de Computação, Pelotas, Brazil, 2018. url: http://larcc.setrem.com.br/wpcontent/uploads/2018/11/ERRC_2018__Link_Aggregation_.pdf. Tags: Benchmark, Cloud computing.
Abstract: The processing of large data volumes (Big Data) and their distributed storage have gradually increased network usage, making technologies to optimize bandwidth necessary. One low-cost and easy-to-deploy solution is link aggregation. In addition, virtualization, which underlies cloud computing, offers several benefits used in Big Data. The goal of this work is to evaluate network performance using link aggregation with the bonding protocol in LXC and KVM virtual machines. The results show that the bonding protocol behaves similarly under both virtualization types.
Maliszewski, Anderson M; Griebler, Dalvan; Schepke, Claudio; Ditter, Alexander; Fey, Dietmar; Fernandes, Luiz Gustavo. The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM. In: International Conference on High Performance Computing & Simulation (HPCS'18), pp. 359-366, IEEE, Orleans, France, 2018. doi: 10.1109/HPCS.2018.00066. Tags: Benchmark, Cloud computing.
Abstract: Private IaaS clouds are an attractive environment for scientific workloads and applications. They provide advantages such as almost instantaneous availability of high-performance computing on a single node as well as on compute clusters, and easy access for researchers and users who do not have access to conventional supercomputers. Furthermore, a cloud infrastructure provides elasticity and scalability to ensure and manage any software dependency on the system with no third-party dependency for researchers. However, one of the biggest challenges is to avoid significant performance degradation when migrating these applications from physical nodes to a cloud environment. Also, there is a lack of research investigating multi-tenant cloud instances. In this paper, our goal is to perform a comparative performance evaluation of scientific applications with single- and multi-tenant cloud instances using KVM and LXC virtualization technologies under private cloud conditions. All analyses and evaluations were carried out with the NAS Benchmark kernels to simulate different types of workloads. We applied statistical significance tests to highlight the differences. The results show that applications running on LXC-based cloud instances outperform KVM-based ones in 93.75% of the single-tenant experiments. Regarding multi-tenancy, LXC instances outperform KVM instances in 45% of the results, and the performance differences were not as significant as expected.
Griebler, Dalvan; Vogel, Adriano; Maron, Carlos A F; Maliszewski, Anderson M; Schepke, Claudio; Fernandes, Luiz Gustavo. Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions. In: IEEE Symposium on Computers and Communications (ISCC'18), pp. 1530-1346, IEEE, Natal, Brazil, 2018. doi: 10.1109/ISCC.2018.8538759. Tags: Benchmark, Cloud computing.
Abstract: This paper contributes a performance analysis of real-world workloads under private cloud conditions. We selected six benchmarks from PARSEC related to three mainstream application domains (financial, data mining, and media processing). Our goal was to evaluate these application domains in different cloud instances and deployment environments, concerning container- or kernel-based instances and using dedicated or shared machine resources. Experiments have shown that performance varies according to the application characteristics, virtualization technology, and cloud environment. The results highlight that financial, data mining, and media processing applications running in LXC instances tend to outperform KVM when machine resources are dedicated. However, when two instances share the same machine resources, these applications tend to achieve better performance in KVM instances. Finally, financial applications achieved better performance in the cloud than media and data mining applications.
Rockenbach, Dinei A; Anderle, Nadine; Griebler, Dalvan; Souza, Samuel. Estudo Comparativo de Bancos de Dados NoSQL. Journal article in Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC), 1(8), SETREM, Três de Maio, RS, Brazil, 2018. doi: 10.5281/zenodo.1228503. url: https://revistas.setrem.com.br/index.php/reabtic/article/view/286. Tags: Benchmark, Databases, NoSQL databases.
Abstract: NoSQL databases emerged to address the limitations of relational databases. The many options in each category, with their distinct characteristics and focus, make this assessment very difficult for decision makers. Most of the time, decisions are taken without the attention and background the related complexities deserve. This article compares the relevant characteristics of each database, abstracting away the information on which their marketing is based. We concluded that although the databases are labeled within a specific category, there is a significant disparity in the functionality offered by each of them. We also observed that new databases keep emerging even though there are well-established databases in each of the categories studied. Finally, it is very challenging to suggest the best database for each category, because each scenario has its own requirements, which demands a careful analysis where our work can help to simplify this kind of decision.
Maron, Carlos A F; Vogel, Adriano; Griebler, Dalvan. Caracterizando a Implantação e o Desempenho de Aplicações em Ambientes de Nuvem Privada com Recursos Compartilhados e Dedicados. Technical report, Laboratory of Advanced Research on Cloud Computing (LARCC), 2018. doi: 10.13140/RG.2.2.14176.74240. url: http://larcc.setrem.com.br/wp-content/uploads/2018/12/LARCC_HiPerfCloud_RT4_2017.pdf. Tags: Benchmark, Cloud computing.