If you prefer to download a single file with all of LARCC's references, you can find it at this link. You can also follow new publications via RSS.
You can also find the publications on LARCC's Google Scholar profile.
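For readers who download the single reference file, the following is a minimal sketch of how its entries could be grouped by year using only the Python standard library. It assumes the file uses brace-delimited fields without nested braces in `title` and `year`, as the entries on this page do; the function name `index_by_year` and the file name `larcc.bib` mentioned below are hypothetical, and this is not a full BibTeX parser.

```python
import re
from collections import defaultdict

# Start of an entry, e.g. "@article{ANDRADE:CSI:2023,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Simple "name = {value}" fields whose value contains no nested braces
FIELD = re.compile(r"\b(\w+)\s*=\s*\{([^{}]*)\}")


def index_by_year(bibtex_text):
    """Group (citation key, title) pairs by each entry's year field."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    by_year = defaultdict(list)
    for i, match in enumerate(starts):
        # An entry's body runs until the next entry starts (or the text ends).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_text)
        body = bibtex_text[match.end():end]
        fields = {name.lower(): value.strip() for name, value in FIELD.findall(body)}
        year = fields.get("year", "unknown")
        by_year[year].append((match.group(2), fields.get("title", "")))
    return dict(by_year)
```

With the file saved locally (under the assumed name `larcc.bib`), its contents can be read with `pathlib.Path("larcc.bib").read_text(encoding="utf-8")` and passed to `index_by_year`. A dedicated BibTeX library would be a better choice for entries with nested braces or string macros.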
2024 |
Vogel, Adriano; Danelutto, Marco; Torquati, Massimo; Griebler, Dalvan; Fernandes, Luiz Gustavo Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores Journal Article doi The Journal of Supercomputing, pp. 1573-0484, 2024. Abstract | Links | BibTeX | Tags: multicore, Parallel computing, Stream processing @article{Supercomputing, title = {Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores}, author = {Adriano Vogel and Marco Danelutto and Massimo Torquati and Dalvan Griebler and Luiz Gustavo Fernandes }, editor = {Adriano Vogel and Marco Danelutto and Massimo Torquati and Dalvan Griebler and Luiz Gustavo Fernandes }, url = { https://link.springer.com/article/10.1007/s11227-024-06191-w}, doi = {10.1007/s11227-024-06191-w}, year = {2024}, date = {2024-06-21}, journal = {The Journal of Supercomputing}, pages = {1573-0484}, abstract = {Parallel computing is very important to accelerate the performance of computing applications. Moreover, parallel applications are expected to continue executing in more dynamic environments and react to changing conditions. In this context, applying self-adaptation is a potential solution to achieve a higher level of autonomic abstractions and runtime responsiveness. In our research, we aim to explore and assess the possible abstractions attainable through the transparent management of parallel executions by self-adaptation. Our primary objectives are to expand the adaptation space to better reflect real-world applications and assess the potential for self-adaptation to enhance efficiency. We provide the following scientific contributions: (I) A conceptual framework to improve the designing of self-adaptation; (II) A new decision-making strategy for applications with multiple parallel stages; (III) A comprehensive evaluation of the proposed decision-making strategy compared to the state-of-the-art. 
The results demonstrate that the proposed conceptual framework can help design and implement self-adaptive strategies that are more modular and reusable. The proposed decision-making strategy provides significant gains in accuracy compared to the state-of-the-art, increasing the parallel applications’ performance and efficiency.}, keywords = {multicore, Parallel computing, Stream processing}, pubstate = {published}, tppubtype = {article} } |
Barth, Vitor; Vogel, Adriano; Griebler, Dalvan; Pinho, Marcio Sarroglia Open Source Augmented Reality in Data Center Infrastructure Maintenance Inproceedings doi 2024, ISBN: 9798400700026. Abstract | Links | BibTeX | Tags: Augmented Reality, Data Center Management @inproceedings{10.1145/3604479.3604527, title = {Open Source Augmented Reality in Data Center Infrastructure Maintenance}, author = {Vitor Barth and Adriano Vogel and Dalvan Griebler and Marcio Sarroglia Pinho}, doi = {10.1145/3604479.3604527}, isbn = {9798400700026}, year = {2024}, date = {2024-04-08}, publisher = {Association for Computing Machinery}, abstract = {The use of Augmented Reality (AR) is increasing in several application domains. However, applying AR in data center infrastructure management is still limited. Although solutions are providing DCIM (Data Center Infrastructure Management) with augmented reality, they are paid commercial applications that do not provide the source code. Therefore, using such solutions, their reproducibility, and their evaluation is difficult in academic and computational research environments. In this vein, this work aims to create a prototype of an open-source AR mobile application for maintenance management in data centers. Moreover, this work provides a theoretical foundation based on a literature review of solutions available in the academic and industry fields. We evaluated the proposed solution with qualitative and quantitative methods, including user feedback on the developed application and comparison to OpenDCIM - a well-known tool for data center management. The results from the qualitative evaluation demonstrate that the proposed solution improves the user experience. 
However, it is not possible to infer significant productivity gains from the quantitative evaluation.}, keywords = {Augmented Reality, Data Center Management}, pubstate = {published}, tppubtype = {inproceedings} } |
Mencagli, Gabriele; Torquati, Massimo; Griebler, Dalvan; Fais, Alessandra; Danelutto, Marco General-purpose data stream processing on heterogeneous architectures with WindFlow Journal Article doi Journal of Parallel and Distributed Computing, 184 , pp. 104782, 2024. Abstract | Links | BibTeX | Tags: GPGPU, Stream processing @article{MENCAGLI:JPDC:24, title = {General-purpose data stream processing on heterogeneous architectures with WindFlow}, author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Alessandra Fais and Marco Danelutto}, url = {https://www.sciencedirect.com/science/article/pii/S0743731523001521}, doi = {10.1016/j.jpdc.2023.104782}, year = {2024}, date = {2024-02-01}, journal = {Journal of Parallel and Distributed Computing}, volume = {184}, pages = {104782}, publisher = {Elsevier}, abstract = {Many emerging applications analyze data streams by running graphs of communicating tasks called operators. To develop and deploy such applications, Stream Processing Systems (SPSs) like Apache Storm and Flink have been made available to researchers and practitioners. They exhibit imperative or declarative programming interfaces to develop operators running arbitrary algorithms working on structured or unstructured data streams. In this context, the interest in leveraging hardware acceleration with GPUs has become more pronounced in high-throughput use cases. Unfortunately, GPU acceleration has been studied for relational operators working on structured streams only, while non-relational operators have often been overlooked. This paper presents WindFlow, a library supporting the seamless GPU offloading of general partitioned-stateful operators, extending the range of operators that benefit from hardware acceleration. 
Its design provides high throughput still exposing a high-level API to users compared with the raw utilization of GPUs in Apache Flink.}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {article} } |
2023 |
Welter, Greice Aline; Vogel, Adriano Proposta de Monitoramento de um Data Center usando IoT Undergraduate Thesis Undergraduate Thesis, 2023. Abstract | Links | BibTeX | Tags: Cloud computing, IoT, Monitoramento @misc{Welter2023, title = {Proposta de Monitoramento de um Data Center usando IoT}, author = {Greice Aline Welter and Adriano Vogel }, editor = {Greice Aline Welter and Adriano Vogel and Fauzi Shubeita }, url = {https://larcc.setrem.com.br/wp-content/uploads/2024/10/EC_PROJETO_TCC_New-2.pdf}, year = {2023}, date = {2023-12-30}, abstract = {Nowadays, monitoring a datacenter is of paramount importance due to the growing dependence of companies on technology and online services. Continuous monitoring allows you to quickly identify and solve problems, minimizing downtime and maintaining the availability of critical systems. Furthermore, monitoring assists in detecting security threats, protecting sensitive data and preventing cyber-attacks. With ever-increasing demands for performance and energy efficiency, monitoring helps to optimize the use of resources, reducing costs and environmental impact. The objective of this work is to evaluate the performance in monitoring the environment of a datacenter. This work used ALLNET devices, such as the ll3500v2, ll4404 and ll3008 to carry out the humidity and temperature readings of a datacenter, and a comparison was made with an esp8266 and a dht22 sensor.}, howpublished = {Undergraduate Thesis}, keywords = {Cloud computing, IoT, Monitoramento}, pubstate = {published}, tppubtype = {misc} } |
Maliszewski, Anderson Matthias; Griebler, Dalvan; Roloff, Eduardo; da Righi, Rodrigo Rosa; Navaux, Philippe O A Evaluation Model and Performance Analysis of NIC Aggregations in Containerized Private Clouds Inproceedings International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), pp. 1-8, IEEE, Porto Alegre, Brazil, 2023. Links | BibTeX | Tags: Cloud computing @inproceedings{larcc:MALISZEWSKI:SBAC-PADW:23, title = {Evaluation Model and Performance Analysis of NIC Aggregations in Containerized Private Clouds}, author = {Anderson Matthias Maliszewski and Dalvan Griebler and Eduardo Roloff and Rodrigo Rosa da Righi and Philippe O A Navaux}, url = {https://doi.org/}, year = {2023}, date = {2023-10-01}, booktitle = {International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)}, pages = {1-8}, publisher = {IEEE}, address = {Porto Alegre, Brazil}, series = {SBAC-PADW'23}, keywords = {Cloud computing}, pubstate = {published}, tppubtype = {inproceedings} } |
Alf, Lucas; Hoffmann, Renato Barreto; Müller, Caetano; Griebler, Dalvan Análise da Execução de Algoritmos de Aprendizado de Máquina em Dispositivos Embarcados Inproceedings Anais do XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD), pp. 1-12, SBC, Porto Alegre, Brasil, 2023. Links | BibTeX | Tags: Deep learning, IoT @inproceedings{ALF:WSCAD:23, title = {Análise da Execução de Algoritmos de Aprendizado de Máquina em Dispositivos Embarcados}, author = {Lucas Alf and Renato Barreto Hoffmann and Caetano Müller and Dalvan Griebler}, url = {https://doi.org/}, year = {2023}, date = {2023-10-01}, booktitle = {Anais do XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)}, pages = {1-12}, publisher = {SBC}, address = {Porto Alegre, Brasil}, keywords = {Deep learning, IoT}, pubstate = {published}, tppubtype = {inproceedings} } |
Leonarczyk, Ricardo; Griebler, Dalvan; Mencagli, Gabriele; Danelutto, Marco Evaluation of Adaptive Micro-batching Techniques for GPU-accelerated Stream Processing Inproceedings Euro-ParW 2023: Parallel Processing Workshops, pp. 1-8, Springer, Limassol, 2023. Links | BibTeX | Tags: GPGPU, Stream processing @inproceedings{LEONARCZYK:Euro-ParW:23, title = {Evaluation of Adaptive Micro-batching Techniques for GPU-accelerated Stream Processing}, author = {Ricardo Leonarczyk and Dalvan Griebler and Gabriele Mencagli and Marco Danelutto}, url = {https://doi.org/}, year = {2023}, date = {2023-08-01}, booktitle = {Euro-ParW 2023: Parallel Processing Workshops}, pages = {1-8}, publisher = {Springer}, address = {Limassol}, series = {Euro-ParW'23}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
Leonarczyk, Ricardo; Griebler, Dalvan Avaliação da Auto-Adaptação de Micro-Lote para aplicação de Processamento de Streaming em GPUs Inproceedings doi Anais da XXIII Escola Regional de Alto Desempenho da Região Sul, pp. 123-124, Sociedade Brasileira de Computação, Porto Alegre, Brazil, 2023. Abstract | Links | BibTeX | Tags: GPGPU, Self-adaptation, Stream processing @inproceedings{LEONARCZYK:ERAD:23, title = {Avaliação da Auto-Adaptação de Micro-Lote para aplicação de Processamento de Streaming em GPUs}, author = {Ricardo Leonarczyk and Dalvan Griebler}, url = {https://doi.org/10.5753/eradrs.2023.229267}, doi = {10.5753/eradrs.2023.229267}, year = {2023}, date = {2023-05-01}, booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul}, pages = {123-124}, publisher = {Sociedade Brasileira de Computação}, address = {Porto Alegre, Brazil}, abstract = {Este artigo apresenta uma avaliação de algoritmos para regular a latência através da auto-adaptação de micro-lote em sistemas de processamento de streaming acelerados por GPU. Os resultados demonstraram que o algoritmo com o fator de adaptação fixo conseguiu ficar por mais tempo na região de latência especificada para a aplicação.}, keywords = {GPGPU, Self-adaptation, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
Fim, Gabriel Rustick; Griebler, Dalvan Implementação e Avaliação do Paralelismo de Flink nas Aplicações de Processamento de Log e Análise de Cliques Inproceedings doi Anais da XXIII Escola Regional de Alto Desempenho da Região Sul, pp. 69-72, Sociedade Brasileira de Computação, Porto Alegre, Brazil, 2023. Abstract | Links | BibTeX | Tags: Parallel programming, Stream processing @inproceedings{larcc:FIM:ERAD:23, title = {Implementação e Avaliação do Paralelismo de Flink nas Aplicações de Processamento de Log e Análise de Cliques}, author = {Gabriel Rustick Fim and Dalvan Griebler}, url = {https://doi.org/10.5753/eradrs.2023.229290}, doi = {10.5753/eradrs.2023.229290}, year = {2023}, date = {2023-05-01}, booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul}, pages = {69-72}, publisher = {Sociedade Brasileira de Computação}, address = {Porto Alegre, Brazil}, abstract = {Este trabalho visou implementar e avaliar o desempenho das aplicações de Processamento de Log e Análise de Cliques no Apache Flink, comparando o desempenho com Apache Storm em um ambiente computacional distribuído. Os resultados mostram que a execução em Flink apresenta um consumo de recursos relativamente menor quando comparada a execução em Storm, mas possui um desvio padrão alto expondo um desbalanceamento de carga em execuções onde algum componente da aplicação é replicado.}, keywords = {Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
Dopke, Luan; Griebler, Dalvan Estudo Sobre Spark nas Aplicações de Processamento de Log e Análise de Cliques Inproceedings doi Anais da XXIII Escola Regional de Alto Desempenho da Região Sul, pp. 85-88, Sociedade Brasileira de Computação, Porto Alegre, Brazil, 2023. Abstract | Links | BibTeX | Tags: Benchmark, Stream processing @inproceedings{larcc:DOPKE:ERAD:23, title = {Estudo Sobre Spark nas Aplicações de Processamento de Log e Análise de Cliques}, author = {Luan Dopke and Dalvan Griebler}, url = {https://doi.org/10.5753/eradrs.2023.229298}, doi = {10.5753/eradrs.2023.229298}, year = {2023}, date = {2023-05-01}, booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul}, pages = {85-88}, publisher = {Sociedade Brasileira de Computação}, address = {Porto Alegre, Brazil}, abstract = {O uso de aplicações de processamento de dados de fluxo contínuo vem crescendo cada vez mais, dado este fato o presente estudo visa mensurar o desempenho do framework Apache Spark Structured Streaming perante o framework Apache Storm nas aplicações de fluxo contínuo de dados, estas sendo processamento de logs e análise de cliques. Os resultados demonstram melhor desempenho para o Apache Storm em ambas as aplicações.}, keywords = {Benchmark, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo A parallel programming assessment for stream processing applications on multi-core systems Journal Article doi Computer Standards & Interfaces, 84 , pp. 103691, 2023. Abstract | Links | BibTeX | Tags: Parallel programming, Stream processing @article{ANDRADE:CSI:2023, title = {A parallel programming assessment for stream processing applications on multi-core systems}, author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1016/j.csi.2022.103691}, doi = {10.1016/j.csi.2022.103691}, year = {2023}, date = {2023-03-01}, journal = {Computer Standards \& Interfaces}, volume = {84}, pages = {103691}, publisher = {Elsevier}, abstract = {Multi-core systems are any computing device nowadays and stream processing applications are becoming recurrent workloads, demanding parallelism to achieve the desired quality of service. As soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware for accelerating time-to-market. In the last years, research efforts in academia and industry provided a set of parallel APIs, increasing productivity to software developers. However, a few studies are seeking to prove the usability of these interfaces. In this work, we aim to present a parallel programming assessment regarding the usability of parallel API for expressing parallelism on the stream processing application domain and multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs, reporting several quantitative and qualitative indicators involving developers. 
Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. This study revealed important insights such as recurrent compile-time and programming logic errors performed by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants’ opinions about their experience in this study to understand deeply the results achieved.}, keywords = {Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {article} } |
Vogel, Adriano; Danelutto, Marco; Griebler, Dalvan; Fernandes, Luiz Gustavo Revisiting self-adaptation for efficient decision-making at run-time in parallel executions Inproceedings doi 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 43-50, IEEE, Naples, Italy, 2023. Abstract | Links | BibTeX | Tags: Parallel programming, Self-adaptation @inproceedings{VOGEL:PDP:23, title = {Revisiting self-adaptation for efficient decision-making at run-time in parallel executions}, author = {Adriano Vogel and Marco Danelutto and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP59025.2023.00015}, doi = {10.1109/PDP59025.2023.00015}, year = {2023}, date = {2023-03-01}, booktitle = {31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {43-50}, publisher = {IEEE}, address = {Naples, Italy}, series = {PDP'23}, abstract = {Self-adaptation is a potential alternative to provide a higher level of autonomic abstractions and run-time responsiveness in parallel executions. However, the recurrent problem is that self-adaptation is still limited in flexibility and efficiency. For instance, there is a lack of mechanisms to apply adaptation actions and efficient decision-making strategies to decide which configurations should be conveniently enforced at run-time. In this work, we are interested in providing and evaluating potential abstractions achievable with self-adaptation transparently managing parallel executions. Therefore, we provide a new mechanism to support self-adaptation in applications with multiple parallel stages executed in multi-cores. Moreover, we reproduce, reimplement, and evaluate an existing decision-making strategy in our scenario. The observations from the results show that the proposed mechanism for self-adaptation can provide new parallelism abstractions and autonomous responsiveness at run-time. 
On the other hand, there is a need for more accurate decision-making strategies to enable efficient executions of applications in resource-constrained scenarios like multi-cores.}, keywords = {Parallel programming, Self-adaptation}, pubstate = {published}, tppubtype = {inproceedings} } |
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; García, José Daniel; Muñoz, Javier Fernández; Fernandes, Luiz Gustavo A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores Inproceedings doi 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 164-168, IEEE, Naples, Italy, 2023. Abstract | Links | BibTeX | Tags: Benchmark, Stream processing @inproceedings{GARCIA:PDP:23, title = {A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores}, author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and José Daniel García and Javier Fernández Muñoz and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP59025.2023.00033}, doi = {10.1109/PDP59025.2023.00033}, year = {2023}, date = {2023-03-01}, booktitle = {31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {164-168}, publisher = {IEEE}, address = {Naples, Italy}, series = {PDP'23}, abstract = {Several solutions aim to simplify the burdening task of parallel programming. The GrPPI library is one of them. It allows users to implement parallel code for multiple backends through a unified, abstract, and generic layer while promising minimal overhead on performance. An outspread evaluation of GrPPI regarding stream parallelism with representative metrics for this domain, such as throughput and latency, was not yet done. In this work, we evaluate GrPPI focused on stream processing. We evaluate performance, memory usage, and programming effort and compare them against handwritten parallel code. For this, we use the benchmarking framework SPBench to build custom GrPPI benchmarks. The basis of the benchmarks is real applications, such as Lane Detection, Bzip2, Face Recognizer, and Ferret. 
Experiments show that while performance is competitive with handwritten code in some cases, in other cases, the infeasibility of fine-tuning GrPPI is a crucial drawback. Despite this, programmability experiments estimate that GrPPI has the potential to reduce by about three times the development time of parallel applications.}, keywords = {Benchmark, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} } |
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo SPBench: a framework for creating benchmarks of stream processing applications Journal Article doi Computing, 105 (5), pp. 1077-1099, 2023. Abstract | Links | BibTeX | Tags: Benchmark, Stream processing @article{GARCIA:Computing:23, title = {SPBench: a framework for creating benchmarks of stream processing applications}, author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s00607-021-01025-6}, doi = {10.1007/s00607-021-01025-6}, year = {2023}, date = {2023-01-01}, journal = {Computing}, volume = {105}, number = {5}, pages = {1077-1099}, publisher = {Springer}, abstract = {In a fast-changing data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The increasing data we produce, such as audio, video, image, and, text are demanding quickly and efficiently computation. Stream Parallelism allows accelerating this computation for real-time processing. But it is still a challenging task and most reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It aims to support users with a set of real-world stream processing applications, which are made accessible through an Application Programming Interface (API) and executable via Command Line Interface (CLI) to create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and revealed the feasibility of the proposed framework in terms of usage, customization, and performance analysis. 
SPBench proved to be a high-level, reusable, extensible, and easy-to-use abstraction to build parallel stream processing benchmarks on multi-core architectures.}, keywords = {Benchmark, Stream processing}, pubstate = {published}, tppubtype = {article} }
Araujo, Gabriell; Griebler, Dalvan; Rockenbach, Dinei A; Danelutto, Marco; Fernandes, Luiz Gustavo. NAS Parallel Benchmarks with CUDA and Beyond. Journal Article. Software: Practice and Experience, 53 (1), pp. 53-80, 2023. Tags: Benchmark, GPGPU, Parallel programming
@article{ARAUJO:SPE:23, title = {NAS Parallel Benchmarks with CUDA and Beyond}, author = {Gabriell Araujo and Dalvan Griebler and Dinei A Rockenbach and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1002/spe.3056}, doi = {10.1002/spe.3056}, year = {2023}, date = {2023-01-01}, journal = {Software: Practice and Experience}, volume = {53}, number = {1}, pages = {53-80}, publisher = {Wiley}, abstract = {NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide easy-to-use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies for helping to find a better number of threads per block configuration. The results have revealed relevant performance improvement solely by changing the number of threads per block, showing performance improvements from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, code refactoring required, and parallelism implementations.
The performance results have shown up to 267% improvements over the best benchmark versions available. We also observe the best and worst design choices, concerning code size and the performance trade-off. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior.}, keywords = {Benchmark, GPGPU, Parallel programming}, pubstate = {published}, tppubtype = {article} }
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. Micro-batch and data frequency for stream processing on multi-cores. Journal Article. The Journal of Supercomputing, 79 (8), pp. 9206-9244, 2023. Tags: Benchmark, Self-adaptation, Stream processing
@article{GARCIA:JS:23, title = {Micro-batch and data frequency for stream processing on multi-cores}, author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s11227-022-05024-y}, doi = {10.1007/s11227-022-05024-y}, year = {2023}, date = {2023-01-01}, journal = {The Journal of Supercomputing}, volume = {79}, number = {8}, pages = {9206-9244}, publisher = {Springer}, abstract = {Latency and throughput are often critical performance metrics in stream processing. Applications’ performance can fluctuate depending on the input stream. This unpredictability is due to the variety in data arrival frequency and size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generates the most commonly used frequency patterns for benchmarking stream processing in related work. It allows the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow. These are two libraries that leverage stream parallelism for multi-core architectures.
Our results demonstrated that our test cases did not benefit from micro-batches on multi-cores. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow assured higher throughput in shorter pipelines.}, keywords = {Benchmark, Self-adaptation, Stream processing}, pubstate = {published}, tppubtype = {article} }
2022 |
Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo. Combining stream with data parallelism abstractions for multi-cores. Journal Article. Journal of Computer Languages, 73, pp. 101160, 2022. Tags: Parallel programming, Stream processing
@article{LOFF:COLA:22, title = {Combining stream with data parallelism abstractions for multi-cores}, author = {Júnior Löff and Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1016/j.cola.2022.101160}, doi = {10.1016/j.cola.2022.101160}, year = {2022}, date = {2022-12-01}, journal = {Journal of Computer Languages}, volume = {73}, pages = {101160}, publisher = {Elsevier}, abstract = {Stream processing applications have seen an increasing demand with the raised availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. In this work, we introduce improvements to stream processing applications by exploiting fine-grained data parallelism (via Map and MapReduce) inside coarse-grained stream parallelism stages. The improvements include techniques for identifying data parallelism in sequential codes, a new language, semantic analysis, and a set of definition and transformation rules to perform source-to-source parallel code generation. Moreover, we investigate the feasibility of employing higher-level programming abstractions to support the proposed optimizations. For that, we chose the SPar programming model as a use case, and extended it by adding two new attributes to its language and implementing our optimizations as a new algorithm in the SPar compiler. We conduct a set of experiments in representative stream processing and data-parallel applications.
The results showed that our new compiler algorithm is efficient and that performance improved by up to 108.4x in data-parallel applications. Furthermore, experiments evaluating stream processing applications towards the composition of stream and data parallelism revealed new insights. The results showed that such composition may improve latencies by up to an order of magnitude. Also, it enables programmers to exploit different degrees of stream and data parallelism to accomplish a balance between throughput and latency according to their necessity.}, keywords = {Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {article} }
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo. Opinião de Brasileiros Sobre a Produtividade no Desenvolvimento de Aplicações Paralelas. Inproceedings. Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD), pp. 276-287, SBC, Florianópolis, Brasil, 2022. Tags: Parallel programming
@inproceedings{ANDRADE:WSCAD:22, title = {Opinião de Brasileiros Sobre a Produtividade no Desenvolvimento de Aplicações Paralelas}, author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Luiz Gustavo Fernandes}, url = {https://doi.org/10.5753/wscad.2022.226392}, doi = {10.5753/wscad.2022.226392}, year = {2022}, date = {2022-10-01}, booktitle = {Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)}, pages = {276-287}, publisher = {SBC}, address = {Florianópolis, Brasil}, abstract = {With the popularization of parallel architectures, several programming interfaces have emerged to ease the exploitation of these architectures and increase developer productivity. However, developing parallel applications remains a complex task for developers with little experience. In this work, we conducted a survey to learn the opinion of parallel application developers about the factors that hinder productivity. Our results showed that developer experience is one of the main reasons for low productivity. In addition, the results indicated ways to work around this problem, such as improving and encouraging the teaching of parallel programming in undergraduate courses.}, keywords = {Parallel programming}, pubstate = {published}, tppubtype = {inproceedings} }
Rockenbach, Dinei A; Löff, Júnior; Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz G. High-Level Stream and Data Parallelism in C++ for GPUs. Inproceedings. XXVI Brazilian Symposium on Programming Languages (SBLP), pp. 41-49, ACM, Uberlândia, Brazil, 2022. Tags: GPGPU, Parallel programming, Stream processing
@inproceedings{ROCKENBACH:SBLP:22, title = {High-Level Stream and Data Parallelism in C++ for GPUs}, author = {Dinei A Rockenbach and Júnior Löff and Gabriell Araujo and Dalvan Griebler and Luiz G Fernandes}, url = {https://doi.org/10.1145/3561320.3561327}, doi = {10.1145/3561320.3561327}, year = {2022}, date = {2022-10-01}, booktitle = {XXVI Brazilian Symposium on Programming Languages (SBLP)}, pages = {41-49}, publisher = {ACM}, address = {Uberlândia, Brazil}, series = {SBLP'22}, abstract = {GPUs are massively parallel processors that allow solving problems that are not viable on traditional processors like CPUs. However, implementing applications for GPUs is challenging for programmers as it requires parallel programming to efficiently exploit the GPU resources. In this sense, parallel programming abstractions, notably domain-specific languages, are fundamental for improving programmability. SPar is a high-level Domain-Specific Language (DSL) that allows expressing stream and data parallelism in the serial code through annotations using C++ attributes. This work elaborates on a methodology and tool for GPU code generation by introducing new attributes to the SPar language and transformation rules to the SPar compiler.
These new contributions, besides the gains in simplicity and code reduction compared to CUDA and OpenCL, enabled SPar to achieve higher throughput when exploring combined CPU and GPU parallelism, and when using batching.}, keywords = {GPGPU, Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Kessler, Christoph; Ernstsson, August; Fernandes, Luiz Gustavo. Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing. Inproceedings. 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2022), pp. 229-232, IEEE, Gran Canaria, Spain, 2022. Tags: Parallel programming, Stream processing
@inproceedings{ANDRADE:SEAA:22, title = {Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing}, author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Christoph Kessler and August Ernstsson and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/SEAA56994.2022.00043}, doi = {10.1109/SEAA56994.2022.00043}, year = {2022}, date = {2022-09-01}, booktitle = {48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2022)}, pages = {229-232}, publisher = {IEEE}, address = {Gran Canaria, Spain}, series = {SEAA'22}, abstract = {Over the years, several Parallel Programming Models (PPMs) have supported the abstraction of programming complexity for parallel computer systems. However, few studies aim to evaluate the productivity reached by such abstractions since this is a complex task that involves human beings. There are several studies to develop predictive methods to estimate the effort required to program applications in software engineering. In order to evaluate the reliability of such metrics, it is necessary to assess the accuracy in different programming domains. In this work, we used the data of an experiment conducted with beginners in parallel programming to determine the effort required for implementing stream parallelism using FastFlow, SPar, and TBB. Our results show that some traditional software effort estimation models, such as COCOMO II, fall short, while Putnam's model could be an alternative for high-level PPMs evaluation.
To overcome the limitations of existing models, we plan to create a parallelism-aware model to evaluate applications in this domain in future work.}, keywords = {Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }
Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores. Inproceedings. 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 10-17, IEEE, Valladolid, Spain, 2022. Tags: Benchmark, Stream processing
@inproceedings{GARCIA:PDP:22, title = {Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores}, author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP55904.2022.00011}, doi = {10.1109/PDP55904.2022.00011}, year = {2022}, date = {2022-04-01}, booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {10-17}, publisher = {IEEE}, address = {Valladolid, Spain}, series = {PDP'22}, abstract = {In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools for helping researchers to investigate micro-batching and data stream frequency implications. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects from two different parallel libraries.
We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, which are two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.}, keywords = {Benchmark, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }
Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco. Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices. Inproceedings. 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 34-38, IEEE, Valladolid, Spain, 2022. Tags: GPGPU, IoT, Stream processing
@inproceedings{MENCAGLI:PDP:22, title = {Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices}, author = {Gabriele Mencagli and Dalvan Griebler and Marco Danelutto}, url = {https://doi.org/10.1109/PDP55904.2022.00014}, doi = {10.1109/PDP55904.2022.00014}, year = {2022}, date = {2022-04-01}, booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {34-38}, publisher = {IEEE}, address = {Valladolid, Spain}, series = {PDP'22}, abstract = {Data Stream Processing is a pervasive computing paradigm with a wide spectrum of applications. Traditional streaming systems exploit the processing capabilities provided by homogeneous Clusters and Clouds. Due to the transition to streaming systems suitable for IoT/Edge environments, there has been an urgent need for new streaming frameworks and tools tailored for embedded platforms, often available as System-on-Chips composed of a small multicore CPU and an integrated on-chip GPU. Exploiting this hybrid hardware requires special care in the runtime system design. In this paper, we discuss the support provided by the WindFlow library, showing its design principles and its effectiveness on the NVIDIA Jetson Nano board.}, keywords = {GPGPU, IoT, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }
Scheer, Claudio; Araujo, Gabriell; Griebler, Dalvan; Meneguzzi, Felipe; Fernandes, Luiz Gustavo. Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning. Inproceedings. Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 119-120, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. Tags: Benchmark, Deep learning, GPGPU
@inproceedings{SCHEER:ERAD:22, title = {Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning}, author = {Claudio Scheer and Gabriell Araujo and Dalvan Griebler and Felipe Meneguzzi and Luiz Gustavo Fernandes}, url = {https://doi.org/10.5753/eradrs.2022.19191}, doi = {10.5753/eradrs.2022.19191}, year = {2022}, date = {2022-04-01}, booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul}, pages = {119-120}, publisher = {Sociedade Brasileira de Computação}, address = {Curitiba, Brazil}, abstract = {This work presents a new method that uses machine learning to predict the best threads-per-block configuration for GPU applications. The results were similar to those of manual strategies.}, keywords = {Benchmark, Deep learning, GPGPU}, pubstate = {published}, tppubtype = {inproceedings} }
Fim, Gabriel; Welter, Greice; Löff, Júnior; Griebler, Dalvan. Compressão de Dados em Clusters HPC com Flink, MPI e SPar. Inproceedings. Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 29-32, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. Tags: Parallel programming, Stream processing
@inproceedings{larcc:FIM:ERAD:22, title = {Compressão de Dados em Clusters HPC com Flink, MPI e SPar}, author = {Gabriel Fim and Greice Welter and Júnior Löff and Dalvan Griebler}, url = {https://doi.org/10.5753/eradrs.2022.19153}, doi = {10.5753/eradrs.2022.19153}, year = {2022}, date = {2022-04-01}, booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul}, pages = {29-32}, publisher = {Sociedade Brasileira de Computação}, address = {Curitiba, Brazil}, abstract = {This work evaluates the performance of the Bzip2 data compression algorithm with the stream processing tools Apache Flink, MPI, and SPar on a Beowulf cluster. The results show that the versions with the best performance relative to the sequential time are MPI and SPar, with speed-ups of 7.6 and 7.2 times, respectively.}, keywords = {Parallel programming, Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }
Gomes, Márcio Miguel; da Righi, Rodrigo Rosa; da Costa, Cristiano André; Griebler, Dalvan. Steam++: An Extensible End-to-end Framework for Developing IoT Data Processing Applications in the Fog. Journal Article. International Journal of Computer Science & Information Technology, 14 (1), pp. 31-51, 2022. Tags: Cloud computing, IoT, Stream processing
@article{GOMES:IJCSIT:22, title = {Steam++: An Extensible End-to-end Framework for Developing IoT Data Processing Applications in the Fog}, author = {Márcio Miguel Gomes and Rodrigo Rosa da Righi and Cristiano André da Costa and Dalvan Griebler}, url = {http://dx.doi.org/10.5121/ijcsit.2022.14103}, doi = {10.5121/ijcsit.2022.14103}, year = {2022}, date = {2022-02-01}, journal = {International Journal of Computer Science \& Information Technology}, volume = {14}, number = {1}, pages = {31-51}, publisher = {AIRCC}, abstract = {IoT applications usually rely on cloud computing services to perform data analysis such as filtering, aggregation, classification, pattern detection, and prediction. When applied to specific domains, the IoT needs to deal with unique constraints. Besides the hostile environment such as vibration and electromagnetic interference, resulting in malfunction, noise, and data loss, industrial plants often have Internet access restricted or unavailable, forcing us to design stand-alone fog and edge computing solutions. In this context, we present STEAM++, a lightweight and extensible framework for real-time data stream processing and decision-making in the network edge, targeting hardware-limited devices, besides proposing a micro-benchmark methodology for assessing embedded IoT applications. In real-case experiments in a semiconductor industry, we processed an entire data flow, from values sensing, processing and analysing data, detecting relevant events, and finally, publishing results to a dashboard.
On average, the application consumed less than 500kb RAM and 1.0% of CPU usage, processing up to 239 data packets per second and reducing the output data size to 14% of the input raw data size when notifying events.}, keywords = {Cloud computing, IoT, Stream processing}, pubstate = {published}, tppubtype = {article} } IoT applications usually rely on cloud computing services to perform data analysis such as filtering, aggregation, classification, pattern detection, and prediction. When applied to specific domains, the IoT needs to deal with unique constraints. Besides the hostile environment such as vibration and electricmagnetic interference, resulting in malfunction, noise, and data loss, industrial plants often have Internet access restricted or unavailable, forcing us to design stand-alone fog and edge computing solutions. In this context, we present STEAM++, a lightweight and extensible framework for real-time data stream processing and decision-making in the network edge, targeting hardware-limited devices, besides proposing a micro-benchmark methodology for assessing embedded IoT applications. In real-case experiments in a semiconductor industry, we processed an entire data flow, from values sensing, processing and analysing data, detecting relevant events, and finally, publishing results to a dashboard. On average, the application consumed less than 500kb RAM and 1.0% of CPU usage, processing up to 239 data packets per second and reducing the output data size to 14% of the input raw data size when notifying events. |
Hoffmann, Renato Barreto; Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo
OpenMP as runtime for providing high-level stream parallelism on multi-cores
Journal Article
The Journal of Supercomputing, 78 (1), pp. 7655-7676, 2022.
Tags: Parallel programming, Stream processing
@article{HOFFMANN:Jsuper:2022,
  title = {OpenMP as runtime for providing high-level stream parallelism on multi-cores},
  author = {Renato Barreto Hoffmann and Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/s11227-021-04182-9},
  doi = {10.1007/s11227-021-04182-9},
  year = {2022},
  date = {2022-01-01},
  journal = {The Journal of Supercomputing},
  volume = {78},
  number = {1},
  pages = {7655-7676},
  publisher = {Springer},
  address = {New York, United States},
  abstract = {OpenMP is an industry and academic standard for parallel programming. However, using it for developing parallel stream processing applications is complex and challenging. OpenMP lacks key programming mechanisms and abstractions for this particular domain. To tackle this problem, we used a high-level parallel programming framework (named SPar) for automatically generating parallel OpenMP code. We achieved this by leveraging SPar’s language and its domain-specific code annotations for simplifying the complexity and verbosity added by OpenMP in this application domain. Consequently, we implemented a new compiler algorithm in SPar for automatically generating parallel code targeting the OpenMP runtime using source-to-source code transformations. The experiments in four different stream processing applications demonstrated that the execution time of SPar was improved up to 25.42% when using the OpenMP runtime. Additionally, our abstraction over OpenMP introduced at most 1.72% execution time overhead when compared to handwritten parallel codes. Furthermore, SPar significantly reduces the total source lines of code required to express parallelism with respect to plain OpenMP parallel codes.},
  keywords = {Parallel programming, Stream processing},
  pubstate = {published},
  tppubtype = {article}
}

Löff, Júnior; Hoffmann, Renato Barreto; Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz Gustavo
DSParLib: A C++ Template Library for Distributed Stream Parallelism
Journal Article
International Journal of Parallel Programming, 50 (5), pp. 454-485, 2022.
Tags: Distributed computing, Parallel programming
@article{LOFF:IJPP:22,
  title = {DSParLib: A C++ Template Library for Distributed Stream Parallelism},
  author = {Júnior Löff and Renato Barreto Hoffmann and Ricardo Pieper and Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/s10766-022-00737-2},
  doi = {10.1007/s10766-022-00737-2},
  year = {2022},
  date = {2022-01-01},
  journal = {International Journal of Parallel Programming},
  volume = {50},
  number = {5},
  pages = {454-485},
  publisher = {Springer},
  abstract = {Stream processing applications deal with millions of data items continuously generated over time. Often, they must be processed in real-time and scale performance, which requires the use of distributed parallel computing resources. In C/C++, the current state-of-the-art for distributed architectures and High-Performance Computing is Message Passing Interface (MPI). However, exploiting stream parallelism using MPI is complex and error-prone because it exposes many low-level details to the programmer. In this work, we introduce a new parallel programming abstraction for implementing distributed stream parallelism named DSParLib. Our abstraction of MPI simplifies parallel programming by providing a pattern-based and building block-oriented development to inter-connect, model, and parallelize data streams found in modern applications. Experiments conducted with five different stream processing applications and the representative PARSEC Ferret benchmark revealed that DSParLib is efficient and flexible. Also, DSParLib achieved similar or better performance, required less coding, and provided simpler abstractions to express parallelism with respect to handwritten MPI programs.},
  keywords = {Distributed computing, Parallel programming},
  pubstate = {published},
  tppubtype = {article}
}

2021 |
Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; de Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo
The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
Journal Article
Future Generation Computer Systems, 125, pp. 743-757, 2021.
Tags: Benchmark, Parallel programming
@article{LOFF:FGCS:21,
  title = {The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures},
  author = {Júnior Löff and Dalvan Griebler and Gabriele Mencagli and Gabriell {de Araujo} and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1016/j.future.2021.07.021},
  doi = {10.1016/j.future.2021.07.021},
  year = {2021},
  date = {2021-07-01},
  journal = {Future Generation Computer Systems},
  volume = {125},
  pages = {743-757},
  publisher = {Elsevier},
  abstract = {The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardware components/sub-systems overload. Parallel programming APIs, libraries, and frameworks that are written in C++ as well as new optimizations and parallel processing techniques can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications developed using OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of composition of few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power and AMD) with different C++ compilers (gcc, icc and clang) by discussing the performance differences in order to give to the researchers useful insights to choose the best parallel programming framework for a given type of problem.},
  keywords = {Benchmark, Parallel programming},
  pubstate = {published},
  tppubtype = {article}
}

Pieper, Ricardo; Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo
High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores
Journal Article
Journal of Computer Languages, na (na), pp. na, 2021.
Tags: Parallel programming, Stream processing
@article{PIEPER:COLA:21,
  title = {High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores},
  author = {Ricardo Pieper and Júnior Löff and Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
  year = {2021},
  date = {2021-07-01},
  journal = {Journal of Computer Languages},
  volume = {na},
  number = {na},
  pages = {na},
  publisher = {Elsevier},
  abstract = {This work aims at contributing with a structured parallel programming abstraction for Rust in order to provide ready-to-use parallel patterns that abstract low-level and architecture-dependent details from application programmers. We focus on stream processing applications running on shared-memory multi-core architectures (e.g., video processing, compression, and others). Therefore, we provide a new high-level and efficient parallel programming abstraction for expressing stream parallelism, named Rust-SSP. We also created a new stream benchmark suite for Rust that represents real-world scenarios and has different application characteristics and workloads. Our benchmark suite is an initiative to assess existing parallelism abstraction for this domain, as parallel implementations using these abstractions were provided. The results revealed that Rust-SSP achieved up to 41.1% better performance than other solutions. In terms of programmability, the results revealed that Rust-SSP requires the smallest number of extra lines of code to enable stream parallelism.},
  keywords = {Parallel programming, Stream processing},
  pubstate = {published},
  tppubtype = {article}
}

Maliszewski, Anderson M.; Vogel, Adriano; Griebler, Dalvan; Schepke, Claudio; Navaux, Philippe
Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos
Incollection
Charão, Andrea; Serpa, Matheus (Ed.): Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul, pp. 104-128, Sociedade Brasileira de Computação (SBC), Porto Alegre, 2021.
Tags: Cloud computing, Parallel programming
@incollection{larcc:minicurso:ERAD:21,
  title = {Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos},
  author = {Anderson M. Maliszewski and Adriano Vogel and Dalvan Griebler and Claudio Schepke and Philippe Navaux},
  editor = {Andrea Charão and Matheus Serpa},
  url = {https://doi.org/10.5753/sbc.6150.4},
  doi = {10.5753/sbc.6150.4},
  year = {2021},
  date = {2021-06-01},
  booktitle = {Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul},
  pages = {104-128},
  publisher = {Sociedade Brasileira de Computação (SBC)},
  address = {Porto Alegre},
  chapter = {5},
  abstract = {High-performance computing typically uses computer clusters to run parallel applications. Alternatively, cloud computing offers distributed computing resources for processing with a level of abstraction beyond the traditional one, dynamically and on demand. This chapter aims to introduce basic concepts, present the basics of deploying a private cloud, and demonstrate the benefits of developing and testing parallel programs in the cloud.},
  keywords = {Cloud computing, Parallel programming},
  pubstate = {published},
  tppubtype = {incollection}
}

Gomes, Márcio Miguel; da Righi, Rodrigo Rosa; da Costa, Cristiano André; Griebler, Dalvan
Simplifying IoT data stream enrichment and analytics in the edge
Journal Article
Computers & Electrical Engineering, 92, pp. 107110, 2021.
Tags: IoT, Stream processing
@article{GOMES:CEE:21,
  title = {Simplifying IoT data stream enrichment and analytics in the edge},
  author = {Márcio Miguel Gomes and Rodrigo Rosa da Righi and Cristiano André da Costa and Dalvan Griebler},
  url = {https://doi.org/10.1016/j.compeleceng.2021.107110},
  doi = {10.1016/j.compeleceng.2021.107110},
  year = {2021},
  date = {2021-06-01},
  journal = {Computers \& Electrical Engineering},
  volume = {92},
  pages = {107110},
  publisher = {Elsevier},
  abstract = {Edge devices are usually limited in resources. They often send data to the cloud, where techniques such as filtering, aggregation, classification, pattern detection, and prediction are performed. This process results in critical issues such as data loss, high response time, and overhead. On the other hand, processing data in the edge is not a simple task due to devices’ heterogeneity, resource limitations, a variety of programming languages and standards. In this context, this work proposes STEAM, a framework for developing data stream processing applications in the edge targeting hardware-limited devices. As the main contribution, STEAM enables the development of applications for different platforms, with standardized functions and class structures that use consolidated IoT data formats and communication protocols. Moreover, the experiments revealed the viability of stream processing in the edge resulting in the reduction of response time without compromising the quality of results.},
  keywords = {IoT, Stream processing},
  pubstate = {published},
  tppubtype = {article}
}

Vogel, Adriano; Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo
Online and Transparent Self-adaptation of Stream Parallel Patterns
Journal Article
Computing, 105 (5), pp. 1039-1057, 2021.
Tags: Parallel programming, Self-adaptation, Stream processing
@article{VOGEL:Computing:23,
  title = {Online and Transparent Self-adaptation of Stream Parallel Patterns},
  author = {Adriano Vogel and Gabriele Mencagli and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/s00607-021-00998-8},
  doi = {10.1007/s00607-021-00998-8},
  year = {2021},
  date = {2021-05-01},
  journal = {Computing},
  volume = {105},
  number = {5},
  pages = {1039-1057},
  publisher = {Springer},
  abstract = {Several real-world parallel applications are becoming more dynamic and long-running, demanding online (at run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real-time and where parallel executions are necessary. However, it is challenging for humans to monitor and manually self-optimize complex and long-running parallel executions continuously. Moreover, although high-level and structured parallel programming aims to facilitate parallelism, several issues still need to be addressed for improving the existing abstractions. In this paper, we extend self-adaptiveness for supporting autonomous and online changes of the parallel pattern compositions. Online self-adaptation is achieved with an online profiler that characterizes the applications, which is combined with a new self-adaptive strategy and a model for smooth transitions on reconfigurations. The solution provides a new abstraction layer that enables application programmers to define non-functional requirements instead of hand-tuning complex configurations. Hence, we contribute with additional abstractions and flexible self-adaptation for responsiveness at run-time. The proposed solution is evaluated with applications having different processing characteristics, workloads, and configurations. The results show that it is possible to provide additional abstractions, flexibility, and responsiveness while achieving performance comparable to the best static configuration executions.},
  keywords = {Parallel programming, Self-adaptation, Stream processing},
  pubstate = {published},
  tppubtype = {article}
}

Leonarczyk, Ricardo; Griebler, Dalvan
Implementação MPIC++ e HPX dos Kernels NPB
Inproceedings
21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 81-84, Sociedade Brasileira de Computação, Joinville, SC, Brazil, 2021.
Tags: Benchmark, Parallel programming
@inproceedings{larcc:NPB_HPX_MPI:ERAD:21,
  title = {Implementação MPIC++ e HPX dos Kernels NPB},
  author = {Ricardo Leonarczyk and Dalvan Griebler},
  url = {https://doi.org/10.5753/eradrs.2021.14780},
  doi = {10.5753/eradrs.2021.14780},
  year = {2021},
  date = {2021-04-01},
  booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
  pages = {81-84},
  publisher = {Sociedade Brasileira de Computação},
  address = {Joinville, SC, Brazil},
  abstract = {This paper presents the parallel implementation of the five kernels belonging to the NAS Parallel Benchmarks (NPB) with MPIC++ and HPX for execution on cluster architectures. The results demonstrated that the HPX programming model can be more efficient than MPIC++ in algorithms such as the fast Fourier transform, sorting, and Conjugate Gradient.},
  keywords = {Benchmark, Parallel programming},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Vanzan, Anthony; Fim, Gabriel; Welter, Greice; Griebler, Dalvan
Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo
Inproceedings
21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 49-52, Sociedade Brasileira de Computação, Joinville, SC, Brazil, 2021.
Tags: Agriculture, Distributed computing, Parallel programming, Stream processing
@inproceedings{larcc:DL_Classificaiton_MPI:ERAD:21,
  title = {Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo},
  author = {Anthony Vanzan and Gabriel Fim and Greice Welter and Dalvan Griebler},
  url = {https://doi.org/10.5753/eradrs.2021.14772},
  doi = {10.5753/eradrs.2021.14772},
  year = {2021},
  date = {2021-04-01},
  booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
  pages = {49-52},
  publisher = {Sociedade Brasileira de Computação},
  address = {Joinville, SC, Brazil},
  abstract = {This work aimed to accelerate the execution of a crop classification algorithm on aerial images. To this end, different parallel versions were implemented using the MPI library in Python. The evaluation was conducted in two computing environments. We conclude that execution time can be reduced as more parallel resources are used and that the dynamic work distribution strategy is more efficient.},
  keywords = {Agriculture, Distributed computing, Parallel programming, Stream processing},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Dopke, Luan; Rockenbach, Dinei A.; Griebler, Dalvan
Avaliação de Desempenho para Banco de Dados com Genoma em Nuvem Privada
Inproceedings
21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 45-48, Sociedade Brasileira de Computação, Joinville, SC, Brazil, 2021.
Tags: Databases, Genomics, NoSQL databases
@inproceedings{larcc:cloud_DNA_databases:ERAD:21,
  title = {Avaliação de Desempenho para Banco de Dados com Genoma em Nuvem Privada},
  author = {Luan Dopke and Dinei A. Rockenbach and Dalvan Griebler},
  url = {https://doi.org/10.5753/eradrs.2021.14771},
  doi = {10.5753/eradrs.2021.14771},
  year = {2021},
  date = {2021-04-01},
  booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
  pages = {45-48},
  publisher = {Sociedade Brasileira de Computação},
  address = {Joinville, SC, Brazil},
  abstract = {Databases are particularly interesting tools for handling data generated through DNA sequencing. This paper aims to evaluate the performance of three databases with workloads related to DNA sequencing: PostgreSQL and MySQL as relational databases and MongoDB as a NoSQL database. The results show that PostgreSQL outperforms the others.},
  keywords = {Databases, Genomics, NoSQL databases},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Vogel, Adriano; Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo
Towards On-the-fly Self-Adaptation of Stream Parallel Patterns
Inproceedings
29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 89-93, IEEE, Valladolid, Spain, 2021.
Tags: Self-adaptation, Stream processing
@inproceedings{VOGEL:PDP:21,
  title = {Towards On-the-fly Self-Adaptation of Stream Parallel Patterns},
  author = {Adriano Vogel and Gabriele Mencagli and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  doi = {10.1109/PDP52278.2021.00022},
  year = {2021},
  date = {2021-03-01},
  booktitle = {29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {89-93},
  publisher = {IEEE},
  address = {Valladolid, Spain},
  series = {PDP'21},
  abstract = {Stream processing applications compute streams of data and provide insightful results in a timely manner, where parallel computing is necessary for accelerating the application executions. Considering that these applications are becoming increasingly dynamic and long-running, a potential solution is to apply dynamic runtime changes. However, it is challenging for humans to continuously monitor and manually self-optimize the executions. In this paper, we propose self-adaptiveness of the parallel patterns used, enabling flexible on-the-fly adaptations. The proposed solution is evaluated with an existing programming framework and running experiments with a synthetic and a real-world application. The results show that the proposed solution is able to dynamically self-adapt to the most suitable parallel pattern configuration and achieve performance competitive with the best static cases. The feasibility of the proposed solution encourages future optimizations and other applicabilities.},
  keywords = {Self-adaptation, Stream processing},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo Providing High-level Self-adaptive Abstractions for Stream Parallelism on Multi-cores Journal Article doi Software: Practice and Experience, 51 (6), pp. 1194-1217, 2021. Abstract | Links | BibTeX | Tags: Self-adaptation, Stream processing @article{VOGEL:SPE:21, title = {Providing High-level Self-adaptive Abstractions for Stream Parallelism on Multi-cores}, author = {Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes}, doi = {10.1002/spe.2948}, year = {2021}, date = {2021-01-01}, journal = {Software: Practice and Experience}, volume = {51}, number = {6}, pages = {1194-1217}, publisher = {Wiley Online Library}, abstract = {Stream processing applications are common computing workloads that demand parallelism to increase their performance. As in the past, parallel programming remains a difficult task for application programmers. The complexity increases when application programmers must set non-intuitive parallelism parameters, i.e. the degree of parallelism. The main problem is that state-of-the-art libraries use a static degree of parallelism and are not sufficiently abstracted for developing stream processing applications. In this paper, we propose a self-adaptive regulation of the degree of parallelism to provide higher-level abstractions. Flexibility is provided to programmers with two new self-adaptive strategies, one is for performance experts, and the other abstracts the need to set a performance goal. We evaluated our solution using compiler transformation rules to generate parallel code with the SPar domain-specific language. The experimental results with real-world applications highlighted higher abstraction levels without significant performance degradation in comparison to static executions. 
The strategy for performance experts achieved slightly higher performance than the one that works without user-defined performance goals.}, keywords = {Self-adaptation, Stream processing}, pubstate = {published}, tppubtype = {article} } |
Allebrandt, Alisson; Schmidt, Diego Henrique; Griebler, Dalvan Simplificando a Interpretação de Laudos de Análise de Solo com Deep Learning em Nuvem Journal Article doi Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC), 1 (13), 2021. Abstract | Links | BibTeX | Tags: Agriculture, Cloud computing, Deep learning @article{larcc:DL_solos:REABTIC:21, title = {Simplificando a Interpretação de Laudos de Análise de Solo com Deep Learning em Nuvem}, author = {Alisson Allebrandt and Diego Henrique Schmidt and Dalvan Griebler}, url = {https://revistas.setrem.com.br/index.php/reabtic/article/view/387}, doi = {10.5281/zenodo.4445204}, year = {2021}, date = {2021-01-01}, journal = {Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC)}, volume = {1}, number = {13}, publisher = {SETREM}, address = {Três de Maio, RS, Brazil}, abstract = {Um dos aspectos que interfere em uma boa produtividade agrícola é o solo, consequentemente, a sua conservação por meio da aplicação correta de nutrientes e adubação é de suma importância. Neste artigo, propõe-se uma arquitetura de software e um aplicativo mobile capaz de auxiliar agricultores e engenheiros agrônomos na interpretação de análises de solo geradas em laboratórios. A arquitetura de software foi concebida para atuar em um ambiente de nuvem e o aplicativo mobile é a interface para captura e apresentação dos dados. Inicialmente, foi necessário criar uma base de dados com diferentes tipos e configurações de imagens. O dataset foi tratado para eliminar ruídos (tais como luminosidade, sombras e distorções) e usado para avaliação de duas soluções de Deep Learning (Google Vision e Tesseract OCR), onde o Tesseract OCR se mostrou mais preciso usando as mesmas imagens. 
Além de ofertar o aplicativo mobile, que é um primeiro passo, a pesquisa realizada revela várias carências tecnológicas e oportunidades para inovações na área de ciência dos solos.}, keywords = {Agriculture, Cloud computing, Deep learning}, pubstate = {published}, tppubtype = {article} } Soil is one of the factors that affects agricultural productivity, so conserving it through the correct application of nutrients and fertilizer is of utmost importance. This article proposes a software architecture and a mobile application that help farmers and agronomists interpret soil analysis reports produced by laboratories. The software architecture was designed to run in a cloud environment, and the mobile application is the interface for capturing and presenting the data. Initially, it was necessary to create a database with different types and configurations of images. The dataset was treated to eliminate noise (such as luminosity, shadows, and distortions) and used to evaluate two Deep Learning solutions (Google Vision and Tesseract OCR), of which Tesseract OCR proved more accurate on the same images. Besides offering the mobile application, which is a first step, the research reveals several technological gaps and opportunities for innovation in the field of soil science. |
2020 |
Bordin, Maycon Viana; Griebler, Dalvan; Mencagli, Gabriele; Geyer, Claudio F R; Fernandes, Luiz Gustavo DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems Journal Article doi IEEE Access, 8 (na), pp. 222900-222917, 2020. Abstract | Links | BibTeX | Tags: Benchmark, Stream processing @article{BORDIN:IEEEAccess:20, title = {DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems}, author = {Maycon Viana Bordin and Dalvan Griebler and Gabriele Mencagli and Claudio F R Geyer and Luiz Gustavo Fernandes}, doi = {10.1109/ACCESS.2020.3043948}, year = {2020}, date = {2020-12-01}, journal = {IEEE Access}, volume = {8}, number = {na}, pages = {222900-222917}, publisher = {IEEE}, abstract = {Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distributed computing environments like clusters and clouds. Several systems of this kind have been released and currently maintained as open source projects, like Apache Storm and Spark Streaming. Some benchmark applications have often been used by the scientific community to test and evaluate new techniques to improve the performance and usability of DSPSs. However, the existing benchmark suites lack of representative workloads coming from the wide set of application domains that can leverage the benefits offered by the stream processing paradigm in terms of near real-time performance. The goal of this paper is to present a new benchmark suite composed of 15 applications coming from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others. 
This paper describes in detail the nature of these applications, their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of our benchmark suite to compare real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.}, keywords = {Benchmark, Stream processing}, pubstate = {published}, tppubtype = {article} } |
Welter, Greice Aline; Vanzan, Anthony; Fim, Gabriel Rustick; Sausen, Matheus César; Griebler, Dalvan Avaliação dos Frameworks TensorFlow, PyTorch e Keras para Deep Learning Inproceedings 22 Salão de Pesquisa Setrem (SAPS), pp. 5, Sociedade Educacional Três de Maio, Três de Maio, RS, Brazil, 2020. Abstract | Links | BibTeX | Tags: Cloud computing, Deep learning @inproceedings{larcc:DL_Frameworks:SAPS:20, title = {Avaliação dos Frameworks TensorFlow, PyTorch e Keras para Deep Learning}, author = {Greice Aline Welter and Anthony Vanzan and Gabriel Rustick Fim and Matheus César Sausen and Dalvan Griebler}, url = {https://larcc.setrem.com.br/wp-content/uploads/2020/11/SAPS_2020_Greice.pdf}, year = {2020}, date = {2020-10-01}, booktitle = {22 Salão de Pesquisa Setrem (SAPS)}, pages = {5}, publisher = {Sociedade Educacional Três de Maio}, address = {Três de Maio, RS, Brazil}, abstract = {Os frameworks de programação são um conjunto de classes sobre a qual uma ferramenta é constituída. Esse esqueleto disponibilizará classes para construir, por exemplo, uma solução que permita criar e trabalhar com vários modelos heterogêneos de RNAs (Redes Neurais Artificiais) interligadas. Como existem diversas opções disponíveis, não se sabe qual delas é a mais adequada. Em uma pesquisa prévia no contexto do projeto de pesquisa AGROCOMPUTAÇÃO (parceria entre SETREM e TECNICON), identificou-se que os principais frameworks para o desenvolvimento de RNAs profundas (Deep Learning) são TensorFlow, PyTorch e Keras. A partir disso, o objetivo foi avaliar qual delas é a melhor para o desenvolvimento de modelos de RNAs. Além de contribuir com o projeto AGROCOMPUTAÇÃO, os resultados dessa pesquisa poderão auxiliar na tomada de decisão de outros projetos de software que visam criar modelos de RNAs profundas. 
Os testes efetuados apontaram que o PyTorch é melhor no quesito acurácia e é mais flexível para o desenvolvimento de modelos de RNAs se comparado ao TensorFlow e Keras.}, keywords = {Cloud computing, Deep learning}, pubstate = {published}, tppubtype = {inproceedings} } Programming frameworks are a set of classes on which a tool is built. This skeleton provides classes for building, for example, a solution that makes it possible to create and work with several interconnected heterogeneous ANN (Artificial Neural Network) models. Since several options are available, it is not known which one is the most suitable. A previous study within the AGROCOMPUTAÇÃO research project (a partnership between SETREM and TECNICON) identified TensorFlow, PyTorch, and Keras as the main frameworks for developing deep ANNs (Deep Learning). Based on that, the goal was to evaluate which of them is best for developing ANN models. Besides contributing to the AGROCOMPUTAÇÃO project, the results of this research may support decision-making in other software projects that aim to create deep ANN models. The tests showed that PyTorch is better in terms of accuracy and more flexible for developing ANN models than TensorFlow and Keras. |
Sausen, Matheus César; Welter, Greice Aline; Vanzan, Anthony; Fim, Gabriel Rustick; Griebler, Dalvan Metodologia para Captura de Imagens com VANT para a Cultura do Milho Inproceedings 22 Salão de Pesquisa Setrem (SAPS), pp. 5, Sociedade Educacional Três de Maio, Três de Maio, RS, Brazil, 2020. Abstract | Links | BibTeX | Tags: Agriculture, IoT @inproceedings{larcc:DL_Metodologia:SAPS:20, title = {Metodologia para Captura de Imagens com VANT para a Cultura do Milho}, author = {Matheus César Sausen and Greice Aline Welter and Anthony Vanzan and Gabriel Rustick Fim and Dalvan Griebler}, url = {https://larcc.setrem.com.br/wp-content/uploads/2020/11/SAPS_2020_Matheus.pdf}, year = {2020}, date = {2020-10-01}, booktitle = {22 Salão de Pesquisa Setrem (SAPS)}, pages = {5}, publisher = {Sociedade Educacional Três de Maio}, address = {Três de Maio, RS, Brazil}, abstract = {O setor agropecuário ao longo das décadas vem passando por transformações, que por muitas vezes tem relação direta com o avanço tecnológico. Os Veículos Aéreos não Tripulados (VANTs) vêm sendo utilizados como forma auxiliar para pulverização, detecção de pragas e realizar estimativas sobre diferentes culturas. Grande parte desse processo envolve a extração de imagens que são analisadas por programas especializados. Porém, as imagens não podem ser capturadas na lavoura e da cultura de qualquer forma, pois compromete a eficiência nas análises. Por isso, o objetivo é definir uma metodologia de captura de imagens para obter métricas na cultura do milho (Zea mays). Essa metodologia com representação taxonômica foi elaborada baseando-se em artigos publicados em eventos e revistas internacionais e nacionais. 
Assim, foi possível oferecer um guia para que agrônomos saibam como capturar as imagens com VANTS.}, keywords = {Agriculture, IoT}, pubstate = {published}, tppubtype = {inproceedings} } Over the decades, the agricultural sector has undergone transformations that are often directly related to technological progress. Unmanned Aerial Vehicles (UAVs) have been used as an aid for spraying, detecting pests, and producing estimates for different crops. Much of this process involves extracting images that are analyzed by specialized programs. However, images of the field and the crop cannot be captured in just any way, since that compromises the efficiency of the analyses. Therefore, the goal is to define an image capture methodology for obtaining metrics on the corn crop (Zea mays). This methodology, with a taxonomic representation, was developed based on articles published in international and national events and journals. As a result, it was possible to offer a guide so that agronomists know how to capture images with UAVs. |
Vanzan, Anthony; Fim, Gabriel Rustick; Welter, Greice Aline; Sausen, Matheus César; Griebler, Dalvan Algoritmo de Deep Learning para Classificação de Áreas de Lavoura com VANTs Inproceedings 22 Salão de Pesquisa Setrem (SAPS), pp. 5, Sociedade Educacional Três de Maio, Três de Maio, RS, Brazil, 2020. Abstract | Links | BibTeX | Tags: Agriculture, Deep learning, IoT @inproceedings{larcc:DL_Classificacao:SAPS:20, title = {Algoritmo de Deep Learning para Classificação de Áreas de Lavoura com VANTs}, author = {Anthony Vanzan and Gabriel Rustick Fim and Greice Aline Welter and Matheus César Sausen and Dalvan Griebler}, url = {https://larcc.setrem.com.br/wp-content/uploads/2020/11/SAPS_2020_Anthony.pdf}, year = {2020}, date = {2020-10-01}, booktitle = {22 Salão de Pesquisa Setrem (SAPS)}, pages = {5}, publisher = {Sociedade Educacional Três de Maio}, address = {Três de Maio, RS, Brazil}, abstract = {O Brasil é um dos maiores produtores e exportadores de milho do globo. A criação e implementação de novas tecnologias partindo da inteligência artificial podem proporcionar melhorias na produção do grão e, consequentemente, uma melhoria econômica no país. Nota-se também que as tecnologias de inteligência artificial estão conquistando espaço no mercado e auxiliando diversas áreas, tendo um avanço considerável de desempenho e produtividade. Nesse sentido, o presente trabalho visa apresentar a implementação e resultados de um modelo de rede neural utilizando a arquitetura LeNet5 para realizar a classificação de imagens, de cultivo do milho. Estas áreas classificadas servirão futuramente para o cálculo de estimativa de produção.}, keywords = {Agriculture, Deep learning, IoT}, pubstate = {published}, tppubtype = {inproceedings} } Brazil is one of the largest producers and exporters of corn in the world. Creating and deploying new technologies based on artificial intelligence can improve grain production and, consequently, the country's economy. Artificial intelligence technologies are also gaining ground in the market and supporting several fields, with considerable gains in performance and productivity. In this sense, this work presents the implementation and results of a neural network model that uses the LeNet5 architecture to classify images of corn cultivation. These classified areas will later be used to calculate production estimates. |
Fim, Gabriel Rustick; Vanzan, Anthony; Welter, Greice Aline; Sausen, Matheus César; Griebler, Dalvan Desenvolvimento de Um Algoritmo de Pré-processamento Automático de Imagens Retiradas com VANTs Inproceedings 22 Salão de Pesquisa Setrem (SAPS), pp. 5, Sociedade Educacional Três de Maio, Três de Maio, RS, Brazil, 2020. Abstract | Links | BibTeX | Tags: Agriculture, IoT @inproceedings{larcc:DL_preprocessamento:SAPS:20, title = {Desenvolvimento de Um Algoritmo de Pré-processamento Automático de Imagens Retiradas com VANTs}, author = {Gabriel Rustick Fim and Anthony Vanzan and Greice Aline Welter and Matheus César Sausen and Dalvan Griebler}, url = {https://larcc.setrem.com.br/wp-content/uploads/2020/11/SAPS_2020_Gabriel.pdf}, year = {2020}, date = {2020-10-01}, booktitle = {22 Salão de Pesquisa Setrem (SAPS)}, pages = {5}, publisher = {Sociedade Educacional Três de Maio}, address = {Três de Maio, RS, Brazil}, abstract = {Um dos passos mais importantes para a realização do treinamento de uma rede neural é a criação de um dataset que possua imagens que satisfaçam os requisitos de entrada da rede. O presente trabalho tem como objetivo o desenvolvimento de um algoritmo na linguagem de programação Python para realizar o pré-processamento de uma série de imagens retiradas com veículos aéreos não tripulados. Para realizar este pré-processamento duas bibliotecas foram utilizadas, o NumPy e o OpenCV, ambas trabalhando em conjunto para realizar modificações nas imagens, tais como cortá-las em imagens menores e realizar conversões de cores de forma automática. Ao final, foi possível testar no escopo do projeto AGROCOMPUTAÇÃO (uma parceria entre SETREM e TECNICON Sistemas) e concluir que o algoritmo criado foi eficiente. 
Ele permitiu o pré-processamento de uma grande quantidade de imagens de forma automática.}, keywords = {Agriculture, IoT}, pubstate = {published}, tppubtype = {inproceedings} } One of the most important steps in training a neural network is creating a dataset whose images satisfy the network's input requirements. This work aims to develop an algorithm in the Python programming language to preprocess a series of images taken with unmanned aerial vehicles. Two libraries, NumPy and OpenCV, were used together for this preprocessing to modify the images, for example by cropping them into smaller images and converting colors automatically. In the end, it was possible to test the algorithm within the AGROCOMPUTAÇÃO project (a partnership between SETREM and TECNICON Sistemas) and conclude that it was efficient. It allowed a large number of images to be preprocessed automatically. |
Maliszewski, Anderson M; Roloff, Eduardo; Griebler, Dalvan; Gaspary, Luciano P; Navaux, Philippe O A Performance Impact of IEEE 802.3ad in Container-based Clouds for HPC Applications Inproceedings doi International Conference on Computational Science and its Applications (ICCSA), pp. 158-167, Springer, Cagliari, Italy, 2020. Abstract | Links | BibTeX | Tags: Cloud computing @inproceedings{larcc:ieee802.3ad_containers:ICCSA:20, title = {Performance Impact of IEEE 802.3ad in Container-based Clouds for HPC Applications}, author = {Anderson M. Maliszewski and Eduardo Roloff and Dalvan Griebler and Luciano P. Gaspary and Philippe O. A. Navaux}, url = {https://doi.org/10.1007/978-3-030-58817-5_13}, doi = {10.1007/978-3-030-58817-5_13}, year = {2020}, date = {2020-07-01}, booktitle = {International Conference on Computational Science and its Applications (ICCSA)}, pages = {158-167}, publisher = {Springer}, address = {Cagliari, Italy}, series = {ICCSA'20}, abstract = {Historically, large computational clusters have supported hardware requirements for executing High-Performance Computing (HPC) applications. This model has become out of date due to the high costs of maintaining and updating these infrastructures. Currently, computing resources are delivered as a service because of the cloud computing paradigm. In this way, we witnessed consistent efforts to migrate HPC applications to the cloud. However, if on the one hand cloud computing offers an attractive environment for HPC, benefiting from the pay-per-use model and on-demand resource allocation, on the other, there are still significant performance challenges to be addressed, such as the known network bottleneck. In this article, we evaluate the use of a Network Interface Cards (NIC) aggregation approach, using the IEEE 802.3ad standard to improve the performance of representative HPC applications executed in LXD container based-cloud. 
We assessed the aggregation impact using two and four NICs with three distinct transmission hash policies. Our results demonstrated that if the correct hash policy is selected, the NIC aggregation can significantly improve the performance of network-intensive HPC applications by up to 40%.}, keywords = {Cloud computing}, pubstate = {published}, tppubtype = {inproceedings} } |
Maliszewski, Anderson M; Roloff, Eduardo; Carreño, Emmanuell D; Griebler, Dalvan; Gaspary, Luciano P; Navaux, Philippe O A Performance and Cost-Aware in Clouds: A Network Interconnection Assessment Inproceedings doi IEEE Symposium on Computers and Communications (ISCC), pp. 1-6, IEEE, Rennes, France, 2020. Abstract | Links | BibTeX | Tags: Cloud computing @inproceedings{larcc:network_azure_cost_perf:ISCC:20, title = {Performance and Cost-Aware in Clouds: A Network Interconnection Assessment}, author = {Anderson M. Maliszewski and Eduardo Roloff and Emmanuell D. Carreño and Dalvan Griebler and Luciano P. Gaspary and Philippe O. A. Navaux}, url = {https://doi.org/10.1109/ISCC50000.2020.9219554}, doi = {10.1109/ISCC50000.2020.9219554}, year = {2020}, date = {2020-07-01}, booktitle = {IEEE Symposium on Computers and Communications (ISCC)}, pages = {1-6}, publisher = {IEEE}, address = {Rennes, France}, series = {ISCC'20}, abstract = {The availability of computing resources has significantly changed due to the growing adoption of the cloud computing paradigm. Aiming at potential advantages such as cost savings through the pay-per-use method and resource allocation in a scalable/elastic way, we witnessed consistent efforts to execute high-performance computing (HPC) applications in the cloud. Performance in this environment depends heavily upon two main system components: processing power and network interconnection. If, on the one hand, allocating more powerful hardware theoretically boosts performance, on the other hand, it increases the allocation cost. In this paper, we evaluated how the network interconnection impacts on performance and cost efficiency. Our experiments were carried out using NAS Parallel Benchmarks and Alya HPC application on Microsoft Azure public cloud provider, with three different cloud instances/network interconnections. 
The results revealed that through the use of the accelerated networking approach, which allows the instance to have a high-performance interconnect without additional charges, the performance of HPC applications can be significantly improved with a better cost efficiency.}, keywords = {Cloud computing}, pubstate = {published}, tppubtype = {inproceedings} } |
Stein, Charles M; Rockenbach, Dinei A; Griebler, Dalvan; Torquati, Massimo; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units Journal Article doi Concurrency and Computation: Practice and Experience, na (na), pp. e5786, 2020. Abstract | Links | BibTeX | Tags: GPGPU, Stream processing @article{STEIN:CCPE:20, title = {Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units}, author = {Charles M Stein and Dinei A Rockenbach and Dalvan Griebler and Massimo Torquati and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1002/cpe.5786}, doi = {10.1002/cpe.5786}, year = {2020}, date = {2020-05-01}, journal = {Concurrency and Computation: Practice and Experience}, volume = {na}, number = {na}, pages = {e5786}, publisher = {Wiley Online Library}, abstract = {Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on the GPU leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements that need to find the best size of the microbatches and to adapt it dynamically based on the workload conditions as well as according to the characteristics of the underlying device and network. 
In this work, we aim at implementing latency‐aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel‐Ziv‐Storer‐Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.}, keywords = {GPGPU, Stream processing}, pubstate = {published}, tppubtype = {article} } |
Leonarczyk, Ricardo; Griebler, Dalvan Implementação MPIC++ dos kernels NPB EP, IS e CG Inproceedings 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 101-104, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. Tags: Benchmark @inproceedings{larcc:NPB_MPI:ERAD:20, title = {Implementação MPIC++ dos kernels NPB EP, IS e CG}, author = {Ricardo Leonarczyk and Dalvan Griebler}, url = {https://doi.org/10.5753/eradrs.2020.10766}, doi = {10.5753/eradrs.2020.10766}, year = {2020}, date = {2020-04-01}, booktitle = {20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)}, pages = {101-104}, publisher = {Sociedade Brasileira de Computação}, address = {Santa Maria, RS, Brazil}, abstract = {This work seeks to contribute to previous efforts to make the NAS Parallel Benchmarks available in C++, focusing on the distributed-memory aspect with MPI. Implementations of CG, EP, and IS, ported from the original MPI version of the NPB, are presented. The experiments show that the proposed version of the benchmarks achieved performance close to that of the original.}, keywords = {Benchmark}, pubstate = {published}, tppubtype = {inproceedings} }
Maliszewski, Anderson M; Roloff, Eduardo; Griebler, Dalvan; Navaux, Philippe O A Avaliando o Impacto da Rede no Desempenho e Custo de Execução de Aplicações HPC Inproceedings 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 159-160, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. Tags: Benchmark, Cloud computing @inproceedings{larcc:network_impact:ERAD:20, title = {Avaliando o Impacto da Rede no Desempenho e Custo de Execução de Aplicações HPC}, author = {Anderson M Maliszewski and Eduardo Roloff and Dalvan Griebler and Philippe O A Navaux}, url = {https://doi.org/10.5753/eradrs.2020.10786}, doi = {10.5753/eradrs.2020.10786}, year = {2020}, date = {2020-04-01}, booktitle = {20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)}, pages = {159-160}, publisher = {Sociedade Brasileira de Computação}, address = {Santa Maria, RS, Brazil}, abstract = {The performance of HPC applications depends on two main components: processing power and network interconnect. This paper evaluates the impact of the network interconnect on parallel programs using a homogeneous cluster, with respect to performance and estimated execution cost.}, keywords = {Benchmark, Cloud computing}, pubstate = {published}, tppubtype = {inproceedings} }
de Araujo, Gabriell; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Efficient NAS Parallel Benchmark Kernels with CUDA Inproceedings 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 9-16, IEEE, Västerås, Sweden, 2020. Tags: Benchmark, GPGPU @inproceedings{ARAUJO:PDP:20, title = {Efficient NAS Parallel Benchmark Kernels with CUDA}, author = {Gabriell {de Araujo} and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP50117.2020.00009}, doi = {10.1109/PDP50117.2020.00009}, year = {2020}, date = {2020-03-01}, booktitle = {28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {9-16}, publisher = {IEEE}, address = {Västerås, Sweden}, series = {PDP'20}, abstract = {NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. There are many research efforts trying to provide different parallel versions apart from the original OpenMP and MPI. Concerning GPU accelerators, there are only the OpenCL and OpenACC available as consolidated versions. Our goal is to provide an efficient parallel implementation of the five NPB kernels with CUDA. Our contribution covers different aspects. First, best parallel programming practices were followed to implement NPB kernels using CUDA. Second, the support of larger workloads (class B and C) allow to stress and investigate the memory of robust GPUs. Third, we show that it is possible to make NPB efficient and suitable for GPUs although the benchmarks were designed for CPUs in the past. We succeed in achieving double performance with respect to the state-of-the-art in some cases as well as implementing efficient memory usage. 
Fourth, we discuss new experiments comparing performance and memory usage against OpenACC and OpenCL state-of-the-art versions using a relative new GPU architecture. The experimental results also revealed that our version is the best one for all the NPB kernels compared to OpenACC and OpenCL. The greatest differences were observed for the FT and EP kernels.}, keywords = {Benchmark, GPGPU}, pubstate = {published}, tppubtype = {inproceedings} }
Vogel, Adriano; Rista, Cassiano; Justo, Gabriel; Ewald, Endrius; Griebler, Dalvan; Mencagli, Gabriele; Fernandes, Luiz Gustavo Parallel Stream Processing with MPI for Video Analytics and Data Visualization Inproceedings High Performance Computing Systems, pp. 102-116, Springer, Cham, 2020. Tags: Stream processing @inproceedings{VOGEL:CCIS:20, title = {Parallel Stream Processing with MPI for Video Analytics and Data Visualization}, author = {Adriano Vogel and Cassiano Rista and Gabriel Justo and Endrius Ewald and Dalvan Griebler and Gabriele Mencagli and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/978-3-030-41050-6_7}, doi = {10.1007/978-3-030-41050-6_7}, year = {2020}, date = {2020-02-01}, booktitle = {High Performance Computing Systems}, volume = {1171}, pages = {102-116}, publisher = {Springer}, address = {Cham}, series = {Communications in Computer and Information Science (CCIS)}, abstract = {The amount of data generated is increasing exponentially. However, processing data and producing fast results is a technological challenge. Parallel stream processing can be implemented for handling high frequency and big data flows. The MPI parallel programming model offers low-level and flexible mechanisms for dealing with distributed architectures such as clusters. This paper aims to use it to accelerate video analytics and data visualization applications so that insight can be obtained as soon as the data arrives. Experiments were conducted with a Domain-Specific Language for Geospatial Data Visualization and a Person Recognizer video application. We applied the same stream parallelism strategy and two task distribution strategies. The dynamic task distribution achieved better performance than the static distribution in the HPC cluster. The data visualization achieved lower throughput with respect to the video analytics due to the I/O intensive operations. 
Also, the MPI programming model shows promising performance outcomes for stream processing applications.}, keywords = {Stream processing}, pubstate = {published}, tppubtype = {inproceedings} }