If you prefer to download a single file with all LARCC references, you can find it at this link. You can also follow new publications via RSS.
Additionally, you can find the publications on the LARCC profile on Google Scholar.
2023
Fim, Gabriel Rustick; Griebler, Dalvan. Implementação e Avaliação do Paralelismo de Flink nas Aplicações de Processamento de Log e Análise de Cliques. In: Anais da XXIII Escola Regional de Alto Desempenho da Região Sul, pp. 69-72, Sociedade Brasileira de Computação, Porto Alegre, Brazil, 2023. DOI: 10.5753/eradrs.2023.229290. Tags: Parallel programming, Stream processing.
Abstract: This work implemented and evaluated the parallelism of the Log Processing and Click Analysis applications on Apache Flink, comparing their performance with Apache Storm in a distributed computing environment. The results show that the Flink execution consumes relatively fewer resources than the Storm execution, but it exhibits a high standard deviation, exposing load imbalance in executions where some component of the application is replicated.
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo. A parallel programming assessment for stream processing applications on multi-core systems. Computer Standards & Interfaces, 84, pp. 103691, Elsevier, 2023. DOI: 10.1016/j.csi.2022.103691. Tags: Parallel programming, Stream processing.
Abstract: Multi-core systems are present in virtually every computing device nowadays, and stream processing applications are becoming recurrent workloads, demanding parallelism to achieve the desired quality of service. As soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware to accelerate time-to-market. In recent years, research efforts in academia and industry have provided a set of parallel APIs, increasing the productivity of software developers. However, few studies seek to prove the usability of these interfaces. In this work, we present a parallel programming assessment of the usability of parallel APIs for expressing parallelism in the stream processing application domain on multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs, reporting several quantitative and qualitative indicators involving the developers. Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. This study revealed important insights, such as recurrent compile-time and programming-logic errors made by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants' opinions about their experience in this study to understand the results in depth.
Vogel, Adriano; Danelutto, Marco; Griebler, Dalvan; Fernandes, Luiz Gustavo. Revisiting self-adaptation for efficient decision-making at run-time in parallel executions. In: 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'23), pp. 43-50, IEEE, Naples, Italy, 2023. DOI: 10.1109/PDP59025.2023.00015. Tags: Parallel programming, Self-adaptation.
Abstract: Self-adaptation is a potential alternative for providing a higher level of autonomic abstractions and run-time responsiveness in parallel executions. However, the recurrent problem is that self-adaptation is still limited in flexibility and efficiency. For instance, there is a lack of mechanisms for applying adaptation actions and of efficient decision-making strategies to decide which configurations should be enforced at run-time. In this work, we are interested in providing and evaluating the potential abstractions achievable with self-adaptation transparently managing parallel executions. Therefore, we provide a new mechanism to support self-adaptation in applications with multiple parallel stages executed on multi-cores. Moreover, we reproduce, reimplement, and evaluate an existing decision-making strategy in our scenario. The results show that the proposed mechanism for self-adaptation can provide new parallelism abstractions and autonomous responsiveness at run-time. On the other hand, more accurate decision-making strategies are needed to enable efficient executions of applications in resource-constrained scenarios such as multi-cores.
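The monitor-analyze-act cycle that such self-adaptive mechanisms implement can be pictured with a minimal C++ sketch. Everything below is illustrative: the Stage type, set_replicas(), and the throughput metric are hypothetical stand-ins, not the mechanism or API proposed in the paper.

    #include <atomic>
    #include <chrono>
    #include <thread>

    // Hypothetical parallel stage whose degree of parallelism can be changed
    // at run-time; measured_throughput() is a stub standing in for real metrics.
    struct Stage {
        std::atomic<int> replicas{1};
        double measured_throughput() const { return 1000.0 * replicas; }
        void set_replicas(int n) { replicas = n; }
    };

    // MAPE-style loop: monitor the metric, compare it against a target,
    // and scale the number of stage replicas up or down accordingly.
    void self_adaptation_loop(Stage& stage, double target, std::atomic<bool>& running) {
        while (running) {
            double current = stage.measured_throughput();          // monitor
            if (current < 0.9 * target)                            // analyze/plan
                stage.set_replicas(stage.replicas + 1);            // act: scale up
            else if (current > 1.2 * target && stage.replicas > 1)
                stage.set_replicas(stage.replicas - 1);            // act: scale down
            std::this_thread::sleep_for(std::chrono::seconds(1));  // sampling period
        }
    }

    int main() {
        Stage stage;
        std::atomic<bool> running{true};
        std::thread adapter(self_adaptation_loop, std::ref(stage), 5000.0, std::ref(running));
        std::this_thread::sleep_for(std::chrono::seconds(5));
        running = false;
        adapter.join();
    }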
Araujo, Gabriell; Griebler, Dalvan; Rockenbach, Dinei A.; Danelutto, Marco; Fernandes, Luiz Gustavo. NAS Parallel Benchmarks with CUDA and Beyond. Software: Practice and Experience, 53(1), pp. 53-80, Wiley, 2023. DOI: 10.1002/spe.3056. Tags: Benchmark, GPGPU, Parallel programming.
Abstract: NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide easy-to-use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies for helping to find a better threads-per-block configuration. The results have revealed relevant performance improvements solely by changing the number of threads per block, ranging from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, the code refactoring required, and the parallelism implementations. The performance results have shown up to 267% improvement over the best benchmark versions available. We also observe the best and worst design choices concerning code size and the performance trade-off. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior.
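To make the threads-per-block knob concrete, here is a minimal sketch of the standard CUDA launch-configuration arithmetic, written in plain C++ (the actual kernels are CUDA; this shows only the ceiling-division logic, and the problem size and candidate values are illustrative, not taken from NPB):

    #include <cstdio>

    // Given a problem size n and a candidate threads-per-block value, compute
    // the number of blocks needed so every element is covered (ceiling division).
    unsigned blocks_for(unsigned n, unsigned threads_per_block) {
        return (n + threads_per_block - 1) / threads_per_block;
    }

    int main() {
        unsigned n = 1u << 20; // 1M elements, illustrative size
        for (unsigned tpb : {32u, 64u, 128u, 256u, 512u, 1024u})
            std::printf("threads/block=%4u -> blocks=%u\n", tpb, blocks_for(n, tpb));
    }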
2022
Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo. Combining stream with data parallelism abstractions for multi-cores. Journal of Computer Languages, 73, pp. 101160, Elsevier, 2022. DOI: 10.1016/j.cola.2022.101160. Tags: Parallel programming, Stream processing.
Abstract: Stream processing applications have seen an increasing demand with the raised availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. In this work, we introduce improvements to stream processing applications by exploiting fine-grained data parallelism (via Map and MapReduce) inside coarse-grained stream parallelism stages. The improvements include techniques for identifying data parallelism in sequential codes, a new language, semantic analysis, and a set of definition and transformation rules to perform source-to-source parallel code generation. Moreover, we investigate the feasibility of employing higher-level programming abstractions to support the proposed optimizations. To that end, we elect the SPar programming model as a use case and extend it by adding two new attributes to its language and implementing our optimizations as a new algorithm in the SPar compiler. We conducted a set of experiments on representative stream processing and data-parallel applications. The results showed that our new compiler algorithm is efficient and that performance improved by up to 108.4x in data-parallel applications. Furthermore, experiments evaluating stream processing applications towards the composition of stream and data parallelism revealed new insights. The results showed that such composition may improve latencies by up to an order of magnitude. Also, it enables programmers to exploit different degrees of stream and data parallelism to strike a balance between throughput and latency according to their needs.
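For context, SPar expresses stream parallelism through standard C++ attributes. Below is a minimal sketch in the syntax from the published SPar papers (ToStream, Stage, Input, Output, Replicate); the two new data-parallelism attributes this paper introduces are not reproduced here, and compute() is a placeholder. Without the SPar compiler, a regular C++ compiler simply ignores these attributes.

    #include <vector>

    int compute(int x) { return x * 2; }   // placeholder stage logic

    int main() {
        std::vector<int> results;
        int item = 0;
        [[spar::ToStream]] while (item < 100) {   // stream region: generator code
            item++;
            int value = item;
            [[spar::Stage, spar::Input(value), spar::Output(value), spar::Replicate(4)]]
            { value = compute(value); }           // replicated middle stage
            [[spar::Stage, spar::Input(value)]]
            { results.push_back(value); }         // sink stage
        }
        return results.size() == 100 ? 0 : 1;
    }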
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo. Opinião de Brasileiros Sobre a Produtividade no Desenvolvimento de Aplicações Paralelas. In: Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD), pp. 276-287, SBC, Florianópolis, Brazil, 2022. DOI: 10.5753/wscad.2022.226392. Tags: Parallel programming.
Abstract: With the popularization of parallel architectures, several programming interfaces have emerged to ease the exploitation of such architectures and increase developer productivity. However, developing parallel applications is still a complex task for developers with little experience. In this work, we conducted a survey to discover the opinion of parallel application developers about the factors that hinder productivity. Our results showed that the developers' experience is one of the main reasons for low productivity. Furthermore, the results indicated ways to work around this problem, such as improving and encouraging the teaching of parallel programming in undergraduate courses.
Rockenbach, Dinei A.; Löff, Júnior; Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz G. High-Level Stream and Data Parallelism in C++ for GPUs. In: XXVI Brazilian Symposium on Programming Languages (SBLP'22), pp. 41-49, ACM, Uberlândia, Brazil, 2022. DOI: 10.1145/3561320.3561327. Tags: GPGPU, Parallel programming, Stream processing.
Abstract: GPUs are massively parallel processors that allow solving problems that are not viable on traditional processors such as CPUs. However, implementing applications for GPUs is challenging for programmers, as it requires parallel programming to efficiently exploit the GPU resources. In this sense, parallel programming abstractions, notably domain-specific languages, are fundamental for improving programmability. SPar is a high-level Domain-Specific Language (DSL) that allows expressing stream and data parallelism in serial code through annotations using C++ attributes. This work elaborates on a methodology and tool for GPU code generation by introducing new attributes to the SPar language and transformation rules to the SPar compiler. These new contributions, besides the gains in simplicity and code reduction compared to CUDA and OpenCL, enabled SPar to achieve higher throughput when exploiting combined CPU and GPU parallelism and when using batching.
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Kessler, Christoph; Ernstsson, August; Fernandes, Luiz Gustavo. Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing. In: 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA'22), pp. 229-232, IEEE, Gran Canaria, Spain, 2022. DOI: 10.1109/SEAA56994.2022.00043. Tags: Parallel programming, Stream processing.
Abstract: Over the years, several Parallel Programming Models (PPMs) have supported the abstraction of programming complexity for parallel computer systems. However, few studies aim to evaluate the productivity reached by such abstractions, since this is a complex task that involves human beings. In software engineering, there have been several studies developing predictive methods to estimate the effort required to program applications. In order to evaluate the reliability of such metrics, it is necessary to assess their accuracy in different programming domains. In this work, we used data from an experiment conducted with beginners in parallel programming to determine the effort required to implement stream parallelism using FastFlow, SPar, and TBB. Our results show that some traditional software effort estimation models, such as COCOMO II, fall short, while Putnam's model could be an alternative for evaluating high-level PPMs. To overcome the limitations of existing models, we plan to create a parallelism-aware model to evaluate applications in this domain in future work.
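For reference, the textbook forms of the two estimation models compared above are given below (standard formulations from the software engineering literature with the usual calibrated default constants, not reproduced from the paper):

    % COCOMO II effort estimate, in person-months, for Size in KSLOC,
    % with effort multipliers EM_i and scale factors SF_j
    PM = A \cdot \mathrm{Size}^{E} \cdot \prod_{i} EM_i, \qquad E = B + 0.01 \sum_{j} SF_j
    % calibrated defaults: A \approx 2.94, B \approx 0.91

    % Putnam's software equation, relating size to total effort K (person-years)
    % and development time t_d (years) through a productivity parameter C_k
    \mathrm{Size} = C_k \cdot K^{1/3} \cdot t_d^{4/3}

COCOMO II scales effort super-linearly with size through the scale factors and effort multipliers, while Putnam's model trades effort against schedule, which is one reason the two models can rank high-level parallel programming abstractions differently.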
Fim, Gabriel; Welter, Greice; Löff, Júnior; Griebler, Dalvan. Compressão de Dados em Clusters HPC com Flink, MPI e SPar. In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 29-32, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. DOI: 10.5753/eradrs.2022.19153. Tags: Parallel programming, Stream processing.
Abstract: This work evaluates the performance of the Bzip2 data compression algorithm with the stream processing tools Apache Flink, MPI, and SPar on a Beowulf cluster. The results show that the best-performing versions relative to the sequential time are MPI and SPar, with speed-ups of 7.6 and 7.2, respectively.
Hoffmann, Renato Barreto; Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo. OpenMP as runtime for providing high-level stream parallelism on multi-cores. The Journal of Supercomputing, 78(1), pp. 7655-7676, Springer, 2022. DOI: 10.1007/s11227-021-04182-9. Tags: Parallel programming, Stream processing.
Abstract: OpenMP is an industry and academic standard for parallel programming. However, using it to develop parallel stream processing applications is complex and challenging, since OpenMP lacks key programming mechanisms and abstractions for this particular domain. To tackle this problem, we used a high-level parallel programming framework (named SPar) for automatically generating parallel OpenMP code. We achieved this by leveraging SPar's language and its domain-specific code annotations to simplify the complexity and verbosity that OpenMP adds in this application domain. Consequently, we implemented a new compiler algorithm in SPar for automatically generating parallel code targeting the OpenMP runtime using source-to-source code transformations. Experiments on four different stream processing applications demonstrated that the execution time of SPar was improved by up to 25.42% when using the OpenMP runtime. Additionally, our abstraction over OpenMP introduced at most 1.72% execution time overhead compared to handwritten parallel codes. Furthermore, SPar significantly reduces the total source lines of code required to express parallelism with respect to plain OpenMP parallel codes.
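To give a flavor of the target runtime, below is a minimal sketch of stream items being processed as plain OpenMP tasks. It is illustrative only: stage1() and stage2() are placeholders, and this is not the code the SPar compiler actually generates, which also handles item ordering and stage separation.

    #include <cstdio>

    int stage1(int x) { return x + 1; }  // placeholder pipeline stage
    int stage2(int x) { return x * 2; }  // placeholder pipeline stage

    int main() {
        // One thread generates stream items; each item becomes an OpenMP task
        // executed by the thread team (compile with -fopenmp).
        #pragma omp parallel
        #pragma omp single
        for (int item = 0; item < 8; item++) {
            #pragma omp task firstprivate(item)
            {
                int result = stage2(stage1(item));
                std::printf("item %d -> %d\n", item, result);
            }
        }
        return 0;
    }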
Löff, Júnior; Hoffmann, Renato Barreto; Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz Gustavo. DSParLib: A C++ Template Library for Distributed Stream Parallelism. International Journal of Parallel Programming, 50(5), pp. 454-485, Springer, 2022. DOI: 10.1007/s10766-022-00737-2. Tags: Distributed computing, Parallel programming.
Abstract: Stream processing applications deal with millions of data items continuously generated over time. Often, they must be processed in real time with scalable performance, which requires the use of distributed parallel computing resources. In C/C++, the current state-of-the-art for distributed architectures and High-Performance Computing is the Message Passing Interface (MPI). However, exploiting stream parallelism using MPI is complex and error-prone because it exposes many low-level details to the programmer. In this work, we introduce a new parallel programming abstraction for implementing distributed stream parallelism named DSParLib. Our abstraction of MPI simplifies parallel programming by providing pattern-based and building-block-oriented development to interconnect, model, and parallelize the data streams found in modern applications. Experiments conducted with five different stream processing applications and the representative PARSEC Ferret benchmark revealed that DSParLib is efficient and flexible. Also, DSParLib achieved similar or better performance, required less coding, and provided simpler abstractions to express parallelism with respect to handwritten MPI programs.
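To illustrate the kind of low-level MPI code DSParLib abstracts away, here is a minimal two-process emitter/worker sketch (illustrative only; real stream applications also need serialization, demand-based scheduling, and richer end-of-stream handling, which a library layer would provide):

    #include <mpi.h>
    #include <cstdio>

    // Run with: mpirun -np 2 ./a.out
    // Rank 0 emits stream items; rank 1 processes them until a stop marker.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        const int STOP = -1;
        if (rank == 0) {                              // emitter stage
            for (int item = 0; item < 10; item++)
                MPI_Send(&item, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Send(&STOP, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {                       // worker stage
            int item;
            while (true) {
                MPI_Recv(&item, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (item == STOP) break;
                std::printf("processed %d\n", item * item);
            }
        }
        MPI_Finalize();
    }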
2021
Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; de Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo. The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures. Future Generation Computer Systems, 125, pp. 743-757, Elsevier, 2021. DOI: 10.1016/j.future.2021.07.021. Tags: Benchmark, Parallel programming.
Abstract: The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communication, complex data dependencies, different memory access patterns, and hardware component/sub-system overload. Parallel programming APIs, libraries, and frameworks written in C++, as well as new optimizations and parallel processing techniques, can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications developed using the OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of the composition of a few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power, and AMD) with different C++ compilers (gcc, icc, and clang), discussing the performance differences to give researchers useful insights for choosing the best parallel programming framework for a given type of problem.
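As an illustration of the MapReduce construct mentioned above, here is a minimal Intel TBB sketch computing a sum of squares (illustrative code, not an excerpt from NPB-CPP):

    #include <tbb/parallel_reduce.h>
    #include <tbb/blocked_range.h>
    #include <functional>
    #include <vector>
    #include <cstdio>

    int main() {
        std::vector<double> v(1000, 1.5);
        // Map: square each element; Reduce: sum the partial results.
        double sum = tbb::parallel_reduce(
            tbb::blocked_range<size_t>(0, v.size()), 0.0,
            [&](const tbb::blocked_range<size_t>& r, double acc) {
                for (size_t i = r.begin(); i != r.end(); ++i)
                    acc += v[i] * v[i];        // map + local reduction
                return acc;
            },
            std::plus<double>());              // combine partial sums
        std::printf("sum = %f\n", sum);
    }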
Pieper, Ricardo; Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo. High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores. Journal of Computer Languages, Elsevier, 2021. Tags: Parallel programming, Stream processing.
Abstract: This work aims at contributing a structured parallel programming abstraction for Rust in order to provide ready-to-use parallel patterns that abstract low-level and architecture-dependent details from application programmers. We focus on stream processing applications running on shared-memory multi-core architectures (e.g., video processing, compression, and others). Therefore, we provide a new high-level and efficient parallel programming abstraction for expressing stream parallelism, named Rust-SSP. We also created a new stream benchmark suite for Rust that represents real-world scenarios and has different application characteristics and workloads. Our benchmark suite is an initiative to assess existing parallelism abstractions for this domain, as parallel implementations using these abstractions are provided. The results revealed that Rust-SSP achieved up to 41.1% better performance than other solutions. In terms of programmability, the results revealed that Rust-SSP requires the smallest number of extra lines of code to enable stream parallelism.
Maliszewski, Anderson M.; Vogel, Adriano; Griebler, Dalvan; Schepke, Claudio; Navaux, Philippe. Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos. In: Charão, Andrea; Serpa, Matheus (Eds.): Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul, chapter 5, pp. 104-128, Sociedade Brasileira de Computação (SBC), Porto Alegre, 2021. DOI: 10.5753/sbc.6150.4. Tags: Cloud computing, Parallel programming.
Abstract: High-performance computing typically relies on computer clusters to run parallel applications. Alternatively, cloud computing offers distributed computational resources for processing at a level of abstraction beyond the traditional one, dynamically and on demand. This chapter introduces basic concepts, presents the essentials for deploying a private cloud, and demonstrates the benefits for developing and testing parallel programs in the cloud.
Vogel, Adriano; Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. Online and Transparent Self-adaptation of Stream Parallel Patterns. Computing, 105(5), pp. 1039-1057, Springer, 2021. DOI: 10.1007/s00607-021-00998-8. Tags: Parallel programming, Self-adaptation, Stream processing.
Abstract: Several real-world parallel applications are becoming more dynamic and long-running, demanding online (run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real time and where parallel executions are necessary. However, it is challenging for humans to continuously monitor and manually optimize complex and long-running parallel executions. Moreover, although high-level and structured parallel programming aims to facilitate parallelism, several issues still need to be addressed to improve the existing abstractions. In this paper, we extend self-adaptiveness to support autonomous and online changes of parallel pattern compositions. Online self-adaptation is achieved with an online profiler that characterizes the applications, combined with a new self-adaptive strategy and a model for smooth transitions on reconfigurations. The solution provides a new abstraction layer that enables application programmers to define non-functional requirements instead of hand-tuning complex configurations. Hence, we contribute additional abstractions and flexible self-adaptation for responsiveness at run-time. The proposed solution is evaluated with applications having different processing characteristics, workloads, and configurations. The results show that it is possible to provide additional abstractions, flexibility, and responsiveness while achieving performance comparable to the best static-configuration executions.
Vanzan, Anthony; Fim, Gabriel; Welter, Greice; Griebler, Dalvan. Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo. In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 49-52, Sociedade Brasileira de Computação, Joinville, Brazil, 2021. DOI: 10.5753/eradrs.2021.14772. Tags: Agriculture, Distributed computing, Parallel programming, Stream processing.
Abstract: This work aimed to accelerate the execution of a crop classification algorithm on aerial images. To this end, different parallel versions were implemented using the MPI library in Python. The evaluation was conducted in two computing environments. We conclude that it is possible to reduce the execution time as more parallel resources are used, and that the dynamic work distribution strategy is more efficient.
Leonarczyk, Ricardo; Griebler, Dalvan. Implementação MPIC++ e HPX dos Kernels NPB. In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 81-84, Sociedade Brasileira de Computação, Joinville, Brazil, 2021. DOI: 10.5753/eradrs.2021.14780. Tags: Benchmark, Parallel programming.
Abstract: This paper presents the parallel implementation of the five kernels from the NAS Parallel Benchmarks (NPB) with MPI C++ and HPX for execution on cluster architectures. The results showed that the HPX programming model can be more efficient than MPI C++ in algorithms such as the Fast Fourier Transform, sorting, and Conjugate Gradient.
2019
Maliszewski, Anderson; Roloff, Eduardo; Griebler, Dalvan; Navaux, Philippe. O Impacto da Interconexão de Rede no Desempenho de Programas Paralelos. In: Anais do XX Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD'19), pp. 73-84, Sociedade Brasileira de Computação, Campo Grande, Brazil, 2019. DOI: 10.5753/wscad.2019.8658. Tags: Cloud computing, Parallel programming.
Abstract: The performance of parallel applications depends on two main components of the environment: processing power and the network interconnection. In this work, we evaluated the impact of a high-performance interconnection on parallel programs in a homogeneous cluster of servers interconnected by 1 Gbps Gigabit Ethernet and 56 Gbps InfiniBand FDR. We characterized the NAS Parallel Benchmarks with respect to computation, communication, and execution cost on Microsoft Azure instances. The results showed that, in highly network-dependent applications, performance can be significantly improved by using InfiniBand at a better execution cost, even with the higher instance price.
2018
Griebler, Dalvan; De Sensi, Daniele; Vogel, Adriano; Danelutto, Marco; Fernandes, Luiz Gustavo. Service Level Objectives via C++11 Attributes. In: Euro-Par 2018: Parallel Processing Workshops, Lecture Notes in Computer Science, pp. 745-756, Springer, Turin, Italy, 2018. DOI: 10.1007/978-3-030-10549-5_58. Tags: Parallel programming.
Abstract: In recent years, increasing attention has been given to the possibility of guaranteeing Service Level Objectives (SLOs) to users about their applications, regarding either performance or power consumption. SLOs can be enforced for parallel applications, since they provide many control knobs (e.g., the number of threads to use, the clock frequency of the cores, etc.) to tune performance and power consumption. Unlike most existing approaches, we target sequential stream processing applications, proposing a solution based on C++ annotations. The user specifies which parts of the code to parallelize and what type of requirements should be enforced on that part of the code. Our solution first automatically parallelizes the annotated code and then applies self-adaptation approaches at run-time to enforce the user-expressed objectives. We ran experiments on different real-world applications, showing its simplicity and effectiveness.
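To give a feel for the approach, below is a sketch of annotating a code region with an SLO via C++11 attributes. The attribute names (slo::throughput, slo::max_power_watts) are invented for illustration and are not the paper's actual syntax; a standard compiler warns about and ignores unknown attributes, while a tool such as the one described would parse them and drive the runtime accordingly.

    #include <cstdio>

    int main() {
        int sum = 0;
        // Hypothetical SLO annotations on a stream-like loop: the names below
        // are illustrative only and carry no meaning for a standard compiler.
        [[slo::throughput(100)]]        // hypothetical target: items per second
        [[slo::max_power_watts(50)]]    // hypothetical power cap for this region
        for (int item = 0; item < 1000; item++) {
            sum += item;                // stand-in for per-item stream computation
        }
        std::printf("%d\n", sum);
    }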