Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units

LARCC researchers Charles Stein, Dinei Rockenbach, and Dalvan Griebler collaborated with Gabriele Mencagli, Massimo Torquati, and Marco Danelutto of the University of Pisa (Italy) and Luiz G. Fernandes of PUCRS (Brazil) on a paper for the journal Concurrency and Computation: Practice and Experience (CCPE). The paper was prepared and submitted in 2019, was recently accepted, and was published online on May 4, 2020. It is available at https://doi.org/10.1002/cpe.5786.

The paper studies adaptive techniques that allow GPUs to be used for data compression while respecting latency limits set by the programmer, known as Service Level Objectives (SLOs). It presents four algorithms that adapt the size of the micro-batch (the block of data) sent to the GPU for processing. To evaluate these algorithms, the authors compressed four files: the content of the English Wikipedia (14 GB), the source code of the Linux operating system (816 MB), the Silesia corpus (202 MB), and a custom file generated specifically for these tests (1.6 GB).

Combining the four algorithms and four workloads with different adaptation factors, latency targets, and acceptance margins (thresholds), the authors ran a total of 768 experiments on the LARCC server infrastructure. The results show that latency targets can be met through automatic adaptation that responds to fluctuations in the workload. However, the characteristics of each workload (the file being compressed) directly affect the performance of the algorithms, so the best choice of algorithm depends on the characteristics of the file being compressed. The study continues previously published research, namely a paper at IPDPSW on the use of GPUs in streaming applications and a paper at PDP on the use of GPUs in the LZSS compression algorithm.
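To make the roles of the adaptation factor, latency target, and threshold more concrete, here is a minimal Python sketch of a generic threshold-based micro-batch controller. It is not one of the four algorithms from the paper: the class name `AdaptiveBatcher`, the multiplicative update rule, and the size bounds are hypothetical choices made only for illustration. After each micro-batch, the measured latency is compared against an acceptance band around the target; above the band the batch shrinks, below it the batch grows, and inside it the size is kept.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveBatcher:
    """Hypothetical threshold-based micro-batch size controller (illustration only,
    not an algorithm from the paper)."""

    batch_size: int          # current micro-batch size, in bytes
    latency_target: float    # latency target (SLO), in milliseconds
    threshold: float         # acceptance margin as a fraction of the target (0.1 = +/-10%)
    factor: float            # multiplicative adaptation factor, assumed > 1
    min_size: int = 1024                 # hypothetical lower bound on the batch size
    max_size: int = 64 * 1024 * 1024     # hypothetical upper bound on the batch size

    def update(self, measured_latency: float) -> int:
        """Adjust the batch size from the latency observed for the last micro-batch."""
        upper = self.latency_target * (1 + self.threshold)
        lower = self.latency_target * (1 - self.threshold)
        if measured_latency > upper:
            # Too slow: shrink the batch so it is buffered and offloaded sooner.
            self.batch_size = max(self.min_size, int(self.batch_size / self.factor))
        elif measured_latency < lower:
            # Latency headroom: grow the batch to give the GPU more parallel work.
            self.batch_size = min(self.max_size, int(self.batch_size * self.factor))
        # Inside the acceptance band: keep the current size.
        return self.batch_size


if __name__ == "__main__":
    # 1 MiB batches, 100 ms target, +/-10% band, doubling/halving on each adjustment.
    batcher = AdaptiveBatcher(batch_size=1 << 20, latency_target=100.0,
                              threshold=0.1, factor=2.0)
    for latency_ms in (150.0, 50.0, 105.0):
        print(batcher.update(latency_ms))  # prints 524288, then 1048576, then 1048576
```

Shrinking the batch cuts the time spent buffering input before the GPU can start, at the cost of less data parallelism per offload; growing it does the opposite. In this sketch, a larger factor reacts faster to latency changes but also overshoots more easily, and a wider threshold makes the controller change the size less often.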

The paper’s title and abstract can be seen below:

Paper title: Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units

Abstract: Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into microbatches, whose computation is offloaded to the GPU, leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size, the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements, which make it necessary to find the best microbatch size and to adapt it dynamically based on the workload conditions as well as on the characteristics of the underlying device and network. In this work, we aim to implement latency‐aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel‐Ziv‐Storer‐Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.