Library Technology - Reviews, Tips, Giveaways, Freeware


Performance of CUDA JPEG Encoding Exceeds Two Times the Bandwidth of PCI-Express 3.0 x16

Posted In Utilities - By Techtiplib on Tuesday, May 26th, 2015

Fastvideo has released a very fast CUDA JPEG encoder. On the NVIDIA GeForce GTX 980, the encoder can exceed 20 GB/s for images already loaded into GPU memory, which is about twice the bandwidth of PCI Express 3.0 x16. The CUDA JPEG encoder from Fastvideo was already the fastest on the market, and the new version is three times faster than the previous one.

In 2011 Fastvideo pioneered the first fully parallel JPEG codec for NVIDIA GPUs. Since then, both NVIDIA hardware and Fastvideo software have advanced considerably, and as a result JPEG compression on the GPU has reached high performance. There is now an answer to the question “Which is faster: sending an uncompressed 4K image from CPU to GPU over PCI Express 3.0 x16, or compressing it to JPEG on the GPU?” JPEG encoding can now be about two times faster. This is a new reality and a new level for modern hardware and software.

Fast JPEG compression is a must in media, industrial, scientific, medical and other applications. A fairly standard task today is long-term realtime video recording from cameras with very high resolution or high frame rate. JPEG is the most common format for image storage. Bulk JPEG handling is also important for the web: it is currently possible to resize more than a million JPEG images per hour on a single GPU.


Comparing the time needed to send image data from PC RAM to GPU memory with the JPEG encoding time on the GPU shows clearly that JPEG compression can be much faster than data transfer over the PCI Express 3.0 x16 bus. Sending a 24-bit 4K image with a resolution of 3840 x 2160 from CPU to GPU over PCI Express 3.0 x16 takes about 2.17 ms. Encoding the same image to JPEG on the GPU, with a compression ratio of ~10:1 (JPEG quality 90%) and 4:2:0 subsampling, takes about 1.13 ms on an NVIDIA GeForce GTX 980. This result comes from powerful NVIDIA hardware and from Fastvideo's highly optimized, massively parallel implementation of the JPEG algorithm.
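The figures above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the Fastvideo software: it takes only the image size and the two timings quoted in this article and derives the implied throughput. The variable and function names are illustrative, and 1 GB is assumed to mean 10^9 bytes.

```python
# Figures quoted in the article (GeForce GTX 980, PCI Express 3.0 x16).
WIDTH, HEIGHT = 3840, 2160   # 4K frame
BYTES_PER_PIXEL = 3          # 24-bit colour
TRANSFER_MS = 2.17           # CPU-to-GPU copy time for one frame
ENCODE_MS = 1.13             # JPEG encoding time on the GPU for one frame


def throughput_gb_per_s(num_bytes: int, milliseconds: float) -> float:
    """Return throughput in GB/s, assuming 1 GB = 10**9 bytes."""
    return num_bytes / (milliseconds / 1000.0) / 1e9


# Size of one uncompressed frame in bytes.
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

# Effective rates implied by the quoted timings.
transfer_rate = throughput_gb_per_s(frame_bytes, TRANSFER_MS)
encode_rate = throughput_gb_per_s(frame_bytes, ENCODE_MS)

# How many times faster encoding is than the transfer.
speedup = TRANSFER_MS / ENCODE_MS

print(f"Frame size:        {frame_bytes / 1e6:.1f} MB")
print(f"Transfer rate:     {transfer_rate:.1f} GB/s")
print(f"Encoding rate:     {encode_rate:.1f} GB/s")
print(f"Encode vs. copy:   {speedup:.2f}x faster")
```

With these inputs the uncompressed frame is about 24.9 MB, the transfer works out to roughly 11.5 GB/s and the encoding to roughly 22 GB/s of input data, which is consistent with the "more than 20 GByte per second" and "two times" claims.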

Running the full image processing pipeline on the GPU is a very promising idea. It avoids unnecessary data transfers over the PCI Express bus, and parallel algorithms for image and video processing improve total performance and reduce latency. This idea is implemented in the GPU Image Processing SDK from Fastvideo. Many cameras already work in realtime with that software, doing all image processing on the GPU.


The JPEG codec from Fastvideo is available as part of the GPU Image Processing SDK for Windows 7/8 and Linux. A demo version of the CUDA JPEG codec is available from the Fastvideo website and runs under Windows 7/8. A trial of the Fastvideo SDK is available upon request.


Product page:


Fastvideo was founded in 2009 in Dubna, Russia. The company specializes in high speed camera design and in GPU image and video processing. Its most powerful product is a high performance SDK for realtime image and video processing on NVIDIA GPUs.

