VP vs CUDA decoding performance

Support forum for DGDecNV
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

VP vs CUDA decoding performance

Post by JoeH »

Build 2041 of DGDecNV gives us the option to decode video using CUDA, in addition to the VP decoding we have always had. I am creating this topic to collect comparisons of decoding performance, especially VP vs CUDA decoding of AVC/VC1 on some of the higher-end cards, where CUDA might stand a chance of being faster than VP.

I currently own a very very old OEM NVidia card, so unfortunately I can't share any numbers myself. I am, however, hoping to base my next purchase on this comparison.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

Good idea, Joe, because the existing thread was based on a bad release, so the results are only semi-valid. :agree:

@all

The first one to get their hands on a high-end card with VP5, please shout it from the rooftops!
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

I dug out a GT450 that I have but don't regularly use because it has stability problems. It should be fine for pure CUDA, hopefully.

I have build 2041 running, and the "List GPU Devices" tab identifies my GT450 as active. When I set "Decode_Modes=1,1,1" in the DGIndexNV.ini file, the "List GPU Devices" tab says that "cuda" is being used to decode all three types of video. However, if I open an M2TS AVC file, set playback speed to maximum, enable the "Disable Display" option and then hit play, GPU-Z reports the video engine (VP) load at 99% while the GPU load stays at 0%!! The result is the same when I set "Decode_Modes=0,0,0", and the speed is the same as well.
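For anyone trying to reproduce this, the only line I am touching in DGIndexNV.ini is the one below (as far as I understand it, there is one flag per stream type, with 0 selecting the dedicated VP engine and 1 forcing the CUDA decoder):

Code: Select all

Decode_Modes=1,1,1
For the VP comparison run I simply set it back to Decode_Modes=0,0,0.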

Am I doing something wrong? Is there a "proper" way to test CUDA vs VP decoding speed?
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

Hi

I just got my license for DGDecNV yesterday and have been testing how it performs when demuxing a video. My system is a GTS450 1GB GDDR5 CUDA card, an i7 2600K @ 3.8 GHz, 16GB DDR3 and a 60GB OCZ SSD system disk, running Windows 7 x64 Ultimate. I have set Decode_Modes in the DGIndexNV.ini file to 1,1,1, and I am using the Sherlock Holmes 2 BD50, remuxing the main .m2ts file with DTS-HD audio and AVC High Profile 4.1 video. The movie is 2h1m without end credits.

Demuxing and saving this .m2ts file took 18m35s. I first tried it without the change to the DGIndexNV.ini file, and that took 23m42s. It is perhaps not the test you want, but it is still clear to me that the same .m2ts file is demuxed and saved in about five minutes less, roughly 22% less time than the first attempt at 23m42s. The only change between the two runs was setting Decode_Modes to 1,1,1 for the second one.

Hope this provides some information that is of use to you. Feel free to let me know if there is anything else you would like tested and I will gladly do that if it can be of any help :)
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

Another point to remember is that the cuviddec.h file states this for the variable to force CUDA over CUVID:

"Use a CUDA-based decoder if faster than dedicated engines"

I don't know how that decision is made but it could account for cases where there appears to be no difference.
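For context (and I am going from memory of the header, so treat the details as approximate rather than gospel), the flag in question is one of the cudaVideoCreateFlags values that an application passes in the ulCreationFlags field of CUVIDDECODECREATEINFO when it creates the decoder. Roughly:

Code: Select all

/* Paraphrased from cuviddec.h -- check your SDK copy for the exact wording. */
typedef enum cudaVideoCreateFlags_enum {
    cudaVideoCreate_Default     = 0x00,  /* let the driver decide, normally the dedicated video engine  */
    cudaVideoCreate_PreferCUDA  = 0x01,  /* "Use a CUDA-based decoder if faster than dedicated engines" */
    cudaVideoCreate_PreferDXVA  = 0x02,  /* go through DXVA internally where possible                   */
    cudaVideoCreate_PreferCUVID = 0x04   /* use the dedicated video (VP) engine directly                */
} cudaVideoCreateFlags;

/* A decoder that wants to force CUDA would do something like:
       CUVIDDECODECREATEINFO info = { 0 };
       info.ulCreationFlags = cudaVideoCreate_PreferCUDA;
   but note the "if faster" wording above -- the driver is apparently still
   free to fall back to the dedicated engine. */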
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

Anyone have a 680 to run this test on? It would be interesting to see what type of CUDA decoding performance it has.
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

Can I use DGDecNV to try and test the CUDA performance in BD-Rebuilder?

If so, how do I do that?

Sorry for the noob question, but I don't normally use BD-Rebuilder, so I am not quite sure how to have DGDecNV add CUDA support to BD-Rebuilder. If BD-Rebuilder can't be used, what program would you recommend using with DGDecNV to test CUDA performance?

Thank you for your patience with me and my beginner questions :)
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

Thanks for that suggestion. I tried it, but since my graphics card died two days ago I ran the test with a borrowed GeForce 9800 GT:

Operating System: Windows 7 Ultimate, 64-bit (Service Pack 1)
DirectX version: 11.0
GPU processor: GeForce 9800 GT
Driver version: 296.10
DirectX support: 10
CUDA Cores: 112
Core clock: 600 MHz
Shader clock: 1500 MHz
Memory clock: 900 MHz (1800 MHz data rate)
Memory interface: 256-bit
Total available graphics memory: 4095 MB
Dedicated video memory: 512 MB GDDR3
System video memory: 0 MB
Shared system memory: 3583 MB
Video BIOS version: 62.92.6D.00.00
IRQ: 16
Bus: PCI Express x16 Gen2

1. run - FPS: (min/max/avg): 28.34 : 57.33 : 49.28
2. run - FPS: (min/max/avg): 28.48 : 57.24 : 49.30

So now I can see if there will be a difference when I get my own graphics card back. It is a GTS450, so perhaps there is more FPS to gain there :D
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

A GTS450 will give you roughly twice the speed of a 9800GT.
That seems like a plausible guess. I ran DGDecNV with my GTS450 before both my PSU and graphics card died and was getting around 80-90 fps, and with the 9800GT the average is roughly 50 fps, so I think your estimate is what I can expect when my card comes back.
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

Well, I got a new graphics card, but as they couldn't get me a GTS450 I got a GTX 550 Ti instead :D

So I ran the test again to see the difference in speed.

Operating System: Windows 7 Ultimate, 64-bit (Service Pack 1)
DirectX version: 11.0
GPU processor: GeForce GTX 550 Ti
Driver version: 296.10
DirectX support: 11
CUDA Cores: 192
Core clock: 900 MHz
Shader clock: 1800 MHz
Memory clock: 2052 MHz (4104 MHz data rate)
Memory interface: 192-bit
Total available graphics memory: 4095 MB
Dedicated video memory: 1024 MB GDDR5
System video memory: 0 MB
Shared system memory: 3071 MB
Video BIOS version: 70.26.20.00.00
IRQ: 16
Bus: PCI Express x16 Gen2

1. run - FPS: (min/max/avg): 171.91 : 179.86 : 174.88
2. run - FPS: (min/max/avg): 172.30 : 179.82 : 173.96

So I now get more than 3x the FPS that I did with my old GTS450!

So I am quite sure that CUDA makes a huge difference....or at least it does in my case :)
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

The question is whether, on some top-of-the-line cards, CUDA can beat VP5 in some cases...
flyordie
Posts: 39
Joined: Thu Nov 18, 2010 10:07 am

Re: VP vs CUDA decoding performance

Post by flyordie »

I have a GTX 570, GTX 460, and GT 520. They are in different machines and I stopped after running the GTX 570.
I used the rat.264 file mentioned above and the following avs script:

Code: Select all

LoadPlugin("DGDecodeNV.dll")
DGSource("rat.dgi")
loop(10)
I used AVSMeter v1.17 with build 2041. In the rat.dgi file, the line
DECODE_MODES 1,1,1
or
DECODE_MODES 0,0,0
made no difference; the video engine was used either way.

Code: Select all

AVSMeter v1.17 by Groucho2004
AviSynth 2.58, build:Dec 22 2008 [08:46:51]
Number of frames:            8390
Length (h:m:s.ms):       00:05:49.933
Frame width:                 1920
Frame height:                1080
Framerate:                     23.976 (24000/1001)
Progressive:                  Yes
Colorspace:                  YV12
Hit ESC to exit...
Frame 8390/8390, fps (min/max/avg): 28.67 | 77.70 | 62.13
Running time (h:m:s.ms):  00:02:15.048
I had DGIndexNV.exe generate the .dgi file each time and replaced the DGDecodeNV.dll in the local folder with the one from the version of DGDecNV I was running at the time. Versions 2015, 2039, 2040_cuda, 2042 and 2042rc1 all used the video engine on the GTX 570.
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

My test was done with the rat.dgi file and AVSMeter. I created an Avisynth script, as was suggested to me earlier, for both of my tests, and the only difference between the first and second test is the graphics card, which changed from the GTS450 to the GTX 550 Ti. So I did follow the Doom9 guide for both tests.
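For completeness, the Avisynth script I pointed AVSMeter at was essentially the one flyordie posted above; reconstructing it from memory, it looked roughly like this:

Code: Select all

# load the DGDecNV Avisynth source filter
LoadPlugin("DGDecodeNV.dll")
# open the index that DGIndexNV created from rat.264
DGSource("rat.dgi")
# flyordie's version repeats the clip ten times for a longer timing run;
# I am honestly not sure whether I kept this line in my own script
loop(10)
So treat the exact contents as an approximation rather than a copy-paste of what I ran.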
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

statica, could you see if you can duplicate your speed results with AVSMeter 1.17 and rat.264? If you are able, could you post the steps you followed?

As far as I know you are the first person to get results like that (hopefully not the last). It would be good if flyordie could try to reproduce your steps. As he has a GTX 570, if your method works correctly he should also be able to get those numbers or higher.
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

That could be. But why are you so sure that CUDA can't decode faster than VP on high-end cards? It seems like almost no tests have been done (or if they have, I haven't seen them). With all the power the high-end CUDA engines have, it would seem strange to me that they couldn't best the VP engines, even VP5.
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

Okay, I did a new test with AVSMeter 1.17, rat.264 and DGDecNV 2042rc1.

Operating System: Windows 7 Ultimate, 64-bit (Service Pack 1)
DirectX version: 11.0
GPU processor: GeForce GTX 550 Ti
Driver version: 301.24
DirectX support: 11
CUDA Cores: 192
Core clock: 900 MHz
Shader clock: 1800 MHz
Memory clock: 2052 MHz (4104 MHz data rate)
Memory interface: 192-bit
Total available graphics memory: 4094 MB
Dedicated video memory: 1023 MB GDDR5
System video memory: 0 MB
Shared system memory: 3071 MB
Video BIOS version: 70.26.20.00.00
IRQ: 16
Bus: PCI Express x16 Gen2


Frame 839/839, fps <min/max/avg>: 128.72 : 169.49 : 160.15
Running time <h:m:s.ms>: 00:00:13.499

So is that correctly done?
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

Could you also post the steps you are taking to activate the option to decode using CUDA instead of VP?
Rabomil
Posts: 1
Joined: Wed May 23, 2012 6:32 pm

Re: VP vs CUDA decoding performance

Post by Rabomil »

Operating System: Windows 7 Enterprise, 64-bit (Service Pack 1)
DirectX version: 11.0
GPU processor: GeForce GTX 460
Driver version: 301.42
DirectX support: 11.1
CUDA Cores: 336
Core clock: 700 MHz
Shader clock: 1400 MHz
Memory clock: 1800 MHz (3600 MHz data rate)
Memory interface: 256-bit
Total available graphics memory: 3834 MB
Dedicated video memory: 1023 MB GDDR5
System video memory: 0 MB
Shared system memory: 2811 MB
Video BIOS version: 70.04.13.00.01
IRQ: 24
Bus: PCI Express x16 Gen2

Number of frames: 8390
Length (h:m:s.ms): 00:05:49.933
Frame width: 1920
Frame height: 1080
Framerate: 23.976 (24000/1001)
Progressive: Yes
Colorspace: YV12

Hit ESC to exit...
Frame 8390/8390, fps (min/max/avg): 28.67 | 75.49 | 62.27
Running time (h:m:s.ms): 00:02:14.743


When you look at the average framerate calculated by AVSMeter, it is more or less the same as what you get by calculating it manually (8390 frames / 135 seconds = 62.15 fps). That holds every time except for the tests run by statica: AVSMeter reports a 160 fps average, while the manual calculation gives 62.15 fps (839 frames / 13.499 sec.). The manual calculation appears to be the correct one, given that the tests run by flyordie and by me both average about 62.2 fps on VP4 cards, and statica is also using a VP4 card.
statica
Posts: 10
Joined: Wed Apr 18, 2012 9:38 am

Re: VP vs CUDA decoding performance

Post by statica »

I have no clue what I have done to get those frame rates. I followed the instructions in this thread and in the linked Doom9 thread and nothing else, as far as I can remember. Only thing is that if I set my CUDA to 255 then it crashes DGDecNV but at 192 there is no problem. So I have no idea why I am getting those frame rates, but I do think they are somehow inaccurate, because I have run some long encodes where BD-Rebuilder shows an average of 80 FPS.

So I don't think the frame rates in the test are correct, but why they show up as being so high, I have no idea. I suspect a bug of some kind, either in my system or in the combination of my system and DGDecNV.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

statica wrote:Only thing is that if I set my CUDA to 255 then it crashes DGDecNV but at 192 there is no problem.
Please give me more details. What do you mean by "set my CUDA"? What crashes, DGIndexNV or DGDecodeNV? Please post your complete INI file for the crashing case. Thank you.
JoeH
Posts: 16
Joined: Mon Jan 10, 2011 6:06 am

Re: VP vs CUDA decoding performance

Post by JoeH »

neuron2 wrote:Another point to remember is that the cuviddec.h file states this for the variable to force CUDA over CUVID:

"Use a CUDA-based decoder if faster than dedicated engines"

I don't know how that decision is made but it could account for cases where there appears to be no difference.
admin, have you ever followed up with NVidia about this to understand better what's going on to make this choice? I wonder if they would be willing to add an option to force CUDA decode... of course we might just find out that VP is in fact faster, but it would be interesting to have the option.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

Interesting, thank you for the results. MPEG2 was defaulted to the GPU (CUDA) decoder to work around an older bug with some streams. As that has likely been fixed by now, it would be worth exploring restoring the MPEG2 default to the VP engine.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

Time to go back to default 0,0,0 then it seems. Thanks for your testing.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: VP vs CUDA decoding performance

Post by admin »

I originally made the default 0,1,0 because of a single MPEG2 stream that decoded with errors. But that was many years and driver versions ago, and I was not aware of the large performance implications you have shown. So overall, it seems that the default should now be 0,0,0 and in the probably now rare case of a decoding error, 0,1,0 can be tried.
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Re: VP vs CUDA decoding performance

Post by hydra3333 »

admin wrote:I originally made the default 0,1,0 because of a single MPEG2 stream that decoded with errors. But that was many years and driver versions ago, and I was not aware of the large performance implications you have shown. So overall, it seems that the default should now be 0,0,0 and in the probably now rare case of a decoding error, 0,1,0 can be tried.
OK thanks.
I've just edited DGIndexNV.ini to use "Decode_Modes=0,0,0" instead of "Decode_Modes=0,1,0", as I almost exclusively have PAL 576i MPEG2 source files and have a 750 Ti card. What error condition would I see that would tell me to revert to the old default of 0,1,0?
I really do like it here.