CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Guest

Re: CUDASynth

Post by Guest »

The price is reduced by a factor of 5

Sales are good, faster is better
:hat:
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: CUDASynth

Post by admin »

Wow, gonca, listen to this...

I did some optimizations to reduce the sizes of the critical sections for the lock. Then I ran this script:

DGSource(fdst="cpu")

I got 207.1 fps. Note that adding prefetch here does not help and in fact greatly reduces the fps.

Now I ran this script:

DGSource(fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(2)

I got 206.3 fps! :wow: This means that thanks to CUDASynth, a limited sharpen filter is being executed on a 3840x2160 frame essentially for free.

This is what I mean by bringing out the true power of NVDec/CUDA for Avisynth/Vapoursynth.
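
In the script above, the filter that follows DGSource reads from the same buffer name that DGSource writes to. A minimal Python sketch of that hand-off rule follows. The checker, its tuple layout, and the idea that an omitted fdst means "cpu" are assumptions for illustration only, not part of CUDASynth:

```python
# Hypothetical checker: the rule (each fsrc must equal the previous fdst) is
# inferred from the scripts in this thread, not from CUDASynth documentation.

def check_chain(stages):
    """stages: list of (filter_name, fsrc, fdst). Returns a list of problems."""
    problems = []
    prev_name, prev_dst = None, None
    for i, (name, fsrc, fdst) in enumerate(stages):
        # The first stage is the source filter, so it has nothing to read from.
        if i > 0 and fsrc != prev_dst:
            problems.append(
                f"{name} reads {fsrc!r} but {prev_name} wrote {prev_dst!r}"
            )
        prev_name, prev_dst = name, fdst
    # Assumption: the last filter must hand a host frame back to the frame server.
    if stages and stages[-1][2] != "cpu":
        problems.append(f"{stages[-1][0]} should end the chain with fdst='cpu'")
    return problems


# The two-filter script from this post, with the omitted fdst written as "cpu".
good = [("DGSource", None, "gpu0"), ("DGSharpen", "gpu0", "cpu")]
bad = [("DGSource", None, "gpu0"), ("DGSharpen", "gpu1", "cpu")]

print(check_chain(good))  # no problems reported
print(check_chain(bad))   # one mismatch reported
```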

Post by Guest »

This brings up the issue of hardware encoding.
Depending on Nvidia's capabilities on the new cards, a two card system could be amazingly fast
One card pre-processing and frame serving, and the second encoding

Post by admin »

gonca wrote:
Sat Sep 08, 2018 2:02 pm
This brings up the issue of hardware encoding.
Depending on Nvidia's capabilities on the new cards, a two card system could be amazingly fast
One card pre-processing and frame serving, and the second encoding

For sure. Someone send me another 1080 Ti, or maybe an RTX 2080 Ti. I would accept either one. :P :salute:

Even with one GPU we can make an encoder filter and put it at the end of the pipeline taking input from gpu0/1, saving PCIe transfers and copies between the Avisynth output and the encoder. It's on the list now.
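
To get a feel for how much PCIe traffic is at stake, here is a rough back-of-the-envelope calculation in Python. It assumes 4:2:0 frames (NV12/YV12 layout, 1.5 samples per pixel) and counts a single one-way copy per frame. The function names are made up for this sketch:

```python
def frame_bytes(width, height, bytes_per_sample=1):
    # 4:2:0: one full-size luma plane plus two quarter-size chroma planes,
    # i.e. 1.5 samples per pixel.
    return width * height * 3 // 2 * bytes_per_sample


def transfer_rate_gb_s(width, height, fps, bytes_per_sample=1):
    # One copy of every frame in one direction, in GB/s (1 GB = 1e9 bytes).
    return frame_bytes(width, height, bytes_per_sample) * fps / 1e9


size_p8 = frame_bytes(3840, 2160)        # 12441600 bytes per 8-bit frame
size_p16 = frame_bytes(3840, 2160, 2)    # twice that for 16-bit samples
rate_p8 = transfer_rate_gb_s(3840, 2160, 206.3)

print(size_p8, size_p16)
print(f"{rate_p8:.2f} GB/s per copy at 206.3 fps")
```

Every download to the host and upload back to the GPU that the pipeline avoids saves roughly that much traffic per direction.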
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am

Re: CUDASynth

Post by hydra3333 »

admin wrote:
Sat Sep 08, 2018 12:40 pm
Now I ran this script:
DGSource(fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(2)

I got 206.3 fps! :wow: This means that thanks to CUDASynth, a limited sharpen filter is being executed on a 3840x2160 frame essentially for free.

This is what I mean by bringing out the true power of NVDec/CUDA for Avisynth/Vapoursynth.

That is truly impressive.
admin wrote:
Sat Sep 08, 2018 12:40 pm
Even with one GPU we can make an encoder filter and put it at the end of the pipeline taking input from gpu0/1, saving PCIe transfers and copies between the Avisynth output and the encoder. It's on the list now.

And I'd thought I could not get any more excited ... :)
I really do like it here.

Post by admin »

I finished supporting both fsrc and fdst for DGSharpen so I am able to show the results on a longer pipeline. Here is the script:

dgsource("LG Chess 4K Demo.dgi",fulldepth=false,fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="gpu1")
DGSharpen(fsrc="gpu1",fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(5)

For the 3840x2160 59.94 fps stream I get 161.6 fps. When I do not use the pipeline (all fsrc and fdst are "cpu"), I get 76.8 fps. So CUDASynth is about 2.1 times as fast and makes the difference between real-time and non-real-time operation. There is still a lot of headroom to have more filters while remaining real-time. The average "price" of the sharpens here is about 15 fps each, relative to the 207.1 fps measured earlier for DGSource alone.
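
The figures quoted here can be reproduced with a few lines of Python. This is only a worked calculation from the numbers posted in this thread. It assumes the 207.1 fps source-only result from the earlier post is a fair baseline for this script, which may not hold exactly since that test did not state its settings:

```python
SOURCE_ONLY_FPS = 207.1   # DGSource alone, from the earlier post (assumed baseline)
PIPELINED_FPS = 161.6     # DGSource + 3 x DGSharpen, all on the GPU
UNPIPELINED_FPS = 76.8    # same script with every fsrc/fdst set to "cpu"
STREAM_FPS = 59.94        # frame rate of the source stream
N_SHARPENS = 3

speedup = PIPELINED_FPS / UNPIPELINED_FPS
fps_cost_per_sharpen = (SOURCE_ONLY_FPS - PIPELINED_FPS) / N_SHARPENS
headroom = PIPELINED_FPS / STREAM_FPS

# The same cost expressed as time per frame, which adds up linearly
# across filters in a way that fps does not.
ms_per_sharpen = (1000 / PIPELINED_FPS - 1000 / SOURCE_ONLY_FPS) / N_SHARPENS

print(f"speedup: {speedup:.2f}x")                          # about 2.10x
print(f"cost per sharpen: {fps_cost_per_sharpen:.2f} fps")  # about 15.17 fps
print(f"cost per sharpen: {ms_per_sharpen:.2f} ms/frame")   # about 0.45 ms
print(f"headroom over the stream rate: {headroom:.2f}x")
```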

Next, after I add P16 support, I'm going to CUDASynth-enable DGHDRtoSDR and see what kind of frame rate we can get for a full HDR to SDR script.

Post by Guest »

what kind of frame rate we can get for a full HDR to SDR script.

This should be good!

Post by admin »

Oy, it took a whole week to get the pipeline running in P16 [DGSource(fulldepth=true)]. My NV12toYV12 kernel was broken for P16 and I had some bad pitch handling in DGSharpen for the case where both fsrc and fdst are GPU buffers. The implementation was tricky because debugging is hard when the intermediate ping-pong buffers are not visible from the host, so their contents cannot be checked at various points along the pipeline. Writing code to copy them down to the host would be possible but fiddly, so I resorted to hunches and other tricks, such as memsetting the device memory at various points to see if the values showed up at the final output as expected. It was something of a week-long nightmare because when I go to bed with outstanding bugs in my code, my brain works overtime all night. And the bugs were like a layered onion: rack your brains to fix one layer and another gets exposed. After multiple layers in a row stretching over a week, I got pretty zonked out from lack of proper sleep. :(
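
For readers who have not run into pitch bugs, here is a host-side Python simulation of the two ideas in this post: copying a pitched plane row by row, and pre-filling the destination with a sentinel value so that bytes nobody wrote stand out at the output. It is not the DGSharpen code and does not touch CUDA. The buffer sizes, the sentinel value, and the function names are all made up for illustration:

```python
SENTINEL = 0xAB  # arbitrary fill value that the test data below never contains


def copy_plane(src, src_pitch, dst, dst_pitch, width, height, bytes_per_sample):
    """Copy a pitched plane row by row. Pitches are in bytes, width in samples."""
    row_bytes = width * bytes_per_sample  # 2 bytes per sample for P16
    if row_bytes > src_pitch or row_bytes > dst_pitch:
        raise ValueError("row does not fit in the pitch")
    for y in range(height):
        s = y * src_pitch
        d = y * dst_pitch
        dst[d:d + row_bytes] = src[s:s + row_bytes]


def unwritten_bytes(buf, pitch, width, height, bytes_per_sample):
    """Count visible bytes (padding excluded) that still hold the sentinel."""
    row_bytes = width * bytes_per_sample
    return sum(
        buf[y * pitch + x] == SENTINEL
        for y in range(height)
        for x in range(row_bytes)
    )


WIDTH, HEIGHT, SRC_PITCH, DST_PITCH = 8, 4, 16, 32
src = bytearray(i % 100 for i in range(SRC_PITCH * HEIGHT))  # values below 0xAB

# Correct P16 copy: every visible byte gets overwritten.
dst_ok = bytearray([SENTINEL]) * (DST_PITCH * HEIGHT)
copy_plane(src, SRC_PITCH, dst_ok, DST_PITCH, WIDTH, HEIGHT, bytes_per_sample=2)

# Buggy copy that treats P16 data as P8: half of each row is never written.
dst_bad = bytearray([SENTINEL]) * (DST_PITCH * HEIGHT)
copy_plane(src, SRC_PITCH, dst_bad, DST_PITCH, WIDTH, HEIGHT, bytes_per_sample=1)

print(unwritten_bytes(dst_ok, DST_PITCH, WIDTH, HEIGHT, 2))   # 0
print(unwritten_bytes(dst_bad, DST_PITCH, WIDTH, HEIGHT, 2))  # 32
```

On a real device the fill would be done with something like cudaMemset before the kernel runs, and the sentinel would be looked for in the final output frame.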

But I am a patient and persistent soul and never give up (unless the goal is theoretically impossible), and so everything is working perfectly now :D for a 4-filter GPU pipeline (DGSource + 3 x DGSharpen) in P8 and P16. I'm going to CUDASynth-enable DGHDRtoSDR now and see how it performs in a DGSource->DGHDRtoSDR pipeline.

At some point I will publish a specification for how to implement a CUDASynth-compatible filter, together with a source code example. Without that, CUDASynth acceleration would be limited to my own filters.

Post by Guest »

I guess you never give up then since theoretically impossible is theoretically impossible.
Theoretically improbable, maybe
Theoretically not possible at this time, ok
But with technological improvements and increases in knowledge what is impossible today might be possible tomorrow
Therefore
theoretically impossible is theoretically impossible
and QED
you never give up

Post by admin »

OK. :salute:

Post by Guest »

I am just happy you never give up
We keep getting new and better avs/vpy tools to use
Thank you
:bravo:

Post by admin »

I have CUDASynth-enabled DGHDRtoSDR and have some preliminary performance numbers for your enjoyment. The source is the same as previously used: 3840x2160 59.94 fps HDR10. The script with GPU pipelining is:

dgsource("LG Chess 4K Demo.dgi",fulldepth=true,fdst="gpu0")
dghdrtosdr(impl="255",light=250,fsrc="gpu0") # outputs YV12
prefetch(4)

Not pipelined on GPU: 80 fps, CPU 13%
Pipelined on GPU: 204 fps, CPU 8%

Quite a substantial performance boost!
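
Putting the two measurements side by side in Python shows that the gain is more than the raw frame rate suggests. This is only arithmetic on the numbers above, and it assumes the CPU percentages are average utilization over each run:

```python
unpipelined_fps, pipelined_fps = 80.0, 204.0
cpu_unpipelined, cpu_pipelined = 13.0, 8.0  # percent CPU utilization

speedup = pipelined_fps / unpipelined_fps
percent_faster = (speedup - 1) * 100

# CPU utilization spent per delivered frame per second, before and after.
cpu_per_fps_before = cpu_unpipelined / unpipelined_fps
cpu_per_fps_after = cpu_pipelined / pipelined_fps
cpu_cost_reduction = cpu_per_fps_before / cpu_per_fps_after

print(f"{speedup:.2f}x as fast ({percent_faster:.0f}% faster)")       # 2.55x, 155%
print(f"CPU cost per frame drops by about {cpu_cost_reduction:.1f}x")  # about 4.1x
```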

Post by Guest »

Nice boost, about 2.5x

Post by Guest »

Any ETA on public testing?
No rush, just getting antsy

Post by admin »

Still have some things to finish up: Vapoursynth support, fdst parameter for DGHDRtoSDR, CUDASynth-enable DGDenoise, documentation, source code example. And I'm timesharing with DGIndex MKV support. Hang in there.

BTW, the CPU reduction is also important as it leaves more CPU for encoding.

It's hard to find 2080 Ti's:

https://www.nowinstock.net/computers/vi ... rtx2080ti/

Puts the lie to some of the whining at other forums by people saying it's too expensive, nobody wants it, nVidia are dirty rotten criminal capitalists, how dare they make a GPU I can't afford, blah blah blah.

gonca, what's the fastest most powerful Threadripper likely to be available within a few months?

I saved a lot of dough doing my own bathroom remodeling so I have the ready green to dish out for the best hardware. :twisted:

Post by admin »

Looks good. Any suggestions for a compatible mobo?

Post by admin »

Thank you!

Post by Guest »

The other option you might want to consider, seeing as you have this thing about GPUs (NVidia), is to drop the CPU to the 16 core/ 32 thread version and maybe go with 2 GPUs
or
(note: here goes the budget)
Get the 32 core CPU and 2 GPUs

Post by admin »

gonca wrote:
Wed Sep 19, 2018 3:37 pm
(note: here goes the budget)
Get the 32 core CPU and 2 GPUs

That sounds good. I don't have a budget. I'm getting older so I'm going to blow it all on hardware and travel adventures. :P

BestBuy went to preorder on the 2080 Ti but by the time I saw the alert it was all taken. Nobody wants these things. ;)

Thanks for making this thread go into flames (the icon on the forum list).

Post by Guest »

Thanks for making this thread go into flames

That is how I learned about the hardware side of things.
Making things go up in flames
burn baby burn.jpg

Post by admin »

Wow, parallel ATA connectors. :wow: What's the CPU, a 386?

The first processor I coded for was an 8080. The OS was CP/M. It had a 100K floppy drive that raised and lowered the head on each sector access (the infamous head-loading solenoid), causing a pleasing bang-bang-bang that the neighbors loved. I clearly remember tossing that system (Heathkit H8) in the dumpster when I upgraded to a 386-based system. :lol:

Post by Guest »

8080 and 386
Those were the days, never to be seen again, thank whichever supreme deity for that small favor
Gee-sh, now I am getting politically correct

Post by admin »

If using the word "God" is politically incorrect, then we are done. Praise the Lord!