Re: CUDASynth

Posted: Sat Sep 08, 2018 10:46 am
by Guest
The price is reduced by a factor of 5

Sales are good, faster is better
:hat:

Re: CUDASynth

Posted: Sat Sep 08, 2018 12:40 pm
by admin
Wow, gonca, listen to this...

I did some optimizations to reduce the sizes of the critical sections for the lock. Then I ran this script:

DGSource(fdst="cpu")

I got 207.1 fps. Note that adding prefetch here does not help and in fact greatly reduces the fps.

Now I ran this script:

DGSource(fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(2)

I got 206.3 fps! :wow: This means that thanks to CUDASynth, a limited sharpen filter is being executed on a 3840x2160 frame essentially for free.

This is what I mean by bringing out the true power of NVDec/CUDA for Avisynth/Vapoursynth.

Re: CUDASynth

Posted: Sat Sep 08, 2018 2:02 pm
by Guest
This brings up the issue of hardware encoding.
Depending on Nvidia's capabilities on the new cards, a two card system could be amazingly fast
One card pre-processing and frame serving, and the second encoding

Re: CUDASynth

Posted: Sat Sep 08, 2018 2:05 pm
by admin
gonca wrote:
Sat Sep 08, 2018 2:02 pm
This brings up the issue of hardware encoding.
Depending on Nvidia's capabilities on the new cards, a two card system could be amazingly fast
One card pre-processing and frame serving, and the second encoding
For sure. Someone send me another 1080 Ti, or maybe an RTX 2080 Ti. I would accept either one. :P :salute:

Even with one GPU we can make an encoder filter and put it at the end of the pipeline taking input from gpu0/1, saving PCIe and copies between the Avisynth output and the encoder. On the list now.
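To put a rough number on what those copies cost, here is a back-of-the-envelope sketch of the data moved by each host/device transfer of a 3840x2160 4:2:0 frame. It assumes tightly packed frames (no pitch padding), and the helper names `frame_bytes` and `copy_rate_mb_per_s` are made up for illustration. Nothing here is measured; it is arithmetic only.

```python
def frame_bytes(width, height, bytes_per_sample):
    """Size of one 4:2:0 frame: full-res luma plus two quarter-res chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


def copy_rate_mb_per_s(width, height, bytes_per_sample, fps):
    """Data moved per second by ONE copy of every frame, in MB/s (1 MB = 1e6 bytes)."""
    return frame_bytes(width, height, bytes_per_sample) * fps / 1e6


WIDTH, HEIGHT = 3840, 2160

# 8-bit (1 byte per sample) and 16-bit (2 bytes per sample) frames,
# at the stream rate and at the ~206 fps reported earlier in the thread.
for label, bytes_per_sample in (("8-bit", 1), ("16-bit", 2)):
    size = frame_bytes(WIDTH, HEIGHT, bytes_per_sample)
    for fps in (59.94, 206.3):
        rate = copy_rate_mb_per_s(WIDTH, HEIGHT, bytes_per_sample, fps)
        print(f"{label}: {size} bytes/frame, {rate:.1f} MB/s per copy at {fps} fps")
```

Every copy between host and device that the pipeline avoids saves roughly this much traffic, which is why keeping frames on the GPU all the way to the encoder is attractive.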

Re: CUDASynth

Posted: Sat Sep 08, 2018 10:15 pm
by hydra3333
admin wrote:
Sat Sep 08, 2018 12:40 pm
Now I ran this script:
DGSource(fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(2)

I got 206.3 fps! :wow: This means that thanks to CUDASynth, a limited sharpen filter is being executed on a 3840x2160 frame essentially for free.

This is what I mean by bringing out the true power of NVDec/CUDA for Avisynth/Vapoursynth.
That is truly impressive.
admin wrote:
Sat Sep 08, 2018 2:05 pm
Even with one GPU we can make an encoder filter and put it at the end of the pipeline taking input from gpu0/1, saving PCIe and copies between the Avisynth output and the encoder. On the list now.
And I'd thought I could not get any more excited ... :)

Re: CUDASynth

Posted: Mon Sep 10, 2018 8:44 am
by admin
I finished supporting both fsrc and fdst for DGSharpen so I am able to show the results on a longer pipeline. Here is the script:

dgsource("LG Chess 4K Demo.dgi",fulldepth=false,fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="gpu1")
DGSharpen(fsrc="gpu1",fdst="gpu0")
DGSharpen(fsrc="gpu0")
prefetch(5)

For the 3840x2160 59.94 fps stream I get 161.6 fps. When I do not use the pipeline (all fsrc and fdst are "cpu"), I get 76.8 fps. So CUDASynth is about twice as fast (2.1x) and makes the difference between real-time and non-real-time operation. There is still a lot of headroom to have more filters while remaining real-time. The average "price" of the sharpens here is about 15 fps each.
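The arithmetic behind those figures can be sketched as follows. One assumption is flagged: the 207.1 fps source-only figure from the earlier DGSource-only test is taken as the baseline for the per-sharpen cost. Treating each filter as a fixed fps cost is also a crude model, since in reality per-frame times add up rather than frame rates.

```python
PIPELINED_FPS = 161.6    # DGSource + 3 x DGSharpen, all on the GPU pipeline
UNPIPELINED_FPS = 76.8   # same script with every fsrc/fdst set to "cpu"
SOURCE_ONLY_FPS = 207.1  # ASSUMPTION: baseline from the earlier DGSource-only test
NUM_SHARPENS = 3
STREAM_FPS = 59.94       # frame rate needed for real-time playback

speedup = PIPELINED_FPS / UNPIPELINED_FPS
cost_per_sharpen = (SOURCE_ONLY_FPS - PIPELINED_FPS) / NUM_SHARPENS

# Margin above the real-time rate with and without the pipeline.
headroom_pipelined = PIPELINED_FPS - STREAM_FPS
headroom_unpipelined = UNPIPELINED_FPS - STREAM_FPS

# Naive linear extrapolation: how many more sharpen-priced filters would fit.
extra_filters = headroom_pipelined / cost_per_sharpen

print(f"speedup: {speedup:.2f}x")
print(f"average cost per sharpen: {cost_per_sharpen:.1f} fps")
print(f"headroom: {headroom_pipelined:.1f} fps pipelined, "
      f"{headroom_unpipelined:.1f} fps unpipelined")
print(f"room for roughly {extra_filters:.1f} more sharpen-priced filters")
```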

Next, after I add P16 support, I'm going to CUDASynth-enable DGHDRtoSDR and see what kind of frame rate we can get for a full HDR to SDR script.

Re: CUDASynth

Posted: Mon Sep 10, 2018 3:32 pm
by Guest
what kind of frame rate we can get for a full HDR to SDR script.

This should be good!

Re: CUDASynth

Posted: Mon Sep 17, 2018 12:26 pm
by admin
Oy, it took a whole week to get the pipeline running in P16 [DGSource(fulldepth=true)]. My NV12toYV12 kernel was broken for P16 and I had some bad pitch handling in DGSharpen for the case of fsrc=gpu and fdst=gpu. The implementation was tricky because debugging is hard when the intermediate ping-pong buffers are not visible from the host, so their contents cannot be checked at various points along the pipeline. Writing code to copy them down to the host would be possible but tricky, so I resorted to some hunches and other tricks, such as memsetting the device memory at various points to see if the values showed up at the final output as expected. It was something of a week-long nightmare because when I go to bed with outstanding bugs in my code my brain works overtime all night. And the bugs were like a layered onion; rack your brains to fix one layer and another gets exposed. After multiple layers in a row stretching over a week I got pretty zonked out from lack of proper sleep. :(
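For anyone unfamiliar with the two ideas mentioned above (pitch handling and filling a buffer with a known value to see what reaches the output), here is a small host-side simulation in plain Python. It is not CUDA and not the actual DGSharpen code. The buffers are ordinary bytearrays, and `SENTINEL`, `make_pitched`, `copy_pitched` and `rows_still_sentinel` are hypothetical names chosen for illustration.

```python
SENTINEL = 0xAB  # arbitrary marker value, chosen for illustration


def make_pitched(width, height, pitch, fill=0):
    """Allocate height rows of `pitch` bytes; only the first `width` bytes per row are image data."""
    assert pitch >= width
    return bytearray([fill]) * (pitch * height)


def copy_pitched(src, src_pitch, dst, dst_pitch, width, height):
    """Copy `width` bytes per row, stepping each buffer by its OWN pitch."""
    for row in range(height):
        s = row * src_pitch
        d = row * dst_pitch
        dst[d:d + width] = src[s:s + width]


def rows_still_sentinel(buf, pitch, width, height, sentinel=SENTINEL):
    """Rows whose visible bytes all still hold the sentinel, i.e. were never written."""
    return [r for r in range(height)
            if all(b == sentinel for b in buf[r * pitch:r * pitch + width])]


# Source: 6x4 image with pitch 8; row r holds the value r + 1.
src = make_pitched(6, 4, 8)
for r in range(4):
    for x in range(6):
        src[r * 8 + x] = r + 1

# Destination has a different pitch and is pre-filled with the sentinel.
dst = make_pitched(6, 4, 16, fill=SENTINEL)

copy_pitched(src, 8, dst, 16, 6, 3)   # simulated bug: only 3 of 4 rows copied
missing_after_bug = rows_still_sentinel(dst, 16, 6, 4)

copy_pitched(src, 8, dst, 16, 6, 4)   # corrected copy
missing_after_fix = rows_still_sentinel(dst, 16, 6, 4)

print("rows never written (buggy copy):", missing_after_bug)
print("rows never written (fixed copy):", missing_after_fix)
```

The check can give a false positive if real image data happens to equal the sentinel across a whole row, so it is only a quick diagnostic and not proof that a stage ran correctly.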

But I am a patient and persistent soul and never give up (unless the goal is theoretically impossible), and so everything is working perfectly now :D for a 4-filter GPU pipeline (DGSource + 3 x DGSharpen) in P8 and P16. I'm going to CUDASynth-enable DGHDRtoSDR now and see how it performs in a DGSource->DGHDRtoSDR pipeline.

At some point I will publish a specification for how to implement a CUDASynth compatible filter together with a source code example. Without that CUDASynth acceleration would be limited to my own filters.

Re: CUDASynth

Posted: Mon Sep 17, 2018 3:20 pm
by Guest
I guess you never give up then since theoretically impossible is theoretically impossible.
Theoretically improbable, maybe
Theoretically not possible at this time, ok
But with technological improvements and increases in knowledge what is impossible today might be possible tomorrow
Therefore
theoretically impossible is theoretically impossible
and QED
you never give up

Re: CUDASynth

Posted: Mon Sep 17, 2018 3:57 pm
by admin
OK. :salute:

Re: CUDASynth

Posted: Mon Sep 17, 2018 4:40 pm
by Guest
I am just happy you never give up
We keep getting new and better avs/vpy tools to use
Thank you
:bravo:

Re: CUDASynth

Posted: Tue Sep 18, 2018 9:34 am
by admin
I have CUDASynth-enabled DGHDRtoSDR and have some preliminary performance numbers for your enjoyment. The source is the same as previously used: 3840x2160 59.94 fps HDR10. The script with GPU pipelining is:

dgsource("LG Chess 4K Demo.dgi",fulldepth=true,fdst="gpu0")
dghdrtosdr(impl="255",light=250,fsrc="gpu0") # outputs YV12
prefetch(4)

Not pipelined on GPU: 80 fps, CPU 13%
Pipelined on GPU: 204 fps, CPU 8%

Quite a substantial performance boost!
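A quick sketch of how those two measurements compare, including throughput per percent of CPU used. The fps-per-CPU-percent figure is an illustrative metric introduced here and not something the benchmark reported directly.

```python
# Reported measurements for the DGSource -> DGHDRtoSDR script.
NOT_PIPELINED = {"fps": 80.0, "cpu_percent": 13.0}
PIPELINED = {"fps": 204.0, "cpu_percent": 8.0}

ratio = PIPELINED["fps"] / NOT_PIPELINED["fps"]  # how many times faster
percent_increase = (ratio - 1.0) * 100.0         # increase over the baseline

# Illustrative efficiency metric: frames per second per percent of CPU.
fps_per_cpu_before = NOT_PIPELINED["fps"] / NOT_PIPELINED["cpu_percent"]
fps_per_cpu_after = PIPELINED["fps"] / PIPELINED["cpu_percent"]

print(f"{ratio:.2f}x the frame rate ({percent_increase:.0f}% increase)")
print(f"fps per CPU percent: {fps_per_cpu_before:.2f} -> {fps_per_cpu_after:.2f}")
```

Note that "2.55x as fast" and "a 155% increase" describe the same result, so it helps to say which one is meant when quoting a percentage.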

Re: CUDASynth

Posted: Tue Sep 18, 2018 3:34 pm
by Guest
Nice boost, about 2.5x

Re: CUDASynth

Posted: Tue Sep 18, 2018 5:08 pm
by Guest
Any ETA on public testing?
No rush, just getting antsy

Re: CUDASynth

Posted: Tue Sep 18, 2018 8:25 pm
by admin
Still have some things to finish up: Vapoursynth support, fdst parameter for DGHDRtoSDR, CUDASynth-enable DGDenoise, documentation, source code example. And I'm timesharing with DGIndex MKV support. Hang in there.

BTW, the CPU reduction is also important as it leaves more CPU for encoding.

It's hard to find 2080 Ti's:

https://www.nowinstock.net/computers/vi ... rtx2080ti/

Puts the lie to some of the whining at other forums by people saying it's too expensive, nobody wants it, nVidia are dirty rotten criminal capitalists, how dare they make a GPU I can't afford, blah blah blah.

gonca, what's the fastest most powerful Threadripper likely to be available within a few months?

I saved a lot of dough doing my own bathroom remodeling so I have the ready green to dish out for the best hardware. :twisted:

Re: CUDASynth

Posted: Tue Sep 18, 2018 9:00 pm
by Guest

Re: CUDASynth

Posted: Tue Sep 18, 2018 9:40 pm
by admin
Looks good. Any suggestions for a compatible mobo?

Re: CUDASynth

Posted: Wed Sep 19, 2018 4:48 am
by Guest

Re: CUDASynth

Posted: Wed Sep 19, 2018 12:08 pm
by admin
Thank you!

Re: CUDASynth

Posted: Wed Sep 19, 2018 3:37 pm
by Guest
The other option you might want to consider, seeing as you have this thing about GPUs (NVidia), is to drop the CPU to the 16-core/32-thread version and maybe go with 2 GPUs
or
(note: here goes the budget)
Get the 32 core CPU and 2 GPUs

Re: CUDASynth

Posted: Wed Sep 19, 2018 4:14 pm
by admin
gonca wrote:
Wed Sep 19, 2018 3:37 pm
(note: here goes the budget)
Get the 32 core CPU and 2 GPUs
That sounds good. I don't have a budget. I'm getting older so I'm going to blow it all on hardware and travel adventures. :P

BestBuy went to preorder on the 2080 Ti but by the time I saw the alert it was all taken. Nobody wants these things. ;)

Thanks for making this thread go into flames (the icon on the forum list).

Re: CUDASynth

Posted: Wed Sep 19, 2018 4:28 pm
by Guest
Thanks for making this thread go into flames
That is how I learned about the hardware side of things.
Making things go up in flames
burn baby burn.jpg

Re: CUDASynth

Posted: Wed Sep 19, 2018 5:03 pm
by admin
Wow, parallel ATA connectors. :wow: What's the CPU, a 386?

The first processor I coded for was an 8080. The OS was CP/M. It had a 100K floppy drive that raised and lowered the head on each sector access (the infamous head-loading solenoid), causing a pleasing bang-bang-bang that the neighbors loved. I clearly remember tossing that system (Heathkit H8) in the dumpster when I upgraded to a 386-based system. :lol:

Re: CUDASynth

Posted: Wed Sep 19, 2018 5:38 pm
by Guest
8080 and 386
Those were the days, never to be seen again, thank whichever supreme deity for that small favor
Gee-sh, now I am getting politically correct

Re: CUDASynth

Posted: Wed Sep 19, 2018 8:26 pm
by admin
If using the word "God" is politically incorrect, then we are done. Praise the Lord!