CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
DAE avatar
Guest

CUDASynth

Post by Guest »

Rocky wrote:
Thu Feb 25, 2021 5:21 pm
Got my 3090 tracker with alerts running for days. Twice I got the alert and within seconds went to the seller site and tried to buy. But both times they were already out of stock. This is getting to be very annoying. By the time I can actually buy one there'll be 4090s and 5090s. :evil:
Know how you feel. I am "planning" a new build for a little later in the year, and a major hurdle is the complete lack of stock of RTX 30xx cards.
User avatar
DJATOM
Posts: 176
Joined: Fri Oct 16, 2015 6:14 pm

CUDASynth

Post by DJATOM »

I got tired of waiting for stock and yesterday put 50% of the sum I'm holding for an RTX 3080 into bitcoin. I'm expecting to sell it at 70-90k and cover the inflation/overpricing by our sellers. Also hoping for better availability in summer or autumn. Most likely I'm not gonna have it this spring.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Elon Musk appears to be all in on bitcoin. Might have to revise my stance on that. :scratch:

https://www.cnbc.com/2021/02/08/tesla-b ... tcoin.html
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

My my, how time flies. I was thinking about CUDASynth again. It's hard to pass up a 350% performance improvement for the case I demonstrated. Actually, the improvement gets greater the more CUDA filters there are in the chain. That's because with CUDASynth you always have only two CPU<->GPU transfers, instead of 2 x N, where N is the number of CUDA filters (including DGSource). And for UHD, the CPU<->GPU transfers are the bottleneck, so it's really attractive for UHD and above.
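To put a number on it: DGSource() plus three downstream CUDA filters means 8 CPU<->GPU transfers per frame the old way, versus just 2 with CUDASynth.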

Realistically, even though I published code for a generic CUDA filter supporting this mode of operation, nobody was interested. And honestly, only a handful of people globally can even write a CUDA filter, let alone one that supports this mode of operation. So we're not going to change the world. :cry:

No matter, there's no reason why I can't just do it in the DG world. The way I previously suggested requires some fiddly syntax in the script, the fsrc/fdst parameters. Sure, I could make a script compiler that adds those things automatically. But since no one cares, I'm going to just put the filters inside DGSource(), requiring only new DGSource() parameters to, for example, turn on denoising. I'll probably support DGDenoise(), DGSharpen(), and DGHDRtoSDR(), starting with DGDenoise(). Future CUDA filters could be added.

The script reduces to something like:

DGSource(source parameters, denoise parameters, sharpen parameters, hdr2sdr parameters)

If the denoise parameters are omitted then denoising is not performed, etc.

This framework will be a big incentive to create more and more CUDA filters. The standalone filters would still work fine, of course.

Well, it's something fun to do in the cold months. :ugeek:
User avatar
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

CUDASynth

Post by hydra3333 »

Beaut ! Fantasmagorical.
requiring only new DGSource() parameters to, for example, turn on denoising.
I wonder if you would consider having off/on parameters, e.g. "denoise=True", in there too.

Sounds silly, but in a "reusable script" it would let the call have my favourite parameters preset, and the only thing I'd then need to do is turn a filter on or off via True/False depending on the need.

If I have this right in my noggin: with HDR becoming ubiquitous, especially on phones, a CUDASynth DGHDRtoSDR is especially welcome !

Thank you and Cheers.
I really do like it here.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Great idea, hydra3333. We could have one main parameter, e.g.:

denoise = 0: off (default)
denoise = 1: preset 1
denoise = 2: preset 2
...
denoise = -1: no preset; explicit parameters follow

Presets would be stored in a config file that you program.
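So in a script it could be as simple as DGSource("movie.dgi", denoise=1) to use your first preset (parameter names are just placeholders at this point).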

Thank you for the idea and keep them coming.
User avatar
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

CUDASynth

Post by hydra3333 »

Rocky wrote:
Sat Jan 20, 2024 3:08 am
Presets would be stored in a config file that you program.
OK. Not sure how that would work in the case of a vanilla .vpy script consumed by vspipe and piped into ffmpeg.
Unless perhaps DGSource opens and checks a presets file containing bunches of presets per filter, selected by a text id name :) ? (It may already; I'll have to check the documentation.)

Deinterlacing is a daily thing for me; I guess that may not change, or perhaps it could somehow default to the current NVIDIA deinterlacer.

Looking forward to seeing what you settle on !

Kind Regards,
ye olde hydra
I really do like it here.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

hydra3333 wrote:
Sat Jan 20, 2024 7:26 pm
Unless perhaps DGSource opens and checks a presets file containing bunches of presets per filter, selected by a text id name :) ? (It may already; I'll have to check the documentation.)
It doesn't do that now, but yes, that's what I have in mind.
Deinterlacing is a daily thing for me; I guess that may not change, or perhaps it could somehow default to the current NVIDIA deinterlacer.
We could have a preset for the deinterlace option.

Remember the old .def file for DGMPGDec's mpeg2_source()? We'll reincarnate it for DGSource().

I may not do the -1 option; in that case, all parameters for these filters would have to be in the .def file. Not sure yet. I just don't want to have super long DGSource() lines in the script.
User avatar
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

CUDASynth

Post by hydra3333 »

He he, no long lines etc ? :)
Cool ! Oops, I guess I wandered back to my misspent youth, where it was a thing to see the stuff you needed in one place, and fiddling with "sequential files" could be challenging. Oh well. I suppose I don't really miss paper tape, or debugging device drivers by single-stepping with front panel switches and lights and octal, though.
I really do like it here.
User avatar
Curly
Posts: 712
Joined: Sun Mar 15, 2020 11:05 am

CUDASynth

Post by Curly »

hydra3333 wrote:
Sun Jan 21, 2024 2:18 pm
octal
hehe yer a stud
i cut my teeth on hexadecimal way easier
akshully Sherman taught me
wud u believe it
Curly Howard
Director of EAC3TO Development
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Wow guys, you're gonna be shocked, shocked, I tell ya.

So, I implemented the first part of the changes for the new DGSource() that will integrate some CUDA filters like DGDenoise(). Three things I had to do for that: 1) do the NV12->YV12 conversion in a CUDA kernel rather than on the CPU, 2) use consistent pitches for all buffers except the Avisynth buffers (which I cannot control), which allowed replacing the pitched copies with linear copies, and 3) use env->BitBlt() for the final copy to the Avisynth buffers [instead of for loops and memcpy()]. (A stripped-down sketch of the kind of kernel I mean for point 1 is at the end of this post.) Here are the results:

Original DGSource():

Code: Select all

AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)

Number of frames:                 6772
Length (hh:mm:ss.ms):     00:01:52.980
Frame width:                      3840
Frame height:                     2160
Framerate:                      59.940 (60000/1001)
Colorspace:                  YUV420P16

Frames processed:               6772 (0 - 6771)
FPS (min | max | average):      106.0 | 278.7 | 263.4
Memory usage (phys | virt):     602 | 1550 MiB
Thread count:                   23
CPU usage (average):            7%

Time (elapsed):                 00:00:25.711
Revised DGSource():

Code: Select all

AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)

Number of frames:                 6772
Length (hh:mm:ss.ms):     00:01:52.980
Frame width:                      3840
Frame height:                     2160
Framerate:                      59.940 (60000/1001)
Colorspace:                  YUV420P16

Frames processed:               6772 (0 - 6771)
FPS (min | max | average):      119.1 | 379.9 | 354.0
Memory usage (phys | virt):     602 | 1562 MiB
Thread count:                   23
CPU usage (average):            7%

Time (elapsed):                 00:00:19.127
I hope you noticed the 34% performance improvement. And that's for UHD content. I think that deserves a :wow:.

I was pulling my fur out trying to get this working for days. Looks like it was worth the effort. I might release this as is and then add the filters. DG is gonna be thrilled. Can you imagine the change log entry?

* Improved DGSource() performance by 34%.

:lol:
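For anyone curious about point 1, here's a stripped-down sketch of the kind of chroma de-interleave kernel I mean (just the idea, not the actual DGDecodeNV code): NV12 keeps U and V interleaved in a single plane, while YV12 wants separate U and V planes, so each thread splits one U/V pair.

Code: Select all

// Sketch only: split the interleaved NV12 chroma plane (UVUV...) into the
// separate U and V planes that YV12 expects. All pitches are in bytes.
__global__ void nv12_to_yv12_chroma(const unsigned char* __restrict__ uv_src, int src_pitch,
                                    unsigned char* __restrict__ u_dst,
                                    unsigned char* __restrict__ v_dst, int dst_pitch,
                                    int chroma_width, int chroma_height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // chroma column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // chroma row
    if (x >= chroma_width || y >= chroma_height)
        return;
    const unsigned char* uv = uv_src + y * src_pitch + 2 * x;   // one interleaved U,V pair
    u_dst[y * dst_pitch + x] = uv[0];
    v_dst[y * dst_pitch + x] = uv[1];
}

// Typical launch for a UHD 4:2:0 frame (chroma plane is 1920x1080):
//   dim3 block(32, 8);
//   dim3 grid((1920 + block.x - 1) / block.x, (1080 + block.y - 1) / block.y);
//   nv12_to_yv12_chroma<<<grid, block>>>(uv_src, src_pitch, u_dst, v_dst, dst_pitch, 1920, 1080);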
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Now 444 is kicking my ass, but I'll win in the end.
User avatar
SomeHumanPerson
Posts: 96
Joined: Fri Mar 24, 2023 10:41 am

CUDASynth

Post by SomeHumanPerson »

That's an impressive performance gain and it sounds like you've really got your teeth into this.

:bow:
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Thank you, sir. I just cracked the 444 nut! Silly me, I forgot that for 444, the U and V are not interleaved. I just couldn't figure out why my de-interleaving kernel wasn't working for 444. :? :oops: All I needed was a straight copy. That was 12 hours down the toilet. I finally thought to see what the current version does with 444 and it jumped right out at me because it uses a straight copy. Still, seeing that beautiful decoded 444 playing washed away all the frustration and negativity that had set in. After starting vdub hundreds of times while debugging, when it finally shows a clean picture, there is euphoria.
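In other words, for 444 the chroma planes are already separate, so a plain per-plane copy is all it takes. Just to illustrate the idea (a sketch, not the shipping code):

Code: Select all

#include <cuda_runtime.h>

// Sketch: for 4:4:4 the U and V planes are already separate, so a plain pitched
// device-to-device copy per plane is enough; no de-interleave kernel needed.
// width_bytes is the row width in bytes (2 * width in pixels for 16-bit data).
void copy_444_chroma(const unsigned char* u_src, const unsigned char* v_src, size_t src_pitch,
                     unsigned char* u_dst, unsigned char* v_dst, size_t dst_pitch,
                     size_t width_bytes, size_t height, cudaStream_t stream)
{
    cudaMemcpy2DAsync(u_dst, dst_pitch, u_src, src_pitch, width_bytes, height,
                      cudaMemcpyDeviceToDevice, stream);
    cudaMemcpy2DAsync(v_dst, dst_pitch, v_src, src_pitch, width_bytes, height,
                      cudaMemcpyDeviceToDevice, stream);
}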

Now I just have to port it to vapoursynth, which should not be fraught in any way, and I can give you a test version, probably tomorrow.

Bonus: Ever since I started using a UHD monitor, I've had an issue with mouse double clicks. Any slight motion of the mouse between the clicks causes it to fail to detect a double click. So frustrating. Well, today I stumbled on a solution for that. In the registry you can configure the size of the rectangle within which motion doesn't matter. By default Windows uses 4x4 pixels, which isn't very large on a UHD monitor. I increased it to 16x16 (may even go to 32x32) and I don't have to be a brain surgeon to make a double click anymore.
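For anyone who wants to try it, the values are DoubleClickWidth and DoubleClickHeight under HKEY_CURRENT_USER\Control Panel\Mouse; both default to 4.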

Are you not entertained?
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Boys and girls, be the first on your block to try out the new and improved DGSource()! Best price and approved by Granny. Your testing will be appreciated.

* 33 percent improvement in frame rate (measured for a UHD stream).

https://rationalqm.us/misc/DGDecodeNV_test1.rar (64 bit)

Simply replace your existing DGDecodeNV.dll with this one.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Got DGHDRtoSDR working integrated into DGSource() as a proof of concept. For now the parameters are hard-coded. Now I have to decide about the DGSource() configuration file stuff.

Kinda surprised at the lack of feedback on the sped-up DGSource(). I suppose maybe people don't follow this thread. I thought it was a big deal, as it throws the cat among the pigeons in the CPU vs. GPU decoding wars. The two have been comparable, with an edge to GPU for its lower CPU utilization, but this starts to make it a mismatch, IMHO. With the integrated filters it will become a stomping.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Fri Jan 19, 2024 5:03 pm
Guest 2 is gonna be so happy
Long time no see. :salute:

You know that I have a crush on CUDA. :D

You have my blessing for internal processing in DGSource.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Great to hear from you.

What do you think about the config file idea:

dgsource:deinterlace=2
dgsource:use_pf=true
dghdrtosdr:white=1500
etc.

If something is not included, the existing default is used. Giving the parameter explicitly to DGSource() overrides the config file. Or we could ditch the explicit parameters entirely and use only the config file, but that breaks existing scripts. We need a way to avoid unwieldy parameter lists for DGSource(). Or do we really? What's so terrible about them? I would just have to check that Avisynth/Vapoursynth support potentially very long parameter lists. Maybe we just change to the config file approach when we exceed that limit as we add filters.

EDIT: Apparently, 1024 parameters can be given, so we'll just do it that way for now. :salute: StainlessS

Another issue is the ordering of the filter execution. I have some ideas for that. We should support an arbitrary ordering. Maybe each filter gets an order parameter. Or we take it from the order of the parameters; that's the ticket. I used to fantasize about changing Avisynth to do this stuff, i.e., kernel management and ordering, as a solution to the cumbersome original CUDASynth idea with the fsrc/fdst parameters. Now this management will be part of DGSource(). It's wonderful to be in control. :lol:
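For example (parameter names still up in the air), DGSource("movie.dgi", hdr2sdr=1, denoise=1) would run HDRtoSDR before the denoise, and swapping the two arguments would swap the execution order.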
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Frame serving and HDRtoSDR for UHD video is clocking in at 331 fps. That seems pretty good to me. Now, I'm going to add DGDenoise next.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Sun Jan 28, 2024 7:50 am
What do you think about the config file idea
Mmm... it's rare to see common features that can be applied to all videos. I would leave the config file for DGIndexNV only and let the command line speak for DGSource itself.
Rocky wrote:
Sun Jan 28, 2024 11:43 am
Frame serving and HDRtoSDR for UHD video is clocking in at 331 fps. That seems pretty good to me. Now, I'm going to add DGDenoise next.
That is fast. Let's see how it scales on different cards. I am still on PCI-E 3.0 with a 1660 Super.

When DGCube's time comes, we will need an internal conversion path too, since right now a CPU <-> GPU transfer is mandatory in certain types of HDR conversions (PQ->HLG, for example).
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Guest 2 wrote:
Sun Jan 28, 2024 12:45 pm
When DGCube's time comes, we will need an internal conversion path
I knew you would bring that up. I seem to have put myself on the path for that. :?
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

My horoscope for today:

Sagittarius, only those who follow the developed plan and complete all its stages can achieve success in their field.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Got Vapoursynth support working. I just want to check a few things, as I found some strangeness in the Vapoursynth support that triggers my internal inconsistency detector. I don't like to just ignore that sort of thing. Hopefully, I'll make a test release tomorrow.
User avatar
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

CUDASynth

Post by hydra3333 »

Rocky wrote:
Sun Jan 28, 2024 7:34 am
Kinda surprised at the lack of feedback on the sped-up DGSource().
Sorry, I've been away. I am amazed and thrilled !
I really do like it here.
User avatar
Rocky
Posts: 3557
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Thanks, m8. I'm gonna put out a test version with HDRtoSDR today, and then start working on DGDenoise() integration.