CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Boris
Posts: 92
Joined: Sun Nov 10, 2019 2:55 pm

Re: CUDASynth

Post by Boris »

Thank you, hydra3333. IV blood drip working wonders. You should visit bedside. Moose will pay for this.
JoyBell
Posts: 16
Joined: Mon Feb 17, 2020 11:50 pm

Re: CUDASynth

Post by JoyBell »

Very interested in your CUDASynth.

I made a zoomed-in crop comparison of CUDASynth Denoise and Sharpen vs my encoder favorites, SMDegrain and LimitedSharpenFasterMod.

Comparison Pics
https://slow.pics/c/UmJUKAkF

Source:

Good stuff.
DGDenoise vs SMDegrain(tr = 2, thSAD = 500)
DGDenoise: much more smoothing, like it's a stronger setting than my SMDegrain Medium preset.
SMDegrain: does a good job on the grain while keeping and enhancing details.

DGSharpen vs LSFmod(strength = 400)
DGSharpen: the grain pops and the image looks a bit more in focus.
LSFmod: my LSF pic looks like I didn't apply the filter. I really might have made a mistake. :(


DGDenoise + DGSharpen vs SMDegrain and LimitedSharpenFasterMod.
DGDenoise + DGSharpen: well cleaned of grain, and the picture has some sharpening, but hair and textile textures are lost.
SMDegrain + LSFmod: well cleaned of grain without loss of textiles and hair; the picture looks clearly sharper, as if in better focus.

Very good work here by Mr. DG!
I would rate these better than KNLMeansCL and WarpSharp. That being said, SMDegrain and LSF remain the tools to beat from what I can see.
JoyBell
Posts: 16
Joined: Mon Feb 17, 2020 11:50 pm

Re: CUDASynth

Post by JoyBell »

These cropped screens are from full CRF 22 1080p encodes of the clip.

70 Midway.2019.1080 1m46s DGHDRtoSDR
70 Midway.2019.1080 1m39s DGHDRtoSDR DGDegrain DGSharp
70 Midway.2019.1080 1m47s DGHDRtoSDR SMDegrain LSF
JoyBell
Posts: 16
Joined: Mon Feb 17, 2020 11:50 pm

Re: CUDASynth

Post by JoyBell »

I tried actually using DGSharpen and unfortunately it just seemed to bring out the dither more than anything else. I think most sharpeners have a function to ignore grain and dither, since sharpening those is sub-optimal in most cases.
hydra3333
Posts: 393
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

Just wondering ... is CUDASynth still a thing, or more like an experiment that has run its course?

Cheers
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Until the video community starts writing CUDA filters in significant numbers, there is no space for CUDASynth. CUDASynth is just a technology to link CUDA filters without requiring PCI transfers. We hope for increased traction for CUDA in the video community. Over at Doom9 pinterf has recently started to get interested in CUDA.
hydra3333
Posts: 393
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

thanks. nice.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

It seems that AVS+ is about to make the big jump to supporting CUDA.

Are you planning something?
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Not yet, just waiting to see what it will be.
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Got my 3090 tracker with alerts running for days. Twice I got the alert and within seconds went to the seller site and tried to buy. But both times they were already out of stock. This is getting to be very annoying. By the time I can actually buy one there'll be 4090 and 5090. :evil:
Guest

CUDASynth

Post by Guest »

Rocky wrote:
Thu Feb 25, 2021 5:21 pm
Got my 3090 tracker with alerts running for days. Twice I got the alert and within seconds went to the seller site and tried to buy. But both times they were already out of stock. This is getting to be very annoying. By the time I can actually buy one there'll be 4090 and 5090. :evil:
Know how you feel. I am "planning" a new build for a little later in the year, and a major hurdle is the complete lack of stock of RTX 30xx cards.
DJATOM
Posts: 176
Joined: Fri Oct 16, 2015 6:14 pm

CUDASynth

Post by DJATOM »

I got tired of waiting for stock and yesterday put 50% of the sum I'm holding for an RTX 3080 into bitcoin. I'm expecting to sell it at 70-90k and cover the inflation/overpricing by our sellers. Also hoping for better availability in summer or autumn. Most likely I'm not gonna have it this spring.
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Elon Musk appears to be all in on bitcoin. Might have to revise my stance on that. :scratch:

https://www.cnbc.com/2021/02/08/tesla-b ... tcoin.html
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

My my, how time flies. I was thinking about CUDASynth again. It's hard to pass up a 350% performance improvement for the case I demonstrated. Actually, the improvement gets greater the more CUDA filters there are in the chain. That's because with CUDASynth you always have only two CPU<->GPU transfers, instead of 2 x N, where N is the number of CUDA filters (including DGSource). And for UHD, the CPU<->GPU transfers are the bottleneck, so it's really attractive for UHD and above.
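
To make the 2 x N arithmetic concrete, here's a sketch using the existing standalone filters (a hypothetical three-filter chain; "movie.dgi" is just an illustrative index file):

Code: Select all

# Standalone chain: per the 2 x N counting above, N = 3 CUDA filters
# (DGSource, DGDenoise, DGSharpen) means 2 x 3 = 6 CPU<->GPU transfers per frame.
DGSource("movie.dgi")
DGDenoise()
DGSharpen()
# With CUDASynth linking, the frame stays on the GPU between filters, so the
# same chain needs only 2 transfers per frame, no matter how long it gets.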

Realistically, even though I published code for a generic CUDA filter supporting this mode of operation, nobody was interested. And honestly, only a handful of people globally can even write a CUDA filter, let alone one that supports this mode of operation. So we're not going to change the world. :cry:

No matter, there's no reason why I can't just do it in the DG world. The way I previously suggested requires some fiddly syntax in the script, the fsrc/fdst parameters. Sure, I could make a script compiler that adds those things automatically. But since no one cares, I'm going to just put the filters inside DGSource(), requiring only new DGSource() parameters to, for example, turn on denoising. I'll probably support DGDenoise(), DGSharpen(), and DGHDRtoSDR(), starting with DGDenoise(). Future CUDA filters could be added.

The script reduces to something like:

DGSource(source parameters, denoise parameters, sharpen parameters, hdr2sdr parameters)

If the denoise parameters are omitted then denoising is not performed, etc.
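
For example, the integrated call might look something like this (a sketch only; the denoise/sharpen parameter names here are made up for illustration, not the final syntax):

Code: Select all

# Hypothetical integrated DGSource(): denoising and sharpening run inside the
# source filter on the GPU; the hdr2sdr parameters are omitted, so no tone
# mapping is performed.
DGSource("movie.dgi", denoise_str=0.15, sharpen_str=0.5)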

This framework will be a big incentive to create more and more CUDA filters. The standalone filters would still work fine, of course.

Well, it's something fun to do in the cold months. :ugeek:
hydra3333
Posts: 393
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

Beaut ! Fantasmagorical.
requiring only new DGSource() parameters to, for example, turn on denoising.
I wonder if you would consider having off/on parameters, e.g. "denoise=True", in there too.

Sounds silly, but in a "reusable script" it would allow the call to have favourite parameters "preset", and the only thing I'd then need to do is turn the filter on or off via those True/False flags depending on the need.

If I have this right in my noggin: with HDR becoming ubiquitous, especially on phones, CUDASynth DGHDRtoSDR is especially welcome!

Thank you and Cheers.
I really do like it here.
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Great idea, hydra3333. We could have one main parameter, e.g.:

denoise = 0 is off (default)
1 is preset 1
2 is preset 2
...
-1 is no preset with parameters to follow

Presets would be stored in a config file that you program.
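
In a script that might look something like this (hypothetical, since neither the parameter nor the config format is settled yet):

Code: Select all

# denoise = 0 or omitted: no denoising (default)
# denoise = 2: use preset 2 from the user's config file
# denoise = -1: ignore the presets and take explicit parameters instead
DGSource("movie.dgi", denoise=2)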

Thank you for the idea and keep them coming.
hydra3333
Posts: 393
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

Rocky wrote:
Sat Jan 20, 2024 3:08 am
Presets would be stored in a config file that you program.
OK. Not sure how that may work in the case of a vanilla .vpy script consumed by vspipe and piped into ffmpeg.
Unless perhaps DGSource opens and checks a presets file containing bunches of presets per filter, selected by a text id name :) ? (It may; I'll have to check the documentation.)

Deinterlacing is a daily thing for me; I guess that may not change, or perhaps it could somehow default to the current NVIDIA deinterlacer.

Looking forward to seeing what you settle on !

Kind Regards,
ye olde hydra
I really do like it here.
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

hydra3333 wrote:
Sat Jan 20, 2024 7:26 pm
Unless perhaps DGSource opens and checks a presets file containing bunches of presets per filter, selected by a text id name :) ? (It may; I'll have to check the documentation.)
It doesn't do that now, but yes, that's what I have in mind.
Deinterlacing is a daily thing for me; I guess that may not change, or perhaps it could somehow default to the current NVIDIA deinterlacer.
We could have a preset for the deinterlace option.

Remember the old .def file for DGMPGDec's MPEG2Source()? We'll reincarnate it for DGSource().

I may not do the -1 option. All parameters for these filters would have to be in the .def file. Not sure yet. Just don't want to have super long DGSource() lines in the script.
hydra3333
Posts: 393
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

He he, no long lines etc? :)
Cool! Oops, I guess I wandered back to my misspent youth, where it was a thing to see all the stuff you needed in one place and fiddling with "sequential files" could be challenging. Oh well. I suppose I don't really miss paper tape, or debugging device drivers by single-stepping with front panel switches and lights and octal, though.
I really do like it here.
Curly
Posts: 709
Joined: Sun Mar 15, 2020 11:05 am

CUDASynth

Post by Curly »

hydra3333 wrote:
Sun Jan 21, 2024 2:18 pm
octal
hehe yer a stud
i cut my teeth on hexadecimal way easier
akshully Sherman taught me
wud u believe it
Curly Howard
Director of EAC3TO Development
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Wow guys, you're gonna be shocked, shocked, I tell ya.

So, I implemented the first part of the changes for the new DGSource() that will integrate some CUDA filters like DGDenoise(). Three things I had to do for that were: 1) do the NV12->YV12 conversion in a CUDA kernel rather than on the CPU, 2) use consistent pitches for all buffers except the Avisynth buffers (which I cannot control), which allowed replacement of pitched copies with linear copies, and 3) use env->BitBlt() for the final copying to the Avisynth buffers [instead of for loops and memcpy()]. Here are the results:

Original DGSource():

Code: Select all

AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)

Number of frames:                 6772
Length (hh:mm:ss.ms):     00:01:52.980
Frame width:                      3840
Frame height:                     2160
Framerate:                      59.940 (60000/1001)
Colorspace:                  YUV420P16

Frames processed:               6772 (0 - 6771)
FPS (min | max | average):      106.0 | 278.7 | 263.4
Memory usage (phys | virt):     602 | 1550 MiB
Thread count:                   23
CPU usage (average):            7%

Time (elapsed):                 00:00:25.711

Revised DGSource():

Code: Select all

AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)

Number of frames:                 6772
Length (hh:mm:ss.ms):     00:01:52.980
Frame width:                      3840
Frame height:                     2160
Framerate:                      59.940 (60000/1001)
Colorspace:                  YUV420P16

Frames processed:               6772 (0 - 6771)
FPS (min | max | average):      119.1 | 379.9 | 354.0
Memory usage (phys | virt):     602 | 1562 MiB
Thread count:                   23
CPU usage (average):            7%

Time (elapsed):                 00:00:19.127
I hope you noticed the 34% performance improvement. And that's for UHD content. I think that deserves a :wow:.

I was pulling my fur out trying to get this working for days. Looks like it was worth the effort. I might release this as is and then add the filters. DG is gonna be thrilled. Can you imagine the change log entry?

* Improved DGSource() performance by 34%.

:lol:
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Now 444 is kicking my ass, but I'll win in the end.
SomeHumanPerson
Posts: 96
Joined: Fri Mar 24, 2023 10:41 am

CUDASynth

Post by SomeHumanPerson »

That's an impressive performance gain and it sounds like you've really got your teeth into this.

:bow:
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Thank you, sir. I just cracked the 444 nut! Silly me, I forgot that for 444, the U and V are not interleaved. I just couldn't figure out why my de-interleaving kernel wasn't working for 444. :? :oops: All I needed was a straight copy. That was 12 hours down the toilet. I finally thought to see what the current version does with 444 and it jumped right out at me because it uses a straight copy. Still, seeing that beautiful decoded 444 playing washed away all the frustration and negativity that had set in. After starting vdub hundreds of times while debugging, when it finally shows a clean picture, there is euphoria.

Now I just have to port it to vapoursynth, which should not be fraught in any way, and I can give you a test version, probably tomorrow.

Bonus: Ever since I started using a UHD monitor, I've had an issue with mouse double-clicks. Any slight motion of the mouse between the clicks causes it to fail to detect a double-click. So frustrating. Well, today I stumbled on a solution for that. In the registry you can configure the size of the rectangle within which motion doesn't matter. By default Windows has 4x4 pixels, which isn't very large on a UHD monitor. I increased it to 16x16 (may even go to 32x32) and I don't have to be a brain surgeon to make a double-click anymore.

Are you not entertained?
Rocky
Posts: 3525
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Boys and girls, be the first on your block to try out the new and improved DGSource()! Best price and approved by Granny. Your testing will be appreciated.

* 33 percent improvement in frame rate (measured for a UHD stream).

https://rationalqm.us/misc/DGDecodeNV_test1.rar (64 bit)

Simply replace your existing DGDecodeNV.dll with this one.