CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Mon Jan 29, 2024 1:49 am
Thanks, m8. I'm gonna put out a test version with HDRtoSDR today, and then start working on DGDenoise() integration.
:bow:

When you have some spare time, have a look at https://github.com/WolframRhodium/Vapou ... A/issues/7

It seems AVS support is getting orphaned. It's a very good filter; please evaluate whether it's feasible to run it internally.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

I heard BM3D sucks. Is it all just hype or can you prove otherwise?
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Mon Jan 29, 2024 7:55 am
I heard BM3D sucks. Is it all just hype or can you prove otherwise?
My experience and my eyes. I find BM3D more precise and less blurry, and it can go temporal too.

NLMeans and BM3D often work best together, applying BM3D first and then NLMeans.

You can find an entire thread on https://forum.doom9.org/showthread.php?t=172172

I stopped using it because the AVS build became obsolete.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

If you make it hard for me I won't do anything. Show me a sample and a script and tell me why it is better.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Mon Jan 29, 2024 9:07 am
If you make it hard for me I won't do anything. Show me a sample and a script and tell me why it is better.
Let me finish this batch encode and I will post proper examples.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Thank you.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Mon Jan 29, 2024 4:50 pm
Thank you.
I have found something serious:

https://www.aanda.org/articles/aa/pdf/2 ... 278-19.pdf

I know you will like it. :salute:

For something more practical, see these images: https://forum.doom9.org/showthread.php? ... ost1794468

P.S.: I am not one of the lucky owners, but is it possible to access the denoising that the Tensor Cores apply to raytraced images?
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

I do like it, thank you.

Just talking about BM3D vs. NLMeans because DGDenoise is NLMeans-based...

BM3D is clearly better with stationary noise, while NLMeans is a bit better with non-stationary noise. I'd argue that our use cases are closer to non-stationary.

I'll look at your practical cases and report back.

"the denoising that Tensor Cores apply to raytraced images"

I don't know what they are doing. Do you?
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Your link says:

"V-BM3D is the best among 3 in general, but it suffers from some kind of "liquefying" low frequency artifacts which does not exist in NLMeans"

It also says:

"quality of high frequency filtering: NLMeans > V-BM3D"

His "general quality" assessment is subjective, and honestly, the NL means fish looks better to me.

Each filter wins at different frequencies. There's no knock-out punch for any of them, IMHO.

He doesn't state if the noise is stationary or non-stationary. Are all the filters doing spatio-temporal, etc.?

One improvement for DGDenoise would be to add temporal processing.

You're gonna have to do much better if you want me to invest a lot of time in this.
User avatar
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

Jogs memory, slips a cog, fails to recall properly, guesses ...
one wonders what may be up for consideration next in cudasynth ... dgsharpen, dgdeblock ?
did I mention dgdeblock ? :)
I really do like it here.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

DGDenoise then DGSharpen.

I never heard of DGDeblock. ;)
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Wed Jan 31, 2024 6:57 am
You're gonna have to do much better if you want me to invest a lot of time in this.
The reason is that usually you apply BM3D first and then KNLMeansCL, to get the best of both worlds.

Having both in CUDA would be easier and faster for us.
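
For example, the kind of chain I mean (just a sketch, assuming the AviSynth builds of BM3D_CUDA and KNLMeansCL are installed; the sigma/h values are placeholders, not tuned recommendations):

ConvertBits(bits=32)            # BM3D_CUDA wants 32-bit float input
BM3D_CUDA(sigma=3.0, radius=2)  # BM3D pass, made temporal with radius=2
BM3D_VAggregate(radius=2)       # aggregate the temporal buffer back to ordinary frames
ConvertBits(bits=16)
KNLMeansCL(d=1, a=2, h=1.2)     # NLMeans pass afterwards to clean up what BM3D leaves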
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Wed Jan 31, 2024 6:53 am
I don't know what they are doing. Do you?
Watch Two Minute Papers on

https://www.youtube.com/channel/UCbfYPy ... upoX8nvctg

The owner is a researcher who loves CGI and holds Nvidia's researchers in high esteem.

In a recent video he explained very well why light tracing needs denoising.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Just wasted time watching stupid little fake things running around. Thanks! :roll:

Please give a link to the noise video. And I'm looking for the denoising algorithm, not just why it is needed.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Guest 2 wrote:
Wed Jan 31, 2024 8:09 am
The reason is that usually you apply first BM3D then KNLMeansCL to have the best of both worlds.
You have to support things and not make unsupported claims. And don't just link some idiot saying he does that. Show that it is better and explain why.
User avatar
Sherman
Posts: 578
Joined: Mon Jan 06, 2020 10:19 pm

CUDASynth

Post by Sherman »

Guys, I had a brainstorm! Now pay attention.

We can implement a filter consisting of every known denoising algorithm run in succession. That way we would be assured of getting the best of every world. Isn't science easy?
Sherman Peabody
Director of Linux Development
User avatar
Britney
Posts: 145
Joined: Sun Aug 09, 2020 3:24 pm

CUDASynth

Post by Britney »

Could be worth a try.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Started integrating DGDenoise. 8-)
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Got it working! The challenge was to have the same denoising in both DGSource() and DGDenoise() without clashing or requiring duplicated cu files.

Still have some loose ends to tie up. Also, I see that the latest nVidia code is different. Maybe it's better than the older version we use. I'll check.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

DG is trying to spend all his money. He just bought us an RTX 4090 and a 1200W PSU.

I would have said don't waste your money as the PCIe bus is the bottleneck. But in CUDASynth world that is no longer true. 8-)
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

CUDASynth

Post by Guest 2 »

Rocky wrote:
Sun Feb 04, 2024 10:30 am
Got it working! The challenge was to have the same denoising in both DGSource() and DGDenoise() without clashing or requiring duplicated cu files.
:bow:

I have thought about your idea of the INI file. Perhaps it's feasible if you introduce a small panel in DGIndex to control the filters, saving the default demux (and later decode) parameters there in its INI. Then you could pass the wanted filter to DGSource too, keeping some parameters variable, as you do with crop.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Yes, I agree with that. If we get too many filters we'll have to do something. But rather than an INI file, maybe leverage the existing script template generation.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

There's another benefit from the CUDASynth way for our denoising. I'll explain. There are two things that can slow it down: PCIe transfers and the window size of the algorithm. With the reduction of the PCIe overhead, we have some leeway to increase the window size, and therefore the quality.

So I changed the choice of window sizes from the current 5/7/9 to 9/11/15, yielding a noticeable improvement in quality without losing much overall speed. I'm gonna rename the parameter choices to low/medium/high quality. Understand that low is the same as the previous highest quality. Here are the speeds for 1920x1080 (decoding plus denoising):

low: 518 fps
medium: 270 fps
high: 173 fps

On the very noisy Nostalghi, all the quality levels look fine, but high looks amazing.

The highest level is still 7 times real time, so maybe we should have an ultra quality level as well. That could be window size 25, giving 65 fps, still twice real time.
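
Back-of-the-envelope, assuming the denoising work per pixel grows roughly with the window area, going from window 9 to 25 is about (25/9)² ≈ 7.7 times the work, which squares with the drop from 518 fps at low to around 65 fps.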

I'm also probably going to ditch the blend/cblend options and just always use 0.0 blend, which means never mixing the original pixels back in with the filtered pixels (the blend is just a lerp: output = blend * original + (1 - blend) * filtered). I never saw the point of that, because it just adds back the noise you just took out. Eliminating the lerp also raises the fps by a small amount, and there's less possible crud on the parameter list, making coding and maintenance easier.
User avatar
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

After getting awesome performance from our algorithm, I thought I'd try out BM3D (test 10). I got this script online:

ConvertBits(bits=32)            # BM3D_CUDA requires 32-bit float input
BM3D_CUDA(sigma=0.5, radius=2)  # radius=2 enables temporal filtering
BM3D_VAggregate(radius=2)       # needed to finish the temporal pass
ConvertBits(bits=16)            # back to 16-bit for the rest of the chain

When I tried it there was no perceptible denoising at all. I had to bump sigma to 25 to get something even starting to be comparable to ours. What's up with that? Even then the denoising was poor, with artifacts. And worse than that, the speed was 25 fps. It's not surprising, given the temporal smoothing, the bit-depth conversions, the extra filter, and all the PCIe bus overhead.

OK, temporal filtering was enabled in that script, so I disabled it by setting radius=0 and ditching BM3D_VAggregate(). The denoising was understandably even worse, and the speed was 136 fps. Even at low quality, ours is 518 fps with better results.
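
In other words, the spatial-only version boils down to this (sigma still at 25 as in the temporal test; radius=0, no aggregation step):

ConvertBits(bits=32)
BM3D_CUDA(sigma=25, radius=0)  # radius=0 = purely spatial, so BM3D_VAggregate() is not needed
ConvertBits(bits=16)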

To turn on chroma, the doc says you need YUV444PS input. The insanity just multiplies.

So tell me, what's to like here? It seems to me that the only thing this filter has going for it is the name. :?
User avatar
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

CUDASynth

Post by hydra3333 »

Guest 2 wrote:
Sun Feb 04, 2024 11:39 pm
Perhaps it's feasible if you introduce a small panel in DGIndex to control filters and you can save there in its ini the default demux (and later decode) parameters. So you can pass the wanted filter to DGSource too having some parameters variable, such as you do with crop.
Hmm, just a thought: perhaps, as an option, dgsource could specify a .ini file to use? That could be what you meant and I missed it.
I really do like it here.