DGDenoise
Re: About DGDenoise
Thank you, that's very kind. Obviously, I would defer to your greater knowledge and appreciation of global varieties.
Make Beer Great Again
Make Beer Great Again
Re: About DGDenoise
gonca wrote: If I want a new system I just store the old spare outside, give it a couple of bad weather days (I am in Toronto, Canada) and there you go. Darn it, old one is not really usable anymore
Brilliant. A star is born.
Re: About DGDenoise
Follow up
Re-encode my movies so I can store them on a "media server"
Use x264 medium preset, crf=18, bluray-compat=1, tune film
Goal is to get video bitrate to 70% or less of original
Sometimes, because of grain and/or noise, it doesn't work; bitrates are too high, on occasion higher than original
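The 70% target above is easy to check after an encode. A trivial sketch in plain Python (the bitrate numbers are made up for illustration):

```python
# Check the "70% or less of original bitrate" goal described above.
def meets_target(original_kbps, encoded_kbps, ratio=0.70):
    return encoded_kbps <= original_kbps * ratio

print(meets_target(25000, 16000))  # 16000 <= 17500 -> True
print(meets_target(25000, 26000))  # higher than the original -> False
```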
I am now using DGDenoise, not for script speed testing, but to actually denoise.
My humble opinion
Very efficient at denoising even at very low strengths
Results show very good retention of detail
One word summary of results with DGDenoise
Mah vellous!
Disclaimer
I don't use StackHorizontal/StackVertical and zoom up to 500%
I watch the movie at a normal viewing distance
In this case I watched parts of the original and then compared to the encode
Well done D.G.
Re: About DGDenoise
That's great to hear, gonca. Thanks for the perspective. Of course the marvellousness of the quality comes from the NLM algorithm; I didn't invent it. And I do stack and zoom to check quality against the other implementations. But the marvellousness of the speed comes from my optimizations to the original nVidia CUDA kernel sample code and my interface to Avisynth, etc. For 1920x1080 the kernel runs in 2.8 ms on my system, suggesting a maximum frame rate of 357 fps. I obtain only 184 fps, and I see from profiling that the missing time is in Avisynth/application overhead, so I don't expect any further significant gains from kernel optimization. The memcpys are already fully optimized, and they are very fast anyway. I mentioned eliminating a couple of memcpys in an earlier post, but it would add only a few fps at the cost of more complexity.
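Those frame rates are just the reciprocals of the per-frame times. A quick sanity check of the arithmetic, using the 2.8 ms kernel time and 184 fps observed throughput quoted above:

```python
# Sanity-check the frame rates quoted above.
# 2.8 ms kernel time per 1920x1080 frame -> theoretical ceiling:
kernel_ms = 2.8
max_fps = 1000.0 / kernel_ms          # ~357 fps

# Observed throughput of 184 fps implies a total per-frame budget of:
observed_fps = 184.0
total_ms = 1000.0 / observed_fps      # ~5.4 ms per frame

# The difference is Avisynth/application overhead per frame:
overhead_ms = total_ms - kernel_ms    # ~2.6 ms

print(round(max_fps), round(total_ms, 1), round(overhead_ms, 1))
```

So nearly half of each frame's time budget is spent outside the kernel, which is why further kernel optimization buys little.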
As you have extensive experience, let me ask whether you think some temporal denoising would help. I could integrate something like DNR(2) pretty easily. I think trying to implement something NLM-like in the time dimension would lead to poor performance. Most people like to transcode at frame rates greater than 1. What do you think from a practical point of view?
Re: About DGDenoise
admin wrote: Thanks for the perspective
I should thank you for your efforts.
DNR(2) could be useful as an optional switch / setting
When you refer to frame rates greater than 1, I assume you mean 1X real speed
To be honest I am not a placebo kind of guy; I like the results of my settings and whichever filters I choose to use, and it takes as long as it takes.
If I think the process is too slow I will not reduce my settings or eliminate filters; I just build a faster system, or live with the time requirements.
However, for those users that want as small as possible and as fast as possible, something like DNR(2), as you said, would be useful as an option.
I must say though, your DGDenoise has minimal impact on encoding speed.
Next time I do another grainy / noisy source, which will be real soon, I will keep track of the encoding rates and let you know.
Re: About DGDenoise
I actually meant a frame rate of 1 frame per second, which is called normal elsewhere, in the use case of added temporal behavior. I will never release a filter that runs at 1 fps.
I think you are right, we do need some kind of temporal denoising. I can make it fast using CUDA.
Re: About DGDenoise
admin wrote: No, I meant frame rate 1 frame per second, which is called normal elsewhere.
Sorry, when I see 1 fps I think x265 placebo.
To be blunt, DGDenoise has excellent performance and results, as far as I am concerned. Mind you, I don't create a 1000% zoom and stare at it with my nose 6" from the screen.
Personally I would say it is fine as is, if you wish to refine it further great, but if no one else has a compelling reason for extra "features" I would say that it is fine as is. If you are having fun with DGDenoise, by all means go for it and I will try to help with testing as much as I can.
May I remind you that in an x_32 environment the normal of 1 fps on my system can be done at over 200 fps with DGDenoise. Don't really need to speed it up.
P.S.
Thanks to you I can finally get some real usage out of my video card
Re: About DGDenoise
Oh, well, go on being blunt.
So you want to get more use out of your card, eh? OK, other than QTGMC (mvtools etc.) what else would you like to see implemented on CUDA? It's easy to make pipelines of kernels with intermediate frames stored in device memory. Then only the source and destination frames need to be copied to/from the host. It's something I have been thinking about, and whether there are useful analogies of scripting that can be applied to purely device operation flows. Sort of a CUDASynth.
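The pipeline idea can be sketched in a few lines of Python. All the names here (upload, download, the toy kernels) are purely illustrative stand-ins for real CUDA operations; the point is the "one upload, N kernels, one download" flow with intermediates staying on the device:

```python
# Toy sketch of the "CUDASynth" idea: chain filter kernels on the device,
# copying only the source frame in and the final frame out.

def upload(host_frame):
    """Host -> device copy (one per source frame)."""
    return {"data": list(host_frame), "on_device": True}

def download(device_frame):
    """Device -> host copy (one per output frame)."""
    return list(device_frame["data"])

def denoise_kernel(f):
    # stand-in for a denoise kernel; works entirely in "device memory"
    f["data"] = [x * 0.5 for x in f["data"]]
    return f

def sharpen_kernel(f):
    # stand-in for a second kernel chained after the first
    f["data"] = [x + 1 for x in f["data"]]
    return f

def run_pipeline(host_frame, kernels):
    dev = upload(host_frame)          # copy #1: host -> device
    for k in kernels:                 # intermediates never leave the device
        dev = k(dev)
    return download(dev)              # copy #2: device -> host

out = run_pipeline([2.0, 4.0], [denoise_kernel, sharpen_kernel])
print(out)  # [2.0, 3.0]
```

However many kernels are chained, the host-device traffic stays at two copies per frame, which is exactly the appeal of a device-side scripting flow.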
Re: About DGDenoise
Let's go for QTGMC and then think of something else afterwards
You know, if you keep this up you will cost me a bundle of money; as soon as the 1070 can't keep up I will be forced to upgrade to a 1080 or better
Re: About DGDenoise
At least we have a year until Volta arrives on the desktop (so they say).
OK, deal. Maybe I'll do a quick temporal filter so nobody can whine that I am restricted to space, and then QTGMC.
Re: About DGDenoise
Volta?
Heck, I was talking about the triple slot Titan X
5760 cuda cores vs 2560 for a 1080
Re: About DGDenoise
LOLZ
Re: About DGDenoise
that should be Titan Z, my boo boo
Re: About DGDenoise
I thought it was Titan X (compute 6.1) or GTX Titan Z (compute 3.5).
Re: About DGDenoise
The speed of DGDenoise makes it possible to use the NLM algo even on my relatively slow system. That's quite something!
Some observations based on VHS tape sources:
For strength >0.05 the denoiser tends to smooth low contrast areas, e.g. slightly darker grey areas on a brighter grey background get gradually washed out (a kind of detail loss; the picture tends to look flat). This is however not specific to DGDenoise; it's similar with KNLMeansCL. (I am using interleave(original,filtered) rather than stackhorizontal for comparison.) As I mentioned before, I found the combination of DGDenoise with x264 --nr most efficient and effective.
Temporal smoothing: On some sources I have to apply temporal smoothing. I do this currently by adding temporalsoften(4,4,8,10,mode=2). Just curious whether the proposed temporal denoising in DGDenoise would do similar in one step?
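For reference, TemporalSoften averages each pixel over a window of +/-radius frames, but only includes neighbor frames whose value is within the threshold of the current frame. A minimal single-pixel sketch in plain Python (illustrative only; the real Avisynth filter also handles chroma, scene changes, and the mode parameter):

```python
# Minimal sketch of TemporalSoften-style averaging for one pixel position:
# average across a +/-radius window of frames, but only count neighbor
# frames whose value is within `threshold` of the current frame's value.

def temporal_soften_pixel(frames, index, radius, threshold):
    center = frames[index]
    lo = max(0, index - radius)
    hi = min(len(frames) - 1, index + radius)
    accepted = [frames[i] for i in range(lo, hi + 1)
                if abs(frames[i] - center) <= threshold]
    return sum(accepted) / len(accepted)

# Same pixel over 9 frames; the outlier (e.g. a flash) is excluded:
pixels = [100, 102, 101, 99, 100, 250, 101, 100, 102]
print(temporal_soften_pixel(pixels, 4, 4, 8))  # -> 100.625
```

The thresholds are what keep hard cuts and flashes from being smeared across frames, which is why low values like 4 and 8 are typical for noisy tape sources.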
Re: About DGDenoise
Hi Sharc. Thank you for your feedback. Of course, as you say, all denoisers are going to struggle to reliably distinguish noise from real detail. That's something that only humans are currently very good at. But the NLM algorithm does a fair job of it.
I haven't decided the exact temporal denoising to be implemented [thank you for pointing out TemporalSoften()], but it will surely be done with additional parameters and not by adding a new filter. Regarding algorithms, developers always try to do better than the existing approaches in quality or performance, but one can't always succeed. We'll see.
Re: About DGDenoise
Re: Speed of DGDenoise
One source --- two encodes
DGDenoise(strength=0) 56.88 fps
DGDenoise(strength=0.15) 67.30 fps
Re: About DGDenoise
Don't invoke DGDenoise(strength=0). It is not a no-op like DGSource(strength=0.0) is. Maybe it should be, or maybe we should punish people that add filters that do nothing.
Re: About DGDenoise
Sorry for the confusion
Didn't invoke DGDenoise(strength=0.0)
Was just trying to illustrate speed with no DGDenoise in the mix
admin wrote: maybe we should punish people that add filters that do nothing.
Oh jeez, am I in for a beating?
Re: About DGDenoise
gonca wrote: Was just trying to illustrate speed with no DGDenoise in the mix
Just leave it out next time. Put that little # thing at the front of the script line, to comment it out.
Anyway, it looks like your encode process is the bottleneck, because you were reporting 288 fps for DGDenoise() alone.
As always, I appreciate your feedback, especially with data. High numbers...but only for DG tools. OK? Only for DG. Carry on.
Re: About DGDenoise
I can't believe no one commented on my CUDASynth idea. Just ahead of the times, what can I do?
Re: About DGDenoise
admin wrote: CUDASynth
That would be awesome!
Re: About DGDenoise
How much memory on your GPU, gonca? I'm going to need a power tester for CUDASynth.
Re: About DGDenoise
8 GB of GDDR5
Re: About DGDenoise
Sweet, how many CUDA cores?
I tried to play with core clock but it was too unstable and didn't give much improvement.