DGDenoise
Re: About DGDenoise
Any news on DGDenoise?
Re: About DGDenoise
I'm busy trying to implement deinterlacing in DGDecIM. I'll come back to the denoising.
Re: About DGDenoise
thanks for replying
Re: About DGDenoise
You're welcome, gonca my friend.
Re: About DGDenoise
Yeah, but you are talented and determined.
Re: About DGDenoise
You make me blush.
Re: About DGDenoise
I got bored today and worked on DGDenoise.
I found a way to eliminate the artifacting that had stopped me before. Now the filter is comparable to KNLMeansCL in speed and quality. It's integrated into DGSource() but can also be invoked stand-alone (stand-alone use requires a DG license). So you can do:
LoadPlugin("DGDecodeNV.dll")
DGSource("file.dgi", strength=0.6)
or
LoadPlugin("DGDecodeNV.dll")
AVISource("file.avi") # or other source filter
DGDenoise(strength=0.6)
I'm pretty happy with it but have to do a bit more testing and write documentation. Also I want to look into CUDA kernel optimization. Then I will release a test version for y'all to bang on.
Re: About DGDenoise
You da Man
Re: About DGDenoise
Thanks, gents. I'm working on a 35% speed improvement, but ReadModeNormalizedFloat is not fully cooperating. Don't worry, I will beat it into submission.
Persistence is the key, right gonca?
Re: About DGDenoise
Yep!
And your good programming skills help as well
Re: About DGDenoise
Ah, cuTexRefSetFlags(cu_texref, CU_TRSF_READ_AS_INTEGER) during my texture initialization suppresses the action of ReadModeNormalizedFloat! Delete that sucker. So now I have it working and achieved my big performance gain. I'll explain a bit in case anyone else is interested in CUDA kernel optimization.
I had a texture of unsigned chars (the YV12 input data). Textures are good to use because texture reads are optimized for 2D spatial locality, just what we need for NLM. I do a texture read and get an unsigned char out, of course. But I need it to be a float in the range [0,1] for use in the NLM algorithm. So naively I was just dividing it by 255.0f. The innermost loop needs two such texture reads, so those divisions added up to a big overhead.

Then I discovered ReadModeNormalizedFloat. It maps to [0,1] automatically during the texture read and does it very fast. I implemented it and it gave the speed-up (because I deleted the divisions), but it messed up my colors. Then, reading the CUDA docs, I found that CU_TRSF_READ_AS_INTEGER suppresses the action of ReadModeNormalizedFloat. I had inherited that flag setting from an NVIDIA texture sample I started from. So not setting that flag was the key.
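For anyone else chasing this, here is a minimal sketch of what that texture setup looks like with the CUDA driver API. The names (cu_module, cu_array, tex_src) and the little kernel are illustrative assumptions, not DGDenoise source:

```cuda
// Host side: bind an 8-bit 2D CUDA array to the texture reference that
// the module declares. The crucial part is what we do NOT do at the end.
CUtexref cu_texref;
cuModuleGetTexRef(&cu_texref, cu_module, "tex_src");
cuTexRefSetArray(cu_texref, cu_array, CU_TRSA_OVERRIDE_FORMAT);
cuTexRefSetFormat(cu_texref, CU_AD_FORMAT_UNSIGNED_INT8, 1);
cuTexRefSetFilterMode(cu_texref, CU_TR_FILTER_MODE_POINT);
// Do NOT set CU_TRSF_READ_AS_INTEGER: setting it suppresses the
// normalized-float read mode and tex2D() returns raw integer values.
// cuTexRefSetFlags(cu_texref, CU_TRSF_READ_AS_INTEGER);  // <-- delete this

// Device side: with cudaReadModeNormalizedFloat declared on the texture,
// each read already yields value / 255.0f, so no division in the kernel.
texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex_src;

__global__ void read_example(float *out, int pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * pitch + x] = tex2D(tex_src, x, y);  // already in [0,1]
}
```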
Where are we now? Using my 1080p sample rat.264, I ran both DGDenoise and KNLMeansCL at default settings:
DGDenoise: 35 seconds
KNLMeansCL: 53 seconds
The quality is equal, as the resulting frames are essentially indistinguishable.
But I'm not happy yet. I have a vector length calculation in the inner loop that I might be able to improve.
Getting close to a beta for y'all.
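About that vector length calculation: a rough illustration of the shape of the inner loop (my guess, not the actual DGDenoise kernel). The NLM weight only needs the squared patch distance, so the explicit vector-length (square-root) step can be dropped and the sum of squared differences fed straight into a fast exponential intrinsic:

```cuda
// Hypothetical sketch of an NLM inner loop. tex_src is the same
// normalized-float texture as before; names and the patch radius of 1
// are assumptions for illustration.
texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex_src;

__device__ float patch_weight(float px, float py, float qx, float qy,
                              float inv_h2)
{
    float d2 = 0.0f;
    // Sum of squared differences over a small patch window; each pass
    // makes the two texture reads mentioned in the post.
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            float a = tex2D(tex_src, px + dx, py + dy);
            float b = tex2D(tex_src, qx + dx, qy + dy);
            float d = a - b;
            d2 += d * d;       // no sqrtf(): the weight uses d2 directly
        }
    return __expf(-d2 * inv_h2);  // fast intrinsic exponential
}
```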
Re: About DGDenoise
Wow! This speed is impressive. Can't wait to test it.......
Re: About DGDenoise
Just finished the documentation. Assembling a beta...
Re: About DGDenoise
Here you go:
http://rationalqm.us/misc/NLM.rar
I may uprev to 2053 when I include this in the formal distribution.
Re: About DGDenoise
Don't I need a DGDenoise.dll for the standalone filter (avisynth plugin)?
Where to put the dgdenoise.ptx file?
I want to denoise an .avi using the source filter AviSource() followed by DGDenoise(). I can't index an .avi with DGIndexNV...
Re: About DGDenoise
Put the PTX file with DGDecodeNV.dll.
Just load the DGDecodeNV.dll and then put DGDenoise() in your script. You don't need to invoke DGSource().
Is the documentation not clear?
Re: About DGDenoise
Now it works, thanks. I still had an old DGDenoise.dll in my avisynth plugins folder.
Speed is very good. Congrats! With which settings of KNLMeansCL did you compare (d parameter)?
Re: About DGDenoise
Give me an hour or so and I will try to run some tests on it
Hope I don't have the same brain fart as last time
Re: About DGDenoise
Sharc wrote: With which settings of KNLMeansCL did you compare (d parameter)?
Default, i.e., 0.
Re: About DGDenoise
Here are some first results, tested on the fields of a noisy 720x576i video:
KNLMeansCL(0,3,4,4.0) 15.8fps
KNLMeansCL(1,3,4,4.0) 4.9fps
DGDenoise(strength=0.13) 13.2fps (strength>0.2 loses details)
GPU: GT730
Re: About DGDenoise
Can you please give me the video and your script so I can look into the detail loss you mentioned, and also compare performance? I have a 1050 Ti.
Re: About DGDenoise
What did I mess up this time in the script?
LoadPlugin("C:\Program Files (Portable)\dgdecnv\DGDecodeNV.dll")
DGSource("W:\HD\WORKFILES\VID_00000.DGI", fieldop=0)
DGDenoise(strength=0.5)
ConvertToYV12().AssumeFPS(24000,1001)
Re: About DGDenoise
You don't need the ConvertToYV12() and probably not the AssumeFPS() either. Also, you can invoke the denoising in DGSource() and omit DGDenoise() if you like.
Are you having a problem? Make sure you don't have an old DGDenoise.dll lying around to be autoloaded.