DGSharpen

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

DGSharpen

Post by admin »

I made a CUDA kernel to perform limited sharpening. Call it LimitedSharpenFastest. :lol: On a 1080p stream it runs at ~309 fps (including decoding) on my GTX 1050Ti. Note that DGSharpen() is my own design and is not based on any LSF code.

I hope to release it later today or tomorrow. Thanks to hydra3333 for suggesting this filter.

CUDA filters are fun. :ugeek:

Number of frames:                  839
Length (hh:mm:ss.ms):     00:00:00.084
Frame width:                      1920
Frame height:                     1080
Framerate:                   10000.000 (10000/1)
Colorspace:                       YV12
Audio channels:                    n/a
Audio bits/sample:                 n/a
Audio sample rate:                 n/a
Audio samples:                     n/a

Frames processed:               839 (0 - 838)
FPS (min | max | average):      67.36 | 338.5 | 308.7
Memory usage (phys | virt):     227 | 316 MiB
Thread count:                   20
CPU usage (average):            14%

GPU core clock | memory clock:  1747 | 1752
GPU usage (average):            38%
VPU usage (average):            57%
GPU memory usage:               697 MiB

Time (elapsed):                 00:00:02.718
Sharc
Posts: 233
Joined: Thu Sep 23, 2010 1:53 pm

Re: About DGSharpen

Post by Sharc »

Eager to try it ...... :D

Post by admin »

Good morning Mr. Sharc. I have slipstreamed everything.

DGSharpen is a basic limited sharpener without all the bells and whistles of LimitedSharpenFaster(). If you think any of those bells and whistles are needed, please let me know.

Post by Sharc »

Greetings, Don
DGSharpen() works fine here. Very effective in conjunction with DGDenoise().
And no, I am missing neither a bell nor a whistle for the time being ......

Just a note on the documentation: DGSource() still lists the denoising parameters (strength, blend, chroma, searchw), but their explanation has been moved to the standalone filters. I assume NLM is still integral in DGSource()?

Post by admin »

The integral support was removed as I intend to have many filters and so would run out of parameters for DGSource(). In any case, the integral support just did Invoke() so it was not really different from doing:

DGSource()
DGDenoise()

If I had (say) 10 filters available and tried to have them integral, even if I had unlimited parameters, how could one control the filter ordering? It just gets silly when you have multiple filters available.

I'll correct the document. Thanks for pointing it out. And thanks for your testing. I too find the combination of DGDenoise() plus DGSharpen() to be very effective...and super fast.

The big challenge now is to make a CUDA equivalent for QTGMC. It's a big job as it will need functionality doing similar things to MaskTools and MVTools, etc. QTGMC really is the gold standard so it would be wonderful to have a fast CUDA implementation offering similar quality.

There's one little technical thing I have to do to keep JIT compile time reasonable when the number of filters gets much larger. I have to either find a way not to have to JIT compile DGFilters.ptx once for each invoked filter, or I have to split DGFilters.ptx into individual ptx files. I'll probably do the latter as it is much simpler and conceptually cleaner. I'm also looking into packaging ptx code into DGDecodeNV.dll, so users do not have to manage things.

Post by admin »

Document corrected and slipstreamed.

Post by admin »

I have successfully packed the PTX code into DGDecodeNV.dll, eliminating the need for PTX files in the distribution. I'll include it in the next slipstream.

Maybe I will pack the cubin into DGIndexNV also.

Post by Guest »

Re: having individual filters
:agree: :bravo:
In my case, it just makes it easier for me to keep track, especially with my scripting skills.

Post by admin »

Yes, gonca!

Can you imagine a DGSource() invocation with 50 parameters to set up all the filters? :wow:

The PTX file issue is irrelevant now that I packed them into the DLL. You'll never see those things. You'll just load DGDecodeNV.dll and then write your script as usual:

LoadPlugin("DGDecodeNV.dll")
DGSource()
DGDenoise()
DGSharpen()

Off to the store, I'm out of Scotch and ribeye.

Post by Guest »

admin wrote: Can you imagine a DGSource() invocation with 50 parameters to set up all the filters? :wow:
Yes I can, so once again I say :bravo: :bow: to individual filters.
It also makes things more modular and manageable, and it is easier to see what is being invoked and with which parameters.
Possibly easier to debug if necessary.

Post by admin »

I did some tests of DGSharpen versus LSF with corresponding parameters on my 1050Ti.

With no filters the decoding rate for 1080p is 395 fps. Adding DGSharpen drops it to 314 fps. Adding LSF instead drops it to 225 fps. DGSharpen is thus about twice as fast as LSF.
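
The figures above can also be read per frame rather than per second. The Python sketch below assumes, as a simplification not stated in the post, that each filter's cost simply adds to the decode time for every frame (no overlap between decoding and filtering). The helper name `filter_cost_ms` is made up for illustration.

```python
def filter_cost_ms(fps_decode_only: float, fps_with_filter: float) -> float:
    """Per-frame cost of a filter in ms, assuming it adds serially to decoding."""
    return 1000.0 / fps_with_filter - 1000.0 / fps_decode_only

# Figures quoted in the post (1080p, GTX 1050Ti).
DECODE_FPS = 395.0
DGSHARPEN_FPS = 314.0
LSF_FPS = 225.0

dgsharpen_ms = filter_cost_ms(DECODE_FPS, DGSHARPEN_FPS)
lsf_ms = filter_cost_ms(DECODE_FPS, LSF_FPS)

# Two ways of comparing the filters: by fps lost, and by time added per frame.
fps_drop_ratio = (DECODE_FPS - LSF_FPS) / (DECODE_FPS - DGSHARPEN_FPS)
cost_ratio = lsf_ms / dgsharpen_ms

print(f"DGSharpen: {dgsharpen_ms:.2f} ms per frame")
print(f"LSF:       {lsf_ms:.2f} ms per frame")
print(f"Ratio of fps drops:       {fps_drop_ratio:.1f}x")
print(f"Ratio of per-frame costs: {cost_ratio:.1f}x")
```

Under that assumption the fps drops (81 versus 170 fps) differ by a factor of about two, while the per-frame costs (roughly 0.65 ms versus 1.91 ms) differ by a factor closer to three.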

A similar calculation excluding the decoding for DGDenoise versus KNLMeansCL would show an even greater speedup than previously reported.

I did some extensive research online about available CUDA video filtering implementations. With all due modesty, it seems I am the first guy to get impressive performance results (sorry, I do not consider KNLMeansCL to be impressive in performance). My architecture and optimizations appear to be hitting the sweet spot. Can't wait for my 1080Ti.

I found a CUDA denoising filter for VirtualDub that claimed to implement "Quick NLM" (which I previously rejected due to its artifacting). When I ran it, it produced only black frames. I reckon I could produce black frames at a very high rate. :lol:

Regarding my CUDASynth idea, during the above-mentioned research I found that user "Adub" from another forum had previously suggested a similar idea. He never released anything. Some others have talked about making pipelines by leaving frames in GPU memory. Unfortunately, with my current architecture it's not so easy. To get good performance I need to use texture memory, but it is read-only on the device. I have some ideas to get around that, but I don't know right now what the impact on performance will be versus just copying back to the host and running the following filter in the usual way.

Post by Guest »

Easy to suggest or come up with ideas
Implementation is the hard part, and you are implementing your ideas

Post by admin »

gonca wrote: Implementation is the hard part
+1

Post by admin »

I packaged the CUDA binaries into DGIndexNV and DGDecodeNV as I mentioned earlier. I also converted to PTX and JIT compilation, which means I won't have anything to do for new architectures. Now you won't need any extra CUDA files in the distribution. I'll regression test and slipstream later today. A slipstream a day keeps the cracker away. :lol:

Post by Guest »

Guess I'll be checking it out this weekend, any tests you want me to do?

PS
Haven't forgotten about using the iGPU on my new system to check your software like you asked; it's just that life got in the way for a few days.
Hopefully I'll have the new system up and running early next week and will test.

Post by admin »

Please just make sure I didn't break anything when I embedded the PTX code for both DGIndexNV and DGDecodeNV. Thanks, gonca. This PTX embedding is good both to be future-proof and to protect my kernels at least a tiny bit. If I really want to get paranoid, I can encrypt the memory representation and decrypt it just before JIT compilation.

Don't worry too much about IMSDK stuff. I am focused on CUDASynth and CUDA versions of MaskTools and MVTools functionality right now.

Tomorrow is launch day for the 1080Ti. I hope to at least score a pre-order somewhere. MSI is looking promising.

I'm working on a bottle of Grant's right now. It has a nice sweetness to it. I usually take it neat but I will try on the rocks too.

My new rule is don't buy any Scotch that comes in a plastic bottle. ;)

Post by Guest »

admin wrote: My new rule is don't buy any Scotch that comes in a plastic bottle. ;)
Or in a cardboard box with a plastic liner :cry:

Post by admin »

Oy, I thought they did that only for wine.

Do you like that Laphroaig stuff? I could not abide the overly smoky taste.

Post by Guest »

You're probably right, but in this day and age you never know.
If there is a market, they will try to sell it, some good ol' 5 minute aged stuff (moonshine)
http://www.oocities.org/collegepark/qua ... reech.html
If it isn't sold in a cardboard box it should be
This is what Canadian moonshine is all about
https://search.yahoo.com/search?p=newfo ... h&ei=UTF-8

Post by Guest »

Results
Used my basic settings to encode
Results look awesome
DGDenoise and DGSharpen at default

Speeds
DGSource: 541.2 fps (1.848 ms per frame)
DGSource + DGDenoise: 258.7 fps (3.501 ms per frame)
DGDenoise: 1.653 ms per frame (605.0 fps)
DGSource + DGDenoise + DGSharpen: 205.6 fps (4.863 ms per frame)
DGSharpen: 1.362 ms per frame (734.2 fps)
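
The per-filter figures appear to come from subtracting successive per-frame times and converting the difference back to fps. A minimal Python sketch of that arithmetic follows, using the millisecond values as posted; the variable and function names are mine. Note that 258.7 fps would correspond to about 3.865 ms rather than 3.501 ms, so one of those two posted figures may be a typo.

```python
# Cumulative per-frame times in ms, as posted in the thread.
cumulative_ms = {
    "DGSource": 1.848,
    "DGSource + DGDenoise": 3.501,
    "DGSource + DGDenoise + DGSharpen": 4.863,
}

def ms_to_fps(ms: float) -> float:
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / ms

times = list(cumulative_ms.values())
dgdenoise_ms = times[1] - times[0]  # time added by DGDenoise
dgsharpen_ms = times[2] - times[1]  # time added by DGSharpen

print(f"DGDenoise: {dgdenoise_ms:.3f} ms per frame, {ms_to_fps(dgdenoise_ms):.1f} fps")
print(f"DGSharpen: {dgsharpen_ms:.3f} ms per frame, {ms_to_fps(dgsharpen_ms):.1f} fps")
```

This reproduces the posted per-filter values (1.653 ms and 605.0 fps for DGDenoise, 1.362 ms and 734.2 fps for DGSharpen) from the cumulative times.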

I also captured 1000 frames from each encode with VDub for comparison. Quality is great

If you wish access to the encoded clips or the caps let me know. I can set something up.

Post by admin »

Looking good, thank you for the results, gonca.

I pulled the trigger on my new system. Sadly, though, the 1080Ti is on back-order. At least I am in the queue.

Post by Guest »

Seems that even the Nvidia store is out of stock

Post by admin »

Everybody is out. You had to be online hitting F5 every 10 seconds to even have a chance.

Post by Guest »

admin wrote: I pulled the trigger on my new system
What are you getting?

Post by admin »

EVGA SuperNOVA 850 G2 220-G2-0850-XR 80+ GOLD 850W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply

G.SKILL Ripjaws V Series 64GB (4 x 16GB) 288-Pin DDR4 SDRAM DDR4 2666 (PC4 21300) Intel Z170 Platform Desktop Memory Model F4-2666C15Q-64GVR

Intel Core i7-7700K Kaby Lake Quad-Core 4.2 GHz LGA 1151 91W BX80677I77700K Desktop Processor

MSI Z270 XPOWER GAMING TITANIUM LGA 1151 Intel Z270 HDMI SATA 6Gb/s USB 3.1 ATX Motherboards - Intel

Windows 10 Pro 64-bit - OEM

ASUS GeForce GTX 1080 TI 11GB GDDR5X Founders Edition VR Ready 5K HD Gaming HDMI DisplayPort PCIe Graphics Card (GTX1080TI-FE)

I have a big tower case and hard disk in the shed. Mouse/keyboard/monitor to be determined.