Support forum for DG Tools

Posted: **Tue Mar 07, 2017 10:18 am**

I made a CUDA kernel to perform limited sharpening. Call it LimitedSharpenFastest.

On a 1080p stream it runs at ~309 fps (including decoding) on my GTX 1050Ti. Note that the filter, DGSharpen(), is my own design and is not based on any LSF code.

I hope to release it later today or tomorrow. Thanks to hydra3333 for suggesting this filter.

CUDA filters are fun.

Number of frames:                  839
Length (hh:mm:ss.ms):     00:00:00.084
Frame width:                      1920
Frame height:                     1080
Framerate:                   10000.000 (10000/1)
Colorspace:                       YV12
Audio channels:                    n/a
Audio bits/sample:                 n/a
Audio sample rate:                 n/a
Audio samples:                     n/a

Frames processed:               839 (0 - 838)
FPS (min | max | average):      67.36 | 338.5 | 308.7
Memory usage (phys | virt):     227 | 316 MiB
Thread count:                   20
CPU usage (average):            14%

GPU core clock | memory clock:  1747 | 1752
GPU usage (average):            38%
VPU usage (average):            57%
GPU memory usage:               697 MiB

Time (elapsed):                 00:00:02.718

Posted: **Wed Mar 08, 2017 7:24 am**

Eager to try it ......

Posted: **Wed Mar 08, 2017 7:57 am**

Good morning Mr. Sharc. I have slipstreamed everything.

DGSharpen is a basic limited sharpener without all the bells and whistles of LimitedSharpenFaster(). If you think any of those bells and whistles are needed, please let me know.
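For readers unfamiliar with the term, here is a rough sketch of what "limited sharpening" generally means: sharpen, then clamp the result to the local min/max so edges get steeper without overshoot halos. This is not DGSharpen's actual kernel (its design is not described in this thread). The function name `limited_sharpen`, the 3x3 box blur, and the `strength` parameter are all assumptions made for illustration, and the code is plain CPU Python rather than CUDA.

```python
def limited_sharpen(img, strength=1.0):
    """Illustrative limited sharpen on one luma plane.

    img is a list of rows of ints (0..255). Hypothetical helper,
    not the DGSharpen algorithm.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # 3x3 neighbourhood, repeating edge pixels at the borders
            nb = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            blur = sum(nb) / 9.0
            # Unsharp-mask style boost of the difference from the local mean
            sharp = img[y][x] + strength * (img[y][x] - blur)
            # The "limited" part: never exceed the local min/max
            lo, hi = min(nb), max(nb)
            out[y][x] = int(round(min(max(sharp, lo), hi)))
    return out


soft_edge = [[10, 10, 60, 150, 200, 200]] * 3
print(limited_sharpen(soft_edge)[1])  # [10, 10, 47, 163, 200, 200]
```

In the example the soft edge becomes steeper (60 moves to 47, 150 moves to 163), while the flat areas on either side stay at 10 and 200 because the clamp suppresses the overshoot.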

Posted: **Wed Mar 08, 2017 8:53 am**

Greetings, Don
DGSharpen() works fine here. Very effective in conjunction with DGDenoise().
And no, I am missing neither a bell nor a whistle for the time being ......

Just a note on the documentation: DGSource() still has the denoising parameters (strength,blend,chroma,searchw), but the explanation of the same has been moved to the standalone filters. I assume NLM is still integral in DGSource()?

Posted: **Wed Mar 08, 2017 9:30 am**

The integral support was removed as I intend to have many filters and so would run out of parameters for DGSource(). In any case, the integral support just did Invoke() so it was not really different from doing:

DGSource()
DGDenoise()

If I had (say) 10 filters available and tried to have them integral, even if I had unlimited parameters, how could one control the filter ordering? It just gets silly when you have multiple filters available.

I'll correct the document. Thanks for pointing it out. And thanks for your testing. I too find the combination of DGDenoise() plus DGSharpen() to be very effective...and super fast.

The big challenge now is to make a CUDA equivalent for QTGMC. It's a big job as it will need functionality doing similar things to MaskTools and MVTools, etc. QTGMC really is the gold standard so it would be wonderful to have a fast CUDA implementation offering similar quality.

There's one little technical thing I have to do to keep JIT compile time reasonable when the number of filters gets much larger. I have to either find a way not to have to JIT compile DGFilters.ptx once for each invoked filter, or I have to split DGFilters.ptx into individual ptx files. I'll probably do the latter as it is much simpler and conceptually cleaner. I'm also looking into packaging ptx code into DGDecodeNV.dll, so users do not have to manage things.

Posted: **Wed Mar 08, 2017 9:33 am**

Document corrected and slipstreamed.

Posted: **Wed Mar 08, 2017 12:54 pm**

I have successfully packed the PTX code into DGDecodeNV.dll, eliminating the need for PTX files in the distribution. I'll include it in the next slipstream.

Maybe I will pack the cubin into DGIndexNV also.

Posted: **Wed Mar 08, 2017 3:33 pm**

Re: having individual filters

In my case, it just makes it easier for me to keep track, especially with my scripting skills.

Posted: **Wed Mar 08, 2017 4:05 pm**

Yes, gonca!

Can you imagine a DGSource() invocation with 50 parameters to set up all the filters?

The PTX file issue is irrelevant now that I packed them into the DLL. You'll never see those things. You'll just load DGDecodeNV.dll and then write your script as usual:

LoadPlugin("DGDecodeNV.dll")
DGSource()
DGDenoise()
DGSharpen()

Off to the store, I'm out of Scotch and ribeye.

Posted: **Wed Mar 08, 2017 4:13 pm**

Can you imagine a DGSource() invocation with 50 parameters to set up all the filters?

Yes I can, so once again I say :bravo: to individual filters.
It also makes things more modular and manageable, and easier to see what is being invoked and with which parameters.
Possibly easier to debug if necessary.

Posted: **Wed Mar 08, 2017 8:02 pm**

I did some tests of DGSharpen versus LSF with corresponding parameters on my 1050Ti.

With no filters the decoding rate for 1080p is 395 fps. Adding DGSharpen drops it to 314 fps. Adding LSF drops it to 225. DGSharpen is thus about twice as fast as LSF.

A similar calculation excluding the decoding for DGDenoise versus KNLMeansCL would show an even greater speedup than previously reported.
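The comparison can be reproduced from the three fps figures quoted above. The sketch below shows two ways to do it. The thread does not say which one was used, so both are assumptions: comparing the fps drops gives the "about twice" figure, while subtracting per-frame times (assuming decode and filter times simply add, with no overlap) gives a larger ratio.

```python
def ms_per_frame(fps):
    """Convert a frame rate to milliseconds per frame."""
    return 1000.0 / fps


# Figures quoted in the post (1080p, GTX 1050Ti)
decode_fps, dgsharpen_fps, lsf_fps = 395.0, 314.0, 225.0

# Method 1: ratio of the fps drops caused by each filter
fps_drop_ratio = (decode_fps - lsf_fps) / (decode_fps - dgsharpen_fps)

# Method 2: filter-only time per frame, assuming times add serially
dgsharpen_ms = ms_per_frame(dgsharpen_fps) - ms_per_frame(decode_fps)
lsf_ms = ms_per_frame(lsf_fps) - ms_per_frame(decode_fps)
time_ratio = lsf_ms / dgsharpen_ms

print(f"fps-drop ratio:   {fps_drop_ratio:.1f}x")    # 2.1x
print(f"DGSharpen alone:  {dgsharpen_ms:.2f} ms")    # 0.65 ms
print(f"LSF alone:        {lsf_ms:.2f} ms")          # 1.91 ms
print(f"time-based ratio: {time_ratio:.1f}x")        # 2.9x
```

Either way DGSharpen comes out at least twice as fast. If the GPU decode overlaps with filtering, the additive assumption in method 2 no longer holds exactly.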

I did some extensive research online about available CUDA video filtering implementations. With all due modesty, it seems I am the first guy to get impressive performance results (sorry, I do not consider KNLMeansCL to be impressive in performance). My architecture and optimizations appear to be hitting the sweet spot. Can't wait for my 1080Ti.

I found a CUDA denoising filter for VirtualDub that claimed to implement "Quick NLM" (which I previously rejected due to its artifacting). When I ran it, it produced only black frames. I reckon I could produce black frames at a very high rate.

Regarding my CUDASynth idea, during the above-mentioned research I found that user "Adub" from another forum has previously suggested a similar idea. He did not ever release anything. Some others have talked about making pipelines by leaving frames in GPU memory. Unfortunately for my current architecture, it's not so easy. To get good performance I need to use texture memory, but it is read-only on the device. I have some ideas to get around that, but I don't know right now what the impact on performance will be versus just copying back to the host and running the following filter in the usual way.

Posted: **Thu Mar 09, 2017 5:07 am**

It's easy to suggest or come up with ideas.
Implementation is the hard part, and you are implementing your ideas.

Posted: **Thu Mar 09, 2017 8:34 am**

gonca wrote: Implementation is the hard part

+1

Posted: **Thu Mar 09, 2017 10:12 am**

I packaged the CUDA binaries into DGIndexNV and DGDecodeNV as I mentioned earlier. I also converted to PTX and JIT compilation, which means I won't have anything to do for new architectures. Now you won't need any extra CUDA files in the distribution. I'll regression test and slipstream later today. A slipstream a day keeps the cracker away.

Posted: **Thu Mar 09, 2017 5:48 pm**

Guess I'll be checking it out this weekend. Any tests you want me to do?

PS
I haven't forgotten about using the iGPU on my new system to check your software like you asked; it's just that life got in the way for a few days.
Hopefully I'll have the new system up and running early next week and will test.

Posted: **Thu Mar 09, 2017 6:22 pm**

Please just make sure I didn't break anything when I embedded the PTX code for both DGIndexNV and DGDecodeNV. Thanks, gonca. This PTX embedding is good both to be future-proof and to protect my kernels at least a tiny bit. If I really want to get paranoid, I can encrypt the memory representation and decrypt it just before JIT compilation.

Don't worry too much about IMSDK stuff. I am focused on CUDASynth and CUDA versions of MaskTools and MVTools functionality right now.

Tomorrow is launch day for the 1080Ti. I hope to at least score a pre-order somewhere. MSI is looking promising.

I'm working on a bottle of Grant's right now. It has a nice sweetness to it. I usually take it neat but I will try on the rocks too.

My new rule is don't buy any Scotch that comes in a plastic bottle.

Posted: **Thu Mar 09, 2017 6:34 pm**

My new rule is don't buy any Scotch that comes in a plastic bottle.

or in a cardboard box with a plastic liner

Posted: **Thu Mar 09, 2017 6:47 pm**

Oy, I thought they did that only for wine.

Do you like that Laphroaig stuff? I could not abide the overly smoky taste.

Posted: **Thu Mar 09, 2017 6:51 pm**

You're probably right, but in this day and age you never know.
If there is a market, they will try to sell it: some good ol' 5-minute-aged stuff (moonshine).
http://www.oocities.org/collegepark/qua ... reech.html
If it isn't sold in a cardboard box, it should be.
This is what Canadian moonshine is all about:
https://search.yahoo.com/search?p=newfo ... h&ei=UTF-8

Posted: **Sat Mar 11, 2017 11:51 am**

Results
Used my basic settings to encode.
Results look awesome.
DGDenoise and DGSharpen at default.

Speeds
DGSource 541.2 fps >> 1.848 ms per frame
DGSource + DGDenoise 258.7 fps >> 3.501 ms per frame
DGDenoise 1.653 ms per frame >>> 605.0 fps
DGSource + DGDenoise + DGSharpen 205.6 fps >> 4.863 ms per frame
DGSharpen 1.362 ms per frame >>> 734.2 fps
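The per-filter figures above follow from subtracting successive cumulative per-frame times. A minimal sketch of that arithmetic, assuming the stages run back to back so their times add (the helper names are made up for illustration):

```python
def per_stage_ms(cumulative_ms):
    """Split cumulative per-frame times into the time added by each stage."""
    stages, prev = [], 0.0
    for t in cumulative_ms:
        stages.append(round(t - prev, 3))
        prev = t
    return stages


def to_fps(ms):
    """Convert milliseconds per frame to frames per second."""
    return round(1000.0 / ms, 1)


# Cumulative ms per frame: DGSource, + DGDenoise, + DGSharpen
cumulative = [1.848, 3.501, 4.863]
stages = per_stage_ms(cumulative)

print(stages)                       # [1.848, 1.653, 1.362]
print([to_fps(s) for s in stages])  # [541.1, 605.0, 734.2]
print(to_fps(3.501))                # 285.6
```

Note that 3.501 ms per frame corresponds to about 285.6 fps rather than the 258.7 fps listed, so one of those two figures may be mistyped. The derived DGDenoise and DGSharpen numbers are consistent with the 3.501 ms value.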

I also captured 1000 frames from each encode with VDub for comparison. Quality is great.

If you want access to the encoded clips or the caps, let me know. I can set something up.

Posted: **Sat Mar 11, 2017 12:26 pm**

Looking good, thank you for the results, gonca.

I pulled the trigger on my new system. Sadly, though, the 1080Ti is on back-order. At least I am in the queue.

Posted: **Sat Mar 11, 2017 3:02 pm**

It seems that even the NVIDIA store is out of stock.

Posted: **Sat Mar 11, 2017 3:10 pm**

Everybody is out. You had to be online hitting F5 every 10 seconds to even have a chance.

Posted: **Sat Mar 11, 2017 4:48 pm**

I pulled the trigger on my new system

What are you getting?

Posted: **Sat Mar 11, 2017 4:52 pm**

EVGA SuperNOVA 850 G2 220-G2-0850-XR 80+ GOLD 850W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply

G.SKILL Ripjaws V Series 64GB (4 x 16GB) 288-Pin DDR4 SDRAM DDR4 2666 (PC4 21300) Intel Z170 Platform Desktop Memory Model F4-2666C15Q-64GVR

Intel Core i7-7700K Kaby Lake Quad-Core 4.2 GHz LGA 1151 91W BX80677I77700K Desktop Processor

MSI Z270 XPOWER GAMING TITANIUM LGA 1151 Intel Z270 HDMI SATA 6Gb/s USB 3.1 ATX Motherboards - Intel

Windows 10 Pro 64-bit - OEM

ASUS GeForce GTX 1080 TI 11GB GDDR5X Founders Edition VR Ready 5K HD Gaming HDMI DisplayPort PCIe Graphics Card (GTX1080TI-FE)

I have a big tower case and hard disk in the shed. Mouse/keyboard/monitor to be determined.
