DGDecomb

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Mon Mar 13, 2017 3:15 pm

I won't pretend to understand most of the technical things you said, or how you implement them, but I believe I catch the gist.
By increasing the number of kernels you can reduce the number of conditionals, and therefore clock cycles. Running the first two in parallel and then the appropriate one for TFF or BFF makes it even more efficient, and easier to modify in the future if needed.
Very logical and elegant approach.
That is why you are the man when it comes to CUDA coding.
Last edited by gonca on Mon Mar 13, 2017 3:18 pm, edited 1 time in total.
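A minimal C++ sketch of the idea gonca describes: the TFF/BFF decision is hoisted out of the per-pixel loop so it is made once per frame instead of once per pixel. This is a serial CPU stand-in, not the actual TelecideNV kernels; `match_metric`, `dispatch`, and the same-parity sum-of-absolute-differences metric are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical per-frame "kernel", specialized at compile time on field order.
// Inside the loops there is no TFF/BFF test; the starting row is a constant.
template <bool TopFieldFirst>
uint64_t match_metric(const uint8_t* cur, const uint8_t* ref,
                      int width, int height, int pitch) {
    uint64_t sum = 0;
    const int first = TopFieldFirst ? 0 : 1;  // row parity of the compared field
    for (int y = first; y < height; y += 2) {
        for (int x = 0; x < width; ++x) {
            sum += std::abs(int(cur[y * pitch + x]) - int(ref[y * pitch + x]));
        }
    }
    return sum;
}

// The only branch on field order, taken once per frame.
uint64_t dispatch(bool tff, const uint8_t* cur, const uint8_t* ref,
                  int width, int height, int pitch) {
    return tff ? match_metric<true>(cur, ref, width, height, pitch)
               : match_metric<false>(cur, ref, width, height, pitch);
}
```

On a GPU the same effect comes from launching a different kernel per field order rather than testing the order inside every thread.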

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Mon Mar 13, 2017 3:18 pm

Aleron Ives wrote: :wow:


:lol:

Seriously though, thank you for preserving support for older cards.
+1

Good thing D.G. knows what it means.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Mon Mar 13, 2017 5:16 pm

Thanks, guys. I'm glad you didn't find any holes in my design.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Mon Mar 13, 2017 8:32 pm

Started coding. :D

Sharc
Moose Approved
Posts: 224
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc » Tue Mar 14, 2017 5:14 am

Looking forward ..... :D

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Wed Mar 15, 2017 11:13 am

Good morning, all.

I completed a very rough first cut of TelecideNV to test the concept. It works and fields get matched as expected. However, I have not yet implemented any optimizations, that is:

* no frame subsampling (vanilla Telecide subsamples by 4 in both X and Y)
* no toggling to avoid an extra texture update
* reduction (summing of the differences) performed on the CPU (very slow, especially when not subsampled) instead of a parallel reduction kernel
* no kernel optimizations
* using floats rather than ints (because I started with the DGSharpen code)
* no pinned memory
* some other stuff
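To make the subsampling and reduction items concrete, here is a hedged C++ sketch (serial CPU code, not the TelecideNV implementation). `subsampled_sad` sums integer absolute differences over every `step`-th pixel in X and Y, and `tree_reduce` shows the halving pattern that a parallel reduction kernel typically follows; both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between two planes, visiting only every
// `step`-th pixel in X and Y (step 4 matches the vanilla Telecide subsampling).
// Integer accumulation avoids float rounding and conversion cost.
uint64_t subsampled_sad(const std::vector<uint8_t>& a,
                        const std::vector<uint8_t>& b,
                        int width, int height, int step) {
    uint64_t sum = 0;
    for (int y = 0; y < height; y += step)
        for (int x = 0; x < width; x += step)
            sum += std::abs(int(a[y * width + x]) - int(b[y * width + x]));
    return sum;
}

// Tree-shaped reduction: each pass folds the upper half of the partial sums
// onto the lower half, as threads would in a parallel reduction kernel.
// Executed serially here.
uint64_t tree_reduce(std::vector<uint64_t> partial) {
    if (partial.empty()) return 0;
    size_t n = partial.size();
    while (n > 1) {
        size_t half = (n + 1) / 2;
        for (size_t i = 0; i + half < n; ++i)
            partial[i] += partial[i + half];
        n = half;
    }
    return partial[0];
}
```

On a GPU each pass of the `while` loop becomes one synchronized step across threads, so the sum takes about log2(n) steps instead of the n additions a CPU loop over the whole metrics array needs.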

Performance is comparable to vanilla Telecide. Now I'll implement the optimizations and we'll see how much performance can be squeezed out.

On another matter, I have successfully brought up my new system (with a 1050Ti for now). I managed to sneak in an order for the 1080Ti at the nVidia store and it seems to have taken. This site is great for getting alerted when buying windows open:

https://www.nowinstock.net/computers/vi ... gtx1080ti/

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Wed Mar 15, 2017 5:37 pm

It will be fun to see the fps of the 1080Ti.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Thu Mar 16, 2017 1:51 pm

1080Ti arrives tomorrow. :D

Here is the state of play for TelecideNV on my 1050Ti. This includes all optimizations except parallel reduction and pinned host memory. Looking good! Performance is limited by host<-->device memory bandwidth.

Script:

loadplugin("telecidenv.dll")
loadplugin("decomb.dll")
blankclip(length=10000,pixel_type="YV12",width=1920,height=1080)
telecidenv()
#telecide(post=0,chroma=false)

TelecideNV (CUDA) --------------------------------------------------------------------
Number of frames: 10000
Length (hh:mm:ss.ms): 00:06:56.667
Frame width: 1920
Frame height: 1080
Framerate: 24.000 (24/1)
Colorspace: YV12
Audio channels: 1
Audio bits/sample: 16
Audio sample rate: 44100
Audio samples: 18375000

Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 1428 | 1812 | 1739
Memory usage (phys | virt): 135 | 128 MiB
Thread count: 17
CPU usage (average): 11%

Time (elapsed): 00:00:05.749

Telecide (Classic) --------------------------------------------------------------------
Number of frames: 10000
Length (hh:mm:ss.ms): 00:06:56.667
Frame width: 1920
Frame height: 1080
Framerate: 24.000 (24/1)
Colorspace: YV12
Audio channels: 1
Audio bits/sample: 16
Audio sample rate: 44100
Audio samples: 18375000

Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 462.2 | 628.0 | 584.2
Memory usage (phys | virt): 18 | 15 MiB
Thread count: 9
CPU usage (average): 12%

Time (elapsed): 00:00:17.116
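A quick sanity check of these figures, as a C++ sketch. It assumes an 8-bit YV12 frame (full-size luma plus two quarter-size chroma planes) and that each frame crosses the bus exactly once; if TelecideNV uploads only luma, the rate would be about two-thirds of this. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 8-bit YV12 frame: luma plus two half-width, half-height chroma planes.
constexpr uint64_t yv12_frame_bytes(uint64_t w, uint64_t h) {
    return w * h + 2 * (w / 2) * (h / 2);
}

// Average throughput from frame count and elapsed wall-clock seconds.
constexpr double average_fps(double frames, double seconds) {
    return frames / seconds;
}

// Bytes per second moved if every frame is uploaded once.
constexpr double upload_rate(uint64_t frame_bytes, double fps) {
    return double(frame_bytes) * fps;
}
```

Under those assumptions a 1920x1080 frame is 3,110,400 bytes, 1739 fps corresponds to roughly 5.4 GB/s of uploads, and the elapsed times (17.116 s vs 5.749 s) give a speedup of about 2.98x over classic Telecide.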

Aleron Ives
Posts: 113
Joined: Fri May 31, 2013 8:36 pm

Re: DGDecomb

Post by Aleron Ives » Thu Mar 16, 2017 2:21 pm

Ooooh, aaaah...

I sure hope it works with my poor old GPU driver. :?

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Thu Mar 16, 2017 6:08 pm

Added pinning of the metrics array on the host:

TelecideNV: 1870 fps
Telecide: 579 fps

I cannot pin the source frame data because it is allocated by Avisynth.
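The gain from pinning the metrics array can be expressed in relative terms with a one-line helper (hypothetical name). The 1739 fps baseline comes from the earlier run, so part of the difference may be run-to-run noise; the classic Telecide figure also moved from 584.2 to 579 fps between the two runs.

```cpp
#include <cassert>

// Relative change between two throughput figures, as a fraction.
constexpr double relative_gain(double before_fps, double after_fps) {
    return (after_fps - before_fps) / before_fps;
}
```

That works out to roughly a 7.5% gain from pinning, and about 3.2x over classic Telecide in the same run.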
