DGDecomb

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Aleron Ives
Posts: 113
Joined: Fri May 31, 2013 8:36 pm

Re: DGDecomb

Post by Aleron Ives » Fri Mar 17, 2017 11:28 pm

admin wrote:Now I have to find out what a lane is. :cry:
It's time for a driver's ed refresher course! :lol:

;)

Sharc
Moose Approved
Posts: 224
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc » Sat Mar 18, 2017 3:26 am

DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source stream, or does the algorithm analyse the frames and classify them as combed or non-combed for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Sat Mar 18, 2017 6:26 am

lane >>> PCIe 3.0 lane
Different CPUs support different numbers of lanes

User avatar
admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Sat Mar 18, 2017 6:38 am

Sharc wrote:DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source stream, or does the algorithm analyse the frames and classify them as combed or non-combed for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?
use_pf uses source stream flags; there is no analysis. It isn't something you can rely on generally, but if you know that your stream properly sets the progressive_frame flag, it can be useful to avoid deinterlacing frames marked as progressive. Even so, a frame may be coded and marked interlaced but have no motion and thus not need deinterlacing. So analysis is the fully correct general strategy to preserve frames whose content is progressive.
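
To make the distinction concrete, here's a minimal host-side sketch of the two strategies (hypothetical names, not the actual DGSource code):

Code:

#include <cstdio>

// Hypothetical per-frame info: progressive_frame mirrors the stream flag,
// has_combing is what a content analysis would actually find.
struct FrameInfo {
    bool progressive_frame;
    bool has_combing;
};

static const char* decide(const FrameInfo& f, bool use_pf) {
    if (use_pf)
        // Trust the stream flag: skip deinterlacing when marked progressive.
        return f.progressive_frame ? "leave alone" : "deinterlace";
    // General strategy: analyze the content and ignore the flag.
    return f.has_combing ? "deinterlace" : "leave alone";
}

int main() {
    // A frame coded and marked interlaced but with no motion: the flag-based
    // path would deinterlace it, the analysis path correctly leaves it alone.
    FrameInfo still = { false, false };
    std::printf("flag-based: %s, analysis-based: %s\n",
                decide(still, true), decide(still, false));
    return 0;
}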

I am just now starting to think about postprocessing/deinterlacing for DGDecomb. It will perform analysis in some way (I'm trying to re-use the field matching metrics) and then any frames appearing combed after field matching will be deinterlaced with an as yet to-be-determined CUDA kernel. We will save one frame copy to the GPU because it is already there from the field matching. The primary goal is speed, and not necessarily to reproduce all classic Decomb behavior and options.
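
Roughly, the comb check might look something like this (a hedged sketch with hypothetical names, not the actual TelecideNV kernel): one thread per pixel tests whether the pixel differs from both vertical neighbours in the same direction, and the host compares the accumulated count against a threshold to decide whether to deinterlace the frame that is already resident on the GPU.

Code:

#include <stdint.h>

// Count "combed" pixels in a frame already on the GPU after field matching.
// A pixel is counted when it differs from both vertical neighbours in the
// same direction by more than 'thresh' (an alternating-line pattern).
__global__ void comb_count(const uint8_t* frame, int width, int height,
                           int pitch, int thresh, unsigned int* count)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y + 1;
    if (x >= width || y >= height - 1)
        return;

    int up   = frame[(y - 1) * pitch + x];
    int cur  = frame[y * pitch + x];
    int down = frame[(y + 1) * pitch + x];

    int d1 = cur - up;
    int d2 = cur - down;
    if ((d1 > thresh && d2 > thresh) || (d1 < -thresh && d2 < -thresh))
        atomicAdd(count, 1u);
}
// Host side: if *count exceeds a frame-level threshold, run the deinterlacing
// kernel on the frame without copying it to the GPU again.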

User avatar
hydra3333
Moose Approved
Posts: 200
Joined: Wed Oct 06, 2010 3:34 am
Contact:

Re: DGDecomb

Post by hydra3333 » Sat Mar 18, 2017 7:49 am

admin wrote:BTW, while profiling the filters I was able to find a useful speed gain for DGSource(). About 20%. I'll slipstream it at some point.
Thanks !
admin wrote:Maybe it's time for me to look into the NV encoding samples. After I complete DGDecomb, of course.
... and DGDeblockNV :) ? OK, this looks like it has options you may or may not consider useful; just a thought.
https://forum.videohelp.com/threads/370 ... U-encoding
http://rigaya34589.blog135.fc2.com/blog ... ry-17.html
https://github.com/rigaya/NVEnc
this also may or may not be of interest (a new deblocker) [link removed]

User avatar
admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Sat Mar 18, 2017 10:33 am

Thanks for the references, hydra3333. Yes, deblocking is on the list and has a high priority.

Meanwhile, I have completed postprocessing for TelecideNV and I added a show option so you can see the metrics and decisions. I'll give y'all a test version after I fix up DGDecIM to use the Avisynth 2.6 interface for my friend Selur. Then comes DecimateNV followed by DeblockNV.

User avatar
hydra3333
Moose Approved
Posts: 200
Joined: Wed Oct 06, 2010 3:34 am
Contact:

Re: DGDecomb

Post by hydra3333 » Sun Mar 19, 2017 12:11 am

Many thanks for your ongoing clarity of thought and action, and really useful code.
:bravo:

Sharc
Moose Approved
Posts: 224
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc » Sun Mar 19, 2017 6:19 am

admin wrote:
Sharc wrote:DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source stream, or does the algorithm analyse the frames and classify them as combed or non-combed for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?
use_pf uses source stream flags; there is no analysis. It isn't something you can rely on generally, but if you know that your stream properly sets the progressive_frame flag, it can be useful to avoid deinterlacing frames marked as progressive. Even so, a frame may be coded and marked interlaced but have no motion and thus not need deinterlacing. So analysis is the fully correct general strategy to preserve frames whose content is progressive.

I am just now starting to think about postprocessing/deinterlacing for DGDecomb. It will perform analysis in some way (I'm trying to re-use the field matching metrics) and then any frames appearing combed after field matching will be deinterlaced with an as yet to-be-determined CUDA kernel. We will save one frame copy to the GPU because it is already there from the field matching. The primary goal is speed, and not necessarily to reproduce all classic Decomb behavior and options.
Thanks for the clarification.
Yes, the flags can be misleading or confusing. For example, I have seen different practices for 3:2 hard telecined material: in one case the 3 progressive frames were flagged progressive_frame=true (progressive) and the 2 combed frames progressive_frame=false (interlaced), whereas in another case all 5 frames were flagged progressive_frame=false (interlaced)... :o

User avatar
admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Sun Mar 19, 2017 7:48 pm

I got DecimateNV working. Now I have a full CUDA IVTC solution. I need to add licensing to DecimateNV and make a few optimizations, and then I'll give y'all a beta of both TelecideNV and DecimateNV.

There's something very interesting I learned about CUDA. Memory transfers are so expensive relative to kernel processing that it often pays off to do some suboptimal stuff in the kernel in order to save on memory transferred back to the host. For example, suppose you have a kernel that runs one thread per pixel and calculates a difference between two frames. Then you would transfer a full frame-sized array of differences back to the host and sum them on the host. But if you (say) allocated one thread per 16 pixels (sacrificing some parallelism) and calculated all sixteen and summed them in the kernel, then the size of the memory transfer back to the host is 1/16 of a full frame. Not only that, but the summation on the host is faster. There is a sweet spot for performance that has to be determined empirically. This is one of the optimizations I mentioned above. I want to complete those before giving any more timings. Trust me, it blows away classic Decomb!
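
For instance, the frame-difference case might look roughly like this (hypothetical names, not the actual code): each thread sums 16 absolute differences and writes a single partial sum, so the array copied back to the host is 1/16 of a frame.

Code:

#include <stdint.h>

#define PIXELS_PER_THREAD 16

// One thread per 16 pixels: less parallelism in the kernel, but the output
// (and therefore the device-to-host transfer and the final CPU summation)
// shrinks by a factor of 16.
__global__ void partial_abs_diff(const uint8_t* a, const uint8_t* b,
                                 int npixels, unsigned int* partial)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int base = tid * PIXELS_PER_THREAD;
    if (base >= npixels)
        return;

    unsigned int sum = 0;
    for (int i = 0; i < PIXELS_PER_THREAD && base + i < npixels; ++i)
        sum += abs((int)a[base + i] - (int)b[base + i]);

    partial[tid] = sum;  // one value per thread instead of one per pixel
}
// Host side: cudaMemcpy the (roughly npixels / 16)-element 'partial' array
// back and add it up on the CPU. The sweet spot for pixels-per-thread is
// found by measurement, as noted above.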

jpsdr
Moose Approved
Posts: 182
Joined: Tue Sep 21, 2010 4:16 am

Re: DGDecomb

Post by jpsdr » Mon Mar 20, 2017 4:30 am

I don't know exactly how your IVTC works, but I'll just share the work/results of the VDub IVTC filter I made almost 15 years ago... Get the code on my GitHub if you're curious.
At the time, all the automatic IVTC filters worked by detecting interlaced frames through field correlation values (differences between the fields) and treating the two frames with the highest correlation values as the ones to IVTC. But I wasn't satisfied with the results.
I chose a different approach. The rough idea is the following: do the IVTC on the frame, compute the correlation, and consider the frame telecined if the correlation of the IVTC'd frame drops below that of the original frame by more than some threshold. I also used one of your ideas from your smart deinterlacer: computing the correlation only on areas detected as interlaced by a simple threshold test, as you do in your smart deinterlacer. If I remember properly, the interlace detection areas are determined only on the original frame, and the same areas are used on both the original and the IVTC'd frame to compute the correlation.
And I was using two computations: the correlation over the whole frame, and the correlation over only the "areas".
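
As an illustration, a rough host-side sketch of that decision rule as described here (hypothetical names; the real filter is on my GitHub):

Code:

#include <stdint.h>
#include <stdlib.h>

// "Correlation" in the sense above: sum of absolute differences between
// vertically adjacent lines, i.e. a combing measure. 'areas' (one byte per
// pixel, nonzero = detected as interlaced on the ORIGINAL frame) restricts
// the sum; pass NULL to measure the whole frame.
static unsigned long field_diff(const uint8_t* frame, const uint8_t* areas,
                                int width, int height, int pitch)
{
    unsigned long sum = 0;
    for (int y = 1; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (!areas || areas[y * pitch + x])
                sum += abs((int)frame[y * pitch + x] -
                           (int)frame[(y - 1) * pitch + x]);
    return sum;
}

// Telecined if field matching reduced the combing measure by more than a
// threshold, measured over the same areas on both frames.
static bool looks_telecined(const uint8_t* original, const uint8_t* matched,
                            const uint8_t* areas, int width, int height,
                            int pitch, unsigned long threshold)
{
    unsigned long before = field_diff(original, areas, width, height, pitch);
    unsigned long after  = field_diff(matched,  areas, width, height, pitch);
    return before > after && (before - after) > threshold;
}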

Do whatever you want with these two cents' worth of shared thoughts... ;)

Post Reply