DGDecomb

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

OK, girls and boys, are you ready? Are you ready for the first performance comparison between the 1050Ti and the 1080Ti?

I ran this script with a 720x480 clip needing field matching. I added denoising and sharpening as well. Note that TelecideNV has not been integrated into DGDecodeNV.dll yet, but it will be.

loadplugin("dgdecodenv.dll")
loadplugin("telecidenv.dll")
a=dgsource("lain.dgi")
a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a # Lain is a short clip; make a long enough test for AVSMeter
telecidenv()
DGDenoise(strength=0.1)
DGSharpen(strength=0.5)

The results:

1050Ti: 470 fps
1080Ti: 818 fps

That seems like a useful speed-up to me. ;)
Guest

Re: DGDecomb

Post by Guest »

That's a substantial speed increase.
In Canada they are still out of stock.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

I reckon your 1070 would weigh in at about 650 fps for that test.

BTW, while profiling the filters I was able to find a useful speed gain for DGSource(). About 20%. I'll slipstream it at some point.

Can you imagine having two of these, the first decoding and filtering, and the second encoding? Maybe it's time for me to look into the NV encoding samples. After I complete DGDecomb, of course.
Guest

Re: DGDecomb

Post by Guest »

2 GTX 1080 Ti cards in one system
That might require a 1000 watt power supply
But the CPU could be cut down

Edit:
Now that I think about it, the CPU can't be cut down.
Each card requires 16 lanes, for a total of 32 lanes.
You would need an extreme edition CPU to handle that many lanes.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Now I have to find out what a lane is. :cry:
Aleron Ives
Posts: 126
Joined: Fri May 31, 2013 8:36 pm

Re: DGDecomb

Post by Aleron Ives »

admin wrote:Now I have to find out what a lane is. :cry:
It's time for a driver's ed refresher course! :lol:

;)
Sharc
Posts: 233
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc »

DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source, or does the algorithm analyse the frames and decide between combed and non-combed frames for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?
Guest

Re: DGDecomb

Post by Guest »

lane >>> PCIe 3.0 lane
Different CPU types support different numbers of lanes
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Sharc wrote:DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source, or does the algorithm analyse the frames and decide between combed and non-combed frames for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?
use_pf uses source stream flags; there is no analysis. It isn't something you can rely on generally, but if you know that your stream properly sets the progressive_frame flag, it can be useful to avoid deinterlacing frames marked as progressive. Even so, a frame may be coded and marked interlaced but have no motion and thus not need deinterlacing. So analysis is the fully correct general strategy to preserve frames whose content is progressive.

I am just now starting to think about postprocessing/deinterlacing for DGDecomb. It will perform analysis in some way (I'm trying to re-use the field matching metrics) and then any frames appearing combed after field matching will be deinterlaced with an as yet to-be-determined CUDA kernel. We will save one frame copy to the GPU because it is already there from the field matching. The primary goal is speed, and not necessarily to reproduce all classic Decomb behavior and options.
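For readers wondering what "appearing combed" analysis might look like, here is a minimal NumPy sketch of a generic vertical second-difference comb metric. This is an assumption for illustration only; it is not the metric DGTelecide actually uses, and the real thing runs as a CUDA kernel, not on the host:

```python
import numpy as np

def comb_metric(frame, threshold=40):
    """Per-frame combing score: count of vertical second differences
    that exceed a threshold. A combed (interlaced) frame has strong
    alternating-line structure, so |2*p - above - below| is large.
    Generic illustration, not DGTelecide's actual metric."""
    f = frame.astype(np.int32)
    second_diff = np.abs(2 * f[1:-1] - f[:-2] - f[2:])
    return int(np.sum(second_diff > threshold))

# A synthetic "combed" frame: interleave two fields from different moments.
h, w = 64, 64
field_a = np.full((h // 2, w), 50, dtype=np.uint8)
field_b = np.full((h // 2, w), 200, dtype=np.uint8)
combed = np.empty((h, w), dtype=np.uint8)
combed[0::2] = field_a   # even lines from one field
combed[1::2] = field_b   # odd lines from the other
progressive = np.full((h, w), 50, dtype=np.uint8)

print(comb_metric(combed))       # large score
print(comb_metric(progressive))  # -> 0
```

A postprocessor would deinterlace only frames whose score exceeds a user threshold (pthresh, in DGTelecide's terms).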
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

Re: DGDecomb

Post by hydra3333 »

admin wrote:BTW, while profiling the filters I was able to find a useful speed gain for DGSource(). About 20%. I'll slipstream it at some point.
Thanks !
admin wrote:Maybe it's time for me to look into the NV encoding samples. After I complete DGDecomb, of course.
... and DGDeblockNV :) ? OK, this looks like it has options you may or may not consider useful; just a thought.
https://forum.videohelp.com/threads/370 ... U-encoding
http://rigaya34589.blog135.fc2.com/blog ... ry-17.html
https://github.com/rigaya/NVEnc
this also may or may not be of interest (a new deblocker) [link removed]
I really do like it here.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Thanks for the references, hydra3333. Yes, deblocking is on the list and has a high priority.

Meanwhile, I have completed postprocessing for TelecideNV and I added a show option so you can see the metrics and decisions. I'll give y'all a test version after I fix up DGDecIM to use the Avisynth 2.6 interface for my friend Selur. Then comes DecimateNV followed by DeblockNV.
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am
Contact:

Re: DGDecomb

Post by hydra3333 »

Many thanks for your ongoing clarity of thought and action, and really useful code.
:bravo:
I really do like it here.
Sharc
Posts: 233
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc »

admin wrote:
Sharc wrote:DGSource has the boolean parameter "use_pf".
Is the decision about the frame type (progressive/interlaced) based on flags in the source, or does the algorithm analyse the frames and decide between combed and non-combed frames for deinterlacing (similar to DGDecomb)?

How will this work with the new DGDecombNV? Is it a pure FM / Decimate function?
use_pf uses source stream flags; there is no analysis. It isn't something you can rely on generally, but if you know that your stream properly sets the progressive_frame flag, it can be useful to avoid deinterlacing frames marked as progressive. Even so, a frame may be coded and marked interlaced but have no motion and thus not need deinterlacing. So analysis is the fully correct general strategy to preserve frames whose content is progressive.

I am just now starting to think about postprocessing/deinterlacing for DGDecomb. It will perform analysis in some way (I'm trying to re-use the field matching metrics) and then any frames appearing combed after field matching will be deinterlaced with an as yet to-be-determined CUDA kernel. We will save one frame copy to the GPU because it is already there from the field matching. The primary goal is speed, and not necessarily to reproduce all classic Decomb behavior and options.
Thanks for clarification.
Yes, the flags can be misleading or confusing. For example, I have seen different practices for 3:2 hard telecined material, like the 3 progressive frames flagged as progressive_frame=true (progressive) and the 2 combed frames as progressive_frame=false (interlaced), whereas in another case all 5 frames were flagged as progressive_frame=false (interlaced)..... :o
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

I got DecimateNV working. Now I have a full CUDA IVTC solution. I need to add licensing to DecimateNV and make a few optimizations, and then I'll give y'all a beta of both TelecideNV and DecimateNV.

There's something very interesting I learned about CUDA. Memory transfers are so expensive relative to kernel processing that it often pays off to do some suboptimal stuff in the kernel in order to save on memory transferred back to the host. For example, suppose you have a kernel that runs one thread per pixel and calculates a difference between two frames. Then you would transfer a full frame-sized array of differences back to the host and sum them on the host. But if you (say) allocated one thread per 16 pixels (sacrificing some parallelism) and calculated all sixteen and summed them in the kernel, then the size of the memory transfer back to the host is 1/16 of a full frame. Not only that, but the summation on the host is faster. There is a sweet spot for performance that has to be determined empirically. This is one of the optimizations I mentioned above. I want to complete those before giving any more timings. Trust me, it blows away classic Decomb!
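The trade-off described above can be illustrated on the CPU with a small NumPy sketch. This shows only the arithmetic of the partial-sum scheme, not actual CUDA code; the one-thread-per-16-pixels grouping is simulated with a reshape:

```python
import numpy as np

# Simulate the transfer-saving trade-off described above.
# (Real code would be a CUDA kernel; this shows just the arithmetic.)
rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, size=720 * 480, dtype=np.int32)
frame_b = rng.integers(0, 256, size=720 * 480, dtype=np.int32)

# Naive scheme: "one thread per pixel" -> a full frame of differences
# is transferred back to the host, which then sums them all.
per_pixel = np.abs(frame_a - frame_b)            # 345600 values transferred
total_naive = per_pixel.sum()

# Reduced scheme: each "thread" handles 16 pixels and ships back one
# partial sum -> the transfer is 1/16 the size, and the host-side
# summation is correspondingly cheaper.
partial = per_pixel.reshape(-1, 16).sum(axis=1)  # 21600 values transferred
total_reduced = partial.sum()

assert total_naive == total_reduced              # same answer either way
print(per_pixel.size // partial.size)            # -> 16
```

The sweet spot (16 here) trades kernel parallelism against transfer size and, as the post says, has to be found empirically per GPU.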
jpsdr
Posts: 214
Joined: Tue Sep 21, 2010 4:16 am

Re: DGDecomb

Post by jpsdr »

I don't know exactly how your IVTC works, but I'll just share the work/results of the VDub IVTC filter I made almost 15 years ago... Get the code from my GitHub if you're curious.
At the time, all the automatic IVTC filters worked by detecting interlaced frames through field correlation values (differences between the fields), taking as the IVTC pair the two fields with the highest correlation. But I wasn't satisfied with the results.
I chose a different approach. The rough idea is the following: do the IVTC on the frame, compute the correlation, and consider the frame telecined if the difference between the correlation of the IVTC'd frame and that of the original frame drops below some threshold. I also borrowed one of your ideas from your smart deinterlacer: computing the correlation only on areas detected as interlaced by a simple threshold test, as you do in your smart deinterlacer. If I remember correctly, the interlace detection map is built only from the original frame, and those same areas are used on both the original and the telecined frame to compute the correlation.
And I was using two computations: correlation over the whole frame, and correlation only over the detected areas.

Do whatever you want with these two cents' worth of thoughts... ;)
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Thanks for bringing this to my attention, jpsdr.

Do you have a clip that shows your method performing better than the traditional approach? And how does your method compare in performance? From what you said it sounds like it would be way slower.

I had a look at your code. It is so complex and extensive and with only very limited commenting that I have no hope of figuring out what you are doing. And to be honest your previous post is rather unclear. Finally, telling me about this after I complete my implementation is a bit perverse. ;)
jpsdr
Posts: 214
Joined: Tue Sep 21, 2010 4:16 am

Re: DGDecomb

Post by jpsdr »

Sorry, no bad/perverse intention... :?
I'll PM you later an FTP account with a clip I use for my tests, but I don't remember if it performs better on this specific clip. I seem to remember it performs better than the automatic IVTC included in VDub...

If my post wasn't clear, I'll try again to explain the idea:
N.o : odd field of frame N (bottom field, lines 1,3,5,...)
N.e : even field of frame N (top field, lines 0,2,4,...)
- Compute correlation data between N.o and N.e : value A.
- Compute correlation data between N.e and N-1.o : value B.
Two correlation values are computed for each frame:
One computed from the whole frame.
One computed only on zones detected as interlaced, using the same idea/method as your smart deinterlacer. The map of interlaced zones is built from the original frame, and then both correlation values (call them A' and B') are computed only on the zones from that map.
As the filter is old, it was made at a time when VDub filters worked only on RGB32 data, so all computations are done on RGB data.
To remove noise from the correlation data and greatly increase the accuracy, the 2 LSBs are dropped (from the RGB data).
If A' and B' are "good": if B' << A', the frame is a telecined frame, otherwise not. If A' and B' are "not good", A and B are used instead.
To validate a telecine pattern, the two detected frames must be contiguous, except... if a scene change is detected.
The filter has a pipeline structure; it computes data only on the current frame. That means it works only when run through the whole file; display doesn't work, which is why it has no preview function.
Another thing: when the program finds an IVTC pattern, it stays locked on it unless a "strong" detection of a new pattern is found. This is typical for anime, where a character is talking without moving and only a small mouth is "moving" in the picture. That's another thing preventing any "preview" from working: history/the past affects the present.
These are the rough ideas.
If you want to play with it, put the following filters into the VDub filter chain:
IVTC (with default settings)
Remove frame (with default settings)

then run the process and look at the saved result file.
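For what it's worth, the A/B field-pairing idea above can be sketched roughly in Python/NumPy. This uses mean absolute difference as a stand-in for jpsdr's correlation measure (so lower = better match); the function names, the threshold, and the field-order convention are made up for illustration, and the whole-frame vs. interlaced-zones refinement (A'/B') is omitted:

```python
import numpy as np

def field_mad(f1, f2):
    """Mean absolute difference between two fields
    (a stand-in for jpsdr's 'correlation'; lower = better match)."""
    return float(np.mean(np.abs(f1.astype(np.int32) - f2.astype(np.int32))))

def looks_telecined(prev_frame, frame, ratio=0.5):
    """A = how well this frame's own fields match each other;
    B = how well its even field matches the previous frame's odd field.
    If B is much smaller than A, the frame pairs better with a field
    from the previous frame, i.e. it looks like a telecined frame."""
    n_o, n_e = frame[1::2], frame[0::2]   # odd / even lines
    prev_o = prev_frame[1::2]
    a = field_mad(n_o, n_e)
    b = field_mad(n_e, prev_o)
    return b < a * ratio

# Synthetic example: frame 2's even field actually belongs with
# frame 1's picture (hard telecine), so B << A for that frame.
h, w = 64, 64
img1 = np.full((h, w), 60, dtype=np.uint8)    # picture at time 1
img2 = np.full((h, w), 180, dtype=np.uint8)   # picture at time 2
telecined = img2.copy()
telecined[0::2] = img1[0::2]                  # even field from time 1

print(looks_telecined(img1, telecined))  # True: B = 0, A = 120
print(looks_telecined(img1, img2))       # False: B = 120, A = 0
```

The pattern-locking and scene-change logic jpsdr describes would sit on top of this per-frame decision.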
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Thanks, jpsdr. Looking forward to your IVTC torture clip(s).

As I mentioned, for a CUDA implementation my focus is on speed and your algorithm would be both difficult to implement and rather slow for me. I notice you did not comment on its speed, even though I specifically asked about it. Nevertheless, thank you for the further explanation.
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Folks, please do some testing with this beta of DGTelecide/DGDecimate:

http://rationalqm.us/misc/Beta.rar

If no outright bugs are found I'll slipstream it. Remember, this has no bells and whistles. Based on results, I'll enhance it as needed.
Sharc
Posts: 233
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc »

First quick tests with DGTelecide():
I am getting strong residual combing even though show=true reports that the frame has been deinterlaced.
I don't get such combing with the classic Telecide().

What is the valid range of pthresh? 0.0 to 1.0? (The documentation calls it "strength" btw.)
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Hi Sharc, please provide the source clip and your script. It's the only way I can analyze it. Thank you.

There is no "normal" range for pthresh. It depends on the source clip. Use show to see the metrics. It's not limited to 1.0; it can go as high as you need.

Thanks for the document correction.
Guest

Re: DGDecomb

Post by Guest »

I just ran a quick test on a clip that was 100% interlaced according to DGIndex, and to my untrained eyes it looks good at default settings.
I saved the first 300 frames from each result as bitmaps. If needed or desired I will set up an account on a file hosting site, no ads, for easy access.
The three tests were:
DGSource()
DGSource() + DGTelecide(pthresh=3.5)
DGSource() + DGTelecide(pthresh=3.5) + DGDecimate()
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin »

Hi gonca.

You shouldn't do IVTC on interlaced material! Nevertheless, if you do, it will likely deinterlace every frame, as long as your pthresh is low enough. Better to just use deinterlace=1 in DGSource().
Sharc
Posts: 233
Joined: Thu Sep 23, 2010 1:53 pm

Re: DGDecomb

Post by Sharc »

Here a testclip. It starts with a telecined segment and changes to interlaced video.
http://www.mediafire.com/file/clztmro8l ... ybrid.m2ts

Script:
clip=DGSource("....hybrid.dgi")
ivtc=clip.DGTelecide(pthresh=1.0,show=true)#.DGDecimate(cycle=5) #for testing IVTC of the telecined segment
return ivtc
Guest

Re: DGDecomb

Post by Guest »

Please forgive my inexperience with the terminology.
To be more precise, this was a 4-minute clip I created from a DVD of an NTSC TV series which I bought for my collection.
DGSource vs. DGSource + DGTelecide showed the same frame sequence (i.e. the same exact images) minus what I believe is the interlacing.
However, the result of DGTelecide clearly showed repeat frames in every group of 5.
So for giggles I did DGSource + DGTelecide + DGDecimate.
The result was that the interlacing and the repeat frames are gone.
As I said, I can use a file hosting site so you or any other member can analyze the sequences.
Here is the script I used:

Code: Select all

LoadPlugin("C:\Program Files (Portable)\dgdecnv\DGDecodeNV.dll")
DGSource("I:\test.DGI", fieldop=0)
DGTelecide(pthresh=3.5)
DGDecimate()
#DGDenoise()
#DGSharpen()
ConvertToYV12().AssumeFPS(24000,1001)