DGDecomb

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 8:59 am

I implemented the reduction in SSE2. I chose that over a CUDA reduction kernel because, once the host <-> device transfers, kernel launches, and synchronization are factored in, CUDA offers no advantage.
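For readers unfamiliar with the idea, here is a minimal sketch of what a CPU-side SSE2 reduction can look like. It is not the actual DGDecomb code: the post does not say what is being reduced, so the sketch assumes an array of 32-bit partial sums (for example, per-block metrics already copied back to the host), and the function name `reduce_sum_sse2` is made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Sum n unsigned 32-bit values with SSE2.
   Assumption: v holds hypothetical per-block partial results on the host. */
uint64_t reduce_sum_sse2(const uint32_t *v, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  /* two 64-bit running sums */
    size_t i = 0;

    /* Process four 32-bit values per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
        /* Interleave with zero to widen u32 -> u64, so the sums cannot
           overflow 32 bits. */
        __m128i lo = _mm_unpacklo_epi32(x, zero);  /* v[i],   v[i+1] */
        __m128i hi = _mm_unpackhi_epi32(x, zero);  /* v[i+2], v[i+3] */
        acc = _mm_add_epi64(acc, lo);
        acc = _mm_add_epi64(acc, hi);
    }

    /* Horizontal step: add the two 64-bit lanes together. */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1];

    /* Scalar tail for the last 0..3 elements. */
    for (; i < n; ++i)
        total += v[i];

    return total;
}
```

The point of doing this on the CPU is that the loop touches each value once and needs no extra kernel launch or synchronization, which is the overhead the post says cancels out any CUDA advantage.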

Current timings for 1080p on a 1050Ti:

TelecideNV: 2050 fps
Telecide: 579 fps

Now I can leave the optimizations, implement BFF handling, and then think about postprocessing and decimation.

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Fri Mar 17, 2017 9:14 am

TelecideNV: 2050 fps
Is that with the GTX1050ti or the GTX1080ti?

Just noticed that the 1080ti will arrive today, so this speed is on a 1050ti.
Very impressive!

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 10:29 am

I clarified my post to indicate that I used the 1050Ti. The kernel is so fast that performance is now being limited mainly by Avisynth and application overhead.

Regarding performance, some points should be borne in mind. With a single GPU, all the filters have to compete for CUDA time. With multiple GPUs you could, for example, run DGDecodeNV on one of them and DGDenoise on another, etc. Even when DGDecodeNV, TelecideNV, DGDenoise, and DGSharpen all run at once and their computation is serialized on a single GPU, you still recover a lot of CPU time for encoding. For a CPU-bound encoder, the main thing is that the delivery of filtered frames to the encoder must not become a bottleneck.

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Fri Mar 17, 2017 11:29 am

But the filters are so fast that, even with Avisynth overhead limiting them, a CPU-bound encoder will always be the bottleneck.
Now, if somebody could come up with a better (CUDA) version of NVEnc, say DGNVenc... :ugeek:
I realize that it probably isn't possible with CUDA :facepalm:

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 2:04 pm

Nothing is off the table.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 7:55 pm

OK, girls and boys, are you ready? Are you ready for the first performance comparison between the 1050Ti and the 1080Ti?

I ran this script with a 720x480 clip needing field matching. I added denoising and sharpening as well. Note that TelecideNV has not been integrated into DGDecodeNV.dll yet, but it will be.

loadplugin("dgdecodenv.dll")
loadplugin("telecidenv.dll")
a=dgsource("lain.dgi")
a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a # Lain is a short clip; make a long enough test for AVSMeter
telecidenv()
DGDenoise(strength=0.1)
DGSharpen(strength=0.5)

The results:

1050Ti: 470 fps
1080Ti: 818 fps

That seems like a useful speed-up to me. ;)
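To put numbers on the speed-ups quoted in this thread, here is a tiny sketch that computes the ratios from the posted frame rates. The helper name `speedup` is made up for illustration, and the figures are just the ones reported above, not new measurements.

```c
#include <stdio.h>

/* Ratio of two frame rates; a value above 1 means `fast_fps` is quicker. */
double speedup(double fast_fps, double slow_fps)
{
    return fast_fps / slow_fps;
}

/* Print the two comparisons reported in the thread. */
void print_thread_speedups(void)
{
    /* 1080p field matching on the 1050Ti: TelecideNV vs Telecide. */
    printf("TelecideNV vs Telecide: %.2fx\n", speedup(2050.0, 579.0));
    /* 720x480 script with denoise and sharpen: 1080Ti vs 1050Ti. */
    printf("1080Ti vs 1050Ti:       %.2fx\n", speedup(818.0, 470.0));
}
```

By these figures the CUDA field matcher is roughly 3.5 times the speed of the CPU version, and the 1080Ti runs the full script at roughly 1.7 times the speed of the 1050Ti.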

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Fri Mar 17, 2017 8:29 pm

That's a substantial speed increase.
In Canada they are still out of stock.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 9:08 pm

I reckon your 1070 would weigh in at about 650 fps for that test.

BTW, while profiling the filters I was able to find a useful speed gain for DGSource(). About 20%. I'll slipstream it at some point.

Can you imagine having two of these, the first decoding and filtering, and the second encoding? Maybe it's time for me to look into the NV encoding samples. After I complete DGDecomb, of course.

gonca
Moose Approved
Posts: 846
Joined: Sun Apr 08, 2012 6:12 pm

Re: DGDecomb

Post by gonca » Fri Mar 17, 2017 9:20 pm

Two GTX1080ti cards in one system.
That might require a 1000 watt power supply.
But the CPU could be cut down.

Edit:
Now that I think about it, the CPU can't be cut down.
Each card requires 16 lanes, for a total of 32 lanes.
You would need an extreme edition CPU to handle that many lanes.

admin
Site Admin
Posts: 4411
Joined: Thu Sep 09, 2010 3:08 pm

Re: DGDecomb

Post by admin » Fri Mar 17, 2017 10:37 pm

Now I have to find out what a lane is. :cry:
