
Re: CUDASynth

Posted: Wed Oct 03, 2018 12:06 pm
by admin
I am happy to announce CUDASynth 0.1:

http://rationalqm.us/misc/CUDASynth_0.1.rar

Testing and feedback will be appreciated.

Re: CUDASynth

Posted: Wed Oct 03, 2018 3:48 pm
by Guest
Got a couple of things to finish off and then the testing will begin

Re: CUDASynth

Posted: Wed Oct 03, 2018 5:43 pm
by Guest
Speed looks good
Attachments: cudasynth.log (719.6 KiB), test.log (244.36 KiB)
I can't view the output of the cudasynth script in VDub or MPC-HC, so I will run a quick encode and check

Re: CUDASynth

Posted: Wed Oct 03, 2018 6:34 pm
by Guest
No visible issues that I could see on the encoded file
CPU usage is down, and surprisingly so are GPU and VPU load.
Speed is right up there though
Looks good

Re: CUDASynth

Posted: Wed Oct 03, 2018 8:24 pm
by admin
Thanks for the test results, gonca. Now to get some critical mass we need to make more CUDASynth-enabled filters. Feel free to suggest possibilities. If there are any good open source ones it would not be hard to port them.

Re: CUDASynth

Posted: Thu Oct 04, 2018 12:56 am
by hydra3333
Re testing: sorry, per the CUDASynth.txt "* Vapoursynth is not yet supported", and unfortunately I no longer have Avisynth.

Filter possibilities? You have current functionality for
- decode / deinterlace / crop / resize
- denoise
- sharpen
- HDR10 to SDR

That's about all I use, other than maybe an occasional
- deblock for low quality TV broadcasts (Aus telly can be bitrate starved)
- video stabilisation, rarely, more for the home videos that one must share including vhs type captures
- croprel, addborders, rarely, more for the home videos that one must share including vhs type captures
- HDRAGC or equivalent, rarely, more for the home videos that one must share including vhs type captures
- despot, very rarely for some vhs type captures

- mdegrain, very rarely for some vhs captures etc
- anti-alias ?sangnom, almost never nowadays
- QTGMC deinterlacing, almost never nowadays

Re: CUDASynth

Posted: Thu Oct 04, 2018 4:55 am
by Guest
DGTelecide
DGDecimate
DGPQtoHLG

I tend to use the DG filters more than any other

Re: CUDASynth

Posted: Fri Oct 05, 2018 7:49 am
by DJATOM
I'd like to have CUDA versions of nnedi3/eedi3. They are CPU-intensive, and offloading them to the GPU would help a lot. Currently we have nnedi3 OpenCL (a full rewrite that uses the GPU only) and eedi3 OpenCL (a partial rewrite that uses the GPU for calculating connection costs), so the second one still consumes CPU for the main processing.
If you want to look at them, eedi3 | nnedi3.

Re: CUDASynth

Posted: Fri Oct 05, 2018 8:39 am
by admin
Thank you, gentlemen, for the thoughts and links. eedi3 and nnedi3 look like they might be fun to try. First, though, I need to get serious about DGIndex MKV support.

@hydra3333

For Vapoursynth, can't you use the avscompat layer? I can add native support later, although I have to confess that the required code duplication is a royal pain in the you-know-what.

Re: CUDASynth

Posted: Mon Oct 08, 2018 2:03 pm
by Guest
Masktools 2 seems to be popular
https://github.com/pinterf/masktools/releases

Re: CUDASynth

Posted: Mon Oct 08, 2018 2:14 pm
by DJATOM
The avscompat layer seems to be working, but speed is nearly the same as in "cpu" mode.
But still relatively fast - near 65 fps (default settings, DGSource -> DGDenoise -> DGSharpen) and 105 fps (default settings, DGSource -> DGDenoise).
Hardware: GTX 750, i5-4670k

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:04 pm
by admin
Thanks for the results, DJ. Just out of interest I'd like to see a benchmark of a script for these three (no CUDASynth):

Avisynth+
Vapoursynth native
Vapoursynth avscompat

I seem to recall from some earlier testing that both Vapoursynth ways fell short compared to Avisynth+, but I haven't tried it recently.

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:37 pm
by DJATOM
OK, for now I've checked this script:
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi")
DGDenoise()
DGSharpen()
trim(0,6000)
and
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi",fdst="gpu0")
DGDenoise(fsrc="gpu0",fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="cpu")
trim(0,6000)
So CUDASynth works in the native avs+.
C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 86.72 0:01:09 0:00:00
Started: Tue Oct 9 00:24:34 2018
Finished: Tue Oct 9 00:25:43 2018
Elapsed: 0:01:09

C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 102.32 0:00:58 0:00:00
Started: Tue Oct 9 00:26:26 2018
Finished: Tue Oct 9 00:27:25 2018
Elapsed: 0:00:59
I'll measure avscompat (without and with fsrc/fdst) soon; I need to close the browser to free up GPU RAM for testing.
And as there are no native Vapoursynth versions for DGDenoise/DGSharpen, should I check them in avscompat and DGSource in the native modes?
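
As a quick sanity check on the two avs2yuv64 runs above, the reported FPS figures can be compared directly. This is a minimal sketch in Python; the function and variable names are made up for illustration, the numbers are copied from the logs in this post, and it assumes the first log belongs to the plain script and the second to the fsrc/fdst script.

```python
# Figures copied from the two avs2yuv64 logs above.
TOTAL_FRAMES = 6001
FPS_PLAIN = 86.72       # assumed: script without fsrc/fdst
FPS_CUDASYNTH = 102.32  # assumed: script with fsrc/fdst set


def speedup(baseline_fps: float, new_fps: float) -> float:
    """Return new_fps as a multiple of baseline_fps."""
    return new_fps / baseline_fps


def expected_seconds(frames: int, fps: float) -> float:
    """Estimate wall-clock time from a frame count and an average FPS."""
    return frames / fps


ratio = speedup(FPS_PLAIN, FPS_CUDASYNTH)
print(f"fsrc/fdst run: {ratio:.2f}x the plain run")
print(f"plain:     ~{expected_seconds(TOTAL_FRAMES, FPS_PLAIN):.0f} s")
print(f"fsrc/fdst: ~{expected_seconds(TOTAL_FRAMES, FPS_CUDASYNTH):.0f} s")
```

The estimated times come out at roughly 69 s and 59 s, which agrees with the Elapsed lines of 0:01:09 and 0:00:59 in the logs, so the FPS figures and timings are at least self-consistent.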

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:46 pm
by Guest
Avisynth+:
LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
DGSource("I:\test.dgi", fieldop=0, fulldepth=True)
ConvertBits(10)
FPS 92.3

Vapoursynth native:
import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 133.0

Vapoursynth avscompat:
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.avs.DGSource("I:/test.dgi", fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 129.8

Source was a 4K clip

Edit
Avs compatibility is 2x faster with cudasynth than with the regular version, 4K sample with DGHDRtoSDR (default) and DGSharpen (default)

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:52 pm
by DJATOM
cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

no cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

native DGSource + avscompat DGDenoise and DGSharpen:
import vapoursynth as vs
core = vs.get_core()

core.std.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.dgdecodenv.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

I don't know why we have such results, at least I tried to compare with minimum differences in the resource usage (with closed browser, etc).

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:57 pm
by Guest
DJATOM

Two items
Don't know if DGDenoise is actually cudasynth enabled yet
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
should actually be
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu1")
clip = core.avs.DGSharpen(clip, fsrc="gpu1", fdst="cpu")
to get the ping pong effect
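
The fsrc/fdst convention discussed here can be sketched as a small chain check. This is only an illustration in plain Python, not part of CUDASynth: `check_chain` and the tuple layout are hypothetical, and the rules it applies are assumptions drawn from the scripts in this thread (each fsrc names the previous filter's fdst, and the last filter hands frames back with fdst="cpu"). The "do not read and write the same buffer" rule follows the ping-pong suggestion in this post; the thread does not confirm that it is required.

```python
from typing import List, Optional, Tuple

# Each stage is (filter name, fsrc, fdst); the source filter has no fsrc.
Stage = Tuple[str, Optional[str], str]


def check_chain(stages: List[Stage]) -> List[str]:
    """Return a list of problems found in an fsrc/fdst chain (empty if none)."""
    problems = []
    prev_dst = None
    for i, (name, fsrc, fdst) in enumerate(stages):
        if i == 0:
            if fsrc is not None:
                problems.append(f"{name}: source filter should not set fsrc")
        elif fsrc != prev_dst:
            problems.append(f"{name}: fsrc={fsrc!r} but previous fdst={prev_dst!r}")
        # Assumption (ping-pong suggestion): do not read and write the same buffer.
        if fsrc is not None and fsrc == fdst:
            problems.append(f"{name}: fsrc and fdst are both {fsrc!r}")
        prev_dst = fdst
    # Assumption: the final filter returns frames to system memory.
    if stages and stages[-1][2] != "cpu":
        problems.append("last filter should set fdst='cpu'")
    return problems


original = [("DGSource", None, "gpu0"),
            ("DGDenoise", "gpu0", "gpu0"),
            ("DGSharpen", "gpu0", "cpu")]
suggested = [("DGSource", None, "gpu0"),
             ("DGDenoise", "gpu0", "gpu1"),
             ("DGSharpen", "gpu1", "cpu")]

print(check_chain(original))
print(check_chain(suggested))
```

Under these assumptions the original chain is flagged once (DGDenoise reads and writes gpu0) and the suggested chain passes.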

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:04 pm
by DJATOM
Oh, I thought gpu0/gpu1 were for a two-card setup (I have only one).

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:09 pm
by Guest
I only have one card as well
I think it has to do with the pipelines/kernels???

Try it and see if it makes a difference

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:16 pm
by DJATOM
Tried and...
Image

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:28 pm
by Guest
Could you check on what your GPU usage is while running the script?

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:46 pm
by admin
Thanks, guys, awesome!

DGDenoise and DGSharpen are both CUDASynth-enabled.

Meanwhile, there is another limitation I discovered. Some 3rd party players and encode apps open the script multiple times. That will not work with CUDASynth as currently designed because there can be only one pipeline. I think I can fix that up fairly easily by having only the first source filter set up the framework.
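
The idea of letting only the first source filter set up the framework can be illustrated with a generic once-only initialization pattern. This is a sketch in plain Python under stated assumptions: `PipelineFramework` and `open_source_filter` are hypothetical names, and the real code lives inside the DLL and is not shown in this thread, so this only demonstrates the pattern and not the actual fix.

```python
import threading


class PipelineFramework:
    """Hypothetical stand-in for the shared per-process pipeline state."""

    _lock = threading.Lock()
    _instance = None
    setup_count = 0  # how many times the framework was actually built

    @classmethod
    def acquire(cls):
        """Return the shared framework, building it only on the first call."""
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                cls.setup_count += 1
            return cls._instance


def open_source_filter(path: str) -> dict:
    """Hypothetical source-filter constructor: every instance shares one framework."""
    return {"path": path, "framework": PipelineFramework.acquire()}


# Simulate an application that opens the same script twice.
first = open_source_filter("EP16.dgi")
second = open_source_filter("EP16.dgi")
print(first["framework"] is second["framework"], PipelineFramework.setup_count)
```

Both opens end up with the same framework object and the setup runs once, which is the behavior described above for apps that open the script multiple times.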

Also, I have CUDASynth-enabled DGPQtoHLG. I'll make a release tomorrow after some testing.

Re: CUDASynth

Posted: Tue Oct 09, 2018 12:33 am
by hydra3333
Extremely nice work, DG. Thank you.

Edit: to make up for my lack of clarity: in the context of the new pipeline-enabled DGDecodeNV.dll and the aforementioned test scripts with lines like
(a)

core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
and
(b)

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
edit: added LoadPlugin to snippet (b) for clarity

and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and cudasynth DGDecodeNV.dll in snippet (b) with ".avs." ?

Hmm, I must not have caught up with the latest, as my (long not updated) scripts have continued to use "core.avs.LoadPlugin" and "core.avs.DGSource" rather than "core.std.LoadPlugin" and "core.dgdecodenv.DGSource" ... damn, I must get in from the scrub outa the midday sun. https://www.youtube.com/embed/z2YvYiWto ... &version=3

Re: CUDASynth

Posted: Tue Oct 09, 2018 5:30 am
by Guest
and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and cudasynth DGDecodeNV.dll in snippet (b) with ".avs." ?
Yes

Re: CUDASynth

Posted: Tue Oct 09, 2018 9:28 am
by admin
and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and cudasynth DGDecodeNV.dll in snippet (b) with ".avs." ?
No. The dgdecodenv versus avs decides whether the referenced DLL is invoked natively or via the avscompat layer. Either way, you would still load the same DLL. However, the CUDASynth DLL can only be loaded with avscompat at this time. If you omit the load plugin call then you could pick up something from autoloading. I recommend always using explicit loading.
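
To make the explicit-loading recommendation concrete, here is a rough text check for Vapoursynth scripts. It is a hypothetical helper, not part of Vapoursynth or the DG tools: it assumes plugin calls look like `core.<namespace>.<Function>(`, that `core.avs.*` calls need a `core.avs.LoadPlugin`, that other plugin namespaces need a `core.std.LoadPlugin`, and that `std` and `resize` are built in. It does not verify which DLL a load call points at.

```python
import re

# Matches calls such as core.avs.DGSource( or core.dgdecodenv.DGSource(
CALL_RE = re.compile(r"core\.(\w+)\.(\w+)\(")
BUILTIN_NAMESPACES = {"std", "resize"}  # assumed built in, no plugin load needed


def missing_explicit_loads(script: str) -> set:
    """Return namespaces used without a matching explicit LoadPlugin call."""
    calls = CALL_RE.findall(script)
    has_avs_load = ("avs", "LoadPlugin") in calls
    has_std_load = ("std", "LoadPlugin") in calls
    missing = set()
    for namespace, func in calls:
        if func == "LoadPlugin" or namespace in BUILTIN_NAMESPACES:
            continue
        if namespace == "avs" and not has_avs_load:
            missing.add(namespace)
        elif namespace != "avs" and not has_std_load:
            missing.add(namespace)
    return missing


with_load = r"""
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
"""
without_load = r"""
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
"""

print(missing_explicit_loads(with_load))
print(missing_explicit_loads(without_load))
```

A script flagged by this check would be relying on autoloading, which is the situation the reply above warns can pick up an unintended DLL.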

Re: CUDASynth

Posted: Tue Oct 09, 2018 10:05 am
by Guest
Snippet (a) is the one I used in the benchmarking you asked for, and it uses the original non-cudasynth dll
Snippet (b) is the one from DJATOM's testing of the cudasynth dll