Re: CUDASynth

Posted: Mon Oct 08, 2018 2:14 pm
by DJATOM
The avscompat layer seems to be working, but the speed is nearly the same as in "cpu" mode.
It is still relatively fast, though: about 65 fps (default settings, DGSource -> DGDenoise -> DGSharpen) and 105 fps (default settings, DGSource -> DGDenoise).
Hardware: GTX 750, i5-4670k

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:04 pm
by admin
Thanks for the results, DJ. Just out of interest I'd like to see a benchmark of a script for these three (no CUDASynth):

Avisynth+
Vapoursynth native
Vapoursynth avscompat

I seem to recall from some earlier testing that both Vapoursynth approaches fell short of Avisynth+, but I haven't tried it lately.

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:37 pm
by DJATOM
OK, for now I've checked the same script in two variants:
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi")
DGDenoise()
DGSharpen()
trim(0,6000)
and
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi",fdst="gpu0")
DGDenoise(fsrc="gpu0",fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="cpu")
trim(0,6000)
So CUDASynth works in native avs+.
C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 86.72 0:01:09 0:00:00
Started: Tue Oct 9 00:24:34 2018
Finished: Tue Oct 9 00:25:43 2018
Elapsed: 0:01:09

C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 102.32 0:00:58 0:00:00
Started: Tue Oct 9 00:26:26 2018
Finished: Tue Oct 9 00:27:25 2018
Elapsed: 0:00:59
I'll measure avscompat (with and without fsrc/fdst) soon; I need to close the browser to free up GPU RAM for testing.
And as there are no native Vapoursynth versions of DGDenoise/DGSharpen, should I test them through avscompat with DGSource in native mode?
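
For reference, the two avs2yuv64 runs above can be compared with a little arithmetic. This is a minimal sketch; it assumes the runs are listed in the same order as the scripts (plain script first, fsrc/fdst script second), and the function names are illustrative only.

```python
# Figures copied from the two avs2yuv64 runs above.
FRAMES = 6000
FPS_PLAIN = 86.72       # assumed: script without fsrc/fdst
FPS_CUDASYNTH = 102.32  # assumed: script with fdst="gpu0" / fsrc="gpu0"


def elapsed_seconds(frames: int, fps: float) -> float:
    """Wall-clock time implied by a frame count and an average frame rate."""
    return frames / fps


def speedup(fps_new: float, fps_old: float) -> float:
    """Throughput ratio; values above 1.0 mean the new run is faster."""
    return fps_new / fps_old


if __name__ == "__main__":
    print(f"plain:     {elapsed_seconds(FRAMES, FPS_PLAIN):.1f} s")
    print(f"CUDASynth: {elapsed_seconds(FRAMES, FPS_CUDASYNTH):.1f} s")
    print(f"speedup:   {speedup(FPS_CUDASYNTH, FPS_PLAIN):.2f}x")
```

The implied times (about 69 s and 59 s) agree with the Elapsed lines in the logs, and the ratio works out to roughly 1.18x on this GTX 750.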

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:46 pm
by Guest
LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
DGSource("I:\test.dgi", fieldop=0, fulldepth=True)
ConvertBits(10)
FPS 92.3
import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 133.0
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.avs.DGSource("I:/test.dgi", fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 129.8
Source was a 4K clip

Edit
Avs compatibility is 2x faster with CUDASynth than with the regular version (4K sample with DGHDRtoSDR (default) and DGSharpen (default)).

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:52 pm
by DJATOM
cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

no cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

native DGSource + avscompat DGDenoise and DGSharpen:
import vapoursynth as vs
core = vs.get_core()

core.std.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.dgdecodenv.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
Image

I don't know why we get such results; I at least tried to keep resource usage as similar as possible between runs (browser closed, etc.).

Re: CUDASynth

Posted: Mon Oct 08, 2018 4:57 pm
by Guest
DJATOM

Two items
Don't know if DGDenoise is actually cudasynth enabled yet
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
should actually be
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu1")
clip = core.avs.DGSharpen(clip, fsrc="gpu1", fdst="cpu")
to get the ping pong effect
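
The alternating pattern suggested above can be generated mechanically. The helper below is a hypothetical sketch, not part of DGDecodeNV or CUDASynth: it only builds the fsrc/fdst keyword arguments for a chain, alternating between "gpu0" and "gpu1" and sending the last stage to "cpu". Whether alternating is actually required is an assumption taken from this post.

```python
from typing import Dict, List


def pingpong_args(n_stages: int) -> List[Dict[str, str]]:
    """Return fsrc/fdst keyword arguments for a chain of n_stages filters.

    Stage 0 is the source filter (it has no fsrc). Intermediate results
    alternate between the "gpu0" and "gpu1" buffers; the last stage
    writes to "cpu" so the finished frame leaves the GPU.
    """
    if n_stages < 1:
        raise ValueError("need at least a source filter")
    args: List[Dict[str, str]] = []
    for i in range(n_stages):
        stage: Dict[str, str] = {}
        if i > 0:
            # Read from the buffer the previous stage wrote to.
            stage["fsrc"] = f"gpu{(i - 1) % 2}"
        stage["fdst"] = "cpu" if i == n_stages - 1 else f"gpu{i % 2}"
        args.append(stage)
    return args
```

For a three-stage chain this yields the same values as the corrected lines above, so a script could write, for example, `core.avs.DGDenoise(clip, **args[1])`.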

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:04 pm
by DJATOM
Oh, I thought gpu0/gpu1 was for a two-card setup (I have only one).

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:09 pm
by Guest
I only have one card as well
I think it has to do with the pipelines/kernels???

Try it and see if it makes a difference

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:16 pm
by DJATOM
Tried and...
Image

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:28 pm
by Guest
Could you check on what your GPU usage is while running the script?

Re: CUDASynth

Posted: Mon Oct 08, 2018 5:46 pm
by admin
Thanks, guys, awesome!

DGDenoise and DGSharpen are both CUDASynth-enabled.

Meanwhile, there is another limitation I discovered. Some 3rd party players and encode apps open the script multiple times. That will not work with CUDASynth as currently designed because there can be only one pipeline. I think I can fix that up fairly easily by having only the first source filter set up the framework.
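
One way to picture "only the first source filter sets up the framework" is a guarded one-time initialization. The toy model below is purely illustrative: the class, its fields, and the locking are assumptions for the sake of the sketch and say nothing about how CUDASynth is actually implemented.

```python
import threading


class PipelineRegistry:
    """Toy model: one shared pipeline, created by the first source filter only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pipeline = None
        self.setups = 0  # how many times the framework was built

    def acquire(self, owner: str) -> dict:
        """Return the shared pipeline, building it on the first call only."""
        with self._lock:
            if self._pipeline is None:
                # First source filter: build the framework exactly once.
                self._pipeline = {"owner": owner, "buffers": ("gpu0", "gpu1")}
                self.setups += 1
            return self._pipeline
```

In this model, a second opening of the same script would receive the existing pipeline instead of trying to create another one.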

Also, I have CUDASynth-enabled DGPQtoHLG. I'll make a release tomorrow after some testing.

Re: CUDASynth

Posted: Tue Oct 09, 2018 12:33 am
by hydra3333
Extremely nice work, DG. Thank you.

Edit: To make up for my lack of clarity: in the context of the new pipeline-enabled DGDecodeNV.dll and the aforementioned test scripts, with snippets like
(a)

core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
and
(b)

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
edit: added LoadPlugin to snippet (b) for clarity

and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the cudasynth DGDecodeNV.dll in snippet (b) with ".avs."?

Hmm, I must not have caught up with the latest, as my (long not updated) scripts have continued to use "core.avs.LoadPlugin" and "core.avs.DGSource" rather than "core.std.LoadPlugin" and "core.dgdecodenv.DGSource" ... damn, I must get in from the scrub, outa the midday sun. https://www.youtube.com/embed/z2YvYiWto ... &version=3

Re: CUDASynth

Posted: Tue Oct 09, 2018 5:30 am
by Guest
and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the cudasynth DGDecodeNV.dll in snippet (b) with ".avs."?
Yes

Re: CUDASynth

Posted: Tue Oct 09, 2018 9:28 am
by admin
and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-cudasynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the cudasynth DGDecodeNV.dll in snippet (b) with ".avs."?
No. The dgdecodenv versus avs decides whether the referenced DLL is invoked natively or via the avscompat layer. Either way, you would still load the same DLL. However, the CUDASynth DLL can only be loaded with avscompat at this time. If you omit the load plugin call then you could pick up something from autoloading. I recommend always using explicit loading.
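
The distinction can be written down as a small helper. This is a sketch only: `open_dgsource` is a hypothetical name, and it uses just the calls already shown in this thread (`core.std.LoadPlugin`, `core.dgdecodenv.DGSource`, `core.avs.LoadPlugin`, `core.avs.DGSource`). The `core` argument is whatever VapourSynth core object the script already has.

```python
def open_dgsource(core, dll_path: str, dgi_path: str, native: bool = True, **kwargs):
    """Load DGDecodeNV.dll explicitly, then open a .dgi index.

    native=True  -> core.std.LoadPlugin + core.dgdecodenv.DGSource
    native=False -> core.avs.LoadPlugin + core.avs.DGSource (avscompat layer)
    Extra keyword arguments are passed through to DGSource unchanged.
    """
    if native:
        core.std.LoadPlugin(dll_path)
        return core.dgdecodenv.DGSource(dgi_path, **kwargs)
    # Same DLL file, but invoked through the Avisynth compatibility layer.
    core.avs.LoadPlugin(dll_path)
    return core.avs.DGSource(dgi_path, **kwargs)
```

Because the plugin is always loaded explicitly, nothing is picked up from autoloading. Per the post above, CUDASynth arguments such as `fdst="gpu0"` would only make sense with `native=False` at this time.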

Re: CUDASynth

Posted: Tue Oct 09, 2018 10:05 am
by Guest
Snippet (a) is the one I used in the benchmarking you asked for, and it uses the original non-cudasynth dll
Snippet (b) is the one from DJATOM's testing of the cudasynth dll

Re: CUDASynth

Posted: Tue Oct 09, 2018 10:19 am
by admin
I don't see a loadplugin call in snippet b so it's ambiguous. Also, snippet b leaves the output on the GPU. The last filter should output it to the CPU.

To get precise answers, one needs to ask precise questions. ;)

Re: CUDASynth

Posted: Tue Oct 09, 2018 11:08 am
by admin
CUDASynth 0.2:

* Added CUDASynth-enabled DGPQtoHLG.

* Revised the user manual: explain meaning of gpu0/1 (not different cards!),
added note that Vapoursynth can be used in avscompat mode, and mention
limitation of some players and third-party apps.

http://rationalqm.us/misc/CUDASynth_0.2.rar

Re: CUDASynth

Posted: Tue Oct 09, 2018 4:38 pm
by admin
Regarding the test results you guys gave, I'm having a little trouble digesting it as you have included CUDASynth results when I specifically asked you to exclude it. Also, there seems to be some confusion about native versus avscompat, etc. Finally, we want the prefetch() call for Avisynth+, otherwise we throw away some performance. Tell you what, I'll do some testing and post full results with full scripts and we can go from there.

One of my motivations here is to know whether using the avscompat layer for Vapoursynth loses performance versus native. To be honest, I'd like to know why I should bother with the PITA of duplicating code to have native Vapoursynth if avscompat performs the same. Even if I have to release an avscompat.dll that's way easier than writing Vapoursynth native code for everything. Any thoughts?

Re: CUDASynth

Posted: Tue Oct 09, 2018 5:56 pm
by DJATOM
Yeah, it's possible to set up autoloading with a hand-written Python module (as I did before the native DGSource version came out), but typing another line in the script is, say, a waste of time. So I'd like to have native versions if possible. I almost don't use avs+ nowadays; I moved to VS about 1 year ago :lol:

Re: CUDASynth

Posted: Tue Oct 09, 2018 6:25 pm
by hydra3333
admin wrote:
Tue Oct 09, 2018 4:38 pm
Tell you what, I'll do some testing and post full results with full scripts and we can go from there.
Beaut ! :hat:
admin wrote:
Tue Oct 09, 2018 4:38 pm
One of my motivations here is to know whether using the avscompat layer for Vapoursynth loses performance versus native. To be honest, I'd like to know why I should bother with the PITA of duplicating code to have native Vapoursynth if avscompat performs the same. Even if I have to release an avscompat.dll that's way easier than writing Vapoursynth native code for everything. Any thoughts?
An eminently reasonable line of reasoning :) Maybe also ask over at the other site what may or may not be forgone when using the avscompat layer for Vapoursynth versus native? With any luck the VS author may provide some insight.

Re: CUDASynth

Posted: Tue Oct 09, 2018 6:31 pm
by hydra3333
DJATOM wrote:
Tue Oct 09, 2018 5:56 pm
Yeah, it's possible to set up autoloading with a hand-written Python module (as I did before the native DGSource version came out), but typing another line in the script is, say, a waste of time. So I'd like to have native versions if possible. I almost don't use avs+ nowadays; I moved to VS about 1 year ago :lol:
Being a control freak from way back (too many systems went belly up if decent control was omitted during development) I always manually load everything and don't begrudge a line or 20 of code :) Personal preference.

rar ? google tells me
What does RAR stand for in finance?
Abbr. Meaning
RAR Revenue Agent Report (US IRS)
RAR Refund-Anticipated Return
RAR Run At Risk
RAR Regulatory Asset Ratio (finance)
3rd seems about right given the vsrepo experience with 7z :D

Re: CUDASynth

Posted: Tue Oct 09, 2018 8:11 pm
by Guest
If you go this route all I really need to do is change my templates to be avs compatible.
So, all is good
Now that I think about it, the only reason I moved to vs was the lack of high bit depth support in the avisynth chain (NVEncC)
That has been corrected though

Re: CUDASynth

Posted: Tue Oct 09, 2018 8:22 pm
by admin
Good points, guys, thanks.

I have found a simple script that runs perfectly in Avisynth+ but runs at half speed and then stops completely in Vapoursynth native (no CUDASynth for both). I want to check a few things first and then I'll give you the script to see if you can replicate it. Then we'll have to try to figure out what is going wrong.

Re: CUDASynth

Posted: Wed Oct 10, 2018 4:32 pm
by admin
I found the cause of the Vapoursynth slowdown and stoppage. DGHDRtoSDR was missing a freeFrame(src) call (affecting only the Vapoursynth code) and so memory was being exhausted. I'll release a fix later today and then get back to proper benchmarking.
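
Why a single missing release call leads first to a slowdown and then to a complete stop can be shown with a toy model. The code below is not VapourSynth's API (the real call is `freeFrame` in the C API, as mentioned above); `FramePool` and `run_filter` are hypothetical names that only mimic a frame allocator with a fixed memory ceiling.

```python
class FramePool:
    """Toy stand-in for a frame allocator with a hard memory ceiling."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.live = 0  # frames handed out and not yet released

    def get_frame(self) -> int:
        if self.live >= self.capacity:
            raise MemoryError("frame pool exhausted")
        self.live += 1
        return self.live

    def free_frame(self) -> None:
        self.live -= 1


def run_filter(pool: FramePool, n_frames: int, release: bool) -> int:
    """Process n_frames; return how many completed before memory ran out."""
    done = 0
    for _ in range(n_frames):
        try:
            pool.get_frame()  # request the source frame
        except MemoryError:
            break
        done += 1  # the actual filter work would happen here
        if release:
            pool.free_frame()  # the step that was missing
    return done
```

With `release=True` every requested frame completes; with `release=False` processing stops as soon as the pool's capacity has been handed out, which mirrors the memory exhaustion described above.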

Re: CUDASynth

Posted: Thu Oct 11, 2018 7:23 pm
by Guest
Good to hear you are making headway