CUDASynth
Re: CUDASynth
I am happy to announce CUDASynth 0.1:
http://rationalqm.us/misc/CUDASynth_0.1.rar
Testing and feedback will be appreciated.
Re: CUDASynth
Got a couple of things to finish off and then the testing will begin
Re: CUDASynth
Speed looks good
Can't see the results of the CUDASynth file in VDub or MPC-HC, so I will run a quick encode and check.
Re: CUDASynth
No visible issues that I could see on the encoded file
CPU usage down, and surprisingly so is GPU and VPU load.
Speed is right up there though
Looks good
Re: CUDASynth
Thanks for the test results, gonca. Now to get some critical mass we need to make more CUDASynth-enabled filters. Feel free to suggest possibilities. If there are any good open source ones it would not be hard to port them.
Re: CUDASynth
Regarding testing: sorry, but the CUDASynth.txt says "* Vapoursynth is not yet supported", and unfortunately I no longer have Avisynth.
Filter possibilities? You currently have functionality for
- decode / deinterlace / crop / resize
- denoise
- sharpen
- HDR10 to SDR
That's about all I use, other than maybe an occasional
- deblock for low quality TV broadcasts (Aus telly can be bitrate starved)
- video stabilisation, rarely, more for the home videos that one must share including vhs type captures
- croprel, addborders, rarely, more for the home videos that one must share including vhs type captures
- HDRAGC or equivalent, rarely, more for the home videos that one must share including vhs type captures
- despot, very rarely for some vhs type captures
- mdegrain, very rarely for some vhs captures etc
- anti-alias ?sangnom, almost never nowadays
- QTGMC deinterlacing, almost never nowadays
I really do like it here.
Re: CUDASynth
DGTelecide
DGDecimate
DGPQtoHLG
I tend to use the DG filters more than any other
Re: CUDASynth
I'd like to have CUDA versions of nnedi3/eedi3. They are CPU-intensive, and offloading them to the GPU would help a lot. Currently we have nnedi3 OpenCL (a full rewrite that uses the GPU only) and eedi3 OpenCL (a partial rewrite that uses the GPU to calculate connection costs), so the second one still consumes CPU for the main processing.
If you want to look at them, eedi3 | nnedi3.
PC: RTX 2070 | Ryzen R9 5950X (no OC) | 64 GB RAM
Notebook: RTX 4060 | Ryzen R9 7945HX | 32 GB RAM
Re: CUDASynth
Thank you, gentlemen, for the thoughts and links. eedi3 and nnedi3 look like they might be fun to try. First, though, I need to get serious about DGIndex MKV support.
@hydra3333
For Vapoursynth, can't you use the avscompat layer? I can add native support later, although I have to confess that the required code duplication is a royal pain in the you-know-what.
Re: CUDASynth
Masktools 2 seems to be popular
https://github.com/pinterf/masktools/releases
Re: CUDASynth
The avscompat layer seems to be working, but the speed is nearly the same as in "cpu" mode.
It is still relatively fast, though: about 65 fps (default settings, DGSource -> DGDenoise -> DGSharpen) and 105 fps (default settings, DGSource -> DGDenoise).
Hardware: GTX 750, i5-4670k
Re: CUDASynth
Thanks for the results, DJ. Just out of interest I'd like to see a benchmark of a script for these three (no CUDASynth):
Avisynth+
Vapoursynth native
Vapoursynth avscompat
I seem to recall when doing some testing recently both Vapoursynth ways fell short compared to Avisynth+, but I haven't tried it recently.
Re: CUDASynth
Ok, for now I've checked the same script
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi")
DGDenoise()
DGSharpen()
trim(0,6000)
and
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi",fdst="gpu0")
DGDenoise(fsrc="gpu0",fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="cpu")
trim(0,6000)
So CUDASynth works in the native avs+.
C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 86.72 0:01:09 0:00:00
Started: Tue Oct 9 00:24:34 2018
Finished: Tue Oct 9 00:25:43 2018
Elapsed: 0:01:09
C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 102.32 0:00:58 0:00:00
Started: Tue Oct 9 00:26:26 2018
Finished: Tue Oct 9 00:27:25 2018
Elapsed: 0:00:59
I'll measure avscompat (without and with fsrc/fdst) soon; I need to close the browser to have more GPU RAM for testing.
And as there are no native Vapoursynth versions of DGDenoise/DGSharpen, should I check them in avscompat and DGSource in native mode?
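The two avs2yuv summaries above can be compared mechanically. Here is a minimal Python sketch, assuming the final progress line always has the layout shown in the post (percent, done/total, FPS, elapsed, remaining). It also assumes the first run is the plain script and the second is the fsrc/fdst script, which the post does not state explicitly. The regular expression and the helper names are hypothetical and are not part of avs2yuv.

```python
import re

# Final progress lines copied from the two avs2yuv runs quoted in the post.
RUN_A = "[100.0%] 6000/6001 86.72 0:01:09 0:00:00"   # assumed: plain script
RUN_B = "[100.0%] 6000/6001 102.32 0:00:58 0:00:00"  # assumed: fsrc/fdst script

# Hypothetical pattern for this one line layout only.
PROGRESS = re.compile(
    r"\[\s*(?P<pct>[\d.]+)%\]\s+(?P<done>\d+)/(?P<total>\d+)\s+"
    r"(?P<fps>[\d.]+)\s+(?P<elapsed>\d+:\d+:\d+)\s+(?P<remain>\d+:\d+:\d+)"
)


def parse_progress(line):
    """Extract frame count, reported FPS and elapsed seconds from one line."""
    match = PROGRESS.search(line)
    if match is None:
        raise ValueError(f"unrecognised progress line: {line!r}")
    hours, minutes, seconds = (int(x) for x in match.group("elapsed").split(":"))
    return {
        "frames": int(match.group("done")),
        "fps": float(match.group("fps")),
        "elapsed_s": hours * 3600 + minutes * 60 + seconds,
    }


def speedup(baseline_fps, candidate_fps):
    """Ratio of candidate throughput to baseline throughput."""
    return candidate_fps / baseline_fps


run_a = parse_progress(RUN_A)
run_b = parse_progress(RUN_B)
print(f"speedup: {speedup(run_a['fps'], run_b['fps']):.2f}x")
```

Under those assumptions the second run comes out roughly 18% faster than the first. A single pair of runs is a thin basis for a conclusion, so repeating each measurement a few times would be sensible.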
Re: CUDASynth
LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
DGSource("I:\test.dgi", fieldop=0, fulldepth=True)
ConvertBits(10)
FPS 92.3
import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 133.0
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.avs.DGSource("I:/test.dgi", fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 129.8
Source was a 4K clip.
Edit: Avs compatibility is 2x faster with CUDASynth than with the regular version (4K sample with DGHDRtoSDR (default) and DGSharpen (default)).
Re: CUDASynth
cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
no cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
native DGSource + avscompat DGDenoise and DGSharpen:
import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.dgdecodenv.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
I don't know why we have such results, at least I tried to compare with minimum differences in the resource usage (with closed browser, etc).
Re: CUDASynth
DJATOM,
Two items:
Don't know if DGDenoise is actually CUDASynth-enabled yet.
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
should actually be
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu1")
clip = core.avs.DGSharpen(clip, fsrc="gpu1", fdst="cpu")
to get the ping pong effect.
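The fsrc/fdst hand-off in these snippets can be checked with a small script. This Python sketch assumes that each filter's fsrc should name the buffer the previous filter wrote with fdst, and that the last filter should return frames with fdst="cpu". Both rules are inferred from the scripts in this thread and are not taken from CUDASynth.txt. The helpers check_chain and in_place_stages are hypothetical and are not part of DGDecodeNV.

```python
def check_chain(stages):
    """Return a list of problems in a filter chain.

    stages: list of (filter_name, fsrc, fdst); the source filter has fsrc=None.
    Assumed rule: each fsrc must equal the previous stage's fdst, and the
    final stage must deliver to "cpu".
    """
    problems = []
    prev_name, prev_dst = None, None
    for index, (name, fsrc, fdst) in enumerate(stages):
        if index == 0:
            if fsrc is not None:
                problems.append(f"{name}: a source filter should not set fsrc")
        elif fsrc != prev_dst:
            problems.append(
                f"{name}: fsrc={fsrc!r} but {prev_name} wrote fdst={prev_dst!r}"
            )
        prev_name, prev_dst = name, fdst
    if stages and stages[-1][2] != "cpu":
        problems.append(f"{stages[-1][0]}: last filter should use fdst='cpu'")
    return problems


def in_place_stages(stages):
    """Names of filters that read and write the same buffer (no ping pong)."""
    return [name for name, fsrc, fdst in stages if fsrc is not None and fsrc == fdst]


# The chain as first posted, and the ping pong variant suggested in the reply.
posted = [
    ("DGSource", None, "gpu0"),
    ("DGDenoise", "gpu0", "gpu0"),
    ("DGSharpen", "gpu0", "cpu"),
]
ping_pong = [
    ("DGSource", None, "gpu0"),
    ("DGDenoise", "gpu0", "gpu1"),
    ("DGSharpen", "gpu1", "cpu"),
]

print(check_chain(posted), in_place_stages(posted))
print(check_chain(ping_pong), in_place_stages(ping_pong))
```

Both chains are consistent under the assumed rules. The only difference is that the posted chain has DGDenoise reading and writing gpu0, which is exactly what the ping pong suggestion changes. Whether in-place use is actually a problem is for the author to confirm.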
Re: CUDASynth
Oh, I thought gpu0/gpu1 were for a two-card setup (I have only one).
Re: CUDASynth
I only have one card as well
I think it has to do with the pipelines/kernels???
Try it and see if it makes a difference
Re: CUDASynth
Tried and...
Re: CUDASynth
Could you check on what your GPU usage is while running the script?
Re: CUDASynth
Thanks, guys, awesome!
DGDenoise and DGSharpen are both CUDASynth-enabled.
Meanwhile, there is another limitation I discovered. Some 3rd party players and encode apps open the script multiple times. That will not work with CUDASynth as currently designed because there can be only one pipeline. I think I can fix that up fairly easily by having only the first source filter set up the framework.
Also, I have CUDASynth-enabled DGPQtoHLG. I'll make a release tomorrow after some testing.
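The proposed fix, having only the first source filter set up the framework, is a once-only initialization pattern. The sketch below illustrates that idea in Python and makes no claim about how DGDecodeNV implements it. PipelineRegistry, attach and open_script are hypothetical names, and the threads stand in for an application that opens the same script several times.

```python
import threading


class PipelineRegistry:
    """Hypothetical stand-in for the shared framework state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None
        self.setup_calls = 0

    def attach(self, filter_id):
        """Return True if this caller performed the setup, False if it reused it."""
        with self._lock:
            if self._owner is None:
                # First source filter to arrive sets up the framework.
                self._owner = filter_id
                self.setup_calls += 1
                return True
            # Later instances find the framework already in place.
            return False


registry = PipelineRegistry()
results = []
results_lock = threading.Lock()


def open_script(instance_id):
    """Simulate one instantiation of the source filter."""
    did_setup = registry.attach(instance_id)
    with results_lock:
        results.append((instance_id, did_setup))


threads = [
    threading.Thread(target=open_script, args=(f"DGSource#{i}",)) for i in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sum(1 for _, did_setup in results if did_setup))  # number of setups performed
```

The lock is what guarantees a single setup even when several instances start at once. Teardown, such as reference counting so the framework is released when the last instance closes, is left out of this sketch.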
Re: CUDASynth
Extremely nice work, DG. Thank you.
edit: To clear up my lack of clarity: in the context of the new pipeline-enabled DGDecodeNV.dll and the aforementioned test scripts with lines like
(a)
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
and
(b)
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
edit: added LoadPlugin to snippet (b) for clarity
and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-CUDASynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the CUDASynth DGDecodeNV.dll in snippet (b) with ".avs."?
Hmm, I must not have caught up with the latest, as my (long not updated) scripts have continued to use "core.avs.LoadPlugin" and "core.avs.DGSource" rather than "core.std.LoadPlugin" and "core.dgdecodenv.DGSource" ... damn, I must get in from the scrub, outa the midday sun. https://www.youtube.com/embed/z2YvYiWto ... &version=3
Re: CUDASynth
Yes
Re: CUDASynth
No. The dgdecodenv versus avs namespace decides whether the referenced DLL is invoked natively or via the avscompat layer. Either way, you would still load the same DLL. However, the CUDASynth DLL can only be loaded with avscompat at this time. If you omit the LoadPlugin call, you could pick up something from autoloading. I recommend always using explicit loading.
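The distinction in this reply can be illustrated in plain Python. This sketch only classifies call names as they appear in the scripts in this thread: it does not import VapourSynth, and the helpers invocation_path and cudasynth_usable are hypothetical. The second helper encodes the statement that the CUDASynth DLL works only through avscompat "at this time", so it reflects the state of things when the post was written.

```python
def invocation_path(call):
    """Classify a call such as 'core.avs.DGSource' by its namespace.

    The namespace selects how the plugin is invoked, not which DLL file
    is used; the DLL is whichever one the LoadPlugin call pointed at.
    """
    parts = call.split(".")
    if len(parts) != 3 or parts[0] != "core":
        raise ValueError(f"expected 'core.<namespace>.<function>', got {call!r}")
    return "avscompat" if parts[1] == "avs" else "native"


def cudasynth_usable(call):
    """Per the reply: usable only via avscompat at the time of the post."""
    return invocation_path(call) == "avscompat"


for call in ("core.dgdecodenv.DGSource", "core.avs.DGSource"):
    print(call, invocation_path(call), cudasynth_usable(call))
```

So snippet (a) takes the native path and snippet (b) takes the avscompat path. Which DLL each one actually loads depends only on the path given to its LoadPlugin call.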
Re: CUDASynth
Snippet (a) is the one I used in the benchmarking you asked for, and it uses the original non-CUDASynth DLL.
Snippet (b) is the one from DJATOM's testing of the cudasynth dll