CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

I knew about that after a former member suggested it should be done to me. Guess why he is former. ;)
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Please forgive me if I enquire on the status of and the ins and outs of the much anticipated cudasynth/dgdenoise et al :)
I feel like I'm an excited little kid again ... are we there yet ?
I really do like it here.
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Happy to oblige. The NLM part is fine and isn't changing from test3.

We made some good progress with the CUDA temporal filter. As mentioned we prioritize speed but of course still require that the filtering is useful and doesn't create artifacts. Going to a full block-matching kind of motion compensation is not consistent with that, because the speed would not be satisfying. Then the question is whether there is a way to achieve our goal, i.e., to have good to great filtering at high speed.

Those of you that were around in the early days of desktop image processing starting with VirtualDub filters may remember a filter called TemporalCleaner by Jim Casaburi. The concept was simple:

---
For each pixel in the current frame, if the difference between the previous frame's corresponding pixel and the current pixel is below a threshold then replace the current pixel by the average of the previous and current pixels, otherwise keep the current pixel.
---

https://avisynth.org.ru/docs/english/ex ... leaner.htm
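
As a rough illustration of the rule quoted above (this is not Jim Casaburi's actual code), here is a minimal Python sketch. Frames are plain lists of rows of 8-bit luma values; the function name `temporal_clean`, the `threshold` parameter name, and the rounded integer average are assumptions made for the example.

```python
def temporal_clean(prev, cur, threshold):
    """Per-pixel temporal smoothing in the style described above.

    prev, cur: frames as lists of rows of integer pixel values (same size).
    threshold: hypothetical parameter; a pixel that changed by less than this
               is averaged with the previous frame, otherwise it is kept.
    """
    out = []
    for prev_row, cur_row in zip(prev, cur):
        out_row = []
        for p, c in zip(prev_row, cur_row):
            if abs(p - c) < threshold:
                out_row.append((p + c + 1) // 2)  # rounded average of the two
            else:
                out_row.append(c)  # treated as motion: keep the current pixel
        out.append(out_row)
    return out


prev = [[100, 100, 100]]
cur = [[102, 100, 180]]
print(temporal_clean(prev, cur, threshold=5))  # [[101, 100, 180]]
```

The first pixel differs by 2 and gets averaged, while the third differs by 80 and is left alone. Everything hinges on that single per-pixel comparison.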

This sounds naive by modern standards but was a useful advance at the time, when desktop image processing was just getting started. While it sometimes improved compressibility for MPEG2, the idea performs poorly for denoising, and the result depends critically on the threshold. Nevertheless, it is a reasonable fundamental approach that can be easily improved.

Let's consider why it performed poorly. If the threshold was too large, then you would get visible motion blur in the filtered video. If it was too small, the blur would be removed but only noise amplitudes less than the threshold would be filtered. So you couldn't have both good filtering and no blurring.

Attempts were made at the time to solve this by doing scene change detection. At a scene change the original pixels would be used. While this improved things by suppressing blurring across scene changes, it did nothing for motion not involving the entire frame, such as a moving object in an otherwise static frame. Learn to love motion blur? No!
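
To make that limitation concrete, here is a hedged sketch of frame-level scene change gating. The metric (mean absolute difference over the whole frame), the threshold value, and the function names are assumptions for illustration, not what any particular filter of that era used.

```python
def mean_abs_diff(prev, cur):
    """Mean absolute pixel difference between two frames of equal size."""
    total = 0
    count = 0
    for prev_row, cur_row in zip(prev, cur):
        for p, c in zip(prev_row, cur_row):
            total += abs(p - c)
            count += 1
    return total / count


def is_scene_change(prev, cur, scene_threshold):
    """Frame-level decision: True means 'use the original pixels everywhere'."""
    return mean_abs_diff(prev, cur) > scene_threshold


# A 4x4 static background in which a one-pixel "object" moves.
prev = [[50] * 4 for _ in range(4)]
cur = [[50] * 4 for _ in range(4)]
prev[1][1] = 200  # object position in the previous frame
cur[1][2] = 200   # object has moved one pixel to the right

print(mean_abs_diff(prev, cur))          # 18.75
print(is_scene_change(prev, cur, 30.0))  # False
```

Only two of the sixteen pixels change, so the frame-wide metric stays low and the detector does not fire. The moving object is then handled by the per-pixel threshold alone, exactly as before.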

So our insight was, in effect, to apply scene change detection but only within a window around the current pixel. If the neighborhood is changing enough, as determined by a threshold, then the original pixel is used; otherwise it is filtered. The problem with the original TemporalCleaner was that the "scene change detection window" was just the current pixel, leading to the conflict between filtering and blurring.

We implemented this idea in our CUDA filter. We used a 5x5 window around the current pixel and accumulated a change metric between the current and previous frames over that window. If the metric is under a threshold, the pixel is filtered; otherwise it is left untouched. To increase the filtering effect, a pixel that is below threshold is averaged with the corresponding pixels of the two previous frames.
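
For readers who prefer code, the idea can be sketched on the CPU in plain Python. This is not the CUDA kernel. The change metric (sum of absolute differences), the clamped border handling, the equal-weight three-frame average, and all names are assumptions, since the post does not specify them.

```python
def window_change(prev, cur, x, y, radius=2):
    """Sum of absolute differences over a (2*radius+1)^2 window (assumed metric)."""
    h, w = len(cur), len(cur[0])
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)  # clamp coordinates at the border
            xx = min(max(x + dx, 0), w - 1)
            total += abs(cur[yy][xx] - prev[yy][xx])
    return total


def temporal_denoise(prev2, prev, cur, threshold, radius=2):
    """Average a pixel with the two previous frames only if its window is static."""
    h, w = len(cur), len(cur[0])
    out = [row[:] for row in cur]  # default: keep the original pixel
    for y in range(h):
        for x in range(w):
            if window_change(prev, cur, x, y, radius) < threshold:
                out[y][x] = (prev2[y][x] + prev[y][x] + cur[y][x] + 1) // 3
    return out


# Example: a 7x7 static frame with one noisy pixel in the current frame.
flat = [[100] * 7 for _ in range(7)]
noisy = [row[:] for row in flat]
noisy[3][3] = 103
result = temporal_denoise(flat, flat, noisy, threshold=25)
print(result[3][3])  # 101
```

With radius=2 the window is the 5x5 case from the post. A small isolated deviation keeps the window metric low and gets smoothed, while a large local change pushes the metric over the threshold for the whole neighborhood, so those pixels pass through unfiltered.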

This algorithm performs surprisingly well. We also implemented an option to show the motion map to allow for tweaking of the user-configurable threshold. If all this reminds you of our early work on Smart Deinterlacer, it means you are an old codger with a great memory. It still remains to determine the best window size for motion detection, but 5x5 works very well. The algorithm as a whole delivers a good tradeoff between quality and speed.
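
A motion map of this kind could be approximated as below. It is only a sketch: the 0/255 output, the truncated (not padded) border windows, the sum-of-absolute-differences metric, and the helper `moving_fraction` are assumptions, not the actual option in the filter.

```python
def motion_map(prev, cur, threshold, radius=2):
    """Return a mask with 255 where the window around a pixel is 'moving', else 0."""
    h, w = len(cur), len(cur[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            change = 0
            # Window is truncated at the frame border in this sketch.
            for yy in range(max(y - radius, 0), min(y + radius, h - 1) + 1):
                for xx in range(max(x - radius, 0), min(x + radius, w - 1) + 1):
                    change += abs(cur[yy][xx] - prev[yy][xx])
            if change >= threshold:
                mask[y][x] = 255
    return mask


def moving_fraction(mask):
    """Fraction of pixels flagged as moving, handy when tuning the threshold."""
    flat = [v for row in mask for v in row]
    return sum(1 for v in flat if v) / len(flat)


prev = [[100] * 8 for _ in range(8)]
cur = [row[:] for row in prev]
cur[4][4] = 140  # one pixel changes by 40
for t in (30, 50):
    print(t, moving_fraction(motion_map(prev, cur, t)))
```

Raising the threshold from 30 to 50 makes the flagged 5x5 patch disappear entirely, which is the kind of feedback a visible map gives when tweaking the setting.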

I hope to be able to give you a test4 build with this tomorrow. BTW, even though we removed the SIMD temporal median algorithm, we are going to keep the multiple-CPU DLLs, because the compiler still applies SIMD to other things and that yields performance gains.
Curly
Posts: 716
Joined: Sun Mar 15, 2020 11:05 am

Post by Curly »

sounds gud Rock way 2 go

quik report
i won big at the rulette table 2nite
the ladies are swarming me
more later knerk
Curly Howard
Director of EAC3TO Development
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Cool !

Feel free to ignore:

Speaking of roulette tables, just wondering ... over at d9, cudasynth post p=1997434#post1997434 says in part
if CUDA will provide ME data from hardware ASIC and you can use it to make motion compensated frames you can use temporal median as final output stage (or simple weighted averaging as in MDegrain). As I read NVIDIA also provide some API for CUDA-programmers to get ME data from MPEG encoder ASIC where available. Other implementation is for DX12.
See https://docs.nvidia.com/video-techno...ide/index.html

Motion Estimation Only Mode

NVENC can be used as a hardware accelerator to perform motion search and generate motion vectors and mode information. The resulting motion vectors or mode decisions can be used, for example, in motion compensated filtering
I didn't readily spot further commentary, so I'm not sure whether that is relevant or useful. For example, if the NVENC ASIC can be used for ME, is the relative cost of transiting PCIe "negligible enough" for reasonable speed vs CPU-only ME? Even if the ME vectors are obtainable, could they be used easily in the CUDASynth architecture? And would they yield results superior enough to justify the effort to implement?

edit: ps: if there's an ffmpeg nvenc encode going on at the same time (almost certain) then could that interfere with the asic concurrently performing the ME ?

Just wondering.

Dear Curly, girlfriends can be expensive; just a thought, perhaps choose ones with less makeup, there's a reasonable chance they'll still look OK when you wake up. Natural beauties like Britney.
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

I can't answer for Curly but regarding the ME stuff, I live in the real world. Just because someone (not you, hydra3333) expatiates about something in a manner designed to demonstrate purported expertise or preeminence in a domain, it doesn't necessarily mean that the ideas are sound or practical. Let that person show the proof of concept.
Curly
Posts: 716
Joined: Sun Mar 15, 2020 11:05 am

Post by Curly »

gud advice hydra3333 amigo mio thx
gonna try blackjack 2nite
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

OK and thanks for clarifying, current course maintained, cool.

If you mean me it's zero friends and I'm ok with that, being on the lower end of the bell curve for many things :)
I've had my fair share of "well that didn't work, it seemed like a good idea at the time to have a go with, let's try a different tack" ;)
Just happy you're doing cool stuff !

Cheers
Curly
Posts: 716
Joined: Sun Mar 15, 2020 11:05 am

Post by Curly »

Rocky was not referring to you, m8! I hope we are friends.
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Some clarifications on performance:

Previously we had searchw 5/7/9. Now we have good/better/best. The good/better/best are scaled higher than searchw. That means "good" is already equivalent to searchw = 9. So for accurate comparison to the previous searchw = 9 you have to choose "good". Settings "better" and "best" provide additional high-quality settings that we did not have before.

You also have to turn off temporal denoising in the new version to compare to the previous version, because the previous version did not have temporal denoising.

Another thing not to forget is that the more filters in the chain, the more relative speed improvement there will be, as more and more expensive PCIe transfers are avoided. This is why I intend to add more filters, even those supplied by third parties. If not too difficult, I'll even port them to CUDA. The case with one filter doesn't showcase the CUDASynth philosophy, but it already shows significant gains. The filter chain with HDR to SDR and denoising starts to get quite impressive, for example.
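
A back-of-the-envelope way to see why longer chains benefit more is to count frame copies over PCIe. The toy model below assumes that the decoder downloads each frame to host memory, that every standalone GPU filter uploads its input and downloads its output, and that a CUDASynth-style chain keeps the frame on the GPU and downloads once at the end. Real costs depend on frame size, transfer overlap, and driver behavior, and the function names are made up for the example.

```python
def frame_bytes(width, height, bits=8):
    """Size of one YUV 4:2:0 frame in bytes (luma plus two quarter-size chroma planes)."""
    bytes_per_sample = (bits + 7) // 8
    return width * height * 3 // 2 * bytes_per_sample


def transfers_per_frame(num_filters, fused):
    """Toy model: number of PCIe frame copies per output frame."""
    if fused:
        return 1  # single download after the last GPU filter
    return 1 + 2 * num_filters  # decoder download, then upload + download per filter


for n in (1, 2, 3):
    separate = transfers_per_frame(n, fused=False)
    fused = transfers_per_frame(n, fused=True)
    saved_mib = (separate - fused) * frame_bytes(1920, 1080) / 2**20
    print(f"{n} filter(s): {separate} copies vs {fused}, about {saved_mib:.1f} MiB saved per frame")
```

Under these assumptions the number of avoided copies grows by two for every filter added to the chain, which matches the point that a single-filter test understates the benefit.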
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Here is test4 adding the new temporal denoiser. Refer to the Notes.txt file for details.

Note that the standalone DGDenoise() filter has not yet been updated with the new filter.

https://rationalqm.us/misc/DGDecodeNV_test4.zip

I think I'll add DGSharpen() next.
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Thanks. Creating test scripts now to try it out. dgsharpen, cool ! dgdeblock after that ? ;)
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Hi, I must be doing something wrong, advice would be welcomed.

Using the same source and almost the same .vpy script, I seem to be getting half the speed from the new CUDASynth compared to pre-CUDASynth:
109 fps (pre-CUDASynth) vs 52 fps (CUDASynth).
Perhaps it is because I specified dn_quality="best" ?

I have a 3900X with 32 GB, and an 8 GB 2060-Super video card.
https://www.msi.com/Graphics-Card/GeFor ... cification

Here is a log showing the .vpy scripts etc, in the hope you may find time to consider it.

Code:

REM ******** characteristics of the media file

G:\HDTV\DGtest>"C:\SOFTWARE\MediaInfo\MediaInfo.exe" --Legacy "!QSF_VIDEO!" 
General
Complete name                            : G:\HDTV\DGtest\H264_INTERLACED.qsf.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 770 MiB
Duration                                 : 36 min 48 s
Overall bit rate                         : 2 927 kb/s
Frame rate                               : 25.000 FPS
Writing application                      : VideoReDo (Lavf58.29.100)

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4
Format settings                          : CABAC / 4 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 4 frames
Format settings, GOP                     : M=4, N=28
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 36 min 48 s
Bit rate                                 : 2 665 kb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Variable
Frame rate                               : 25.000 FPS
Minimum frame rate                       : 25.000 FPS
Maximum frame rate                       : 90 000.000 FPS
Standard                                 : Component
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : MBAFF
Scan type, store method                  : Interleaved fields
Scan order                               : Top Field First
Bits/(Pixel*Frame)                       : 0.051
Stream size                              : 702 MiB (91%)
Color range                              : Limited
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 2
Codec ID                                 : mp4a-6B
Duration                                 : 36 min 48 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Compression mode                         : Lossy
Stream size                              : 67.4 MiB (9%)
Language                                 : English
Default                                  : Yes
Alternate group                          : 1

G:\HDTV\DGtest>"!vapoursynth_root!\DGIndex\DGIndexNV.exe" -version  
DGIndexNV 251.0.0.0 (64 bit)

G:\HDTV\DGtest>"!vapoursynth_root!\DGIndex\DGIndexNV.exe" -i "!QSF_VIDEO!" -e -h -o "!_DGI_FILE!" 
Project
100

G:\HDTV\DGtest>TYPE "!_DGI_LOG!" 
Stream Type: MP4
Video Type: AVC
Profile: High
Level: 4
Coded Size: 1920x1088
Display Size: 1920x1080
PAR: 1:1
Frame Rate: 25.000000 fps
Colorimetry: Unknown [2]
Frame Structure: 
Field Order: 
Frame Type: 
Frame Coding: 
Coded Number: 55202
Playback Number: 55202
Frame Repeats: 0
Field Repeats: 0
Film Percent: 0.00
Bitrate: 
Bitrate (Avg): 2.602
Bitrate (Max): 
Elapsed: 
Remain: 0:00:00
FPS: 
Info: Finished!


REM ******** pre-CudaSynth 


G:\HDTV\DGtest>TYPE "!_OLD_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" "MPEG-2V"/"MPA1L2" 
video = core.dgdecodenv.DGSource(r'G:\HDTV\DGtest\H264_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False) 
# DGDecNV changes - 
# 2020.10.21 Added new parameters cstrength and cblend to independently control the chroma denoising. 
# 2020.11.07 Revised DGDenoise parameters. The 'chroma' option is removed. 
#            Now, if 'strength' is set to 0.0 then luma denoising is disabled, 
#            and if cstrength is set to 0.0 then chroma denoising is disabled. 
#            'cstrength' is now defaulted to 0.0, and 'searchw' is defaulted to 9. 
# example: video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) # replaced chroma=True 
video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) 
video = core.avs.DGSharpen(video, strength=0.2) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 

G:\HDTV\DGtest>"!old_vspipeexe64!" --version 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!old_vspipeexe64!" --info "!_OLD_VPY_file!" 
Width: 1920
Height: 1080
Frames: 55202
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" -- 
Output 55202 frames in 505.90 seconds (109.12 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.81     504.93
DGSharpen            parreq          39.12     197.91
DGSource             unordered       25.17     127.34


REM ******** CudaSynth 


G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" "h264"/"AC3" 
# dn_enable=x DENOISE 
# default 0  0: disabled  1: spatial denoising only  2: temporal denoising only  3: spatial and temporal denoising 
# dn_quality="x" default "good"    "good" "better" "best" 
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\H264_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="best" ) 
video = core.avs.DGSharpen( video, strength=0.2 ) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 

G:\HDTV\DGtest>"!vspipeexe64!" --version 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!vspipeexe64!" --info "!_VPY_file!" 
Width: 1920
Height: 1080
Frames: 55202
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --container y4m "!_VPY_file!" -- 
Output 55202 frames in 1047.22 seconds (52.71 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       99.86    1045.74
DGSharpen            parreq          15.64     163.79
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Yes, it's because of "best". I posted about that a few posts ago; maybe you missed it, so please read it. For comparison you have to use "good" and disable temporal. "best" is a new very high quality but very slow mode, and the old version does not have temporal denoising.
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Thank you. OK, re-read it.
Yes "best" is an invalid setting for direct comparison ... "best" did yield some sense of the cost of better spatial-only denoising which I was hoping to add for some "minimal" penalty :)
I may be prepared to live with halving the transcoding fps (eg for spatial denoising), or perhaps even more for spatial & temporal, depending on what I see with my usual media files.

Oh well, a more direct comparison using "good" on my PC is below.
Unless I did something silly, it is a nice comparative fps result for CUDASynth.

Interestingly, DGSharpen time appeared to go down a bit under CUDASynth.

pre-CUDASynth

Code:

G:\HDTV\DGtest>TYPE "!_OLD_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" "MPEG-2V"/"MPA1L2" 
video = core.dgdecodenv.DGSource(r'G:\HDTV\DGtest\H264_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False) 
# DGDecNV changes - 
# 2020.10.21 Added new parameters cstrength and cblend to independently control the chroma denoising. 
# 2020.11.07 Revised DGDenoise parameters. The 'chroma' option is removed. 
#            Now, if 'strength' is set to 0.0 then luma denoising is disabled, 
#            and if cstrength is set to 0.0 then chroma denoising is disabled. 
#            'cstrength' is now defaulted to 0.0, and 'searchw' is defaulted to 9. 
# example: video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) # replaced chroma=True 
video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) 
video = core.avs.DGSharpen(video, strength=0.2) 
#video = vs.core.text.ClipInfo(video) 
video.set_output()
run (a)

Code:

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" -- 
Output 55202 frames in 497.50 seconds (110.96 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.82     496.63
DGSharpen            parreq          39.34     195.71
DGSource             unordered       25.48     126.77
run (b)

Code:

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" -- 
Output 55202 frames in 498.67 seconds (110.70 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.83     497.82
DGSharpen            parreq          39.31     196.04
DGSource             unordered       25.43     126.82
CUDASynth using "good"

Code:

G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" "h264"/"AC3" 
# dn_enable=x DENOISE 
# default 0  0: disabled  1: spatial denoising only  2: temporal denoising only  3: spatial and temporal denoising 
# dn_quality="x" default "good"    "good" "better" "best" ... "best" halves the speed compared pre-CUDASynth 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\H264_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="best" ) 
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\H264_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="good" ) 
video = core.avs.DGSharpen( video, strength=0.2 ) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 
run (a)

Code:

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --container y4m "!_VPY_file!" -- 
Output 55202 frames in 331.33 seconds (166.61 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       99.63     330.10
DGSharpen            parreq          49.72     164.73
run (b)

Code:

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --container y4m "!_VPY_file!" -- 
Output 55202 frames in 332.64 seconds (165.95 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       99.62     331.38
DGSharpen            parreq          49.61     165.04

When I comment out DGsharpen in both, leaving only dgsource with denoising, there's a fair increase in fps with CUDASynth !

pre-CUDASynth

Code:

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" -- 
Output 55202 frames in 386.83 seconds (142.70 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.89     386.40
DGSource             unordered       31.77     122.89
CUDASynth using "good"

Code:

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --container y4m "!_VPY_file!" -- 
Output 55202 frames in 249.27 seconds (221.46 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       99.74     248.62
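
For reference, the fps figures in the logs above work out to the following ratios. This is just the arithmetic; the labels and the helper name `speedup` are assumptions added for the example.

```python
# (label, pre-CUDASynth fps, CUDASynth "good" fps), copied from the vspipe logs
runs = [
    ("denoise + sharpen, run (a)", 110.96, 166.61),
    ("denoise + sharpen, run (b)", 110.70, 165.95),
    ("denoise only", 142.70, 221.46),
]


def speedup(old_fps, new_fps):
    """Ratio of new throughput to old throughput."""
    return new_fps / old_fps


for label, old_fps, new_fps in runs:
    print(f"{label}: {speedup(old_fps, new_fps):.2f}x")
```

This prints 1.50x for both denoise + sharpen runs and 1.55x for the denoise-only case, on this one source file and machine.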
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Thank you for your testing m8! Maybe "better" would be a good compromise for you between speed and quality. I'll look into your Sharpen() observation and report back.
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Pragmatically, with nvenc encoding:

pre-CUDASynth

Code:

G:\HDTV\DGtest>set bitrate=3000000 
G:\HDTV\DGtest>set min_bitrate=500000 
G:\HDTV\DGtest>set max_bitrate=6000000 
G:\HDTV\DGtest>set bufsize=!max_bitrate! 
G:\HDTV\DGtest>"!old_vspipeexe64!" --container y4m --filter-time "!_OLD_VPY_file!" -   | "!old_ffmpegexe64_OpenCL!" -hide_banner -v verbose -nostats -f yuv4mpegpipe -i pipe: -probesize 200M -analyzeduration 200M  -i "!QSF_VIDEO!" -map 0:v:0 -map 1:a:0 -vf "setdar=16/9" -fps_mode passthrough -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -strict experimental -c:v h264_nvenc -pix_fmt nv12 -preset p7 -multipass fullres -forced-idr 1 -g 25 -coder:v cabac -spatial-aq 1 -temporal-aq 1 -dpb_size 0 -bf:v 3 -b_ref_mode:v 0 -rc:v vbr -cq:v 0 -b:v !bitrate! -minrate:v !min_bitrate! -maxrate:v !max_bitrate! -bufsize !bufsize! -profile:v high -level 5.2 -movflags +faststart+write_colr  -c:a libfdk_aac -cutoff 18000 -ab 256k -ar 48000 -y  "!OLD_TARGET_VIDEO!" 
...
Output 55202 frames in 548.11 seconds (100.71 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.84     547.22
DGSharpen            parreq          36.14     198.06
DGSource             unordered       24.80     135.91
...

CUDASynth using both spatial-"good" plus temporal denoising

Code:

G:\HDTV\DGtest>set bitrate=3000000 
G:\HDTV\DGtest>set min_bitrate=500000 
G:\HDTV\DGtest>set max_bitrate=6000000 
G:\HDTV\DGtest>set bufsize=!max_bitrate! 
G:\HDTV\DGtest>"!vspipeexe64!" --container y4m --filter-time "!_VPY_file!" -   | "!ffmpegexe64_OpenCL!" -hide_banner -v verbose -nostats -f yuv4mpegpipe -i pipe: -probesize 200M -analyzeduration 200M  -i "!QSF_VIDEO!" -map 0:v:0 -map 1:a:0 -vf "setdar=16/9" -fps_mode passthrough -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -strict experimental -c:v h264_nvenc -pix_fmt nv12 -preset p7 -multipass fullres -forced-idr 1 -g 25 -coder:v cabac -spatial-aq 1 -temporal-aq 1 -dpb_size 0 -bf:v 3 -b_ref_mode:v 0 -rc:v vbr -cq:v 0 -b:v !bitrate! -minrate:v !min_bitrate! -maxrate:v !max_bitrate! -bufsize !bufsize! -profile:v high -level 5.2 -movflags +faststart+write_colr  -c:a libfdk_aac -cutoff 18000 -ab 256k -ar 48000 -y  "!TARGET_VIDEO!" 
...
Output 55202 frames in 409.71 seconds (134.74 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       98.47     403.44
DGSharpen            parreq          41.01     168.01
...

There was a significant fps increase using CUDASynth even when adding temporal denoising into the mix.

(note: haven't visually checked the resulting .mp4 files)
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Sweet! Things would look even better with more filters in the chain. You can test again after I add Sharpen(), or do something that requires HDRtoSDR().

DGDeblock() could be a thing. ;)
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Rocky wrote:
Fri Feb 16, 2024 7:10 am
do something that requires HDRtoSDR()
Am looking forward to transcoding h.265 hdr10+ video (it says vfr but we know it is cfr) from my samsung S22 phone in the next couple of hours once I change its settings to record that ... that'll be the future use of HDRtoSDR for me.

Having said that, I notice the phone can convert/transcode, but who knows how well etc as it's all "hidden" ... much prefer DG stuff as there are more options to set.
https://www.samsung.com/us/support/answer/ANS00086003/
Sherman
Posts: 578
Joined: Mon Jan 06, 2020 10:19 pm

Post by Sherman »

hydra3333 wrote:
Fri Feb 16, 2024 7:25 am
Having said that, I notice the phone can convert/transcode, but who knows how well
It will be interesting to compare them.
Sherman Peabody
Director of Linux Development
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Hello. I can't quite figure out why the "old" (non-CUDASynth) dgsource works whereas the new CUDASynth seems to do something different.

The log below shows what happens in the "old" (which works as expected) vs new CUDASynth (which appears to stop at the first frame).
They both use the same .dgi file created from a 576i interlaced mpeg2 input.
I pared the CUDASynth .vpy down to just dgsource so it seems to be related to that.
It does the same thing with and without the "--progress " vspipe option.

Interestingly, per logs in a prior post, a different file with h.264 which has larger dimensions works fine.

No doubt I'm doing something wrong, just what escapes me at the moment.

Code:

G:\HDTV\DGtest>TYPE "!_OLD_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" 
video = core.dgdecodenv.DGSource(r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False) 
# DGDecNV changes - 
# 2020.10.21 Added new parameters cstrength and cblend to independently control the chroma denoising. 
# 2020.11.07 Revised DGDenoise parameters. The 'chroma' option is removed. 
#            Now, if 'strength' is set to 0.0 then luma denoising is disabled, 
#            and if cstrength is set to 0.0 then chroma denoising is disabled. 
#            'cstrength' is now defaulted to 0.0, and 'searchw' is defaulted to 9. 
# example: video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) # replaced chroma=True 
video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) 
video = core.avs.DGSharpen(video, strength=0.2) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 

G:\HDTV\DGtest>"!old_vspipeexe64!" --version 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!old_vspipeexe64!" --info "!_OLD_VPY_file!" 
Width: 720
Height: 576
Frames: 56659
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" -- 
Output 56659 frames in 132.31 seconds (428.24 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.45     131.57
DGSource             unordered       57.57      76.16
DGSharpen            parreq          21.22      28.08

Code:

G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functools 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" 
# dn_enable=x DENOISE 
# default 0  0: disabled  1: spatial denoising only  2: temporal denoising only  3: spatial and temporal denoising 
# dn_quality="x" default "good"    "good" "better" "best" ... "best" halves the speed compared to pre-CUDASynth 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="best", dn_strength=0.06, dn_cstrength=0.06 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="better", dn_strength=0.06, dn_cstrength=0.06 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=3, dn_quality="good", dn_strength=0.06, dn_cstrength=0.06 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="good", dn_strength=0.06, dn_cstrength=0.06 ) 
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False ) 
#video = core.avs.DGSharpen( video, strength=0.2 ) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 

G:\HDTV\DGtest>"!vspipeexe64!" --version 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!vspipeexe64!" --info "!_VPY_file!" 
Width: 720
Height: 576
Frames: 56659
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --progress --container y4m "!_VPY_file!" --  
Script evaluation done in 0.38 seconds
Frame: 1/56659

G:\HDTV\DGtest>
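
The commented-out DGSource lines in the script above differ only in their dn_ settings, so switching between them could be done with a small helper instead of editing the call each time. This is a rough sketch: dn_options is a hypothetical helper of mine, and the parameter names, value ranges and defaults are taken only from the comments in that script, not checked against the DGDecNV manual.

```python
# Allowed values as described in the script comments above (assumed, not verified).
DN_ENABLE_MODES = {
    0: "disabled",
    1: "spatial denoising only",
    2: "temporal denoising only",
    3: "spatial and temporal denoising",
}
DN_QUALITIES = ("good", "better", "best")


def dn_options(enable=0, quality="good", strength=None, cstrength=None):
    """Build the dn_* keyword arguments for DGSource as a plain dict.

    Returns an empty dict when denoising is disabled, so the call is then
    identical to the plain DGSource(...) line used in the failing test.
    """
    if enable not in DN_ENABLE_MODES:
        raise ValueError(f"dn_enable must be one of {sorted(DN_ENABLE_MODES)}")
    if quality not in DN_QUALITIES:
        raise ValueError(f"dn_quality must be one of {DN_QUALITIES}")
    if enable == 0:
        return {}
    opts = {"dn_enable": enable, "dn_quality": quality}
    if strength is not None:
        opts["dn_strength"] = strength
    if cstrength is not None:
        opts["dn_cstrength"] = cstrength
    return opts


# Intended use inside the .vpy (not executed here, needs VapourSynth and the DLL):
#   video = core.dgdecodenv.DGSource(dgi_path, deinterlace=1, use_top_field=True,
#                                    use_pf=False, **dn_options(1, "best", 0.06, 0.06))
```

With enable=0 the helper passes nothing extra, which keeps the no-denoise case identical to the bare call that triggers the problem.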
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Here is my test of the following script using the new DLL (adding deinterlace=1 makes no difference):

import vapoursynth as vs
from vapoursynth import core
core.std.LoadPlugin(r"...\dgdecodenv.dll")
video = core.dgdecodenv.DGSource(r"...\out.264")
video.set_output()

Code:

D:\Don\Programming\C++\DGDecNV\DGDecodeNV\Test>vspipe --version
VapourSynth Video Processing Library
Copyright (c) 2012-2022 Fredrik Mellbin
Core R61
API R4.0
API R3.6
Options: -

D:\Don\Programming\C++\DGDecNV\DGDecodeNV\Test>vspipe --filter-time --progress --container y4m nostalghia.vpy .\NUL
Script evaluation done in 0.30 seconds
Output 9194 frames in 16.49 seconds (557.40 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGSource             unordered       99.40      16.39
Try it directly like that. If it still fails, maybe I need your source file, or I need to install R65, or ... :?:
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am
Contact:

CUDASynth

Post by hydra3333 »

OK !

I clipped the .mpg input and attached it as a .zip (note: the previous .mp4 AVC files worked, this MPEG2 one didn't).

Log of successful pre-CUDASynth test:

Code:

G:\HDTV\DGtest>"!vapoursynth_root!\DGIndex\DGIndexNV.exe" -version  
DGIndexNV 251.0.0.0 (64 bit)

G:\HDTV\DGtest>"!vapoursynth_root!\DGIndex\DGIndexNV.exe" -i "!QSF_VIDEO!" -e -h -o "!_DGI_FILE!" 
Project
100

G:\HDTV\DGtest>TYPE "!_DGI_LOG!" 
Stream Type: MPEG2 Program
Video Type: MPEG2
Profile: main@main
Coded Size: 720x576
Display Size: 720x576
Aspect Ratio: 16:11
Frame Rate: 25.000000 fps
Colorimetry: Unknown [2]
Sequence: Frame/Field
Frame Structure: 
Field Order: 
Frame Type: 
Frame Coding: 
Coded Number: 125
Playback Number: 125
Frame Repeats: 0
Field Repeats: 0
Bitrate: 
Bitrate (Avg): 2.791
Bitrate (Max): 
Elapsed: 0:00:00
Remain: 0:00:00
FPS: 
Info: Finished!

G:\HDTV\DGtest>TYPE "!_OLD_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functools 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'C:\SOFTWARE\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" 
video = core.dgdecodenv.DGSource(r'G:\HDTV\DGtest\MPEG2_INTERLACED_CLIPPED.DGI', deinterlace=1, use_top_field=True, use_pf=False) 
# DGDecNV changes - 
# 2020.10.21 Added new parameters cstrength and cblend to independently control the chroma denoising. 
# 2020.11.07 Revised DGDenoise parameters. The 'chroma' option is removed. 
#            Now, if 'strength' is set to 0.0 then luma denoising is disabled, 
#            and if cstrength is set to 0.0 then chroma denoising is disabled. 
#            'cstrength' is now defaulted to 0.0, and 'searchw' is defaulted to 9. 
# example: video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) # replaced chroma=True 
video = core.avs.DGDenoise(video, strength=0.06, cstrength=0.06) 
video = core.avs.DGSharpen(video, strength=0.2) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 

G:\HDTV\DGtest>"!old_vspipeexe64!" --version  2>&1 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!old_vspipeexe64!" --info "!_OLD_VPY_file!"  2>&1 
Width: 720
Height: 576
Frames: 125
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" --  2>&1 
Output 125 frames in 0.42 seconds (299.54 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          92.47       0.39
DGSource             unordered       60.75       0.25
DGSharpen            parreq          26.43       0.11
Log of unsuccessful CUDASynth test using the same .dgi:

Code:

G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
# from DG 
import vapoursynth as vs 
from vapoursynth import core 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll')  
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED_CLIPPED.DGI', deinterlace=1, use_top_field=True, use_pf=False ) 
video.set_output() 

G:\HDTV\DGtest>"!vspipeexe64!" --version  2>&1 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -

G:\HDTV\DGtest>"!vspipeexe64!" --info "!_VPY_file!"  2>&1 
Width: 720
Height: 576
Frames: 125
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1

G:\HDTV\DGtest>"!vspipeexe64!" --filter-time --progress --container y4m "!_VPY_file!" --  2>&1 
Script evaluation done in 0.72 seconds
Frame: 1/125

G:\HDTV\DGtest>
It yields the same unsuccessful result if I use this .vpy without deinterlacing:

Code:

G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
# from DG 
import vapoursynth as vs 
from vapoursynth import core 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll')  
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED_CLIPPED.DGI', deinterlace=1, use_top_field=True, use_pf=False ) 
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED_CLIPPED.DGI' ) 
video.set_output() 
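
The failing runs above stop at "Frame: 1/125" and drop back to the prompt without ever printing the "Output N frames" summary, so a batch script could catch that case automatically. Here is a minimal sketch: the function names are hypothetical, the vspipe arguments are just the ones from my logs, and I am assuming the progress and summary text land on stdout or stderr (both are captured).

```python
import re
import subprocess

# The summary line vspipe prints when a run finishes, as seen in the successful log.
SUMMARY_RE = re.compile(r"Output \d+ frames in [\d.]+ seconds")


def classify_run(returncode, output_text):
    """Return (ok, reason) for a finished run; returncode None means it timed out."""
    if returncode is None:
        return (False, "timed out")
    if returncode != 0:
        return (False, f"vspipe exited with code {returncode}")
    if SUMMARY_RE.search(output_text) is None:
        return (False, "exit code 0 but no 'Output N frames' summary")
    return (True, "completed")


def run_vspipe(vspipe_exe, script_path, timeout_s=600):
    """Run vspipe with the same options as in the logs and classify the result."""
    cmd = [vspipe_exe, "--filter-time", "--progress",
           "--container", "y4m", script_path, "--"]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return classify_run(None, "")
    return classify_run(proc.returncode, proc.stdout + proc.stderr)
```

Checking for the summary line as well as the exit code is deliberate, since I have not confirmed what exit code the failing run returns.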
Cheers.
Attachments
MPEG2_INTERLACED_CLIPPED.qsf.zip
(1.81 MiB) Downloaded 19 times
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

Thank you. Trying to duplicate...

EDIT: Duplicated. Investigating...
Rocky
Posts: 3621
Joined: Fri Sep 06, 2019 12:57 pm

CUDASynth

Post by Rocky »

I have it fixed. Give me a jiffy to upload a new test4. Thank you for the report.