Port Cube
Two things.
1. Where are you loading avsresize.dll?
2. Can you give me the cube file?
It's working fine for me with this script, only differing from yours in the cube file (which must be size 65):
loadplugin("d:\don\Programming\C++\dgdecnv\DGDecodeNV\x64\Release\dgdecodenv.dll")
loadplugin("avsresize.dll")
loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
dgsource("THE GREAT WALL.dgi")
#From 4:2:2 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("PQ_to_BT709_slope.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV422 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV422P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
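For anyone unsure what the "Narrow Range" and "Full Range" labels in the script comments mean numerically, here is a minimal Python sketch of the usual code-value mapping for luma and RGB (black at 16 and nominal peak at 235 in 8-bit terms, scaled up for higher bit depths). The function names are made up for illustration. This is not what z_ConvertFormat does internally: it ignores the YUV to RGB matrix and dithering, and chroma uses a different span.

```python
def narrow_to_full(code: int, bits: int) -> int:
    """Map a narrow-range luma/RGB code value to full range at the same bit depth."""
    scale = 1 << (bits - 8)           # 4 for 10-bit, 256 for 16-bit
    black = 16 * scale                # narrow-range black level
    span = 219 * scale                # distance from black to nominal peak
    peak = (1 << bits) - 1            # largest full-range code value
    value = round((code - black) * peak / span)
    return min(max(value, 0), peak)   # clip excursions outside the nominal range


def full_to_narrow(code: int, bits: int) -> int:
    """Map a full-range luma/RGB code value to narrow range at the same bit depth."""
    scale = 1 << (bits - 8)
    black = 16 * scale
    span = 219 * scale
    peak = (1 << bits) - 1
    code = min(max(code, 0), peak)
    return round(code * span / peak) + black


print(narrow_to_full(64, 10), narrow_to_full(940, 10))
print(full_to_narrow(0, 10), full_to_narrow(1023, 10))
```

For 10-bit video the narrow-range limits come out as 64 and 940.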
Port Cube
No need, as it's in the AVS+ default folders.
I can give you an identity one, which crashes too:
Do you mean kB, or what? Both the BBC and the Warner Bros licensed cube files are around 1 MB.
I asked a friend of mine with a Quadro + Xeon workstation to try it too, and he can confirm that:
1) 709 to HLG with commercial(*) cube works
2) PQ to HLG with commercial(*) cube crashes
3) HLG to PQ with commercial(*) cube crashes
(*) both BBC and Warner Bros
AVSCube can ingest the various cubes with no issues.
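For anyone who wants to reproduce this without a licensed LUT, an identity cube of any size can be generated with a few lines of Python. This is only a sketch: the function name is made up, and it assumes the common .cube text layout (a `LUT_3D_SIZE N` header followed by N*N*N lines of "r g b" floats, with the red index changing fastest). It is not the file that was attached to the post.

```python
def identity_cube_text(size: int) -> str:
    """Return the text of an identity 3D LUT in .cube format."""
    if size < 2:
        raise ValueError("a 3D LUT needs at least 2 points per axis")
    step = 1.0 / (size - 1)
    lines = [f"LUT_3D_SIZE {size}"]
    # Assumed .cube ordering: red index changes fastest, then green, then blue.
    for b in range(size):
        for g in range(size):
            for r in range(size):
                lines.append(f"{r * step:.6f} {g * step:.6f} {b * step:.6f}")
    return "\n".join(lines) + "\n"


# Small demonstration: a size-2 identity cube has a header plus 8 entries.
print(identity_cube_text(2))
```

Writing `identity_cube_text(33)` and `identity_cube_text(65)` to two files gives a pair of test cubes that differ only in size.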
Port Cube
The cube dimension must be 65. Checking...
Port Cube
Your cube is size 33. I'll generalize the size and give a new test version. After DGDecNV 444.
Port Cube
DGCube (CUDA) was hardwired for 65. Cube (CPU) could handle any size.
Re-download DGCube to get the fix, i.e., ability to open any size cube file.
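If you are stuck on a build that only accepts one cube size, you can check the size a file declares before loading it. A minimal Python sketch, assuming the file follows the usual .cube convention of a `LUT_3D_SIZE N` header line; the function name is hypothetical and none of this is part of DGCube.

```python
def cube_size(text: str) -> int:
    """Return the LUT_3D_SIZE declared in the text of a .cube file."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].upper() == "LUT_3D_SIZE" and len(parts) >= 2:
            return int(parts[1])
    raise ValueError("no LUT_3D_SIZE line found")


sample = 'TITLE "example"\n# comment\nLUT_3D_SIZE 33\n0.0 0.0 0.0\n'
print(cube_size(sample))
```

For a real file, pass in the file contents, e.g. `cube_size(open(path).read())`.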
Port Cube
Preliminary benchmarks before encoding, using AVSMeter64 + GPU-Z.
AVSCube script:
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
Frame width: 3840
Frame height: 2064
Framerate: 23.976 (24000/1001)
Colorspace: YUV420P10
FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%
GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W
Same with Prefetch(8):
FPS (cur | min | max | avg): 2.859 | 0.910 | 108696 | 11.83
Process memory usage: 2014 MiB
Thread count: 22
CPU usage (current | average): 65.8% | 71.4%
GPU usage (current | average): 8% | 14%
VPU usage (current | average): 11% | 21%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 16.5 W | 38.3 W
DGCube script:
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DGCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%
GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W
Same with Prefetch(8):
FPS (cur | min | max | avg): 1.433 | 0.567 | 175439 | 10.15
Process memory usage: 2702 MiB
Thread count: 27
CPU usage (current | average): 64.3% | 62.6%
GPU usage (current | average): 37% | 28%
VPU usage (current | average): 28% | 10%
GPU memory usage: 2858 MiB
GPU Power Consumption (cur | avg): 50.1 W | 47.1 W
Port Cube
Real-world scenario: 4K PQ video to 1080p HLG video with denoising and x265 encoding.
Script:
SetMemoryMax()
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll") # or DGCube
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0, rw=1920, rh=1032)
propClearAll()
CompTest(1)
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
ConvertBits(32)
BM3D_CUDA(sigma=3, radius=2)
BM3D_VAggregate(radius=2)
fmtc_bitdepth (bits=10,dmode=8)
neo_f3kdb(range=15, Y=65, Cb=40, Cr=40, grainY=0, grainC=0, sample_mode=2, blur_first=true, dynamic_grain=false, mt=false, keep_tv_range=true)
Prefetch(1) # 1,4,6
x265.exe --crf 22 --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_temp\akira_cube_out.hevc" "F:\In\2_0446 Akira\akira_cube_temp\akira_cube.avs"
AVSCube:
no Prefetch: encoded 1792 frames in 525.70s (3.41 fps), 1244.06 kb/s, Avg QP:26.48
Prefetch(4): encoded 1792 frames in 430.76s (4.16 fps), 1244.31 kb/s, Avg QP:26.49
Prefetch(6): encoded 1792 frames in 346.58s (5.17 fps), 1242.82 kb/s, Avg QP:26.49
DGCube:
no Prefetch: encoded 1792 frames in 525.13s (3.41 fps), 1242.09 kb/s, Avg QP:26.50
Prefetch(4): encoded 1792 frames in 415.67s (4.31 fps), 1244.13 kb/s, Avg QP:26.50
Prefetch(6): encoded 1792 frames in 351.01s (5.11 fps), 1243.90 kb/s, Avg QP:26.49
It's really strange that both you and I are getting CPU and GPU results that are so closely aligned. I'm starting to think that the bottleneck is somewhere else.
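To put a number on how closely the two sets of results line up, the x265 summary lines above can be compared directly. A small Python sketch; the regular expression and function names are my own, and the input strings are copied from the logs in this post.

```python
import re

# Matches the start of an x265 summary line such as
# "encoded 1792 frames in 525.70s (3.41 fps), ..."
SUMMARY = re.compile(r"encoded (\d+) frames in ([\d.]+)s")


def encode_seconds(line: str) -> float:
    """Extract the wall-clock encode time in seconds from an x265 summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not an x265 summary line: {line!r}")
    return float(match.group(2))


def time_difference_percent(avscube_line: str, dgcube_line: str) -> float:
    """DGCube encode time relative to AVSCube, in percent (negative = DGCube faster)."""
    cpu = encode_seconds(avscube_line)
    gpu = encode_seconds(dgcube_line)
    return (gpu - cpu) / cpu * 100.0


RUNS = {
    "no Prefetch": ("encoded 1792 frames in 525.70s (3.41 fps)",
                    "encoded 1792 frames in 525.13s (3.41 fps)"),
    "Prefetch(4)": ("encoded 1792 frames in 430.76s (4.16 fps)",
                    "encoded 1792 frames in 415.67s (4.31 fps)"),
    "Prefetch(6)": ("encoded 1792 frames in 346.58s (5.17 fps)",
                    "encoded 1792 frames in 351.01s (5.11 fps)"),
}

for label, (avs, dg) in RUNS.items():
    print(f"{label}: {time_difference_percent(avs, dg):+.2f}%")
```

The differences stay within a few percent in either direction, which fits the suspicion that the LUT stage is not what limits the encode.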
Port Cube
Aligned results also with x265.exe --crf 20 --preset slow --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6_out.hevc" "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6.avs"
AVSCube: encoded 1792 frames in 673.69s (2.66 fps), 1716.09 kb/s, Avg QP:24.01
DGCube: encoded 1792 frames in 696.78s (2.57 fps), 1716.41 kb/s, Avg QP:24.02
According to Agatha Christie, one coincidence is just a coincidence, two coincidences are a clue, three coincidences are a proof.
Port Cube
It's just that the actual 3D LUT application is tiny compared to everything else, for both versions. I compared scripts with a BlankClip() source and no conversions, and things are as expected. DGCube is faster without prefetch. Cube is faster with prefetch; however, each prefetch thread comes with more CPU utilization. So for transcoding, DGCube could be useful when the encoding load is high, compared to Cube with prefetch.
I'm going to add tetrahedral interpolation to both to address ErazorTT's problem case.
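The point above, that a faster LUT barely moves the total when the LUT is a tiny share of the work, is Amdahl's law. A minimal Python sketch follows. The 5% share and the 3x stage speedup are made-up numbers for illustration, not measurements from this thread, and the formula treats the stages as running one after another, which a threaded AviSynth + x265 pipeline only approximates.

```python
def overall_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: whole-pipeline speedup when one stage gets faster.

    stage_fraction: share of total time spent in that stage (0..1)
    stage_speedup:  how many times faster the stage becomes (> 0)
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


# Hypothetical: the LUT is 5% of the frame time and becomes 3x faster.
print(f"{overall_speedup(0.05, 3.0):.3f}x overall")
```

With those assumed numbers the whole pipeline gains only a few percent, the same order as the differences seen in the encodes.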
Port Cube
I read the whole thread and didn't see anything relevant. Did I miss something?
Guest 2 wrote (Sun Aug 07, 2022 10:05 am):
https://forum.doom9.org/showthread.php?t=183517
It's a long thread, where he seemed to go thru some of your issues.
Port Cube
You are forgiven.
Here is a new version supporting tetrahedral interpolation. It addresses ErazorTT's issue, as the artifacts do not occur with tetrahedral. Please read the new DGCube.txt file for details and be aware that the filter is now invoked as DGCube(). I'll add this to the timecube-derived Cube() filter as well. Also need to add Vapoursynth support to DGCube().
https://rationalqm.us/misc/DGCube.zip
Please let the Doom9 guys know about this.
The script you asked for:
loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
BlankClip(pixel_type="RGBP16", width=3840, height=2160, length=1000)
DGCube("IDENTITY.cube")
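For readers who have not met tetrahedral interpolation, here is a minimal pure-Python sketch of the textbook method. It is not DGCube's CUDA code. It assumes a LUT stored as nested lists indexed `lut[r][g][b]` with inputs in 0..1, and both function names are made up. Each lookup splits the enclosing LUT cell into six tetrahedra and blends only the four corners of the one containing the input, walking from the low corner to the high corner along the axes in order of decreasing fractional position.

```python
def make_identity_lut(size):
    """Build an identity LUT as nested lists indexed lut[r][g][b] (assumed layout)."""
    top = size - 1
    return [[[(r / top, g / top, b / top) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def tetrahedral(lut, size, rgb):
    """Look up rgb (three floats in 0..1) using tetrahedral interpolation."""
    pos = [min(max(c, 0.0), 1.0) * (size - 1) for c in rgb]
    base = [min(int(p), size - 2) for p in pos]       # low corner of the cell
    frac = [p - b for p, b in zip(pos, base)]         # position inside the cell
    # Axes sorted by descending fraction select one of the six tetrahedra.
    order = sorted(range(3), key=lambda axis: frac[axis], reverse=True)
    weights = [1.0 - frac[order[0]],
               frac[order[0]] - frac[order[1]],
               frac[order[1]] - frac[order[2]],
               frac[order[2]]]
    corner = list(base)
    out = [0.0, 0.0, 0.0]
    for step, weight in enumerate(weights):
        if step > 0:
            corner[order[step - 1]] += 1              # step along the next axis
        sample = lut[corner[0]][corner[1]][corner[2]]
        out = [o + weight * s for o, s in zip(out, sample)]
    return tuple(out)


lut = make_identity_lut(5)
print(tetrahedral(lut, 5, (0.3, 0.7, 0.1)))
```

An identity LUT should hand back the input up to floating-point rounding, which makes it a convenient sanity check for any implementation.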
Port Cube
With a commercial (BBC) LUT:
AVSCube no prefetch
Number of frames: 1000
Length (hh:mm:ss.ms): 00:00:41.667
Frame width: 3840
Frame height: 2160
Framerate: 24.000 (24/1)
Colorspace: RGBP16
Audio channels: 1
Audio bits/sample: 16
Audio sample rate: 44100
Audio samples: 1837500
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 8.370 | 14.84 | 14.11
Process memory usage (max): 116 MiB
Thread count: 9
CPU usage (average): 8.6%
GPU usage (average): 4%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.7 W
AVSCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 35.30 | 90.92 | 59.82
Process memory usage (max): 1259 MiB
Thread count: 18
CPU usage (average): 61.6%
GPU usage (average): 3%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.8 W
DGCube no prefetch
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 21.50 | 45.00 | 41.58
Process memory usage (max): 221 MiB
Thread count: 13
CPU usage (average): 9.5%
GPU usage (average): 66%
VPU usage (average): 0%
GPU memory usage: 867 MiB
GPU Power Consumption (average): 49.1 W
DGCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 28.36 | 81.98 | 51.89
Process memory usage (max): 1755 MiB
Thread count: 24
CPU usage (average): 61.6%
GPU usage (average): 85%
VPU usage (average): 0%
GPU memory usage: 1910 MiB
GPU Power Consumption (average): 50.9 W
Interesting, it seems to hit a wall.
Rocky, do you think we could give fmtconv a try instead of z? I have tried to look at the documentation and it's a bit obscure to me.
P.S. Some day I will ask you about NVIDIA DALI.
Port Cube
I was trying to get fmtc working today but failed. I'll try again after Vapoursynth support.
Port Cube
Please re-download to get:
* Vapoursynth support.
* 'device' parameter to select GPU device.
* Updated user manual.
Salvadore?
Port Cube
Actually, I'm not going to add tetrahedral to timecube, because of all the assembler intrinsics stuff, for which I am neither qualified nor motivated. And it gives a raison d'etre for DGCube.
Port Cube
It's kicking my patootie. Maybe I should call in Britney.