Port Cube

Post by **Rocky** » Mon Aug 08, 2022 7:15 am

Two things.

1. Where are you loading avsresize.dll?

2. Can you give me the cube file?

It's working fine for me with this script, only differing from yours in the cube file (which must be size 65):

loadplugin("d:\don\Programming\C++\dgdecnv\DGDecodeNV\x64\Release\dgdecodenv.dll")
loadplugin("avsresize.dll")
loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
dgsource("THE GREAT WALL.dgi")
#From 4:2:2 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("PQ_to_BT709_slope.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV422 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV422P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Post by **Guest 2** » Mon Aug 08, 2022 8:26 am

Rocky wrote: ↑
Mon Aug 08, 2022 7:15 am
1. Where are you loading avsresize.dll?

No need as it's in AVS+ default folders.

Rocky wrote: ↑
Mon Aug 08, 2022 7:15 am
2. Can you give me the cube file?

I can give you an identity one, which crashes too:

Attachment: IDENTITY.7z (10.44 KiB)

Rocky wrote: ↑
Mon Aug 08, 2022 7:15 am
which must be size 65

Do you mean kB, or something else? Both the BBC and Warner Bros licensed cube files are around 1 MB.

I asked a friend of mine with a Quadro + Xeon workstation to try it too, and he can confirm that:

1) 709 to HLG with commercial(*) cube works
2) PQ to HLG with commercial(*) cube crashes
3) HLG to PQ with commercial(*) cube crashes

(*) both BBC and Warner Bros

AVSCube can ingest the various cubes with no issues.

Post by **Rocky** » Mon Aug 08, 2022 8:53 am

The cube dimension must be 65. Checking...

Post by **Rocky** » Mon Aug 08, 2022 9:09 am

Your cube is size 33. I'll generalize the size and give a new test version. After DGDecNV 444.
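For readers who want to check a cube's dimension before loading it, here is a minimal Python sketch. It assumes the file follows the common .cube text layout, where the grid size is declared on a `LUT_3D_SIZE` line. The function name `cube_lut_size` and the sample header are made up for illustration and are not part of DGCube or AVSCube.

```python
def cube_lut_size(text):
    """Return the LUT_3D_SIZE declared in .cube file text, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] == "LUT_3D_SIZE" and len(parts) >= 2:
            return int(parts[1])
    return None


# Hypothetical header of a 33-point LUT, for demonstration only.
sample = """# example header
TITLE "example"
LUT_3D_SIZE 33
0.0 0.0 0.0
"""
print(cube_lut_size(sample))
```

To check a real file, pass its contents instead, e.g. `cube_lut_size(open(path).read())`.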

Post by **Guest 2** » Mon Aug 08, 2022 10:55 am

Rocky wrote: ↑
Mon Aug 08, 2022 9:09 am
Your cube is size 33.

Is the 65 size related to CUDA only? AVSCube works ok.

Post by **Rocky** » Mon Aug 08, 2022 11:57 am

DGCube (CUDA) was hardwired for 65. Cube (CPU) could handle any size.

Re-download DGCube to get the fix, i.e., ability to open any size cube file.

Post by **Guest 2** » Mon Aug 08, 2022 1:22 pm

Rocky wrote: ↑
Mon Aug 08, 2022 11:57 am
Re-download DGCube to get the fix, i.e., ability to open any size cube file.

Quickly tested, and it seems to be working.

I will bench and post results.

Post by **Guest 2** » Tue Aug 09, 2022 3:57 am

Preliminary benchmarks before encoding, using AVSMeter64 + GPU-Z.

AVSCube script:

LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Frame width: 3840
Frame height: 2064
Framerate: 23.976 (24000/1001)
Colorspace: YUV420P10

FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%

GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W

Same with Prefetch(8):

FPS (cur | min | max | avg): 2.859 | 0.910 | 108696 | 11.83
Process memory usage: 2014 MiB
Thread count: 22
CPU usage (current | average): 65.8% | 71.4%

GPU usage (current | average): 8% | 14%
VPU usage (current | average): 11% | 21%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 16.5 W | 38.3 W

DGCube script:

LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DGCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%

GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W

Same with Prefetch(8):

FPS (cur | min | max | avg): 1.433 | 0.567 | 175439 | 10.15
Process memory usage: 2702 MiB
Thread count: 27
CPU usage (current | average): 64.3% | 62.6%

GPU usage (current | average): 37% | 28%
VPU usage (current | average): 28% | 10%
GPU memory usage: 2858 MiB
GPU Power Consumption (cur | avg): 50.1 W | 47.1 W

Post by **Guest 2** » Tue Aug 09, 2022 5:39 am

Real-world scenario: 4K PQ video to 1080p HLG video with denoising and x265 encoding.

Script:

SetMemoryMax()
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll") # or DGCube
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0, rw=1920, rh=1032)
propClearAll()
CompTest(1)
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
ConvertBits(32)
BM3D_CUDA(sigma=3, radius=2)
BM3D_VAggregate(radius=2)
fmtc_bitdepth (bits=10,dmode=8)
neo_f3kdb(range=15, Y=65, Cb=40, Cr=40, grainY=0, grainC=0, sample_mode=2, blur_first=true, dynamic_grain=false, mt=false, keep_tv_range=true)
Prefetch(1) # 1,4,6

x265.exe --crf 22 --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_temp\akira_cube_out.hevc" "F:\In\2_0446 Akira\akira_cube_temp\akira_cube.avs"

AVSCube:

no Prefetch: encoded 1792 frames in 525.70s (3.41 fps), 1244.06 kb/s, Avg QP:26.48
Prefetch(4): encoded 1792 frames in 430.76s (4.16 fps), 1244.31 kb/s, Avg QP:26.49
Prefetch(6): encoded 1792 frames in 346.58s (5.17 fps), 1242.82 kb/s, Avg QP:26.49

DGCube:
no Prefetch: encoded 1792 frames in 525.13s (3.41 fps), 1242.09 kb/s, Avg QP:26.50
Prefetch(4): encoded 1792 frames in 415.67s (4.31 fps), 1244.13 kb/s, Avg QP:26.50
Prefetch(6): encoded 1792 frames in 351.01s (5.11 fps), 1243.90 kb/s, Avg QP:26.49

It's really strange that both you and I are getting CPU and GPU results that are so closely aligned. I'm starting to think that the bottleneck is somewhere else.

Post by **Guest 2** » Tue Aug 09, 2022 6:14 am

The results are also aligned with:

x265.exe --crf 20 --preset slow --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6_out.hevc" "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6.avs"

AVSCube: encoded 1792 frames in 673.69s (2.66 fps), 1716.09 kb/s, Avg QP:24.01
DGCube: encoded 1792 frames in 696.78s (2.57 fps), 1716.41 kb/s, Avg QP:24.02

According to Agatha Christie, one coincidence is just a coincidence, two coincidences are a clue, three coincidences are a proof.

Post by **Rocky** » Tue Aug 09, 2022 11:15 am

It's just that the actual 3D LUT application is tiny compared to everything else, for both versions. I compared scripts with a BlankClip() source and no conversions, and things are as expected: DGCube is faster with no prefetch, while Cube is faster with prefetch. However, each prefetch thread comes with more CPU utilization. So for transcoding, DGCube could be useful when the encoding load is high, compared to Cube with prefetch.
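The "tiny compared to everything else" point can be illustrated with Amdahl's law. The sketch below is only a worked example: the 5% share of frame time and the 3x stage speedup are hypothetical figures chosen for illustration, not measurements from this thread.

```python
def overall_speedup(lut_fraction, lut_speedup):
    """Amdahl's law: whole-pipeline speedup when only the LUT stage gets faster."""
    return 1.0 / ((1.0 - lut_fraction) + lut_fraction / lut_speedup)


# Hypothetical numbers, not measurements: the LUT takes 5% of the per-frame
# time and the faster implementation runs that one stage 3x faster.
print(round(overall_speedup(0.05, 3.0), 3))
```

Under those assumptions the whole pipeline gains only about 3%, which would be consistent with two LUT implementations producing nearly identical encode times.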

I'm going to add tetrahedral interpolation to both to address ErazorTT's problem case.

Post by **Rocky** » Tue Aug 09, 2022 6:14 pm

Guest 2 wrote: ↑
Sun Aug 07, 2022 10:05 am

Rocky wrote: ↑
Sun Aug 07, 2022 8:53 am
Can you tell me about DTL's workarounds? Any links?

https://forum.doom9.org/showthread.php?t=183517

It's a long thread, where he seemed to go thru some of your issues.

I read the whole thread and didn't see anything relevant. Did I miss something?

Post by **Guest 2** » Wed Aug 10, 2022 1:51 am

Rocky wrote: ↑
Tue Aug 09, 2022 11:15 am
I compared scripts with BlankClip() source and no conversions and things are as expected.

Please, post your script.

Rocky wrote: ↑
Tue Aug 09, 2022 6:14 pm
I read the whole thread and didn't see anything relevant. Did I miss something?

Post by **Rocky** » Wed Aug 10, 2022 9:04 am

You are forgiven.

Here is a new version supporting tetrahedral interpolation. It addresses ErazorTT's issue, as the artifacts do not occur with tetrahedral. Please read the new DGCube.txt file for details and be aware that the filter is now invoked as DGCube(). I'll add this to the timecube-derived Cube() filter as well. Also need to add Vapoursynth support to DGCube().
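For readers unfamiliar with the technique, here is a minimal pure-Python sketch of tetrahedral interpolation on a 3D LUT. It is a textbook formulation written for clarity, not DGCube's CUDA code. The names `identity_lut` and `tetrahedral` and the nested-list layout `lut[r][g][b]` are assumptions made for this example.

```python
def identity_lut(size):
    """Build an identity LUT as lut[r][g][b] -> (R, G, B), values in [0, 1]."""
    scale = 1.0 / (size - 1)
    return [[[(r * scale, g * scale, b * scale) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def tetrahedral(lut, rgb):
    """Look up an RGB triple (each in [0, 1]) using tetrahedral interpolation."""
    size = len(lut)
    base, frac = [], []
    for v in rgb:
        pos = min(max(v, 0.0), 1.0) * (size - 1)
        i = min(int(pos), size - 2)  # keep i + 1 inside the grid
        base.append(i)
        frac.append(pos - i)

    # Visit the axes from largest to smallest fractional part. This selects
    # the one tetrahedron (out of six per grid cell) containing the sample.
    order = sorted(range(3), key=lambda a: frac[a], reverse=True)
    weights = [1.0 - frac[order[0]],
               frac[order[0]] - frac[order[1]],
               frac[order[1]] - frac[order[2]],
               frac[order[2]]]

    corner = list(base)
    out = [0.0, 0.0, 0.0]
    for n, w in enumerate(weights):
        if n > 0:
            corner[order[n - 1]] += 1  # move one grid edge along the next axis
        sample = lut[corner[0]][corner[1]][corner[2]]
        for c in range(3):
            out[c] += w * sample[c]
    return tuple(out)


lut = identity_lut(5)
print(tetrahedral(lut, (0.2, 0.7, 0.4)))
```

Each lookup blends four grid points instead of the eight used by trilinear interpolation. With an identity LUT the output should match the input up to floating-point rounding.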

https://rationalqm.us/misc/DGCube.zip

Please let the Doom9 guys know about this.

The script you asked for:

loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
BlankClip(pixel_type="RGBP16", width=3840, height=2160, length=1000)
DGCube("IDENTITY.cube")
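To reproduce this test without the attached file, an identity cube can be generated. The Python sketch below assumes the common .cube text layout: a `LUT_3D_SIZE` line followed by one RGB triple per line, with the red index varying fastest. The function name `identity_cube_text` and the output file name are made up for this example, and the result is not necessarily identical to the IDENTITY.cube used above.

```python
def identity_cube_text(size):
    """Return the text of an identity 3D LUT in .cube format."""
    lines = ["LUT_3D_SIZE %d" % size]
    scale = 1.0 / (size - 1)
    # In the .cube layout the red index varies fastest and blue slowest.
    for b in range(size):
        for g in range(size):
            for r in range(size):
                lines.append("%.6f %.6f %.6f" % (r * scale, g * scale, b * scale))
    return "\n".join(lines) + "\n"


# Small demonstration: a 3-point LUT has 3**3 = 27 entries plus the header.
demo = identity_cube_text(3)
print(demo.splitlines()[0])
print(len(demo.splitlines()) - 1)
```

To write a file for testing, something like `open("identity_33.cube", "w").write(identity_cube_text(33))` should do.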

Post by **Guest 2** » Wed Aug 10, 2022 10:08 am

Rocky wrote: ↑
Wed Aug 10, 2022 9:04 am
Please let the Doom9 guys know about this.

Your wish is my command.

P.S. Will you add tetrahedral to AVSCube too?

Post by **Guest 2** » Wed Aug 10, 2022 11:29 am

Rocky wrote: ↑
Wed Aug 10, 2022 9:04 am
The script you asked for

With a commercial (BBC) LUT:

AVSCube no prefetch
Number of frames: 1000
Length (hh:mm:ss.ms): 00:00:41.667
Frame width: 3840
Frame height: 2160
Framerate: 24.000 (24/1)
Colorspace: RGBP16
Audio channels: 1
Audio bits/sample: 16
Audio sample rate: 44100
Audio samples: 1837500

Frames processed: 1000 (0 - 999)
FPS (min | max | average): 8.370 | 14.84 | 14.11
Process memory usage (max): 116 MiB
Thread count: 9
CPU usage (average): 8.6%

GPU usage (average): 4%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.7 W

AVSCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 35.30 | 90.92 | 59.82
Process memory usage (max): 1259 MiB
Thread count: 18
CPU usage (average): 61.6%

GPU usage (average): 3%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.8 W

DGCube no prefetch
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 21.50 | 45.00 | 41.58
Process memory usage (max): 221 MiB
Thread count: 13
CPU usage (average): 9.5%

GPU usage (average): 66%
VPU usage (average): 0%
GPU memory usage: 867 MiB
GPU Power Consumption (average): 49.1 W

DGCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 28.36 | 81.98 | 51.89
Process memory usage (max): 1755 MiB
Thread count: 24
CPU usage (average): 61.6%

GPU usage (average): 85%
VPU usage (average): 0%
GPU memory usage: 1910 MiB
GPU Power Consumption (average): 50.9 W

Interesting, it seems to hit a wall.
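The "wall" is easier to see as a ratio. This small Python sketch only redoes arithmetic on the average FPS figures reported above. The helper name `scaling` is made up for this example.

```python
# Average FPS figures copied from the AVSMeter results above.
results = {
    "AVSCube": {"no prefetch": 14.11, "6 threads": 59.82},
    "DGCube": {"no prefetch": 41.58, "6 threads": 51.89},
}


def scaling(fps_single, fps_threaded):
    """Return how many times faster the threaded run is than the single run."""
    return fps_threaded / fps_single


for name, fps in results.items():
    factor = scaling(fps["no prefetch"], fps["6 threads"])
    print("%s: %.2fx from prefetch" % (name, factor))
```

AVSCube gains roughly 4.2x from six threads, while DGCube gains only about 1.25x. That is consistent with the GPU already being busy (66% average usage) without prefetch.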

Rocky, do you think we could give fmtconv a try instead of z? I have tried to look at the documentation and it's a bit obscure to me.

P.S. Some day I will ask you about NVIDIA DALI.

Post by **Rocky** » Wed Aug 10, 2022 2:05 pm

I was trying to get fmtc working today but failed. I'll try again after Vapoursynth support.

Post by **Rocky** » Thu Aug 11, 2022 10:10 am

Please re-download to get:

* Vapoursynth support.
* 'device' parameter to select GPU device.
* Updated user manual.

Salvadore?

Post by **Rocky** » Fri Aug 12, 2022 10:29 am

Actually, I'm not going to add tetrahedral to timecube, because of all the assembler intrinsics stuff, for which I am neither qualified nor motivated. And it gives a raison d'etre for DGCube.

Post by **Guest 2** » Sat Aug 13, 2022 2:40 am

Rocky wrote: ↑
Fri Aug 12, 2022 10:29 am
And it gives a raison d'etre for DGCube.

Do you think it's possible to have the necessary color space conversion ported to CUDA, to offload the cpu as much as possible?

OpenCV supports it easily and, if I am not wrong, it's written in CUDA... so...
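As a rough picture of the per-pixel work such a port would take over, here is a Python sketch of the arithmetic behind a narrow-range Y'CbCr to full-range R'G'B' step for 10-bit BT.2020 non-constant-luminance video. It is not how z_ConvertFormat or any CUDA implementation works internally, and it leaves out chroma upsampling and dithering. The function name is made up for this example.

```python
KR, KB = 0.2627, 0.0593  # BT.2020 luma coefficients
KG = 1.0 - KR - KB


def ycbcr10_limited_to_rgb(y, cb, cr):
    """Convert one 10-bit narrow-range BT.2020 NCL sample to full-range R'G'B' floats."""
    # Narrow range: luma spans 64..940, chroma spans 64..960 centred on 512.
    yn = (y - 64) / 876.0
    pb = (cb - 512) / 896.0
    pr = (cr - 512) / 896.0
    r = yn + 2.0 * (1.0 - KR) * pr
    b = yn + 2.0 * (1.0 - KB) * pb
    g = (yn - KR * r - KB * b) / KG
    return r, g, b


print(ycbcr10_limited_to_rgb(64, 512, 512))   # narrow-range black
print(ycbcr10_limited_to_rgb(940, 512, 512))  # narrow-range white
```

Every output pixel depends only on its own input sample, which is why this kind of conversion usually maps well onto a GPU kernel. The transfer curve (PQ here) is untouched by this step.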

Post by **Rocky** » Sat Aug 13, 2022 8:11 am

Guest 2 wrote: ↑
Sat Aug 13, 2022 2:40 am
Do you think it's possible to have the necessary color space conversion ported to CUDA, to offload the cpu as much as possible?

Sure. Bullwinkle already mentioned that.

Post by **Guest 2** » Sat Aug 13, 2022 11:27 am

Rocky wrote: ↑
Sat Aug 13, 2022 8:11 am
Sure. Bullwinkle already mentioned that.

Post by **Rocky** » Sun Aug 14, 2022 3:30 pm

It's kicking my patootie. Maybe I should call in Britney.

Post by **Sherman** » Sun Aug 14, 2022 3:32 pm

Did you want me to try?

Post by **Rocky** » Sun Aug 14, 2022 3:33 pm

Aren't you busy with your new tube tester?