Port Cube

Post by **Rocky** » Thu Sep 22, 2022 7:19 am

Finding #5: Frame properties

When using std-b67, the _Transfer frame property is 18. When using st2084, the _Transfer frame property is 16. All other properties are the same.

So we will focus initially on the last stage and seek out the reason for the discrepancy. I'll have to bring up avsresize in the debugger. That will take a little time to get going.
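For reference, the two values line up with the transfer-characteristics codes in ITU-T H.273, where 16 is SMPTE ST 2084 (PQ) and 18 is ARIB STD-B67 (HLG). Below is a minimal Python sketch of that lookup. The names `TRANSFER_NAMES` and `describe_transfer` are made up for illustration and are not part of avsresize or AviSynth+.

```python
# Subset of the transfer-characteristics codes defined in ITU-T H.273.
# TRANSFER_NAMES and describe_transfer are hypothetical names for this sketch.
TRANSFER_NAMES = {
    1: "BT.709",
    16: "SMPTE ST 2084 (PQ)",
    18: "ARIB STD-B67 (HLG)",
}


def describe_transfer(code: int) -> str:
    """Return a readable name for a _Transfer frame property value."""
    return TRANSFER_NAMES.get(code, f"unknown ({code})")


# The two values reported in Finding #5.
for code in (16, 18):
    print(code, "->", describe_transfer(code))
```

So the property simply reflects the transfer function named in the colorspace_op string of the last z call.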

Post by **Curly** » Thu Sep 22, 2022 11:53 am

@Guest 2

Just for fun, add this parameter to all z calls and see if things go faster.

approximate_gamma=true

And modify avsresize to support CUDASynth protocol. It's not difficult.

Post by **Guest 2** » Thu Sep 22, 2022 1:24 pm

Curly wrote: ↑
Thu Sep 22, 2022 11:53 am
Just for fun

Will try, thanks!

Curly wrote: ↑
Thu Sep 22, 2022 11:53 am
And modify avsresize to support CUDASynth protocol. It's not difficult.

Post by **Rocky** » Thu Sep 22, 2022 3:11 pm

Curly wrote: ↑
Thu Sep 22, 2022 11:53 am
And modify avsresize to support CUDASynth protocol. It's not difficult.

No. To gain anything you need to port z algorithms to CUDA.

Post by **Guest 2** » Wed Sep 28, 2022 3:24 pm

Rocky wrote: ↑
Thu Sep 22, 2022 3:11 pm
No.

Hello? Everything is fine?

Post by **Bullwinkle** » Wed Sep 28, 2022 4:01 pm

Rocky is ill and may not get back to coding for some days.

Any luck with the approximate_gamma hack?

Post by **Guest 2** » Wed Sep 28, 2022 4:06 pm

Bullwinkle wrote: ↑
Wed Sep 28, 2022 4:01 pm
is ill

My best wishes.

Post by **Bullwinkle** » Fri Sep 30, 2022 11:08 am

Thank you, big t.

BTW, I slipstreamed a change to DGCube to map negative values to 0. People should not create cubes with negative values but I'll be nice and try to help them. Redownload to get the change.
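To illustrate what mapping negative values to 0 means for a cube file, here is a rough Python sketch. It is not the DGCube source, only an assumption about how such a clamp could look when applied to the text rows of a .cube file. The function name `clamp_cube_line` is hypothetical.

```python
# Hypothetical sketch: clamp negative entries of a .cube 3D LUT row to 0.
# This is NOT DGCube's code, only an illustration of the idea.

def clamp_cube_line(line: str) -> str:
    """Clamp negative values in a data row; return other lines unchanged."""
    parts = line.split()
    if len(parts) != 3:
        # Keyword lines such as LUT_3D_SIZE or DOMAIN_MIN are not data rows.
        return line
    try:
        values = [float(p) for p in parts]
    except ValueError:
        # Three tokens but not numeric (a comment or TITLE line, for example).
        return line
    if all(v >= 0.0 for v in values):
        # Nothing to fix, keep the original text as written.
        return line
    return " ".join(f"{max(v, 0.0):.6f}" for v in values)


print(clamp_cube_line("-0.001 0.5 1.0"))
```

Rows without negative values are passed through untouched, so a well-formed cube is not modified.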

Any luck with the approximate_gamma hack?

Post by **Guest 2** » Sat Oct 01, 2022 1:08 pm

Bullwinkle wrote: ↑
Fri Sep 30, 2022 11:08 am
BTW, I slipstreamed

I was about to send you the URL of the message.

Bullwinkle wrote: ↑
Fri Sep 30, 2022 11:08 am
Any luck with the approximate_gamma hack?

I need to empty the encoding queue before trying. Will do ASAP.

Post by **Guest 2** » Tue Oct 04, 2022 10:20 am

I had some time to test approx gamma.

UHD: Thor – Love and thunder, mixed frames (1722 frames in total)

Script:

Code:

SetMemoryMax()
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DGCube\DGCube.dll")
DGSource("F:\In\1_58 Thor Love and thunder\thor.dgi",ct=280,cb=280,cl=0,cr=0)
propClearAll()
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
fmtc_bitdepth (bits=10, dmode=8)
Prefetch(8)

Code:

AVSMeter64.exe "Thor 4K HLG BBC ext.avs"

AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3689, 3.7, x86_64) (3.7.3.0)

Number of frames:                     1722
Length (hh:mm:ss.ms):         00:01:11.822
Frame width:                          3840
Frame height:                         1600
Framerate:                          23.976 (24000/1001)
Colorspace:                      YUV420P10

Frames processed:                   1722 (0 - 1721)
FPS (min | max | average):          0.839 | 322580 | 16.01
Process memory usage (max):         2840 MiB
Thread count:                       27
CPU usage (average):                71.5%

Time (elapsed):                     00:01:47.571

Replacing the cube part with

Code:

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion",approximate_gamma=true)
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion",approximate_gamma=true)

I get

Code:

Frames processed:                   1722 (0 - 1721)
FPS (min | max | average):          0.606 | 344828 | 16.62
Process memory usage (max):         2697 MiB
Thread count:                       27
CPU usage (average):                70.5%

Time (elapsed):                     00:01:43.608

That is only about a 4% gain (16.01 to 16.62 fps average), not enough to risk issues with the gamma approximation.
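For anyone who wants the difference as a percentage, here is a small Python sketch using only the figures from the two AVSMeter runs above. The variable and function names are made up for this example.

```python
# Figures copied from the two AVSMeter runs above.
FRAMES = 1722
baseline = {"fps": 16.01, "elapsed_s": 60 + 47.571}  # 00:01:47.571
approx = {"fps": 16.62, "elapsed_s": 60 + 43.608}    # 00:01:43.608


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


fps_gain = percent_change(baseline["fps"], approx["fps"])
time_saved = -percent_change(baseline["elapsed_s"], approx["elapsed_s"])

print(f"Average fps gain:   {fps_gain:.1f}%")
print(f"Elapsed time saved: {time_saved:.1f}%")

# Sanity check: frames / elapsed time should match the reported average fps.
print(f"Recomputed fps: {FRAMES / baseline['elapsed_s']:.2f} "
      f"vs {FRAMES / approx['elapsed_s']:.2f}")
```

Both measures come out a little under 4%, and this is a single run of each script, so part of the difference could be run-to-run noise.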

Post by **Rocky** » Wed Oct 05, 2022 12:52 pm

OK, nowhere near the speedup hoped for. Thank you for your testing.

Post by **Guest 2** » Wed Oct 05, 2022 1:48 pm

Rocky wrote: ↑
Wed Oct 05, 2022 12:52 pm
OK, nowhere near the speedup hoped for. Thank you for your testing.

Now that you have stocked up on acorns, I bet you will pull a rabbit out of the hat.

Post by **Rocky** » Wed Oct 05, 2022 5:43 pm

Ha ha, clever frog. Croak!

Post by **Rocky** » Thu Oct 06, 2022 5:32 pm

Completed my last coaching cert today and Sherman is taking over the radios. I'll try to make sense of z source code to extract the matrices they are using for comparison to mine.

Post by **Guest 2** » Mon Oct 10, 2022 8:52 am

Rocky wrote: ↑
Thu Oct 06, 2022 5:32 pm
I'll try to make sense of z source code to extract the matrices they are using for comparison to mine.

Eager to test your wonders.

Post by **Rocky** » Mon Oct 10, 2022 1:07 pm

Me too.

Post by **Rocky** » Tue Oct 11, 2022 9:39 am

Ooh, I think I found the place in the code where the coefficients are generated. Let's see if I can hit it in the debugger.

Post by **Guest 2** » Tue Oct 11, 2022 3:34 pm

Rocky wrote: ↑
Tue Oct 11, 2022 9:39 am
Ooh, I think I found the place in the code where the coefficients are generated.

Remember that the conversion from PQ to SD works fine; it's the HLG part that has an issue.

Post by **Rocky** » Tue Oct 11, 2022 3:55 pm

I don't follow that because the HLG is part of the 3D LUT. I've shown the discrepancies for the two operations before and after the 3D LUT application, i.e., the YUV -> RGB and RGB -> YUV conversions.
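To make the YUV -> RGB and RGB -> YUV steps concrete, here is a minimal Python sketch of the BT.2020 non-constant-luminance conversion built from the standard luma coefficients (Kr = 0.2627, Kb = 0.0593). It is not taken from the z source or from DGCube. The function names are made up, and it assumes normalized full-range values, so the limited-range scaling and offsets are left out.

```python
# BT.2020 luma coefficients (non-constant luminance).
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r: float, g: float, b: float):
    """R'G'B' in [0, 1] -> Y' in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float):
    """Inverse of rgb_to_ycbcr."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


# Round trip on an arbitrary colour to show the two steps are inverses.
rgb_in = (0.25, 0.5, 0.75)
rgb_out = ycbcr_to_rgb(*rgb_to_ycbcr(*rgb_in))
print(rgb_in, "->", tuple(round(v, 6) for v in rgb_out))
```

Comparing coefficients like these against what each implementation actually uses is one way to narrow down where a discrepancy comes from.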

Post by **Guest 2** » Tue Oct 11, 2022 3:56 pm

I was reading https://github.com/WolframRhodium/VapourSynth-BM3DCUDA and I noticed:

Code:

fast:

Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.

Default True.

Perhaps you can find some trick in the source to increase transfer speed.

P.S. I would really like to see a native AVS+ version of it, made by your talented hands.

Post by **Sherman** » Tue Oct 11, 2022 7:11 pm

And now, it's time for a message from our sponsor.

Post by **Britney** » Tue Oct 11, 2022 7:15 pm

Post by **Baltasar** » Tue Oct 11, 2022 7:18 pm

Post by **Albert** » Tue Oct 11, 2022 7:21 pm

Post by **Boris** » Tue Oct 11, 2022 7:26 pm