Port Cube

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Rocky
Posts: 2709
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Finding #1: Comparison after conversion of input YUV420P16 to RGBP16.

Pixel value after DGSource(): Y=26048, U=30592, V=34624 # same for both scripts

Pixel value after external conversion: R = 28788, G = 24857, B = 20991

Pixel value after internal conversion: R = 28760, G = 24833, B = 20961

The percent discrepancies are: R = 0.097%, G = 0.096%, B = 0.143%

So we are talking about a tenth of a percent discrepancy for the input conversion to RGB. While this appears inconsequential, let's defer looking for the causes until we have measured the discrepancy at each stage.
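For readers who want to check the arithmetic, here is a minimal Python sketch of how such percentages can be computed. It assumes the discrepancy is the absolute difference divided by the external (z_ConvertFormat) value; the post does not say which value was used as the denominator, and the function name is made up for illustration.

```python
def percent_discrepancy(external, internal):
    """Absolute difference relative to the external value, in percent.

    Assumption: the external result is the reference (denominator).
    """
    return abs(external - internal) / external * 100.0

# RGBP16 pixel values reported in Finding #1
external = {"R": 28788, "G": 24857, "B": 20991}
internal = {"R": 28760, "G": 24833, "B": 20961}

discrepancies = {
    channel: percent_discrepancy(external[channel], internal[channel])
    for channel in external
}

for channel, pct in discrepancies.items():
    print(f"{channel}: {pct:.3f}%")
```

The results agree with the figures quoted above to within rounding of the last digit.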

Next up is the comparison after the LUT application, followed by the comparison after the output conversion back to YUV420P16.

Post by Rocky »

Finding #2: Comparison after application of LUT.

Pixel value after external application: R = 33663, G = 24354, B = 17262

Pixel value after internal application: R = 33606, G = 24313, B = 17219

The percent discrepancies are: R = 0.17%, G = 0.17%, B = 0.25%

The percent discrepancies are cumulative, that is, after all previous stages and not per-stage. At this point, we are still within a quarter of one percent.

Post by Rocky »

Finding #3: Comparison after final output (adds the RGBP16->YUV420P16 stage).

Pixel value after external output: R = 26663, G = 28512, B = 37090

Pixel value after internal output: R = 26673, G = 28262, B = 37360

The percent discrepancies are: R = 0.04%, G = 0.88%, B = 0.73%

The percent discrepancies are cumulative, that is, after all previous stages and not per-stage. At final output, we are still less than one percent.

It's clear that the biggest discrepancy is introduced at the final stage, so we will focus on the reasons for that first.
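As a rough cross-check of that conclusion, the sketch below takes the pixel values from Findings #1 to #3 and reports the worst channel at each stage. The stage labels and the choice of the external value as denominator are assumptions made for illustration only.

```python
# (external, internal) pixel triples copied from Findings #1, #2 and #3
stages = {
    "input conversion": ((28788, 24857, 20991), (28760, 24833, 20961)),
    "after LUT": ((33663, 24354, 17262), (33606, 24313, 17219)),
    "final output": ((26663, 28512, 37090), (26673, 28262, 37360)),
}

def worst_percent(external, internal):
    """Largest per-channel discrepancy in percent (external value as reference)."""
    return max(abs(e - i) / e * 100.0 for e, i in zip(external, internal))

worst = {name: worst_percent(ext, intl) for name, (ext, intl) in stages.items()}
largest_stage = max(worst, key=worst.get)

for name, pct in worst.items():
    print(f"{name}: worst channel off by {pct:.2f}%")
print("largest cumulative discrepancy:", largest_stage)
```

Because the figures are cumulative, the jump from about a quarter of a percent after the LUT to just under one percent at the output is what points to the last conversion.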
tormento
Posts: 727
Joined: Mon Sep 20, 2010 2:18 pm

Post by tormento »

Rocky wrote:
Thu Sep 22, 2022 6:51 am
"It's clear that the biggest discrepancy is introduced at the final stage, so we will focus on the reasons for that first."

Are you testing PQ to HLG or PQ to SDR internal conversion?

Post by Rocky »

Finding #4: Change the final external conversion stage from:

z_ConvertFormat(pixel_type="YUV420P16", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited")

to:

z_ConvertFormat(pixel_type="YUV420P16", colorspace_op="rgb:st2084:2020:full=>2020:st2084:2020:limited")

Interestingly, this does not alter the final pixel values! One might suppose it changes only metadata, but this is an Avisynth script delivering only raw pixels, so any metadata must be in frame properties. I'll check those next.
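One possible explanation, offered here as an assumption and not something confirmed in this thread: when the transfer characteristic and primaries are identical on both sides of the `=>`, the only work left is the RGB to YUV matrix and the range scaling, and both act on the already-encoded values without ever evaluating the transfer curve. The Python sketch below illustrates that idea with the standard BT.2020 non-constant-luminance coefficients; it is not avsresize's code, and the function name is hypothetical.

```python
KR, KB = 0.2627, 0.0593      # BT.2020 luma coefficients
KG = 1.0 - KR - KB

def rgb_full16_to_yuv_limited16(r, g, b):
    """Encoded full-range 16-bit R'G'B' -> limited-range 16-bit Y'CbCr.

    Note that no transfer function (PQ or HLG) appears anywhere here:
    the matrix works directly on the encoded values.
    """
    rn, gn, bn = (c / 65535.0 for c in (r, g, b))
    y = KR * rn + KG * gn + KB * bn
    cb = (bn - y) / (2.0 * (1.0 - KB))
    cr = (rn - y) / (2.0 * (1.0 - KR))
    scale = 256  # 2 ** (16 - 8), scales 8-bit limited-range levels to 16 bits
    return (
        round((219.0 * y + 16.0) * scale),
        round((224.0 * cb + 128.0) * scale),
        round((224.0 * cr + 128.0) * scale),
    )

print(rgb_full16_to_yuv_limited16(65535, 65535, 65535))
print(rgb_full16_to_yuv_limited16(0, 0, 0))
```

If this reading is right, the transfer named in `colorspace_op` would only affect what is recorded alongside the pixels, which fits with checking the frame properties next.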

Post by Rocky »

tormento wrote:
Thu Sep 22, 2022 7:03 am
"Are you testing PQ to HLG or PQ to SDR internal conversion?"

PQ to HLG. The scripts were given here:

https://www.rationalqm.us/board/viewtop ... 394#p16394

You were supposed to post if you didn't like my choice of playing field. ;)

Post by Rocky »

Finding #5: Frame properties

When using std-b67, the _Transfer frame property is 18. When using st2084, the _Transfer frame property is 16. All other properties are the same.
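For reference, these two values match the transfer characteristics code points defined in ITU-T H.273, where 16 is SMPTE ST 2084 (PQ) and 18 is ARIB STD-B67 (HLG). A small Python lookup, limited to the two codes seen in this thread (the helper name is hypothetical):

```python
# Transfer characteristics code points (ITU-T H.273) seen in this thread.
# Only the two values reported above are listed; this is not a full table.
TRANSFER_NAMES = {
    16: "SMPTE ST 2084 (PQ)",
    18: "ARIB STD-B67 (HLG)",
}

def describe_transfer(code):
    """Return a readable name for a _Transfer value, or mark it as unlisted."""
    return TRANSFER_NAMES.get(code, f"unlisted ({code})")

for code in (16, 18):
    print(code, "->", describe_transfer(code))
```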

So we will focus initially on the last stage and seek out the reason for the discrepancy. I'll have to bring up avsresize in the debugger. That will take a little time to get going.
Curly
Posts: 222
Joined: Sun Mar 15, 2020 11:05 am

Post by Curly »

@tormento

Just for fun, add this parameter to all z calls and see if things go faster.

approximate_gamma=true

And modify avsresize to support CUDASynth protocol. It's not difficult.

Post by tormento »

Curly wrote:
Thu Sep 22, 2022 11:53 am
"Just for fun"

Will try, thanks!

Curly wrote:
Thu Sep 22, 2022 11:53 am
"And modify avsresize to support CUDASynth protocol. It's not difficult."

:scratch:

Post by Rocky »

Curly wrote:
Thu Sep 22, 2022 11:53 am
"And modify avsresize to support CUDASynth protocol. It's not difficult."

No. To gain anything you need to port z algorithms to CUDA.

Post by tormento »

Rocky wrote:
Thu Sep 22, 2022 3:11 pm
"No."

Hello? Is everything fine?
Bullwinkle
Posts: 309
Joined: Thu Sep 05, 2019 6:37 pm

Post by Bullwinkle »

Rocky is ill and may not get back to coding for some days.

Any luck with the approximate_gamma hack?

Post by tormento »

Bullwinkle wrote:
Wed Sep 28, 2022 4:01 pm
"is ill"

My best wishes.

Post by Bullwinkle »

Thank you, big t.

BTW, I slipstreamed a change to DGCube to map negative values to 0. People should not create cubes with negative values but I'll be nice and try to help them. Redownload to get the change.
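To illustrate what "map negative values to 0" means for a LUT, here is a minimal Python sketch that applies the same rule to the rows of a .cube file. It is only an illustration of the rule: DGCube's actual handling lives in its own code, and the function name and output formatting here are assumptions.

```python
def clamp_cube_row(line):
    """Replace negative components in a .cube data row with 0.0.

    Lines that are not three numeric values (headers, comments) are
    returned unchanged.
    """
    parts = line.split()
    if len(parts) != 3:
        return line
    try:
        values = [float(p) for p in parts]
    except ValueError:
        return line  # e.g. a comment or keyword line with three tokens
    return " ".join(f"{max(v, 0.0):.6f}" for v in values)

sample = ["LUT_3D_SIZE 33", "-0.001 0.5 1.0", "0.25 0.25 0.25"]
for row in sample:
    print(clamp_cube_row(row))
```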

Any luck with the approximate_gamma hack?

Post by tormento »

Bullwinkle wrote:
Fri Sep 30, 2022 11:08 am
"BTW, I slipstreamed"

I was about to send you the URL of the message :D

Bullwinkle wrote:
Fri Sep 30, 2022 11:08 am
"Any luck with the approximate_gamma hack?"

I need to empty the encoding queue before trying. Will do ASAP.

Post by tormento »

I had some time to test approximate_gamma.

UHD: Thor – Love and thunder, mixed frames (1722 frames in total)

Script:

Code: Select all

SetMemoryMax()
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DGCube\DGCube.dll")
# Decode, cropping 280 rows from the top and bottom (ct/cb)
DGSource("F:\In\1_58 Thor Love and thunder\thor.dgi",ct=280,cb=280,cl=0,cr=0)
# Clear all frame properties so the conversions rely only on colorspace_op
propClearAll()
# YUV (BT.2020, PQ, limited) -> planar RGB 16-bit (full range)
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
# Apply the PQ -> HLG 3D LUT
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
# Planar RGB (full range) -> YUV420P10 (BT.2020, HLG, limited)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
fmtc_bitdepth (bits=10, dmode=8)
Prefetch(8)

Code: Select all

AVSMeter64.exe "Thor 4K HLG BBC ext.avs"

AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3689, 3.7, x86_64) (3.7.3.0)

Number of frames:                     1722
Length (hh:mm:ss.ms):         00:01:11.822
Frame width:                          3840
Frame height:                         1600
Framerate:                          23.976 (24000/1001)
Colorspace:                      YUV420P10

Frames processed:                   1722 (0 - 1721)
FPS (min | max | average):          0.839 | 322580 | 16.01
Process memory usage (max):         2840 MiB
Thread count:                       27
CPU usage (average):                71.5%

Time (elapsed):                     00:01:47.571

Replacing the cube part with

Code: Select all

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion",approximate_gamma=true)
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion",approximate_gamma=true)

I get

Code: Select all

Frames processed:                   1722 (0 - 1721)
FPS (min | max | average):          0.606 | 344828 | 16.62
Process memory usage (max):         2697 MiB
Thread count:                       27
CPU usage (average):                70.5%

Time (elapsed):                     00:01:43.608

The gain is too small to justify risking issues with the gamma approximation.
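To put a number on it, the following Python sketch computes the relative gain from the two AVSMeter runs above. The figures are copied from the logs; the helper name is made up for illustration.

```python
def to_seconds(hms):
    """Convert an AVSMeter 'hh:mm:ss.ms' string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Average FPS and elapsed time from the two runs above
baseline_fps, approx_fps = 16.01, 16.62
baseline_s = to_seconds("00:01:47.571")
approx_s = to_seconds("00:01:43.608")

fps_gain_pct = (approx_fps / baseline_fps - 1.0) * 100.0
time_saved_pct = (baseline_s - approx_s) / baseline_s * 100.0

print(f"average FPS gain: {fps_gain_pct:.1f}%")
print(f"elapsed time saved: {time_saved_pct:.1f}%")
```

Both measures come out at roughly 4%, from a single run of each script, so part of the difference may simply be run-to-run variation.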

Post by Rocky »

OK, nowhere near the speedup hoped for. Thank you for your testing.

Post by tormento »

Rocky wrote:
Wed Oct 05, 2022 12:52 pm
"OK, nowhere near the speedup hoped for. Thank you for your testing."

Now that you have stocked up on acorns, I bet you will pull a rabbit out of the hat. :D

Post by Rocky »

Ha ha, clever frog. Croak!

Post by Rocky »

Completed my last coaching cert today and Sherman is taking over the radios. I'll try to make sense of z source code to extract the matrices they are using for comparison to mine.

Post by tormento »

Rocky wrote:
Thu Oct 06, 2022 5:32 pm
"I'll try to make sense of z source code to extract the matrices they are using for comparison to mine."

Eager to test your wonders. :salute:

Post by Rocky »

Me too. :D

Post by Rocky »

Ooh, I think I found the place in the code where the coefficients are generated. Let's see if I can hit it in the debugger.

Post by tormento »

Rocky wrote:
Tue Oct 11, 2022 9:39 am
"Ooh, I think I found the place in the code where the coefficients are generated."

Remember that the conversion from PQ to SDR works fine; it's the HLG part that has issues.

Post by Rocky »

I don't follow that because the HLG is part of the 3D LUT. I've shown the discrepancies for the two operations before and after the 3D LUT application, i.e., the YUV -> RGB and RGB -> YUV conversions.