Port Cube

Post by **Sherman** » Sun Aug 14, 2022 3:34 pm

I can multitask. Nothing worthwhile is easy. Balti told me that.

Post by **Britney** » Sun Aug 14, 2022 3:36 pm

Thrilled to help! I'll rub your shoulders, Rocky. There, that's better.

Post by **new_guy** » Sun Aug 14, 2022 3:39 pm

Can you rub mine too?

Post by **Britney** » Sun Aug 14, 2022 3:40 pm

Eeeuww, no way.

Post by **Rocky** » Mon Aug 15, 2022 8:43 am

Thank you, Brit! I found the bug that was kicking my patootie. Should be plain sailing now.

EDIT: But that wasn't the only bug, see next post.

Post by **Rocky** » Tue Aug 16, 2022 4:38 pm

Ha, plain sailing. Famous last words. Two full days of agonizing over why my conversions were producing an over-saturated mess, even though I took the YUV<-->RGB conversions directly from DGHDRtoSDR (minus the depth reduction). Guys, I tried EVERYTHING. Well, except for one little thing that shouldn't have mattered.

So, I was going to try writing out the intermediate RGB to see if that was already messed up, or if it was a bug in the final RGB->YUV. Of course, to do that you cannot have an in-place filter (get src, read/write src, return src) because the received and returned formats would be different. Am I boring you? You need to do get src, read src, write dst, return dst. So I did that with dst = NewVideoFrameP() etc. but still I left in the final RGB->YUV and writing YUV just to test my dst handling. Whoa, suddenly the output was correct.

It shouldn't have made a difference, unless Avisynth+ is doing something different with MakeWritable() etc. than I am used to for in-place filters. But I don't care about that, just that it is working.

That was all done in C code. Now I'll port it to the CUDA kernel. That should be plain sailing.

Post by **Natasha** » Tue Aug 16, 2022 5:14 pm

Where are your teeth?

Post by **Sherman** » Tue Aug 16, 2022 5:14 pm

You should know. Remember the accident?

Post by **Curly** » Tue Aug 16, 2022 5:36 pm

Not every day I can say I told you so. Obvious! Nyuck.

Post by **Boris** » Tue Aug 16, 2022 6:01 pm

Just wait til the lights go pfizz. It's over.

Post by **Wonder Woman** » Tue Aug 16, 2022 6:40 pm

I know you missed me but here I am ready to slay dragons.

Post by **new_guy** » Tue Aug 16, 2022 6:41 pm

Slay my dragon.

Post by **Guest 2** » Wed Aug 17, 2022 5:47 am

Rocky wrote: ↑
Tue Aug 16, 2022 4:38 pm
That was all done in C code. Now I'll port it to the CUDA kernel. That should be plain sailing.

Eager to test it.

Post by **Guest 2** » Wed Aug 17, 2022 8:51 am

Perhaps you already know, but LUT cubes can work in RGB or Studio RGB, i.e. full range or limited (TV) range.

For PQ to HLG with a full-range LUT, such as the BBC ones, you would need:

#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")

#From PQ to HLG with 16bit precision
DGCube("7a_HLG_PQ1000_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)

#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

For PQ to HLG with a limited-range LUT, such as the Warner ones, you would need:

#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Narrow Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

#From PQ to HLG with 16bit precision
DGCube("WarnerBros_PQToHLG_MaxCLL_2508.cube", fullrange=true)

#From RGB 16bit planar Narrow Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

I hope that the internal conversion you are about to release will take this difference into account.

PS: I have some doubts about the fullrange switch in the DGCube command in the second case.

Post by **Rocky** » Wed Aug 17, 2022 11:36 am

Let's test what we have and I'll look into the range stuff in parallel.

Please re-download DGCube.zip to get support for direct YUV420P16 input as received from
DGSource() for high-bit-depth sources. RGBP16 is still supported, and so is Vapoursynth.
The DGCube text document has been updated and includes relevant sample scripts.

Here is a very simple script:

DGSource("THE GREAT WALL.dgi")
DGCube("PQ_to_BT709_slope.cube", fullrange=false, interp="tetrahedral")

Your test results will be appreciated.

https://rationalqm.us/misc/DGCube.zip

DG said we might be able to afford dentures for me later this year. Gnawing acorns is a challenge with my little stubs.

Post by **Natasha** » Wed Aug 17, 2022 11:41 am

That new_guy is really desperate. I tasted his blood one night. Sour!

Post by **Guest 2** » Wed Aug 17, 2022 12:21 pm

Rocky wrote: ↑
Wed Aug 17, 2022 11:36 am
Let's test what we have and I'll look into the range stuff in parallel.

In the examples you write:

loadplugin("...\dgdecodenv.dll")
loadplugin("...\dgcube.dll")
dgsource("THE GREAT WALL.dgi")
DGCube("PQ_to_BT709_slope.cube", fullrange=false, interp="tetrahedral")

What is the output format and range? Does it still need a z_ConvertFormat call to get a proper output colorspace?

Why did you put fullrange=false? What would happen with true?

As you can see I am mostly interested in PQ to HLG transformation.

Post by **Rocky** » Wed Aug 17, 2022 12:41 pm

Oh sure. Depends on at least three things. Is the LUT made for limited or full-range input/output? Is the source limited or full range? Should the output be limited or full range?

For the example, I assumed the LUT is made for full range. The source here I assume has limited range. So I set fullrange=false to expand it to full in DGCube. The output then would presumably be full range.

Not being au fait with the current practice for use of 3D LUTs, I would appeal to users to say what they need. There are 3 considerations, previously alluded to:

The source range.
The LUT assumed ranges.
The output range.

Everything is possible but what do we seek?

Maybe for DGCube, specifying input range and output range are enough. Intermediate processing would assume a full range LUT.
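
To make the three considerations concrete, here is a minimal sketch of how the needed scaling steps could be worked out from them. It is only an illustration: `plan_range_steps` and its step names are hypothetical, not part of DGCube, and it assumes the LUT uses the same range on its input and output sides.

```python
def plan_range_steps(source_range, lut_range, output_range):
    """Return the ordered processing steps for the given ranges.

    Each argument is "limited" or "full". Hypothetical helper for
    illustration only; it is not DGCube's interface. Assumes the LUT
    expects and produces the same range.
    """
    for name in (source_range, lut_range, output_range):
        if name not in ("limited", "full"):
            raise ValueError(f"unknown range: {name!r}")

    steps = []
    # Bring the source into the range the LUT was made for.
    if source_range != lut_range:
        steps.append(f"scale {source_range}->{lut_range}")
    steps.append("apply LUT")
    # Bring the LUT result into the range the output should have.
    if lut_range != output_range:
        steps.append(f"scale {lut_range}->{output_range}")
    return steps


# Limited source, full-range LUT, limited output.
print(plan_range_steps("limited", "full", "limited"))
```

With a full-range LUT fixed as the intermediate, only the source and output ranges remain as user choices, which is the simplification suggested above.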

Post by **Curly** » Wed Aug 17, 2022 12:57 pm

Guest 2 wrote: ↑
Wed Aug 17, 2022 12:21 pm
As you can see I am mostly interested in PQ to HLG transformation.

Good move. PQ sucks rocks.

Kernel stuff is free, so make it as general as possible. Generally speaking by the General. Am I wrong?

Post by **Guest 2** » Wed Aug 17, 2022 1:01 pm

Rocky wrote: ↑
Wed Aug 17, 2022 12:41 pm
Depends on at least three things. Is the LUT made for limited or full-range input/output? Is the source limited or full range? Should the output be limited or full range?

1) Input, in my case, is PQ video from UHD, so we can assume it's limited range.

2) My LUT wants full range.

3) The output should be limited range.

So, how should I write the script, with that in mind?

My findings: the script

SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DgCube\DGCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
ConvertBits(32)
BM3D_CUDA(sigma=3, radius=2)
BM3D_VAggregate(radius=2)
fmtc_bitdepth (bits=10,dmode=8)
neo_f3kdb(range=15, Y=65, Cb=40, Cr=40, grainY=0, grainC=0, sample_mode=2, blur_first=true, dynamic_grain=false, mt=false, keep_tv_range=true)
Prefetch(3)

gives this error on the z_ConvertFormat line:

YUV color family cannot have RGB matrix coefficients

The same with fullrange=false.

Post by **Guest 2** » Wed Aug 17, 2022 1:01 pm

Curly wrote: ↑
Wed Aug 17, 2022 12:57 pm
Good move. PQ sucks rocks.

Indeed.

Post by **Rocky** » Thu Aug 18, 2022 8:18 am

The way it is designed to work is that if fullrange is false then the pixels are scaled up to fullrange on input and scaled back down to limited on output. In between the LUT is applied. So the fullrange parameter has the meaning "when false we specify that the input is limited and the output should be limited".
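
As a rough illustration of that design for 16-bit RGB, here is a sketch of the scale up, LUT, scale down sequence. The function names are made up for the example, the LUT is stood in for by any callable on an RGB triplet, and the limited-range levels assume the usual convention of 16 and 235 shifted up to 16 bits. It is not the actual DGCube code.

```python
LIMITED_BLACK = 16 << 8    # 4096: limited-range black at 16 bits (assumed convention)
LIMITED_WHITE = 235 << 8   # 60160: limited-range white at 16 bits (assumed convention)
FULL_MAX = 65535           # full-range white at 16 bits


def limited_to_full(v):
    """Expand one limited-range sample to full range, clamping out-of-range input."""
    scaled = (v - LIMITED_BLACK) * FULL_MAX / (LIMITED_WHITE - LIMITED_BLACK)
    return min(max(round(scaled), 0), FULL_MAX)


def full_to_limited(v):
    """Compress one full-range sample back to limited range."""
    return round(v * (LIMITED_WHITE - LIMITED_BLACK) / FULL_MAX) + LIMITED_BLACK


def apply_cube(rgb, lut, fullrange):
    """Apply `lut` (a callable on an RGB triplet) with the described range handling.

    fullrange=True:  pixels go through the LUT untouched.
    fullrange=False: scale up to full, apply the LUT, scale back down to limited.
    """
    if fullrange:
        return tuple(lut(rgb))
    expanded = tuple(limited_to_full(c) for c in rgb)
    return tuple(full_to_limited(c) for c in lut(expanded))


# With an identity LUT, limited-range pixels survive the round trip unchanged.
identity = lambda rgb: rgb
print(apply_cube((4096, 32128, 60160), identity, fullrange=False))
```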

If this is not what is happening or is not what you need, then please advise.

Post by **Rocky** » Thu Aug 18, 2022 9:30 am

Ah, I forgot that my YUV <-> RGB conversions from DGHDRtoSDR already do limited -> full and back (the tonemapping is done in full RGB). I remember getting the coefficients for input by multiplying the YUV->RGB matrix by the limited-to-full matrix, which is more efficient than doing a separate scaling. Similarly on output. So, fullrange is inapplicable for YUV420P16 input, i.e., just leave it as fullrange=true. It would still be applicable to RGBP16 input. Gonna rethink the interface, e.g., error out when fullrange=false for YUV input? Or just silently ignore it?
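
For anyone curious about the folding trick, here is a small sketch showing that the limited-to-full scaling can be merged into the YUV->RGB matrix, so one multiply-add per pixel replaces two passes. The BT.2020 non-constant-luminance coefficients, the 16-bit limited-range levels, and the normalized 0..1 RGB output are assumptions made for the example; these are not the actual coefficients from DGHDRtoSDR.

```python
# BT.2020 non-constant-luminance luma coefficients (assumed for this example).
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB

# Y'CbCr -> R'G'B' matrix for normalized input (Y in 0..1, Cb/Cr in -0.5..0.5).
M = [
    [1.0, 0.0, 2.0 * (1.0 - KR)],
    [1.0, -2.0 * KB * (1.0 - KB) / KG, -2.0 * KR * (1.0 - KR) / KG],
    [1.0, 2.0 * (1.0 - KB), 0.0],
]

# 16-bit limited-range offsets and scales for Y, Cb, Cr (assumed convention).
OFFSETS = (16 << 8, 128 << 8, 128 << 8)
SCALES = (1.0 / (219 << 8), 1.0 / (224 << 8), 1.0 / (224 << 8))


def two_step(yuv):
    """Separate passes: limited -> normalized full, then matrix multiply."""
    full = [(v - o) * s for v, o, s in zip(yuv, OFFSETS, SCALES)]
    return [sum(m * f for m, f in zip(row, full)) for row in M]


# Fold the scaling into the matrix once, up front.
FOLDED = [[m * s for m, s in zip(row, SCALES)] for row in M]
BIAS = [-sum(f * o for f, o in zip(row, OFFSETS)) for row in FOLDED]


def folded(yuv):
    """Single pass: one matrix multiply plus a constant bias per channel."""
    return [sum(f * v for f, v in zip(row, yuv)) + b for row, b in zip(FOLDED, BIAS)]


# Both routes agree up to floating-point rounding.
pixel = (30000, 20000, 45000)
print(two_step(pixel))
print(folded(pixel))
```

The same folding works in the other direction on output, which is why a separate fullrange scaling step has nothing left to do for YUV input.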

Post by **Guest 2** » Thu Aug 18, 2022 10:25 am

Rocky wrote: ↑
Thu Aug 18, 2022 8:18 am
when false we specify that the input is limited and the output should be limited

I don't know if it is possible, but what if the LUT produces limited from full, or full from limited? Perhaps it would be better to specify what we are giving as input and what we want as output?

Post by **Guest 2** » Thu Aug 18, 2022 10:27 am

Rocky wrote: ↑
Thu Aug 18, 2022 9:30 am
Ah, I forgot that my YUV <-> RGB conversions

And what about my error? Any idea?

Rocky wrote: ↑
Thu Aug 18, 2022 9:30 am
fullrange is inapplicable for YUV420P16 input

What would happen if DGDecodeNV outputs a 444 stream?