Port Cube
I can multitask. Nothing worthwhile is easy. Balti told me that.
Port Cube
Thrilled to help! I'll rub your shoulders, Rocky. There, that's better.
Port Cube
Thank you, Brit! I found the bug that was kicking my patootie. Should be plain sailing now.
EDIT: But that wasn't the only bug, see next post.
Port Cube
Ha, plain sailing. Famous last words. Two full days of agonizing over why my conversions were producing an over-saturated mess, even though I took the YUV<-->RGB conversions directly from DGHDRtoSDR (minus the depth reduction). Guys, I tried EVERYTHING. Well, except for one little thing that shouldn't have mattered.
So, I was going to try writing out the intermediate RGB to see if that was already messed up, or if it was a bug in the final RGB->YUV. Of course, to do that you cannot have an in-place filter (get src, read/write src, return src) because the received and returned formats would be different. Am I boring you? You need to do get src, read src, write dst, return dst. So I did that with dst = NewVideoFrameP() etc. but still I left in the final RGB->YUV and writing YUV just to test my dst handling. Whoa, suddenly the output was correct.
It shouldn't have made a difference unless Avisynth+ is doing something different with MakeWritable() etc. than I am used to for in-place filters. But I don't care about that, just that it is working.
That was all done in C code. Now I'll port it to the CUDA kernel. That should be plain sailing.
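For anyone following along, the two filter shapes described above can be sketched in plain C. This is only an illustration with a made-up single-plane Frame struct and made-up helpers (frame_new, halve_in_place, widen_to_new); it does not use the real AviSynth API, where frames carry several planes, a pitch, and are obtained through the environment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single-plane frame, for illustration only. */
typedef struct {
    int width, height;
    uint16_t *pix;
} Frame;

static Frame *frame_new(int width, int height)
{
    assert(width > 0 && height > 0);
    Frame *f = malloc(sizeof *f);
    if (!f) return NULL;
    f->pix = calloc((size_t)width * (size_t)height, sizeof *f->pix);
    if (!f->pix) { free(f); return NULL; }
    f->width = width;
    f->height = height;
    return f;
}

static void frame_free(Frame *f)
{
    if (f) { free(f->pix); free(f); }
}

/* In-place shape: get src, read/write src, return src.
   Only possible when the output layout equals the input layout. */
static Frame *halve_in_place(Frame *src)
{
    for (int i = 0; i < src->width * src->height; i++)
        src->pix[i] /= 2;
    return src;
}

/* Out-of-place shape: get src, read src, write dst, return dst.
   dst may have a different layout; here every sample is duplicated
   horizontally, so dst is twice as wide as src. */
static Frame *widen_to_new(const Frame *src)
{
    Frame *dst = frame_new(src->width * 2, src->height);
    if (!dst) return NULL;
    for (int y = 0; y < src->height; y++) {
        for (int x = 0; x < src->width; x++) {
            uint16_t v = src->pix[y * src->width + x];
            dst->pix[y * dst->width + 2 * x]     = v;
            dst->pix[y * dst->width + 2 * x + 1] = v;
        }
    }
    return dst;
}
```

The out-of-place version never touches src, which is what makes it usable when the returned format differs from the received one.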
Port Cube
You should know. Remember the accident?
Port Cube
Not every day I can say I told you so. Obvious! Nyuck.
Port Cube
Just wait til the lights go pfizz. It's over.
- Wonder Woman
- Posts: 58
- Joined: Sun Feb 07, 2021 10:46 am
Port Cube
I know you missed me but here I am ready to slay dragons.
Port Cube
Perhaps you already know this, but LUT cubes can work in RGB or Studio RGB, i.e. full range or limited (TV) range.
For PQ to HLG with full-range LUTs, such as the BBC ones, you would need:
#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
DGCube("7a_HLG_PQ1000_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
For PQ to HLG with limited-range LUTs, such as the Warner ones, you would need:
#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Narrow Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
DGCube("WarnerBros_PQToHLG_MaxCLL_2508.cube", fullrange=true)
#From RGB 16bit planar Narrow Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
I hope that the internal conversion you are releasing can take care of this difference.
PS: I have some doubts about the fullrange switch in the DGCube command in the second case.
Port Cube
Let's test what we have and I'll look into the range stuff in parallel.
Please re-download DGCube.zip to get support for direct YUV420P16 as received from
DGSource() for high-bit-depth sources. RGBP16 input is still supported, and so is Vapoursynth.
The DGCube text document was updated and includes relevant sample scripts.
Here is a very simple script:
DGSource("THE GREAT WALL.dgi")
DGCube("PQ_to_BT709_slope.cube", fullrange=false, interp="tetrahedral")
Your test results will be appreciated.
https://rationalqm.us/misc/DGCube.zip
DG said we might be able to afford dentures for me later this year. Gnawing acorns is a challenge with my little stubs.
Port Cube
That new_guy is really desperate. I tasted his blood one night. Sour!
Port Cube
In the examples you write:
loadplugin("...\dgdecodenv.dll")
loadplugin("...\dgcube.dll")
dgsource("THE GREAT WALL.dgi")
DGCube("PQ_to_BT709_slope.cube", fullrange=false, interp="tetrahedral")
What is the output format and range? Is a z_ConvertFormat call still needed afterwards to get the proper output color space?
Why did you set fullrange=false? What would happen with true?
As you can see I am mostly interested in PQ to HLG transformation.
Port Cube
Oh sure. Depends on at least three things. Is the LUT made for limited or full-range input/output? Is the source limited or full range? Should the output be limited or full range?
For the example, I assumed the LUT is made for full range. The source here I assume has limited range. So I set fullrange=false to expand it to full in DGCube. The output then would presumably be full range.
Not being au fait with the current practice for use of 3D LUTs, I would appeal to users to say what they need. There are 3 considerations, previously alluded to:
The source range.
The LUT assumed ranges.
The output range.
Everything is possible but what do we seek?
Maybe for DGCube, specifying the input range and the output range is enough. Intermediate processing would assume a full-range LUT.
Port Cube
1) Input, in my case, is PQ video from UHD, so we can assume it's limited range.
2) My LUT wants full range.
3) The output should be limited range.
So, how should I write the script, with that in mind?
My findings: the script
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DgCube\DGCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
ConvertBits(32)
BM3D_CUDA(sigma=3, radius=2)
BM3D_VAggregate(radius=2)
fmtc_bitdepth (bits=10,dmode=8)
neo_f3kdb(range=15, Y=65, Cb=40, Cr=40, grainY=0, grainC=0, sample_mode=2, blur_first=true, dynamic_grain=false, mt=false, keep_tv_range=true)
Prefetch(3)
gives this error:
YUV color family cannot have RGB matrix coefficients
Port Cube
The way it is designed to work is that if fullrange is false then the pixels are scaled up to fullrange on input and scaled back down to limited on output. In between the LUT is applied. So the fullrange parameter has the meaning "when false we specify that the input is limited and the output should be limited".
If this is not what is happening or is not what you need, then please advise.
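To make the intended behaviour concrete, here is a minimal C sketch of that round trip for 16-bit RGB code values. It assumes the usual 16-bit limited-range levels (black at 16*256 = 4096, white at 235*256 = 60160) and uses a hypothetical per-channel identity function in place of the real 3D LUT lookup, so it shows only the range handling, not DGCube's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 16-bit limited ("narrow") range levels. */
enum { LIM_BLACK = 4096, LIM_WHITE = 60160, FULL_MAX = 65535 };

/* Expand a limited-range code value to full range.
   Values below black or above white are clamped. */
static uint16_t limited_to_full(uint16_t v)
{
    if (v <= LIM_BLACK) return 0;
    if (v >= LIM_WHITE) return FULL_MAX;
    uint64_t den = (uint64_t)(LIM_WHITE - LIM_BLACK);
    uint64_t num = (uint64_t)(v - LIM_BLACK) * FULL_MAX;
    return (uint16_t)((num + den / 2) / den);   /* round to nearest */
}

/* Compress a full-range code value back to limited range. */
static uint16_t full_to_limited(uint16_t v)
{
    uint64_t num = (uint64_t)v * (uint64_t)(LIM_WHITE - LIM_BLACK);
    return (uint16_t)(LIM_BLACK + (num + FULL_MAX / 2) / FULL_MAX);
}

/* Hypothetical stand-in for the LUT: one channel in, one channel out. */
typedef uint16_t (*lut_fn)(uint16_t);

static uint16_t identity_lut(uint16_t v) { return v; }

/* fullrange != 0: feed the LUT directly.
   fullrange == 0: limited in, scale up, apply LUT, scale down, limited out. */
static uint16_t apply_lut(uint16_t v, int fullrange, lut_fn lut)
{
    assert(lut != NULL);
    if (fullrange) return lut(v);
    return full_to_limited(lut(limited_to_full(v)));
}
```

With the identity LUT, apply_lut(v, 0, identity_lut) returns any in-range limited value unchanged, and anything outside 4096..60160 comes back clamped to that interval.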
Port Cube
Ah, I forgot that my YUV <-> RGB conversions from DGHDRtoSDR already do limited -> full and back (the tonemapping is done in full RGB). I remember getting the coefficients for input by multiplying the YUV->RGB matrix by the limited-to-full matrix, which is more efficient than doing a separate scaling. Similarly on output. So, fullrange is inapplicable for YUV420P16 input, i.e., just leave it as fullrange=true. It would still be applicable to RGBP16 input. Gonna rethink the interface, e.g., error out when fullrange=false for YUV input? Or just silently ignore it?
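For the curious, folding the range scaling into the matrix can be sketched like this in C. It assumes BT.2020 non-constant-luminance weights (Kr = 0.2627, Kb = 0.0593) and normalized full-range float RGB output; the struct and function names are invented for the example and these are not the literal coefficients from DGHDRtoSDR.

```c
#include <assert.h>

/* BT.2020 non-constant-luminance luma weights. */
#define KR 0.2627
#define KB 0.0593
#define KG (1.0 - KR - KB)

typedef struct { double r, g, b; } RGB;

/* Two-step version: scale limited-range codes first, then apply the matrix. */
static RGB yuv_to_rgb_two_step(double y, double u, double v, int bits)
{
    assert(bits >= 8 && bits <= 16);
    double s  = (double)(1 << (bits - 8));
    double yn = (y - 16.0 * s) / (219.0 * s);    /* 0..1 */
    double un = (u - 128.0 * s) / (224.0 * s);   /* -0.5..0.5 */
    double vn = (v - 128.0 * s) / (224.0 * s);
    RGB out;
    out.r = yn + 2.0 * (1.0 - KR) * vn;
    out.g = yn - 2.0 * KB * (1.0 - KB) / KG * un
               - 2.0 * KR * (1.0 - KR) / KG * vn;
    out.b = yn + 2.0 * (1.0 - KB) * un;
    return out;
}

/* Folded version: range scaling is baked into gains and offsets once. */
typedef struct {
    double cy;                   /* luma gain shared by R, G, B */
    double rv, gu, gv, bu;       /* chroma gains */
    double off_r, off_g, off_b;  /* constant offsets */
} Folded;

static Folded make_folded(int bits)
{
    assert(bits >= 8 && bits <= 16);
    double s = (double)(1 << (bits - 8));
    Folded f;
    f.cy = 1.0 / (219.0 * s);
    f.rv = 2.0 * (1.0 - KR) / (224.0 * s);
    f.gu = -2.0 * KB * (1.0 - KB) / KG / (224.0 * s);
    f.gv = -2.0 * KR * (1.0 - KR) / KG / (224.0 * s);
    f.bu = 2.0 * (1.0 - KB) / (224.0 * s);
    f.off_r = -16.0 * s * f.cy - 128.0 * s * f.rv;
    f.off_g = -16.0 * s * f.cy - 128.0 * s * (f.gu + f.gv);
    f.off_b = -16.0 * s * f.cy - 128.0 * s * f.bu;
    return f;
}

/* Per pixel: multiply-adds only, no separate scaling pass. */
static RGB yuv_to_rgb_folded(const Folded *f, double y, double u, double v)
{
    RGB out;
    out.r = f->cy * y + f->rv * v + f->off_r;
    out.g = f->cy * y + f->gu * u + f->gv * v + f->off_g;
    out.b = f->cy * y + f->bu * u + f->off_b;
    return out;
}

/* Largest per-channel difference, handy for comparing the two versions. */
static double max_abs_diff(RGB a, RGB b)
{
    double d[3] = { a.r - b.r, a.g - b.g, a.b - b.b };
    double m = 0.0;
    for (int i = 0; i < 3; i++) {
        double x = d[i] < 0.0 ? -d[i] : d[i];
        if (x > m) m = x;
    }
    return m;
}
```

Both versions give the same RGB up to floating-point rounding. Because the limited-to-full step lives inside the coefficients, a separate range switch has nothing left to do on the YUV path.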
Port Cube
I don't know if it is possible, but what if the LUT produces limited from full, or full from limited? Perhaps it would be better to specify what we are giving as input and what we want as output?