Port Cube

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
tormento
Posts: 726
Joined: Mon Sep 20, 2010 2:18 pm

Post by tormento »

Rocky wrote:
Thu Aug 25, 2022 2:48 pm
Could be I'm confused, keeping an open mind.
You know, I am not a programmer, just a computer enthusiast.

AFAIK (https://developer.nvidia.com/blog/gpudirect-storage), the CPU can be freed from having to manage data transfers to the GPU. Perhaps it won't increase GPU processing speed but, for sure, it should help anything that is CPU related.

Or not? :D

Post by tormento »

Annnnd what about sending 2 to 4 frames to video memory, stitched together as a single transfer, having them processed, and moving them back as a single transaction again?

I was also thinking that one may need to apply more than one cube at a time (e.g., technical, artistic, technical again), and it would be a pity to transfer and convert the data more than once for the same frame(s).
Rocky
Posts: 2709
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

That's the whole point of our CUDASynth proposal.

Stitching requires a lot more GPU memory for minimal gain. It's not the overhead of memory transfer management that is the bottleneck. It's the raw data transfer time for large frames.
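
To put a rough number on the transfer-time point, here is a minimal back-of-the-envelope sketch. The 12 GB/s effective PCIe bandwidth is purely an assumption for illustration (real rates depend on the PCIe generation, lane count, and pinned vs. pageable memory), and the helper names are hypothetical. The frame size is the 3840x2064 RGBP16 case tested later in this thread, which comes to roughly 45 MiB per frame.

```python
# Rough PCIe cost of moving uncompressed frames between CPU and GPU.
# The bandwidth figure is an assumed effective rate, not a measurement.

def frame_bytes(width, height, planes=3, bytes_per_sample=2):
    """Size of one planar frame with full-resolution planes (e.g. RGBP16)."""
    return width * height * planes * bytes_per_sample

def transfer_ms(num_bytes, gbytes_per_s=12.0):
    """Milliseconds to move num_bytes at the assumed effective bandwidth."""
    return num_bytes / (gbytes_per_s * 1e9) * 1e3

size = frame_bytes(3840, 2064)           # UHD-class RGBP16 frame
one_way = transfer_ms(size)
print(f"{size / 2**20:.1f} MiB per frame, {one_way:.2f} ms one way")
print(f"upload + download: {2 * one_way:.2f} ms per filter")
```

Under this model the cost grows with the number of bytes moved, so batching several frames into one transfer leaves the total time about the same while needing more GPU memory.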

But I can certainly put CUDASynth into my filters, so that, for example, the following script has to transfer only one decoded frame over PCIe, i.e., the final frame going back to the CPU.

DGSource(appropriate CUDASynth params) # only compressed data goes up to GPU, decoded frame is left on GPU
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to GPU memory
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to GPU memory
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to CPU and returns it to Avisynth
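
As a hedged illustration of why the chain above saves bandwidth, the sketch below counts decoded-frame transfers under a simple model. The model is an assumption made for this example, not a description of the actual filters: without sharing, the source filter downloads its decoded frame once and each later GPU filter uploads and downloads one frame. The function name is hypothetical.

```python
# Count decoded-frame PCIe transfers for a chain of GPU filters.
# Model (an assumption for illustration): without frame sharing, the source
# downloads its decoded frame and every later filter uploads then downloads.
# With sharing, frames stay in GPU memory until the last filter downloads.

def decoded_frame_transfers(num_filters_after_source, share_on_gpu):
    if share_on_gpu:
        return 1                               # final frame back to the CPU
    return 1 + 2 * num_filters_after_source    # source download + up/down each

for shared in (False, True):
    n = decoded_frame_transfers(3, shared)     # DGSource followed by three Cube calls
    print(f"share_on_gpu={shared}: {n} decoded-frame transfers")
```

For the four-line script above, this model gives seven decoded-frame transfers without sharing and one with it.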

Any CUDASynth enabled filter could be used in the chain. Any version of Avisynth or Vapoursynth could be used. Really guys, what's not to like? :?

Remember, I tried to interest developers in this years ago but no-one was interested. I even gave them a sample filter showing how to do it. Zero interest. Fine, I'll just do it for all my filters, and when people see how fast things can be maybe they will give it a thought.

OK, back to coding the DGCube() changes we've agreed on.

Post by Rocky »

Avisynth+ CUDA stuff moved to a new thread. Please follow up there on that subject.

Also, I updated my last post above.

Post by Rocky »

Got the new scheme coded and tested for RGBP16. Moving on to YUV420P16. Probably won't be done until tomorrow.

Post by tormento »

Rocky wrote:
Sun Aug 28, 2022 12:36 pm
Got the new scheme coded and tested for RGBP16. Moving on to YUV420P16. Probably won't be done until tomorrow.
:salute:

Post by Rocky »

I'm thinking of not implementing full range input/output for YUV. You could still have a full or limited range LUT. For RGB, full range or limited would be allowed for input, lut, and output.

Will anyone pop an aneurysm if I do that? YUV from our disks, etc., is limited range.
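
For readers unfamiliar with the distinction, here is a minimal sketch of how limited-range and full-range code values map to normalized levels, using the usual narrow-range convention (16 to 235 for luma and 16 to 240 for chroma at 8 bits, scaled up for higher bit depths). It is only an illustration with hypothetical function names and is not taken from DGCube; full-range chroma is left out.

```python
# Limited ("TV") vs full ("PC") range code values for n-bit video,
# following the usual narrow-range convention (16-235 luma, 16-240 chroma at 8 bits).

def luma_limited_to_unit(code, bits=16):
    """Map a limited-range luma code value to nominal [0.0, 1.0]."""
    scale = 1 << (bits - 8)
    return (code - 16 * scale) / (219 * scale)

def luma_full_to_unit(code, bits=16):
    """Map a full-range luma code value to [0.0, 1.0]."""
    return code / ((1 << bits) - 1)

def chroma_limited_to_unit(code, bits=16):
    """Map a limited-range chroma code value to nominal [-0.5, 0.5]."""
    scale = 1 << (bits - 8)
    return (code - 128 * scale) / (224 * scale)

print(luma_limited_to_unit(16 << 8), luma_limited_to_unit(235 << 8))  # black, white
print(luma_full_to_unit(0), luma_full_to_unit(65535))
print(chroma_limited_to_unit(16 << 8), chroma_limited_to_unit(240 << 8))
```

The same code value therefore means a different level depending on the declared range, so a filter has to know the range of its input, of the LUT, and of its output.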

Post by tormento »

Rocky wrote:
Mon Aug 29, 2022 12:29 pm
Will anyone pop an aneurysm if I do that? YUV from our disks, etc., is limited range.
YUV from "usual disks" is limited TV range but YUV from some cameras (even pro ones like Arri) can be either limited or full.

While you have your hands in it, do it the right way or, at least, plan it for a future release.

Post by Rocky »

do it the right way
t, this stuff is hard. Lots of equations rigorously derived from specs. That's why there is no existing solution.

I'll plan for it and give an interim release without it. We need some samples for testing. Got any?

Post by tormento »

Rocky wrote:
Mon Aug 29, 2022 1:01 pm
Got any?
Sure, let me see if I have some Sony Slog3 Full PC Range or similar. Perhaps tomorrow morning (CET).

Post by Rocky »

tormento wrote:
Mon Aug 29, 2022 1:09 pm
Sony Slog3
OK.

Post by tormento »

Rocky wrote:
Mon Aug 29, 2022 1:44 pm
OK.
I have found a Canon Log3 file, YUV 4:2:2 full range:

https://krakenfiles.com/view/3hCxREqNdC/file.html

Sorry for the elephant size, but I don't really know how to split it without damaging it.

You need to index with LWLibavVideoSource, as nVidia doesn't support 4:2:2 (yet).

Here is the cube; note that it converts Log3 full to 709 full:

https://krakenfiles.com/view/xSAX60AqLZ/file.html

You can find more LUTs here https://tools.rodrigopolo.com/canonluts/

And a tool to play with them here https://cameramanben.github.io/LUTCalc/ ... index.html

Post by Rocky »

That's the best you've got? OK, I'll try to work with it. :D

Meanwhile, here is something to play with. Re-download DGCube.zip. It implements the new interface and should be good for everything except for full range YUV, which I am still working on. Refer to the document for details. Changes were extensive so your thorough testing will be greatly appreciated and amply rewarded.

Post by tormento »

Script error: the named argument "in" to DGCube had the wrong type.

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in=1, lut=0, out=1, interp="tetrahedral")

Why oh why.

P.S: It seems that we need to use "full" or "lim" (much better than 0 and 1). Please update txt. :D
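
The interp="tetrahedral" argument in the call above names a standard 3D LUT interpolation scheme. The sketch below is a generic pure-Python version of that technique, assuming a LUT stored as a nested list indexed [r][g][b]. It is not DGCube's code, and the function names are hypothetical.

```python
# Tetrahedral interpolation in a 3D LUT (pure Python, illustrative only).
# lut[i][j][k] holds an (r, g, b) tuple for grid point (i, j, k); n is the grid size.

def make_identity_lut(n):
    """Build an n x n x n identity LUT (output equals input)."""
    step = 1.0 / (n - 1)
    return [[[(i * step, j * step, k * step) for k in range(n)]
             for j in range(n)] for i in range(n)]

def tetrahedral(lut, n, rgb):
    """Interpolate rgb (components in [0, 1]) through the LUT."""
    base, frac = [], []
    for v in rgb:
        x = min(max(v, 0.0), 1.0) * (n - 1)
        i = min(int(x), n - 2)          # lower grid index, keeps i + 1 in range
        base.append(i)
        frac.append(x - i)
    # Visit axes from largest to smallest fraction; this picks the tetrahedron.
    order = sorted(range(3), key=lambda a: frac[a], reverse=True)
    weights = [1.0 - frac[order[0]],
               frac[order[0]] - frac[order[1]],
               frac[order[1]] - frac[order[2]],
               frac[order[2]]]
    corner = list(base)
    out = [0.0, 0.0, 0.0]
    for step_index, w in enumerate(weights):
        if step_index > 0:
            corner[order[step_index - 1]] += 1   # move one cell along the next axis
        sample = lut[corner[0]][corner[1]][corner[2]]
        for c in range(3):
            out[c] += w * sample[c]
    return tuple(out)

lut = make_identity_lut(17)
print(tetrahedral(lut, 17, (0.25, 0.6, 0.9)))    # close to the input for an identity LUT
```

Only four of the eight surrounding grid points are read per pixel, compared with all eight for trilinear interpolation.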

Post by tormento »

Number of frames: 1792
Length (hh:mm:ss.ms): 00:01:14.741
Frame width: 3840
Frame height: 2064
Framerate: 23.976 (24000/1001)
Colorspace: YUV420P10

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")


Frames processed: 1792 (0 - 1791)
FPS (min | max | average): 1.389 | 6.633 | 5.533
Process memory usage (max): 614 MiB
Thread count: 14
CPU usage (average): 8.4%

GPU usage (average): 16%
VPU usage (average): 10%
GPU memory usage: 1886 MiB
GPU Power Consumption (average): 42.5 W

Time (elapsed): 00:05:23.864

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="lim", lut="full", out="lim")

Frames processed: 1792 (0 - 1791)
FPS (min | max | average): 2.799 | 57.32 | 27.42
Process memory usage (max): 543 MiB
Thread count: 12
CPU usage (average): 8.4%

GPU usage (average): 56%
VPU usage (average): 42%
GPU memory usage: 1826 MiB
GPU Power Consumption (average): 54.5 W

Time (elapsed): 00:01:05.359
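
As a quick sanity check on the two runs above, the following sketch recomputes the average frame rates from the frame count and elapsed times, then derives the speed-up. All numbers are copied from this post; the helper name is hypothetical.

```python
# Cross-check of the two benchmark runs quoted above (numbers copied from the post).

def seconds(hms):
    """Convert 'hh:mm:ss.ms' to seconds."""
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

frames = 1792
external = seconds("00:05:23.864")   # z_ConvertFormat + DGCube full/full/full
internal = seconds("00:01:05.359")   # DGCube lim/full/lim on YUV directly

print(f"external: {frames / external:.3f} fps")
print(f"internal: {frames / internal:.3f} fps")
print(f"speed-up: {external / internal:.2f}x")
```

The recomputed averages agree with the reported FPS figures, and the internal path comes out about five times faster on this clip.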

Post by Rocky »

Thank you for testing. Looks good so far.

Congrats on guessing new syntax. Never underestimate the t!

Text document was updated.

Post by tormento »

Rocky wrote:
Wed Aug 31, 2022 11:09 am
Thank you for testing.
I see tiny discrepancies between the product of external (identical to AVSCube) and of internal processing (look at graphs, mostly).

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="lim", lut="full", out="lim")

Image

In limited LUT, the result on internal processing is totally wrong:

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")

Image

Post by Rocky »

How do you get those plots?

Post by tormento »

Rocky wrote:
Wed Aug 31, 2022 1:33 pm
How do you get those plots?
VideoTek AVSI by FranceBB

Post by Rocky »

Link please.

Post by tormento »

Rocky wrote:
Wed Aug 31, 2022 2:11 pm
Link please.
Literally first link on Google

https://github.com/FranceBB/VideoTek

Post by tormento »

Rocky wrote:
Wed Aug 31, 2022 11:09 am
Text document was updated.
Please check the Avisynth and Vapoursynth Examples sections, as they are still using the old parameters.

Post by Rocky »

Text document was updated.

Post by Rocky »

tormento wrote:
Wed Aug 31, 2022 12:18 pm
I see tiny discrepancies between the product of external (identical to AVSCube) and of internal processing (look at graphs, mostly).
Let's look at that after we fix your big issue.
In limited LUT, the result on internal processing is totally wrong
Confirmed. I have it fixed for CPU code. Just have to port it to GPU and do some regression testing, and then I can give you a new test version.

Post by Rocky »

Please re-download and test. Should be working for lim-lim-lim YUV. If you confirm it is working we can look at the tiny discrepancies.
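
For anyone who wants to put numbers on the tiny discrepancies rather than eyeball the graphs, here is one possible sketch. It assumes the two renderings have already been dumped to flat lists of integer samples; the function name and the sample values are made up for illustration.

```python
# Quantify "tiny discrepancies" between two renderings of the same frame.
# Planes are flat lists of integer sample values; the data below is made up.

def compare_planes(a, b, tolerance=0):
    """Return (max absolute difference, number of samples differing by more than tolerance)."""
    if len(a) != len(b):
        raise ValueError("planes must have the same number of samples")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return max(diffs, default=0), sum(d > tolerance for d in diffs)

external = [4096, 20000, 41234, 60160]   # hypothetical 16-bit samples
internal = [4096, 20001, 41230, 60160]

worst, count = compare_planes(external, internal, tolerance=1)
print(f"max abs diff = {worst}, samples beyond tolerance = {count}")
```

A small tolerance is useful here because the external path adds rounding and dithering steps, so bit-exact agreement is not necessarily expected.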