Port Cube

Post by **Sherman** » Tue Jun 20, 2023 12:22 pm

The malcontents should learn to code. It's so easy!

Post by **Curly** » Tue Jun 20, 2023 12:41 pm

Easy 4 u 2 say.

Post by **Baltasar** » Tue Jun 20, 2023 3:43 pm

This worked for me.

Post by **Rocky** » Wed Jun 21, 2023 3:46 pm

There are ample resources for learning CUDA. We published sample source code for a CUDA filter, and documented our full dialog with nVidia during the development of DGDecNV. Everything you need is in the nVidia SDKs, API documentation, and developer forum. Focus, persistence, and attention to detail!

The zimg/avsresize authors...maybe they'd be willing to work with us to port their stuff to CUDA. I'd be more than happy to help. I could add some CUDASynth magic to eliminate extra PCIe transfers, etc. Their filters can remain standalone and fully in their control. We'll help for free. That seems like the most likely path to reach the promised land.

Post by **Rocky** » Sun Jul 02, 2023 7:55 am

Guest 2 wrote: ↑
Wed Aug 10, 2022 10:08 am
P.S: Will you add tetrahedral to AVSCube too?

sekrit-twc has added tetrahedral to timecube, so after it is ported to AVS we can ditch DGCube and stop all the associated nonsense.

Post by **Guest 2** » Sun Jul 02, 2023 4:32 pm

Rocky wrote: ↑
Sun Jul 02, 2023 7:55 am
Then we can ditch DGCube and stop all the associated nonsense.

Not everybody has a 56-core CPU. Some of us still rely on the GPU (me).

Post by **Rocky** » Sun Jul 02, 2023 4:34 pm

Oh OK. We'll keep it then.

Post by **Guest 2** » Tue Jul 04, 2023 3:52 am

Rocky wrote: ↑
Sun Jul 02, 2023 4:34 pm
Oh OK. We'll keep it then.

https://github.com/rigaya/NVEnc/blob/ma ... ram2value2

lut3d=<string>
Apply a 3D LUT to an input video. Currently supports .cube file only.

lut3d_interp=<string>
nearest, trilinear, tetrahedral, pyramid, prism

I think this could ease my pain.

I could use NVEnc to encode to a lossless intermediate and then proceed with x265.

Do you think it is now feasible to use the source code to implement it in DGCube?
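
For readers unfamiliar with the `tetrahedral` option, here is a minimal Python sketch of tetrahedral interpolation in a 3D LUT. It is not taken from NVEnc, timecube, or DGCube. The function names (`identity_lut`, `tetrahedral_lookup`) and the nested-list layout `lut[r][g][b]` are assumptions made for illustration; a real .cube reader would have to map the file's entry order onto that layout.

```python
def identity_lut(size):
    """Build a size^3 identity LUT as nested lists indexed lut[r][g][b]."""
    step = 1.0 / (size - 1)
    return [[[(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def tetrahedral_lookup(lut, size, r, g, b):
    """Interpolate an RGB triple in [0, 1] through a 3D LUT (size >= 2)."""
    def split(v):
        # Clamp, scale to grid units, and split into cell index + fraction.
        x = min(max(v, 0.0), 1.0) * (size - 1)
        i = min(int(x), size - 2)
        return i, x - i

    (ri, fr), (gi, fg), (bi, fb) = split(r), split(g), split(b)

    # Sorting the fractions (largest first) selects one of the six
    # tetrahedra that partition the cell. We then walk from corner 000
    # to corner 111, stepping along the axes in that order.
    order = sorted([(fr, 0), (fg, 1), (fb, 2)], reverse=True)
    corner = [0, 0, 0]
    out = [0.0, 0.0, 0.0]
    prev = 1.0
    for frac, axis in order:
        weight = prev - frac
        entry = lut[ri + corner[0]][gi + corner[1]][bi + corner[2]]
        for k in range(3):
            out[k] += weight * entry[k]
        corner[axis] = 1
        prev = frac
    # Corner 111 receives the smallest fraction as its weight.
    entry = lut[ri + 1][gi + 1][bi + 1]
    for k in range(3):
        out[k] += prev * entry[k]
    return tuple(out)


if __name__ == "__main__":
    lut = identity_lut(5)
    print(tetrahedral_lookup(lut, 5, 0.3, 0.6, 0.9))
```

Only four LUT entries are read per pixel, against eight for trilinear, and the four weights always sum to 1. That is the usual argument for tetrahedral on color LUTs.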

Post by **Rocky** » Tue Jul 04, 2023 7:42 am

Guest 2 wrote: ↑
Tue Jul 04, 2023 3:52 am
Do you think it is now feasible to use the source code to implement it in DGCube?

Implement what? Please be specific and precise.

And what is your pain exactly? You cannot implement some desired processing? Or you can but it doesn't run as fast as you'd like?

Post by **Guest 2** » Tue Jul 04, 2023 12:55 pm

Rocky wrote: ↑
Tue Jul 04, 2023 7:42 am
Implement what? Please be specific and precise.

Nothing, Rocky. I think I won't bug you again about DGCube.

I have eased my pains by applying the LUT with NVEnc to a lossless HEVC intermediate and then encoding that with standard x265. Easy peasy and unbelievably fast.

Thanks again for having implemented HEVC 4:4:4 decoding.

Post by **Rocky** » Tue Jul 04, 2023 1:28 pm

Guest 2 wrote: ↑
Tue Jul 04, 2023 12:55 pm
I have eased my pains...

Won't you have the grace to explain what your pains are, after I explicitly asked and after all I've done for you over the years?

Post by **Guest 2** » Tue Jul 04, 2023 2:05 pm

Rocky wrote: ↑
Tue Jul 04, 2023 1:28 pm
Won't you have the grace to explain what your pains are after I explicitly asked and after all I've done for you over the years.

As I told many times, my CPU is too old and slow to comfortably apply zimg conversion and have a correct PQ to HLG transformation, using DGCube.

I have squeezed my brain to find a workaround and I am testing NVEnc to do the whole job except the final encode.

It does everything in HW, fast and clean, tetrahedral included.

1080p SDR to HLG 160.65 fps
2160p PQ to HLG 42.90 fps

The only issue is storage requirements but I can cope with that.

Post by **Rocky** » Tue Jul 04, 2023 2:14 pm

It's not just storage, it's the time to write out massive lossless streams and then read them again. And having to encode twice. That's not fast and it's certainly not clean. Nevertheless I'm happy you have what you consider to be an adequate workaround for your pains.

"As I told many times"

That feels rude and unfriendly. And who knows what "comfortable" means for you?

"my CPU is too old and slow"

When you upgrade your HW for HEVC lossless, think too about upgrading your CPU.

Post by **Rocky** » Fri Jul 07, 2023 10:05 am

Hehe, I have successfully ported sekrit-twc's latest vscube to AVS+. I still have to fix up some loose ends, but it's running fine with tetrahedral and all CPU modes. It took just less than 3 hours to port.

Post by **Rocky** » Sat Jul 08, 2023 12:36 pm

Here is a test release of the AVS+ support for sekrit-twc's latest vscube. Refer to the user manual for syntax and examples. Your test results will be greatly appreciated. My testing shows the AVS+ version with prefetch(6) to be faster than the VapourSynth version.

https://rationalqm.us/cube/AVSCube_test.rar

Say thank you.

Post by **Sherman** » Sat Jul 08, 2023 12:59 pm

Rocky wrote: ↑
Fri Jul 07, 2023 10:05 am
It took just less than 3 hours to port.

You're slipping, Rocky. Do you need to get some young blood involved?

Post by **Natasha** » Sat Jul 08, 2023 1:00 pm

Sherman wrote: ↑
Sat Jul 08, 2023 12:59 pm
young blood

The best kind!

Post by **Rocky** » Tue Jul 11, 2023 10:05 am

The timecube+AVS support test build was relocated:

https://rationalqm.us/cube/

All the cube stuff is now together in directory cube.

Post by **Rocky** » Wed Jul 12, 2023 5:05 am

Here is an updated test build for AVS+ support for sekrit-twc's vscube. It includes sekrit-twc's bug fix for the AVX2 support of tetrahedral mode.

https://rationalqm.us/cube/AVSCube_test.rar

Post by **Rocky** » Fri Apr 05, 2024 4:56 pm

Guest 2 wrote: ↑
Wed Aug 31, 2022 12:18 pm
I see tiny discrepancies between the product of external (identical to AVSCube) and of internal processing (look at graphs, mostly).

Well guys, I finally wrapped my rodent brain around this stuff and discovered the reasons for this. My matrices are off. I did research and now fully understand how to generate the matrices for any combination of:

8 vs. 16 bits
limited vs. full range for input and output
601 vs. 709 vs. 2020 space
constant vs. non-constant luminance

Working through that for DGHDRtoSDR(), I saw that the equations I was using (can't even remember where I got them) were off by enough to account for discrepancies.

So I will fix that and, more importantly, I will fix DGCube's internal conversions and properly extend them to support all needed conversions. This will eliminate the need for external conversions using zimg, greatly improving performance.
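
To make the matrix question concrete, here is a minimal Python sketch that derives the YUV<->RGB conversion from the standard luma coefficients (Kr, Kb) for 601, 709, and 2020, with the limited/full range scaling at any bit depth. It covers non-constant luminance only and is not DGCube's or DGHDRtoSDR()'s actual code. The names `LUMA_COEFFS`, `quant_params`, `rgb_to_yuv`, and `yuv_to_rgb` are hypothetical, and values are left unrounded and unclipped so the round trip can be checked.

```python
# (Kr, Kb) luma coefficients for non-constant-luminance Y'CbCr.
LUMA_COEFFS = {
    "601": (0.299, 0.114),
    "709": (0.2126, 0.0722),
    "2020": (0.2627, 0.0593),
}


def quant_params(bits, full_range):
    """Return (luma_scale, luma_offset, chroma_scale, chroma_offset)."""
    if full_range:
        peak = (1 << bits) - 1
        return peak, 0, peak, 1 << (bits - 1)
    step = 1 << (bits - 8)  # limited range scales the 8-bit levels
    return 219 * step, 16 * step, 224 * step, 128 * step


def rgb_to_yuv(r, g, b, space="709", bits=8, full_range=False):
    """Normalized R'G'B' in [0, 1] -> Y'CbCr code values (not rounded)."""
    kr, kb = LUMA_COEFFS[space]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    pb = (b - y) / (2.0 * (1.0 - kb))  # in [-0.5, 0.5]
    pr = (r - y) / (2.0 * (1.0 - kr))  # in [-0.5, 0.5]
    ys, yo, cs, co = quant_params(bits, full_range)
    return y * ys + yo, pb * cs + co, pr * cs + co


def yuv_to_rgb(y, u, v, space="709", bits=8, full_range=False):
    """Y'CbCr code values -> normalized R'G'B' (not clipped)."""
    kr, kb = LUMA_COEFFS[space]
    kg = 1.0 - kr - kb
    ys, yo, cs, co = quant_params(bits, full_range)
    yn = (y - yo) / ys
    pb = (u - co) / cs
    pr = (v - co) / cs
    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / kg
    return r, g, b


if __name__ == "__main__":
    print(yuv_to_rgb(235, 128, 128))              # 8-bit limited 709 white
    print(rgb_to_yuv(1.0, 1.0, 1.0, "2020", 10))  # 10-bit limited 2020 white
```

Deriving everything from (Kr, Kb) rather than hard-coding 3x3 tables makes it harder for a single coefficient to drift, which is the kind of error described above.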

Post by **Rocky** » Thu Apr 11, 2024 1:23 pm

Well guys, my optimism was premature. I was under the impression that all gamma-related stuff would be implemented in the LUT. But looking at the script (the one that revealed discrepancies) shows that the specified gamma inverse is being applied to create linear RGB to be passed to the script. So it is not enough for me to fix the coefficients in the YUV->RGB->YUV conversions. I also have to implement all the gamma stuff. And who knows, maybe also primaries stuff. So it's back to having to recreate the whole of z_ConvertFormat() if we are to have everything on the GPU. I'm not going to do that as it is a massive undertaking with zero benefit for me.

If you are wondering about DGHDRtoSDR(), everything is fine as it does the needed gamma processing for PQ/HLG->709.
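
To show what the "gamma stuff" amounts to, here is a minimal Python sketch of two of the transfer functions involved: the PQ EOTF and its inverse (SMPTE ST 2084) and the HLG OETF (BT.2100). It is not the code from z_ConvertFormat() or DGHDRtoSDR(), and the function names are illustrative. A complete PQ to HLG conversion needs further steps (OOTF handling, luminance scaling, possibly primaries) that are not shown here.

```python
import math

# SMPTE ST 2084 (PQ) constants.
M1 = 2610.0 / 16384.0
M2 = 2523.0 / 4096.0 * 128.0
C1 = 3424.0 / 4096.0
C2 = 2413.0 / 4096.0 * 32.0
C3 = 2392.0 / 4096.0 * 32.0

# BT.2100 HLG constants.
HLG_A = 0.17883277
HLG_B = 1.0 - 4.0 * HLG_A
HLG_C = 0.5 - HLG_A * math.log(4.0 * HLG_A)


def pq_eotf(signal):
    """PQ signal in [0, 1] -> linear light, normalized so 1.0 = 10000 nits."""
    p = signal ** (1.0 / M2)
    return (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def pq_inverse_eotf(linear):
    """Linear light (1.0 = 10000 nits) -> PQ signal in [0, 1]."""
    p = linear ** M1
    return ((C1 + C2 * p) / (1.0 + C3 * p)) ** M2


def hlg_oetf(scene):
    """Scene-linear light in [0, 1] -> HLG signal in [0, 1]."""
    if scene <= 1.0 / 12.0:
        return math.sqrt(3.0 * scene)
    return HLG_A * math.log(12.0 * scene - HLG_B) + HLG_C


if __name__ == "__main__":
    for code in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(code, pq_eotf(code) * 10000.0, "nits")
```

These are per-component curves applied to R, G, and B, so they fit naturally in a per-pixel GPU kernel. The work described above lies in covering every transfer/primaries combination, not in any single curve.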

Post by **hydra3333** » Sat Apr 13, 2024 9:06 am

OK and thanks for looking into it.

At a guess, I suppose it also means no GPU HDRAGC? Or even a hybrid?

Post by **Rocky** » Sat Apr 13, 2024 9:54 am

No m8 it has no relevance for HDRAGC and curves-type stuff. I am still developing my own curves filter.

Post by **hydra3333** » Sun Apr 14, 2024 3:38 am

Beaut, thanks bloke.