Port Cube
The malcontents should learn to code. It's so easy!
Port Cube
There are ample resources for learning CUDA. We published sample source code for a CUDA filter, and documented our full dialog with nVidia during the development of DGDecNV. Everything you need is in the nVidia SDKs, API documentation, and developer forum. Focus, persistence, and attention to detail!
The zimg/avsresize authors...maybe they'd be willing to work with us to port their stuff to CUDA. I'd be more than happy to help. I could add some CUDASynth magic to eliminate extra PCIe transfers, etc. Their filters can remain standalone and fully in their control. We'll help for free. That seems like the most likely path to reach the promised land.
Port Cube
https://github.com/rigaya/NVEnc/blob/ma ... ram2value2
lut3d=<string>
Apply a 3D LUT to an input video. Currently supports .cube file only.
lut3d_interp=<string>
nearest, trilinear, tetrahedral, pyramid, prism
I think this could ease my pain.
I could use NVEnc to encode to a lossless intermediate and then proceed with x265.
Do you think it is now feasible to use that source code to implement it in DGCube?
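The lut3d_interp modes listed above differ only in how an output color is blended from the LUT grid points surrounding the input. For illustration, here is a minimal Python sketch (not NVEnc's or DGCube's code) that reads a bare-bones .cube file and applies tetrahedral interpolation, which splits each grid cell into six tetrahedra and blends four corners instead of trilinear's eight. The helper names parse_cube, tetrahedral and identity_cube are hypothetical, and the sketch assumes the default 0..1 domain with no 1D LUT section.

```python
def parse_cube(text):
    """Parse a minimal 3D .cube LUT into (size, lut), with lut[b][g][r] -> (r, g, b).

    Keyword lines other than LUT_3D_SIZE (TITLE, DOMAIN_MIN, DOMAIN_MAX) are
    ignored, so a non-default domain is not honored. Files that also carry a
    1D LUT section are not supported by this sketch.
    """
    size = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("LUT_3D_SIZE"):
            size = int(line.split()[1])
        elif line[0].isdigit() or line[0] in "-.":
            rows.append(tuple(float(v) for v in line.split()[:3]))
    if size is None or len(rows) != size ** 3:
        raise ValueError("not a valid 3D .cube table")
    # In .cube files red varies fastest, then green, then blue.
    return size, [[[rows[r + size * (g + size * b)] for r in range(size)]
                   for g in range(size)] for b in range(size)]


def tetrahedral(size, lut, rgb):
    """Tetrahedral interpolation of an (r, g, b) triple in [0, 1]."""
    n = size - 1
    base, frac = [], []
    for v in rgb:
        x = min(max(v, 0.0), 1.0) * n   # clamp, then scale to grid units
        i = min(int(x), n - 1)          # lower corner, keeping i + 1 in range
        base.append(i)
        frac.append(x - i)
    # Which of the six tetrahedra holds the point depends only on the ordering
    # of the three fractions. Walk from the lower corner to the opposite corner,
    # stepping one axis at a time in decreasing-fraction order.
    order = sorted(range(3), key=lambda a: frac[a], reverse=True)
    f = [frac[a] for a in order]
    weights = (1.0 - f[0], f[0] - f[1], f[1] - f[2], f[2])
    corner = list(base)
    out = [0.0, 0.0, 0.0]
    for step, w in enumerate(weights):
        if step:
            corner[order[step - 1]] += 1
        r, g, b = corner
        value = lut[b][g][r]
        for ch in range(3):
            out[ch] += w * value[ch]
    return tuple(out)


def identity_cube(size):
    """Build .cube text for an identity LUT (hypothetical helper for testing)."""
    n = size - 1
    lines = ['TITLE "identity"', f"LUT_3D_SIZE {size}"]
    for b in range(size):
        for g in range(size):
            for r in range(size):
                lines.append(f"{r / n:.6f} {g / n:.6f} {b / n:.6f}")
    return "\n".join(lines)


size, lut = parse_cube(identity_cube(17))
print(tetrahedral(size, lut, (0.3, 0.7, 0.1)))
```

Because the blend is barycentric within a single tetrahedron, an identity LUT reproduces its input up to float rounding; a real conversion LUT is nonlinear between grid points, which is where the choice of interpolation mode starts to matter.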

Port Cube
Nothing, Rocky. I think I won't bug you again about DGCube.
I have eased my pain by applying the LUT with NVEnc to an intermediate lossless HEVC stream and then encoding that with standard x265. Easy peasy and unbelievably fast.
Thanks again for implementing HEVC 4:4:4 decoding.

Port Cube
As I told many times, my CPU is too old and slow to comfortably apply the zimg conversion and get a correct PQ to HLG transformation using DGCube.
I have racked my brain to find a workaround, and I am testing NVEnc to do all the work except the final encode.
It does everything in HW, fast and clean, tetrahedral included.
1080p SDR to HLG 160.65 fps
2160p PQ to HLG 42.90 fps
The only issue is storage requirements but I can cope with that.
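To put the NVEnc numbers above in perspective, a few lines of arithmetic convert them into pixel rates and into the data rate an uncompressed intermediate would need at those speeds. This is a rough Python sketch; the 4:4:4 layout with 16-bit samples is an assumption (the actual intermediate format is not stated), and a lossless HEVC stream will be smaller than these uncompressed figures by a content-dependent amount.

```python
# Throughput figures quoted above for the NVEnc LUT pass.
RESULTS = {
    "1080p SDR to HLG": (1920, 1080, 160.65),
    "2160p PQ to HLG": (3840, 2160, 42.90),
}


def pixel_rate(width, height, fps):
    """Pixels processed per second."""
    return width * height * fps


def raw_bytes_per_frame(width, height, planes=3, bytes_per_sample=2):
    """Uncompressed size of one frame; 4:4:4 with 16-bit samples is assumed."""
    return width * height * planes * bytes_per_sample


for name, (w, h, fps) in RESULTS.items():
    mpix = pixel_rate(w, h, fps) / 1e6
    raw_gb_s = raw_bytes_per_frame(w, h) * fps / 1e9
    print(f"{name}: {mpix:.1f} Mpixel/s, {raw_gb_s:.2f} GB/s if written uncompressed")
```

Both runs work out to roughly 330 to 360 million pixels per second, which suggests (without proving it) that the GPU pass scales with pixel count. The uncompressed figures also show why writing and re-reading the intermediate is a real cost even when it is losslessly compressed.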
Port Cube
It's not just storage, it's the time to write out massive lossless streams and then read them again. And having to encode twice. That's not fast and it's certainly not clean. Nevertheless I'm happy you have what you consider to be an adequate workaround for your pains.
Re "As I told many times": That feels rude and unfriendly. And who knows what "comfortable" means for you?
Re "my CPU is too old and slow": When you upgrade your HW for HEVC lossless, also think about upgrading your CPU.
Port Cube
Hehe, I have successfully ported sekrit-twc's latest vscube to AVS+. I still have to fix up some loose ends, but it's running fine with tetrahedral and all CPU modes. It took just under 3 hours to port.
Port Cube
Here is a test release of the AVS+ support for sekrit-twc's latest vscube. Refer to the user manual for syntax and examples. Your test results will be greatly appreciated. My testing shows the AVS+ version with prefetch(6) to be faster than the Vapoursynth version.
https://rationalqm.us/cube/AVSCube_test.rar
Say thank you.
Port Cube
The timecube+AVS support test build was relocated:
https://rationalqm.us/cube/
All the cube stuff is now together in the cube directory.
Port Cube
Here is an updated test build for AVS+ support for sekrit-twc's vscube. It includes sekrit-twc's bug fix for the AVX2 support of tetrahedral mode.
https://rationalqm.us/cube/AVSCube_test.rar
Port Cube
Well guys, I finally wrapped my rodent brain around this stuff and discovered the reasons for this. My matrices are off. I did research and now fully understand how to generate the matrices for any combination of:
8 vs. 16 bits
limited vs full range for input and output
601 vs. 709 vs. 2020 space
constant vs. non-constant luminance
Working through that for DGHDRtoSDR(), I saw that the equations I was using (can't even remember where I got them) were off by enough to account for the discrepancies.
So I will fix that and, more importantly, I will fix DGCube's internal conversions and properly extend them to support all needed conversions. This will eliminate the need for external conversions using zimg, greatly improving performance.
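As a sketch of how such matrices can be generated from the standard luma coefficients (Kr, Kb) plus the range and bit-depth parameters in the list above, here is a small Python example for the non-constant luminance case. It is not DGDecNV's or DGCube's code; the function names are hypothetical, and constant luminance (BT.2020 CL) is deliberately left out because it is not a plain 3x3 matrix.

```python
# Luma coefficients (Kr, Kb) for the three spaces in the list above.
KR_KB = {"601": (0.299, 0.114), "709": (0.2126, 0.0722), "2020": (0.2627, 0.0593)}


def rgb_to_ycbcr_matrix(space):
    """3x3 matrix: non-linear R'G'B' in [0, 1] -> Y' in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    kr, kb = KR_KB[space]
    kg = 1.0 - kr - kb
    return [
        [kr, kg, kb],
        [-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5],
        [0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr))],
    ]


def ycbcr_to_rgb_matrix(space):
    """Analytic inverse of rgb_to_ycbcr_matrix (non-constant luminance only)."""
    kr, kb = KR_KB[space]
    kg = 1.0 - kr - kb
    return [
        [1.0, 0.0, 2 * (1 - kr)],
        [1.0, -2 * kb * (1 - kb) / kg, -2 * kr * (1 - kr) / kg],
        [1.0, 2 * (1 - kb), 0.0],
    ]


def range_params(bits, full):
    """(y_offset, y_scale, c_offset, c_scale) for a given bit depth and range."""
    if full:
        peak = float((1 << bits) - 1)
        return 0.0, peak, float(1 << (bits - 1)), peak
    k = float(1 << (bits - 8))  # limited-range levels scale with bit depth
    return 16 * k, 219 * k, 128 * k, 224 * k


def encode_pixel(r, g, b, space="709", bits=8, full=False):
    """Normalized R'G'B' -> unrounded Y'CbCr code values."""
    yo, ys, co, cs = range_params(bits, full)
    y, cb, cr = (m[0] * r + m[1] * g + m[2] * b for m in rgb_to_ycbcr_matrix(space))
    return yo + ys * y, co + cs * cb, co + cs * cr


def decode_pixel(y, cb, cr, space="709", bits=8, full=False):
    """Y'CbCr code values -> normalized non-linear R'G'B' (no clipping)."""
    yo, ys, co, cs = range_params(bits, full)
    yn, cbn, crn = (y - yo) / ys, (cb - co) / cs, (cr - co) / cs
    return tuple(m[0] * yn + m[1] * cbn + m[2] * crn for m in ycbcr_to_rgb_matrix(space))


print(decode_pixel(235, 128, 128))  # limited-range 8-bit 709 white
```

With these, limited-range 8-bit white (235, 128, 128) decodes to R'G'B' = (1.0, 1.0, 1.0), and with bits=16 the same white sits at 60160. The results are still non-linear R'G'B'; linearizing them and converting primaries are separate steps.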
Port Cube
Well guys, my optimism was premature. I was under the impression that all gamma-related stuff would be implemented in the LUT. But looking at the script (the one that revealed discrepancies) shows that the specified gamma inverse is being applied to create linear RGB to be passed to the script. So it is not enough for me to fix the coefficients in the YUV->RGB->YUV conversions. I also have to implement all the gamma stuff. And who knows, maybe also primaries stuff. So it's back to having to recreate the whole of z_ConvertFormat() if we are to have everything on the GPU. I'm not going to do that as it is a massive undertaking with zero benefit for me.
If you are wondering about DGHDRtoSDR(), everything is fine, as it does the needed gamma processing for PQ/HLG->709.
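The "gamma stuff" discussed here refers to the HDR transfer functions. For reference, here is a minimal Python sketch of the PQ EOTF (SMPTE ST 2084) and the HLG OETF (ITU-R BT.2100) with their inverses, using the published constants. This is not how DGHDRtoSDR() or z_ConvertFormat() are implemented; a complete PQ to HLG conversion would also need the HLG OOTF, a reference peak luminance, and possibly a primaries conversion, none of which are shown.

```python
import math

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal):
    """Non-linear PQ signal in [0, 1] -> absolute luminance in cd/m^2 (0..10000)."""
    p = max(signal, 0.0) ** (1.0 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def pq_inverse_eotf(nits):
    """Absolute luminance in cd/m^2 -> non-linear PQ signal in [0, 1]."""
    ym = (max(nits, 0.0) / 10000.0) ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


# ITU-R BT.2100 HLG OETF constants.
A = 0.17883277
B = 1.0 - 4.0 * A
C = 0.5 - A * math.log(4.0 * A)


def hlg_oetf(e):
    """Normalized scene-linear light E in [0, 1] -> HLG signal E' in [0, 1]."""
    if e <= 1.0 / 12.0:
        return math.sqrt(3.0 * e)
    return A * math.log(12.0 * e - B) + C


def hlg_inverse_oetf(ep):
    """HLG signal E' in [0, 1] -> normalized scene-linear light E."""
    if ep <= 0.5:
        return ep * ep / 3.0
    return (math.exp((ep - C) / A) + B) / 12.0


print(pq_eotf(0.5), hlg_oetf(0.5))
```

The two HLG branches meet at E = 1/12 (signal 0.5) by construction of C, and full-scale PQ maps to 10000 cd/m^2.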
Port Cube
OK and thanks for looking into it.
At a guess, I suppose it also means no GPU HDRAGC? Or even a hybrid?

I really do like it here.
Port Cube
No m8, it has no relevance for HDRAGC and curves-type stuff. I am still developing my own curves filter.