Rocky wrote: ↑Thu Feb 25, 2021 5:21 pm
Got my 3090 tracker with alerts running for days. Twice I got the alert and within seconds went to the seller site and tried to buy. But both times they were already out of stock. This is getting to be very annoying. By the time I can actually buy one there'll be 4090 and 5090.
Know how you feel. I am "planning" a new build for a little later in the year, and a major hurdle is the complete lack of stock of RTX 30xx cards.
CUDASynth
I got tired of waiting for stock and bought bitcoin yesterday with 50% of the sum I was holding for an RTX 3080. I'm expecting to sell it at 70-90k and cover the inflation/overpricing by our sellers. Also hoping for better availability in summer or autumn. Most likely I'm not gonna have it this spring.
CUDASynth
Elon Musk appears to be all in on bitcoin. Might have to revise my stance on that.
https://www.cnbc.com/2021/02/08/tesla-b ... tcoin.html
CUDASynth
My my, how time flies. I was thinking about CUDASynth again. It's hard to pass up a 350% performance improvement for the case I demonstrated. Actually, the improvement gets greater the more CUDA filters there are in the chain. That's because with CUDASynth you always have only two CPU<->GPU transfers, instead of 2 x N, where N is the number of CUDA filters (including DGSource). And for UHD, the CPU<->GPU transfers are the bottleneck, so it's really attractive for UHD and above.
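The transfer arithmetic can be sketched in a few lines. This is just a toy model of the claim above, not DG code; real gains depend on PCIe bandwidth and frame size:

```python
# Why fusing CUDA filters helps: with standalone filters, each CUDA stage
# (including the source filter) pays an upload and a download across PCIe,
# while a fused chain pays that cost exactly once for the whole chain.

def transfers(num_cuda_filters: int, fused: bool) -> int:
    """Count CPU<->GPU copies for a chain of CUDA filters."""
    if fused:
        return 2                     # one upload, one download, total
    return 2 * num_cuda_filters      # each filter uploads and downloads

# DGSource plus two CUDA filters (N = 3):
assert transfers(3, fused=False) == 6   # 2 x N copies
assert transfers(3, fused=True) == 2    # fixed cost, regardless of N
```

The gap widens with every filter added, which is why the benefit grows with chain length.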
Realistically, even though I published code for a generic CUDA filter supporting this mode of operation, nobody was interested. And honestly, only a handful of people globally can even write a CUDA filter, let alone one that supports this mode of operation. So we're not going to change the world.
No matter, there's no reason why I can't just do it in the DG world. The way I previously suggested requires some fiddly syntax in the script: the fsrc/fdst parameters. Sure, I could make a script compiler that adds those things automatically. But since no one cares, I'm going to just put the filters inside DGSource(), requiring only new DGSource() parameters to, for example, turn on denoising. I'll probably support DGDenoise(), DGSharpen(), and DGHDRtoSDR(), starting with DGDenoise(). Future CUDA filters could be added.
The script reduces to something like:
DGSource(source parameters, denoise parameters, sharpen parameters, hdr2sdr parameters)
If the denoise parameters are omitted then denoising is not performed, etc.
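That dispatch rule can be sketched like so. The parameter-group prefixes here are invented for illustration, not the actual DGSource() interface: if a filter's parameter group is absent, that stage simply isn't built.

```python
def build_chain(**params):
    """Return the processing stages implied by which parameter groups appear.
    Group prefixes ('denoise_', 'sharpen_', 'hdr2sdr_') are illustrative only."""
    stages = ["decode"]
    for prefix, stage in (("denoise_", "denoise"),
                          ("sharpen_", "sharpen"),
                          ("hdr2sdr_", "hdr2sdr")):
        if any(name.startswith(prefix) for name in params):
            stages.append(stage)
    return stages

# Denoise parameters given -> denoise stage runs; nothing given -> decode only.
assert build_chain(denoise_strength=0.3) == ["decode", "denoise"]
assert build_chain() == ["decode"]
```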
This framework will be a big incentive to create more and more CUDA filters. The standalone filters would still work fine, of course.
Well, it's something fun to do in the cold months.
CUDASynth
Beaut ! Fantasmagorical.
requiring only to have new DGSource() parameters to, for example, turn on denoising.
I wonder if you would consider having off/on parameters, e.g. "denoise=True", in there too.
Sounds silly, but in a "reusable script" it would allow the call to have favourite parameters "preset" and the only thing I'd then need to do is turn the filter on or off via those True/False depending on the need.
If I have this right in my noggin - with HDR becoming ubiquitous especially on phones, cudasynth DGHDRtoSDR is especially welcome !
Thank you and Cheers.
I really do like it here.
CUDASynth
Great idea, hydra3333. We could have one main parameter, e.g.:
denoise = 0 is off (default)
1 is preset 1
2 is preset 2
...
-1 is no preset with parameters to follow
Presets would be stored in a config file that you program.
Thank you for the idea and keep them coming.
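One way that preset scheme could resolve at runtime. The file format and parameter names below are invented for illustration; only the 0 / N / -1 semantics come from the proposal above:

```python
# Hypothetical preset lookup: 0 -> filter off, N > 0 -> preset N from a
# user-editable config, -1 -> no preset, explicit parameters follow.

PRESETS = {  # would live in a config file that the user programs
    ("denoise", 1): {"strength": 0.15},
    ("denoise", 2): {"strength": 0.40},
}

def resolve(filter_name, preset, explicit=None):
    """Turn a preset number into the parameter set for one filter."""
    if preset == 0:
        return None                      # filter disabled (default)
    if preset == -1:
        return dict(explicit or {})      # no preset: caller's parameters
    return dict(PRESETS[(filter_name, preset)])

assert resolve("denoise", 0) is None
assert resolve("denoise", 2) == {"strength": 0.40}
assert resolve("denoise", -1, {"strength": 0.25}) == {"strength": 0.25}
```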
CUDASynth
OK. Not sure how that may work in the case of a vanilla .vpy script consumed by vspipe and piped into ffmpeg.
Unless perhaps DGSource opens and checks a presets file containing bunches of presets per filter, looked up by a text id name? (It may; I'll have to check the documentation.)
Deinterlacing is a daily thing for me, I guess that may not change or perhaps somehow default to the current nvidia deinterlacer.
Looking forward to seeing what you settle on !
Kind Regards,
ye olde hydra
CUDASynth
It doesn't do that now, but yes, that's what I have in mind.
Deinterlacing is a daily thing for me, I guess that may not change or perhaps somehow default to the current nvidia deinterlacer.
We could have a preset for the deinterlace option.
Remember the old .def file for DGMPGDec's mpeg2_source()? We'll reincarnate it for DGSource().
I may not do the -1 option. All parameters for these filters would have to be in the .def file. Not sure yet. Just don't want to have super long DGSource() lines in the script.
CUDASynth
He he, no long lines etc ?
Cool! Oops, I guess I wandered back to my misspent youth, where it was a thing to see all the stuff you needed in one place, and fiddling with "sequential files" could be challenging. Oh well. I suppose I don't really miss paper tape, or debugging device drivers by single-stepping with front panel switches and lights in octal, though.
CUDASynth
Wow guys, you're gonna be shocked, shocked, I tell ya.
So, I implemented the first part of the changes for the new DGSource() that will integrate some CUDA filters like DGDenoise(). Three things I had to do for that were: 1) do the NV12->YV12 conversion in a CUDA kernel rather than the CPU, 2) use consistent pitches for all buffers except the Avisynth buffers (which I cannot control), and which allowed replacement of pitched copies with linear copies, and 3) use env->BitBlt() for the final copying to the Avisynth buffers [instead of for loops and memcpy()]. Here are the results:
Original DGSource():
Code:
AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)
Number of frames: 6772
Length (hh:mm:ss.ms): 00:01:52.980
Frame width: 3840
Frame height: 2160
Framerate: 59.940 (60000/1001)
Colorspace: YUV420P16
Frames processed: 6772 (0 - 6771)
FPS (min | max | average): 106.0 | 278.7 | 263.4
Memory usage (phys | virt): 602 | 1550 MiB
Thread count: 23
CPU usage (average): 7%
Time (elapsed): 00:00:25.711
Revised DGSource():
Code:
AviSynth+ 3.7.3 (r4029, 3.7, x86_64) (3.7.3.0)
Number of frames: 6772
Length (hh:mm:ss.ms): 00:01:52.980
Frame width: 3840
Frame height: 2160
Framerate: 59.940 (60000/1001)
Colorspace: YUV420P16
Frames processed: 6772 (0 - 6771)
FPS (min | max | average): 119.1 | 379.9 | 354.0
Memory usage (phys | virt): 602 | 1562 MiB
Thread count: 23
CPU usage (average): 7%
Time (elapsed): 00:00:19.127
I hope you noticed the 34% performance improvement. And that's for UHD content. I think that deserves a little celebration.
I was pulling my fur out for days trying to get this working. Looks like it was worth the effort. I might release this as is and then add the filters. DG is gonna be thrilled. Can you imagine the change log entry?
* Improved DGSource() performance by 34%.
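The NV12→YV12 step mentioned above is, at heart, a chroma de-interleave. Here is a pure-Python sketch of the addressing, just to show the data movement; the real version runs as a CUDA kernel on pitched device buffers:

```python
def nv12_to_yv12(y: bytes, uv: bytes):
    """Split NV12's interleaved UVUVUV... chroma plane into YV12's separate
    planes. The Y plane passes through untouched; YV12 stores V before U."""
    u = uv[0::2]    # even bytes of the interleaved plane are U
    v = uv[1::2]    # odd bytes are V
    return y, v, u  # YV12 plane order: Y, V, U

# Two luma samples, two interleaved UV pairs:
y, v, u = nv12_to_yv12(b"\x10\x20", b"\x80\x90\x82\x92")
assert u == b"\x80\x82" and v == b"\x90\x92"
```

For 4:4:4 the chroma planes are already planar rather than interleaved, which is why a straight copy suffices there (more on that below).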
CUDASynth
Now 444 is kicking my ass, but I'll win in the end.
SomeHumanPerson (Posts: 96, Joined: Fri Mar 24, 2023 10:41 am)
CUDASynth
That's an impressive performance gain and it sounds like you've really got your teeth into this.
CUDASynth
Thank you, sir. I just cracked the 444 nut! Silly me, I forgot that for 444, the U and V are not interleaved. I just couldn't figure out why my de-interleaving kernel wasn't working for 444. All I needed was a straight copy. That was 12 hours down the toilet. I finally thought to see what the current version does with 444 and it jumped right out at me because it uses a straight copy. Still, seeing that beautiful decoded 444 playing washed away all the frustration and negativity that had set in. After starting vdub hundreds of times while debugging, when it finally shows a clean picture, there is euphoria.
Now I just have to port it to vapoursynth, which should not be fraught in any way, and I can give you a test version, probably tomorrow.
Bonus: Ever since I started using a UHD monitor, I've had an issue with mouse double clicks. Any slight motion of the mouse between the clicks causes it to fail to detect a double click. So frustrating. Well today, I stumbled on a solution for that. In the registry you can configure the size of the rectangle within which motion doesn't matter. By default windows has 4x4 pixels, which isn't very large on a UHD monitor. I increased it to 16x16 (may even go to 32x32) and I don't have to be a brain surgeon to make a double click anymore.
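If I recall the value names correctly, the rectangle lives under HKEY_CURRENT_USER\Control Panel\Mouse as the string values DoubleClickWidth and DoubleClickHeight; a .reg fragment like this would set 16x16 (log off and back on for it to take effect):

```
Windows Registry Editor Version 5.00

; Size in pixels of the rectangle within which mouse motion
; between two clicks still counts as a double click
[HKEY_CURRENT_USER\Control Panel\Mouse]
"DoubleClickWidth"="16"
"DoubleClickHeight"="16"
```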
Are you not entertained?
CUDASynth
Boys and girls, be the first on your block to try out the new and improved DGSource()! Best price and approved by Granny. Your testing will be appreciated.
* 33 percent improvement in frame rate (measured for a UHD stream).
https://rationalqm.us/misc/DGDecodeNV_test1.rar (64 bit)
Simply replace your existing DGDecN.dll with this one.
CUDASynth
Got DGHDRtoSDR working integrated into DGSource() as a test of concept. I put the parameters in hard-coded. Now I have to decide about the DGSource() configuration file stuff.
Kinda surprised at the lack of feedback on the sped-up DGSource(). I suppose maybe people don't follow this thread. I thought it was a big deal as it throws the cat among the pigeons in the CPU vs. GPU decoding wars. The two have been comparable with an edge to GPU for its lesser CPU utilization, but this starts to make it a mismatch, IMHO. With the integrated filters it will become a stomping.
CUDASynth
Great to hear from you.
What do you think about the config file idea:
dgsource:deinterlace=2
dgsource:use_pf=true
dghdrtosdr:white=1500
etc.
If something is not included, the existing default is used. Giving the parameter explicitly to DGSource() overrides the config file. Or we could ditch the explicit parameters entirely and use only the config file, but that breaks existing scripts. We need a way to avoid unwieldy parameter lists to DGSource(). Or do we really? What's so terrible about it? I would just have to check that Avisynth/Vapoursynth supports potentially very long lists. Maybe just change to the config file approach when we exceed that limit as we add filters.
EDIT: Apparently, 1024 parameters can be given, so we'll just do it that way for now. Thanks, StainlessS.
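A sketch of parsing that config format into per-filter parameter tables, with explicit DGSource() arguments winning over the file, as proposed above. The code is illustrative only:

```python
def parse_config(text):
    """Parse lines like 'dghdrtosdr:white=1500' into {filter: {param: value}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # skip blanks and comments
        head, _, value = line.partition("=")
        filt, _, param = head.partition(":")
        table.setdefault(filt, {})[param] = value
    return table

def effective(config, filt, explicit):
    """Explicit script parameters override the config file's values."""
    merged = dict(config.get(filt, {}))
    merged.update(explicit)
    return merged

cfg = parse_config("dgsource:deinterlace=2\ndghdrtosdr:white=1500")
assert effective(cfg, "dghdrtosdr", {"white": "1000"}) == {"white": "1000"}
assert effective(cfg, "dgsource", {}) == {"deinterlace": "2"}
```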
Another issue is ordering of the filter execution. I have some ideas for that. We should support an arbitrary ordering. Maybe each filter has an order parameter. Or, we get it from the order of the parameters, that's the ticket. I used to fantasize about changing Avisynth to do this stuff, i.e., kernel management and ordering as a solution to the cumbersome CUDASynth original idea with the fsrc/fdst parameters. Now this management will be a part of DGSource(). It's wonderful to be in control.
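Deriving execution order from parameter order could look like this. Python kwargs preserve the caller's ordering (guaranteed since 3.7), which is what makes the trick work; the prefix convention is made up for the sketch:

```python
def filter_order(**params):
    """Order filters by the first appearance of one of their parameters."""
    order = []
    for name in params:                 # kwargs keep the caller's ordering
        filt = name.split("_", 1)[0]    # 'sharpen_radius' -> 'sharpen'
        if filt not in order:
            order.append(filt)
    return order

# Sharpen parameters given first, so sharpen runs before denoise:
assert filter_order(sharpen_radius=1, denoise_strength=0.2) == ["sharpen", "denoise"]
```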
CUDASynth
Frame serving and HDRtoSDR for UHD video is clocking in at 331 fps. That seems pretty good to me. Now, I'm going to add DGDenoise next.
CUDASynth
Mmm... it's rare to see common features that can be applied to all videos. I would leave the config file to DGIndexNV only and let the command line speak for DGSource itself.
That is fast. Let's see how it scales on different cards. I am still on PCI-E 3.0 with a 1660 Super.
When DGCube's time comes, we will need an internal conversion path too, as right now a CPU<->GPU transfer is mandatory in certain types of HDR conversions (PQ->HLG, for example).
CUDASynth
My horoscope for today:
Sagittarius, only those who follow the developed plan and complete all its stages can achieve success in their field.
CUDASynth
Got Vapoursynth supported. I just want to check a few things, as I found some strangeness in the Vapoursynth support that triggers my internal inconsistency detector. I don't like to just ignore these things. Hopefully, I'll make a test release tomorrow.