Performance question
It has been ages since I last tried to encode something with the support of a software decoder.
I had a bunch of anime to encode to HEVC over Christmas, so I gave StaxRip's batch processing a run.
Before I configured StaxRip to index properly with DGIndexNV, it used FFVideoSource by default.
Well, to my great surprise, its performance when encoding is equal to, if not faster than, DGDecNV, even though with the latter the encoding is offloaded to the GPU.
Perhaps my PCI-e 2.0 is a bit old, perhaps my 1060 3GB is not the fastest card around, and the CPU is an ancient i7-2600k.
Some years ago it was possible to choose between CUDA and CUVID decoding, but that option has disappeared.
Besides that, do you have any hints for increasing DGDecNV performance?
Performance issue
Can you please tell me:
1. the source details
2. both scripts
3. the target format details
4. the performance details for both cases
I'd like to try duplicating it before speculating.
Performance issue
Guest 2
"Well, with my big surprise its performance are equal if not faster than DGDecNV, when encoding, despite with the latter the encoding is offloaded to GPU."
I believe that DGDecodeNV only serves frames; it does not encode. Your CPU is doing the encoding.
Performance issue
1) Plain 1080p anime in mkv container, x264 8000 kbit/s cbr
2) No script at all, i.e. the plain lines to serve video to x265, nothing else
3) HEVC 10 bit mkv, x265.exe --crf 22 --tune animation --output-depth 10 --colorprim bt709 --colormatrix bt709 --transfer bt709 --range limited
4) Let me finish the queue and I will give you some results.
Performance issue
This may be an MKV issue. I know my old MKV library has issues. Any chance to try with an M2TS?
"No script at all, i.e. the plain lines to serve video to x265, nothing else"
You need a script to invoke DGSource, so you are confusing me. How do you send the script output to x265? Just whatever staxrip does? If you need a special version of x265, please tell me what it is and where to get it.
Not doing any prefetch games (possibly transparently by staxrip)?
"No script at all, i.e. the plain lines to serve video to x265, nothing else"
You need a script to invoke DGSource, so you are confusing me. How do you send the script output to x265? Just whatever staxrip does? If you need a special version of x265, please tell me what it is and where to get it.
Not doing any prefetch games (possibly transparently by staxrip)?
Performance issue
I will.
I meant: the scripts are really minimal, just the necessary few lines to invoke DG or FF.
As far as I can see, the script is plain and simple, with no prefetching at all.
DG one:
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
DGSource("G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.dgi")
FF one:
LoadPlugin("D:\Eseguibili\Media\StaxRip Anime\Apps\Plugins\Dual\ffms2\ffms2.dll")
tcFile = "G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\96 Round finale_timestamps.txt" # timestamps file path
Exist(tcFile) ? FFVideoSource("G:\Raw\World Trigger\2021 3ª\96 Round finale.mkv", cachefile="G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.ffindex", timecodes=tcFile) : FFVideoSource("G:\Raw\World Trigger\2021 3ª\96 Round finale.mkv", cachefile="G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.ffindex")
Performance issue
Some data for MKV, will update later with M2TS results.
Recompression from 1080p mkv x264 to x265 (same command line as before).
DG: encoded 33596 frames in 2256.04s (14.89 fps), 1126.28 kb/s, Avg QP:27.89
FF: encoded 33596 frames in 2320.88s (14.48 fps), 1126.28 kb/s, Avg QP:27.89
Performance issue
You didn't answer my questions about which version of x265 and how you feed the script to it.
Performance issue
avs+ [INFO]: AviSynth+ 3.7.1 (r3577, master, x86_64)
x265 [INFO]: HEVC encoder version 3.5+21+12-cb341a7ef [Mod by Patman]
The AVS script is generated and launched by StaxRip.
I did another test with an M2TS:
DG
encoded 34377 frames in 2405.01s (14.29 fps), 1025.35 kb/s, Avg QP:27.50
FF
encoded 34377 frames in 2363.18s (14.55 fps), 1025.35 kb/s, Avg QP:27.50
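For reference, the relative difference between the two source filters in the x265 runs above can be worked out from the reported frame counts and times. A small Python sketch follows; the function and variable names are only illustrative, and the figures are copied from the x265 summary lines posted in this thread.

```python
# Figures copied from the x265 summary lines reported in this thread.
runs = {
    "MKV":  {"frames": 33596, "DG": 2256.04, "FF": 2320.88},
    "M2TS": {"frames": 34377, "DG": 2405.01, "FF": 2363.18},
}

def fps(frames, seconds):
    """Average frames per second for a whole encode."""
    return frames / seconds

def dg_advantage_percent(run):
    """Positive means the DG run finished faster than the FF run, negative means slower."""
    return (run["FF"] / run["DG"] - 1.0) * 100.0

for name, run in runs.items():
    print(
        f"{name}: DG {fps(run['frames'], run['DG']):.2f} fps, "
        f"FF {fps(run['frames'], run['FF']):.2f} fps, "
        f"DG advantage {dg_advantage_percent(run):+.1f}%"
    )
```

The two sources give differences of under 3% with opposite signs, which is what one would expect if the source filter is not the limiting factor.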
Performance issue
Guest 2, what is the cpu usage during each scenario?
Performance issue
Hi.
I am not quite sure what you expect.
The encoding happens in the CPU and not in the GPU. It doesn't really matter what you use to serve the frames in my opinion. dgdecnv can probably serve much higher FPS to x265, but x265 is the bottleneck here.
Try to run the avs file in avsmeter, and you will probably see that it is showing much higher FPS.
x265 is just a bitch when it comes to encoding. Seeing 3-4 FPS is not exceptional when encoding movies with x265.
Maybe I am missing something with your workflow, but x265 is just much more CPU hungry than x264.
renols
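To compare source filters without the encoder in the way, the AVSMeter run suggested above can be scripted. The following is a minimal Python sketch, not an official tool: it assumes the executable is reachable on PATH as AVSMeter64, that it accepts the script path as its only argument, and that its console output contains the same "FPS (min | max | average)" summary line that appears in the reports pasted in this thread. The helper names are hypothetical.

```python
import re
import subprocess

# Matches the summary line as it appears in the reports in this thread, e.g.
# "FPS (min | max | average): 334.1 | 775.2 | 725.3"
FPS_LINE = re.compile(
    r"FPS \(min \| max \| average\):\s*([\d.]+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)"
)

def parse_fps(report):
    """Return (min, max, average) fps from AVSMeter's text output, or None if absent."""
    match = FPS_LINE.search(report)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())

def measure(script_path, avsmeter="AVSMeter64"):
    """Run AVSMeter on one script and return its fps summary.

    Assumption: the executable name and the plain "exe script.avs" invocation.
    """
    result = subprocess.run([avsmeter, script_path], capture_output=True, text=True)
    return parse_fps(result.stdout)

if __name__ == "__main__":
    # Parsing only; no AVSMeter installation is needed for this line.
    print(parse_fps("FPS (min | max | average): 334.1 | 775.2 | 725.3"))
```

Calling measure() once per script (one using DGSource, one using FFVideoSource) gives two averages that can be compared directly.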
Performance issue
That's what I was thinking. The decoding is such a small part of things that all source filters will look alike.
Performance issue
Some results with AVSMeter64 and simple AVS scripts, just the few lines to serve the video:
DG:
AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.1 (r3577, master, x86_64) (3.7.1.0)
Number of frames: 34377
Length (hh:mm:ss.ms): 00:23:53.807
Frame width: 1920
Frame height: 1080
Framerate: 23.976 (24000/1001)
Colorspace: YV12
Frames processed: 34377 (0 - 34376)
FPS (min | max | average): 334.1 | 775.2 | 725.3
Process memory usage (max): 304 MiB
Thread count: 14
CPU usage (average): 13.9%
GPU usage (average): 40%
VPU usage (average): 90%
GPU memory usage: 584 MiB
GPU Power Consumption (average): 43.5 W
Time (elapsed): 00:00:47.398
FF:
AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.1 (r3577, master, x86_64) (3.7.1.0)
Number of frames: 34377
Length (hh:mm:ss.ms): 00:23:53.807
Frame width: 1920
Frame height: 1080
Framerate: 23.976 (24000/1001)
Colorspace: i420
Frames processed: 34377 (0 - 34376)
FPS (min | max | average): 241.9 | 1115 | 482.1
Process memory usage (max): 110 MiB
Thread count: 17
CPU usage (average): 79.8%
GPU usage (average): 5%
VPU usage (average): 0%
GPU memory usage: 473 MiB
GPU Power Consumption (average): 10.7 W
Time (elapsed): 00:01:11.306
I can't complain about DG performance at all
Now the results with x264.exe --crf 20 --preset slow --tune animation --level 4.1 --aq-mode 2 --colorprim bt709 --colormatrix bt709 --transfer bt709 --range tv on the very same scripts:
DG:
encoded 34377 frames, 31.55 fps, 2561.83 kb/s, duration 0:18:09.76
FF:
encoded 34377 frames, 31.39 fps, 2561.83 kb/s, duration 0:18:15.00
With --preset medium:
DG:
encoded 34377 frames, 38.58 fps, 2786.29 kb/s, duration 0:14:51.06
FF:
encoded 34377 frames, 36.40 fps, 2786.29 kb/s, duration 0:15:44.36
Perhaps the decoding task is really a small part of the whole process.
Unfortunately my rig doesn't have the power to deal with 4k. If someone else can do some tests, they could be useful.
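A rough way to see how small the decoding share is: compare the decode-only rates from AVSMeter with the full transcode rates. The Python sketch below uses the --preset slow figures from this post and rests on two simplifying assumptions that are not measured here: decoding and encoding are treated as if they ran one after the other for each frame, and decoder CPU usage is taken to scale linearly with frame rate. Since DG decodes on the GPU's video unit in parallel with the encoder, the serial model is best read as an upper bound for it.

```python
# Averages reported in this post: AVSMeter (decode only) and x264 --preset slow.
decode_fps = {"DG": 725.3, "FF": 482.1}   # source filter running flat out
decode_cpu = {"DG": 13.9, "FF": 79.8}     # average CPU % during that run
encode_fps = {"DG": 31.55, "FF": 31.39}   # full transcode

def decode_time_share(source_fps, transcode_fps):
    """Fraction of per-frame wall time spent decoding under the serial assumption."""
    return transcode_fps / source_fps

def cpu_at_encode_rate(cpu_percent, source_fps, transcode_fps):
    """Decoder CPU % when throttled to the transcode rate (linear-scaling assumption)."""
    return cpu_percent * transcode_fps / source_fps

for name in ("DG", "FF"):
    share = decode_time_share(decode_fps[name], encode_fps[name])
    cpu = cpu_at_encode_rate(decode_cpu[name], decode_fps[name], encode_fps[name])
    print(f"{name}: decode share {share:.1%}, decoder CPU at encode rate {cpu:.1f}%")
```

Under these assumptions decoding accounts for only a few percent of the per-frame time with either filter, which fits the nearly identical encode speeds.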
Performance issue
Using VapourSynth Editor to benchmark the script below:
4000 frames, 4K
190 fps
import vapoursynth as vs
from vapoursynth import core
#####FRAME SERVER#####
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecodenv/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'F:\4K\BLACK PANTHER PID 1011.dgi', fieldop=0)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
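For anyone who wants a comparable number outside VapourSynth Editor, a simple timing loop can stand in for its benchmark. This is only a sketch: the benchmark helper is hypothetical, it requests frames strictly one at a time, and an editor's built-in benchmark may request frames concurrently, so the figures need not match.

```python
import time

def benchmark(get_frame, num_frames):
    """Request frames 0..num_frames-1 in order and return the average fps."""
    start = time.perf_counter()
    for n in range(num_frames):
        get_frame(n)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    return num_frames / elapsed if elapsed > 0 else float("inf")

# With the script above, one would pass the clip's own frame accessor:
#   print(benchmark(clip.get_frame, 4000))
```

Passing a callable rather than the clip itself keeps the loop usable with any frame source.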
Performance question
I do not have FFMS2 installed and don't need it.
The fps depends not only on resolution but also on the CPU and GPU.
Performance comparisons would have to account for this as well.
Edit
FFMS2 is not frame accurate
Performance question
In some cases, yes. People resort to remuxing transport streams to MKV to get around it. That is absurd to me. Why not just fix it? Accurate random access is actually what we hang our hat on for DGDecNV. Sure, in some use cases faster decoding can be achieved and can contribute to faster transcoding, but not always, as we have seen in this thread.
One thing I want to do is look into improving MKV parsing. While it's not horribly bad, why not fix that too, if possible.
Performance question
To appease the poisonous frog of torment:
Bear in mind that DGDecodeNV was using approximately 40% of my VPU and 1.7% of CPU, while FFMS2 was using no VPU and 34.7% of CPU,
which is an absurd amount on my system and would significantly impact an HEVC encode.
I'll stick to DGIndexNV.
PS
FFMS2 had a long (seemingly eternal) delay when launching in vdub2 and AVSMeter, while DGIndexNV was near instantaneous.
If the purpose of the script is encoding, then DGDecodeNV is the superior choice: no CPU load.
Performance question
Good to hear.
Still, I'd like to do some testing before I mark this thread resolved.