[RESOLVED] DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sat Jul 19, 2014 4:31 pm
by jmt247
Hello.

I get errors when I open a file whose name contains a character with "0x5c" as its second byte.

Please see https://en.wikipedia.org/wiki/Shift_JIS and https://sites.google.com/site/fudist/Ho ... i-jp/table

It opens the wrong directory when I try to save the project file.

I'm certain that previous versions didn't have this problem.
[Attached screenshots: 2047.png, 2048.png]
Also, the file name shown at the top of those should be "(テレビ大阪) 2014-07-20 天気予報.ts". "予" is the problematic character.
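
To make the failure mode concrete: in Shift-JIS, "予" encodes as the two bytes 0x97 0x5C, and 0x5C is also the ASCII backslash. The sketch below is purely illustrative and is not DGIndexNV's code; the helper names and the sample path are made up. It shows how a byte-wise search for the last backslash cuts the path in the middle of that character, which would be consistent with the "wrong directory" symptom, while a scan that skips trail bytes finds the real directory.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if b starts a two-byte Shift-JIS (CP932) character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def naive_dirname(path: bytes) -> bytes:
    """Byte-wise split on the last 0x5C, ignoring character boundaries."""
    idx = path.rfind(b"\\")
    return path[:idx] if idx >= 0 else b""


def sjis_dirname(path: bytes) -> bytes:
    """Split on the last 0x5C that is a real separator, not a trail byte."""
    last = -1
    i = 0
    while i < len(path):
        if is_sjis_lead_byte(path[i]):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if path[i] == 0x5C:
            last = i
        i += 1
    return path[:last] if last >= 0 else b""


# Hypothetical path; only the file name comes from the post above.
PATH = "C:\\video\\天気予報.ts".encode("shift_jis")

print(naive_dirname(PATH))  # ends inside "予", after its lead byte 0x97
print(sjis_dirname(PATH))   # the intended directory
```

In a C program built for MBCS, the same job is normally done by the multibyte-aware string routines rather than by hand.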

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sat Jul 19, 2014 4:52 pm
by jmt247
The source code of this modified DGIndex would help you to fix this issue.

https://skydrive.live.com/?cid=8658EC27 ... 99D5%21230

edit: I found newer versions.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sat Jul 19, 2014 6:39 pm
by admin
Is it Unicode? I don't support Unicode. Sorry.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sat Jul 19, 2014 9:05 pm
by Aleron Ives
Shift JIS is one of the older character sets used for Japanese. It predates Unicode and is still very common in Japan and in programs made by Japanese developers.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sat Jul 19, 2014 11:45 pm
by jmt247
I don't know much about programming, but the readme in the source says he used a trick with the TCHAR routines, using MBCS rather than Unicode. I think Unicode support would help other people who regularly use multibyte characters, though.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sun Jul 20, 2014 7:39 am
by admin
Which is the first version up there that includes this fix (including his history directory)? I need to minimize all the other changes he made to make diffing easier for me.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sun Jul 20, 2014 1:04 pm
by jmt247
admin wrote: Which is the first version up there that includes this fix (including his history directory)? I need to minimize all the other changes he made to make diffing easier for me.
He fixed that in mod 3, and the closest version I have is mod 4.

https://www.mediafire.com/?9w0y9z5ah14dkbd

Thank you very much for your effort. All Asians salute you!

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Sun Jul 20, 2014 7:26 pm
by admin
I've started a Unicode aware version. It will take some time. I'm using the approach described here, which appears to be the easiest way for me to implement Unicode support:

http://www.utf8everywhere.org/

It retains the existing simple char arrays for internal storage but they are now UTF-8. For all interactions with Windows APIs, they are converted as needed to/from wchar_t. I already have a lot of it implemented, certainly enough to prove the concept. There are a lot of places I need to change (for example, the Boost narrow() function does not work on a multiselect string from an open dialog), and there is a lot of regression testing to do, but at this point it is just cranking the handle.
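
A rough sketch of that idea, in Python for brevity and not the author's actual code: strings live internally as UTF-8 bytes and are converted to UTF-16LE only at the boundary where a wide Windows API would be called. The function names widen and narrow are illustrative. In a Windows C++ program the equivalent conversions would typically go through MultiByteToWideChar and WideCharToMultiByte with CP_UTF8, which is an assumption about the implementation, not something the thread shows.

```python
def widen(utf8: bytes) -> bytes:
    """Internal UTF-8 storage -> UTF-16LE, the form wide Windows APIs take."""
    return utf8.decode("utf-8").encode("utf-16-le")


def narrow(utf16le: bytes) -> bytes:
    """UTF-16LE returned by a wide API -> UTF-8 for internal storage."""
    return utf16le.decode("utf-16-le").encode("utf-8")


# The file name from the original report, held internally as UTF-8.
internal_name = "天気予報.ts".encode("utf-8")

wide_name = widen(internal_name)        # what would be handed to the API
round_trip = narrow(wide_name)          # what would be stored again

print(round_trip == internal_name)
# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a backslash
# byte (0x5C) can never appear inside a character, unlike in Shift-JIS.
print(0x5C in "予".encode("utf-8"))
```

One side effect of this design is that existing byte-wise path handling keeps working, because ASCII bytes in UTF-8 only ever mean ASCII characters.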

I accept the author's point that in the modern world a program that does not support Unicode is arguably brain dead.
:agree:

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Tue Jul 22, 2014 12:16 pm
by admin
I've completed Unicode support in DGIndexNV. Not bad for 3 days, given the extensive functionality that was affected (everything touching a Windows API!). ;)

I have to make some minor changes to DGDecodeNV. It has to handle the UTF-8 paths from the AVS script and the DGI file. Then I will give you a beta version of 2049 including this.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Tue Jul 22, 2014 11:01 pm
by admin
Oh crap. I get DGDecodeNV coded for Unicode and then try to test it. Surprise! Stupid Avisynth/Avisynth+ won't open a script file with a Unicode name. And even if I rename the script, it kicks out an error if the script is in UTF-8. Bottom line: Avisynth does not support Unicode, so all my work is wasted. I should have thought of that, but I thought Avisynth could and would just pass my UTF-8 file name param as it is -- a weird looking char string. It doesn't have to interpret it. No, it just sees some UTF-8 in the script and says unh-unh. So stupid. Now, I see threads about Avisynth and Unicode out there but they peter out without anything being concluded or done. And anyway, IanB fell off the face of the earth leaving Avisynth licensing in lala land. What a giant cluster truck.

I will look into hacking Avisynth to at least allow a UTF-8 script name and passing filter params without bothering if they are UTF-8, which would allow my source filter to work, but really, don't hold your breaths.

Let me sleep on this; maybe there is a way out.

:evil: :evil: :evil: :evil: :evil: :evil: :evil: :evil: :evil: :evil: :evil: :evil:

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 2:06 am
by Aleron Ives
It's a real shame that AviSynth is stuck in limbo. It really seems like there should be an x86-64 version with native multithreading support by now, so developers could update old plugins and write new ones to take full advantage of modern CPUs, but we're still stuck with getting such features from unofficial branches. Thanks for trying to support more languages, at any rate. Maybe you can at least keep the code around in case anyone makes a viable UTF-aware AviSynth fork at some point.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 7:17 am
by admin
That's good to know. I wonder if OP will comment.

One shouldn't have to depend on the system locale setting, however.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 7:52 am
by admin
Ah, it's good to have access to an expert in encodings.

Would you be willing to test my Unicode version 2049 to see if it fixes the display issue you mention? If so, I will make it available to you privately.

And of course it's great to know that Avisynth can work correctly if the locale is set properly.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 7:58 am
by admin
PM coming.

The display issue at least should be fixed. But this version generates UTF-8 scripts and DGI files. Will that be a problem for Avisynth? If so, what encoding should I use for the scripts? I can use whatever I want for the DGI files, I suppose, as Avisynth doesn't look at it.

After resolving that stuff, we can test DGDecodeNV.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 8:12 am
by admin
Did you get the PM? I don't see it going away from my outbox.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 8:25 am
by admin
No, it's not just the Windows title.

Were you able to test?

I understand that I can't use UTF-8 in the script. What should I use to be able to represent the Unicode names?

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 8:55 am
by admin
What else did you find?
Messages in MessageBox's, etc. I've fixed all of them. Remember, I do not have the locale set. I try to make everything work without the locale. So we may not see the same things.
All I could test is the DGI and AVS creation which seems fine. Also, the window title displays correctly now.
Good.
As I mentioned before, define "_MBCS" and "MBCS" in your project, and use string routines from tchar.h.
I can't do that because it conflicts with everything I have already done (I have Unicode enabled just to get the right APIs and compiler checks). But I did discover that it is only the BOM in the script that dazzles Avisynth. So I can just not set ccs=UTF-8 on the open but go ahead and write my UTF-8 strings anyway. The only possible downside is that the script may not look right in an editor?

But I can just use MBCS in the script. That I think I can do with the correct open mode. Will Avisynth be OK with a BOM specifying MBCS?

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 9:18 am
by admin
Huh? Why not?
Because I am a purist. I think a string should be able to have multiple languages, if that is what the user wants. So I test with names like this:

שלום привет hello.m2ts

The problem is that I can't do anything about Avisynth not being able to open a script with a Unicode name, so the locale is needed to get around that. But I have already coded everything to be non-locale-aware and I see no reason to throw it out.
Noooo! If you remove the BOM Avisynth will assume the script content is Ansi (Shift-JIS, whatever) and choke.
Even if the non-ANSI is only in a filename parameter to a source filter?
Yes. If you want to use Unicode in your program you should use the "WideCharToMultiByte()" API to write strings to the script/dgi.
OK, that's the way I will go. But instead of doing it explicitly like that I only have to use open mode "w,ccs=UTF-16LE". If Avisynth can't handle that, I will try your way.
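
The trade-off in this exchange can be sketched as follows. This is Python standing in for a code-page conversion such as WideCharToMultiByte, and the helper name fits_codepage is hypothetical. Converting a name to a single ANSI code page works when every character belongs to that code page, but the mixed-language test name above cannot be represented in the Japanese code page (CP932) because of its Hebrew part.

```python
def fits_codepage(name: str, codepage: str = "cp932") -> bool:
    """Return True if every character of name exists in the code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


japanese_name = "天気予報.ts"
mixed_name = "שלום привет hello.m2ts"

print(fits_codepage(japanese_name))        # representable in CP932
print(fits_codepage(mixed_name))           # Hebrew is not in CP932
print(fits_codepage(mixed_name, "utf-8"))  # UTF-8 can hold any of them
```

So writing the script in the locale's multibyte encoding is enough for names in the user's own language, while a name that mixes scripts needs a Unicode encoding end to end.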

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 9:26 am
by admin
Yes.
But why?

Actually, I just tested it and it doesn't barf.

Also, I am scared to change my locale because I may not be able to switch back because I cannot read Japanese. :o

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 9:55 am
by admin
Not at all.

As I said I deleted the BOM before opening the script in Avisynth, and my filter received the call. You can call that wrong testing, but it happened. There is no reason for Avisynth to parse a filename parameter to a source filter.

It's moot, though, because I am going to have an MBCS BOM.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 10:04 am
by admin
I greatly appreciate your help. ;)

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 10:22 am
by admin
Thanks for the correction. I just meant that I will send MBCS in the script.

I'll give you the script once I have converted it from UTF-8 to MBCS. Right now, I have a UTF-8 script, and I can make everything work all the way to serving frames to VirtualDub if I delete the BOM from the script. With file mode "w,ccs=UTF-8", the runtime adds a BOM to the script. If I leave it, Avisynth barfs. If I delete it, everything works.
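
For reference, the BOM in question is the three bytes EF BB BF at the very start of the file. A minimal sketch of detecting and removing it, in Python with a hypothetical helper name; the script line is only a placeholder, not output from the AVS template system:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise no change."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Placeholder script content for illustration only.
script_text = 'DGSource("天気予報.dgi")\n'

with_bom = script_text.encode("utf-8-sig")   # encoder prepends the BOM
without_bom = script_text.encode("utf-8")    # same text, no BOM

print(with_bom[:3] == codecs.BOM_UTF8)
print(strip_utf8_bom(with_bom) == without_bom)
```

Apart from those three leading bytes the two files are byte-for-byte identical, which fits the observation that deleting the BOM alone was enough to get frames served.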

Converting to MBCS in the script should solve everything (I hope). I can leave UTF-8 in the DGI file; that doesn't matter.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 10:32 am
by admin
OK, remember the script is created by my AVS template system. That's another good reason not to use UTF-8 for the script, because a user making a script by hand will not be using UTF-8. :scratch:

See the attachment.

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 10:47 am
by admin
Can you show me a script that Avisynth can open but which has a multibyte filename for the source filter? And how will users make scripts like that?

Re: DGIndexNV 2048 doesn't handle Shift-JIS names

Posted: Wed Jul 23, 2014 4:06 pm
by admin
So it's just a char string that is properly mapped by the locale?

In that case, I should just go back and make a few simple changes like you described, as you say everything just worked with one (or maybe a few) display issues. The idea of "UTF-8 Everywhere" is not viable if Avisynth does not support it.