Currently, I'm working on disassembling the eboot with IDA and xorloser's plugins.
Also, I made a program which automatically translates blocks of Japanese text to English (via the Microsoft Translator API) and corrects the byte size: if the English output is bigger, it truncates it; if smaller, it pads it with spaces. Maximum chunk size is 70 KB at once. Speed is also decent. If it'll help Alexmagno, I'll release it.
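For anyone curious, the truncate-or-pad step could look roughly like the sketch below. This is not the actual tool: the function name `fit_to_size` and the choice of Shift-JIS as the output encoding are assumptions made for illustration only.

```python
def fit_to_size(translated, size, encoding="shift_jis"):
    """Return `translated` encoded to exactly `size` bytes.

    Assumption: the game reads the text as Shift-JIS and each block
    must keep the byte length of the original Japanese block.
    """
    raw = translated.encode(encoding, errors="replace")
    if len(raw) > size:
        # Too long: cut it. Note that this can split a 2-byte character
        # if the text is not plain ASCII.
        raw = raw[:size]
    # Too short (or exact): pad with spaces up to the original length.
    return raw.ljust(size, b" ")
```

So `fit_to_size("Hello", 8)` pads with three spaces, and `fit_to_size("Hello world", 5)` cuts the text down to `Hello`.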
Thank you guys. Quick question: is there currently a way to somehow understand what's going on in cutscenes? I've patched the sound and replaced all the files mentioned with US equivalents, but I have no idea what's up in cutscenes.
Just listen carefully. The dialogues are not messed up, they just run ahead when the picture is lagging.
One or two cutscenes and you get the idea. It'll be more or less fixed in the nearest release (beta), but it's the best we can do for you to understand the plot for now.
Also, disassembling the eboot in IDA gave zero results. I'm totally unfamiliar with PPC mnemonics, and although I've managed to dig into some algos, searching for the message loading procedure is overkill. Moreover, the string table is at the end of the file (and not near the corresponding function call, like in almost every x86 Windows executable), which only adds difficulty.
A friend of mine, who'd rather remain unknown, helped me quite a lot on insight regarding those 2 files.
He said, and I quote: I think I've figured out a somewhat-clumsy-yet-efficient solution to the "square glyph" problem, which is responsible for the letter repetition when using half-width characters in the dialogue lines. I don't know what you've figured out and what you didn't, so I'll explain things from the beginning. Please be patient with me even if you know most or all of what follows. I only mean to be of assistance.
I've read in the first post of the "Disgaea 4 PS3 Translation / Fix Project WIP" thread at PS3 News that the square glyph issue is an encoding-related problem, resulting from the fact that SJIS uses 2 bytes per character for most of the characters. I think that's not quite it -- it is more of a rendering problem. In the dialogues, all glyphs are rendered as squares. And since half-width characters have smaller textures, they just "wrap around". On the other hand, encoding is respected, since all of the characters of an ASCII string are printed. One would expect one letter out of two to be skipped if the game were not "encoding aware enough".
I think it would be very difficult to solve this problem through disassembling and modifying the "EBOOT.BIN" file, because the buffer used to store the dialogue text is likely too small. The "bugs" one can observe by using the US "talk.dat" with the Japanese version are probably due to buffer overruns. So even if one managed to halve the character width, the maximum number of displayed characters would stay the same. Unless of course the size of the buffer is increased. All in all, this seems very difficult to do to me.
So I propose another solution: overwriting the glyphs found in the "font.lzs" file so as to substitute some of the characters (e.g., the kanjis) with letter pairs. For instance, the first kanji of the font file is "亜", which could be substituted by a square glyph representing the letter pair "Th". The next kanji would be replaced by "e ", then "fr", "es", "h ", and so on, in order to display on the screen "The fresh, dripping blood...", which is Valvatorez's first line. Of course, those same pairs could be reused later, should they appear in other dialogues. According to my estimations, the font file is big enough to contain all the pairs needed. If it were not, it could still be resized (but adding extra glyphs would also require modifying "font.ffm", which would make things more complicated).
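To make the pair idea concrete, here is a minimal sketch of how letter pairs might be assigned to glyph slots. Everything in it is an assumption for illustration: the function name `assign_pairs`, the use of glyph index 486 as the first free slot, and padding odd-length lines with a trailing space are not taken from the game or from any existing tool.

```python
from itertools import count


def assign_pairs(lines, first_glyph=486):
    """Split each line into 2-letter pairs and give each new pair a glyph slot.

    Returns (table, encoded): `table` maps a pair such as "Th" to a glyph
    index, and `encoded` holds one list of glyph indices per input line.
    """
    slots = count(first_glyph)  # hypothetical: slots are handed out in order
    table = {}
    encoded = []
    for line in lines:
        if len(line) % 2:
            line += " "  # assumption: odd-length lines get a trailing space
        glyphs = []
        for i in range(0, len(line), 2):
            pair = line[i:i + 2]
            if pair not in table:
                table[pair] = next(slots)  # new pair takes the next free slot
            glyphs.append(table[pair])     # known pairs are simply reused
        encoded.append(glyphs)
    return table, encoded
```

With the input `["The fresh"]` the pairs come out as "Th", "e ", "fr", "es", "h ", the same split as in the example above. The size of `table` after running over every dialogue line would tell you how many glyphs actually need to be redrawn.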
I'll move to the technical aspects. Such a hack would require understanding the formats of the ".lzs", ".txf" and "talk.dat" files. Here is the information I could gather:
+ LZS is an already well understood format. See this page: www.cetramod.it/forum/viewtopic.php?f=52&t=266. I don't understand Italian, so I don't really know who to credit for the information (the author of the post?), but in summary it is a 254-byte sliding-window compression scheme. To be specific:
- bytes 0x0 to 0x2: the expected extension of the decompressed file; should be "txf" for "font.lzs"
- byte 0x3: should be '\0'
- bytes 0x4 to 0x7 (little endian uint32): decompressed file size
- bytes 0x8 to 0xb (little endian uint32): (compressed file size) - 4
- bytes 0xc to 0xf (little endian uint32): flag value; must be less than 255, which means that bytes 0xd to 0xf are always "\0\0\0"
- the rest is compressed data; to decompress it, all the bytes are to be copied as is, except the flag byte, which indicates a match and is followed by the distance/length pair (unless it is followed by another flag byte, in which case a single byte with the flag value should be added to the uncompressed data); the distance and length are one uint8 each; the distance is counted back from the current position, and 1 should be subtracted from it if it is greater than the flag (to account for the two-consecutive-flags special case); see for instance LZ77 on Wikipedia if you're not familiar with such compression methods
+ TXF is a very simple bitmap format:
- bytes 0x0 to 0xf are the header; the big-endian-encoded uint16 at offsets 0x4 and 0x6 are the image width and height, respectively; it is 1024×2272 for "font.txf"
- the rest is pixel data in the usual scanline order; when it comes to the "font.txf" file, there are two channels: the first is alpha and the other is value, I guess (it is worth 0xff for all the pixels)
+ "talk.dat" has the following structure:
- bytes 0x0 to 0x3 (big endian): the number of conversations (?)
- bytes 0x4 to 0x7 (big endian): should be the same as bytes 0x0 to 0x3
- following 56×(number of conversations) bytes: an array of conversation (?) structures; the first 4 bytes of each element (big endian) are the offset of the conversation start in the conversation data, in bytes (so if it is n, the conversation starts at n+8+56×(number of conversations) from the beginning of the file)
- the rest is conversation data; in particular, spoken lines begin with '\1' and end with '\0' (but of course, not all the '\1' indicate the beginning of a spoken line)
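The LZS rules listed above can be sketched as a small decompressor. This is only my reading of the description, not tested against the real "font.lzs": in particular, it assumes the length byte is the exact number of bytes to copy (no bias added), and the name `lzs_decompress` is made up.

```python
import struct


def lzs_decompress(blob):
    """Decompress an LZS blob following the header layout described above.

    Returns (extension, data).
    """
    ext = blob[0:3].decode("ascii")
    # Three little-endian uint32 values at 0x4, 0x8 and 0xc.
    out_size, _packed_size_minus_4, flag = struct.unpack_from("<III", blob, 4)
    out = bytearray()
    pos = 16  # compressed data starts right after the 16-byte header
    while len(out) < out_size and pos < len(blob):
        byte = blob[pos]
        pos += 1
        if byte != flag:
            out.append(byte)  # plain literal
            continue
        nxt = blob[pos]
        pos += 1
        if nxt == flag:
            out.append(flag)  # two flags in a row: one literal flag byte
            continue
        # Distances above the flag are stored +1 so they never equal the flag.
        distance = nxt - 1 if nxt > flag else nxt
        length = blob[pos]  # assumption: raw copy length, no offset
        pos += 1
        start = len(out) - distance
        if distance == 0 or start < 0:
            raise ValueError("match points outside the decompressed data")
        for i in range(length):
            out.append(out[start + i])  # byte by byte, so overlaps work
    return ext, bytes(out[:out_size])
```

If the output of the real file comes out shifted or too short, the first thing to check would be whether the length byte needs a constant added to it.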
That's basically it. I've also got some very basic understanding of "font.ffm", but normally one should not need to mess with it. It is necessary to associate the glyph position in the "font.txf" bitmap with its SJIS byte sequence. The kanji table given below should be enough. Ask me if you really need to know.
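The TXF header and the "talk.dat" offset table described above are simple enough to read with a few lines of Python. Again, this is a rough sketch based only on the layout given here; the function names are invented, and the check that the two counts match is my own addition.

```python
import struct


def read_txf_size(txf):
    """Return (width, height) from the big-endian uint16 fields at 0x4 and 0x6."""
    return struct.unpack_from(">HH", txf, 4)


def conversation_offsets(talk):
    """Return the absolute file offset of every conversation in talk.dat."""
    count, count_again = struct.unpack_from(">II", talk, 0)
    if count != count_again:
        raise ValueError("the two conversation counts differ")
    data_start = 8 + 56 * count  # header plus the array of 56-byte records
    offsets = []
    for i in range(count):
        # The first 4 bytes of each record are the offset into the data area.
        (relative,) = struct.unpack_from(">I", talk, 8 + 56 * i)
        offsets.append(data_start + relative)
    return offsets
```

For "font.txf", `read_txf_size` should give 1024 by 2272 if the description above is right.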
Some kanjis of "font.lzs", in order (starting from glyph 486):
With some more dedicated programming, we could actually get the whole game text translated. I'll look into it after I'm done dumping/translating the eboot, but if anyone is interested/able to do something with this info, by all means, go for it.
Wow! OMFG! I had some vague thoughts about this file containing glyph info, but I never thought it could possibly be edited to achieve such a goal. The post is awesome and perfectly clear to me. I have some idea of the algorithm, maybe I'll work it out when I have free time.
Just one question, to be sure I've read it right,
Some kanjis of "font.lzs", in order (starting from glyph 486):
means this is the text inside the txf file, I mean, "inside" the bitmap?
I've drafted a quick algo for this process, but I have a problem with unpacking LZS. I know it's a pretty basic compression method, and I wrote some packers myself ages ago, but now it just doesn't work out.
If someone could supply me with an unpacked txf or (better) a full Kanji glyph dump (as in the "Code" section of the previous message), that would be awesome.
Also, what should we do if the number of chars in a sentence (message) is odd? Should we add more bytes and reconstruct the whole message system?
Moreover, the Japanese talk.dat has more conversation messages than the US one. That was to be expected, since there are more Japanese sound files for dialogues.
Last edited by Tidusnake666; 10-20-2011 at 06:03 AM. Reason: Automerged Doublepost