Users browsing this thread: 1 Guest(s)
TRIP WORLD analysis
#1
MODS, I don't know if this should be in Ripping Projects or Requests or what? But uhm this isn't actually any contribution yet so if this needs to be moved, that's okay.

Puggsoy wrote in my previous thread: "How do you go about figuring this stuff out?"
to which I promised to have more transparency in my future rips. As in, make a case study on how I rip. So, here's part one.

Anyway, Trip world (J). What a sweet game. Guaranteed Sunsoft quality.
Shame that if you open the ROM in Tile Molester you only see compressed graphics.
Today's practice: crack open the compression format.

DISCLAIMER: I've never looked into the game's innards except regarding some old level format study, so I'm practically going into this blindly a I type. WITH YOU. I apologise for badly worded text.

First off, I found that there are some uncompressed graphics in the ROM, namely, the protagonist sprites – nothing else. So it's safe to assume that the hedgehog enemy sprites are compressed (why? because they aren't uncompressed.)

So I played the game and found a hedgehog and made a save state.
[Image: hedgehog.png]

Great. Now that I opened the save state in Tile Molester, I found the hedgehog's graphics on the first gfx page (0x8040):
[Image: hedgehog2.png]

so, woo, mission accomplished? Naw, man... Now we'll scour the ROM for anything that looks like that. Apparently, at 0x22E54, there's a graphics page that looks... awfully similar Cute
[Image: compressed.png]

Don't see the resemblance? Here's a gif (emphasis on the hedgehog)
[Image: comparegif.gif]

So, now we know while the ROM contains that seemingly garbled data and the save state contains those pretty clean sprites of the hedgehog, there's some magic inbetween that the game does.
Our job is to figure out how those mangled graphics get transformed into the prettiness that is in the save state.

So, let's compare the bytes. I opened both files from the offset right after where the hedgehog data ends: the save state at 0x9040 and the ROM at 0x23E54. I read around $100 bytes backwards from those points and laid them on top of each other. Find differences and explain them, I thought, while indenting the bytes with spaces so the same bytes are parallel and conflicting bytes isolated. Warning: pretty wide image.. You probably have to open it in a new tab to see it completely.
So, at a quick glance looking at the bytes, it seems that the compressed (top) data is actually prettttty much the same as the decompressed data. In fact, it seems that there are extra bytes on the compressed data with all those $FFs and $FEs thrown in there every now and then.
The major differences hit me right in the eye – I've color-coded some parts for easier analysis. Green is 8-byte-match, purple is 7-byte-match and red is mismatch zone:
Okay.. How the hell does 82 00 E0 in ROM get decompressed into FE 00 00 00 00 00 00 E0 E0? What's the purpose of the $FEs and $FFs?
Well, after some intense staring, apparently, in the ROM:
• There's a 7-byte-match always before a mismatch zone.
• There's a $FF always before an 8-byte-match.
• There's a $FE always before a 7-byte-match followed immediately by a mismatch zone.
• There are never two 7-byte-matches successively.
• There are two kinds of mismatch zones. Some are three bytes and some are four bytes, but in the end they get decompressed to 10 bytes.

———Intermission———
So, like, what is the point of compression anyway? Minimizing redundant data. Instead of na na na na na na na na na na na na na na na na BATMAAAAAAAN it's much more efficient to say [na(16)] BATM[A(7)]N
Since we've only got four colors to work with, can you tell which is the most redundant color just by looking at the graphics pages of the state and the ROM? What would benefit the most from compression? I'll let you think about that.

okay yeah it's black. or transparent, in this case. which is 00. In fact, searching for a string of 00 00 brings.. ZERO RESULTS in the ROM portion. It's very common in the state portion, which is indeed the decompressed data. Write that down.
———End intermission———

• Mismatch zones always begin with the last byte of the previous chunk.
• Mismatch zones always end in either $BF, $C0, $E0.
• Three-byte mismatch zones begin with $80 and four-byte mismatch zones begin with $82.
• The mismatch zones that end in $BF all have the last byte copied from the next chunk, then seven bytes printed as-is.

I'm not sure, but looking at these observations, I think this is how they work:
(At this point I also realised that since there's always a 7-byte match after the special byte $BF, it's safe to assume that $BF is actually an instruction. Never mind it being grouped with the mismatch bytes, should've put it after them...)

Anyway, using this logical set of information we can make some rules for a protoype decompressor:
1. Check instruction.
2. If it's $FF, print the next eight bytes as-is and read next byte as an instruction.
3. If it's $FE, print the next seven bytes as-is, print the seventh byte again and read the eighth and ninth bytes as XY ZZ; print ZZ exactly X times, and if Y is a nonzero value, overwrite the last Y bytes with the byte after ZZ. If Y is zero, never mind that shit. After ZZ (or the byte after ZZ in a case of nonzero Y) read an instruction.
4. If it's $BF, print the next seven bytes as-is, but the first byte twice. Then read the next byte as an instruction.

Now, obviously there has to be more instructions, as these rules alone don't make a too efficient compression scheme. But now that we have a basic grasp of the common methods (instruction, bytes, how the RLEish bytes work), we can continue with relative ease.

So, to see if our conclusions were right, I backtracked the bytes as far as I could until I saw unknown instructions, copied the bytes from 0x23C73 to the end of the graphics page, 0x23E54, and used our prototype decompressor.
Here's the graphics in the ROM at 0x23C73:
[Image: gfxbefore.png]

Here it is after our decompression:
[Image: gfxafter.png]

So far so good! The output was 100% accurate.
Let's go even further back...

Reading backwards from 0x23BD1, I think that's the squirrel thing's sprite set. Again, I laid the bytes parallel to each other, the ROM being above and the state being below, isolating the conflict zones...... And I noticed a pattern. Can you see it?
well, here's the bytes again, color-coded.
As you can see, I was wrong about the "special bytes" and the $FEs function earlier.
I also noticed that the output is always in packs of eight bytes! This helped with the indenting of the state bytes tremendously.

But anyway, we've found some new instructions. If you don't feel like checking the wide images out, I'll list the new findings here:

Instructions, revision 2:
$FF: Same old, same old. Read the following eight bytes and print them all out as-is.
$FE: Read the following seven bytes, but print the last byte twice.
$FA: Read the following six bytes, but print the last and second to last bytes twice.
$F8: Read the following five bytes, but print the last byte four times.
$EF: Read the following seven bytes, but print the third byte twice.
$E8: Read the following four bytes, but print the third byte twice and the last one four times
$D8: Read the following four bytes, but print the second byte twice and the last byte four times
$BF: Read the following seven bytes, but print the first byte twice.
$AF: Read the following six bytes, but print the first and the second byte twice.
$80: Print the following byte eight times.
$82: Read the following two bytes, print the first byte six times and the second byte twice.
$88: Read the following two bytes and print both four times.
$8A: Read the following three bytes, print the first byte four times and the second and third twice.
$8B: Read the following four bytes, print the first byte four times, the second one twice and the rest as-is.

Note: I can't program, but I can do all of the tricks above with just key presses, so I assigned a textmate macro to fourteen keys.
[Image: keys.jpg]

Right. So, at this point, you should have gotten a pretty good grasp on what we're doing here; hunting for unknown instructions in the ROM, figuring out how they convert the bytes to those found in the state, and documenting them.
So what's missing? I don't know, but let's try decompressing some more graphics with out new prototype decompressor that operates by these new rules. So, let's take a snippet all the way from 0x23A196 and run it through the decompressor.

[Image: sqrl.png]

Flawless. But I must admit that I ran into an extra $00 on the way, just before the hedgehog's tiles. I deleted it, and I'm led to believe that instruction $00 alone means "these particular graphics end here."

So, we've made great progress. I used the textmate macro method and tried to decompress some other sprites and every time I stumbled into an unknown instruction, I just skipped it and analyzed it afterwards.
With enough save states and comparisons, I found so many new instructions that I think I could decompress most of the game's graphics now.
Here's a list:


$81: Read the following two bytes, print the first byte seven times and the second byte as-is.
$83: Read the following three bytes, print the first byte six times and the rest as-is.
$87: Read the following four bytes, print the first byte five times and the rest as-is.
$8F: Read the following five bytes, print the first byte four times and the rest as-is.
$97: Read the following five bytes, print the first byte three times, the second byte twice and the rest as-is.
$9F: Read the following six bytes, print the first byte three times and the rest as-is.
$A0: Read the following two bytes, print the first byte twice and the second byte six times.
$A2: Read the following three bytes, print the first byte twice, the second byte four times and the third byte twice.
$A3: Read the following four bytes, print the first byte twice, second byte four times and the rest as-is.
$A8: Read the following three bytes, print the first two bytes twice and the third byte four times.
$B7: Read the following six bytes, print the first byte twice, second byte as-is, third byte twice and the rest as-is.
$BVery Sad Read the following six bytes, print the first byte twice, the second, third and fourth bytes as-is, the fifth byte twice and the last byte once.
$DA: Read the following five bytes, print the first byte as-is, the second byte twice, third byte once, fourth and fifth bytes twice.
$DE: Read the following six bytes, print the first byte as-is, second byte twice, third, fourth and fifth bytes as-is and finally the last one twice.
$DF: Read the following six bytes, print the first byte as-is, second byte twice and the rest as-is.
$E0: Read the following three bytes, print the first two bytes as-is and the third byte six times.
$EA: Read the following five bytes, print the first two bytes as-is, and the rest twice.
$F2: Read the following five bytes, print the first three bytes as-is, the fourth byte three times and the last one twice.
$F6: Read the following six bytes, print the first three bytes as-is, the fourth byte twice, fifth as-is and finally the last one twice.
$FC: Read the following six bytes, print the first five bytes as-is, and the sixth byte three times.
$FVery Sad Read the following seven bytes, print the first five bytes as-is, the sixth byte twice and the seventh one as-is.


Also, turns out this compression is actually pretty weak. But what do I know.

So. I think I have enough data. I think I can do this. I'll dive into the ROM and try to decompress everything, starting from the beginning of the first graphics page at 0x8018 (it looks like some tail or something). I'll stop when I read an instruction $00. LET'S GO
here we go
[Image: phaseone.png]

aaaand DONE
[Image: phasetwo.png]

as you can see, the output is flawless. With some trial and error, staring at bytes and careful analysis of input and output, we were able to find out what happens in between and created a pseudo-decompressor.
Alas, having to physically push a key every time I see a new instruction is a pain in the ass, I propose that someone who's great at programming makes a decompressor. Doesn't need to be fancy, just needs a ROM, beginning-offset field and a big GO button that starts the engines. I'd also go with a "instruction $00 found, check next byte?" -dialog when it thinks it's done.
And uh, these aren't probably all of the instructions so maybe a "unknown instruction occurred, check log.txt" -dialog would be wise too.
Once there was a way to get back homeward
Reply
#2
Wow, awesome! I think I can see how you go about it now, I understood it all. Comparing savestates with ROM data though, that never came to mind. Certainly a useful technique. You'd need the determination for it, but I might give this a shot sometime. Being able to extract sprites directly from ROMs is a helpful skill!

I'll try whip up a program tonight. It'll be a command-line program but that should be sufficient for this.

EDIT: For testing purposes I've been looking for the same graphic points in the ROM that you used, but I'm not seeing the same thing. I've got the Japanese version of the ROM, is it supposed to be unheadered or something?
You may have a fresh start any moment you choose, for this thing that we call "failure" is not the falling down, but the staying down. -Mary Pickford
Reply
Thanked by: Raccoon Sam, TheShyGuy
#3
OH dear what a clown I am. I guess Tile Molester's grown on me too much.. I forgot to mention that you must set your canvas width to 1, then disable block size from full canvas, then re-stretch the width back to 16.
CHECK THIS OUT.

And, thanks a bunch for the decompressor. I'm a macfag but I can virtualize and run some exes through mono or wine.


EDIT: Also, in case I didn't put enough emphasis on "part one", I mean this is not all. This is merely the first part of a bigger journey to perfect rips. First we find out how to decompress the graphics, then we find out where the information on what tiles are what all the sprites created from, THEN we're kinda done i guess
Once there was a way to get back homeward
Reply
#4
Ah, thanks! And yeah this is obviously not complete, being able to automatically construct sprites from the tiles would be much better.

Also, while I was writing all the instructions into my program, I was wondering if there was a better way to do it. I had a helper function that simplified what I had to do, but I still had to write in checks for the instructions, and then what to do depending on that. Once I got to $BF I noticed it was getting quite tedious, so I had a better look at the bits of the bytes to see if there was some way to automatically know what each byte does. And there is. So here I'll have my own little lecture Smile



As you know, each byte has 8 bits. For the instruction bytes, these bits tell you how to read the following bytes to end up with 8 bytes in total. How it works is like this:

1 = read the next byte and print it.
0 = print the last read byte.

So for example, let's say we have the instruction $D8. In binary this is 11011000. Let's check these out one by one:

1 = read the following byte and print it.
1 = read the following byte and print it.
0 = print the last read byte.
1 = read the following byte and print it.
1 = read the following byte and print it.
0 = print the last read byte.
0 = print the last read byte.
0 = print the last read byte.

You can see that only four bytes are read, the second one is printed twice and the last is printed four times. This is the same as your instruction for $D8: "Read the following four bytes, but print the second byte twice and the last byte four times".

You can check it out yourself, convert the instruction bytes to binary and you should see that they match up with the instructions you deduced.
This also shows why $FF (11111111) is read the following 8 bytes and print them once each, and $80 (10000000) is read the first byte and print it 8 times.



As you can imagine, this is much quicker and simpler to implement than checking for specific bytes and doing specific actions depending on them. So I'm glad I discovered that.

You might also notice that for this to actually work, the first bit has to be 1; 0 means print the last byte, and there's no byte before the first one! This means that anything below $80 can't be an instruction (except for $00 which, as you said, is probably the end of that section). I'll account for this in the program.

EDIT: Done! Here you go; TripWorldDecompressor. Just download and unzip it somewhere, and double click the program (or start it through the commandline, whatever). It'll ask you to type in the ROM name, do that. It'll ask where to start decompressing, so type that in (without any $ or 0x prefix, just the hex offset directly). Once it reaches a null byte it'll say how many "chunks" it decompressed, and you can choose to let it continue or not. Keep in mind that the chunk counter is for the total of that session, so if it says it's done 80, then you say to continue, and it says 80 again, that means it's done 80 in total (not 160). Getting the same number twice like this also means two adjacent null bytes, so there's probably no graphics data for a while after that.
Anyway, once you've said no to a null byte notification, it'll stop decompressing and ask you to name a file to save the decompressed data to. Keep in mind that if you name an existing file then it will overwrite that completely. So just do something like "hedgehog.bin" or whatever. After that'll it'll end and you just press any key to close it.

If it reaches an invalid instruction during the decompression process, it'll tell you the byte it reached and then automatically stop decompressing. It'll go straight to the file saving question so you can save what it's decompressed so far.

So yeah, this should hopefully make the decompressing a lot easier!
You may have a fresh start any moment you choose, for this thing that we call "failure" is not the falling down, but the staying down. -Mary Pickford
Reply
Thanked by:
#5
My god!! It all makes sense now!! Haha, awesome find! You're a genius!
In fact, DK94 uses the same kind of bits-as-instructions type of compression.. No idea why this didn't occur to me earlier.

And to anyone who's been reading the thread; I can't stress enough how much just fucking around with the bytes helps. In the beginning, neither of us had no idea how the compression worked–when I laid the bytes out next to each other, at some point the idea of instruction bytes just hit me. Same with Puggsoy–just wondering if there was an easier method made him look into the bits deeper and BAM there it was.
Never stop being curious! STARE AT THE BYTES and MAKE NOTES

anyway

I'll decompress everything now and get on writing part 2. Stay tuned!!
Once there was a way to get back homeward
Reply
Thanked by: Garamonde, puggsoy


Forum Jump: