Paper Mario TTYD File Research: samp
#1
So... you might or might not have noticed the other day that I posted in this thread for about 5 or 10 minutes before I deleted my post. Basically I felt like I was onto something but because I didn't really have much of anything yet, it seemed very awkward and the mounting anxiety compelled me to delete my post until I collected my thoughts for something with marginally more substance.

I figured I might as well make a thread of my own here, in the hope that I might find something that illuminates the otherwise seemingly undocumented .samp file, specifically the one in Paper Mario: The Thousand-Year Door. It's also a way to ask for help from people who might be better at reading patterns in hex files than me, if/when I get stuck.

My main problem is that I have no experience reverse-engineering file formats; this is my first time. As a result, expect me to stumble over basic things like hex representations of integers and floats, along with basic header patterns! Who knows, maybe I'll learn something from this. That'll be great if I do! 'o'


So, the basics: as people figured, there are sound effects in the pmario.samp file located in the TTYD ISO's /sound/proj/ folder. I've confirmed this by importing pmario.samp as Raw Data in Audacity. The only catch is that when I import the data, there is this awful LOUD, SCREECHY NOISE PERMEATING THE WHOLE FILE, SO IT'S NOT USABLE FOR ANYTHING BUT DAMAGING YOUR EARS. On the plus side, I could actually hear Bowser/Mario/Peach's voice clips past the midpoint of the noise! Based on other docs I've read, the format might be a custom-ish 4-bit ADPCM format, like the one the 'Cube uses for other things such as streamed audio. Makes sense.

The format I used to import in Audacity to hear the sound files in the horrible screeching fashion was 8-bit PCM, Big Endian, Mono @ 11025Hz.

VOX ADPCM, Big Endian, Mono @ 22050Hz also produced similar results. Obviously neither is usable for actual game purposes, but they were valuable to me because they showed that the sounds seem to be stored one after another, with no extra compression or scrambling on top of the audio encoding itself. That gives me hope: while splitting up the sounds and programmatically naming them might be a horrible endeavor, I can at least theoretically find and steal some ADPCM decoding code from a project like vgmstream to use in an unpacker/extractor. I'll do that if I get nowhere with what I'm currently trying.
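If these do turn out to be the GameCube "DSP ADPCM" format (not confirmed yet), the decoding step looks roughly like the Python sketch below. It assumes the usual DSP layout: 8-byte frames, where the first byte holds a coefficient-pair index (high nibble) and a scale exponent (low nibble), and the other 7 bytes hold 14 signed 4-bit samples. The 16 predictor coefficients have to come from somewhere else (possibly sdir, but that's a guess), so `coefs` is an input here; `decode_dsp_adpcm` and `clamp16` are hypothetical names.

```python
def clamp16(value: int) -> int:
    """Clamp a value to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def decode_dsp_adpcm(data: bytes, coefs: list[int], num_samples: int,
                     hist1: int = 0, hist2: int = 0) -> list[int]:
    """Decode DSP-style ADPCM: 8-byte frames, 14 samples per frame.

    coefs: 16 signed 16-bit predictor coefficients (8 pairs). Where these
    live for pmario.samp is unknown, so they must be supplied by the caller.
    hist1/hist2: the two previous output samples (0 at the start of a sound).
    """
    out: list[int] = []
    # Only whole 8-byte frames are decoded; a trailing partial frame is ignored.
    for frame_start in range(0, len(data) - 7, 8):
        frame = data[frame_start:frame_start + 8]
        header = frame[0]
        pair = (header >> 4) & 0x7      # which coefficient pair to use
        scale = 1 << (header & 0xF)     # multiplier applied to every nibble
        coef1, coef2 = coefs[pair * 2], coefs[pair * 2 + 1]
        for byte in frame[1:]:
            for nibble in (byte >> 4, byte & 0xF):
                if nibble >= 8:         # sign-extend the 4-bit value
                    nibble -= 16
                # Scaled nibble plus prediction from the last two samples, in
                # fixed point with 11 fractional bits (+1024 rounds to nearest).
                sample = ((nibble * scale) << 11) + 1024 + coef1 * hist1 + coef2 * hist2
                sample = clamp16(sample >> 11)
                out.append(sample)
                hist2, hist1 = hist1, sample
                if len(out) == num_samples:
                    return out
    return out
```

With all-zero coefficients the prediction drops out, so a frame like `bytes([0x00, 0x12, 0xF0, 0, 0, 0, 0, 0])` decodes to 1, 2, -1 and then zeros, which is a handy sanity check before pointing it at real data.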

Note: The hex editor I'm using is HxD, which lets me set how many bytes are shown per row in the hex and ASCII columns. I'm using 32 instead of the usual 16, which makes things much clearer for me. The wall of text below assumes this width.

Anyway... besides the .db/.db2 files, there are also a bunch of other files all named "pmario" in the same directory, with the following extensions:

.samp - Our prize right here, 10.7MB and confirmed by me to contain (PCM-based?) audio samples. According to some posts I've read, in other games it stores the instrument samples for sequenced music. Since TTYD uses streamed music, though, it just chucks the whole game's sound effects into it for shits 'n giggles.

I haven't confirmed that it's the GameCube 4-bit ADPCM format, but I'll do it eventually.

.sdir - Sound directory, a headerless file with a repeating 32-byte structure that encodes information about each sample. See post below for my current understanding of the structure.

.slib - No idea, but it's 275KB, compared to 87.6KB for the interesting pmario.sdir. It must be important. It contains many chunks of data (or files), each starting with a 4-byte integer giving the chunk's length, and that length includes the 4-byte integer itself.

.hrf - ??? Starts with "HRFi" as its first 4 bytes, followed by 8 bytes of unknown purpose (they didn't make any sensible numbers when read as 4-byte ints; maybe they're several shorts?), a null, then the filename "pmario.samp". After that comes a huge run of nulls, followed by some other bytes and "pmario_samp-0000000001.669". I doubt this will be of any use.
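One quick way to test the "several shorts" idea is to dump the same bytes under several interpretations at once and see which one looks sensible. This is a small Python sketch using the standard `struct` module; `interpretations` is a hypothetical helper, and the sample bytes in the usage lines are made up, not taken from pmario.hrf.

```python
import struct


def interpretations(raw: bytes) -> dict[str, tuple]:
    """Read the same bytes several ways so you can eyeball which one looks sensible."""
    n = len(raw)
    views: dict[str, tuple] = {}
    if n % 4 == 0:
        views["u32 big-endian"] = struct.unpack(f">{n // 4}I", raw)
        views["u32 little-endian"] = struct.unpack(f"<{n // 4}I", raw)
        views["f32 big-endian"] = struct.unpack(f">{n // 4}f", raw)
    if n % 2 == 0:
        views["u16 big-endian"] = struct.unpack(f">{n // 2}H", raw)
        views["s16 big-endian"] = struct.unpack(f">{n // 2}h", raw)
    views["u8"] = tuple(raw)
    return views


# Placeholder bytes for illustration, not the real contents of pmario.hrf.
for name, values in interpretations(bytes.fromhex("0000001000200030")).items():
    print(f"{name:>18}: {values}")
```

Big-endian is listed first because the GameCube is a big-endian machine, but little-endian is included in case the file was written by a PC-side tool.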

.etbl - Filename table for sound effects. The last byte before a new name seems to increment up to 0xFF and then roll back to 0x00, for no apparent reason and without changing anything else in the file. I don't get why. There's also junk in the names up to the 30th character, which I explain below.

There doesn't seem to be any associated offset or length data in here, which makes me think other data might reference it by a fixed index (or the index of the current pmario.samp sound): multiply the index by 32 (the length of each name record in bytes), then read 30 bytes to get the name of the sound. For example, SE3_AMB_RIVER1 is the third record, i.e. index 2 counting from 0: 2*32 = 64, so its record runs from 0x40 to 0x60 (0x40 + 0x20, the 32-byte record length in hex), which contains "SE3_AMB_RIVER1.IO_JUMP2..LING2..". Again, the junk is explained below.
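A minimal Python sketch of that lookup, assuming fixed 32-byte records whose first 30 bytes hold the name, as observed above. `read_name` is a hypothetical helper; it stops at the first null, so the junk never ends up in the result. The table in the usage lines is made up, with its third record copied from the example above.

```python
RECORD_SIZE = 32  # bytes per name record (observed in pmario.etbl/.stbl)
NAME_SIZE = 30    # bytes that hold the name; the last 2 bytes' purpose is unknown


def read_name(table: bytes, index: int) -> str:
    """Return the name in record `index` (0-based) of a .etbl/.stbl table."""
    start = index * RECORD_SIZE          # e.g. index 2 -> 2 * 32 = 0x40
    field = table[start:start + NAME_SIZE]
    # Everything after the first null is leftover junk, so cut there.
    return field.split(b"\x00", 1)[0].decode("ascii")


# A made-up three-record table; the third record mimics the example above.
table = (b"SE_FIRST".ljust(32, b"\x00")
         + b"SE_SECOND".ljust(32, b"\x00")
         + b"SE3_AMB_RIVER1\x00IO_JUMP2\x00\x00LING2\x00\x00")
print(read_name(table, 2))  # SE3_AMB_RIVER1
```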

.stbl - Other filename table for sound effects? Not sure why there are two files, might be contextual.

On the previously mentioned note, pmario.stbl may come before pmario.etbl judging by the junk in the pmario.etbl file. Also, I figured out what the ".IO_JUMP2..LING2.." junk means... it's basically a null terminator followed by an "after image" of the names that came before it*.

If a subsequent name doesn't take up the 30 characters allowed (it seems the last 2 are 'reserved' for a purpose I haven't identified yet), it simply reprints the last used chars on the next line. In a way I guess it's like if you copied the line you typed to the next line and used Insert mode to overwrite it partially.

Either way, it doesn't really seem to matter: even if the full string with junk is read into a 30-byte array for the filename (31 bytes if you count a null terminator), the null character ends the string before the junk data enters it. If the name uses the full 30 chars? The array ends naturally. Kinda simple in retrospect and not a major discovery that deserves me being windbaggy about, but knowing it makes me happier at least.
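To show why the junk is harmless, here's a toy Python model of the "Insert mode" behavior described above: each new name overwrites the previous record's 30-byte field and gets a null after it only if it's shorter than 30 characters. This is a guess at how the tool that wrote the tables behaved, not something confirmed, and both helper names are hypothetical.

```python
NAME_SIZE = 30  # name bytes per record, as observed


def overwrite_name(previous: bytes, name: str) -> bytes:
    """Toy model of the 'afterimage' junk: reuse the previous 30-byte name field,
    overwrite its start with the new name, and add a null only if the name is
    shorter than 30 characters. Whatever is left over stays behind as junk."""
    encoded = name.encode("ascii")
    if len(encoded) > NAME_SIZE:
        raise ValueError("name longer than 30 characters")
    if len(encoded) < NAME_SIZE:
        encoded += b"\x00"
    return encoded + previous[len(encoded):NAME_SIZE]


def c_string(field: bytes) -> str:
    """Read a fixed-size field the way C code would: up to the first null, or the end."""
    return field.split(b"\x00", 1)[0].decode("ascii")


first = overwrite_name(bytes(NAME_SIZE), "SE_LONG_NAME_ONE")
second = overwrite_name(first, "SE_B")
print(second)            # leftover bytes from the first name survive after the null
print(c_string(second))  # SE_B
```

A 30-character name leaves no room for the null, so `c_string` simply runs to the end of the field, which matches the "array ends naturally" case.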

pmario_sound_bgm_txt.db
pmario_sound_env_txt.db
pmario_sound_env_txt.db2

^
These all contain some sort of configuration data for the sounds played in-game, including streamed music, and they appear to use the tbl filenames as identifiers. As far as I can tell, the information stored in these files is effectively worthless from our perspective, aside from maybe giving slightly more verbose names for the music tracks. Everyone just renames those to the appropriate in-game names anyway, so even that information is worthless!

If you wish to examine these files yourself I can make a zip/7z archive of the proj folder and upload it to Mediafire or something. Either way you should be able to get the same files if you have a copy of TTYD (hopefully one you ripped yourself!)

*Example from pmario.etbl, nulls highlighted in red:
[Image: yZrVVNu.png]
Example from pmario.stbl, nulls also highlighted in red, with a 30-character name circled in orange. The 31st byte seems to increment almost randomly, and this example also shows the 32nd byte incrementing pointlessly:
[Image: 4ysyiBs.png]
#2
I found the sample rates!

I updated my OP to reflect this. I'm now pretty much certain that the 32-byte structures in pmario.sdir encode sample information; I just need to figure out what the other fields are.

The two bytes at the start of this structure are likely an index number, while I'm still trying to make sense of 0x04-0x07 (an incrementing integer) and 0x0C-0x0D. The latter is 0x3C00 (15360) everywhere I've looked. This number seems familiar; anyone got any ideas? Searches imply it could be volume, but I have my doubts.
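For what it's worth, 0x3C00 reads very differently depending on how you slice it. A quick Python check with `struct` gives three candidates: 15360 as a big-endian unsigned short; 60 and 0 as two separate bytes (60 happens to be the MIDI note number for middle C, so the high byte could plausibly be a base note, though that's pure speculation); and 1.0 as an IEEE 754 half-precision float (the `e` format code).

```python
import struct

raw = bytes.fromhex("3C00")  # bytes 0x0C-0x0D of an sdir entry

as_u16 = struct.unpack(">H", raw)[0]   # big-endian unsigned short
high, low = raw[0], raw[1]             # two separate bytes
as_half = struct.unpack(">e", raw)[0]  # IEEE 754 half-precision float

print(as_u16, high, low, as_half)  # 15360 60 0 1.0
```

Looking at the entries where this field differs from 0x3C00 would probably narrow it down: a base note would change by whole units of the high byte, while a half-float volume would show up as values like 0x3800 (0.5).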

I get the feeling that the last 4 bytes are some kind of offset, and I'm fairly certain they point into the huge chunk of random data below that I mentioned previously. I just noticed that the 4 bytes at 0x1C-0x1F (the first entry's last field) point exactly to the start of the 'random' data past the 0xFFFFFFFF, which seems to mark the end of the entry blocks, while the last entry's pointer points near the end of the file.

This implies that those structures I mentioned earlier starting with 0x0008 are in fact the beginning of what seem to be repeating 40-byte structures.
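Here's a rough Python sketch of checking that pointer theory, assuming the 32-byte entries start at offset 0 and the 0xFFFFFFFF marker sits where the next entry would begin. If the theory holds, the first entry's pointer should equal the marker's offset plus 4. All helper names are hypothetical, and the test file in the usage is synthetic.

```python
import struct

ENTRY_SIZE = 32
END_MARKER = b"\xff\xff\xff\xff"


def count_entries(sdir: bytes) -> int:
    """Count 32-byte entries from the start of the file up to the 0xFFFFFFFF marker
    (assumed to sit where the next entry would begin)."""
    pos = 0
    while pos + 4 <= len(sdir) and sdir[pos:pos + 4] != END_MARKER:
        pos += ENTRY_SIZE
    return pos // ENTRY_SIZE


def last_field(sdir: bytes, index: int) -> int:
    """Big-endian u32 at 0x1C-0x1F of entry `index`: the suspected pointer."""
    return struct.unpack_from(">I", sdir, index * ENTRY_SIZE + 0x1C)[0]


def pointer_matches_marker(sdir: bytes) -> bool:
    """True if entry 0's pointer lands right after the end marker."""
    marker_offset = count_entries(sdir) * ENTRY_SIZE
    return last_field(sdir, 0) == marker_offset + len(END_MARKER)


# Synthetic two-entry file: the pointers are 0x44 and 0x6C, the marker is at 0x40.
sdir = (bytes(28) + struct.pack(">I", 0x44)
        + bytes(28) + struct.pack(">I", 0x6C)
        + END_MARKER + bytes(80))
print(count_entries(sdir), hex(last_field(sdir, 0)), pointer_matches_marker(sdir))
```

If the pointers really do lead to the 40-byte structures, the gap between consecutive pointers (0x44 to 0x6C here is 0x28, i.e. 40) is another cheap thing to check on the real file.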
#3
Quote: Haven't gotten around to playing the game yet. I want to get the Dolphin developers to see if maybe they can see what's wrong first.

In the meanwhile, here's something.

Star Fox Adventures uses MusyX for sound and music playback. The musyxbin directory contains the output of MUCONV tool that's included in the MusyX SDK.

According to the documentation for the tool:


.proj Project data. This file contains the actual structure of
the converted project.
.pool Pool data. Contains macros and tables needed by the
converted project.
.sdir Sample directory. A file which contains a “directory”
describing all of the samples used in the project. Does
not contain sample data.
.samp Sample data.
Note that song files do not use the basename. These files use the
names assigned by their composers and have a .song extension.



These five different kinds of files are still in the folder, but for some reason someone decided to trim the last letter of each file extension. So instead of .pool, the game's version of the file is .poo for example (lol).

Also generated is a header file that allows the programmer to implement the music/sound data into the game.

All the data is "trimmed and compressed" so that it's optimized for the Gamecube automatically. They say their format is proprietary, so I don't know what format they compress it in. I have the SDK for MusyX, but I can't find anything to convert the converted data back or tools that can analyze the format (but I understand why that would normally be unnecessary, lol).

I don't know if there's a way to get everything back to a MusyX project file. It'd be nice because that way we can listen to Star Fox's music. For Dinosaur Planet's case though, I have no clue what they use for play back. According to DLLS.bin they use something called AMSEQ/AMSFX.

FWIW, the SDK comes with a converter to convert MIDI to the SONG format.

I found this by pure chance in a thread about Dinosaur Planet (N64) and the inclusion of some of its data on a Star Fox Adventures demo disc. It doesn't really tell me anything about converting the data to a usable format, but at least it confirms that pmario.proj and pmario.pool aren't really what I'm looking for, so I figure I might as well drop those from the investigation.

Also, current details on the sdir 32-byte block format:
[Image: lxFlOjp.png]
Code:
-----------------------------------------------------------------
| Offset |  Size   | Type  | Purpose                            |
|  0x00  | 2 bytes | Short | Index number, pointer?             |
|  0x02  | 2 bytes | ???   | Unknown, always 0x00               |
|  0x04  | 4 bytes | Int   | Offset to data in pmario.samp      |
|  0x08  | 4 bytes | ???   | Unknown, always 0x00               |
|  0x0C  | 2 bytes | ???   | Often 0x3C00, sometimes varies     |
|  0x0E  | 2 bytes | Short | Sample rate, often 16000Hz         |
|  0x10  | 4 bytes | Int?  | Unknown                            |
|  0x14  | 4 bytes | Int?  | Unknown                            |
|  0x18  | 4 bytes | Int?  | Unknown                            |
|  0x1C  | 4 bytes | Int   | Offset to other structure in sdir  |
-----------------------------------------------------------------
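The table above maps directly onto a Python `struct` format string: `>` for big-endian (the GameCube is big-endian), `HH` for the two leading shorts, `II` for the samp offset and the zero field, `HH` for the 0x3C00 field and the sample rate, and `IIII` for the last four ints, 32 bytes in all. This is a sketch; `parse_entry` is a hypothetical helper and the names for the unknown fields are placeholders.

```python
import struct
from typing import NamedTuple

# ">" = big-endian; H H I I H H I I I I = 2+2+4+4+2+2+4+4+4+4 = 32 bytes.
SDIR_ENTRY = struct.Struct(">HHIIHHIIII")


class SdirEntry(NamedTuple):
    index: int         # 0x00
    unknown_02: int    # 0x02, always 0 so far
    samp_offset: int   # 0x04, offset into pmario.samp
    unknown_08: int    # 0x08, always 0 so far
    unknown_0c: int    # 0x0C, often 0x3C00
    sample_rate: int   # 0x0E, often 16000
    unknown_10: int    # 0x10
    unknown_14: int    # 0x14
    unknown_18: int    # 0x18
    extra_offset: int  # 0x1C, points at another structure in sdir


def parse_entry(sdir: bytes, index: int) -> SdirEntry:
    """Unpack 32-byte entry number `index` (0-based) from pmario.sdir."""
    return SdirEntry(*SDIR_ENTRY.unpack_from(sdir, index * SDIR_ENTRY.size))
```

Printing `parse_entry(sdir, i)` for the first few dozen entries side by side is an easy way to spot which unknown fields change together.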
There doesn't seem to be any filesize information (that I can make out) in this 32-byte block, or in the 40-byte block pointed to by its last 4 bytes, but I figure it's not strictly needed: the size of sample n can be worked out as offset[n+1] - offset[n], where offset[n] is the pmario.samp offset stored in entry n and entry n+1 is the next 32-byte block.
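A small Python sketch of that size calculation. It sorts the offsets first rather than trusting the entry order, and treats the last sample as running to the end of pmario.samp. Both are assumptions, and any padding between samples would be counted as part of the sample before it. `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(offsets: list[int], file_size: int) -> dict[int, int]:
    """Map each sample offset to its size, assuming samples are packed back to back:
    size = next offset - this offset, and the last sample runs to end of file."""
    ordered = sorted(set(offsets))  # duplicates (shared samples) counted once
    sizes: dict[int, int] = {}
    for i, start in enumerate(ordered):
        end = ordered[i + 1] if i + 1 < len(ordered) else file_size
        sizes[start] = end - start
    return sizes


print(chunk_sizes([0x100, 0x0, 0x40], 0x180))  # {0: 64, 64: 192, 256: 128}
```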

I'm also trying to see if there is any filename index information that points to a name in pmario.stbl or pmario.etbl, but I haven't found anything concrete yet. I might try to make a program that parses pmario.samp using the data in pmario.sdir and pulls out each sample (over 1,200(!) in total, though), and I might check whether vgmstream supports headerless raw sample data so I can also try converting them to WAV. I'll do some research on that soon.
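Whichever decoder ends up producing the samples, wrapping the output as WAV is the easy part, since Python's standard `wave` module writes the header. A minimal sketch, assuming the decoder produces signed 16-bit mono samples and the rate comes from the sdir entry; `write_wav` is a hypothetical helper.

```python
import struct
import wave


def write_wav(path: str, samples: list[int], sample_rate: int) -> None:
    """Write signed 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)  # e.g. 16000 from the sdir entry
        # WAV stores PCM little-endian, unlike the big-endian GameCube files.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, `write_wav("sound_0000.wav", decoded, 16000)` would produce a file Audacity can open directly, with no Raw Data import settings to guess.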

On a semi-related note, here are all of the filenames in the tbl files without the junk following them. Not sure why there are ~300 more filenames in the tables than there are sample chunks. This is why I want to find some sort of index data if possible, because otherwise I could end up with the wrong names.

On the subject of pmario.slib, I noticed that it also seems to be a huge collection of smaller chunks of data (or perhaps files). The first four bytes of the file appear to be an integer giving the length of the chunk/file, counting those 4 size bytes as well. After each chunk, this 4-bytes-then-data pattern repeats, with a different size each time.

For example, the first four bytes in the file are 0x00000FC0 (4032); sure enough, when you highlight the first 4032 bytes, including the 0x00000FC0 itself, the selection length comes out to 0xFC0, or about 3.9KB. I have no idea what the data could be yet, but at least I figured that one out. The file ends with 0xFFFFFFFF.
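The chunk walk described above can be sketched in Python like this, assuming big-endian lengths that include the 4-byte length field itself and a 0xFFFFFFFF end marker, as observed in pmario.slib. `split_slib` is a hypothetical name, and the sanity check stops on lengths that can't be right instead of reading garbage.

```python
import struct


def split_slib(data: bytes) -> list[bytes]:
    """Split pmario.slib into length-prefixed chunks.

    Observed layout (unconfirmed): each chunk starts with a big-endian u32 giving
    the chunk's total length, including those 4 bytes; 0xFFFFFFFF ends the file.
    """
    chunks = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        if length == 0xFFFFFFFF:  # end marker
            break
        if length < 4 or pos + length > len(data):
            raise ValueError(f"implausible chunk length {length:#x} at {pos:#x}")
        chunks.append(data[pos:pos + length])
        pos += length
    return chunks
```

On the real file, the first chunk returned should be 0xFC0 (4032) bytes long, matching the hand check above.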
#4
Quote: Note: The Hex Editor I'm using is HxD

I like that hex editor too, since it's nice and simple, but I recommend Hex Edit, which lets you write file templates.

Check out the images here for an example: http://brawlimports.proboards.com/thread...-edit-mode

edit:
file template making tutorial
http://www.youtube.com/watch?v=snL0_rfBDNo
#5
(02-08-2014, 03:56 PM) TheShyGuy Wrote: Hex Edit

Seems neat but I might hold off on that for now. I wasn't aware it existed though, thanks for the info.