Hey there.

So... you use an ad blocker. That's cool. Sometimes we do too.

But without ad revenue, we wouldn't even be here. And we might not be here much longer.

Please disable your ad blocker and click to continue.

  1. #1
    Join Date
    Apr 2005

    Guide to Reversing on PS3 Linux Using the GNU Toolchain

    Today Disane has made available a guide to reversing on PS3 Linux using the GNU Toolchain.

    Prerequisites for it include computer science, assembly and C knowledge along with having available the GNU toolchain for the Cell Processor installed on PS3 Linux.

    PPC32-64 assembly:

    - IBM Introduction to PowerPC assembly
    - MacTech PPC assembly introduction
    - IBM's article series (1-4) on PowerPC assembly
    - Beginners guide to PPC32 assembly
    - MacTech PPC Function Calls
    - PPC ABI
    - IBM developers guide to the PPC architecture
    - PPC compiler writers guide

    Official PPC32-64 Docs:

    - PPC32
    - PPC64

    Cell Processor SPU assembly:

    - IBM SPU assembly article series
    - About the SPU ABI
    - Cell BE Linux ABI
    - Cell Processor ABI

    We are also going to use the following documentations when we are reversing SPU code:

    - SPU language specification
    - SPU Instruction Set Architecture

    So now that we are through the introduction. Let's get down and dirty. The first thing we need is an application we can reverse. Well, we could try and reverse existing applications compiled on our PPC64 Linux but that would take too much time and besides this is not a book. So we are going to start with something very basic.

    The application we are going to reverse today is the all times classic "Guessing Game" written in C. The idea is simple. The computer generates a number between 1 and 10 and the player has to guess which number has the computer generated. Simple, right? In C probably yes but in PPC and SPU assembly this could be a real challenge even for experienced programmers.

    Well we are not going to write the game in PPC and SPU assembly simply because of two reasons. The first one is that C is portable code. We can compile our C code to run on the PPU or on the SPU. Cool, what's the other reason? Well the other reason is that this is a tutorial on reversing SPU and PPU code and not an assembly tutorial.

    About the sample code I'm going to show you. We are not going to use classes or anything fancy this is a classic C (sequential programming 101) so no Object Oriented Programming introduced. No exceptions to invoke or anything fancy. We are going to use one single Game Loop to keep the game going until the player wins (yeah you can't lose in this game, but you can modify it if you want, the possibilities are endless).

    Alright so let's fire up Linux on our PS3 and open up the Terminal. I don't know about you, but I've got Ubuntu 10.4 on my machine. Locate a nice spot like ~/Projects/Reversing/guessing_game/ (use cd command to get to you home directory then mkdir to create the directories you need) and here we can create a text document using gedit or nano (type in gedit or nano).

    The rest of the Guide to Reversing on PS3 Linux Using the GNU Toolchain can be found HERE or linked above!

    Our C code is the following:

    [Register or Login to view code]

    Save the text as a C source code file. Let’s name it ppe_guessing_game.c .I think the sample code is well commented so I won’t be explaining what it does. I just assume you know C. (I think even beginner programmers should be able to understand the code.) Let’s compile this baby using gcc. The command looks as the following:

    [Register or Login to view code]

    That’s our PPU code. Nicely done so far! Now we are going to create another source code and name it spe_guessing_game.c. Our SPU code is a little bit different than the PPU code. Take a look at it first to see if you can spot the differences and then I will tell you why this code is different:

    [Register or Login to view code]

    As you can see the main() has changed a little bit, we have some interesting arguments here. Let’s see what they are:

    unsigned long long spe: is the SPU’s ID which is a unique SPU task identifier assigned by the operating system. Which means the OS selects the SPE where the program get’s loaded into.
    unsigned long long argp: is an effective address (EA) in the main storage to application parameters.
    unsigned long long env: The environment pointer, which is an effective address (EA) in main storage to runtime environment information. I really don’t want to get into how the SPU code is getting loaded and all the other mysterious questions you might raise here. So please refer to this tutorial if you have questions about programming the SPU:

    Okay so the SPE version of the guessing game has some little extra. It stores the SPE number in a variable and displays the ID of the SPE when the game starts up. Other than that these two games are all the same. Compile the game with the following command in your Terminal:

    [Register or Login to view code]

    We are done. Let’s check if the games are working fine. Type in:

    [Register or Login to view code]

    Test the game to see how it works then type in:

    [Register or Login to view code]

    Again, test the game then close it. If everything has gone well then we have our Binaries. Good work so far.

    [Register or Login to view code]

    This section will talk about using readelf, objdump, spu-readelf and spu-objdump . If you haven’t used these tools before then you might want to take a look at the following tutorial on exploring ELF files with readelf and objdump: http://www.linuxforums.org/articles/...jdump_125.html

    I will only going to show you what kind of arguments we are going to use and tell a little bit about these tools.

    So let’s start with the readelf. In this tutorial we are going to use the readelf file to view ELF headers and to detect sections of the executable ELF binary we created when we compiled the simple game above this section. So for starters the readelf arguments in general look like this:

    [Register or Login to view code]

    Navigate to the PPU_guessing binary using the Terminal and let’s type in readelf –h PPU_guessing to view the header of the PPU_guessing game:

    Alright, so now we know that we are dealing with PPU code. We know that Data is Big Endian code and it’s an EXEC (Executable File) file. We know that there are 38 Sections (This could probably hold debug information as well). We also see where the entry point to the program is: 0x10000430.
    Now let’s run readelf –S PPU_guessing. We get the following:

    As you can see this has lots info we can use. Like what sections are executable/readable/writable codes (by looking at the Flags column). Also we have the sizes of each section. The ones that are zeroed out are probably not interesting anyways. We also have debug sections you can check out. There’s a major difference between debug ELF executables and non-debug ELF executables. There are less sections in the ELF file and also the executable code itself is shorter, since you don’t have to store every register on the stack to make the value available to read by the debugger (ppu/spu-gdb). The list your seeing now won’t be explained now. We will discuss it in the objdump section of this tutorial. We are actually going to use objdump to view what’s inside of these sections.

    Now, try and use spu-readelf –h and spu-readelf –S on your SPU_guessing:

    As you can see the spu-elf consists of fewer sections. This means fewer data and code to worry about. Ok, so we have a bunch of sections here but how do we check what’s inside in each section. That’s where the objdump comes into play.

    objdump: objdump is the main tool we are going to use for reversing. You can look at it as a dump IDA disassembler but don’t be fooled. objdump can be still very powerful. First of all it allows us to view PPU and SPU code. Which IDA currently does not implement is the SPU disassemble and without the proper SPU processor plugin you can’t view/reverse SPU code using IDA. We are going to use objdump for two main things. The first one is viewing data and addresses. The second thing is viewing PPU and SPU code plus the address of each code segment.

    If you look at the top readelf/spu-readelf –S listings. You should see that there are many sections to disassemble and view. Let’s take a look at the most important sections which we are going to touch in this tutorial.

    .init: Holds the process initialization code. Gets called before main().
    .text: Executable section. This is where the program code goes. You might think that only main() and the other functions which are used by the application reside in this section. Well, the truth is that you get a lot more than that. More details in the “Reversing PowerPC 32 code” section of this tutorial.
    .fini: Holds the process termination code. Gets executed when your program exits normally.
    .rodata: A non writable section. Holds read only data which are strings, mostly. An ASCII code table could come in handy when dealing with .rodata.
    .bss: Uninitialized or so called “null” data goes here. When the program executes this data gets initialized as 0.
    .data: Initialized data.
    .got: The Global Offset Table. Nearly everything gets redirected to this address. It’s very handy to know this address.
    .plt: The Procedure Linkage Table. The base address for some libc function calls.

    note.spu_name: holds the name of the executable. For example: SPU_NAME.SPU_guessing
    For more information on the ELF and it’s special sections check out: http://www.skyfree.org/linux/references/ELF_Format.pdf

    Now, let’s check out some of the sections with objdump. Here’s some of the arguments you can give to objdump:

    [Register or Login to view code]

    So let’s go ahead and display the .rodata section of the PPU_guessing using the command below:

    [Register or Login to view code]

    Here we can view the read-only data (like strings) which are located in our program. The first column shows the address while the other 4 next to this shows the data in hex (each row holds 16 bytes of data, remember that one character is 1 byte which can be represented with 2 hexadecimal digits).

    Let’s check out what the following command gives us:

    [Register or Login to view code]

    Now, you might think hex-editing is the same as hacking. Not at all dear Sir, you are wrong. Hacking is about reversing parts of the application which you are interested in or want to abuse (remove nag screens, remove trial counter, make the application fully functional if it’s a shareware). In order to do that, you’ll need to reverse engineer parts of the application and this is the part where assembly knowledge comes in handy. In the next section we are going to dive deep into reversing our PPU_guessing program.

    Reversing PowerPC 32 code

    Before we go into detail, there’s something I want to show you. If you look at the .text section using the command below:

    [Register or Login to view code]

    As I mentioned above the .text section has more on store for us than you would expect. Normally you are expecting to get main(), initialize(), player_guess() but you get things like:

    _start: The first thing that gets executed by the Operating System. Also sets up the stack and It’s in fact a wrapper for __libc_start_main. Which is a library function inside libc. It’s interesting to note that we can actually create our own _start function enabling us to make the start code smaller.
    _do_global_dtors_aux: The destructor function for this program.
    _do_globabl_ctors_aux: The constructor function for this program.
    [email protected]: srand()
    [email protected]
    [email protected]: initializes the C library and calls main()
    [email protected]: calls printf()
    [email protected]: calls time()
    [email protected]: calls scanf()
    [email protected]: calls puts() although we know from the source code this wasn’t called specifically you’ll see how the compiler has swapped some of the printf()s to puts().
    [email protected]: calls rand()
    _glink_plt_resolver: resolves linking when the program executes.

    It’s nice to know some of these (C runtime) functions as well, cause at least we know that they’re not interesting for us to reverse. The good thing is that we know which functions are being called from libc this can make our job a lot easier.

    Check this out for more info on the above functions: http://www.mentalcases.net/texts/sec...ctionStudy.txt

    So now that we know what not to reverse there are only 3 functions left. Let’s view them:

    [Register or Login to view code]

    Still a lot of code to look at but it’s doable First of all, we need to review our knowledge on the PPC ABI and a little bit of our assembly knowledge to make a sense out of this huge blob of code. Let’s start with the ABI.

    PPC ABI review: First of all we won’t be reviewing the whole ABI for that you might want to read the articles in the Prologue section of this tutorial.

    r0 Volatile register used in function prologues (used to hold the LR)
    r1 Stack frame pointer (SP)
    r2 TOC pointer
    r3 Volatile parameter and return value register (sometimes used as parameter 1)
    r4-r10 Volatile registers used for function parameters 2 to 7
    r11 Volatile register used in calls by pointer and as an
    environment pointer for languages which require one
    r12 Volatile register used for exception handling and glink code
    r13 Reserved for use as system thread ID
    r14-r31 Nonvolatile registers used for local variables

    f0 Volatile scratch register
    f1-f4 Volatile floating point parameter and return value registers
    f5-f13 Volatile floating point parameter registers
    f14-f31 Nonvolatile registers

    LR Link register (volatile) you can’t use it as a general purpose register
    use mflr (move from LR) and mtlr (move to LR) to access its contents
    CTR Loop counter register (volatile)
    XER Fixed point exception register (volatile)
    FPSCR Floating point status and control register (volatile)

    Condition Registers:
    CR0-CR1 Volatile condition code register fields
    CR2-CR4 Nonvolatile condition code register fields
    CR5-CR7 Volatile condition code register fields

    The PPU stack frame organization:

    The information above comes from the http://refspecs.linuxfoundation.org/...64abi-1.7.html document.

    Here’s a little bit of an explanation on what the Link Register is and what the Stack Pointer is:

    Link Register(LR):
    The LR is actually a special purpose register which holds the address to return to when a function call completes. In other words this register stores an address where the program counter will jump to when the execution of the function is complete. It’s always moved from the LR and stored on r0 when a new stack is being built.

    Stack Pointer (stored on r1):
    Stores an address, that points to the top of the current stack. It’s also important to note that not every stack has the same size. The size of the Stack depends on the local variables and the number of saved registers on the stack. But there is a minimum size which the Stack must have where it can store the LR and the SP. When your decrementing the stack you are “allocating” Stack space and when your incrementing it then you are actually “deleting” the Stack.

    After this small review of the ABI let’s get back to our main() function and it’s PPU code:

    Setting up the Stack: Prologue and Epilogue
    In this section we are going to review how the stack is handled and how the ABI works. This will make our job easier to see what data is being stored on the stack. We are interested in Local Variables and looking at how the Stack is being setup, we can detect them.

    Prologue Phase:
    In the first few lines of our main function, we can see how the Stack is being setup. This is an important part which we need to investigate.

    [Register or Login to view code]

    Here we increment the Stack Pointer by -32, which means we move the pointer downwards by 32 Bytes and we store the stack pointer back in r1. This is why we call this instruction Store Word and Update. We also store the Stack Pointer (also known as the Back Chain Pointer, which is a Pointer to the previous Stack) on this address. This is how you allocate space on the stack. We always go from the higher address to the lower and we use a pointer to tell which Stack are we at. So now we store the “base” address of our new stack on r1 and now we can refer to r1 as the Stack Pointer. Remember that we said that the Back Chain is at SP+0 which is true. We are at that spot.

    [Register or Login to view code]

    Here we Move the LR into r0. So we can save it:

    [Register or Login to view code]

    on SP+36 which is an interesting thing. Remember that we already decremented the SP to create our Stack. But now we are actually jumping back to our previous stack and place the LR on the “bottom” of the previous Stack.

    [Register or Login to view code]

    Here we move the contents of r1 into r31 to create a frame pointer (a pointer to the top of the current stack).

    [Register or Login to view code]

    These two instructions store argc and argv on the stack. Note that we are using the frame pointer as the base address.

    Program Code:

    [Register or Login to view code]

    Here we can see 2 Branch and Link instructions. Which means the instruction pointer (or program counter) jumps on to initialize() and saves the address of where it jumped from on to the LR and increments it by 4(Bytes) so when the blr (branch to the address in LR) is called from initialize() the program counter won’t jump back to the bl initialize() code rather on to the next instruction(bl player_guess). We won’t be following the program counter for now cause we haven’t really figured out what else does the main() function perform. So let’s move further:

    So far we know that main() calls initialize() and player_guess() right after initialize() was called.

    [Register or Login to view code]

    So what happens here? That’s the million dollar question. Actually this isn’t that hard to find out. As you can see the first instruction loads in 4097 into r0, but this time it shifts the value by 2 bytes. Let’s see how this looks like:

    4097 (decimal) is equal to 0x1001 (hexa-decimal, just grab a calculator ). After loaded into r0 it looks like this:
    0x10010000 in the next instruction we move this value into r11 from r0 and use this as a base to add 4144. Let’s see how we add the two addressed to create one:

    [Register or Login to view code]

    We get this value in r9. What is this? Definitely not a value cause it’s too large to be some kind of value of a variable. But if you look at the address where this instruction is (100005c0) you should notice that the value we got also starts with 0x100… Hmmm are you thinking what I’m thinking? Yeah this is an address in our program. But the value is so large that it’s beyond the .text section. So where does this address point to? Let’s type in readelf –S PPU_guessing again to view the sizes and addresses of each section.

    Just as I expected: our address is inside the .sbss. The .sbss is exactly 8 Bytes long. So there’s probably data inside. Let’s open up the .sbss using objdump -D -j .sbss PPU_guessing.

    At the address 10011030 we have <guess> .Bingo! We can see that r9 stores the value of guess. I think we can move on to the next instruction. Again, we have these three instructions:

    [Register or Login to view code]

    I highlighted the most important part in these 3 instructions. Yes, you can view the address if you combine these two. So we get 0x1001102c inside r0. Which is an address in the .sbss to <game_number>. This means r0 stores the value of game_number. Great job so far. Now, we know that we are loading in 2 important values into r9 and r0. But here comes something even more important. These two values are being compared by the following instruction:

    [Register or Login to view code]

    Basically what happens here is that. We compare these two “words”. If you look back at the previous instructions then you can see that these two values were loaded in as “words”, a word is exactly 4 Bytes long, remember that on 32 bit mode an address is 4 Bytes long, quite explains why you can only address 4 GBs of memory right? . But that’s not important what’s important is that we are comparing these two words, r9 with r0 and we are setting the bits on cr7 (Condition Register 7). After the bits are set we move on to our next instruction:

    [Register or Login to view code]

    Ok, so we are branching again. Our instruction is the Branch if not Equal (+ predict, that the branch is likely to be taken). Very interesting fact is that the machine knows that in most cases the player will be wrong . If the branch hint was wrong and the player guessed the number then the processor has to unload the code it has loaded (the hinted code) and loads in the code which it did not predict. Branch prediction is good for optimization but I think you don’t want to use it random situations (especially in real time applications). In this case we can assume that the player will be false and besides we need to take the other branch only once so we won’t lose too much overall speed. So let’s get back to our instruction. We are checking the bits inside cr7 and if the prediction was right then we go back to this instruction:

    [Register or Login to view code]

    Oh, yes! This means that if the two values are not equal then the Program Counter jumps back to player_guess. Which we can assume is the function which takes the guess value from the player. But we will see once we get there. We now know that main() has some very important role in this program. It loads in the values guess and game_number and compares them and branches according to the result. Let’s see what happens if the player was correct:

    [Register or Login to view code]

    Again we can see Load Immediate Shifted and his buddy Add Immediate with Carry. But this time the situation is a bit different. The 0x10000a2c address gets loaded into r3, which makes me want to think that we are witnessing a function call. According to the ABI the r3 is the first argument that gets loaded when a function is being called. If you look at the third instruction, you can see that indeed we are branching and Linking into puts(). puts(), “WTF?” you might ask. Why is puts() being called and not printf(). I guess the answer is simple but you won’t like it. The compiler has decided to use puts() instead of printf() cause of one reason and that has something to do with the address we are loading into r3. Let’s check the address:

    [Register or Login to view code]

    Our address this time is inside .rodata according to the section addresses. If your still scratching your head how did I come to this conclusion then let’s check the section addresses together:

    [Register or Login to view code]

    Ok, we have 0x10000a2c which is greater than 100009e0 (.fini ) and 10000a18 (.rodata) but not greater than 10000b9c(.eh_frame_hdr). Which makes me want to think that the address we are looking for is between .rodata and .eh_frame_hrd. So we open up .rodata (this is where read-only data is stored, makes sense, right?):

    [Register or Login to view code]

    Well, there’s no such address mentioned here. But we can guess that the address is somewhere here. So let’s look at 0x10000a28. We need to calculate where the data resides. Let’s subtract 0x10000a2c with 0x10000a28:

    [Register or Login to view code]

    That means 4 Bytes. So our data starts from here:

    [Register or Login to view code]

    Here we need to cut 4 bytes which means our data is the one in the green highlight and the red is the one we cut off (remember 1 Byte is two “digits” in Hex-Decimal). We can check if we are looking at the data we are searching for. We need an ASCII table to see that. Since we know that each character is two Hex digit. We can decode the characters from the hex-decimal code manually:

    [Register or Login to view code]

    I think this is it we’ve found the data. Let’s see how it looks like:

    [Register or Login to view code]

    Our data is the one I highlighted with green. Okay nice so puts() writes out “Oh goody, you guessed the number!”.
    Since there’s no data to display GCC thought that puts() is enough to do the job there’s no need for printf(). So how did I find out where’s the end of the string in .rodata? It’s simple if you’ve ever ran the program your reversing then it’s pretty easy to tell how strings are displayed. You can see them and write down the messages the program writes out. There’s another way to check this as you can see there’s a little bit of gap between each string. The gab consists of 0’s. Let’s move on to our next instruction series. This next one is also a function call. But this time we are calling printf(). This means we are going to display data as well as strings. Let’s see how this is preformed:

    [Register or Login to view code]

    Alright let’s check the addresses. 0x10000a50 points to “the number was indeed %d./nCongratulations!!!” where we can see data as well and this data is stored on 0x1001102c very familiar address. Isn’t this game_number? Indeed it is game_number. After loading the string and data into r9 and r0 we move the string from r9 into r3 and the data from r0 into r4. Then something interesting happens:

    [Register or Login to view code]

    crclr stands for Condition Register Clear. Basically what it does is simple. It xors the Condition Register with its own values which means it clears every bit to zero. But here we are only touching cr1 more specifically we are clearing the equal bits. Which is the bit field of cr1 that indicates if the result was equal or not. I have no idea why is GCC doing this :-/

    Anyway, after that we Branch and Link into printf() with the two arguments to print out the string and the data together.
    Epilogue Phase:
    From now on we are moving into the Epilogue phase of the main() function. Let’s see how GCC handles that:

    [Register or Login to view code]


    So let’s see what we know about main():

    Stores argv and argc on the stack.
    Calls initialize() and player_guess().
    Loads in the game_number and guess and compares them if they’re equal if not then the Program Counter jumps back and calls player_guess() again.
    Else it writes out the Contratulations message with the game_number and then the program ends.

    This is vital info in reverse engineering this small program (not that you would need to reverse engineer something as simple as this xD).

    So now that we know what main() does then let’s look at initialize() and player_guess(). I won’t break the code down like I previously did. I’m going to just comment it and when something special comes up, then I’ll make a more detailed explanation:

    [Register or Login to view code]

    Here’s a little help on recognizing MODULO in any future projects. I’ve managed to create a program which consists of the following code:

    [Register or Login to view code]

    The example code is very simple. We are creating static data on the stack, 2 Integers and a double. The double stores the result. That means we will see some use of the float instructions and the floating point registers. Let’s look at the screenshot I made after disassembling the ELF with objdump:

    Notice one important thing. We can see what the operands are and how the MODULO code uses them. This can be helpful in recognizing operands in our PPU_guessing example.

    We can see the +1 immediate in the end and the result of rand() stored on r9. The only thing we can’t spot is the 10 immediate which we are “dividing” with (Notice rand() % 10 in the source code of PPU_guessing.)

    [Register or Login to view code]

    If you are asking “How the hell did you find out that those instructions were actually the MODULO operator?”. This is when experience starts to kick in. If you had checked out a C source code (compiled with GCC) that uses the MODULO, you would’ve known that those instructions are indeed “hiding” a MODULO operator. From now on always check the assembly source code your compiler is generating. Then try and understand it. It’s easier if you have the C source code. But without the source code you can only rely on your own knowledge.


    So here’s what we know about initialize():

    Writes a welcome message.
    Uses time(), srand(), rand() and the MODULO operator (% in C) to generate a random number between 1 to 10
    Stores the random number as game_number (on game_number’s address)
    Writes out a message that the number was generated between 1 to 10

    Next up, we have the player_guess() function. What we already know about player_guess() is that it probably asks the player to input values. Let’s see how this is done:

    [Register or Login to view code]


    So what do we know about player_guess?:

    player_guess() asks the player to input an integer. It doesn’t check the size of the number! But it doesn’t matter cause, you can’t input anything larger than what an integer can hold. If you input a character or a string then the program goes into an infinite loop. So no character or string check as well. So the data is being read by scanf(%d).
    After that the program compares game_number and guess. If guess is less than or equal to game_number then the program checks if guess is actually larger than or equal to game_number if the two numbers are equal then we branch and link out of this function. The code to check if game_number and guess are equal is in the main() function.

    We can consider this as a big bottleneck to performance cause, the Stack has to be rebuilt each time the player inputs the wrong number. It would be better to not to use a function for this rather copy the code into main().
    We’ve managed to find out how the program works exactly by looking at its PPU assembly code. The information we’ve gathered is enough to rebuild this program or to view its flaws. We can also rewrite the whole program in C if we really want to. Yeah I know that the MODULO section was pretty tricky but if you view the assembly code your compiler generates then you’ll be able to spot things like these easily. Keep viewing PPU code and you’ll be able to reverse anything you want.

    Our next section will talk about reversing SPU code.

    Reversing Cell SPU code

    Now that we know how to use spu-gdb, spu-objdump and spu-readelf, we should move on to reversing our sample program. There are not too many articles or tutorials about reversing actual SPU code (actually I wasn’t able to find any but I’m sure you guys will prove me wrong). So this part of my tutorial might be very interesting for newcomers to SPU assembly language and reversing.

    So as I mentioned before in the prologue section of this tutorial (nah, not the prologue section where we setup the stack ). Open up the docs SPU ISA and SPU assembly language specification. Don’t worry you don’t have to actually read these documents from the beginning to the end. We are actually going to work with these documents when we are reversing our code. You can download these pdf’s or just use the online version from IBM.

    Alright I hope you are well rested cause, we are going deep into SPU code today. SPU code is a lot easier to reverse using objdump cause, you can view the data right away, and you don’t have to calculate the addresses. Also the addresses you need to load into your registers are smaller numbers and you don’t have to combine them. The LS can only store about 256 KBs of data. This means, if you want more memory, then you need to use MFC DMA to access the main memory. Using MFC DMA isn’t easy to use. You need to use 2^4 byte alignment and use some special function calls in order to send and receive data from the MFC DMA traffic. Our example program doesn’t handle this but you can try and write something that utilizes this fantastic feature of the CELL/BE or check out the tutorials I mentioned above.

    Let’s cut the chit-chat and look at the code we have here. Oh I almost forgot, we haven’t used objdump to view the code, so here are the magic words you need to use:

    [Register or Login to view code]

    As we did previously, I’m going to run and explain most of the sections we won’t be touching when we are reversing the program. I will only list the parts we won’t be touching. I won’t be commenting the ones that were already commented in the PPU section. Here we go:

    [Register or Login to view code]

    “This performs the assembly language magic needed to stop the SPE, signal the PPE, and pass a single argument. __send_to_ppe is called with three arguments:
    • The first one is the signal to send to the PPE. PPE callbacks begin with the hex digits 21 and the next two hex digits refer to the callback number being called.
    • The second argument is used for callbacks that code multiple functions and basically serves to allow the callback function to switch functionality based on the value. This value basically gets plugged into the high-order byte of the data pointer. Most simple applications just use 0 for this argument.
    • The third argument is the data (usually a pointer) to pass to the PPE. Remember, however, that the high-order byte of the third parameter gets replaced with the value of the second parameter when the system actually passes the arguments.”

    Taken from: http://www.ibm.com/developerworks/library/pa-libspe2/

    [Register or Login to view code]

    As you can see the C runtime functions are actually passed to the PPE. The SPU doesn’t call them directly. Interesting mechanism, isn’t it?

    The most important parts are as follows:

    [Register or Login to view code]

    Notice that we are using $ (Dollar) to tell the programmer that he’s looking at a register.

    The ABI is pretty much the same as with the PPU. We have $0 which stores the current LR (Link Register or Back Chain Pointer). $1 stores the SP (Stack Pointer). $2 stores the environment pointer, which is used by some programming languages. $3 to $74 are used as arguments. $3 is the register that stores the returned result. We can safely assume that we won’t be storing any arguments on the stack that’s for sure with this many registers.

    After the listing I guess we should take a look at the code more closely:

    [Register or Login to view code]

    As always SPU-GCC makes a label for main().


    [Register or Login to view code]

    Very standard. We store the LR on the previous stack (_start). stqd stands for store quad word (D-form). I haven’t really mentioned the addressing forms yet. But in SPU assembly you are almost always telling the assembler explicitly which addressing mode you are using. Unlike PPU assembly where you just use what you like when storing or loading. The D-Form is very simple basically you are adding an immediate to the register. Here we are adding 16 to $1 (register 1, which stores the SP). $0 stores the LR of course. These two are dedicated registers. Since this is a store command it means that we are storing $0 on SP+16. Notice how we store the previous SP (Stack Pointer) at the bottom of the stack. Then we create the 80 bytes long stack by decrementing the SP (Stack Pointer) in $1. Unlike in PPU assembly in SPU assembly you cannot store something or load and then update the registers. So here you need an extra Add Immediate (ai) to decrement the SP.

    Program Code:

    [Register or Login to view code]

    Here is something very interesting. The first instruction loads in data from SP+32 and moves it into $2 then creates a control for word insertion. Basically this can be used to mask off unwanted bits. The next instruction shuffles $3 and $2 using $5 as control bit and stores the result in $2. This way we are moving every bit into the so called preferred slots. This is an important part. If the data isn’t aligned this way then you’ll get false data or so called data corruption. In order to counter that, we do the alignment. To understand control bits and bit shuffle refer to the SPU_Assembly_Specification document. Then we store the data back into SP+32.

    [Register or Login to view code]

    The other data gets the same treatment and gets stored back on SP+48.

    [Register or Login to view code]

    Same as before only this time we are storing everything back into SP+64. If you are wondering what these could be then look at the following code:

    [Register or Login to view code]

    Here we can see that we are reading something out of SP+32 and loading it into $2. Then we pass it to a variable on host ‘s address. The three variables SP+32, SP+48 and SP+64 are the arguments which we are storing on the stack. The spe_id, argp and envp. Which means we are storing the spe_id inside host.

    [Register or Login to view code]

    Very familiar instruction sequence. Branch Relative and Set Link is the name of these two instructions. Here we are using them to call 2 functions: initialize() and then player_guess().

    [Register or Login to view code]

    We load guess into $2 and game_number into $3. The third instruction does the comparison and writes the EQ bits into $2. Yeah you are not mistaken. Unlike the PPU, the SPU does not have conditional registers. It uses the “GPRs” for that. The third instruction branches if the result in $2 is Zero. If so then we jump back and call player_guess. If $2 and $3 are not equal, then we branch back. Remember if EQ is set then the two registers are equal, but we are checking for zero, which means we are checking if the two are not equal. After this we load our address (0xc20) of a string into $3. Our string is the following: “Oh goody, you guessed the number!”. After outputting this string with puts(). We are loading in game_number’s address into $2 and a string address (c50) into $3. $4 gets a Zero. Then we can call printf(). Our message will be the following: “the number was indeed %d Congratulations !!!/n”. After that we clear out $2 and $3 and load a Zero inside them.

    If you are interested in where the symbols are for the host, game_number and guess. Check out the .bss (uninitialized data) section using the following command:

    [Register or Login to view code]

    You should get the following results. As you can see we have a pointer to the effective address of the LS (Local Store) as well:

    [Register or Login to view code]


    [Register or Login to view code]

    Nothing fancy here. We increment the SP ($1) by 64 to destroy the current stack and then load the LR from the previous stack stored at SP+16. Then we use bi to Branch indirectly out of here and end the program.


    Let’s summarize what we managed to discover about main(). In general main() is almost the same as we saw in the PPU code. Only this time we are storing SPE_ID inside host and we have 3 arguments which we store on the stack. After that the program calls initialize(). This happens only once. After that we enter a loop where the program keeps asking the player for a number between 1 and 10. The main() loop does the comparison of guess and game_number and branches according to the result. If the two numbers are not the same then the program calls player_guess. If the two values are the same then the program writes out some strings to congratulate the player and the game_number as well. After this the program ends.

    Other Functions:

    Now, we are going to go through the rest of the functions briefly (as we did with PPU_guess):

    [Register or Login to view code]

    As you can see our MODULO operator is back. Here’s the objdump I made of the SPU_Modulo sample program (!) that uses the MODULO operator:

    There are two numbers stored on the stack: operand_1 on SP+64 which has the value of 50 and operand_2 on SP+48 which has the value of 100. The two data are being loaded into registers 2 and 3. this is a big thing cause in the PPU version we weren't able to spot all the operands. If we look at our SPU_guessing, then we can see that we are loading in 10 (immediate) into $2 and the result of rand() is inside $3. On 0x294 address (code) we can see that 1 is being added to the result.

    [Register or Login to view code]


    First thing this function does is that it welcomes the player then writes out which SPE is being used using another message. This function initializes the game. It generates a number between 1 and 10. We can finally read what kind of operands we are dealing with here. After generating the number the program stores it as game_number. So the generated number is actually game_number.

    [Register or Login to view code]


    The player_guess() does some of the logical decisions of the program, but not all of them. It asks for a number (no limits) and then compares it with game_number. If guess is greater than game_number, then we write out that guess was too large and the number we are looking for is smaller. After that the program destroys the stack and goes back to main(). Where it jumps back to the player_guess and creates the stack again. (What a waste, this logic should be inside main(). The problem is the same as with the PPU_guessing program.) But if guess is not greater than game_number then we check if guess is smaller than game_number. If so then we write out that the number we are looking for is larger than guess. If not then we exit from the function directly. Meaning that the two values are probably equal and main() will check that separately.

    Right, we are done reversing.

    When we fail miserably - I bumped into several problems during the time I wrote this tutorial. It appears that using objdump on PPC64 code which is compiled with the “-dynamic” flag (normally if you don’t ask for “–static” explicitly then you get dynamic code) you can’t tell which function from the dynamically linked object file is being called. The problem is that some function calls like the libc functions get resolved dynamically. Which means you can’t determine what kind of function call was made. You only get redirected to .glink PLT resolver code which makes no sense to mere humans. The .got doesn’t store an absolute address to these calls which brings us back to the problem I mentioned in the last sentence. What can we do then? How can we use objdump to reverse PPC64 code? …Honestly? Use IDA. There’s really no other way. You can try and guess what the calls could be. Also the addresses to data are negative numbers which make no sense what so ever. The following commands could help you with PPC64 code (but I think PPC64 is just hopeless as it gets with objdump):

    [Register or Login to view code]

    The above commands can help you finding out the name of the functions and methods being called.
    Also check out the following article on dynamic linking and ELF files: http://devpit.org/wiki/Debugging_PowerPC_ELF_Binaries

    This is the command I tried to use to disassemble the PPC64 bit code:

    [Register or Login to view code]

    This is also possible:

    [Register or Login to view code]

    Maybe if you try and play around with the Machine or Architecture settings you might get the function names as well. I tried everything even following the code but I haven’t been able to solve this mystery. Good luck!


    That’s it for now folks. If anyone has questions, then please post them on this thread so I can take a look. It was actually pretty hard and time consuming to write this tutorial. I hope it shed some light on a few mysteries about reverse engineering code. Remember we’ve only touched basic code with symbols turned on. In most cases you don’t have this comfort. You need to recognize C runtime functions your self and you cannot rely on symbols. Also C++ code is another thing to reverse. It can be really painful. Just keep looking at assembly code and you’ll get better. Also you might want to try and compile your code with the following parameters:

    [Register or Login to view code]

    This will output you the assembly source code of your C program. This assembly source code can be very helpful in understanding what the compiler did with your C code. The binary we were examining was already linked and contained text and data which we were not interested in. This parameter gives you the program code only. Have fun with your new toys and if you have questions then please don't keep them to yourself.

    More PlayStation 3 News...

  2. #2
    tjay17 Guest
    What can you do with this?

  3. #3
    Join Date
    Apr 2005
    Honestly, it's just for budding Devs who wish to learn how to reverse-engineer... most end-users probably won't ever use it.

  4. #4
    red8316 Guest
    Wowza, lots of information, very comprehensive post. Looks like a useful contribution. Nice work.

  5. #5
    duventj Guest
    Interesting and very nice post...

    Merci beaucoup !!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Log in