TSC Hacking
Creating Your Own Commands

It looks like you survived that huge barrage of "redefining" lessons. If you feel like your brain has been fried, I suggest that you take a well-deserved break before starting on this lesson.

About TSC
TSC is *the* scripting language of Cave Story. It's incredibly useful, and fairly straightforward to learn. Right now, we're going to learn how to create new TSC commands, so that you can make anything you write in assembly available to TSC scripts.

The Cave Story program is able to read data from TSC files and perform certain actions based on what commands those files contain. Pixel knew that writing dialogue and cutscenes directly into the program was not a great idea. So, being the game-creator that he is, Pixel designed his own scripting language for Cave Story to make things easier and more manageable.

We're going to take a look at the TSC parser. What is a parser? Well, I'm glad you asked.

To "parse" means to break a string of text characters into small parts and then analyze what those parts mean. Essentially, the TSC parser checks a TSC command to see what it is, performs that command, and then moves on to the next command.

Also, you should know what ASCII is. ASCII is the "American Standard Code for Information Interchange". ASCII assigns a number to each text character, such as A or B or @, so that computers are able to store and display text information. In this guide, we'll always write those numbers in hex. Every modern computer in the world, from Canada to Japan to the USA, should be able to understand ASCII.[1]
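If you want to see these codes for yourself, here's a tiny sketch. I'm using Python purely for illustration (it isn't part of Cave Story or any hacking tool), and the helper name ascii_hex is something I made up for this example.

```python
# Print the ASCII hex code of a few text characters.
def ascii_hex(ch):
    """Return the two-digit uppercase hex code of one ASCII character."""
    return format(ord(ch), "02X")

for ch in "AB@":
    print(ch, "=", ascii_hex(ch))
```

Running it should print 41 for A, 42 for B, and 40 for @.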

ASCII Table
Before we begin, you need the ASCII table. An ASCII table tells you which hex code belongs to which text character. For your convenience, I've compiled such a table so that you can easily view it inside this guide:

Click Here to View the ASCII Table

The TSC Parser
If you look at the ever-so-useful Assembly Compendium, you'll find that the TSC parser starts at address 422510.

TSC Parser

It's important to understand that the parser is CMPing ASCII hex codes to check for a TSC command. Remember to look at the ASCII table I gave you!

At the very beginning of the parser, the parser CMPs a single hex code: 3C. Looking at the table, you'll see that 3C is the hex code for <, which is the beginning of every TSC command. If the character isn't <, then the code jumps way down near the end of the parser.

The first command the parser checks is the <END command. The parser CMPs 45, 4E, and then 44, which are the ASCII hex codes for E, N, and D, respectively. If all those letters match up, then the parser performs the <END command. If one or more of those letters don't match up, it jumps to address 422666. Notice that 422666 is the beginning of the next command: the <LI+ command.

After checking for <LI+, the parser continues to check for commands until it finds the right one. If the command isn't found (for example, because you made a typo in your TSC script), then the game will show an error message.
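Here's a rough model of that checking process, written in Python rather than assembly so that it can be run anywhere. It is only a sketch: the function name identify_command is my own invention, the command list is cut down to three entries, and the ValueError simply stands in for the game's error message.

```python
# Hypothetical model of how the parser identifies a command.
# The real parser knows many more commands than these three.
KNOWN_COMMANDS = ["END", "LI+", "SOU"]

def identify_command(script, pos):
    """Return the 3-letter command at script[pos], or None if it isn't a command."""
    if script[pos] != 0x3C:              # 3C is '<'; anything else is not a command
        return None
    for name in KNOWN_COMMANDS:          # try each command in order, like the chain of CMPs
        if script[pos + 1:pos + 4] == name.encode("ascii"):
            return name
    raise ValueError("unknown TSC command")   # stands in for the game's error message

print(identify_command(b"<SOU0011", 0))
```

Notice that <SOU is only found after <END and <LI+ have both failed to match, which mirrors the way the real parser falls through from one command check to the next.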

Script Positions
MOV ECX,DWORD [4A5AD8]
ADD ECX,DWORD [4A5AE0]
These two instructions are very important. [4A5AD8] holds the pointer to where the current script is loaded in memory. [4A5AE0] holds the script position, which is an offset within that script file. When you add [4A5AE0] to [4A5AD8], you get the pointer to the current character, that is, the character the game is reading right now.

I used ECX here as an example, but you could just as easily substitute EDX or EAX.

Let's say that the game is running and you've just opened the door to Arthur's house. After determining that you've got Arthur's Key, the game will jump to event #0101 and set the script position to the beginning of that event.

Arthur's Door

I've highlighted the < character using a red box because that's the character being read by the game. If you perform the instructions MOV ECX,DWORD [4A5AD8] and then ADD ECX,DWORD [4A5AE0], then ECX will act as a pointer to that character.

Accessing BYTE [ECX] will give you the hex code 3C, which is the hex number of the < character. BYTE [ECX+1] contains the hex code for the S character, BYTE [ECX+2] contains the hex code for O, and BYTE [ECX+3] contains the hex code for U.
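To make the pointer arithmetic concrete, here's a small Python model. It is a sketch under a few assumptions: the script text is made up for illustration (it is not copied from the game's files), the variable script plays the role of the memory that [4A5AD8] points to, and position plays the role of the value stored at [4A5AE0].

```python
# Model of "script base + script position" using a bytes object and an index.
script = b"#0101\r\n<SOU0011<END"   # made-up script text, for illustration only
position = script.index(b"<")       # pretend the game has just reached the first '<'

def byte_at(offset):
    """Mimic BYTE [ECX+offset], where ECX = script base + script position."""
    return script[position + offset]

# Prints ['3C', '53', '4F', '55'], the codes for <, S, O, and U.
print([format(byte_at(i), "02X") for i in range(4)])
```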

The parser will first check to see if the command is <END, which isn't the right one, so it'll move on to <LI+, which also isn't the right one. It will keep checking until it finally reaches <SOU.

ASCII to Number Function
Ever wondered what this function does?
ASCII to Number Function:
PUSH (Script Position)
CALL 00421900           ;returns number into EAX
ADD ESP,4
The above function takes a Script Position, reads the 4-digit decimal number written in the TSC file starting at that position, and returns that number in the register EAX.
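If it helps, here's how I'd model that behavior in Python. Keep in mind this is a sketch of what the function returns, not a line-by-line translation of the code at 421900, and the name ascii_to_number is just a label I picked.

```python
def ascii_to_number(script, position):
    """Convert the 4 ASCII digits starting at script[position] into an integer."""
    value = 0
    for digit in script[position:position + 4]:
        value = value * 10 + (digit - 0x30)   # 0x30 is the ASCII code for '0'
    return value

# The digits of <SOU0011 start 4 characters after the '<'.
print(ascii_to_number(b"<SOU0011", 4))   # prints 11
```

Like this sketch, you should not count on the function to check that the four characters really are digits, so make sure the Script Position you pass in is pointing at the number.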

So let's see an example of that. Here is the <SOU command:
Address   Command                        Comments
00424266  MOV EAX,DWORD PTR DS:[4A5AD8]  ;Where the script is loaded
0042426B  ADD EAX,DWORD PTR DS:[4A5AE0]  ;add Script Position
00424271  MOVSX ECX,BYTE PTR DS:[EAX+1]  ;get character directly after the current character.
00424275  CMP ECX,53                     ;check for S
00424278  JNE SHORT 004242DA             ;if not S, then jump to next command, which is <CMU.
0042427A  MOV EDX,DWORD PTR DS:[4A5AD8]  ;Where the script is loaded
00424280  ADD EDX,DWORD PTR DS:[4A5AE0]  ;add Script Position
00424286  MOVSX EAX,BYTE PTR DS:[EDX+2]  ;get char that's 2 characters after the current character.
0042428A  CMP EAX,4F                     ;check for O
0042428D  JNE SHORT 004242DA             ;if not O, then jump to next command <CMU.
0042428F  MOV ECX,DWORD PTR DS:[4A5AD8]  ;Where the script is loaded
00424295  ADD ECX,DWORD PTR DS:[4A5AE0]  ;add Script Position
0042429B  MOVSX EDX,BYTE PTR DS:[ECX+3]  ;get char that's 3 characters after the current character.
0042429F  CMP EDX,55                     ;check for U
004242A2  JNE SHORT 004242DA             ;if not U, then jump to next command <CMU.
004242A4  MOV EAX,DWORD PTR DS:[4A5AE0]  ;get Script Position
004242A9  ADD EAX,4                      ;Add 4 to Script Position
004242AC  PUSH EAX                       ;PUSH Script position
004242AD  CALL 00421900                  ;CALL ASCII to Number Function
004242B2  ADD ESP,4                      ;Fix the stack
004242B5  MOV DWORD PTR SS:[EBP-24],EAX  ;take EAX (which holds the 4-digit TSC number) and store it into variable [EBP-24].
004242B8  PUSH 1                         ;PUSH 1 (this is the Channel #)
004242BA  MOV ECX,DWORD PTR SS:[EBP-24]  ;take variable [EBP-24] and store it into ECX
004242BD  PUSH ECX                       ;PUSH ECX (ECX is still holding that same 4-digit TSC number)
004242BE  CALL 00420640                  ;CALL Play Sound Function
004242C3  ADD ESP,8                      ;Fix the stack
004242C6  MOV EDX,DWORD PTR DS:[4A5AE0]  ;Get Script Position
004242CC  ADD EDX,8                      ;Add 8 to Script Position
004242CF  MOV DWORD PTR DS:[4A5AE0],EDX  ;Store Back to Script Position
004242D5  JMP 004252A7                   ;Jump Back to Beginning of Parser
Hey look, it's the MOVSX instruction. See, those past lessons weren't useless after all.

But why are we using a byte? This is because each ASCII character takes up 1 byte. Technically, each ASCII character is written as 1 pair of hex digits, and each pair of hex digits is worth 1 byte. In order to check the letters of a TSC command one letter at a time, we have to MOV (or MOVSX) the data 1 byte at a time.
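Here's a quick Python sketch of the difference between reading 1 byte and reading a whole DWORD from the same spot. The script text is made up, and the variable names are mine. The only outside fact I'm relying on is that x86 stores multi-byte values in little-endian order, so the first byte in memory becomes the lowest byte of the DWORD.

```python
script = b"<SOU0011"   # made-up script text

# Like MOVSX ECX,BYTE PTR DS:[EAX+1]: read exactly one character.
one_byte = script[1]

# Like MOV ECX,DWORD PTR DS:[EAX+1]: read four characters ("SOU0") at once.
four_bytes = int.from_bytes(script[1:5], "little")

print(format(one_byte, "X"))     # prints 53, the code for S
print(format(four_bytes, "X"))   # prints 30554F53, which is not 53
```

So if you compared the DWORD against 53, the check for S would fail even though the letter really is S.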

Here's a flowchart that explains what happens when the <SOU command is executed:

SOU Flowchart

Now here's an even bigger chart that explains <SOU in even more detail:

Detailed SOU Flowchart

Summary

1. The first thing the parser does is check for characters. This is the basic routine for checking three characters of any TSC command:
MOV ECX,DWORD [4A5AD8]               ;Notice that I've rewritten this code to make
ADD ECX,DWORD [4A5AE0]               ;it shorter.
CMP BYTE [ECX+1],(first character)
JNE (address of next TSC command)
CMP BYTE [ECX+2],(2nd character)
JNE (address of next TSC command)
CMP BYTE [ECX+3],(3rd character)
JNE (address of next TSC command)
(Yes, you do need to use BYTE, not DWORD, for checking the letters of a TSC command. A DWORD comparison looks at 4 bytes at once instead of a single letter, so the above code will not work and you will get an error message.)

2. After that, you can get the 4-digit numbers used after the TSC command using the ASCII to Number Function. Make sure you set the Script Position correctly, then just use the function to grab that 4-digit TSC number.

3. You make the command do whatever action you want it to perform.

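4. Next, you add the (number of characters your TSC command takes up) directly to [4A5AE0], also known as the Script Position. For example, <SOU adds 8, because the 4 characters of <SOU plus its 4-digit number take up 8 characters. This allows the parser to move on to the next command in the TSC file.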

5. Finally, make sure you do a JMP 4252A7 at the end of your command, which will lead the code into a series of other jumps that eventually gets you back to the beginning of the parser.
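To tie the five steps together, here's a Python model of a made-up command. Everything specific in it is hypothetical: the command name <XYZ, the ParserModel class, and the actions list are all invented for this sketch, and the "action" is simply to record the number. In a real hack, each step would be the assembly described above.

```python
# Hypothetical model of a custom command "<XYZ" that takes one 4-digit number.
class ParserModel:
    def __init__(self, script):
        self.script = script    # stands in for the script data that [4A5AD8] points to
        self.position = 0       # stands in for the Script Position at [4A5AE0]
        self.actions = []       # records what the command "did"

    def number_at(self, position):
        """Stand-in for the ASCII to Number Function (reads 4 decimal digits)."""
        return int(self.script[position:position + 4].decode("ascii"))

    def try_xyz(self):
        """Handle <XYZ if present. Return False to fall through to the next command."""
        p = self.position
        # Step 1: check the three letters, one byte at a time.
        if self.script[p + 1] != ord("X"):
            return False
        if self.script[p + 2] != ord("Y"):
            return False
        if self.script[p + 3] != ord("Z"):
            return False
        # Step 2: the number starts 4 characters in, right after "<XYZ".
        number = self.number_at(p + 4)
        # Step 3: perform the action (here we just record the number).
        self.actions.append(number)
        # Step 4: "<XYZ" plus 4 digits takes up 8 characters.
        self.position += 8
        # Step 5: returning True stands in for JMP 4252A7.
        return True

parser = ParserModel(b"<XYZ0042<END")
print(parser.try_xyz(), parser.actions, parser.position)   # prints True [42] 8
```

After the call, the position points at the < of <END, which is exactly where the parser needs to be to read the next command.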

Jumping to Other TSC Events

The following function is what the <EVE command uses to jump to another event while the script is running:
Jump to TSC Event Function:
PUSH (event #)       ;remember all numbers must be in hex
CALL 00421AF0        ;Call this function to Jump to a TSC Event while the script is running.
ADD ESP,4
JMP 004252A7         ;this part goes back to the beginning of the parser.
If you use this Jump to TSC Event function, you do not need to do step 4 (see the above Summary section). This is because the Script Position is automatically set to the beginning of whatever event you jumped to. So, there is no need to add anything extra to the Script Position.
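Here's a Python sketch of the idea. I'm assuming the function works by finding the matching #NNNN label and placing the Script Position just after that label's line. Treat this as a model of the end result rather than a translation of the code at 421AF0. The function name and the sample script are both made up.

```python
def jump_to_event(script, event_number):
    """Return the Script Position of the first character after the '#NNNN' line."""
    label = b"#" + format(event_number, "04d").encode("ascii")
    start = script.find(label)
    if start == -1:
        raise ValueError("event not found")
    # Skip the rest of the label line so reading starts at the event's first command.
    return script.index(b"\n", start) + 1

script = b"#0100\r\n<END\r\n#0101\r\n<SOU0011<END\r\n"   # made-up script text
position = jump_to_event(script, 101)
print(script[position:position + 4])   # prints b'<SOU'
print(format(101, "X"))                # prints 65
```

The last line is a reminder about hex: to jump to event #0101 from assembly, you would PUSH 65, because 101 in decimal is 65 in hex.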


[1] Of course, nowadays computers use Unicode for all sorts of multilingual text support. For Cave Story hacking though, understanding ASCII is enough.