SuperCPU Tutorial

SuperCPU Illuminated - Part 4

As we learned in Part 3, there are a few peculiarities to watch out for when dealing with 8 and 16 bit registers. In Part 4, we'll give you a few more important tips for working with double width registers. We'll also give you some more routines, and lastly you'll get to know another great feature of the 65816 processor.

Let's continue from where we left off last time: while discussing transfers between 8 and 16 bit registers, we briefly mentioned the transfer command TXS (Transfer X to Stackpointer). That's right, you can transfer a 16 bit Index register to the Stackpointer. On the 65816 this transfer is 16 bits wide - of course only in Native mode. In Emulation mode it's 8 bits and the Highbyte is always $01, as we know. But in Native mode the Stack can be located anywhere within the 64K of bank 0. With this, Page 1 loses its special meaning in Native mode, and can in principle be treated like any other memory. In addition, the 16 bit wide Stackpointer means that the Stack is no longer limited to $0100 bytes!

An Imposter?

So you can set up a separate stack for a certain routine, or simply put the stack wherever it suits you best. You should always be careful with such things, though, because the Stack also holds the return addresses. If you relocate the Stack inside a subroutine that was called with JSR, and you don't restore the old Stackpointer afterwards, the C64 won't know where to return to. After all, the processor assumes that the return addresses were pushed onto the stack.

There are a few special commands used when dealing with the Stackpointer: TXS and TSX are already well known to 6510 programmers. If you switch the Index registers to 16 bit mode, these commands will work with 16 bit words too. In addition, there is another command pair: TCS and TSC. "C" stands for the accumulator, and it means that in any case 16 bits will be transferred to or from the accumulator, regardless of whether it's in 8 or 16 bit mode! How can this be? Remember: in Native mode there is a "hidden" accumulator B when the accumulator is in 8 bit mode.
The Highbyte of the 16 bit accumulator is kept there. If you switch the accumulator from 16 to 8 bit mode, its Highbyte isn't lost; it stays in accumulator "B" until the accumulator is switched back to 16 bit mode, where it becomes the Highbyte again. "C" stands for Lowbyte and Highbyte of the accumulator at the same time. TSC transfers the 16 bit Stackpointer to the accumulator. If it's in 8 bit mode, the Stackpointer-Lowbyte is placed in the accumulator and the Highbyte goes to accumulator "B". If it's in 16 bit mode, you simply have the complete Stackpointer in the accumulator.
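To make the A/B/C relationship a little more tangible, here is a minimal sketch in Python rather than assembler. It is only a toy model of Native mode, not real 65816 code; the class and function names (Accumulator, tsc, tcs) are made up for illustration.

```python
class Accumulator:
    """Toy model: C is the full 16 bit accumulator, B its Highbyte, A its Lowbyte."""

    def __init__(self):
        self.c = 0x0000

    @property
    def a(self):
        # What 8 bit code sees as "the accumulator"
        return self.c & 0xFF

    @property
    def b(self):
        # The "hidden" Highbyte
        return (self.c >> 8) & 0xFF


def tsc(acc, stack_pointer):
    """TSC: copy all 16 bits of the Stackpointer into C, in 8 or 16 bit mode alike."""
    acc.c = stack_pointer & 0xFFFF


def tcs(acc):
    """TCS: the new Stackpointer is the full 16 bit C (B and A together)."""
    return acc.c


acc = Accumulator()
tsc(acc, 0x01FD)
print(f"A=${acc.a:02X} B=${acc.b:02X}")  # A=$FD B=$01
new_sp = tcs(acc)
```

Whatever the accumulator width, the Highbyte of the Stackpointer ends up in B and comes back out of it again with TCS.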

Shoving Games

A 16 bit word (two bytes) can be pushed onto the Stack by the PHA command as well (in 16 bit accumulator mode). You should know that the Highbyte is pushed first, then the Lowbyte. Accordingly, the command PLA works in reverse: the Lowbyte is pulled first, then the Highbyte. But the 65816 has more in its bag of tricks: the well known command sequence in which you must push the accumulator, then transfer the X register into the accumulator, push it again, and then do the same with the Y register, is a thing of the past. There are the new commands PHX and PHY and their opposites PLX and PLY. With these you can push the Index registers directly onto the Stack. Again, just like with the accumulator, the Highbyte is pushed first, then the Lowbyte. This applies, of course, when the Index registers are in 16 bit mode.
With a "Pull", just like with the accumulator again, first the Lowbyte and then the Highbyte is retrieved. The command PHP (PusH Processor status) pushes the flags onto the Stack; the Status register is always 8 bits wide. Caution: the "ninth bit" (the emulation flag) won't be pushed! The only way to affect it is the well known XCE command.
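The push and pull order can be sketched as follows. This is a small Python model under the assumption of Native mode with a 16 bit register; push16 and pull16 are hypothetical helper names, not processor commands.

```python
def push16(mem, sp, value):
    """16 bit push (PHA, PHX, PHY): Highbyte first, then Lowbyte."""
    mem[sp] = (value >> 8) & 0xFF   # Highbyte goes to the current Stackpointer
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF          # Lowbyte goes just below it
    sp = (sp - 1) & 0xFFFF
    return sp


def pull16(mem, sp):
    """16 bit pull (PLA, PLX, PLY): Lowbyte first, then Highbyte."""
    sp = (sp + 1) & 0xFFFF
    lo = mem[sp]
    sp = (sp + 1) & 0xFFFF
    hi = mem[sp]
    return sp, (hi << 8) | lo


mem = bytearray(0x10000)
sp = push16(mem, 0x01FF, 0x1234)
print(f"${mem[0x01FF]:02X} ${mem[0x01FE]:02X} SP=${sp:04X}")  # $12 $34 SP=$01FD
sp, value = pull16(mem, sp)
```

Because the Highbyte is written first and the Stackpointer counts downwards, the word ends up in memory in the usual Lowbyte/Highbyte order.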

Is That What You Thought?

Aside from the "normal" Stack applications and relocating the Stack, you can also do some very different things with it on the 65816. Remember: normally the Stackpointer is located at $01FF, but after a PHA is executed it decreases by one and is located at $01FE. With a 16 bit PHA it decreases by two; for example from $01FF to $01FD. But what if you set the Stackpointer to $07E7? Switch the accumulator to 16 bit mode and execute an LDA #$2020 - Lowbyte and Highbyte now both contain the code for a space character.
Now it can begin: PHA - we've already erased two characters without needing any DEX command, and the processor has moved the pointer down to $07E5! Now we send another PHA right after it. So how far have we come? Well, we fetch the Stackpointer into the X register - TSX - and compare to see if it's smaller than $0400 (the top left corner of the screen) - the 16 bit Stackpointer simply keeps going across page boundaries, to $06FF or $06FE and so on! If it's still bigger, we loop back and push again. In this way we have erased the screen with the help of the stack - and we've done it faster than would have been possible with any other routine!
Only a huge heap of STAs would be faster - at the price of far longer code. See the whole thing in Listing 4.1 - it works! Listing 4.2 shows the same routine, this time with the TSC and PHX commands. Even though they look a little unusual to us, they work in exactly the same way. Incidentally, no interrupts may occur during the execution of the routine - the return address for the RTI would be pushed onto the stack automatically - that is, it would land in the middle of our screen. Moreover, you aren't limited to filling the screen RAM with space characters - there's nothing wrong with erasing an entire bitmap this way.
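The idea can be checked with a short simulation. The following Python sketch is not a reproduction of Listing 4.1; it only models the effect of repeated 16 bit pushes of $2020 starting at $07E7, and the exact compare and branch used in the loop (shown in the comments) is an assumption.

```python
SCREEN_START = 0x0400
SCREEN_END = 0x07E7


def push16(mem, sp, value):
    """16 bit PHA: Highbyte first, then Lowbyte; the Stackpointer drops by two."""
    mem[sp] = (value >> 8) & 0xFF
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    sp = (sp - 1) & 0xFFFF
    return sp


def clear_screen_via_stack(mem):
    sp = SCREEN_END                   # Stackpointer set to the last screen byte
    pushes = 0
    while sp >= SCREEN_START:         # assumed loop test: TSX, compare with $0400
        sp = push16(mem, sp, 0x2020)  # two space characters per push
        pushes += 1
    return sp, pushes


mem = bytearray(0x10000)
final_sp, pushes = clear_screen_via_stack(mem)
print(pushes, f"${final_sp:04X}")  # 500 $03FF
```

Five hundred pushes fill exactly the 1000 bytes from $0400 to $07E7; the bytes just outside that range stay untouched.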

The Whole Truth

Since we're on the subject of stack manipulation, here are a few new stack commands. For example, the PEA command (Push Effective Absolute address) is a simple command used to push a word onto the stack without having to put it into a register first. As the name suggests the word is an address, but what it's used for is ultimately up to the programmer. This command will always push a 16 bit word regardless of whether the M or X flag is set to 8 or 16 bit.
You should be careful with the syntax of this command: a PEA $5000 means that the 16 bit word $5000 will be pushed onto the stack, not the word stored at $5000 and $5001 in memory! A close relative of the PEA command is the PEI command (Push Effective Indirect address). With this command, you can push an address (which is stored as a pointer in the zero page) onto the stack. If $00 is in $FE and $60 is in $FF, a PEI ($FE) will cause $6000 to be pushed onto the stack. Again, it doesn't matter if you're in 16 or 8 bit mode; a 16 bit word will be pushed. You must be careful with the syntax here too: the command doesn't deal with the contents of the address that the zero page pointer points to. The pointer itself, that is the address stored in the zero page, will be pushed onto the stack.
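The difference between the two can be sketched like this. It is a Python model only, assuming Native mode and a direct page at $0000; pea, pei and the helper names are made up for the illustration.

```python
def push16(mem, sp, value):
    """Push a 16 bit word: Highbyte first, then Lowbyte."""
    mem[sp] = (value >> 8) & 0xFF
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    return (sp - 1) & 0xFFFF


def read16(mem, addr):
    """Read a Lowbyte/Highbyte word from memory."""
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


def pea(mem, sp, operand):
    """PEA $xxxx: the operand itself is pushed, memory at $xxxx is never read."""
    return push16(mem, sp, operand)


def pei(mem, sp, zp_addr):
    """PEI ($zz): the pointer stored at $zz/$zz+1 is pushed, not what it points to."""
    return push16(mem, sp, read16(mem, zp_addr))


mem = bytearray(0x10000)
mem[0x5000], mem[0x5001] = 0xAA, 0xBB   # some data that PEA must ignore
mem[0xFE], mem[0xFF] = 0x00, 0x60       # zero page pointer to $6000

sp = pea(mem, 0x01FF, 0x5000)
pea_word = read16(mem, sp + 1)
sp = pei(mem, sp, 0xFE)
pei_word = read16(mem, sp + 1)
print(f"${pea_word:04X} ${pei_word:04X}")  # $5000 $6000
```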

Relocate It Yourself!

A really great new command is PER. PER stands for "Push Effective PC Relative Indirect Address" and works like this: a PER $1000 works much like PEA, except that before the $1000 gets pushed onto the stack, it gets added to the value of the program counter! So this command makes it possible to write code that can be moved freely around in memory. The value which is pushed onto the Stack is only calculated at the moment the PER is executed, that is, at the runtime of the program. So you can imagine what's possible for addressing data areas that belong to the code. The operand of PER is not a concrete address, but rather an offset (as with the branch commands), a distance "from here on".
This ties in with "Stack-relative indexed indirect addressing" (don't panic!), which will be dealt with in a later part of this tutorial. Moreover, there are no logical opposites to these three commands (PEA, PEI, and PER) like with PHA-PLA or TCS-TSC.
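As a rough illustration of why this makes code relocatable, here is a minimal Python sketch of the calculation. It assumes that the offset is added to the address of the instruction following the three byte PER command; the function name per is hypothetical.

```python
PER_LENGTH = 3  # opcode plus 16 bit offset


def per(instruction_address, offset):
    """Value pushed by a PER located at instruction_address with the given offset."""
    next_pc = instruction_address + PER_LENGTH
    return (next_pc + offset) & 0xFFFF


# The same bytes, once running at $C000 and once moved to $8000:
at_c000 = per(0xC000, 0x0020)
at_8000 = per(0x8000, 0x0020)
print(f"${at_c000:04X} ${at_8000:04X}")  # $C023 $8023
```

The pushed address moves along with the code, so a data table that sits $20 bytes behind the PER is found at either location.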

Safe With 16 Bits

So that does it for the stack! Now a few finer points for dealing with 16 bits. Perhaps you haven't even thought about it, but the relationship between $7F and $80 is different in 8 bit mode and in 16 bit mode. No, we don't mean the difference of 1, but rather positive and negative. As the knowledgeable 6510 coder knows, going from a positive to a negative value will set the negative flag. Here, bit 7 is treated as the sign bit: if this bit of a value is set, the negative flag is also set. This is sometimes used to "take the zero along" in a loop: you count the X register down from $27 and branch back not with a BNE, but with a BPL command, so that the pass with X = 0 is executed too. How does this work when the Index registers are in 16 bit mode? It's actually quite logical: again, the highest bit (15 in this case) is the indicator of negative or positive, and it is this bit that sets the negative flag!
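A tiny Python sketch shows both points: which bit decides the negative flag, and how many passes a DEX/BPL loop makes. The function names are invented for this example, and the loop model assumes an 8 bit X register.

```python
def negative_flag(value, bits):
    """The negative flag mirrors the highest bit: bit 7 or bit 15."""
    return bool(value & (1 << (bits - 1)))


def bpl_loop_passes(start):
    """Count the passes of an 8 bit 'DEX / BPL' loop that starts with X = start."""
    x = start
    passes = 0
    while True:
        passes += 1                # loop body runs with the current X
        x = (x - 1) & 0xFF         # DEX
        if negative_flag(x, 8):    # BPL falls through once bit 7 is set
            return passes


print(negative_flag(0x80, 8))      # True
print(negative_flag(0x0080, 16))   # False
print(negative_flag(0x8000, 16))   # True
print(bpl_loop_passes(0x27))       # 40
```

The loop makes 40 passes, for X = $27 down to X = $00, which is exactly one screen line of 40 characters.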
The fact that addition and subtraction commands work well in 16 bit mode was seen in the last part of this tutorial. But these are by no means the only opcodes which are affected by clearing the M and X flags. For example, a 16 bit INC will count the byte at the given address up until it has reached $FF - an additional INC makes sure that it will be $00 again (that much we know) - but now the byte at address+1 will also be raised by one - of course, because it is treated as the Highbyte of a 16 bit word! An ASL in 16 bit mode will shift two bytes to the left, and with an STA you can write the Highbyte and Lowbyte of an address to the zero page in one go, or change the border and background colors at the same time. The commands AND and ORA are self-explanatory with 16 bit wide arguments, regardless of whether the operand is given immediately or fetched from memory. In Listing 4.3 you can see a few commands applied in 16 bit mode.
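The carry into the neighbouring byte and the double store can be sketched in Python as follows. This is not Listing 4.3, just a model of memory; inc16 and sta16 are hypothetical helpers, and the color values are arbitrary examples.

```python
def inc16(mem, addr):
    """16 bit INC: the bytes at addr and addr+1 are treated as one word."""
    value = mem[addr] | (mem[addr + 1] << 8)
    value = (value + 1) & 0xFFFF
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def sta16(mem, addr, value):
    """16 bit STA: Lowbyte to addr, Highbyte to addr+1."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


mem = bytearray(0x10000)
mem[0xFB], mem[0xFC] = 0xFF, 0x00
inc16(mem, 0xFB)
print(f"${mem[0xFB]:02X} ${mem[0xFC]:02X}")  # $00 $01

# One store sets both color registers: $D020 (border) and $D021 (background)
sta16(mem, 0xD020, 0x0600)
```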

Direct Switching

Now to the aforementioned extended feature of the SuperCPU's processor. The stack isn't the only thing that can be placed wherever you want it. The zeropage can also be moved! It isn't stuck on page 0 of memory anymore, which is why the term "direct page" is used when referring to the 65816. You can access it directly, without having to specify a Highbyte. Zeropage addresses were and are so beloved because they can be accessed and used very quickly and efficiently. Without the Highbyte the code becomes not only shorter, but faster as well. That is why you use zero page addresses for all values which you want to work with often and quickly. But now you can address all the memory just like the zero page - one block at a time of course - but nonetheless, each and every block that you want to work with! There is the so called "Direct Page Register", which contains the word $0000 by default (in emulation mode as well).
But we can change that: the command TCD transfers the contents of the 16 bit accumulator into the direct page register. So we load the word #$1000 into the accumulator and execute a TCD. Now our zeropage (read: direct page) is at $1000! An LDA $50 now accesses the address $1050. To reach the address $0050 as before, we either switch back by loading #$0000 into the accumulator and transferring it to the direct page register with a TCD, or we use absolute addressing with LDA $0050. The TCD also has an opposite, TDC. The "C" means that in every case a 16 bit word will be transferred, just like with the commands TCS and TSC.
You can also move the zero page without switching to 16 bit mode: here, we get help from the PLD command (PuLl Direct page register from stack). First, we push the Highbyte ($10) onto the stack (with PHA), and then the Lowbyte ($00). Now a PLD - this one always pulls 16 bits from the stack into the direct page register, regardless of the status of the M flag. Vice versa, you can push the value of this register onto the stack with the PHD command in order to pull it into any other register. Moreover, you can also load values into the direct page register whose Lowbyte is not zero. (The processor must then calculate the address first on every access, something which counteracts the speed advantage.)
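How the direct page register shifts the addresses, and how PLD picks up two separately pushed bytes, can be sketched in Python. This is a simple model assuming Native mode; the helper names are made up for the illustration.

```python
def direct_address(direct_page, operand):
    """Address reached by a direct page access such as LDA $50."""
    return (direct_page + operand) & 0xFFFF


def push8(mem, sp, value):
    """8 bit PHA: one byte, the Stackpointer drops by one."""
    mem[sp] = value & 0xFF
    return (sp - 1) & 0xFFFF


def pld(mem, sp):
    """PLD: always pulls 16 bits, Lowbyte first, whatever the M flag says."""
    lo = mem[(sp + 1) & 0xFFFF]
    hi = mem[(sp + 2) & 0xFFFF]
    return (sp + 2) & 0xFFFF, (hi << 8) | lo


mem = bytearray(0x10000)
sp = 0x01FF
sp = push8(mem, sp, 0x10)   # Highbyte first
sp = push8(mem, sp, 0x00)   # then the Lowbyte
sp, d = pld(mem, sp)

print(f"D=${d:04X} LDA $50 -> ${direct_address(d, 0x50):04X}")  # D=$1000 LDA $50 -> $1050
```

With the register back at $0000, direct_address(0x0000, 0x50) gives $0050 again, just like the old zero page.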

Insane Speed

The freely relocatable zero page (direct page) opens up unexpected possibilities. Not only can you give your sound routine its own zero page, but there are more advantages as well. Wherever in memory a routine works on data that it uses often, you can use zero page addressing (direct page addressing) and count on some speed advantage.

So you can, for example, work faster on a bitmap by moving the direct page over one region of it after the other. The same applies to sound: a piece of music with a suitable player routine can easily be played without even using up one raster line on the SuperCPU. In Listing 4.4 and in Listing 4.5 you can see how the direct page register works.

We can't think of any other computer whose processor offers a similar possibility - certainly not Windows PCs, where addresses have to be put together in an extremely laborious way from diverse segment pointers and offsets. Next time we'll go further - we're still nowhere near the limits of the 65816!

(w) ThunderBlade/DMAgic
