SuperCPU Illuminated - Part 4
As we learned in Part 3, there are a few peculiarities to watch out for when
dealing with 8 and 16 bit registers. In Part 4, we'll give you a few more important
tips for working with double width registers. We'll also give you some more
routines, and lastly you'll get to know another great feature of the 65816 processor.
Let's continue where we left off last time: while discussing transfers
between 8 and 16 bit registers, we briefly mentioned the transfer command TXS (Transfer X
to Stackpointer). That's right, you can transfer a 16 bit Index register to the
Stackpointer. On the 65816 this transfer is 16 bits wide, though of course
only in Native mode. In Emulation mode it's 8 bits and the Highbyte is always $01,
as we know. But in Native mode the Stack can be located anywhere within the 64K.
With this, Page 1 loses its special meaning in Native mode and can in principle
be treated like any other memory. In addition, the 16 bit wide Stackpointer means
that the Stack is no longer limited to a size of $0100 bytes!
An Imposter?
So you can set up a separate stack for a certain routine, or simply put the stack wherever it
suits you best. You should always be careful with such things, because the Stack is also
used for return addresses. If you relocate the Stack inside a subroutine that was called
with JSR, and you don't save and restore the old Stackpointer, the
C64 won't know where it should return to. This is because the return address was
pushed onto the old stack, and the RTS will now look for it on the new one.
There are a few special commands for dealing with the Stackpointer: TXS and
TSX are already well known to 6510 programmers. If you switch the Index registers to 16
bit mode, these commands work with 16 bit words too. In addition, there is
another command pair: TCS and TSC. The "C" stands for the accumulator, and it means
that 16 bits are always transferred to or from the accumulator, regardless of whether
it's in 8 or 16 bit mode! How can this be? Remember: in Native mode there is a
"hidden" accumulator B when the accumulator is in 8 bit mode.
The Highbyte of the 16 bit accumulator is kept there. If you switch the accumulator from 16 to
8 bit mode, its Highbyte isn't lost; it stays in accumulator "B" until
the accumulator is switched back to 16 bit mode, where it becomes the Highbyte again.
"C" stands for the Lowbyte and Highbyte of the accumulator taken together.
TSC transfers the 16 bit Stackpointer to the accumulator. If it's in 8 bit mode, the
Stackpointer Lowbyte is placed in the accumulator, and the Highbyte goes to the "B" accumulator.
If it's in 16 bit mode, you have the complete Stackpointer in the accumulator.
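The relationship between A, B and C can be pictured with a small model. The following is only a Python sketch, not 65816 code; the class and function names are made up for illustration.

```python
class Accumulator:
    """Toy model: C is the 16 bit pair made of B (Highbyte) and A (Lowbyte)."""

    def __init__(self):
        self.c = 0x0000

    @property
    def a(self):
        # What an 8 bit accumulator operation sees
        return self.c & 0xFF

    @property
    def b(self):
        # The "hidden" Highbyte
        return (self.c >> 8) & 0xFF


def tsc(acc, stack_pointer):
    """Model of TSC: all 16 bits of the Stackpointer go to C, whatever the M flag says."""
    acc.c = stack_pointer & 0xFFFF


def tcs(acc):
    """Model of TCS: returns the new 16 bit Stackpointer, taken from C."""
    return acc.c
```

With a Stackpointer of $01FF, `tsc` leaves $FF in A and $01 in B in this model, which is exactly the 8 bit case described above.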
Shoving Games
In 16 bit accumulator mode, the PHA command pushes a 16 bit word (two bytes) onto the
Stack. You should know that the Highbyte is pushed first, then the
Lowbyte. Accordingly, the command PLA works in reverse: the Lowbyte is
pulled first, then the Highbyte. But the 65816 has more in its bag of tricks:
the well known command sequence in which you push the accumulator, transfer the X register
to the accumulator, push again, and then do the same with the Y register, is a thing of the
past. There are the new commands PHX and PHY and their opposites PLX and PLY.
With these you can push the Index registers directly onto the Stack. Just as
with the accumulator, the Highbyte is pushed first, then the Lowbyte, provided, of course,
that the Index registers are in 16 bit mode.
With a "Pull", again just like with the accumulator, first the Lowbyte and then the Highbyte
is retrieved. The command PHP (PusH Processor status) pushes the flags
onto the Stack; the Status register is always 8 bits wide.
Caution: the "ninth bit", the Emulation flag, won't be pushed! The only way to affect
it is the well known XCE command.
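The push and pull order is easy to get wrong, so here is a minimal Python model of a 16 bit PHA and PLA. It is a sketch of the behavior described above, not 65816 code; `memory`, `push16` and `pull16` are names chosen for this illustration.

```python
memory = bytearray(0x10000)   # 64K of simulated RAM


def push16(sp, value):
    """Model of a 16 bit PHA: Highbyte first, then Lowbyte; returns the new Stackpointer."""
    memory[sp] = (value >> 8) & 0xFF
    sp = (sp - 1) & 0xFFFF
    memory[sp] = value & 0xFF
    sp = (sp - 1) & 0xFFFF
    return sp


def pull16(sp):
    """Model of a 16 bit PLA: Lowbyte first, then Highbyte; returns (new Stackpointer, value)."""
    sp = (sp + 1) & 0xFFFF
    low = memory[sp]
    sp = (sp + 1) & 0xFFFF
    high = memory[sp]
    return sp, (high << 8) | low
```

Pushing $1234 with the Stackpointer at $01FF stores $12 at $01FF and $34 at $01FE, so the word sits in memory in the usual Lowbyte/Highbyte order, and the Stackpointer ends up at $01FD.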
Is That What You Thought?
Aside from the "normal" Stack applications and relocating the Stack, you can also
do some quite different things with it on the 65816. Remember: normally the
Stackpointer starts at $01FF; after a PHA is executed it is reduced by one
and points to $01FE. With a 16 bit PHA it is reduced by two, for example from $01FF
to $01FD. But what happens if you set the Stackpointer to $07E7? Switch the accumulator
to 16 bit mode and execute an LDA #$2020: Lowbyte and Highbyte now both contain the code for
a space character.
Now it can begin: PHA, and we've already erased two characters without needing any DEX
command, and the processor has moved the pointer to $07E5! Now we follow up with another
PHA right away. So how far have we come? Well, we fetch the Stackpointer into the X
register with TSX and compare whether it's lower than $0400 (the upper left corner of the
screen); the 16 bit Stackpointer carries on by itself across page boundaries, on to $06FF and $06FE!
Is it still higher? Then we push again. In this way we have erased the screen with the help
of the Stack, and we've done it faster than would have been possible with any other loop!
Only a huge heap of STAs would be faster, but that makes for considerably
longer code. See the whole thing in Listing
4.1; it works! Listing 4.2 shows the same routine,
this time with the TSC and PHX commands. Even though it looks a little unusual,
it works in exactly the same way. Incidentally, no interrupts may occur during the
execution of the routine: the return address for the RTI would be pushed onto the
Stack automatically, that is, it would land in the middle of our screen. Moreover,
you don't have to stop at filling the screen RAM with spaces; there's nothing wrong with
erasing an entire bitmap in this way.
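To check the arithmetic of this trick, the loop can be simulated. The following Python sketch is not one of the listings, only a model of the steps described in the text (two 16 bit pushes, then a comparison of the Stackpointer against $0400); all names in it are chosen for illustration.

```python
SCREEN_START = 0x0400   # first byte of the default C64 screen RAM
SCREEN_END = 0x07E7     # last byte of the 1000 byte screen
SPACE_WORD = 0x2020     # two space characters, as loaded by LDA #$2020


def clear_screen_via_stack(memory):
    """Model of the push loop; returns the final Stackpointer and the push count."""
    sp = SCREEN_END                 # Stackpointer moved to the end of the screen
    pushes = 0
    while True:
        for _ in range(2):          # two 16 bit PHAs per pass, as in the text
            memory[sp] = SPACE_WORD >> 8           # Highbyte first
            memory[sp - 1] = SPACE_WORD & 0xFF     # then Lowbyte
            sp -= 2
            pushes += 1
        if sp < SCREEN_START:       # TSX, then compare against $0400
            return sp, pushes
```

In this model the loop ends after 500 pushes with the Stackpointer at $03FF, and exactly the 1000 bytes from $0400 to $07E7 have been overwritten.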
The Whole Truth
Since we're on the subject of stack manipulation, here are a few more new stack
commands. For example, the PEA command (Push Effective Absolute address) is a simple
command used to push a word onto the stack without having to load it into a register first.
As the name suggests, the word is meant to be an address, but what it really is is
ultimately up to the programmer. This command always pushes a 16 bit word, regardless of
whether the M or X flag is set to 8 or 16 bit.
You should be careful with the syntax of this command: a PEA $5000 means that
the 16 bit word $5000 itself is pushed onto the stack, not the word stored at $5000 and
$5001 in memory! A close relative of the PEA command is the PEI command (Push
Effective Indirect address). With this command, you can push an address that is stored as a
pointer in the zero page onto the stack. If $00 is in $FE and $60 is in $FF, a PEI ($FE)
will cause $6000 to be pushed onto the stack. Again, it doesn't matter if you're
in 16 or 8 bit mode; a 16 bit word is pushed. You must be careful with the syntax
here too: the command doesn't deal with the contents of the address that the
zero page pointer points to. The address itself is pushed onto the
stack.
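The difference between the two commands can be shown with a small Python model (again only a sketch, not 65816 code; the function names are made up, and the zero page is assumed to be at its default location).

```python
def pea(operand):
    """Model of PEA: the operand itself is the word that gets pushed."""
    return operand & 0xFFFF


def pei(memory, zero_page_address):
    """Model of PEI: the 16 bit pointer stored at the zero page address gets pushed."""
    low = memory[zero_page_address]
    high = memory[(zero_page_address + 1) & 0xFFFF]
    return (high << 8) | low
```

In both cases whatever is stored at the pushed address ($5000 or $6000 in the examples above) is never read.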
Relocate It Yourself!
A really great new command is PER. PER stands for "Push Effective PC Relative
Indirect Address" and works like this: a PER $1000 works like PEA, except that
before the $1000 gets pushed onto the stack, it is added to the value of the program
counter! So this command makes it possible to write code that can be moved freely
about the memory. The value that ends up on the Stack is calculated at runtime, at the
moment the PER is executed. So you can imagine
what becomes possible when addressing data areas in memory. The
value after the PER is not a concrete address, but rather an offset (as with the
branch commands), a distance "from here on".
How to use the pushed address will be dealt with in a later part of this tutorial, under
"Stack-relative indexed indirect addressing" (don't panic!). Moreover, there are no logical
opposites to these three commands (PEA, PEI, and PER) as there are with PHA-PLA or TCS-TSC.
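Why this makes code relocatable becomes clearer with numbers. The Python sketch below models only the address calculation; it relies on the fact that PER is a three byte instruction and that the offset counts from the address of the instruction following it. The function name is made up for illustration.

```python
PER_LENGTH = 3   # opcode plus a 16 bit offset


def per_pushed_value(per_address, offset):
    """Model of PER: address of the next instruction plus the offset, wrapped to 16 bits."""
    return (per_address + PER_LENGTH + offset) & 0xFFFF
```

If the same code with the same offset of $1000 runs at $C000, the model yields $D003; moved to $8000 it yields $9003. The pushed address moves along with the code, which is the whole point.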
Safe With 16 Bits
So that does it for the stack! Now for a few finer points of dealing with 16 bits.
Perhaps you haven't even thought about it, but the difference
between $7F and $80 means something else in 8 bit mode than in 16 bit mode. No, we don't
mean the 1, but rather positive and negative. As the knowledgeable 6510 coder knows, going from
a positive to a negative value sets the negative flag. Here, bit 7 is treated as
the sign bit: if this bit of a value is set, the negative flag is also set. This
is sometimes used to "take the zero along": you count the X
register down from $27 and then branch, not with a BNE but with a BPL command, so that
the pass with X = 0 is still executed. How
does this work when the Index registers are in 16 bit mode? It's actually quite
logical: again, the highest bit (bit 15 in this case) is the indicator of negative
or positive, and setting it also sets the negative flag!
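The following Python sketch models the negative flag for both register widths and the count-down loop that ends with BPL. It is an illustration only; the function names are assumptions of this example.

```python
def negative_flag(value, sixteen_bit):
    """Bit 7 decides in 8 bit mode, bit 15 in 16 bit mode."""
    sign_bit = 0x8000 if sixteen_bit else 0x80
    return bool(value & sign_bit)


def countdown_with_bpl(start, sixteen_bit):
    """Model of a DEX/BPL loop: returns every X value the loop body gets to see."""
    mask = 0xFFFF if sixteen_bit else 0xFF
    x = start
    seen = []
    while True:
        seen.append(x)                      # loop body runs with this X
        x = (x - 1) & mask                  # DEX
        if negative_flag(x, sixteen_bit):   # BPL no longer branches
            return seen
```

Starting at $27, the body runs 40 times in either width, the last time with X = 0. The loop ends when X wraps to $FF in 8 bit mode or to $FFFF in 16 bit mode. Note that $80 counts as negative only in 8 bit mode.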
The fact that the addition and subtraction commands work well in 16 bit mode was shown in the
last part of this tutorial. But these are by no means the only opcodes that are
affected by clearing the M and X flags. For example, a 16 bit INC counts
the byte at the given address up until it has reached $FF; an additional INC makes
it $00 again (that much we know), but now the byte at address+1 is also
raised by one, of course, because it is treated as the highbyte of a 16 bit word!
An ASL in 16 bit mode shifts two bytes to the left, and with an STA you can
write the highbyte and lowbyte of an address to the zero page in one go, or change
the frame and background colors at the same time. The commands AND and ORA are self-explanatory
with 16 bit wide arguments, regardless of whether these are given directly or fetched from memory.
In Listing 4.3 you can see a few commands applied in 16 bit
mode.
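The carry into the neighboring byte can be modeled in a few lines of Python. This is a sketch of the behavior of a 16 bit INC and STA on memory, not one of the listings; the function names are chosen for illustration.

```python
def inc16(memory, address):
    """Model of a 16 bit INC: address holds the lowbyte, address+1 the highbyte."""
    word = memory[address] | (memory[address + 1] << 8)
    word = (word + 1) & 0xFFFF
    memory[address] = word & 0xFF
    memory[address + 1] = word >> 8


def sta16(memory, address, value):
    """Model of a 16 bit STA: writes two neighboring bytes in one go."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF
```

With $FF at $FB and $12 at $FC, one `inc16` leaves $00 at $FB and $13 at $FC. A single `sta16` to $D020 sets both the frame color ($D020) and the background color ($D021).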
Direct Switching
Now to the aforementioned additional feature of the SuperCPU's processor. The stack
isn't the only thing that can be put wherever you want it. The zeropage can
also be moved! It isn't stuck on page 0 of the memory anymore, which is why the
term "direct page" is used when referring to the 65816. You have
direct access to it without having to specify a highbyte. Zeropage addresses
were and are so beloved because they can be accessed and used very quickly and
efficiently.
Without the highbyte, the code becomes not only shorter but faster as well. That is
why you use zero page addresses for all values that you want to work with often and
quickly. But now you can address all of the memory just like the zero page: one 256 byte
block at a time of course, but nonetheless whichever one you want to work with!
There is the so called "Direct Page Register", which contains the word $0000 by
default (in emulation mode as well).
But we can change that: the command TCD transfers the contents of the 16 bit accumulator
into the direct page register. So we load the word #$1000 into the accumulator and execute
a TCD. Now our zeropage (read: direct page) is at $1000! An LDA $50
now accesses the address $1050. To get at the address $0050 as before,
we either switch back by loading #$0000 into the accumulator and transferring it to the
direct page register by way of a TCD, or we use absolute addressing by way of LDA $0050.
The TCD also has an opposite, TDC. The "C" means that a 16 bit word is always
transferred, just like with the commands TCS and TSC. You can also move the zero page
without switching to 16 bit mode: here, we get help from the PLD command (PuLl
Direct page register from stack). First we push the Highbyte ($10) onto the stack
(with PHA), and then the Lowbyte ($00). Now a PLD: this always pulls 16
bits from the stack into the direct page register, regardless of the status of
the M flag. Vice versa, you can push the value of this register onto the stack with the
PHD command in order to pull it into any register you like. Moreover, you can also load
values into the direct page register whose lowbyte is not zero. (The processor then always has to
calculate the addresses first, however, which counteracts the speed advantage.)
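The address calculation behind direct page addressing, and the PLD trick with two 8 bit pushes, can be modeled as follows. This is a Python sketch rather than 65816 code, and the function names are made up; the one-cycle penalty for a non-zero lowbyte is the usual 65816 timing rule.

```python
def direct_page_address(d_register, offset):
    """Effective address of an access like LDA $50: direct page register plus offset."""
    return (d_register + offset) & 0xFFFF


def extra_cycles(d_register):
    """One additional cycle per direct page access if the lowbyte of the register isn't zero."""
    return 1 if d_register & 0xFF else 0


def pld_after_8bit_pushes(highbyte, lowbyte):
    """Model of PHA (Highbyte), PHA (Lowbyte), PLD: the pull takes the Lowbyte first."""
    stack = []
    stack.append(highbyte)   # first PHA
    stack.append(lowbyte)    # second PHA
    low = stack.pop()        # PLD pulls the Lowbyte ...
    high = stack.pop()       # ... and then the Highbyte
    return (high << 8) | low
```

With the register at $1000, an access to $50 lands at $1050 and costs nothing extra; with a value such as $1080 the same access would go to $10D0 and take one cycle longer.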
Insane Speed
The freely relocatable zero page (direct page) opens up unexpected
possibilities. Not only can you give your sound routine its own zero page, but there are
more advantages as well. Wherever in memory a routine that you use often
does its work, you can use zero page addressing (direct page addressing) and
count on some speed advantage.
So you can, for example, work faster on a bitmap by moving the zero page over one region
of it after the other. The same applies to sound: a piece of music with a corresponding routine
can easily be played without using up even one raster line on the SuperCPU.
In Listing 4.4 and in Listing 4.5
you can see how the direct page register works.
We can't think of any other computer whose processor offers a similar possibility, certainly not
Windows PCs, where addresses have to be assembled in an extremely laborious way from
various segment pointers and offsets. Next time we'll go further: we're
still nowhere near the limits of the 65816!
© 1999 GO64! Redax & Count Zero/SCS*TRC for all HTML Stuff