SuperCPU Course - Part 2

SuperCPU Illuminated - Part 2

Last time we discussed how to recognize the SuperCPU. When you've determined that the turbo card is connected up correctly, you can use it to your advantage. Existing routines are sped up dramatically - but the 65816 processor contained in the SuperCPU can do a lot more!

New Opcodes

As everyone knows, the CMD turbo card will not run programs which use illegal opcodes. But why? Why can't the 65816 processor do it? After all, the "illegals" run on almost all processors of the family, on new and old C64s and C128s alike, although the opposite is often claimed. The reason is simple: all 256 possible opcode bytes ($00 - $FF) are assigned to documented instructions on the 65816, so there is no room left for illegal ones. While the 8500 (the processor in the new C64s) and the 8502 (in the C128) are based on the 6502/6510 processors, the 65816 is a truly new development. Strictly speaking, the direct successor of the 6502 is the 65C02. That processor brought a few new commands, but nothing earth-shattering, and most of the formerly illegal opcodes simply became NOPs. The 65C02 was no "real" successor; the 65816 is quite the opposite. Anyone who knows all of the abilities of the 65816 and compares it to the 6502 can see that they are worlds apart. There aren't just new commands, but also new addressing modes and some new features which make writing faster programs fun. Above all, the 65816 can switch its registers to a width of 16 bits! This is one of the first things you will learn about in this tutorial.

Built-in Emulator

As soon as the 65816 gets power, it automatically switches itself to the so-called "Emulation Mode". In this mode the SuperCPU behaves just like a 6502, except that the small 6502 bugs have been fixed: the stack is in its usual place, there is 64K of address space, the registers are 8 bits wide, the zero page is at $0000 and the timing of the commands is the same as on the 6510. This list of restrictions already hints at what the 65816 can do once they are lifted. To use the new abilities to the fullest there is "Native Mode", in which the processor can bring all of its advantages to bear. Even in this mode the concept of the 65XX series is preserved - you'll see that the 65816 is the logical development of the 6510. There is a new flag, the "E flag", which shows which mode the CPU is in and with which you can change the mode.

The Gate Opens

Now we gain entrance into a new world. The command XCE exchanges the contents of the Carry bit with the Emulation bit. It is the only command with which the E flag can be changed, because this flag is a kind of ninth flag hidden "behind" the Carry bit (see schematic). A "back door" of this sort is needed to reach native mode - after all, we start out in emulation mode, where there are no additional flags. If the E flag is 1, the processor is in 6502 emulation mode. If it's 0, we're in the native mode of the 65816. Consequently, you turn on native mode by clearing the Carry flag and exchanging it with the Emulation bit:

CLC
XCE

That's it! All of the abilities of the 65816 are open to us. Moreover, because XCE exchanges the two bits, the Carry bit now holds the old value of the E flag: by checking whether the C flag is 1 or 0 you can see whether the CPU was in emulation mode or native mode before the switch. But before we take the first step into that new land, we should make sure that we know the way back - how do you get the processor back into emulation mode? Very simply: the E bit must be set to 1 again. Again, the way leads through the Carry bit:

SEC
XCE

Now we're back in 6502 emulation mode. Switching into native mode brings a few changes with it, and you should be clear about them. The 65816 can switch the accumulator as well as the index registers (X and Y) to a width of 16 bits. There are four possible combinations:

1) Accumulator 8 bit, Index registers 8 bit
2) Accumulator 16 bit, Index registers 8 bit
3) Accumulator 8 bit, Index registers 16 bit
4) Accumulator 16 bit, Index registers 16 bit

In order to be able to enter and maintain these modes, there are two new flags in Native mode: the "M flag" and the "X flag" (See Schematic).
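
The two XCE sequences can be combined into a round trip that returns to whichever mode was active on entry, because XCE leaves the old E flag in the Carry bit. The following is only a minimal sketch and not a listing from the original article. It assumes ca65-style syntax (".p816" is that assembler's way of enabling the 65816 opcodes), the label name is made up, and interrupts are kept disabled while native mode is active.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

roundtrip:              ; hypothetical label
        PHP             ; save the caller's flags, including the I flag
        SEI             ; keep interrupts out while we are in native mode
        CLC
        XCE             ; enter native mode, Carry now holds the old E flag
        PHP             ; keep the old E flag (sitting in Carry) on the stack

        ; ... native mode code goes here ...

        PLP             ; Carry = old E flag again
        XCE             ; E = old value: back to the mode we came from
        PLP             ; restore the caller's flags
        RTS
```

If the routine was called from emulation mode, the second XCE switches back to it; if the CPU was already in native mode, it stays there.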

The High Flag

The "M" flag is responsible for the accumulator (and for the memory accesses made with it). If this flag is cleared, the accumulator is in 16 bit mode; if it is set, the accumulator is 8 bits wide. The other new flag, the "X" flag, determines the width of the index registers X and Y. If it is cleared, the registers are 16 bits wide; if it's set, they're in 8 bit mode. Both index registers are always switched together; you can't, for example, set the X register to 16 bit and the Y register to 8 bit. So how do you switch 16 bit mode on and off? Are there dedicated set and clear commands as for the other flags? No - the developers of the 65816 did something different here. You'll most likely want to switch back and forth between 8 and 16 bit modes, and this should happen quickly. Instead of separate commands for each new flag, there are two that can change ANY combination of flags in one go. Don't worry - the old commands like SEI, CLC and so on are still there.

To change the flags for the register widths, this new pair of commands must be used. The command for clearing is called REP (REset Processor status). "Processor status" means the flags (just as in the well-known commands PHP and PLP). REP is followed by an 8 bit operand in which each bit stands for one flag. For example, the Carry flag, the Zero flag and the IRQ flag can be cleared with:

REP #%00000111

Here you can clearly see how the command works: for every bit that is set in the operand, the corresponding flag is cleared. The "M" and the "X" flags are found at bit 5 and bit 4, respectively. So to switch the accumulator and the index registers to 16 bit, we do the following:

REP #%00110000

Now the accumulator and the index registers are 16 bits wide. All flags whose bit in the REP operand is zero remain unchanged. As mentioned, this command has a counterpart with which you can set several flags at once. It's called SEP and works analogously to REP. Now we'll switch the accumulator and the index registers back to 8 bit:

SEP #%00110000

Every bit that is set in the SEP operand sets the corresponding flag. All other flags remain unchanged.
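
As an illustration, the four width combinations listed earlier can be reached with REP and SEP alone. This is a sketch under two assumptions: the CPU is already in native mode, and the assembler understands ca65-style directives (".p816", ".a8", ".a16", ".i8", ".i16"), which are not the pseudo opcodes used in the original article.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes
                        ; the CPU is assumed to be in native mode already

        SEP #%00110000  ; 1) M=1, X=1: accumulator 8 bit, index registers 8 bit
.a8
.i8
        REP #%00100000  ; 2) M=0, X=1: accumulator 16 bit, index registers 8 bit
.a16
        SEP #%00100000  ;    M=1 again ...
        REP #%00010000  ; 3) M=1, X=0: accumulator 8 bit, index registers 16 bit
.a8
.i16
        REP #%00100000  ; 4) M=0, X=0: accumulator 16 bit, index registers 16 bit
.a16
```

Each REP or SEP only touches the flags whose bits are set in its operand, which is why step 4 leaves the index registers at 16 bit.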

But what does 16 bit give you? Does a higher bit width really guarantee higher speed? The answer is: yes, if you do it right. As always, whether you're using a 16, 32 or 64 bit processor or whatever, each one has to be programmed skillfully in its own way. Adding the numbers 5 and 7 can easily take more clock cycles on a 32 bit processor than on the C64 with its 8 bit processor, which doesn't have to drag three zero bytes along as ballast.

This is the strong point of the 65816 in native mode: switching between 8 and 16 bit modes - an ability which is either not available on other processors or, where it is present, very limited. Used properly, the 16 bits can bring a considerable gain in speed. A simple subtraction of two 16 bit numbers makes this clear:

On the 8 bit 6510, the program must load the low byte of the first 16 bit number, subtract the low byte of the second number from it and store the result, then load the high byte of the first number, subtract the high byte of the second and store that result too. In 16 bit mode, on the other hand, the 65816 loads the whole 16 bit number, subtracts the second number from it, and stores the result. Three steps instead of six! Listing 2.1 shows the 8 bit variation and Listing 2.2 shows the same in 16 bit. If you don't totally understand the second one, don't worry, there will be more examples and explanations.
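
The comparison can be sketched as follows. These are not the article's Listings 2.1 and 2.2, just a minimal illustration of the same idea; the addresses and label names are hypothetical, the syntax is ca65-style (an assumption), and the 16 bit routine assumes the CPU is already in native mode.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

NUM1    = $C000         ; hypothetical address of the first 16 bit number
NUM2    = $C002         ; hypothetical address of the second 16 bit number
RESULT  = $C004         ; hypothetical address for the difference

; 8 bit version: six load/subtract/store steps
sub8:   SEC             ; no borrow to start with
        LDA NUM1        ; low byte of the first number
        SBC NUM2        ; minus low byte of the second number
        STA RESULT
        LDA NUM1+1      ; high byte of the first number
        SBC NUM2+1      ; minus high byte of the second number, minus borrow
        STA RESULT+1
        RTS

; 16 bit version: three steps (native mode assumed)
sub16:  REP #%00100000  ; M=0: accumulator 16 bit
        SEC
        LDA NUM1        ; low and high byte at once
        SBC NUM2
        STA RESULT      ; writes RESULT and RESULT+1
        SEP #%00100000  ; M=1: accumulator back to 8 bit
        RTS
```

Because neither routine uses an immediate operand, the assembler produces the same instruction lengths in both widths and needs no width directive here.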

One thing is important: the opcodes stay the same! This has the advantage that the coder doesn't have to learn new ones - the assembler, however, can't possibly know which flags are set at run time. If "M" selects 16 bit, the CPU expects an immediate LDA to be followed by a 16 bit word (two bytes) instead of a single byte. The same goes for the index registers.

That is why the assembler must be told which bit width is currently in effect. The pseudo opcode "Łal" switches the accumulator to 16 bit "Long" (in the case of the F8-Assblaster). With "Łas" it's switched to 8 bit "Short" mode. "Łrl" and "Łrs" have the analogous function for the index registers. These commands are meant only for the assembler; they produce no object code! In Listing 2.2 you can see an example of this.

Because the interrupt vectors are located in a different place in native mode (where the ROM holds nothing useful), you should always disable interrupts before you switch to native mode - otherwise you have to write your own handler and switch off the ROM. We will explore the location of the vectors in another part of this tutorial.
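
The interplay of CPU flags and assembler directives might look like the sketch below. It uses ca65-style directives (".a16", ".a8", ".i16", ".i8") as a stand-in, which is an assumption; the F8-Assblaster pseudo opcodes named above play the same role. The byte sequences in the comments follow from the opcodes $A9 (LDA immediate) and $A2 (LDX immediate).

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

        SEI             ; no interrupts while we are in native mode
        CLC
        XCE             ; switch to native mode
        REP #%00110000  ; CPU: accumulator and index registers 16 bit
.a16                    ; assembler: immediate operands for A are now 2 bytes
.i16                    ; assembler: the same for X and Y
        LDA #$1234      ; assembles to A9 34 12
        LDX #$5678      ; assembles to A2 78 56
        SEP #%00110000  ; CPU: everything back to 8 bit
.a8                     ; assembler: back to 1 byte immediates
.i8
        LDA #$12        ; assembles to A9 12
        SEC
        XCE             ; back to emulation mode
        CLI             ; interrupts are safe again
        RTS
```

The directives generate no bytes themselves; REP and SEP change the CPU, the directives only keep the assembler in step with it.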

The Sixth Sense

The way that REP and SEP basically function is relatively easy to understand. It is somewhat harder to develop a sense for using them skillfully, since there are always several ways to do it. Even though you may want to use 16 bit mode for everything right now, you should know that an 8 bit accumulator is better for some applications, for example manipulating the characters in a character set.

Old 6510 hands needn't fear that their routines will lose efficiency, because getting the best out of the 65816 does not mean staying permanently in 16 bit mode. The 8 bit accumulator and the 8 bit index registers are always available and are very good for things like setting registers, something the 6510 was already good at long before there was a 16 bit mode. Even in 8 bit mode the 65816 offers many advantages over the 6510. It is important to develop a sort of rhythm, a "sixth sense" of when you should switch modes.

Constant switching between 8 and 16 bit modes isn't ideal. But luckily we write the assembler code ourselves instead of running it through just any old compiler, which could produce who-knows-what kind of code, most of it long and inefficient. Obviously, operations and calculations don't always have to be carried out in one particular order, so they can be arranged to keep the number of mode switches small!

So you should be aware of what is possible in one specific mode before you go and switch into another. On the other hand, you should realize that switching modes at the right moment can pay off very quickly in efficiency. A large area has just opened up to you in which you can experiment and go through all the different possibilities, but you should try to develop a sense of how to use the switching between modes in order to get the best performance.

The Same But Different

So what happens when the processor works through a program? How are the new flags dealt with? You can think of them as an extension of the existing opcodes. Suppose the processor executes the following command:

object code    instruction
BD 00 20       LDA $2000,X

As we know, this command loads the accumulator with whatever is stored at memory address $2000 plus the content of the X register. The X register can be either 8 or 16 bits wide, depending on the X flag. The accumulator can be 8 or 16 bits wide as well; in 16 bit mode two bytes are loaded from the computed address (for X = 0, the low byte from $2000 and the high byte from $2001). The opcode $BD is the same as on the 6510, but the M and X flags expand the possibilities. Let's imagine that the processor finds the following byte sequence in memory:

object code
A9 00 20 A9 90 58

If the accumulator is 8 bit, the processor reads the following:

instruction
LDA #$00
JSR $90A9
CLI

If the M flag is 0, however, the accumulator is in 16 bit mode, so the processor reads:

instruction
LDA #$2000
LDA #$5890

$A9 is LDA immediate. In 8 bit mode the following byte, $00, is read and loaded into the accumulator - with this, the command is finished and the next byte, $20, is read. $20 stands for JSR - after a JSR come two bytes, in this case $A9 as the low byte and $90 as the high byte, which yields a JSR $90A9. The command is done, and the next byte is fetched, a $58. This is interpreted as the command CLI.

In 16 bit mode the whole thing goes like this: $A9 is LDA immediate. The operand is now 16 bits wide, with $00 as the low byte and $20 as the high byte. Both are loaded into the accumulator. The command is completed, and the next byte is fetched: $A9, again LDA immediate. Again the low byte is fetched, $90, then the high byte, a $58. The developer of the program doesn't have to concern himself with these details - the assembler takes care of everything, provided that you have placed the right pseudo opcodes at each mode switch. For the program to work the way you want it to, the M and X flags of the 65816 must also be set or cleared correctly with SEP or REP. If you have understood this far, nothing stands in the way of your understanding 16 bit mode any longer!
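
Seen from the assembler's side, the same six bytes come out of two different source texts. The sketch below uses ca65-style width directives (an assumption, not the article's assembler) to show both readings; it is meant for comparing the generated bytes, not for running.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

.a8                     ; assembler told: accumulator is 8 bit
        LDA #$00        ; A9 00
        JSR $90A9       ; 20 A9 90
        CLI             ; 58

.a16                    ; assembler told: accumulator is 16 bit
        LDA #$2000      ; A9 00 20
        LDA #$5890      ; A9 90 58
```

Both halves assemble to A9 00 20 A9 90 58. Which of the two programs the CPU actually executes depends only on the M flag at run time, so a directive that doesn't match the flag makes the CPU decode something the programmer never wrote.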

Never Check the Highbyte Again

So we've seen how to work with the accumulator in 16 bit mode. Listing 2.3 shows a 16 bit addition, another example of how you can work with 16 bit mode. More interesting is the use of 16 bits in somewhat more complicated calculations. Listing 2.5 shows a genuine 16 bit multiplication. Listing 2.4 shows the same thing in 8 bit mode - the difference is very clear.
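
To give an impression of such a calculation, here is a minimal shift-and-add multiplication with a 16 bit accumulator. It is not the article's Listing 2.5, only a sketch under assumptions: ca65-style syntax, hypothetical addresses and labels, native mode already active, and a result truncated to its low 16 bits. Both operands are destroyed.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

MULT1   = $C000         ; hypothetical: multiplier (destroyed)
MULT2   = $C002         ; hypothetical: multiplicand (destroyed)
PRODUCT = $C004         ; hypothetical: low 16 bits of the product

mul16:  REP #%00100000  ; M=0: accumulator and memory accesses 16 bit
.a16
        STZ PRODUCT     ; product = 0 (clears both bytes)
loop:   LSR MULT1       ; lowest bit of the multiplier goes into Carry
        BCC skip        ; bit was 0: nothing to add
        CLC
        LDA PRODUCT
        ADC MULT2       ; bit was 1: add the shifted multiplicand
        STA PRODUCT
skip:   ASL MULT2       ; multiplicand * 2 for the next bit
        LDA MULT1
        BNE loop        ; continue while multiplier bits are left
        SEP #%00100000  ; M=1: accumulator back to 8 bit
.a8
        RTS
```

With MULT1 = 3 and MULT2 = 5 the loop runs twice and leaves 15 in PRODUCT. An 8 bit version would have to split every shift and every addition into a low byte and a high byte step.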

But the index registers can be 16 bit, too! That means that with an X indexed LDA or STA, for example, you can reach not just an area of 256 bytes, but an area of 65536. That's a full 64K! With a simple routine (like in Listing 6), the troublesome need to check the high byte disappears. It's not hard to see that the known types of addressing work just like before! Moreover, an STA $1000,X needs exactly 5 clock cycles regardless of the bit width of the index register! When the accumulator is in 16 bit mode, on the other hand, the command executes more slowly - two bytes are stored instead of one. Despite this, the command only takes one additional clock cycle to execute!
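
A loop over more than 256 bytes then needs no high byte handling at all. The following sketch fills 8000 bytes starting at $2000 with zero; the address, the length and the label are chosen only for illustration. It assumes ca65-style syntax, native mode, and a data bank register of 0 (its value after power-up).

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

clear:  SEP #%00100000  ; M=1: accumulator 8 bit, one byte per store
.a8
        REP #%00010000  ; X=0: index registers 16 bit
.i16
        LDA #$00
        LDX #$1F3F      ; 7999, the offset of the last byte
fill:   STA $2000,X     ; 5 cycles, whatever the width of X
        DEX
        BPL fill        ; runs down to 0, then X wraps to $FFFF and N is set
        SEP #%00010000  ; X=1: index registers back to 8 bit
.i8
        RTS
```

The BPL test works here because the start offset is below $8000; larger areas would need a different end condition, for example a CPX comparison.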

Back and Forth

If you switch a 16 bit index register back to 8 bits, the high byte is irrevocably lost and is set to zero. If you switch the other way, however, nothing is lost: the register keeps its 8 bit value, with a high byte of zero.
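
A short sketch of this behaviour, assuming native mode and ca65-style directives (an assumption, as is the example value):

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

        REP #%00010000  ; X=0: index registers 16 bit
.i16
        LDX #$1234      ; X = $1234
        SEP #%00010000  ; X=1: index registers 8 bit
.i8                     ; X = $34, the high byte $12 is gone
        REP #%00010000  ; X=0: index registers 16 bit again
.i16                    ; X = $0034, the $12 does not come back
```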

In contrast to the index registers, the accumulator behaves in such a way that you can switch back and forth between modes without suffering any loss. If you switch the accumulator from 16 bit to 8 bit, the low byte becomes the new 8 bit accumulator A. The high byte goes into a "hidden" accumulator B. You can consider B an appendage of A. Even though you are in 8 bit mode, you can still get at it: the command XBA (eXchange B and A) exchanges the contents of A and B.

So B is very useful if you'd like to park a byte temporarily when there are no more free registers available! If you switch from the 8 bit accumulator to the 16 bit accumulator, the 16 bit accumulator has the previous 8 bit accumulator as its low byte. The high byte is the contents of the "hidden" B accumulator. Certain commands which always treat the accumulator as 16 bit, such as TCD and TCS, contain a "C" in their name, which points to the fact that the full 16 bit accumulator is being used, regardless of your mode status (8 or 16 bit). These commands and additional offerings of the 65816 are left to the next part of the tutorial!
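
The relationship between A, B and the 16 bit accumulator can be sketched like this. Native mode and ca65-style directives are assumed, and the address RESULT is hypothetical.

```asm
.p816                   ; assumption: ca65-style directive, enables 65816 opcodes

RESULT  = $C000         ; hypothetical address for the 16 bit value

        SEP #%00100000  ; M=1: accumulator 8 bit
.a8
        LDA #$12
        XBA             ; B = $12, A = whatever B held before
        LDA #$34        ; A = $34, B is still $12
        REP #%00100000  ; M=0: accumulator 16 bit
.a16                    ; the 16 bit accumulator now reads $1234
        STA RESULT      ; stores $34 at RESULT and $12 at RESULT+1
        SEP #%00100000  ; M=1: accumulator 8 bit again
.a8                     ; A = $34, B = $12, nothing was lost
```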

 Processor Status Register  (in Emulation Mode)



   7   6   5   4   3   2   1   0

                             -----
                             |   |
                             | E |------- Emulation  1=6502 Emulation Mode
                             |   |
 ---------------------------------
 |   |   |   |   |   |   |   |   |
 | N | V |   | B | D | I | Z | C |----------- Carry  1=Carry
 |   |   |   |   |   |   |   |   |
 ---------------------------------
   |   |       |   |   |   |                
   |   |       |   |   |   ------------------- Zero  1=Result Zero
   |   |       |   |   |
   |   |       |   |   ---------------- IRQ Disable  1=Disabled
   |   |       |   |
   |   |       |   ------------------- Decimal Mode  1=Decimal, 0=Binary
   |   |       |
   |   |       ------------------ Break Instruction  1=Break caused Interrupt
   |   |
   |   |
   |   ----------------------------------- Overflow  1=Overflow
   |
   --------------------------------------- Negative  1=Negative

 Processor Status Register  (in Native Mode)



   7   6   5   4   3   2   1   0

                             -----
                             |   |
                             | E |------- Emulation  0=Native Mode
                             |   |
 ---------------------------------
 |   |   |   |   |   |   |   |   |
 | N | V | M | X | D | I | Z | C |----------- Carry  1=Carry
 |   |   |   |   |   |   |   |   |
 ---------------------------------
   |   |   |   |   |   |   |                
   |   |   |   |   |   |   ------------------- Zero  1=Result Zero
   |   |   |   |   |   |
   |   |   |   |   |   ---------------- IRQ Disable  1=Disabled
   |   |   |   |   |
   |   |   |   |   ------------------- Decimal Mode  1=Decimal, 0=Binary
   |   |   |   |
   |   |   |   -------------- Index Register Select  1=8 Bit, 0=16 Bit
   |   |   |   
   |   |   -------------- Memory/Accumulator Select  1=8 Bit, 0=16 Bit
   |   |
   |   ----------------------------------- Overflow  1=Overflow
   |
   --------------------------------------- Negative  1=Negative

(w) ThunderBlade/DMAgic
