Did anyone ever use the extra set of registers on the Z80?
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling, though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
Such small snippets of Z80 code as I have seen do not use them, but then, that's not surprising; they are something that would only be expected in large programs. Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
z80
asked 2 hours ago
rwallace
I could see the alternative registers being used for fast context switching, but I'm not sure how efficient they would be as fast global variables. To use them, you would have to swap register sets (BC, DE and HL with their primed counterparts; AF can also be swapped, with a different instruction), then preserve a copy of the data, perhaps on the stack or in an index register, then swap the sets back. It would probably be quicker just to grab the variable directly from memory.
– RichF
1 hour ago
3 Answers
Accepted answer
> The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed, they were intended for fast interrupt response. Put simply, they save the time needed to push the main process's registers onto the stack and restore them afterwards. The designers gave the exchange instructions single-byte opcodes to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:
> OP code 08H allows the programmer to switch between the two pairs of accumulator flag registers while D9H allows the programmer to switch between the duplicate set of six general purpose registers. These OP codes are only one byte in length to absolutely minimize the time necessary to perform the exchange so that the duplicate banks can be used to effect very fast interrupt response times.
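EX AF,AF' and EXX take only 4 T-states each, while even pushing a single 16-bit register pair takes 11 T-states, plus another 10 to pop it again. Swapping in on entry and back on exit costs 8 T-states with EXX, instead of 21 for every single pair saved with PUSH and POP (63 for BC, DE and HL). That is a considerably faster reaction, isn't it?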
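To make the comparison concrete, here is a small Python tally using the standard Z80 timings (PUSH rr 11, POP rr 10, EX AF,AF' and EXX 4 T-states each). It is only a sketch: the function names are made up for illustration, and it ignores the interrupt acknowledge, EI and RETI, which both approaches pay equally.

```python
# Standard Z80 T-state timings.
PUSH_RR = 11   # PUSH BC / DE / HL / AF
POP_RR = 10    # POP BC / DE / HL / AF
EX_AF = 4      # EX AF,AF'
EXX = 4        # EXX (swaps BC, DE, HL with BC', DE', HL')


def stack_save_restore(pairs: int) -> int:
    """T-states to PUSH `pairs` register pairs on entry and POP them on exit."""
    return pairs * (PUSH_RR + POP_RR)


def shadow_save_restore(swap_af: bool = True, swap_main: bool = True) -> int:
    """T-states to swap to the shadow registers on entry and back on exit."""
    per_swap = (EX_AF if swap_af else 0) + (EXX if swap_main else 0)
    return 2 * per_swap


print(stack_save_restore(1))                 # one pair: 21
print(stack_save_restore(4))                 # AF, BC, DE and HL: 84
print(shadow_save_restore(swap_af=False))    # EXX in and out: 8
print(shadow_save_restore())                 # EX AF,AF' + EXX, in and out: 16
```

Saving all four pairs conventionally costs 84 T-states against 16 for the two exchange instructions, which is the gap the manual's single-byte opcodes were designed around.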
That's also why there are two exchange instructions: very simple routines may only use A and need to preserve only the flags and A, so EX AF,AF' alone is enough. This leaves the rest of the second set (BC', DE' and HL') free for other purposes, such as use by normal software, or for even more speed in I/O.
After all, the second set can be used not only as a kind of fast 'stack', but can even be prepared in advance for a particular I/O operation. Think, for example, of a serial interface receiving at high speed. Loading things like the pointer to where received data is to be placed, the number of bytes still to receive and so on takes quite some time (16 T-states for a 16-bit pointer with LD HL,(nn), 13 for a byte value with LD A,(nn)), and they need to be stored back afterwards as well.
If these values are placed in the second register set before the high-speed, interrupt-driven routine becomes active, no loads or stores need to be executed. Interrupt service time is reduced to the absolute minimum, which not only interrupts the main process less but also lets the routine keep up with higher speeds.
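A rough per-interrupt tally for the serial-receive case, as a sketch: it assumes (hypothetically) that the buffer pointer lives in HL' and the remaining-byte count in B', and counts only the cost of getting that state into registers and back. Saving the interrupted program's own registers is not counted in the memory column, so the real gap is wider; preserving flags with EX AF,AF' is left out of both columns.

```python
# Standard Z80 T-state timings for the loads and stores mentioned above.
LD_HL_NN = 16   # LD HL,(nn)  fetch the buffer pointer
LD_NN_HL = 16   # LD (nn),HL  store it back
LD_A_NN = 13    # LD A,(nn)   fetch the remaining-byte count
LD_NN_A = 13    # LD (nn),A   store it back
EXX = 4


def state_overhead(in_shadow_set: bool) -> int:
    """T-states per interrupt spent just bringing the receive state into registers and back."""
    if in_shadow_set:
        # Hypothetical layout: pointer already in HL', count in B'.
        # One EXX on entry and one on exit; the interrupted program's
        # BC, DE and HL are never touched.
        return 2 * EXX
    return LD_HL_NN + LD_A_NN + LD_NN_HL + LD_NN_A


for received in (1, 256):
    print(received, received * state_overhead(False), received * state_overhead(True))
```

Per received byte that is 58 T-states of bookkeeping through memory versus 8 with the state parked in the shadow set.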
More generally, the Z80 design was focused mainly on more flexible, configurable and faster interrupt handling.
> though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, you get 6 additional bytes, or 3 pointers, but while they are switched in you can't access the other ones. So there are not many cases where the secondary register set is helpful, besides interrupts and 'dead end' subroutines.
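A quick cost check of the "globals in the shadow set" idea, as a sketch: the two fetch sequences below are hypothetical examples of how one might read a value parked in the shadow registers, compared with a plain load from memory, using standard Z80 timings.

```python
# Standard Z80 T-state timings for the instructions used below.
T = {"EXX": 4, "PUSH HL": 11, "POP HL": 10, "LD A,L": 4,
     "LD HL,(nn)": 16, "LD A,(nn)": 13}


def cost(seq):
    """Total T-states for a straight-line instruction sequence."""
    return sum(T[op] for op in seq)


# Hypothetical: a 16-bit "global" parked in HL', wanted in the main HL.
via_shadow_16 = ["EXX", "PUSH HL", "EXX", "POP HL"]  # PUSH saves HL', POP loads it into HL
from_memory_16 = ["LD HL,(nn)"]

# Hypothetical: an 8-bit "global" parked in L', wanted in A (EXX leaves A alone).
via_shadow_8 = ["EXX", "LD A,L", "EXX"]
from_memory_8 = ["LD A,(nn)"]

print(cost(via_shadow_16), cost(from_memory_16))  # 29 16
print(cost(via_shadow_8), cost(from_memory_8))    # 12 13
```

Only the narrow case of a byte wanted in A comes out ahead, by a single T-state, which supports the view that the shadow set makes a poor home for general globals.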
> Such small snippets of Z80 code as I have seen do not use them, but then, that's not surprising; they are something that would only be expected in large programs.
Well, that's exactly where they are useful: speeding up small functions.
> Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
I did both, and while they need different approaches, the results are usually quite similar.
> Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupts (mostly in embedded systems) or in 'dead end' routines.
answered 1 hour ago
Raffzahn
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's still the issue of parameter passing.
– Raffzahn
32 mins ago
The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the designers of the Z80 intended the alternative register set as an efficient means of context switching. However, context switches do not tend to happen often enough to reserve the alternative set for that alone; most of the time the gain is simply not worth it. Hence, good Z80 programming practice is typically to use as many registers as possible and still use the stack to save registers during interrupts.
Now, let me give you several ideas on how one can benefit from having two sets of equivalent registers. A typical pixel scroll of a 16-byte-wide bitmap row can look, e.g., as follows:
rl (hl) : dec l ; repeated 16 times
What if one needs to scroll by 2 pixels at a time?
rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times
is the fastest way. OK, this only uses the alternate AF pair (in fact only its carry flag). Let us consider fast copying. The obvious
ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states
is actually very slow. Unrolled
ldi ; 16 t-states
is better and, in fact, often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code that loads and stores the data via the stack, e.g. as follows:
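ld sp,.. : pop af : pop bc : pop de : pop hl              ; SP = source: bytes 0-7 into AF, BC, DE, HL
exx : ex af,af' : pop af : pop bc : pop de : pop hl       ; bytes 8-15 into AF', BC', DE', HL'
ld sp,.. : push hl : push de : push bc : push af          ; SP = end of destination: shadow set writes bytes 15..8
exx : ex af,af' : push hl : push de : push bc : push af   ; main set writes bytes 7..0
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes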
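i.e. 12.75 t-states per byte. Because SP points into the data while this runs, interrupts must be disabled and the original SP saved and restored around the copy. And note that this is not esoteric; variations of this idea were used in a huge number of commercial ZX Spectrum games.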
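The byte ordering in the stack copier is easy to get wrong, so here is a small Python model (a sketch, not an emulator) that mimics POP (low byte from SP, high byte from SP+1) and PUSH (high byte to SP-1, low byte to SP-2) and checks that all 16 bytes arrive in order. The names are illustrative, and the elided `ld sp,..` operands are modeled simply as "start of source" and "end of destination".

```python
REGS = ("af", "bc", "de", "hl")


def pop(mem, sp):
    """POP rr: low byte from SP, high byte from SP+1; returns (value, new SP)."""
    return mem[sp] | (mem[sp + 1] << 8), sp + 2


def push(mem, sp, value):
    """PUSH rr: high byte to SP-1, low byte to SP-2; returns the new SP."""
    mem[sp - 1] = value >> 8
    mem[sp - 2] = value & 0xFF
    return sp - 2


def stack_copy_16(src: bytes) -> bytes:
    """Model of one pass of the POP/PUSH copier moving 16 bytes."""
    assert len(src) == 16
    main, shadow = {}, {}
    sp = 0                               # ld sp,<start of source>
    for bank in (main, shadow):          # first listing line fills the main set, second the shadow set
        for reg in REGS:                 # pop af : pop bc : pop de : pop hl
            bank[reg], sp = pop(src, sp)
    dst = bytearray(16)
    sp = len(dst)                        # ld sp,<end of destination>
    for bank in (shadow, main):          # shadow set is still active, so it is pushed first
        for reg in reversed(REGS):       # push hl : push de : push bc : push af
            sp = push(dst, sp, bank[reg])
    return bytes(dst)


T = {"ld sp,nn": 10, "pop": 10, "push": 11, "exx": 4, "ex af,af'": 4}
swap = T["exx"] + T["ex af,af'"]
per_16 = ((T["ld sp,nn"] + 4 * T["pop"]) + (swap + 4 * T["pop"])
          + (T["ld sp,nn"] + 4 * T["push"]) + (swap + 4 * T["push"]))

print(stack_copy_16(bytes(range(16))) == bytes(range(16)))  # True
print(per_16, per_16 / 16)                                   # 204 12.75
```

The model shows why the second half pushes the shadow set first: whichever set is active after the last POP holds the upper eight bytes, and PUSH fills the destination from the top down.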
Much non-trivial code, e.g. fast polygon fillers or texture mappers, can only reach decent speed if both register sets are used simultaneously.
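Returning to the two-pixel scroll earlier in this answer: it works because the carry in F and the carry in F' form two independent carry chains across the row. A small Python model (a sketch; the names are illustrative and every carry is assumed to start cleared) walks the row from its last byte to its first, as `dec l` does, and checks the result against a plain big-integer shift. Note that `dec l` also assumes the row does not cross a 256-byte boundary.

```python
def rl(value: int, carry: int) -> tuple[int, int]:
    """Z80 RL: rotate a byte left through carry; returns (new byte, new carry)."""
    return ((value << 1) | carry) & 0xFF, value >> 7


def scroll_left(row: bytes, pixels: int) -> bytes:
    """Walk the row from its last byte to its first (the `dec l` direction),
    keeping one carry chain per pixel. With pixels == 2 the second chain stands
    for the carry held in F' and swapped in by EX AF,AF'."""
    out = bytearray(row)
    carries = [0] * pixels               # assumption: every carry flag starts cleared
    for i in reversed(range(len(out))):
        for chain in range(pixels):      # rl (hl) [: ex af,af' : rl (hl) : ex af,af']
            out[i], carries[chain] = rl(out[i], carries[chain])
    return bytes(out)


def reference_scroll(row: bytes, pixels: int) -> bytes:
    """Same result computed as one big-integer shift (leftmost byte most significant)."""
    width = len(row)
    value = (int.from_bytes(row, "big") << pixels) & ((1 << (8 * width)) - 1)
    return value.to_bytes(width, "big")


row = bytes([0x81, 0x42, 0xC3, 0x00] * 4)  # 16 bytes, like the answer's bitmap row
print(scroll_left(row, 1) == reference_scroll(row, 1))  # True
print(scroll_left(row, 2) == reference_scroll(row, 2))  # True
```

Each byte receives bit 7 of its right neighbour from the first chain and bit 6 from the second, which is exactly a two-bit shift of the whole row.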
> Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
This being one occasion when a personal-experience answer will do: EXX is ideal for the very specific task of multiplying a 16-bit 2D vector by a scalar, which makes it helpful for 2D vector graphics and for the projection part of 3D vector graphics.
Specifically:
- use A for the multiplier; rotate right from it into carry;
- use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;
- use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.
So the specific convenient observations are:
- you're juggling four 16-bit quantities, but they interact only in pairs;
- and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
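The register plan above can be modeled in Python as a sketch: the main set carries the x component and the shadow set the y component, with EXX flipping between them. Because ADD HL,BC itself changes the carry, the loop branches once on the RRA carry and then performs both additions. Shifting BC left with SLA C : RL B is an assumed choice for illustration; the names are hypothetical.

```python
def scale_vector(x: int, y: int, scalar: int) -> tuple[int, int]:
    """Shift-and-add multiply of a 16-bit 2D vector (x, y) by an 8-bit scalar.
    Main set: BC = x multiplicand, HL = x accumulator.
    Shadow set: BC' = y multiplicand, HL' = y accumulator.
    Results are 16-bit, wrapping like the Z80's ADD HL,BC."""
    a = scalar & 0xFF
    bc, hl = x & 0xFFFF, 0
    bc_alt, hl_alt = y & 0xFFFF, 0
    for _ in range(8):
        carry = a & 1                            # RRA: bit 0 of A drops into carry
        a >>= 1                                  # RRA also rotates the old carry into bit 7; never tested here
        if carry:
            hl = (hl + bc) & 0xFFFF              # ADD HL,BC
            hl_alt = (hl_alt + bc_alt) & 0xFFFF  # EXX : ADD HL,BC : EXX
        bc = (bc << 1) & 0xFFFF                  # shift multiplicand left, e.g. SLA C : RL B
        bc_alt = (bc_alt << 1) & 0xFFFF          # the same in the shadow set
    return hl, hl_alt


print(scale_vector(300, 200, 7))   # (2100, 1400)
print(scale_vector(300, -5, 7))    # (2100, 65501), i.e. -35 as 16-bit two's complement
```

Signed multiplicands come out right modulo 65536, so the same loop serves coordinates on either side of the origin as long as the scalar is treated as unsigned.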
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of accumulator
flag registers while D9H allows the programmer to switch between the duplicate
set of six general purpose registers. These OP codes are only one byte in length
to absolutely minimize the time necessary to perform the exchange so that the
duplicate banks can be used to effect very fast interrupt response times.
EX
and EXX
only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?
That's also why there are two EX*
instruction, as very simple routines may only (use and) need to preseve the flags and A
. This leaves the whole second set (except AF
) for other purpose. Like being used in normal software, or for even more speedup in I/O.
After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are usefull - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did both, and while they need different aproaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
add a comment |Â
up vote
2
down vote
accepted
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of accumulator
flag registers while D9H allows the programmer to switch between the duplicate
set of six general purpose registers. These OP codes are only one byte in length
to absolutely minimize the time necessary to perform the exchange so that the
duplicate banks can be used to effect very fast interrupt response times.
EX
and EXX
only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?
That's also why there are two EX*
instruction, as very simple routines may only (use and) need to preseve the flags and A
. This leaves the whole second set (except AF
) for other purpose. Like being used in normal software, or for even more speedup in I/O.
After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are usefull - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did both, and while they need different aproaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
add a comment |Â
up vote
2
down vote
accepted
up vote
2
down vote
accepted
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of accumulator
flag registers while D9H allows the programmer to switch between the duplicate
set of six general purpose registers. These OP codes are only one byte in length
to absolutely minimize the time necessary to perform the exchange so that the
duplicate banks can be used to effect very fast interrupt response times.
EX
and EXX
only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?
That's also why there are two EX*
instruction, as very simple routines may only (use and) need to preseve the flags and A
. This leaves the whole second set (except AF
) for other purpose. Like being used in normal software, or for even more speedup in I/O.
After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are usefull - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did both, and while they need different aproaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,
Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:
OP code 08H allows the programmer to switch between the two pairs of accumulator
flag registers while D9H allows the programmer to switch between the duplicate
set of six general purpose registers. These OP codes are only one byte in length
to absolutely minimize the time necessary to perform the exchange so that the
duplicate banks can be used to effect very fast interrupt response times.
EX
and EXX
only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?
That's also why there are two EX*
instruction, as very simple routines may only (use and) need to preseve the flags and A
. This leaves the whole second set (except AF
) for other purpose. Like being used in normal software, or for even more speedup in I/O.
After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.
If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.
After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.
though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.
I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.
Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.
Well, it's exactly the region where they are usefull - to speed up small functions.
Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.
Did both, and while they need different aproaches, the result is usually quite similar.
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.
answered 1 hour ago


Raffzahn
35.7k478141
35.7k478141
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
add a comment |Â
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
– rwallace
1 hour ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
@rwallace yes, except there's till the issue of parameter passing.
– Raffzahn
32 mins ago
add a comment |Â
up vote
1
down vote
The key to efficient programming on Z80 is to use registers as much as possible. I can easily believe that designers of Z80 intended the use of the alternative set of registers as an efficient way of context switching. However, the context switching does not tend to happen often enough to use the alternative set of registers only for that; the gains are simply not worth it most of the time. Hence, the good practice of Z80 programming is typically about using as many registers as possible and still use stack for saving registers during the interrupts.
Now, let me give you several ideas on how one would benefit from having two sets of equivalent registers. A typical pixel scrolling for 16 byte wide bitmap can look e.g. as follows:
rl (hl) : dec l ; repeated 16 times
What if one needs to scroll by 2 pixels at a time?
rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times
is the fastest way. OK, this is only using the second accumulator. Let us consider fast copying. The obvious
ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states
which is actually very slow. Unrolled
ldi ; 16 t-states
is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:
ld sp,.. : pop af : pop bc : pop de : pop hl
exx : ex af,af' : pop af : pop bc : pop de : pop hl
ld sp,.. : push hl : push de : push bc : push af
exx : ex af,af' : push hl : push de : push bc : push af
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes
i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on ZX Spectrum.
Much non-trivial code, e.g. fast polygon fillers or texture mappers are only possible with decent speed if one uses both sets of registers simultaneously.
add a comment |Â
up vote
1
down vote
The key to efficient programming on Z80 is to use registers as much as possible. I can easily believe that designers of Z80 intended the use of the alternative set of registers as an efficient way of context switching. However, the context switching does not tend to happen often enough to use the alternative set of registers only for that; the gains are simply not worth it most of the time. Hence, the good practice of Z80 programming is typically about using as many registers as possible and still use stack for saving registers during the interrupts.
Now, let me give you several ideas on how one would benefit from having two sets of equivalent registers. A typical pixel scrolling for 16 byte wide bitmap can look e.g. as follows:
rl (hl) : dec l ; repeated 16 times
What if one needs to scroll by 2 pixels at a time?
rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times
is the fastest way. OK, this is only using the second accumulator. Let us consider fast copying. The obvious
ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states
which is actually very slow. Unrolled
ldi ; 16 t-states
is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:
ld sp,.. : pop af : pop bc : pop de : pop hl
exx : ex af,af' : pop af : pop bc : pop de : pop hl
ld sp,.. : push hl : push de : push bc : push af
exx : ex af,af' : push hl : push de : push bc : push af
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes
i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on ZX Spectrum.
Much non-trivial code, e.g. fast polygon fillers or texture mappers are only possible with decent speed if one uses both sets of registers simultaneously.
answered 26 mins ago
introspec
Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?
This being one occasion when a personal-experience answer will do: EXX is ideal for the very specific task of multiplying a 16-bit 2D vector by a scalar, which makes it helpful for 2D vector graphics and for the projection part of 3D vector graphics.
Specifically:
- use A for the multiplier; rotate right from it into carry;
- use BC and BC' for the working copies of the multiplicands; these will need shifting left on each iteration;
- use HL and HL' to accumulate the results; perform ADD HL,BC if carry is set after the RRA.
So the specific convenient observations are:
- you're juggling four 16-bit quantities, but they interact only in pairs;
- and using EXX lets you use the 16-bit arithmetic that's right there on the main (unprefixed) instruction page, rather than the slower IX/IY-prefixed forms.
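The register layout in this answer can be modeled in Python to show that the x and y halves never need to interact: EXX simply swaps which pair is "live", and since it leaves the flags alone, one RRA result can drive both additions. This is a sketch, not the author's code; it assumes an unsigned 8-bit multiplier and 16-bit results that wrap, and the JR NC branch and SLA C : RL B shift mentioned in the comments are guesses at the glue code. The names exx and scale_vector are hypothetical.

```python
def exx(main: dict, alt: dict) -> tuple[dict, dict]:
    """EXX swaps BC/DE/HL with BC'/DE'/HL' and leaves the flags alone."""
    return alt, main


def scale_vector(x: int, y: int, k: int) -> tuple[int, int]:
    """Multiply the 16-bit vector (x, y) by the unsigned 8-bit scalar k."""
    a = k & 0xFF                           # A   = multiplier
    main = {"bc": x & 0xFFFF, "hl": 0}     # BC  = x, HL  = x accumulator
    alt = {"bc": y & 0xFFFF, "hl": 0}      # BC' = y, HL' = y accumulator
    for _ in range(8):
        # RRA: bit 0 of A goes into carry. The bits RRA rotates into bit 7
        # never get back out into carry within 8 steps, so a shift suffices.
        carry = a & 1
        a >>= 1
        if carry:                          # e.g. JR NC,skip guards both adds
            main["hl"] = (main["hl"] + main["bc"]) & 0xFFFF  # ADD HL,BC
            main, alt = exx(main, alt)
            main["hl"] = (main["hl"] + main["bc"]) & 0xFFFF  # ADD HL,BC on HL'
            main, alt = exx(main, alt)
        main["bc"] = (main["bc"] << 1) & 0xFFFF  # e.g. SLA C : RL B
        main, alt = exx(main, alt)
        main["bc"] = (main["bc"] << 1) & 0xFFFF  # same shift on BC'
        main, alt = exx(main, alt)
    return main["hl"], alt["hl"]


print(scale_vector(100, 300, 5))    # (500, 1500)
print(scale_vector(-7, 0x1234, 3))  # (65515, 13980); 65515 is -21 in 16 bits
```

Because everything wraps modulo 2^16, a negative (two's-complement) multiplicand also comes out right, as the second call shows; a signed multiplier would need extra handling.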
answered 10 mins ago
Tommy
I could see the alternative registers being used for fast context switching, but I'm not sure how efficient they would be as fast global variables. To use them, you would have to swap register sets (BC, DE and HL with their primed counterparts; AF can also be swapped, with a different instruction). Then you would have to preserve a copy of that data, perhaps on the stack or in an index register, and then swap the sets back. It would probably be quicker just to grab the variable directly from memory.
– RichF
1 hour ago