Did anyone ever use the extra set of registers on the Z80?

The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling, though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.



Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs. Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.



Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?










  • I could see the alternate registers being used for fast context switching, but I'm not sure how efficient they would be as fast global variables. To use them, you would have to swap register sets (BC, DE and HL with their prime counterparts; AF can also be swapped, with a separate instruction). Then you would have to preserve a copy of that data, perhaps on the stack or in an index register, and swap the sets back. It would probably be quicker just to grab the variable directly from memory.
    – RichF
    1 hour ago














z80






asked 2 hours ago









rwallace

3 Answers
The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




Indeed they were intended for fast interrupt response. In a simple, general way, this saves the time needed to push the main process's registers onto the stack and restore them again. The designers spent single-byte opcodes on the exchange to get the absolute minimum execution time, as the Z80 Technical Manual states on p. 26:



OP code 08H allows the programmer to switch between the two pairs of accumulator
flag registers while D9H allows the programmer to switch between the duplicate
set of six general purpose registers. These OP codes are only one byte in length
to absolutely minimize the time necessary to perform the exchange so that the
duplicate banks can be used to effect very fast interrupt response times.


EX AF,AF' and EXX take only 4 T-cycles each, while pushing even a single 16-bit register pair takes 11 cycles, plus another 10 to pop it again. 8 T-cycles instead of 21 or more is a considerably faster reaction, isn't it?
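As a rough check of that arithmetic, here is a small Python tally. It assumes the documented Z80 timings (PUSH 11, POP 10, EX AF,AF' 4, EXX 4 T-states); the function names are only illustrative.

```python
# Documented Z80 instruction timings in T-states.
T_PUSH = 11      # PUSH qq
T_POP = 10       # POP qq
T_EX_AF = 4      # EX AF,AF'
T_EXX = 4        # EXX


def stack_save_restore(pairs):
    """Cost of saving and later restoring `pairs` register pairs via the stack."""
    return pairs * (T_PUSH + T_POP)


def bank_swap_save_restore(include_af=True, include_main=True):
    """Cost of swapping to the alternate bank on entry and back again on exit."""
    one_way = (T_EX_AF if include_af else 0) + (T_EXX if include_main else 0)
    return 2 * one_way


print(stack_save_restore(1))      # 21: a single pair, e.g. AF only
print(stack_save_restore(4))      # 84: AF, BC, DE and HL
print(bank_swap_save_restore())   # 16: EX AF,AF' + EXX on entry and on exit
```

Even counting the swap both ways, exchanging the whole bank is cheaper than pushing and popping a single register pair.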



That's also why there are two EX* instructions: very simple routines may only use, and so only need to preserve, the flags and A. This leaves the whole second set (except AF') free for other purposes, such as use in normal software or for even more speedup in I/O.



After all, the second set can not only be used as a kind of fast 'stack', but can even be prepared in advance for a certain I/O operation. Think of a serial interface receiving at high speed. Loading things like the memory pointer where received data is to be placed, the number of bytes to receive and so on takes quite some time (16 T-cycles for a 16-bit pointer, 13 for a byte value), and they need to be stored again afterwards as well.



If these values are placed in the second register set before the high-speed interrupt-driven routine becomes active, no loads and stores need to be executed. Interrupt service time is reduced to the absolute minimum, which not only interrupts the main process less but also allows higher data rates.
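To put rough numbers on this, the following Python sketch compares the per-interrupt overhead of keeping a buffer pointer and a byte counter in memory with keeping them in the alternate bank. It assumes the documented timings for LD HL,(nn), LD (nn),HL, LD A,(nn), LD (nn),A and EXX, and it counts only the handling of those two values, not the I/O work itself or saving AF. The choice of HL' and B' in the comment is a hypothetical example.

```python
# Documented Z80 timings in T-states.
T_LD_HL_FROM_MEM = 16   # LD HL,(nn)
T_LD_MEM_FROM_HL = 16   # LD (nn),HL
T_LD_A_FROM_MEM = 13    # LD A,(nn)
T_LD_MEM_FROM_A = 13    # LD (nn),A
T_EXX = 4               # EXX


def state_in_memory():
    """Fetch a buffer pointer and a byte counter, then write both back."""
    return (T_LD_HL_FROM_MEM + T_LD_A_FROM_MEM
            + T_LD_MEM_FROM_HL + T_LD_MEM_FROM_A)


def state_in_alternate_bank():
    """Pointer and counter already sit in, say, HL' and B': swap in, swap out."""
    return 2 * T_EXX


print(state_in_memory(), state_in_alternate_bank())   # 58 8
```

The memory-based variant would additionally have to save and restore whichever main registers it clobbers, so the real gap is larger than this tally suggests.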



After all, the Z80 design was mainly focused on more flexible, configurable and faster interrupt handling.




though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.




I can't see much gain here. Sure, you get 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary register set is helpful, besides interrupts and 'dead end' subroutines.




Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




Well, that's exactly the area where they are useful: speeding up small functions.




Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.




I did both, and while they need different approaches, the result is usually quite similar.




Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




It was quite common to use them either for interrupts (mostly in embedded systems) or for 'dead end' routines.






answered 1 hour ago
Raffzahn




















  • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
    – rwallace
    1 hour ago










  • @rwallace yes, except there's still the issue of parameter passing.
    – Raffzahn
    32 mins ago

















The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the designers of the Z80 intended the alternate set of registers as an efficient way of context switching. However, context switches do not tend to happen often enough to justify reserving the alternate set for that alone; the gains are simply not worth it most of the time. Hence, good Z80 programming practice is typically to use as many registers as possible and still use the stack for saving registers during interrupts.



Now, let me give you several ideas on how one can benefit from having two sets of equivalent registers. Typical pixel scrolling for a 16-byte-wide bitmap can look, for example, as follows:



rl (hl) : dec l ; repeated 16 times


What if one needs to scroll by 2 pixels at a time?



rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times


is the fastest way. OK, this is only using the second accumulator. Let us consider fast copying. The obvious



ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states


is actually very slow. Unrolled



ldi ; 16 t-states


is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:



ld sp,.. : pop af : pop bc : pop de : pop hl
exx : ex af,af' : pop af : pop bc : pop de : pop hl
ld sp,.. : push hl : push de : push bc : push af
exx : ex af,af' : push hl : push de : push bc : push af
; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes


i.e. 12.75 t-states per byte. And note that this is not esoteric; variations of this idea were used in a huge number of commercial games on the ZX Spectrum.
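The byte ordering in the stack-based copier is easy to get wrong, so here is a small Python model of it. This is only a sketch: it assumes the documented timings (LD SP,nn 10, POP 10, PUSH 11, EXX 4, EX AF,AF' 4), models the two register banks as plain dictionaries, and uses a made-up function name, `copy16_via_stack`.

```python
# T-state costs of the instructions used (documented Z80 timings).
T = {"ld_sp": 10, "pop": 10, "push": 11, "exx": 4, "ex_af": 4}


def copy16_via_stack(mem, src, dst):
    """Copy mem[src:src+16] to mem[dst:dst+16] the way the listing does.

    Returns the number of T-states spent.
    """
    main, alt = {}, {}     # active and alternate register banks
    cycles = 0

    def pop_into(bank, names, sp):
        nonlocal cycles
        for name in names:
            bank[name] = mem[sp] | (mem[sp + 1] << 8)   # low byte first
            sp += 2
            cycles += T["pop"]
        return sp

    def push_from(bank, names, sp):
        nonlocal cycles
        for name in names:
            sp -= 2
            mem[sp] = bank[name] & 0xFF
            mem[sp + 1] = bank[name] >> 8
            cycles += T["push"]
        return sp

    sp = src                                   # ld sp,src
    cycles += T["ld_sp"]
    sp = pop_into(main, ["af", "bc", "de", "hl"], sp)
    main, alt = alt, main                      # exx : ex af,af'
    cycles += T["exx"] + T["ex_af"]
    sp = pop_into(main, ["af", "bc", "de", "hl"], sp)

    sp = dst + 16                              # ld sp,dst+16 (stack grows down)
    cycles += T["ld_sp"]
    sp = push_from(main, ["hl", "de", "bc", "af"], sp)
    main, alt = alt, main                      # exx : ex af,af'
    cycles += T["exx"] + T["ex_af"]
    sp = push_from(main, ["hl", "de", "bc", "af"], sp)
    return cycles


mem = bytearray(64)
mem[0:16] = bytes(range(1, 17))
cycles = copy16_via_stack(mem, 0, 32)
print(mem[32:48] == mem[0:16], cycles, cycles / 16)   # True 204 12.75
```

The model shows why the pushes happen in reverse register order and why the bank that was popped last is pushed first: the stack grows downwards, so the destination is filled from its top end.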



Much non-trivial code, e.g. fast polygon fillers or texture mappers, is only possible at decent speed if one uses both sets of registers simultaneously.
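Returning to the two-pixel scroll earlier in this answer: the trick is that F and F' each carry their own bit from byte to byte. The Python sketch below models that with two carry variables and checks the result against a plain two-bit shift of the whole row. It assumes both carries start out as zero (i.e. blank pixels are shifted in from the right), and the function names are only illustrative.

```python
def rl(byte, carry):
    """Z80 RL: rotate left through carry. Returns (new_byte, new_carry)."""
    return ((byte << 1) | carry) & 0xFF, byte >> 7


def scroll_row_2px(row):
    """Scroll a row of bytes left by two pixels, rightmost byte first.

    carry_a and carry_b model the carry flags held in F and F'.
    """
    row = list(row)
    carry_a = carry_b = 0                        # assumption: zeros shifted in
    for i in reversed(range(len(row))):          # dec l walks to lower addresses
        row[i], carry_a = rl(row[i], carry_a)    # rl (hl)
        # ex af,af'
        row[i], carry_b = rl(row[i], carry_b)    # rl (hl)
        # ex af,af'
    return row


row = [0x12, 0x34, 0xC0, 0xFF]
expected = ((int.from_bytes(bytes(row), "big") << 2) & 0xFFFFFFFF).to_bytes(4, "big")
print(bytes(scroll_row_2px(row)) == expected)    # True
```

The first chain hands the old bit 7 to the next byte and the second chain hands over the old bit 6, which together are exactly the two pixels that have to cross each byte boundary.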






    Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




    This being one occasion when a personal experience answer will do, EXX is ideal for the very specific task of multiplying a 16-bit 2d vector by a scalar, which makes it helpful for 2d vector graphics, and the projection part of 3d vector graphics.



    Specifically:



    • use A for the multiplier — rotate right from it into carry;

    • use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;

    • use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.

    So the specific convenient observations are:



    • you're juggling four 16-bit quantities, but they interact only in pairs;

    • and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
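The register allocation above can be sketched as a Python model of the shift-and-add loop. This is an illustration under assumptions, not the original routine: it treats all values as unsigned, truncates results to 16 bits as the registers would, and models RRA with a plain right shift (the bit that RRA rotates into bit 7 is never reached within 8 iterations). The name `scale_vector` is made up.

```python
MASK16 = 0xFFFF


def scale_vector(x, y, k):
    """Multiply the 16-bit pair (x, y) by the 8-bit scalar k, modulo 2**16."""
    a = k & 0xFF                               # A: the multiplier
    bc, bc_alt = x & MASK16, y & MASK16        # BC and BC': shifted multiplicands
    hl, hl_alt = 0, 0                          # HL and HL': running results
    for _ in range(8):
        carry = a & 1                          # RRA moves bit 0 of A into carry
        a >>= 1
        if carry:
            hl = (hl + bc) & MASK16            # ADD HL,BC
            # EXX
            hl_alt = (hl_alt + bc_alt) & MASK16    # ADD HL,BC on the other bank
            # EXX
        bc = (bc << 1) & MASK16                # shift BC left
        bc_alt = (bc_alt << 1) & MASK16        # same for BC' after an EXX
    return hl, hl_alt


print(scale_vector(3, 5, 7))   # (21, 35)
```

Both components go through identical instructions, with EXX selecting which one is being worked on, which is why the four 16-bit values only ever interact in pairs.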





    share|improve this answer




















      Your Answer







      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "648"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fretrocomputing.stackexchange.com%2fquestions%2f7794%2fdid-anyone-ever-use-the-extra-set-of-registers-on-the-z80%23new-answer', 'question_page');

      );

      Post as a guest






























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      2
      down vote



      accepted











      The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




      Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:



      OP code 08H allows the programmer to switch between the two pairs of accumulator
      flag registers while D9H allows the programmer to switch between the duplicate
      set of six general purpose registers. These OP codes are only one byte in length
      to absolutely minimize the time necessary to perform the exchange so that the
      duplicate banks can be used to effect very fast interrupt response times.


      EX and EXX only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?



      That's also why there are two EX* instruction, as very simple routines may only (use and) need to preseve the flags and A. This leaves the whole second set (except AF) for other purpose. Like being used in normal software, or for even more speedup in I/O.



      After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.



      If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.



      After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.




      though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.




      I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.




      Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




      Well, it's exactly the region where they are usefull - to speed up small functions.




      Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.




      Did both, and while they need different aproaches, the result is usually quite similar.




      Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




      It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.






      share|improve this answer




















      • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
        – rwallace
        1 hour ago










      • @rwallace yes, except there's till the issue of parameter passing.
        – Raffzahn
        32 mins ago














      up vote
      2
      down vote



      accepted











      The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




      Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:



      OP code 08H allows the programmer to switch between the two pairs of accumulator
      flag registers while D9H allows the programmer to switch between the duplicate
      set of six general purpose registers. These OP codes are only one byte in length
      to absolutely minimize the time necessary to perform the exchange so that the
      duplicate banks can be used to effect very fast interrupt response times.


      EX and EXX only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?



      That's also why there are two EX* instruction, as very simple routines may only (use and) need to preseve the flags and A. This leaves the whole second set (except AF) for other purpose. Like being used in normal software, or for even more speedup in I/O.



      After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.



      If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.



      After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.




      though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.




      I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.




      Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




      Well, it's exactly the region where they are usefull - to speed up small functions.




      Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.




      Did both, and while they need different aproaches, the result is usually quite similar.




      Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




      It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.






      share|improve this answer




















      • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
        – rwallace
        1 hour ago










      • @rwallace yes, except there's till the issue of parameter passing.
        – Raffzahn
        32 mins ago












      up vote
      2
      down vote



      accepted







      up vote
      2
      down vote



      accepted







      The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




      Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:



      OP code 08H allows the programmer to switch between the two pairs of accumulator
      flag registers while D9H allows the programmer to switch between the duplicate
      set of six general purpose registers. These OP codes are only one byte in length
      to absolutely minimize the time necessary to perform the exchange so that the
      duplicate banks can be used to effect very fast interrupt response times.


      EX and EXX only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?



      That's also why there are two EX* instruction, as very simple routines may only (use and) need to preseve the flags and A. This leaves the whole second set (except AF) for other purpose. Like being used in normal software, or for even more speedup in I/O.



      After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.



      If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.



      After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.




      though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.




      I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.




      Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




      Well, it's exactly the region where they are usefull - to speed up small functions.




      Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.




      Did both, and while they need different aproaches, the result is usually quite similar.




      Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




      It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.






      share|improve this answer













      The Z80 has the surprising feature of a second set of registers. I suppose these were intended to be used for rapid task switching or interrupt handling,




      Indeed they where intended for fast interrupt reaction. In a sinple, general way, this saved the time to push the main process' registers onto the stack and restore them again. they spend single byte opcodes to do so to get the absolute minimum execution time - like the Z80 Technical Manual states on p.26:



      OP code 08H allows the programmer to switch between the two pairs of accumulator
      flag registers while D9H allows the programmer to switch between the duplicate
      set of six general purpose registers. These OP codes are only one byte in length
      to absolutely minimize the time necessary to perform the exchange so that the
      duplicate banks can be used to effect very fast interrupt response times.


      EX and EXX only thake 4 T-cycles, while even just pushing a simple 16 bit register would take 11 cycles plus another 15 to load it again. 8 T-cycles instead of 25 or more cycles is a considerable faster reaction, isn't it?



      That's also why there are two EX* instruction, as very simple routines may only (use and) need to preseve the flags and A. This leaves the whole second set (except AF) for other purpose. Like being used in normal software, or for even more speedup in I/O.



      After all, the Second set can not only be used for some kind of fast 'stack', but even be prepared for a certain I/O operation. Think maybe of a serial interface receivng at high speed. Loading things like the memory pointer where received data is to be placed, the numbers of bytes to receive and so on, does take quite some time (16 T-Cycles for a 16 Bit pointer, 13 for a byte value) - and they need to be stored later on as well.



      If these values are placed in the second register set before the high speed interrupt driven routine gets active, no loads and stores are to be executed. Intterrupt service time gets reduced to the absolute minimum, not only causing less interruption of the main process, but also working up to higher speeds.



      After all, the Z80 design was mainly focused on a more flexible, configurable and faster interrupt handling.




      though I think if I were programming a Z80 retrocomputer, I would be more likely to use them for fast access to global variables.




      I can't see much gain here. Sure, 6 additional bytes or 3 pointers, but at the same time you can't access the other ones. So there are not many cases where the secondary registerset is helpful - beside interrupts and 'dead end' subroutines.




      Such small snippets of Z80 code as I have seen, do not use them, but then, that's not surprising; they are something that would be expected to be only used in large programs.




      Well, it's exactly the region where they are usefull - to speed up small functions.




      Back in the day, I was on 6502 machines, so I never had occasion to write anything nontrivial on the Z80.




      Did both, and while they need different aproaches, the result is usually quite similar.




      Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




      It was quite common to use them either for interrupt (mostly in embedded systems) or 'dead end' routines.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 1 hour ago









      Raffzahn

      35.7k478141




      35.7k478141











      • So within a single task, the most likely place to use them would be in leaf subroutines, so you can have the use of a full set of registers without having to spend cycles saving and restoring those used by the rest of the program. Okay, that makes sense.
        – rwallace
        1 hour ago










      • @rwallace yes, except there's till the issue of parameter passing.
        – Raffzahn
        32 mins ago







































      The key to efficient programming on the Z80 is to use registers as much as possible. I can easily believe that the Z80's designers intended the alternate register set as an efficient means of context switching. However, context switches rarely happen often enough to justify reserving the alternate set for that alone; the gains are simply not worth it most of the time. Hence, good Z80 practice is typically to use as many registers as possible in the main code and still save registers on the stack during interrupts.
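To put rough numbers on that trade-off, here is a sketch of a handler that saves the main set on the stack (an illustration only; the label `isr_stack` is made up, and the t-state counts are the standard Z80 timings for these instructions):

```asm
; Hypothetical handler that saves the main set on the stack,
; leaving the alternate set free for the foreground code.
isr_stack:
        push af                 ; 11 t-states each
        push bc
        push de
        push hl
        ; ... handler body ...
        pop  hl                 ; 10 t-states each
        pop  de
        pop  bc
        pop  af
        ei
        reti
; save + restore: 4*11 + 4*10 = 84 t-states,
; versus 2*(4+4) = 16 t-states for ex af,af' / exx on entry and exit
```

The difference is 68 t-states per interrupt, which is small next to what a busy inner loop can gain from having a second set of working registers.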



      Now, let me give you several ideas on how one can benefit from having two equivalent sets of registers. Typical pixel scrolling for a 16-byte-wide bitmap can look, e.g., as follows:



      rl (hl) : dec l ; repeated 16 times


      What if one needs to scroll by 2 pixels at a time?



      rl (hl) : ex af,af' : rl (hl) : ex af,af' : dec l ; repeated 16 times


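      is the fastest way: each of the two rotations needs its own carry chain, and ex af,af' switches between the two carry flags. OK, this only uses the second accumulator and flags. Let us consider fast copying. The obvious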



      ld a,(hl) : ld (de),a : inc hl : inc de ; 26 t-states


      is actually very slow. Unrolled



      ldi ; 16 t-states


      is better and, in fact, is often acceptably fast. However, the fastest copiers are based on (semi-)unrolled code loading and saving the data via the stack, e.g. as follows:



      ld sp,.. : pop af : pop bc : pop de : pop hl
      exx : ex af,af' : pop af : pop bc : pop de : pop hl
      ld sp,.. : push hl : push de : push bc : push af
      exx : ex af,af' : push hl : push de : push bc : push af
      ; 10+10*4 + 4*2+10*4 + 10+11*4 + 4*2+11*4 = 204 t-states per 16 bytes


      i.e. 12.75 t-states per byte. Note that this is not esoteric; variations of this idea were used in a huge number of commercial games on the ZX Spectrum.
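A slightly fuller sketch of the same 16-byte stack copy, with the housekeeping that the one-liner leaves out. The labels `src`, `dst` and `saved_sp` are hypothetical, and the routine assumes that interrupts were enabled on entry and that the caller does not need AF or the alternate registers preserved:

```asm
; Copy 16 bytes from src to dst through the stack pointer.
; Hypothetical labels: src, dst (16-byte buffers), saved_sp (2-byte variable).
copy16:
        di                      ; an interrupt would push its return address into our data
        ld   (saved_sp),sp      ; remember the real stack pointer
        ld   sp,src             ; pops read upward from src
        pop  af
        pop  bc
        pop  de
        pop  hl                 ; main set now holds src+0 .. src+7
        exx
        ex   af,af'
        pop  af
        pop  bc
        pop  de
        pop  hl                 ; alternate set now holds src+8 .. src+15
        ld   sp,dst+16          ; pushes write downward from the end of dst
        push hl
        push de
        push bc
        push af                 ; writes dst+8 .. dst+15
        exx
        ex   af,af'
        push hl
        push de
        push bc
        push af                 ; writes dst+0 .. dst+7
        ld   sp,(saved_sp)      ; restore the real stack
        ei
        ret
```

Real copiers usually amortise the `di`, `ei` and stack-pointer save over many such 16-byte groups, so the per-byte cost stays close to the figure quoted above.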



      Much non-trivial code, e.g. fast polygon fillers or texture mappers, is only possible at decent speed if one uses both sets of registers simultaneously.
















          answered 26 mins ago









          introspec



































              Did anyone ever use that second register bank, either for its intended purpose, or just to get more registers within a single task?




              This being one occasion when a personal-experience answer will do: EXX is ideal for the very specific task of multiplying a 16-bit 2D vector by a scalar, which makes it helpful for 2D vector graphics and for the projection part of 3D vector graphics.



              Specifically:



              • use A for the multiplier — rotate right from it into carry;

              • use BC and BC' for the working copy of the multiplicands; these will need shifting left on each iteration;

              • use HL and HL' to accumulate the result; perform ADD HL, BC if carry is set after the RRA.

              So the specific convenient observations are:



              • you're juggling four 16-bit quantities, but they interact only in pairs;

              • and using EXX lets you use the 16-bit arithmetic that's right there on the main instruction page.
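The register allocation above can be sketched as a plain shift-and-add loop. This is an illustration under assumptions, not the answerer's actual code: the multiplier is taken to be 8 bits wide, the results are truncated to 16 bits, and the loop counter in D and the labels are made up.

```asm
; In:  A = 8-bit multiplier, BC = x, BC' = y
; Out: HL = low 16 bits of x*A, HL' = low 16 bits of y*A
; Clobbers A, BC, BC' and D (the loop counter, an assumption of this sketch).
vecmul:
        ld   hl,0               ; clear the x accumulator
        exx
        ld   hl,0               ; clear the y accumulator
        exx
        ld   d,8                ; 8 multiplier bits; D lives in the main set
vm_loop:
        rra                     ; next multiplier bit -> carry
        jr   nc,vm_skip
        add  hl,bc              ; x accumulator += shifted x
        exx
        add  hl,bc              ; y accumulator += shifted y
        exx
vm_skip:
        sla  c
        rl   b                  ; x <<= 1
        exx
        sla  c
        rl   b                  ; y <<= 1
        exx
        dec  d                  ; counter is tested with the main set active
        jr   nz,vm_loop
        ret
```

Because EXX leaves A and the flags alone, the multiplier and the loop condition are shared by both halves, while each half uses the ordinary one-byte ADD HL,BC.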















                  answered 10 mins ago









                  Tommy




























                       
