Why does RAID-5 require an additional disk for parity blocks?

I know that RAID-5 consists of block-level striping across multiple disks, with an additional parity-check block on each disk, and that at least two disks are required for striping.



And it's obvious that each parity block is specific to each disk it belongs to (and so there is no need for allocating an additional disk).



[Figure: RAID 5. Image from Wikipedia.]



However, I've been unable to understand why an additional disk is in fact required for parity checks, as stated in this article:




The minimum number of disks in a RAID 5 set is three (two for data and
one for parity).




Any idea?










  • "Striping" (arranging in stripes), not "stripping" (removing).
    – David Richerby
    2 hours ago














storage fault-tolerance space-partitioning






edited 2 hours ago by David Richerby
asked 4 hours ago by Kais







1 Answer

I think you've misunderstood what the parity data is. The parity blocks are not per-disc parity checks, so it's not true that "each parity block is specific to each disc it belongs to." The parity data is there to allow recovery from a failed disc.



Let's go back to RAID-4 for a second, and assume we have three discs: discs $0$ and $1$ are data and disc $2$ is parity. "Parity" means that the $b$th block of disc $2$ is the xor of the $b$th blocks of discs $0$ and $1$. The point is that, if any single disc fails, we can recover its data because the $b$th block of any disc is the xor of the $b$th blocks on the other two discs. For this to work, it's crucial that the parity data is on a separate disc from the data it protects. If you only had two discs and put the parity data on those discs (e.g., each disc was two-thirds data blocks and one-third parity blocks) then the failure of a single drive would destroy some blocks and their corresponding parity data, so you'd be unable to recover the data using just what was left on the remaining disc.
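To make the recovery argument concrete, here is a minimal Python sketch of the three-disc case. The block contents and the helper name `xor_blocks` are made up for illustration; a real array works on fixed-size sectors, not short byte strings.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical contents of block b on the two data discs.
disc0 = b"hello RAID"
disc1 = b"parity xor"

# Disc 2 stores the parity block: the XOR of the two data blocks.
disc2 = xor_blocks(disc0, disc1)

# If any single disc is lost, its block is the XOR of the other two.
recovered0 = xor_blocks(disc1, disc2)
recovered1 = xor_blocks(disc0, disc2)

print(recovered0 == disc0, recovered1 == disc1)
```

The same function is used both to compute the parity and to rebuild a lost block, which is why any one of the three discs can fail without losing data.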



RAID-5 is the same idea except that, instead of putting all the parity data on the last disc, it's spread across all the discs. So, for a three-disc set-up, a third of the blocks would have parity data on disc $2$, a third on disc $1$ and a third on disc $0$.
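As a rough illustration of the rotation, the sketch below prints which disc holds the parity block for each stripe. The rotation rule in `parity_disc` is an assumption chosen for simplicity; real implementations offer several layouts and may rotate in a different order.

```python
def parity_disc(stripe: int, n_discs: int) -> int:
    """Disc holding the parity block of a stripe (assumed simple rotation)."""
    return (n_discs - 1 - stripe) % n_discs


def layout(n_stripes: int, n_discs: int) -> list[list[str]]:
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_disc(stripe, n_discs)
        rows.append(["P" if disc == p else "D" for disc in range(n_discs)])
    return rows


# Three discs, six stripes: each disc ends up with parity for a third of them.
for row in layout(6, 3):
    print(" ".join(row))
```

Within every stripe the parity block still sits on a different disc from the data blocks it protects, so the recovery argument is unchanged.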



The point of using RAID-5 rather than RAID-4 is that every time you write data, the corresponding parity block must be updated. If all parity data is on the same disc, that disc will be written to much more than the other discs (about $k-1$ times as much as each data disc in a $k$-disc system, if small writes are spread evenly), so it will fail faster. Spreading the parity data across the discs evens out the wear on them.
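The difference in write load can be seen in a small simulation. This is only a sketch under simplifying assumptions: every write touches a single data block chosen at random, the function `count_writes` and its parameters are hypothetical, and the rotation rule is one of several possible layouts.

```python
import random


def count_writes(n_discs: int, n_writes: int, rotate_parity: bool,
                 seed: int = 0) -> list[int]:
    """Count block writes per disc for random single-block updates."""
    rng = random.Random(seed)
    writes = [0] * n_discs
    for _ in range(n_writes):
        stripe = rng.randrange(1000)
        if rotate_parity:
            # RAID-5 style: parity location depends on the stripe.
            parity = (n_discs - 1 - stripe) % n_discs
        else:
            # RAID-4 style: parity always on the last disc.
            parity = n_discs - 1
        data_discs = [d for d in range(n_discs) if d != parity]
        writes[rng.choice(data_discs)] += 1  # the data block itself
        writes[parity] += 1                  # its parity block
    return writes


raid4 = count_writes(4, 12000, rotate_parity=False)
raid5 = count_writes(4, 12000, rotate_parity=True)
print("RAID-4:", raid4)
print("RAID-5:", raid5)
```

With a dedicated parity disc, that disc is written on every update while each data disc sees only its share; with rotated parity the counts come out roughly equal.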






  • If I understood correctly, the parity block on each disk (such as Ap in the figure above) is computed using a Hamming code to store parity information for A1, A2 and A3?
    – Kais
    18 mins ago











  • Wikipedia says it's the XOR, and the word "parity" usually implies that. (Though Wikipedia also uses "parity" for RAID-6, which allows recovery from two failed discs so isn't actually parity.)
    – David Richerby
    49 secs ago










  • I would guess that the performance bottleneck created by a dedicated parity disk is even more of a concern than the disk wearing out faster.
    – Jörg W Mittag
    35 secs ago








































answered 2 hours ago by David Richerby










