Detect file type using file signatures
My program has to detect the real file type of a given file using signatures. For now I'm just checking JPG, but I want to add more.
    Dim files() As String = IO.Directory.GetFiles(pictures)
    Dim file_data As Byte()
    Dim jpg_file_extension() As Byte = {&HFF, &HD8, &HFF}
    Dim office_file_extension() As Byte = {&H50, &H4B, &H3, &H4, &H14, &H0, &H6, &H0}
    Dim check As Integer = 0
    For Each file As String In files
        file_data = IO.File.ReadAllBytes(file)
        If file_data.Length > 2 Then
            For i = 0 To jpg_file_extension.Length - 1
                If file_data(i) = jpg_file_extension(i) Then
                    check += 1
                Else
                    check = 0
                    Exit For
                End If
            Next
            If (check.ToString.Length = jpg_file_extension.Length - 1) Then
                MsgBox(file.Split("\").Last & ": its jpg")
            End If
        End If
    Next
The code looks a bit messy right now and it only checks one file type. My questions are:
- How can I improve this code and make it cleaner and more efficient?
- Is there a way to implement this code in such a way that I can have a function, give it the file data and check if the signature is "whitelisted"?
file vb.net
edited 1 hour ago
200_success
asked 5 hours ago
Milton Cardoso
2 Answers
Your first step is to wind your thinking back a few steps and re-approach your code with a fresh line of thinking. Looking at your code, you say "to detect the real file type of a given file" but you have written code to detect a JPEG(*) file.
There is a subtlety here, but once you have mastered it you can approach complex problems with more confidence. The subtlety is that you want a generic approach, but your thinking at the moment is constrained to and focussed on a particular example, so your solution is tailored to that example. More specifically, your current code answers the question "Is this a JPEG file?", whereas you want your solution to answer the question "What is the file type of this file?".
Signatures
You define your signatures early. This is a good approach because it lends itself to a future implementation where you can import a tailored list of signatures.
However, you are currently using separate arrays to store the signature data. The use of multiple arrays is going to be inefficient for any improvements or even for checking multiple files.
The use of static arrays implies looping through all arrays. In a small implementation this is not that noticeable, but if you have a hundred arrays with a size ranging from 3 to 15 bytes, you will start to notice a performance hit. Basically, you will be continuing to check arrays that you have already eliminated as being relevant to your quest.
A suggested way to improve the performance initially is to put the signatures in a collection (e.g. List(Of List(Of Byte))). This way, once you eliminate a signature for a file you can remove it from the collection of candidates, thus quickly removing the unnecessary checks with a commensurate improvement in performance.
The use of the inner collection removes the need to check array lengths, but having a List(Of Byte()) could also work.
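A minimal sketch of such a collection, assuming a plain List(Of Byte()) and reusing the two byte sequences from the question. The name MasterSignatureList is the one used in the examples in this answer; the module name is only a placeholder.

```vbnet
Imports System.Collections.Generic

Module SignatureStore
    ' One entry per known signature; each entry holds the leading bytes of a format.
    ' Both byte sequences are copied from the question (JPEG first, then Office).
    Public ReadOnly MasterSignatureList As New List(Of Byte()) From {
        New Byte() {&HFF, &HD8, &HFF},
        New Byte() {&H50, &H4B, &H3, &H4, &H14, &H0, &H6, &H0}
    }
End Module
```

Keeping everything in one list means a new file type only needs a new entry, not a new variable and a new loop.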
Looping
You manually loop through your array. This is always a simple first approach and reflects the basic solution to identifying a signature. Your code is set up to first loop through the first signature and I assume you were thinking of duplicating this kind of loop for the other signatures.
Sitting here, I can think of two simple approaches:
- Looping through the file bytes individually, removing signatures from the collection as they fail
- Looping through the signatures and doing an array check against the first x bytes of each file
Intuitively, I think the second option will be less efficient but I could be wrong.
Some example code for the second option (not guaranteed to be compilable):
    For Each file As String In files
        file_data = IO.File.ReadAllBytes(file)
        ' Work on a per-file copy so that eliminations for one file do not affect the next one.
        Dim candidates = MasterSignatureList.ToList()
        ' Used a For loop going backwards because in this example we are going to remove elements from the collection
        For signatureIterator = candidates.Count - 1 To 0 Step -1
            Dim signature = candidates(signatureIterator) ' the shorter text makes my example easier to read.
            If file_data.Length < signature.Length Then
                candidates.RemoveAt(signatureIterator)
            ElseIf Not CheckArrayIsSame(file_data.Take(signature.Length).ToArray(), signature) Then
                ' Some function to check arrays are the same will be required
                ' Take(...).ToArray() copies the leading bytes and leaves file_data itself unchanged.
                candidates.RemoveAt(signatureIterator)
            End If
        Next signatureIterator
        ' **** do something here with the remaining candidates as these are the valid signatures for that particular file!
    Next file
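The example above calls CheckArrayIsSame without defining it. A minimal sketch of such a helper, assuming both arguments are Byte arrays; the module name and parameter names are placeholders.

```vbnet
Module ArrayComparison
    ' Returns True when both arrays have the same length and hold the same bytes in the same order.
    Public Function CheckArrayIsSame(first As Byte(), second As Byte()) As Boolean
        If first Is Nothing OrElse second Is Nothing Then Return False
        If first.Length <> second.Length Then Return False
        For i As Integer = 0 To first.Length - 1
            If first(i) <> second(i) Then Return False ' stop at the first mismatch
        Next
        Return True
    End Function
End Module
```

LINQ's first.SequenceEqual(second) performs the same comparison if you prefer not to write the loop yourself.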
And an example for the first option:
    For Each file As String In files
        file_data = IO.File.ReadAllBytes(file)
        ' Per-file working copy that keeps only the signatures short enough to fit in the file.
        Dim candidates = MasterSignatureList.Where(Function(s) s.Length <= file_data.Length).ToList()
        For signatureIterator = 0 To file_data.Length - 1 ' we should exit the loop before getting to the end of most files!
            Dim signatureCheck = False
            ' Go backwards by index: removing elements inside a For Each loop throws an exception.
            For candidateIndex = candidates.Count - 1 To 0 Step -1
                Dim signature = candidates(candidateIndex)
                If signatureIterator < signature.Length Then ' retains signatures that have already passed
                    signatureCheck = True ' still some signatures to check
                    If file_data(signatureIterator) <> signature(signatureIterator) Then
                        candidates.RemoveAt(candidateIndex) ' signature does not match
                    End If
                End If
            Next candidateIndex
            If candidates.Count = 0 OrElse Not signatureCheck Then Exit For ' exit if nothing left to check
        Next signatureIterator
        ' **** do something here with the remaining candidates as these are the valid signatures for that particular file!
    Next file
In both of those examples, the signatures remaining in the signature list are the potential file types. In these examples, the possibility of multiple signatures passing is allowed; how you handle that is up to your programming logic.
As already noted - I have not tested the above code, so also check for the dreaded Jedi array error condition (off-by-1) in my iterations.
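One possible way to handle several signatures passing is to prefer the longest match, since a longer signature is the more specific one. The sketch below assumes a label for each signature; the function name, the labels and the extra "ZIP" entry (added only to create an overlapping prefix with the Office bytes from the question) are not part of the original code.

```vbnet
Imports System.Collections.Generic
Imports System.Linq

Module SignatureLookup
    ' Hypothetical table: label mapped to the leading bytes of that format.
    Private ReadOnly Signatures As New Dictionary(Of String, Byte()) From {
        {"JPEG", New Byte() {&HFF, &HD8, &HFF}},
        {"ZIP", New Byte() {&H50, &H4B, &H3, &H4}},
        {"Office", New Byte() {&H50, &H4B, &H3, &H4, &H14, &H0, &H6, &H0}}
    }

    ' Returns the label of the longest signature that data starts with, or Nothing if none matches.
    Public Function IdentifyFileType(data As Byte()) As String
        If data Is Nothing Then Return Nothing
        Dim best As String = Nothing
        Dim bestLength As Integer = 0
        For Each entry In Signatures
            Dim signature = entry.Value
            ' Only consider signatures longer than the current best that fit inside data.
            If signature.Length > bestLength AndAlso
               data.Length >= signature.Length AndAlso
               data.Take(signature.Length).SequenceEqual(signature) Then
                best = entry.Key
                bestLength = signature.Length
            End If
        Next
        Return best
    End Function
End Module
```

Because the table is never modified, the same function can be called for every file, and a result of Nothing tells the caller that the file is not on the whitelist.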
(*) The correct nomenclature is JPEG; the file extension in traditional 8.3 style is ".jpg". Why this is so, I leave up to your own research.
answered 1 hour ago
AJD
IO.File.ReadAllBytes(file) seems like overkill. Most file formats have signatures that appear within the first few kilobytes. There are, however, formats where the signature does not appear at the start of the file (e.g. TAR archives), as well as formats with subtype information at discontinuous locations (e.g. DOS / Windows executables). Depending on how ambitious you want to be, you may need to generalize how the signatures are specified.
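A rough sketch of both ideas, assuming a signature is described by a byte sequence plus the offset at which it is expected. ReadHeader and MatchesAt are hypothetical names, and how many bytes to read is left to the caller.

```vbnet
Imports System
Imports System.IO

Module HeaderReader
    ' Reads at most maxBytes from the start of the file; the result is shorter for small files.
    Public Function ReadHeader(path As String, maxBytes As Integer) As Byte()
        Using stream As New FileStream(path, FileMode.Open, FileAccess.Read)
            Dim buffer(maxBytes - 1) As Byte
            Dim total As Integer = 0
            ' Read may return fewer bytes than requested, so loop until the buffer is full or the file ends.
            While total < maxBytes
                Dim count = stream.Read(buffer, total, maxBytes - total)
                If count = 0 Then Exit While
                total += count
            End While
            Array.Resize(buffer, total) ' trim to the number of bytes actually read
            Return buffer
        End Using
    End Function

    ' Returns True when signature appears in data starting at the given offset.
    Public Function MatchesAt(data As Byte(), offset As Integer, signature As Byte()) As Boolean
        If offset < 0 OrElse data.Length - offset < signature.Length Then Return False
        For i As Integer = 0 To signature.Length - 1
            If data(offset + i) <> signature(i) Then Return False
        Next
        Return True
    End Function
End Module
```

For example, a POSIX TAR archive carries the ASCII bytes "ustar" at offset 257, so a 512-byte header is enough to test for it with MatchesAt(header, 257, ...), while the JPEG check from the question becomes MatchesAt(header, 0, ...).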
answered 1 hour ago
200_success