|  | binary files format |  | |
| | | Bill Cunningham |  |
| Posted: Wed Aug 20, 2008 7:53 pm Post subject: binary files format |  |
Where would one put a signature for a file format in a file, be it assembly or a C program? For example, Win32 has PE, BFD has other file formats, and Adobe has PDF files. Where in their files do they put the code that identifies a file as .pdf and says it is supposed to be opened as an Acrobat file?
Bill |
| |
| | | Bartc |  |
| Posted: Wed Aug 20, 2008 8:27 pm Post subject: Re: binary files format |  |
"Bill Cunningham" <nospam@nspam.com> wrote in message news:9n0rk.372$lf2.317@trnddc07...
| Quote: | Where would one put a signature for a file format in a file. Be it assembly or a C program? For example, win32/pe BFD has other file formats Adobe has pdf files for example. Where do they put in their files the code that identifies a file as .pdf and is supposed to be opened as an acrobat file ?
|
At the beginning usually.
-- Bartc |
| |
| | | jellybean stonerfish |  |
| Posted: Wed Aug 20, 2008 9:08 pm Post subject: Re: binary files format |  |
On Wed, 20 Aug 2008 21:53:09 +0000, Bill Cunningham wrote:
| Quote: | Where would one put a signature for a file format in a file. Be it assembly or a C program? For example, win32/pe BFD has other file formats Adobe has pdf files for example. Where do they put in their files the code that identifies a file as .pdf and is supposed to be opened as an acrobat file ?
Bill
|
Type "file magic numbers" in a search engine.
PDF files start with %PDF, I believe.
stonerfish |
| |
| | | Harold Aptroot |  |
| Posted: Wed Aug 20, 2008 11:51 pm Post subject: Re: binary files format |  |
"Bill Cunningham" <nospam@nspam.com> wrote in message news:9n0rk.372$lf2.317@trnddc07...
| Quote: | Where would one put a signature for a file format in a file. Be it assembly or a C program? For example, win32/pe BFD has other file formats Adobe has pdf files for example. Where do they put in their files the code that identifies a file as .pdf and is supposed to be opened as an acrobat file ?
Bill
|
Be original and put it at the end. Or better yet, store an offset at the end of the file, and store the magic value at that offset.
What are you trying to do though, design your own format? (if so, just make up something, who cares right? to be even safer, also store a checksum) Or are you trying to decode some unknown file and looking for the magic value? (99% chance it'll be at the beginning if it exists at all, and obviously: having 2 files of the same type helps a lot)
The code that ultimately identifies the file and determines what program to open it with is the extension (it is possible to interpret your question in such a way that that would be the answer). |
| |
| | | cr88192 |  |
| Posted: Thu Aug 21, 2008 12:04 am Post subject: Re: binary files format |  |
"Bill Cunningham" <nospam@nspam.com> wrote in message news:9n0rk.372$lf2.317@trnddc07...
| Quote: | Where would one put a signature for a file format in a file. Be it assembly or a C program? For example, win32/pe BFD has other file formats Adobe has pdf files for example. Where do they put in their files the code that identifies a file as .pdf and is supposed to be opened as an acrobat file ?
|
multiple questions are being asked here.
but, as for the OS knowing the file type, this is usually a result of the file extension. for example, the 'PDF' extension tells windows to open Acrobat, ...
now, for the app identifying file types (or verifying that the file with a specific extension is actually the expected type), often this is done by using magic numbers (very often, a special value in the first 4 bytes of the file).
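as a rough sketch of that check in C: the table below, the struct, and the function names identify_buffer and identify_file are all made up for illustration, and it only knows three well-known leading signatures. it is not how any particular application does it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table of leading signatures (illustrative, far from complete). */
struct magic {
    const char *name;
    const char *bytes;
    size_t      len;
};

static const struct magic known[] = {
    /* "\x7f" and "ELF" are separate literals: in C, "\x7fELF" would
       swallow the 'E' as part of the hex escape. */
    { "ELF", "\x7f" "ELF", 4 },
    { "PDF", "%PDF",       4 },
    { "GIF", "GIF8",       4 },
};

/* Return the name of the first signature matching the start of buf,
   or NULL if none of the known signatures match. */
const char *identify_buffer(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (n >= known[i].len &&
            memcmp(buf, known[i].bytes, known[i].len) == 0)
            return known[i].name;
    }
    return NULL;
}

/* Read the first 4 bytes of a file in binary mode and identify them.
   Returns NULL if the file cannot be opened or nothing matches. */
const char *identify_file(const char *path)
{
    unsigned char buf[4];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return identify_buffer(buf, n);
}
```

calling identify_file on a path and checking the result against the extension is one way an app could verify that a file is what its name claims. memcmp is used rather than strcmp because signatures may contain zero bytes.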
|
| |
| | | [Jongware] |  |
| Posted: Thu Aug 21, 2008 12:08 pm Post subject: Re: binary files format |  |
Bill Cunningham wrote:
| Quote: | Where would one put a signature for a file format in a file. Be it assembly or a C program? For example, win32/pe BFD has other file formats Adobe has pdf files for example. Where do they put in their files the code that identifies a file as .pdf and is supposed to be opened as an acrobat file ?
|
Some other posters mention checking by 'extension'. Well, it is a way, although it cannot differentiate, for example, between the variants of .exe files (plain MS-DOS, DPMI-extended LE, Windows PE/LE/NE), or between different versions (Acrobat documents have been called 'PDF' for ages). It's also important to note that old, reliable MS-DOS didn't *care* if an executable had either a .com or a .exe extension -- it checked the inside of the file.
What do you mean by 'assembly or C program'? If you make an executable, you are bound by the OS for any possible leading signatures -- you can't put a sig of your own at the start of a program. You can, however, put anything you want inside, or even better: at the end. For an executable, that's a fairly neat trick, as the OS presumably never "sees" something that's outside the bounds of the program as reported by the program itself. Self-unpacking executables work like this: it's an unpacker with all data glued to the end. A drawback is that you need a fair bit of understanding of how executables are built (and you also have to hope these specs don't change within a reasonable time). It's also not platform independent.
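Checking for a signature glued to the end could look roughly like the sketch below. The function name has_trailing_tag and the idea that the tag occupies exactly the last few bytes of the file are assumptions for illustration, not the layout of any real self-extractor. Note also that the C standard does not promise that SEEK_END is meaningful on binary streams, although it works on the common desktop platforms.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the last `len` bytes of the file at `path` equal `tag`,
   and 0 otherwise (including when the file cannot be opened, is shorter
   than the tag, or the tag is longer than the local buffer). */
int has_trailing_tag(const char *path, const void *tag, size_t len)
{
    unsigned char buf[64];
    int found = 0;

    if (len == 0 || len > sizeof buf)
        return 0;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    /* Seek back from the end by the tag length, then read and compare. */
    if (fseek(fp, -(long)len, SEEK_END) == 0 &&
        fread(buf, 1, len, fp) == len &&
        memcmp(buf, tag, len) == 0)
        found = 1;

    fclose(fp);
    return found;
}
```

A program could call this on its own executable path at startup and, if the tag is present, go on to read whatever appended data precedes it.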
Apart from cross-checking if a file's extension is the right one, I'd say the most important reason for putting a sig in a datafile is version info. That can get /so/ important that sometimes the version number becomes a de facto standard -- GIF, for example, comes in just two well-defined versions: GIF87a and GIF89a. There aren't any more versions.
Your PDF example also comes with a version number. The "magic" header is "%PDF-x.y", where 'x.y' gives the version of the official PDF specification that the file follows. From memory, I think even the 2nd line -- "%" then some gibberish -- is in the official specs.
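Pulling the version out of such a header might be sketched like this. The function name pdf_version is made up, and the code assumes both version parts are single digits, which holds for the versions I know of but is an assumption all the same.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse a leading "%PDF-x.y" from the first n bytes of a file.
   Returns 1 and stores x and y on success, 0 if the header is absent.
   Assumes single-digit major and minor version numbers. */
int pdf_version(const unsigned char *buf, size_t n, int *major, int *minor)
{
    if (n < 8 || memcmp(buf, "%PDF-", 5) != 0)
        return 0;
    if (!isdigit(buf[5]) || buf[6] != '.' || !isdigit(buf[7]))
        return 0;
    *major = buf[5] - '0';
    *minor = buf[7] - '0';
    return 1;
}
```

The same pattern fits GIF: compare the first six bytes against "GIF87a" and "GIF89a" and treat anything else as unknown.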
For a plain data file, just put it at the start.
[Jw] |
| |
| | | Alf P. Steinbach |  |
| Posted: Thu Aug 21, 2008 5:41 pm Post subject: Re: binary files format |  |
* [Jongware]:
| Quote: | It's also important to note that old, reliable MS-DOS didn't *care* if an executable had either a .com or a .exe extension -- it checked the inside of the file.
|
And so does the default Windows command interpreter [cmd.exe]. You can call your executable [foo.txt] if you want. Type "foo.txt" and it's executed, not loaded in a text editor.
Older Win32 also supported specifying magic numbers, or rather magic data, for file types in the registry. As I recall, the intention was that it would be a UUID, but it was not restricted to that. I never encountered any instance where this scheme was actually used.
Cheers,
- Alf
-- A: Because it messes up the order in which people normally read text. Q: Why is it such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail? |
| |
| | | Bill Cunningham |  |
| Posted: Mon Aug 25, 2008 5:40 pm Post subject: Re: binary files format |  |
"Harold Aptroot" <harold.aptroot@gmail.com> wrote in message news:g8ihml$u92$1@registered.motzarella.org...
| Quote: | What are you trying to do though, design your own format? (if so, just make up something, who cares right? to be even safer, also store a checksum) Or are you trying to decode some unknown file and looking for the magic value? (99% chance it'll be at the beginning if it exists at all, and obviously: having 2 files of the same type helps a lot)
The code that ultimately identifies the file and determines what program to open it with is the extension (it is possible to interpret your question in such way that that would be the answer)
|
Designing my own format would also involve a search for magic numbers that already exist, in order not to conflict. Sorry I've been away; my DSL was moved and now I'm on dial-up. Extensions aren't always reliable either, in OSs that have them. Take Windows for example: some exes are dlls, and the other way around. I want to read the magic number from the file's bytes first, before writing anything. Say 0x7f; I think that's ELF. No extensions necessary, only a GNU/Linux OS. I could read 500 bytes into a buffer with fread and look through that. Would that be sufficient? That's for those who have done this, if anyone. I would like to write a text reader that reads and writes a certain format. That's my idea.
Bill |
| |
| | | cr88192 |  |
| Posted: Tue Aug 26, 2008 4:37 am Post subject: Re: binary files format |  |
"Bill Cunningham" <nospam@nspam.com> wrote in message news:HUDsk.657$482.289@trnddc06...
| Quote: | "Harold Aptroot" <harold.aptroot@gmail.com> wrote in message news:g8ihml$u92$1@registered.motzarella.org... What are you trying to do though, design your own format? (if so, just make up something, who cares right? to be even safer, also store a checksum) Or are you trying to decode some unknown file and looking for the magic value? (99% chance it'll be at the beginning if it exists at all, and obviously: having 2 files of the same type helps a lot)
The code that ultimately identifies the file and determines what program to open it with is the extension (it is possible to interpret your question in such way that that would be the answer)
Designing my own format would involve a search for magic numbers that do exist too. In order not to conflict. Sorry I've been away my DSL was moved and now I'm on dial up. Extensions aren't always reliable either in OSs that have them. Take windows for example. Some exes are dlls. And the other way around. I want to read the magic number from the byte code first before writing anything. Say 0x7f. I think that's ELF. No extensions necessary only a gnu/linux OS. I could read into a stream 500 bytes with fread and look through that. Would that be sufficient ? To those who have done this if anyone. I would like to read and write a text reader that reads and writes to a certain format. That's my idea.
|
ELF is "\x7fELF"
now, for 4 byte magics, one route is to use a TRNG and generate some random sequence. chances are probably "one in a million" that it will actually clash with anything. of course, if you use alphabetic or alphanumeric tags, even if random, the chance of a clash is far higher: with 4 upper-case letters, for example, there are only 26^4 = 456,976 possible codes, a decent portion of which are probably already used (especially given the tendency of the human mind to generate statistically predictable sequences, so that many people end up coming up with the same ones).
now, if one wants stronger measures, a 48, 64, 96, or 128 bit sequence is far less likely to clash (128 bit sequences being GUIDs, which are commonly used when people want a fairly good degree of confidence that there is no clash, although most modern GUIDs are generated with TRNGs).
note: TRNG != PRNG. a TRNG draws on a physical entropy source, so its output is unpredictable and cannot be reproduced, whereas a PRNG is deterministic: the same seed gives the same sequence, and the sequence eventually repeats.
if using Linux, /dev/random can be used for GUIDs (more or less; there are a few fixed fields, the version and variant bits, that should be set to appropriate values). on windows there is also a similar feature available.
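a minimal sketch of that on Linux, assuming the usual random (version 4) GUID layout. the function names are made up, and it reads /dev/urandom rather than /dev/random so that it should not block on older kernels; that swap is an assumption of this sketch, not something the above depends on.

```c
#include <stdio.h>

/* Set the fixed fields of a random GUID: version 4 in the high nibble
   of byte 6, and the "10" variant in the top two bits of byte 8. */
void guid_set_v4_fields(unsigned char b[16])
{
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);
}

/* Fill b with 16 random bytes from /dev/urandom and set the fixed fields.
   Returns 0 on success, -1 if the device cannot be read. */
int guid_generate(unsigned char b[16])
{
    FILE *fp = fopen("/dev/urandom", "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(b, 1, 16, fp);
    fclose(fp);
    if (n != 16)
        return -1;
    guid_set_v4_fields(b);
    return 0;
}

/* Format as the usual 8-4-4-4-12 hex text; out must hold 37 bytes. */
void guid_format(const unsigned char b[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

the 16 raw bytes (not the text form) are what one would write at the start of the file as the magic.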
there are also tools available for generating them as well (for both windows and linux).
in my case, I have something vaguely similar, but smaller (it is 64 bits). in particular I generate 48 bit random MAC addresses (bits are set so that they are "locally defined", making it so that they do not clash with proper MAC addresses). the upper 16 bits is currently a magic (although, it is possible that this could be used as an extension field, for example, to gain 12 or so additional entropy bits, if needed, or a locally-defined EUI-64 value could also be possible...).
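roughly, and only as an illustration: the sketch below marks 6 random bytes as a locally administered, unicast MAC address and packs them under a 16-bit tag. the function names, the byte order, and the placement of the tag are assumptions for the example, not the actual layout used in the scheme described here.

```c
#include <stdint.h>

/* Mark a 48-bit value as a locally administered, unicast MAC address:
   set bit 1 (the "local" bit) and clear bit 0 (the multicast bit)
   of the first octet. */
void mac_make_local(unsigned char mac[6])
{
    mac[0] = (unsigned char)((mac[0] | 0x02) & 0xfe);
}

/* Pack a 16-bit tag and a 48-bit MAC into one 64-bit value,
   with the tag in the upper 16 bits (hypothetical layout). */
uint64_t segment_id(uint16_t tag, const unsigned char mac[6])
{
    uint64_t v = tag;
    for (int i = 0; i < 6; i++)
        v = (v << 8) | mac[i];
    return v;
}
```

the 6 input bytes would come from the same kind of random source as for GUIDs; only 46 of the 48 bits stay random once the two flag bits are fixed.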
this is used for the "segment" field in a DSM scheme of mine (I use a 64:64 segmented addressing scheme, or 48:64 as it currently works out...).
typically, a similar scheme is used in my case for a "gensym" mechanism (this gensym generating names that are also unique between nodes and runs, vs more traditional "gensyms" that only generate locally unique names...).
|
| |
|
|