binary files format

 
Bill Cunningham
Posted: Wed Aug 20, 2008 7:53 pm    Post subject: binary files format
       
Where would one put a file format signature in a file, whether the file
is written from assembly or from a C program? For example, there is
Win32 PE, BFD handles other file formats, and Adobe has PDF files. Where
in their files do they put the code that identifies a file as a .pdf
that is supposed to be opened with Acrobat?

Bill
 

 
Bartc
Posted: Wed Aug 20, 2008 8:27 pm    Post subject: Re: binary files format
       
"Bill Cunningham" <nospam@nspam.com> wrote in message
news:9n0rk.372$lf2.317@trnddc07...
Quote:
Where would one put a signature for a file format in a file. Be it
assembly or a C program? For example, win32/pe BFD has other file formats
Adobe has pdf files for example. Where do they put in their files the code
that identifies a file as .pdf and is supposed to be opened as an acrobat
file ?

At the beginning usually.

--
Bartc
 

 
jellybean stonerfish
Posted: Wed Aug 20, 2008 9:08 pm    Post subject: Re: binary files format
       
On Wed, 20 Aug 2008 21:53:09 +0000, Bill Cunningham wrote:

Quote:
Where would one put a signature for a file format in a file. Be it
assembly or a C program? For example, win32/pe BFD has other file
formats Adobe has pdf files for example. Where do they put in their
files the code that identifies a file as .pdf and is supposed to be
opened as an acrobat file ?

Bill

Type "file magic numbers" in a search engine.

PDF files start with %PDF, I believe.

stonerfish
 

 
Harold Aptroot
Posted: Wed Aug 20, 2008 11:51 pm    Post subject: Re: binary files format
       
"Bill Cunningham" <nospam@nspam.com> wrote in message
news:9n0rk.372$lf2.317@trnddc07...
Quote:
Where would one put a signature for a file format in a file. Be it
assembly or a C program? For example, win32/pe BFD has other file formats
Adobe has pdf files for example. Where do they put in their files the code
that identifies a file as .pdf and is supposed to be opened as an acrobat
file ?

Bill

Be original and put it at the end ;)
Or better yet, store an offset at the end of the file, and store the magic
value at that offset

What are you trying to do though, design your own format? (if so, just make
up something, who cares right? to be even safer, also store a checksum)
Or are you trying to decode some unknown file and looking for the magic
value? (99% chance it'll be at the beginning if it exists at all, and
obviously: having 2 files of the same type helps a lot)

The code that ultimately identifies the file and determines what program to
open it with is the extension (it is possible to interpret your question in
such way that that would be the answer)
 

 
cr88192
Posted: Thu Aug 21, 2008 12:04 am    Post subject: Re: binary files format
       
"Bill Cunningham" <nospam@nspam.com> wrote in message
news:9n0rk.372$lf2.317@trnddc07...
Quote:
Where would one put a signature for a file format in a file. Be it
assembly or a C program? For example, win32/pe BFD has other file formats
Adobe has pdf files for example. Where do they put in their files the code
that identifies a file as .pdf and is supposed to be opened as an acrobat
file ?


multiple questions are being asked here.

but, as for the OS knowing the file type, this is usually a result of the
file extension.
for example, the 'PDF' extension tells windows to open Acrobat, ...

now, for the app identifying file types (or verifying that the file with a
specific extension is actually the expected type), often this is done by
using magic numbers (very often, a special value in the first 4 bytes of the
file).
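
A minimal sketch in C of that kind of check, not any particular application's code. The table and the name identify_magic are made up for illustration, and the table lists only a handful of well-known signatures that sit at offset 0.

```c
#include <stddef.h>
#include <string.h>

/* One known signature: a label plus the bytes expected at offset 0. */
struct magic_entry {
    const char *name;
    const char *bytes;
    size_t len;
};

/* "\x7f" "ELF" is split in two on purpose: in "\x7fELF" the hex escape
   would also swallow the 'E', because E is a hexadecimal digit. */
static const struct magic_entry known_magics[] = {
    { "ELF", "\x7f" "ELF", 4 },
    { "PDF", "%PDF-", 5 },
    { "GIF", "GIF8", 4 },
    { "MZ executable", "MZ", 2 },
};

/* Return the label of the first signature that matches the start of
   buf, or NULL if none of the known signatures matches. */
const char *identify_magic(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof known_magics / sizeof known_magics[0]; i++) {
        const struct magic_entry *m = &known_magics[i];
        if (len >= m->len && memcmp(buf, m->bytes, m->len) == 0)
            return m->name;
    }
    return NULL;
}
```

The caller reads the first few bytes of the file into a buffer and passes them in; a NULL result only means the file is not one of the listed types.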


Quote:
Bill

 

 
[Jongware]
Posted: Thu Aug 21, 2008 12:08 pm    Post subject: Re: binary files format
       
Bill Cunningham wrote:
Quote:
Where would one put a signature for a file format in a file. Be it
assembly or a C program? For example, win32/pe BFD has other file formats
Adobe has pdf files for example. Where do they put in their files the code
that identifies a file as .pdf and is supposed to be opened as an acrobat
file ?

Some other posters mention checking by 'extension'. Well, it is a way,
although it cannot differentiate, for example, between the variants of
.exe files (plain MS-DOS, DPMI-extended LE, Windows PE/LE/NE), or
between different versions (Acrobat documents have been called 'PDF' for
ages). It's also important to note that old, reliable MS-DOS didn't
*care* if an executable had either a .com or a .exe extension -- it
checked the inside of the file.

What do you mean by 'assembly or C program'? If you make an executable,
you are bound by the OS for any possible leading signatures -- you can't
put a sig of your own at the start of a program. You can, however, put
anything you want inside, or even better: at the end. For an executable,
that's a fairly neat trick, as the OS presumably never "sees" something
that's outside the bounds of the program as reported by the program
itself. Self-unpacking executables work like this: it's an unpacker with
all data glued to the end. The downside is that you need a fair
understanding of how executables are built (and must hope these specs
don't change within a reasonable time). It's also not platform independent.
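
One way to find data glued to the end of a file is a small trailer in the last few bytes. The sketch below assumes a made-up layout, not one taken from any real self-extractor: an 8-byte signature "MYPAYLD1" followed by a 4-byte little-endian payload length, with the payload sitting directly before the trailer. The file must be opened in binary mode.

```c
#include <stdio.h>
#include <string.h>

#define TRAILER_MAGIC "MYPAYLD1"   /* hypothetical 8-byte signature */
#define TRAILER_SIZE  12           /* 8 signature bytes + 4 length bytes */

/* Look for the trailer in the last TRAILER_SIZE bytes of fp.
   On success store the payload's file offset and length, return 0.
   Return -1 if the trailer is missing or inconsistent. */
int find_appended_payload(FILE *fp, long *offset, unsigned long *length)
{
    unsigned char t[TRAILER_SIZE];
    long trailer_pos;
    unsigned long len;

    if (fseek(fp, -(long)TRAILER_SIZE, SEEK_END) != 0)
        return -1;
    trailer_pos = ftell(fp);
    if (trailer_pos < 0 || fread(t, 1, TRAILER_SIZE, fp) != TRAILER_SIZE)
        return -1;
    if (memcmp(t, TRAILER_MAGIC, 8) != 0)
        return -1;

    /* Decode the length byte by byte so host byte order does not matter. */
    len = (unsigned long)t[8] | ((unsigned long)t[9] << 8) |
          ((unsigned long)t[10] << 16) | ((unsigned long)t[11] << 24);
    if (len > (unsigned long)trailer_pos)
        return -1;   /* payload cannot be longer than what precedes it */

    *offset = trailer_pos - (long)len;
    *length = len;
    return 0;
}
```

Because the lookup starts from the end of the file, it does not need to know anything about the executable format in front of the payload.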

Apart from cross-checking if a file's extension is the right one, I'd
say the most important reason for putting a sig in a datafile is version
info. That can get /so/ important that sometimes the version number
becomes a de facto standard -- GIF, for example, comes in just two
well-defined versions: GIF87a and GIF89a. There aren't any more versions.

Your PDF example also comes with a version number. The "magic" header is
"%PDF-x.y", where 'x.y' is the version of the PDF specification the file
follows. From memory, I think even the 2nd line -- "%" then some
gibberish -- is in the official specs.
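
As a rough illustration of reading the version out of these two signatures, here is a sketch in C. The function names are invented for this example, and the PDF parser assumes single-digit major and minor numbers.

```c
#include <stddef.h>
#include <string.h>

/* Parse "%PDF-x.y" at the start of buf, assuming single-digit x and y.
   Return 0 and fill in major/minor on success, -1 otherwise. */
int parse_pdf_version(const unsigned char *buf, size_t len,
                      int *major, int *minor)
{
    if (len < 8 || memcmp(buf, "%PDF-", 5) != 0)
        return -1;
    if (buf[5] < '0' || buf[5] > '9' || buf[6] != '.' ||
        buf[7] < '0' || buf[7] > '9')
        return -1;
    *major = buf[5] - '0';
    *minor = buf[7] - '0';
    return 0;
}

/* Return 87 or 89 for a GIF87a or GIF89a signature, 0 for anything else. */
int gif_version(const unsigned char *buf, size_t len)
{
    if (len < 6)
        return 0;
    if (memcmp(buf, "GIF87a", 6) == 0)
        return 87;
    if (memcmp(buf, "GIF89a", 6) == 0)
        return 89;
    return 0;
}
```

Both functions work on a buffer holding the first bytes of the file, so the same read can serve to identify the type and to pick the version.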

... For a plain data file, just put it at the start.

[Jw]
 

 
Alf P. Steinbach
Posted: Thu Aug 21, 2008 5:41 pm    Post subject: Re: binary files format
       
* [Jongware]:
Quote:
It's also important to note that old, reliable MS-DOS didn't
*care* if an executable had either a .com or a .exe extension -- it
checked the inside of the file.


And so does the default Windows command interpreter [cmd.exe]. You can call your
executable [foo.txt] if you want. Type "foo.txt" and it's executed, not loaded
in a text editor.

Win32 has also long supported registry specification of magic numbers, or
rather, magic data, for file types. As I recall the intention was that it would
be a UUID, but not restricted to that. But I never encountered any instance
where this scheme was actually used.


Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 

 
Bill Cunningham
Posted: Mon Aug 25, 2008 5:40 pm    Post subject: Re: binary files format
       
"Harold Aptroot" <harold.aptroot@gmail.com> wrote in message
news:g8ihml$u92$1@registered.motzarella.org...
Quote:
What are you trying to do though, design your own format? (if so, just
make up something, who cares right? to be even safer, also store a
checksum)
Or are you trying to decode some unknown file and looking for the magic
value? (99% chance it'll be at the beginning if it exists at all, and
obviously: having 2 files of the same type helps a lot)

The code that ultimately identifies the file and determines what program
to open it with is the extension (it is possible to interpret your
question in such way that that would be the answer)

Designing my own format would also involve searching for magic numbers
that already exist, so as not to conflict with them. Sorry I've been away;
my DSL was moved and now I'm on dial-up. Extensions aren't always reliable
either, even in OSs that have them. Take Windows, for example: some .exe
files are really DLLs, and the other way around.
I want to read the magic number from the file's bytes first, before writing
anything. Say 0x7f; I think that's ELF. No extensions necessary, only a
GNU/Linux OS. I could read 500 bytes from a stream into a buffer with fread
and look through that. Would that be sufficient? This is a question for
those who have done this, if anyone has.
I would like to write a text reader that reads and writes a certain
format. That's my idea.

Bill
 

 
cr88192
Posted: Tue Aug 26, 2008 4:37 am    Post subject: Re: binary files format
       
"Bill Cunningham" <nospam@nspam.com> wrote in message
news:HUDsk.657$482.289@trnddc06...
Quote:

"Harold Aptroot" <harold.aptroot@gmail.com> wrote in message
news:g8ihml$u92$1@registered.motzarella.org...
What are you trying to do though, design your own format? (if so, just
make up something, who cares right? to be even safer, also store a
checksum)
Or are you trying to decode some unknown file and looking for the magic
value? (99% chance it'll be at the beginning if it exists at all, and
obviously: having 2 files of the same type helps a lot)

The code that ultimately identifies the file and determines what program
to open it with is the extension (it is possible to interpret your
question in such way that that would be the answer)

Designing my own format would involve a search for magic numbers that
do exist too. In order not to conflict. Sorry I've been away my DSL was
moved and now I'm on dial up. Extensions aren't always reliable either in
OSs that have them. Take windows for example. Some exes are dlls. And the
other way around.
I want to read the magic number from the byte code first before writing
anything. Say 0x7f. I think that's ELF. No extensions necessary only a
gnu/linux OS. I could read into a stream 500 bytes with fread and look
through that. Would that be sufficient ? To those who have done this if
anyone.
I would like to read and write a text reader that reads and writes to a
certain format. That's my idea.


ELF is "\x7fELF"
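
For the fread approach asked about above, four bytes are enough for ELF; there is no need to read 500. A minimal sketch, with stream_is_elf as a made-up helper name and the stream assumed to be opened in binary mode and positioned at the start of the file:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the next four bytes of fp are the ELF signature,
   0 if they are not, and -1 on a read error. */
int stream_is_elf(FILE *fp)
{
    /* Spelled out byte by byte: as a C string literal, "\x7fELF" would
       let the hex escape swallow the 'E', which is a hex digit. */
    static const char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned char head[4];
    size_t n = fread(head, 1, sizeof head, fp);

    if (n < sizeof head)
        return ferror(fp) ? -1 : 0;   /* file too short: not ELF */
    return memcmp(head, elf_magic, sizeof elf_magic) == 0;
}
```

A caller would open the file with fopen(path, "rb"), pass the stream in, and only go on to parse the rest of the header when the result is 1.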

now, for 4 byte magics, one route is to take a random source (presumably a
TRNG) and generate some random sequence.
chances are probably "one in a million" that it will actually clash with
anything. of course, if you use alphabetic or alphanumeric tags, even if
random, the chance of clash is far higher, since, for example, with
upper-case alphabetic codes, there are only about 500,000 of them, a decent
portion of which are probably also used (especially given the tendency of
the human mind to generate statistically predictable sequences, making it
such that many people end up trying to come up with the same sequences).
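
To put numbers on the size of the tag space, the count is just the alphabet size raised to the tag length. A small sketch (tag_space is a made-up helper; the result has to fit in 64 bits):

```c
/* Number of distinct tags of the given length over an alphabet of the
   given size, i.e. alphabet_size raised to the power length.
   The result must fit in an unsigned long long. */
unsigned long long tag_space(unsigned alphabet_size, unsigned length)
{
    unsigned long long n = 1;

    while (length-- > 0)
        n *= alphabet_size;
    return n;
}
```

tag_space(26, 4) gives 456,976 four-letter upper-case tags, the "about 500,000" above, while tag_space(256, 4) gives 4,294,967,296 possible values for four arbitrary bytes.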

now, if one wants stronger measures, a 48, 64, 96, or 128 bit sequence is
far less likely to clash (128 bit sequences being GUIDs, which are commonly
used when people want a fairly good degree of confidence that there is no
clash, although most modern GUIDs are generated with TRNGs).

note: TRNG != PRNG. a TRNG draws on a physical entropy source, so its
output is unpredictable and has no period, but a PRNG is deterministic, so
its output depends on the seed and eventually repeats.

if using Linux, /dev/random can be used for GUIDs (more or less: a few
fields, the version and variant bits, should be set to appropriate values).
on windows there is also a similar feature available.

there are also tools available for generating them as well (for both windows
and linux).
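
A sketch of the do-it-yourself route on Linux, under two assumptions that are not from the post: it reads /dev/urandom instead of /dev/random so that it never blocks, and it produces a version-4 (random) UUID as laid out in RFC 4122. The function names are invented for this example.

```c
#include <stdio.h>

/* Turn 16 random bytes into a version-4 UUID in place by setting the
   version and variant fields; all other bits stay random. */
void make_uuid4(unsigned char b[16])
{
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);   /* version 4 */
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);   /* RFC 4122 variant */
}

/* Fill out[16] with a fresh version-4 UUID read from /dev/urandom.
   Return 0 on success, -1 if the device cannot be opened or read. */
int random_uuid4(unsigned char out[16])
{
    FILE *fp = fopen("/dev/urandom", "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(out, 1, 16, fp);
    fclose(fp);
    if (n != 16)
        return -1;
    make_uuid4(out);
    return 0;
}
```

The 16 bytes can then be written at the start of the file as its signature, or printed in hex once and pasted into the source as a constant.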


in my case, I have something vaguely similar, but smaller (it is 64 bits).
in particular I generate 48 bit random MAC addresses (bits are set so that
they are "locally defined", making it so that they do not clash with proper
MAC addresses). the upper 16 bits is currently a magic (although, it is
possible that this could be used as an extension field, for example, to gain
12 or so additional entropy bits, if needed, or a locally-defined EUI-64
value could also be possible...).
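
A guess at what such an identifier could look like in C, not the poster's actual code. It assumes the 48 random bits are already in hand, marks them as a locally administered unicast MAC address (second-lowest bit of the first octet set, lowest bit clear), and packs a 16-bit magic above them. Both function names are hypothetical.

```c
#include <stdint.h>

/* Mark a 48-bit value as a locally administered, unicast MAC address:
   set the "local" bit (0x02) and clear the multicast bit (0x01) of the
   first octet, so it cannot collide with a vendor-assigned address. */
void make_local_mac(unsigned char mac[6])
{
    mac[0] = (unsigned char)((mac[0] | 0x02) & 0xfe);
}

/* Pack a 16-bit magic into the upper bits and the 48-bit MAC into the
   lower bits of a 64-bit identifier, first MAC octet most significant. */
uint64_t pack_segment_id(uint16_t magic, const unsigned char mac[6])
{
    uint64_t id = magic;
    int i;

    for (i = 0; i < 6; i++)
        id = (id << 8) | mac[i];
    return id;
}
```

Forcing those two bits leaves 46 bits of randomness in the MAC part.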

this is used for the "segment" field in a DSM scheme of mine (I use a 64:64
segmented addressing scheme, or 48:64 as it currently works out...).


typically, a similar scheme is used in my case for a "gensym" mechanism
(this gensym generating names that are also unique between nodes and runs,
vs more traditional "gensyms" that only generate locally unique names...).


Quote:
Bill

 
