|  | Total No. of "Records" in a File? |  | |
| | | W. eWatson |  |
| Posted: Sat Aug 23, 2008 3:23 pm Post subject: Total No. of "Records" in a File? |  |
I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
-- Wayne Watson (Watson Adventures, Prop., Nevada City, CA)
(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet
Web Page: <www.speckledwithstars.net/> |
| |
| | | Nick Dumas |  |
| Posted: Sat Aug 23, 2008 3:32 pm Post subject: Re: Total No. of "Records" in a File? |  |
Err...you want to know what is in a file before you open it? This could be done if you keep some external database documenting changes made to the file. But unless I misunderstand what you're saying, then it's not possible to know the contents of a file without opening and reading that file.
W. eWatson wrote:
| Quote: | I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
|
| |
| | | Grant Edwards |  |
| Posted: Sat Aug 23, 2008 3:48 pm Post subject: Re: Total No. of "Records" in a File? |  |
On 2008-08-23, W. eWatson <notvalid2@sbcglobal.net> wrote:
| Quote: | I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
|
If the lines are fixed length (e.g. always 12 bytes long), then you can use os.stat() or os.fstat() to find the size of the file. Divide the size of the file by the number of bytes in a line, and you get the number of lines.
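A minimal sketch of that calculation, assuming every line occupies exactly the same number of bytes, terminator included. The 12-byte length is just the example figure from above; the function name `count_fixed_records` and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def count_fixed_records(path, record_len):
    """Return the record count of a file whose lines are all record_len bytes."""
    size = os.stat(path).st_size  # size in bytes; the file contents are never read
    count, leftover = divmod(size, record_len)
    if leftover:
        # A remainder means the fixed-length assumption does not hold.
        raise ValueError("file size is not a multiple of the record length")
    return count


# Demo: three 12-byte lines (11 characters plus a newline), written in
# binary mode so the platform does not translate the line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(b"12345 67890\n" * 3)
    demo_path = f.name

n_records = count_fixed_records(demo_path, 12)
print(n_records)
os.remove(demo_path)
```

The remainder check is there because a single line of a different length makes the result wrong, which is the usual way this approach fails on hand-edited text files.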
-- Grant |
| |
| | | Fredrik Lundh |  |
| Posted: Sat Aug 23, 2008 3:51 pm Post subject: Re: Total No. of "Records" in a File? |  |
W. eWatson wrote:
| Quote: | I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
|
In the general case, no. A file is just a bunch of bytes. If you know that all lines have exactly the same length, you can of course fetch the file size and divide by the line size, but that doesn't work for arbitrary files.
Why do you need to know the number of lines before reading it, btw?
</F> |
| |
| | | W. eWatson |  |
| Posted: Sat Aug 23, 2008 3:53 pm Post subject: Re: Total No. of "Records" in a File? |  |
Nick Dumas wrote:
| Quote: | Err...you want to know what is in a file before you open it? This could be done if you keep some external database documenting changes made to the file. But unless I misunderstand what you're saying, then it's not possible to know the contents of a file without opening and reading that file.
W. eWatson wrote: I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
|
Maybe. I could see it if the file were truly in a record format. The # of records might be kept by the OS. It's conceivable that Python or the OS might see a file with a CR as "recordized". All unlikely though. Just checkin'.
How about in a slightly different case. Suppose I want to know the number of files in a folder? The OS and maybe some Python method might know that.
-- Wayne Watson (Watson Adventures, Prop., Nevada City, CA)
(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet
Web Page: <www.speckledwithstars.net/> |
| |
| | | W. eWatson |  |
| Posted: Sat Aug 23, 2008 5:45 pm Post subject: Re: Total No. of "Records" in a File? |  |
Fredrik Lundh wrote:
| Quote: | W. eWatson wrote:
I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
In the general case, no. A file is just a bunch of bytes. If you know that all lines have exactly the same length, you can of course fetch the file size and divide by the line size, but that doesn't work for arbitrary files.
Why do you need to know the number of lines before reading it, btw?
/F
|
Actually, it was a matter of curiosity, and maybe absent mindedness. I was envisioning a program where I might want to run up and down a file a lot, sometimes deleting a record interactively at the request of the user. However, I wanted to keep him alert to the total number of records remaining. However, in retrospect, I more likely do this with files in a folder. I also want him to be able to skip around in the Win OS folder by saying something like go forward 3 files. I'd like not to have to read all the files between the two points. The whole idea needs some more thinking.
-- Wayne Watson (Watson Adventures, Prop., Nevada City, CA)
(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet
Web Page: <www.speckledwithstars.net/> |
| |
| | | Grzegorz Staniak |  |
| Posted: Sat Aug 23, 2008 6:38 pm Post subject: Re: Total No. of "Records" in a File? |  |
On 23.08.2008, W. eWatson <notvalid2@sbcglobal.net> wrote:
| Quote: | Maybe. I could see it if the file were truly in a record format. The # of records might be kept by the OS. It's conceivable that Python or the OS might see a file with a CR as "recordized".
|
Isn't it much easier to use a database instead? That's what they're made for.
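As a rough illustration of that suggestion, here is a sketch using the `sqlite3` module from the standard library. The table name `records` and its columns `a` and `b` are invented for the example, and an in-memory database stands in for a real file.

```python
import sqlite3

# In-memory database for the demo; pass a filename to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, a REAL, b REAL)")
conn.executemany(
    "INSERT INTO records (a, b) VALUES (?, ?)",
    [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
)

# The database tracks the rows, so the count is one query away.
total_before = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

# Delete one record at the user's request, then report what remains.
conn.execute("DELETE FROM records WHERE id = ?", (2,))
total_after = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

print(total_before, total_after)
conn.close()
```

Counting, deleting and jumping to a given record all become queries, so none of the bookkeeping has to be done by hand in the text file.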
| Quote: | How about in a slightly different case. Suppose I want to know the number of files in a folder? The OS and maybe some Python method might know that.
|
Use "os" and "os.path". For a simple case the length of "os.listdir()" could suffice, but then you might need to filter out sub-directories, or maybe count files in them too using "os.walk()".
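A small sketch of both variants, assuming "files" means regular files and not directories. The helper names `count_files` and `count_files_recursive` and the temporary demo tree are made up for the example.

```python
import os
import shutil
import tempfile


def count_files(folder):
    """Count regular files directly inside folder, ignoring sub-directories."""
    return sum(
        1 for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def count_files_recursive(folder):
    """Count files in folder and in every directory below it."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(folder))


# Demo tree: two files at the top level and one inside a sub-directory.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "sub"))
for rel in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
    open(os.path.join(demo_dir, rel), "w").close()

top_level = count_files(demo_dir)
everything = count_files_recursive(demo_dir)
print(top_level, everything)
shutil.rmtree(demo_dir)
```

Either way the count is a snapshot: another program can add or remove files right after the listing is taken.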
GS -- Grzegorz Staniak <gstaniak _at_ wp [dot] pl> |
| |
| | | Dennis Lee Bieber |  |
| Posted: Sat Aug 23, 2008 8:07 pm Post subject: Re: Total No. of "Records" in a File? |  |
On Sat, 23 Aug 2008 12:45:24 -0700, "W. eWatson" <notvalid2@sbcglobal.net> declaimed the following in comp.lang.python:
| Quote: | Actually, it was a matter of curiosity, and maybe absent mindedness. I was envisioning a program where I might want to run up and down a file a lot, sometimes deleting a record interactively at the request of the user.
|
You either define a fixed length record format, wherein finding a record is a simple [reclen * recno] (recno starting at 0) to find the start of a record, OR you allow variable length records but maintain a separate fixed length index wherein the index contains the offset to the record start, OR a hybrid wherein the fixed length data portion contains pointers to the variable length fields in another file (essentially, the dBase structure as used by, say, Visual FoxPro).
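The first two options can be sketched roughly as follows. This is only an illustration: the function names are invented, `io.BytesIO` stands in for files opened in binary mode, and the index is kept in a Python list here rather than in a separate fixed-length index file.

```python
import io


def read_fixed(f, recno, reclen):
    """Fetch record recno (0-based) from a file of reclen-byte records."""
    f.seek(reclen * recno)
    return f.read(reclen)


def build_index(f):
    """Scan newline-terminated records once and return their start offsets."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        if not f.readline():
            break  # end of file
        offsets.append(pos)
    return offsets


def read_indexed(f, offsets, recno):
    """Fetch variable-length record recno using the offset index."""
    f.seek(offsets[recno])
    return f.readline()


# Fixed-length case: three 4-byte records, no separators needed.
fixed = io.BytesIO(b"AAAA" b"BBBB" b"CCCC")
second_fixed = read_fixed(fixed, 1, 4)

# Variable-length case: build the index once, then jump straight to a record.
variable = io.BytesIO(b"1 2\n" b"33 44\n" b"555 666\n")
index = build_index(variable)
third_variable = read_indexed(variable, index, 2)
print(second_fixed, index, third_variable)
```

With the index in hand, `len(index)` is also the record count, at the cost of one full pass over the file to build it.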
Most operating systems seem to have incorporated the UNIX stream construct for all data files... Unlike the ancient TRSDOS/LDOS family in which the OS supported direct access fixed length files (applications create the files with a specified record length, and I/O then operated using just record numbers) or ISAM keyed files (Xerox CP/V common editor files were really ISAM keyed -- the line numbers shown by the editor were the key; one had to go out of their way to convert such a file into "consecutive" [UNIX stream] format, opening in an editor implicitly converted back to ISAM). ISAM structure doesn't require a separate file, if one works in file blocks, and links the tree via block addresses... data would then be data block address/offset/length.
Note that none of these "advanced" file formats supports a pure delete-record; they can only mark the space of the record as free for use. One has to run a compaction process to reclaim wasted space.
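A toy version of mark-as-free followed by compaction, assuming fixed-length records whose first byte is a status flag. The record length, the flag values and the function names are arbitrary choices for the demo, not any particular format's layout.

```python
import io

RECLEN = 8                 # hypothetical record size, including the flag byte
LIVE, FREE = b" ", b"*"    # hypothetical status flags


def mark_deleted(f, recno):
    """Flag record recno as free; the file itself does not shrink."""
    f.seek(recno * RECLEN)
    f.write(FREE)


def compact(data):
    """Return the bytes of all live records, dropping those flagged as free."""
    records = [data[i:i + RECLEN] for i in range(0, len(data), RECLEN)]
    return b"".join(r for r in records if r[:1] == LIVE)


f = io.BytesIO(b" rec0000" b" rec0001" b" rec0002")
mark_deleted(f, 1)
size_after_delete = len(f.getvalue())   # still three records' worth of bytes
compacted = compact(f.getvalue())       # only the two live records remain
print(size_after_delete, len(compacted))
```

Note that compaction renumbers everything after the removed record, so any record numbers shown to the user have to be refreshed afterwards.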
| Quote: | However, I wanted to keep him alert to the total number of records remaining. However, in retrospect, I more likely do this with files in a folder. I also want him to be able to skip around in the Win OS folder by saying something like go forward 3 files. I'd like not to have to read all the files between the two points. The whole idea needs some more thinking.
|
What does "go forward 3 files" mean? Obtain a list of the "currently available" files, sort that list, find the position of the "currently processed" file in that list, then find the name of the file three spaces beyond?
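Taken literally, that reading of "forward 3" might look like the sketch below. The function name `step_files` and the file names are invented; in real use the list would come from something like `os.listdir()`.

```python
def step_files(names, current, step):
    """Return the name `step` places after `current` in sorted order, or None."""
    ordered = sorted(names)
    target = ordered.index(current) + step  # ValueError if current has vanished
    if 0 <= target < len(ordered):
        return ordered[target]
    return None  # stepped past either end of the listing


# An unsorted listing, as a directory scan might return it.
listing = ["e.dat", "b.dat", "a.dat", "d.dat", "c.dat"]
three_ahead = step_files(listing, "b.dat", 3)
off_the_end = step_files(listing, "d.dat", 3)
print(three_ahead, off_the_end)
```

Only the names are touched; none of the files between the two positions is opened or read.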
Directories, on disk, tend to be unsorted, depending upon the OS. The "forward 3" may change depending on any other operations on the directory. On the ancient Amiga, each directory block contained 64 pointers -- the file name/path component is hashed into one of 64 values, and then the pointer is followed to a linked list of sub-directory/file-header blocks (each having its name at the start; if the block name does not match the sought-for name, one follows the linked list to the next block; file-header blocks contain a list of pointers to the data blocks of the file). So what does "forward 3" mean when one is using hashed name values? Does it mean forward 3 on the hash linked list (which is not sorted), or does it mean the first file on the third hash down from the original? (Producing a sorted directory list requires collecting the file names in each file-header on each linked list, for each of the 64 hashes, and then sorting the results -- Amiga directories did not contain the names themselves.)
-- Wulfraed Dennis Lee Bieber KD6MOG wlfraed@ix.netcom.com wulfraed@bestiaria.com HTTP://wlfraed.home.netcom.com/ (Bestiaria Support Staff: web-asst@bestiaria.com) HTTP://www.bestiaria.com/ |
| |
| | | Bruno Desthuilliers |  |
| Posted: Tue Aug 26, 2008 6:49 am Post subject: Re: Total No. of "Records" in a File? |  |
W. eWatson a écrit :
| Quote: | I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
|
How could you know how many times a given character appears in a file without reading the whole file?
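Reading the whole file need not mean parsing it, though. A sketch of a plain newline count over a binary file object, read in chunks so memory use stays flat; the function name `count_lines` and the 64 KB chunk size are arbitrary choices.

```python
import io


def count_lines(f, chunk_size=64 * 1024):
    """Count newline bytes in a binary file object by reading it in chunks."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break  # end of file
        total += chunk.count(b"\n")
    return total


# io.BytesIO stands in for open(path, "rb").
sample = io.BytesIO(b"1 2\n3 4\n5 6\n")
line_total = count_lines(sample)
print(line_total)
```

This counts `\n` bytes, so it covers LF and CR/LF line endings; a file that really uses a bare CR would need `b"\r"` instead, and a last line without a terminator is not counted.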
You could of course store metadata about one file in another file[1], but then you'd have to read and parse this other file, and it might go out of sync.
[1] or at the beginning of your 'data' file - but you still have to read at least this part, and you still have the potential sync problem.
Or you could use a fixed-size binary format for your records, and try dividing the file size by the record size.
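For the two-numbers-per-record case, a fixed-size binary layout could be sketched with the `struct` module. The choice of two little-endian doubles per record is an assumption made for the example, and `io.BytesIO` stands in for a file opened in binary mode.

```python
import io
import struct

RECORD = struct.Struct("<dd")  # two little-endian 8-byte floats per record

# Write three records back to back, with no separators.
buf = io.BytesIO()
for a, b in [(1.0, 2.0), (3.5, 4.5), (6.0, 7.0)]:
    buf.write(RECORD.pack(a, b))

# Record count from the size alone, without decoding any record.
size = len(buf.getvalue())
record_count = size // RECORD.size

# Random access: jump to record 1 (0-based) and decode it.
buf.seek(1 * RECORD.size)
second = RECORD.unpack(buf.read(RECORD.size))
print(record_count, second)
```

For a file on disk, `os.stat(path).st_size` would supply `size` without opening the file at all.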
What's your concrete use case, exactly? |
| |
| | | Bruno Desthuilliers |  |
| Posted: Tue Aug 26, 2008 6:51 am Post subject: Re: Total No. of "Records" in a File? |  |
W. eWatson a écrit :
| Quote: | Fredrik Lundh wrote: W. eWatson wrote:
I have an ordinary text file with a CR at the end of a line, and two numbers in each line. Is there some way to determine the number of lines (records) in the file before I begin reading it?
In the general case, no. A file is just a bunch of bytes. If you know that all lines have exactly the same length, you can of course fetch the file size and divide by the line size, but that doesn't work for arbitrary files.
Why do you need to know the number of lines before reading it, btw?
/F
Actually, it was a matter of curiosity, and maybe absent mindedness. I was envisioning a program where I might want to run up and down a file a lot, sometimes deleting a record interactively at the request of the user. However, I wanted to keep him alert to the total number of records remaining. However, in retrospect, I more likely do this with files in a folder. I also want him to be able to skip around in the Win OS folder by saying something like go forward 3 files. I'd like not to have to read all the files between the two points. The whole idea needs some more thinking.
|
The whole idea is that you should learn what a DBMS is good for, IMHO. |
| |