Large amount of files to parse/organize, tips on algorithm?

 
cnb
PostPosted: Tue Sep 02, 2008 4:48 pm    Post subject: Large amount of files to parse/organize, tips on algorithm?
       
I have a bunch of files consisting of movie reviews.

For each file I construct a list of reviews and then for each new file
I merge the reviews so that in the end I have a list of reviewers and
for each reviewer all their reviews.

What is the fastest way to do this?

1. Create one file with reviews, open the next file, and for each review see
if the reviewer exists; if so add the review, else create a new reviewer.

2. Create all the separate files with reviews, then mergesort them?
 

 
Steven D'Aprano
PostPosted: Tue Sep 02, 2008 4:48 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:

Quote:
I have a bunch of files consisting of movie reviews.

For each file I construct a list of reviews and then for each new file I
merge the reviews so that in the end I have a list of reviewers and for
each reviewer all their reviews.

What is the fastest way to do this?

Use the timeit module to find out.


Quote:
1. Create one file with reviews, open the next file, and for each review see
if the reviewer exists; if so add the review, else create a new reviewer.

2. Create all the separate files with reviews, then mergesort them?

The answer will depend on whether you have three reviews or three
million, whether each review is twenty words or twenty thousand words,
and whether you have to do the merging once only or over and over again.
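
As a rough illustration of the timeit suggestion, here is a minimal sketch that times two ways of attaching reviews to reviewers. Everything in it is an assumption made for the example: the synthetic data, the sizes, and the function names make_reviews, merge_with_list_scan and merge_with_dict. It is not a benchmark of the real files.

```python
import random
import timeit


def make_reviews(n_reviews, n_reviewers, seed=0):
    """Build synthetic (reviewer_id, rating) pairs; the data is made up."""
    rng = random.Random(seed)
    return [(rng.randrange(n_reviewers), rng.randint(1, 5))
            for _ in range(n_reviews)]


def merge_with_list_scan(reviews):
    """Search a list of known reviewers for every incoming review."""
    reviewers = []  # entries look like [reviewer_id, [ratings]]
    for reviewer_id, rating in reviews:
        for entry in reviewers:
            if entry[0] == reviewer_id:
                entry[1].append(rating)
                break
        else:
            # No existing entry matched, so this is a new reviewer.
            reviewers.append([reviewer_id, [rating]])
    return reviewers


def merge_with_dict(reviews):
    """Look each reviewer up in a dict keyed by reviewer id."""
    merged = {}
    for reviewer_id, rating in reviews:
        merged.setdefault(reviewer_id, []).append(rating)
    return merged


if __name__ == "__main__":
    reviews = make_reviews(20000, 2000)
    for func in (merge_with_list_scan, merge_with_dict):
        seconds = timeit.timeit(lambda: func(reviews), number=3)
        print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```

The absolute numbers depend on the machine and on how many distinct reviewers there are, so it is worth rerunning with sizes close to the real data.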


--
Steven
 

 
Eric Wertman
PostPosted: Tue Sep 02, 2008 4:48 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
I think you really want to use a relational database of some sort for this.
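
One way this could look, as a minimal sketch using the sqlite3 module from the standard library. The table layout is an assumption: customer_id, rating and date follow the fields cnb describes elsewhere in the thread, while movie_id and the helper names build_db and reviews_by are invented for the example.

```python
import sqlite3


def build_db(records, path=":memory:"):
    """Load (movie_id, customer_id, rating, date) tuples into SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "movie_id INTEGER, customer_id INTEGER, rating INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", records)
    # The index makes per-reviewer lookups cheap once the data is loaded.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_customer ON reviews (customer_id)"
    )
    conn.commit()
    return conn


def reviews_by(conn, customer_id):
    """Return all reviews by one customer, oldest first."""
    cur = conn.execute(
        "SELECT movie_id, rating, date FROM reviews "
        "WHERE customer_id = ? ORDER BY date",
        (customer_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    sample = [
        (1, 42, 4, "2005-09-06"),
        (2, 42, 3, "2005-01-19"),
        (1, 7, 5, "2004-12-25"),
    ]
    conn = build_db(sample)
    print(reviews_by(conn, 42))
```

Passing a file name instead of ":memory:" keeps the table on disk, so the parsing only has to happen once.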

On Tue, Sep 2, 2008 at 2:02 PM, cnb <circularfunc@yahoo.se> wrote:
Quote:
over 17000 files...

netflixprize.
--
LINK
 

 
Paul Rubin
PostPosted: Tue Sep 02, 2008 4:48 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
cnb <circularfunc@yahoo.se> writes:
Quote:
For each file I construct a list of reviews and then for each new file
I merge the reviews so that in the end I have a list of reviewers and
for each reviewer all their reviews.

What is the fastest way to do this?

Scan through all the files sequentially, emitting records like

(movie, reviewer, review)

Then use an external sort utility to sort/merge that output file
on each of the 3 columns. Beats writing code.
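
The scanning step could be sketched roughly as follows. The input layout is purely an assumption for illustration: each file is taken to be named after its movie and to hold lines of the form customerid,rating,date. The function name emit_records is likewise made up.

```python
import csv
import sys
from pathlib import Path


def emit_records(paths, out):
    """Write one 'movie,customer,rating,date' row per review to `out`.

    Assumed (hypothetical) input: each file is named after its movie and
    holds lines of the form 'customerid,rating,date'.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for path in paths:
        movie = Path(path).stem  # file name without directory or extension
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if len(row) != 3:
                    continue  # skip blank or malformed lines
                writer.writerow([movie] + row)
                count += 1
    return count


if __name__ == "__main__":
    # Usage: python emit.py reviews/*.txt > records.csv
    emit_records(sorted(sys.argv[1:]), sys.stdout)
```

The resulting file can then be handed to the system sort, for example `sort -t, -k2,2n records.csv` to order by the customer column, which brings each reviewer's reviews together.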
 

 
cnb
PostPosted: Tue Sep 02, 2008 6:02 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
On Sep 2, 7:06 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
Quote:
On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:
I have a bunch of files consisting of movie reviews.

For each file I construct a list of reviews and then for each new file I
merge the reviews so that in the end I have a list of reviewers and for
each reviewer all their reviews.

What is the fastest way to do this?

Use the timeit module to find out.

1. Create one file with reviews, open the next file, and for each review see
if the reviewer exists; if so add the review, else create a new reviewer.

2. Create all the separate files with reviews, then mergesort them?

The answer will depend on whether you have three reviews or three
million, whether each review is twenty words or twenty thousand words,
and whether you have to do the merging once only or over and over again.

--
Steven



I merge once. Each review has 3 fields: date, rating, customerid. In
total I'll be parsing between 10K and 100K, eventually 450K reviews.
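
For a one-off merge at that scale, grouping everything in memory may well be enough. Here is a minimal sketch of that idea, assuming each review is a whitespace-separated line in the order date, rating, customerid. The separator, the date format in the sample, and the names parse_review and group_by_reviewer are all assumptions made for the example.

```python
from collections import defaultdict


def parse_review(line):
    """Parse one 'date rating customerid' line (layout is an assumption)."""
    date, rating, customer_id = line.split()
    return customer_id, int(rating), date


def group_by_reviewer(lines):
    """Map each customer id to a list of (date, rating) tuples."""
    reviews = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        customer_id, rating, date = parse_review(line)
        reviews[customer_id].append((date, rating))
    return reviews


if __name__ == "__main__":
    sample = ["2005-09-06 4 42", "2004-12-25 5 7", "2005-01-19 3 42"]
    for customer_id, items in group_by_reviewer(sample).items():
        print(customer_id, sorted(items))
```

Each review is handled once and the reviewer lookup is a dict access, so the cost grows roughly linearly with the number of reviews.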
 

 
cnb
PostPosted: Tue Sep 02, 2008 6:02 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
over 17000 files...

netflixprize.
 

 
jay graves
PostPosted: Tue Sep 02, 2008 6:50 pm    Post subject: Re: Large amount of files to parse/organize, tips on algorit
       
On Sep 2, 1:02 pm, cnb <circularf...@yahoo.se> wrote:
Quote:
over 17000 files...

netflixprize.

LINK

specifically:

LINK
 
