Google
 
Webnews.only-4-geeks.com
Interesting places
news.only-4-geeks.com Forum Index » ProgrammingGoto page 1, 2, 3, 4, 5  Next

Sorting a file with many already-sorted chunks

 
Jump to:  
 
Guest
PostPosted: Thu Aug 21, 2008 2:56 pm    Post subject: Sorting a file with many already-sorted chunks
       
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order? Example:

38013
38014
38015
33016
33017
33018
44034
44035
44036
44037
44038
44039
44040
44041
44042
44043
44044
44045
44046
44047
34892
34893
34894
34895
34896
34897
34898
34899
34900
34901
34902
34903
34904
34905
34906
38016
38017
38018
38019
38020

Here we have five already-ordered chunks, length 3, 3, 14, 15 and 5. I
would think QuickSort is the best in such situation, but I am not sure.
 

 
Richard Heathfield
PostPosted: Thu Aug 21, 2008 2:56 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
ilya2@rcn.com said:

Quote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Expect improvements from others on this idea:

start a new group
while the read of the next datum succeeds
compare it to the previous datum (if there is one) in this group
if it's greater, add it to this group
otherwise, start a new group, and add this datum to the new group,
which becomes the "current" group, the "this" group
wend
merge the groups in the canonical way

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 

 
Juha Nieminen
PostPosted: Thu Aug 21, 2008 2:56 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
ilya2@rcn.com wrote:
Quote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)
 

 
stan
PostPosted: Thu Aug 21, 2008 2:56 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
Richard Heathfield wrote:
Quote:
ilya2@rcn.com said:

What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Expect improvements from others on this idea:

start a new group
while the read of the next datum succeeds
compare it to the previous datum (if there is one) in this group
if it's greater, add it to this group
otherwise, start a new group, and add this datum to the new group,
which becomes the "current" group, the "this" group
wend
merge the groups in the canonical way

I think this application would require great care. The scan to
find where a new sorted group starts and ends will most likely cost more
than a simple straightforward sort. There are too many unknowns to give
much advice. How large is the file? How large and how many sorted groups
are there? Are any elements repeated?

This sure sounds like one of those cases where it could pay to make sure
you are solving the right problem before jumping on what appears to be
an obvious solution. On top of that this really seems like one of those
times where it will be real easy to generate an O(n^2) solution when a
standard library qsort or a standard mergesort would probably win
without even considering any partial sort of the input.

How about more information about the input?
 

 
Pascal J. Bourguignon
PostPosted: Thu Aug 21, 2008 4:32 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
Juha Nieminen <nospam@thanks.invalid> writes:

Quote:
ilya2@rcn.com wrote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)

That's O(N^2).

You must do the merge of all chunks at once to reduce it to O(N).

As long as the number of chunks is not so big that at least one
element from each chunk can be held in memory, it can be done in O(N).

Read the file once, keeping in memory the start and end position of
each chunk, and the first element of each chunk.

Then scan all the chunks for the smallest element, and output it,
advancing the chunk (read the next element, incrementing the current
position for the chunk). Repeat until all chunks are empty.

--
__Pascal Bourguignon__ LINK

HANDLE WITH EXTREME CARE: This product contains minute electrically
charged particles moving at velocities in excess of five hundred
million miles per hour.
 

 
Richard Harter
PostPosted: Thu Aug 21, 2008 5:16 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
On Thu, 21 Aug 2008 20:32:10 +0200, pjb@informatimago.com (Pascal
J. Bourguignon) wrote:

Quote:
Juha Nieminen <nospam@thanks.invalid> writes:

ilya2@rcn.com wrote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)

That's O(N^2).

No, its O(N*log2(M)) where M is the number of chunks. It's
log2(M) passes with each pass being O(N).

Quote:

You must do the merge of all chunks at once to reduce it to O(N).

It's still O(N*log2(M)). You need a data structure to hold
headers for M chunks. Searching that data structure for the next
item to merge in is O(log2(M)).

Quote:

As long as the number of chunks is not so big that at least one
element from each chunk can be held in memory, it can be done in O(N).

Read the file once, keeping in memory the start and end position of
each chunk, and the first element of each chunk.

Then scan all the chunks for the smallest element, and output it,
advancing the chunk (read the next element, incrementing the current
position for the chunk). Repeat until all chunks are empty.


Richard Harter, cri@tiac.net
LINK LINK
Save the Earth now!!
It's the only planet with chocolate.
 

 
Richard Harter
PostPosted: Thu Aug 21, 2008 5:23 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
On Thu, 21 Aug 2008 16:13:56 GMT, Juha Nieminen
<nospam@thanks.invalid> wrote:

Quote:
ilya2@rcn.com wrote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)

If the chunks are of significantly unequal size it is more
efficient to sort the chunks by size and then successively merge
the two smallest. The principle is the same as in Huffman
encoding.



Richard Harter, cri@tiac.net
LINK LINK
Save the Earth now!!
It's the only planet with chocolate.
 

 
Richard Heathfield
PostPosted: Thu Aug 21, 2008 5:45 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
James said:

Quote:
On Aug 21, 12:13 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
il...@rcn.com wrote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)

This works well for the OP's example data, but consider the data...

1
2
5
6
9
3
4

Splitting this data into the minimum number of chunks does not seem so
easy.

It's trivial, as both Juha and I have explained.

Quote:
We can split it into 3 chunks (1,2) (3,4) (5,6,9), but on a
single pass?

Yes, you can split it into three chunks just like that, in a single pass.

1 2
5 6 9
3 4

Then you merge the chunks:

1 2
5 6 9
3 4

Comparing 1, 5, 3, we take 1

2
5 6 9
3 4

(1)

Comparing 2, 5, 3, we take 2

5 6 9
3 4
(1, 2)

Comparing 5, 3, we take 3

5 6 9
4

(1, 2, 3)

Comparing 5, 4, we take 4

5 6 9

(1, 2, 3, 4)

Comparing 5 we take 5, and so on through the remainder of the single chunk.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 

 
Guest
PostPosted: Thu Aug 21, 2008 5:54 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
On Aug 21, 12:13 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
Quote:
il...@rcn.com wrote:
What is the best sorting algorithm if the file consists of many chunks
of various sizes, each chunk is already ordered, but they are out of
order?

  Read all the sorted chunks (detecting where a chunk ends and a new
starts is a trivial conditional) and then merge-sort all the chunks
together. (In other words, merge pairs of chunks, then merge pairs of
the resulting chunks, etc. until you have only one left.)

Thanks -- it worked quite well.
 

 
Richard Heathfield
PostPosted: Thu Aug 21, 2008 6:03 pm    Post subject: Re: Sorting a file with many already-sorted chunks
       
James said:

Quote:
On Aug 21, 3:45 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

Yes, you can split it into three chunks just like that, in a single
pass.

1 2
5 6 9
3 4


I'm a rank amature in computational algorithms (I'm an engineer) but
how are you splitting the data into those 3 chunks instead of one of
the following?

1 2 5 6 9
3 4

Oops, good spot. Yes, the algorithm I described would indeed split it as 1,
2, 5, 6, 9 and 3, 4, not as three chunks. Sorry, just a braino there.

Which is fine. 1 2 5 6 9
3 4

gives you a very quick merge:

2 5 6 9
3 4
(1)

5 6 9
3 4
(1 2)

5 6 9
4
(1 2 3)

5 6 9

(1 2 3 4)

followed by a bolt-on move:

1 2 3 4 5 6 9

That's one pass (i.e. O(n)) to get your chunks, and a complete sort in only
four comparisons, for nine items. I'd call that pretty darn good. :-)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 

Page 1 of 5 .:. Goto page 1, 2, 3, 4, 5  Next

Google
 
Webnews.only-4-geeks.com

Windows Update | C++ | C | PHP | JavaScript | Photoshop | Programming | Windows 2000 | Python | Windows XP | Object | Flash | Flash - ActionScript | Paint Shop Pro | Excel | PowerPoint | Access | Word | Windows 98 | Internet Explorer 6.0 | CorelDraw12 | Java | XML | asm x86 | Linux Mandrake | Linux RedHat | Outlook |  | news from newsgroups |_ | s

Web Templates

Awesome Website Templates ©

grecja pierścionek rodos last minute gadżety reklamowe Może się wydawać - Ewelina Flinta