Google
 
Webnews.only-4-geeks.com
Interesting places
news.only-4-geeks.com Forum Index » ProgrammingGoto page 1, 2  Next

Code timing

 
Jump to:  
 
Martin Eisenberg
PostPosted: Mon Aug 18, 2008 9:42 am    Post subject: Code timing
       
Hi folks!

I wrote the C++ program at the end of this post to get an idea of
the relative speeds of various mathematical operations. Now I'm
wondering about two features of the results. Here's an example
output from an Athlon 800 machine:

| Timing 2000 runs of 1000 calls
|
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.39 s 0.000195 ms/call
| null: 0.33 s 0.000165 ms/call
| 1.2e+007 Repeat (y/n)? y
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.38 s 0.00019 ms/call
| null: 0.88 s 0.00044 ms/call
| 1.38e+007 Repeat (y/n)? n

Now--

1) The "null" time is always the same as for add/mul on the first
run but about double that subsequently, as seen above. All other
numbers behave consistently. Why might that be?

2) I expected division to be rather slower than add/mul.
Sometimes the numbers are indeed higher but only by about 10% and
in a minority of runs. Am I observing latency instead of throughput,
which is what I'm interested in? I tried feeding dummy (the
local one) back into compute() but, range issues aside,
the times didn't change.


Thanks,
Martin

---------------------------------
#include <iostream>
#include <ostream>
#include <iomanip>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <cmath>

using namespace std;

const int KBlockSize = 1000, KRunCount = 2000;
const int KCallCount = KBlockSize * KRunCount;

void printClocks(const char* name, clock_t clocks)
{
float sec = float(clocks) / CLOCKS_PER_SEC;
cout << right << setprecision(3);
cout << name << ": " << setw(Cool << sec << " s "
<< setw(Cool << sec * 1000 / KCallCount << " ms/call\n";
}

double dummy = 0;

template<class Float, class Compute>
class Runner {
const char* name_;
clock_t clocks_;
public:
Runner(const char* name)
: name_(name)
{
vector<Float> v(KBlockSize + 1);
for(int i = 0; i <= KBlockSize; ++i) v[i] = 1e-7 + rand() / Float(RAND_MAX);
Compute compute;
Float dummy = 0;
clock_t clocks = -clock();
for(int r = 0; r < KRunCount; ++r) {
for(int i = 0; i < KBlockSize; ++i) dummy += compute(v[i], v[i+1]);
}
clocks += clock();
::dummy += dummy;
clocks_ = clocks;
}
~Runner() { printClocks(name_, clocks_); }
};

struct Null {
template<class Float>
Float operator() (Float x, Float y) { return x; }
};

struct Add {
template<class Float>
Float operator() (Float x, Float y) { return x + y; }
};

struct Mul {
template<class Float>
Float operator() (Float x, Float y) { return x * y; }
};

struct Div {
template<class Float>
Float operator() (Float x, Float y) { return x / y; }
};

int main() {
srand(time(0));
cout << "Timing " << KRunCount << " runs of " << KBlockSize << " calls\n\n";
char repeat;
do {
dummy = 0;
{
Runner<float, Null> r0("null");
Runner<float, Add > r1("add ");
Runner<float, Mul > r2("mul ");
Runner<float, Div > r3("div ");
}
cout << dummy << " Repeat (y/n)? ";
} while(cin >> repeat && repeat != 'n');
return 0;
}

// end of code
 

 
Pascal J. Bourguignon
PostPosted: Mon Aug 18, 2008 12:13 pm    Post subject: Re: Code timing
       
Martin Eisenberg <martin.eisenberg@udo.edu> writes:

Quote:
Hi folks!

I wrote the C++ program at the end of this post to get an idea of
the relative speeds of various mathematical operations. Now I'm
wondering about two features of the results. Here's an example
output from an Athlon 800 machine:

| Timing 2000 runs of 1000 calls
|
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.39 s 0.000195 ms/call
| null: 0.33 s 0.000165 ms/call
| 1.2e+007 Repeat (y/n)? y
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.38 s 0.00019 ms/call
| null: 0.88 s 0.00044 ms/call
| 1.38e+007 Repeat (y/n)? n

Now--

1) The "null" time is always the same as for add/mul on the first
run but about double that subsequently, as seen above. All other
numbers behave consistently. Why might that be?

2) I expected division to be rather slower than add/mul.
Sometimes the numbers are indeed higher but only by about 10% and
in a minority of runs. Am I observing latency instead of throughput,
which is what I'm interested in? I tried feeding dummy (the
local one) back into compute() but, range issues aside,
the times didn't change.

The function / method call times, and the cache fetch times (in a
multiprocessing environment, you may get interrupted for processing
for other processes, so your caches can be emptied at any time), are
several orders of magnitude greater than a simple operation like add
or divide.

Benchmarks are really idiotic, and more and more with time...


--
__Pascal Bourguignon__
 

 
Harold Aptroot
PostPosted: Mon Aug 18, 2008 3:28 pm    Post subject: Re: Code timing
       
"Martin Eisenberg" <martin.eisenberg@udo.edu> wrote in message
news:6gt5gaFh3fk8U1@mid.uni-berlin.de...
Quote:
Hi folks!

I wrote the C++ program at the end of this post to get an idea of
the relative speeds of various mathematical operations. Now I'm
wondering about two features of the results. Here's an example
output from an Athlon 800 machine:

| Timing 2000 runs of 1000 calls
|
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.39 s 0.000195 ms/call
| null: 0.33 s 0.000165 ms/call
| 1.2e+007 Repeat (y/n)? y
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.38 s 0.00019 ms/call
| null: 0.88 s 0.00044 ms/call
| 1.38e+007 Repeat (y/n)? n

Now--

1) The "null" time is always the same as for add/mul on the first
run but about double that subsequently, as seen above. All other
numbers behave consistently. Why might that be?

2) I expected division to be rather slower than add/mul.
Sometimes the numbers are indeed higher but only by about 10% and
in a minority of runs. Am I observing latency instead of throughput,
which is what I'm interested in? I tried feeding dummy (the
local one) back into compute() but, range issues aside,
the times didn't change.


(snipped)

<unimportant part skip if in a hurry>

The function calls themselves may take more time than their bodies
(depending on optimisation)
Also, the loops have to be unrolled a Lot to not make a difference
Random is much slower than anything else - generate an array of random
numbers first and only after that run the loop over it
Actually, don't run a loop over a large array of random numbers.. its timing
depends too much on ram speeds and caching behaviour

</unimportant>


The latency and throughput of all instructions are known though, and the
throughput of most instructions is atleast* 1 per cc there days (pipelined)
but there are exceptions
See LINK

*: meaning there can be more than 1 per cc
 

 
Harold Aptroot
PostPosted: Mon Aug 18, 2008 3:29 pm    Post subject: Re: Code timing
       
Quote:
The latency and throughput of all instructions are known though, and the
throughput of most instructions is atleast* 1 per cc there days
(pipelined) but there are exceptions
See LINK

*: meaning there can be more than 1 per cc

also go here: LINK
 

 
santosh
PostPosted: Mon Aug 18, 2008 5:18 pm    Post subject: Re: Code timing
       
Pascal J. Bourguignon wrote:

<snip>

Quote:
Benchmarks are really idiotic, and more and more with time...

Usually yes. But when they are necessary, I often conduct them with all
non-essential tasks killed and the program under test given realtime
priority.
 

 
Pascal J. Bourguignon
PostPosted: Mon Aug 18, 2008 7:15 pm    Post subject: Re: Code timing
       
santosh <santosh.k83@gmail.com> writes:

Quote:
Pascal J. Bourguignon wrote:

snip

Benchmarks are really idiotic, and more and more with time...

Usually yes. But when they are necessary, I often conduct them with all
non-essential tasks killed and the program under test given realtime
priority.

Nonetheless. My remark is aimed at the thicker and thicker layers of
JIT and pipeline and reordering and what not you find in newest high
end processors. For example, if you try to benchmark the four
operations by encapsulating inside a function/method call, all you get
most probably will be the time needed to flush the call (JSR) and
return (RET) pipelines. The operation itself will have a lot of time
to be done in parallel during the restoring of registers and return.


--
__Pascal Bourguignon__ LINK

"I have challenged the entire quality assurance team to a Bat-Leth
contest. They will not concern us again."
 

 
Martin Eisenberg
PostPosted: Mon Aug 18, 2008 7:47 pm    Post subject: Re: Code timing
       
Harold Aptroot wrote:

Quote:
The function calls themselves may take more time than their
bodies (depending on optimisation)
Also, the loops have to be unrolled a Lot to not make a
difference
Random is much slower than anything else - generate
an array of random numbers first and only after that run the
loop over it
Actually, don't run a loop over a large array of
random numbers.. its timing depends too much on ram speeds and
caching behaviour

I suppose I should be thankful that someone responds in earnest, but
somehow... The idea of posting code is that people take note of what
it actually does -- like pre-generating data, repeating often enough
that results stabilize (on this system), accounting for fixed costs,
you get the picture.

Quote:
See LINK

Thank you, without irony this time!


Martin

--
I think perhaps the most important problem is that we
are trying to understand the fundamental workings of
the universe via a language devised for telling one
another where the best fruit is. --Terry Pratchett
 

 
Harold Aptroot
PostPosted: Mon Aug 18, 2008 10:01 pm    Post subject: Re: Code timing
       
"Martin Eisenberg" <martin.eisenberg@udo.edu> wrote in message
news:6gu8vfFhoel0U1@mid.uni-berlin.de...
Quote:
Harold Aptroot wrote:

The function calls themselves may take more time than their
bodies (depending on optimisation)
Also, the loops have to be unrolled a Lot to not make a
difference
Random is much slower than anything else - generate
an array of random numbers first and only after that run the
loop over it
Actually, don't run a loop over a large array of
random numbers.. its timing depends too much on ram speeds and
caching behaviour

I suppose I should be thankful that someone responds in earnest, but
somehow... The idea of posting code is that people take note of what
it actually does -- like pre-generating data, repeating often enough
that results stabilize (on this system), accounting for fixed costs,
you get the picture.

See LINK

Thank you, without irony this time!


Martin

I'm sorry .. I must admit I only gave it a quick skim..
Ah well maybe the warning will be good for other readers eh?
 

 
santosh
PostPosted: Tue Aug 19, 2008 7:28 am    Post subject: Re: Code timing
       
Pascal J. Bourguignon wrote:

Quote:
santosh <santosh.k83@gmail.com> writes:

Pascal J. Bourguignon wrote:

snip

Benchmarks are really idiotic, and more and more with time...

Usually yes. But when they are necessary, I often conduct them with
all non-essential tasks killed and the program under test given
realtime priority.

Nonetheless. My remark is aimed at the thicker and thicker layers of
JIT and pipeline and reordering and what not you find in newest high
end processors. For example, if you try to benchmark the four
operations by encapsulating inside a function/method call, all you get
most probably will be the time needed to flush the call (JSR) and
return (RET) pipelines. The operation itself will have a lot of time
to be done in parallel during the restoring of registers and return.

I agree that benchmarking code fragments is becoming increasingly
unreliable but there is still something to be said for profiling entire
applications to discover bottlenecks, despite the fact that code speed
is given exaggerated importance.
 

 
Martin Eisenberg
PostPosted: Tue Aug 19, 2008 12:40 pm    Post subject: Re: Code timing
       
Harold Aptroot wrote:

Quote:
I'm sorry .. I must admit I only gave it a quick skim..
Ah well maybe the warning will be good for other readers eh?

OK, never mind Smile Do you (or anyone) happen to know of similar, good
tables for floating-point instructions?


Martin

--
The biggest difference between time and space is
that you can't reuse time. --Merrick Furst
 

Page 1 of 2 .:. Goto page 1, 2  Next

Google
 
Webnews.only-4-geeks.com

Windows Update | C++ | C | PHP | JavaScript | Photoshop | Programming | Windows 2000 | Python | Windows XP | Object | Flash | Flash - ActionScript | Paint Shop Pro | Excel | PowerPoint | Access | Word | Windows 98 | Internet Explorer 6.0 | CorelDraw12 | Java | XML | asm x86 | Linux Mandrake | Linux RedHat | Outlook |  | news from newsgroups |_ | s

Web Templates

Awesome Website Templates ©

pekin 2008 kolonie i obozy przepisy kulinarne compensation gry mario