|  | Code timing |  | |
| | | Guest |  |
| Posted: Tue Aug 19, 2008 3:48 pm Post subject: Re: Code timing |  |
On 19 Aug., 11:16, santosh <santosh....@gmail.com> wrote:
| Quote: | Pascal J. Bourguignon wrote: santosh <santosh....@gmail.com> writes:
Pascal J. Bourguignon wrote:
snip
Benchmarks are really idiotic, and more and more with time...
Usually yes. But when they are necessary, I often conduct them with all non-essential tasks killed and the program under test given realtime priority.
Nonetheless. My remark is aimed at the thicker and thicker layers of JIT and pipeline and reordering and what not you find in newest high end processors. For example, if you try to benchmark the four operations by encapsulating inside a function/method call, all you get most probably will be the time needed to flush the call (JSR) and return (RET) pipelines. The operation itself will have a lot of time to be done in parallel during the restoring of registers and return.
I agree that benchmarking code fragments is becoming increasingly unreliable but there is still something to be said for profiling entire applications to discover bottlenecks, despite the fact that code speed is given exaggerated importance.
|
From time to time I do profiling also. There is no other way to find out whether your new optimized loop is really faster (or only uglier and harder to read).
Sometimes interesting results come out of profiling sessions. Huge efforts to improve the performance may show no improvements, while tiny changes seemingly unrelated to performance lead to big differences in time.
For libraries (where you don't know how they will be used) the need for speed is bigger than for an actual application where you can estimate without benchmarks that it is fast enough.
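As a rough illustration of the "is the new loop really faster?" check, here is a minimal sketch that times two versions of the same loop and reports the best of several repeats. It is a simple before/after timer, not a profiler, and everything in it (the names `sum_plain`, `sum_unrolled`, `best_time`, `compare_loops`, the array size, the repeat count) is a hypothetical example rather than anything from this thread. The outcome depends heavily on compiler, flags, and CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Baseline: the straightforward loop.
long long sum_plain(const std::vector<int>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Hand-"optimized" variant: unrolled by four with separate accumulators.
// Harder to read, and not necessarily faster; that is what gets measured.
long long sum_unrolled(const std::vector<int>& v) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0, n = v.size();
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];     s1 += v[i + 1];
        s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}

// Best-of-N wall-clock time in seconds for a callable returning long long.
// Taking the minimum reduces (but does not remove) noise from other tasks.
template <class F>
double best_time(F f, int repeats) {
    assert(repeats > 0);
    double best = 1e300;
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long sink = f();  // keep the result observable
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Prints the best time for each variant on the same input.
void compare_loops() {
    std::vector<int> v(1000000);
    std::iota(v.begin(), v.end(), 0);
    std::cout << "plain:    " << best_time([&] { return sum_plain(v); }, 20) << " s\n"
              << "unrolled: " << best_time([&] { return sum_unrolled(v); }, 20) << " s\n";
}
```

Calling `compare_loops()` from `main` prints one time per variant. If the two numbers are within noise of each other, the simpler loop is probably the better choice.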
Greetings Thomas Mertes
Seed7 Homepage: LINK Seed7 - The extensible programming language: User defined statements and operators, abstract data types, templates without special syntax, OO with interfaces and multiple dispatch, statically typed, interpreted or compiled, portable, runs under linux/unix/windows. |
| |
| | | Harold Aptroot |  |
| Posted: Tue Aug 19, 2008 4:56 pm Post subject: Re: Code timing |  |
"Martin Eisenberg" <martin.eisenberg@udo.edu> wrote in message news:6h04aoFhqgh6U1@mid.uni-berlin.de...
| Quote: | Harold Aptroot wrote:
I'm sorry .. I must admit I only gave it a quick skim.. Ah well maybe the warning will be good for other readers eh?
OK, never mind. Do you (or anyone) happen to know of similar, good tables for floating-point instructions?
Martin
|
I think most everything is in here: LINK |
| |
| | | Harold Aptroot |  |
| Posted: Tue Aug 19, 2008 4:58 pm Post subject: Re: Code timing |  |
"Harold Aptroot" <harold.aptroot@gmail.com> wrote in message news:g8f51p$kk$1@registered.motzarella.org...
| Quote: | "Martin Eisenberg" <martin.eisenberg@udo.edu> wrote in message news:6h04aoFhqgh6U1@mid.uni-berlin.de... Harold Aptroot wrote:
I'm sorry .. I must admit I only gave it a quick skim.. Ah well maybe the warning will be good for other readers eh?
OK, never mind. Do you (or anyone) happen to know of similar, good tables for floating-point instructions?
Martin
I think most everything is in here: LINK
|
Aahh sorry - there appear to be some pretty big gaps in there. I've yet to find a better one though. |
| |
| | | Martin Eisenberg |  |
| Posted: Wed Aug 20, 2008 12:26 pm Post subject: Re: Code timing |  |
Thanks again, Harold.
Any other takers on the original confusions? (Corrected version follows)
| Quote: | I wrote the C++ program at the end of this post to get an idea of the relative speeds of various mathematical operations. Now I'm wondering about two features of the results. Here's an example output from an Athlon 800 machine:
| Timing 2000 runs of 1000 calls
|
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.39 s 0.000195 ms/call
| null: 0.33 s 0.000165 ms/call
| 1.2e+007 Repeat (y/n)? y
| div : 0.39 s 0.000195 ms/call
| mul : 0.38 s 0.00019 ms/call
| add : 0.38 s 0.00019 ms/call
| null: 0.88 s 0.00044 ms/call
| 1.38e+007 Repeat (y/n)? n
Now--
1) The "null" time is always the same as for add/mul on the first run but about double that subsequently, as seen above. All other numbers behave consistently. Why might that be?
2) I expected division to be rather slower than add/mul. Sometimes the numbers are indeed higher but only by about 10% and in a minority of runs. Am I observing throughput instead of latency, which is what I'm interested in? I tried feeding dummy (the local one) back into compute() but, range issues aside, the times didn't change.
|
---------------------------------
#include <iostream>
#include <ostream>
#include <iomanip>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <cmath>

using namespace std;

const int KBlockSize = 1000, KRunCount = 2000;
const int KCallCount = KBlockSize * KRunCount;
void printClocks(const char* name, clock_t clocks)
{
  float sec = float(clocks) / CLOCKS_PER_SEC;
  cout << right << setprecision(3);
  // NOTE: the setw() arguments were lost in this archived copy; 8 is assumed.
  cout << name << ": " << setw(8) << sec << " s "
       << setw(8) << sec * 1000 / KCallCount << " ms/call\n";
}
double dummy = 0;

template<class Float, class Compute>
class Runner {
  const char* name_;
  clock_t clocks_;
public:
  Runner(const char* name) : name_(name)
  {
    vector<Float> v(KBlockSize + 1);
    for(int i = 0; i <= KBlockSize; ++i)
      v[i] = 1e-7 + rand() / Float(RAND_MAX);
    Compute compute;
    Float dummy = 0;
    clock_t clocks = -clock();
    for(int r = 0; r < KRunCount; ++r) {
      for(int i = 0; i < KBlockSize; ++i)
        dummy += compute(v[i], v[i+1]);
    }
    clocks += clock();
    ::dummy += dummy;
    clocks_ = clocks;
  }
  ~Runner() { printClocks(name_, clocks_); }
};
struct Null {
  template<class Float>
  Float operator() (Float x, Float y) { return x; }
};

struct Add {
  template<class Float>
  Float operator() (Float x, Float y) { return x + y; }
};

struct Mul {
  template<class Float>
  Float operator() (Float x, Float y) { return x * y; }
};

struct Div {
  template<class Float>
  Float operator() (Float x, Float y) { return x / y; }
};
int main()
{
  srand(time(0));
  cout << "Timing " << KRunCount << " runs of "
       << KBlockSize << " calls\n\n";
  char repeat;
  do {
    dummy = 0;
    {
      Runner<float, Null> r0("null");
      Runner<float, Add > r1("add ");
      Runner<float, Mul > r2("mul ");
      Runner<float, Div > r3("div ");
    }
    cout << dummy << " Repeat (y/n)? ";
  } while(cin >> repeat && repeat != 'n');
  return 0;
}
// end of code |
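On question 2 (throughput versus latency), one way to probe the difference is to compare a loop in which every division depends on the previous quotient with a loop in which the divisions are independent and only the additions are chained, as in the `Runner` loop above. The sketch below is an assumption-laden illustration, not a diagnosis of the posted results: the names `chained_div`, `independent_div`, `seconds`, and `compare_div_timing` are hypothetical, it uses `double` and `std::chrono` rather than the original `float` and `clock()`, and whether a gap shows up at all depends on the compiler, its flags, and the CPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Latency-bound: each division consumes the previous quotient, so a divide
// cannot start before the one before it has finished.
double chained_div(double start, double divisor, int n) {
    assert(n >= 0);
    double x = start;
    for (int i = 0; i < n; ++i)
        x = x / divisor;
    return x;
}

// Throughput-bound: the quotients do not depend on each other; only the
// additions into `sum` form a dependency chain.
double independent_div(const std::vector<double>& num,
                       const std::vector<double>& den) {
    assert(num.size() == den.size());
    double sum = 0;
    for (std::size_t i = 0; i < num.size(); ++i)
        sum += num[i] / den[i];
    return sum;
}

// Hypothetical helper: wall-clock seconds taken by one call of f.
template <class F>
double seconds(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Times n chained divisions against n independent ones and prints both.
void compare_div_timing(int n) {
    assert(n >= 0);
    std::vector<double> num(n, 1.0), den(n, 1.0000001);
    volatile double sink = 0;  // keeps the results observable
    double t_chain = seconds([&] { sink = chained_div(1.0, 1.0000001, n); });
    double t_indep = seconds([&] { sink = independent_div(num, den); });
    std::cout << "chained:     " << t_chain << " s\n"
              << "independent: " << t_indep << " s\n";
}
```

Calling `compare_div_timing(10000000)` from `main` prints both times. The independent version also reads two arrays from memory, so the comparison is only indicative; a clearly slower chained loop would suggest that the original benchmark is closer to a throughput measurement.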