Discussion:
octave benchmark test
Paul Thomas
2004-03-07 11:40:57 UTC
Permalink
I am posting the results of this Sciviews benchmark test on the help
list to hit the widest possible audience. John Eaton and the other
contributors to octave and octave-forge have my applause for the effort
that they have put in over the last couple of months to extend and speed
octave up.

I also wish to encourage you to go the whole hog and update to ATLAS,
FFW, octave-2.1.56(when it becomes the recommended version) and
octave-forge (in that order). On an Athlon 1700 RH9 machine with only
256Mbyte memory (of which more in a moment), the Sciviews benchmark
tests (see http://www.sciviews.org/other/benchmark.htm ) gave:

Octave Benchmark 2
==================
Number of times each test is run__________________________: 3


I. Matrix calculation octave-2.1.56
(Matlab6.5)
---------------------
Creation, transp., deformation of a 1500x1500 matrix (sec): 1.117
(0.6733)
800x800 normal distributed random matrix ^1000______ (sec): 0.1454
(0.2846)
Sorting of 2,000,000 random values__________________ (sec): 1.696
(0.9027)
700x700 cross-product matrix (b = a' * a)___________ (sec): 0.4062
(0.5264)
Linear regression over a 600x600 matrix (c = a \ b') (sec): 0.247
(0.2885)
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.4822


II. Matrix functions
--------------------
FFT over 800,000 random values______________________ (sec): 0.3724
(0.5914)
Eigenvalues of a 320x320 random matrix______________ (sec): 0.9604
(1.0192)
Determinant of a 650x650 random matrix______________ (sec): 0.2894
(0.2450)
Cholesky decomposition of a 900x900 matrix__________ (sec): 0.3212
(0.3306)
Inverse of a 400x400 random matrix__________________ (sec): 0.1626
(0.1957)
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.3259


III. Programmation
------------------
750,000 Fibonacci numbers calculation (vector calc)_ (sec): 0.6075
(0.9792)
Creation of a 2250x2250 Hilbert matrix (matrix calc) (sec): 1.137****
(1.1501)
Grand common divisors of 70,000 pairs (recursion)___ (sec): 0.6918
(0.4376)
Creation of a 220x220 Toeplitz matrix (loops)_______ (sec): 1.449
(0.0021)
Escoufier's method on a 37x37 matrix (mixed)________ (sec): 2.2
(1.1663)
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.045


Total time for all 15 tests_________________________ (sec): 11.8
(8.7927)
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.5475
--- End of test ---

**** The Hilbert matrix test uses a lot of memory. With one memory bank
AWOL, 256Mbyte is not enough. It took 7-8 repeats to allow the system
to clean up the memory enough that no swapping occurred during the test.

It will be noted that most of the difference now lies in the last two
tests, where Matlab must either be employing automatic vectorisation or
an on the fly compiler, when it can. This seems to be where Matworks
have been concentrating their effort since the Sciviews tests were
posted). Octave, with all the trimmings, now has roughly the performance
of Matlab6.0.

Vectorising the Toeplitz test, reduces its execution time in octave to
0.018s. TAKE NOTE!



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
David Bateman
2004-03-07 14:07:19 UTC
Permalink
Post by Paul Thomas
Sorting of 2,000,000 random values__________________ (sec): 1.696
(0.9027)
This slowup with octave is also a choice. The sort code in octave-forge
uses the mergesort code from python, while matlab uses a quicksort
algorithm. The benchmark above is reversed if partially ordered lists
are used, as for instance often happen in interpolation codes. So octaves
loss in the above benchmark shows more about the choices of the person
writing the benchmark than the octaves relative performance. Benchmarks
are all figments of someones deranged imagination...

D.
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Thomas
2004-03-07 14:35:03 UTC
Permalink
I agree entirely, where details (ie. less factor of 1.5-2 either way)
are concerned. However, you and others gained factors of 5 or more that
have made the difference that I find impressive. Whether deranged or
not.......

Paul T
Post by David Bateman
Post by Paul Thomas
Sorting of 2,000,000 random values__________________ (sec): 1.696
(0.9027)
This slowup with octave is also a choice. The sort code in octave-forge
uses the mergesort code from python, while matlab uses a quicksort
algorithm. The benchmark above is reversed if partially ordered lists
are used, as for instance often happen in interpolation codes. So octaves
loss in the above benchmark shows more about the choices of the person
writing the benchmark than the octaves relative performance. Benchmarks
are all figments of someones deranged imagination...
D.
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Michael Martin
2004-03-08 17:58:30 UTC
Permalink
Let me say at the start that I am a big fan of octave. I use both
octave and matlab, and I prefer octave for the most part. It is less
capricious and cantankerous than matlab for the most part. Also, with
matlab, I had better be able to see the license server or I am out of
luck. That is not always possible and there are times when I am in a
mission critical situation and cannot use matlab because of this.

There are places, however, where matlab simply outruns octave by a big
margin to the point where octave is nearly unusable for a given task.
The run time difference may be as much as 100x. For one type of
analysis, matlab's response takes a minute and while octave's is two
hours. Even as a die hard octave fan, I will think twice before I run
those sorts of analyses in octave as opposed to matlab.

Given that some benchmarks have been posted, I will take the liberty of
posting some of my results too. I hope it will generate a profitable
discussion.

The following is the result for the total time of the Octave2 tests on
a 2GHz G5 under OS X using both matlab (6.5.1) and octave (2.1.53). (I
have posted all the results at the end in case they are of interest)


Octave: Total time for all 15 tests_________________________ (sec):
15.53
Matlab: Total time for all 15 tests_________________________ (sec):
5.0555

As you can see, about a 3x difference in the total time. The big
offender is the sort, 7.348 for octave and 0.41325 for matlab. The next
largest is the Toeplitz matrix, 1.5 for octave and 0.012 for matlab.
The third largest difference is the eigenvalues of a 320x320 matrix,
1.144 for octave and 0.43039 for matlab. The forth largest is
Escoufier's method, 1.713 for octave and 0.96309 for matlab. All told
that accounts for almost 10 seconds of the 10.5 second difference.

Now some of the differences are quite likely due to algorithmic
differences. The difference in sorting is certainly likely algorithmic,
though given that my other numbers are slightly better than the times
in Paul Thomas' post, I have to wonder why my sort is quite so much
worse than his.

Another observation is that this test, Octave2, is mostly testing
algorithmic differences between matlab and octave. The individual test
cases generally execute a particular function or set of functions
repeatedly with matrices being operated on. The major portion of the
execution time would be expected to be in the function call(s). It has
been my experience that octave for the most part does as well as
matlab, and in a few cases (depending upon the edition of matlab) much
better when one gets down to this level, namely function calls. Where I
see problems in octave is looping & array functions. Let me add some
results from my own benchmarks. I ran this benchmark on matlab and
octave. The times for each test are reported and then matlab's speed
relative to octave is posted below the test result.


Float test
octave: a = 2001.000000, err = 0.000000, time = 7.752197
matlab: a = 2001.000000, err = 0.000000, time = 0.180680
speedX: 42.906

Sqrt test
octave: time = 0.114017
matlab: time = 0.184528
speedX: 0.61788

Log test
octave: time = 0.150121
matlab: time = 0.158714
speedX: 0.94586

Exp test
octave: time = 0.087117
matlab: time = 0.021750
speedX: 4.0054

Atan test
octave: time = 0.113081
matlab: time = 0.045912
speedX: 2.4630

Tan test
octave: time = 0.166375
matlab: time = 0.221086
speedX: 0.75254

Matrix test
octave: time = 0.001770
matlab: time = 0.001488
speedX: 1.1895

Sieve test using arrays
octave: time = 0.020089
matlab: time = 0.001444
speedX: 13.912

Sieve test using loops
octave: time = 0.092604
matlab: time = 0.001667
speedX: 55.551


The float test is a rather interesting result. The code for it is as
follows:

%% Float test ---------------------------------------------------
fprintf('Float test\n');
tic
for j = 1:100
a = 1;
for i = 1:2000
a = tan(atan(exp(log(sqrt(a*a))))) + 1;
end
end
elapsedTime = toc;
fprintf ('a = %f, err = %f, time = %f\n\n', a, a - 2001, elapsedTime);

As you can see, there is nothing remarkable about this test, other than
it runs some 42x slower on octave than matlab. I use it all the time to
test numerical stability of transcendental functions on whatever
computer I am running on (though usually in C).

It is the individual transcendental function tests that may be
revealing the answer, at least in part. Each of those test takes a
100x100 matrix and performs a given transcendental function on each
element of the matrix (e.g. result = log(matrix)). On the sqrt, log,
and tan tests, octave comes out ahead of matlab. But on the exp octave
is 4x slower and on the atan, octave is 2.5x slower. Algorithmic
certainly, but if one deals with these transcendental functions a lot,
then the slow down can be crushing, as was observed in the float test.

The other observation is the Sieve tests. This is the classic sieve of
Eratosthenes. The first method uses array operators to implement the
algorithm, the second method uses loops, much like one would do in a C
program. There is a very large slowdown with the arrays (13X), and then
the looping version takes a 55x hit over matlab.

If there are folks working on speed improvements, I would suggest that
you have parity with matlab as far as the built-in functions go for the
most part, but arrays and looping definitely need work.

[Yes, I know that conversion of some of these sorts of algorithms to C
& .oct files will improve speed considerably, but I do not always have
that option for a whole host of reasons.]


----------------------------------------------------------------------
Michael W. Martin Phone: (281) 333-2177
Draper Laboratory FAX: (281) 333-5276
2200 Space Park Dr. EMail: ***@simba.jsc.draper.com
Houston, TX 77058 WWW: http://www.jsc.draper.com/
USA Mail Code: EG/Draper
----------------------------------------------------------------------

Octave Benchmark 2
==================
Number of times each test is run__________________________: 3

---------------------
Creation, transp., deformation of a 1500x1500 matrix (sec):
1.251 [0.48235]
800x800 normal distributed random matrix ^1000______ (sec):
0.1284 [0.21914]
Sorting of 2,000,000 random values__________________ (sec):
7.348 [0.41325]
700x700 cross-product matrix (b = a' * a)___________ (sec):
0.1312 [0.19244]
Linear regression over a 600x600 matrix (c = a \ b') (sec):
0.09014 [0.10584]
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.2762

II. Matrix functions
--------------------
FFT over 800,000 random values______________________ (sec):
0.2844 [0.28988]
Eigenvalues of a 320x320 random matrix______________ (sec):
1.144 [0.43039]
Determinant of a 650x650 random matrix______________ (sec):
0.1085 [0.095085]
Cholesky decomposition of a 900x900 matrix__________ (sec):
0.1051 [0.13952]
Inverse of a 400x400 random matrix__________________ (sec):
0.06585 [0.073918]
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.148

III. Programmation
------------------
750,000 Fibonacci numbers calculation (vector calc)_ (sec):
0.5173 [0.77387]
Creation of a 2250x2250 Hilbert matrix (matrix calc) (sec):
0.7473 [0.6179]
Grand common divisors of 70,000 pairs (recursion)___ (sec):
0.3986 [0.24719]
Creation of a 220x220 Toeplitz matrix (loops)_______ (sec):
1.5 [0.011642]
Escoufier's method on a 37x37 matrix (mixed)________ (sec):
1.713 [0.96309]
------------------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.8338


Total time for all 15 tests_________________________ (sec):
15.53 [5.0555]
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.3243
--- End of test ---


My speed.m file:

%% Speed tests

%% Float test ---------------------------------------------------
fprintf('Float test\n');
tic
for j = 1:100
a = 1;
for i = 1:2000
a = tan(atan(exp(log(sqrt(a*a))))) + 1;
end
end
elapsedTime = toc;
fprintf ('a = %f, err = %f, time = %f\n\n', a, a - 2001, elapsedTime);

%% Needed for following ----------------------------------------
mat = reshape(1:10000, 100,100);

%% sqrt test ---------------------------------------------------
fprintf('Sqrt test\n');
tic
for j = 1:100
mata = sin(mat);
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% log test ---------------------------------------------------
fprintf('Log test\n');
tic
for j = 1:100
mata = log(mat);
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% exp test ---------------------------------------------------
fprintf('Exp test\n');
tic
for j = 1:100
mata = exp(mat);
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% Atan test ---------------------------------------------------
fprintf('Atan test\n');
tic
for j = 1:100
mata = atan(mat);
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% Tan test ---------------------------------------------------
fprintf('Tan test\n');
tic
for j = 1:100
mata = tan(mat);
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% Matrix test ---------------------------------------------------
fprintf('Matrix test\n');
tic
res = mat*mat;
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

%% sieve test ---------------------------------------------------
fprintf('Sieve test using arrays\n');
tic
xlen = 2000;
x = linspace(1, xlen, xlen);

for i = 2:xlen/2
if x(i) ~= 0
x(2*i:i:xlen) = 0;
end
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);

fprintf('Sieve test using loops\n');
tic
x = linspace(1, xlen, xlen);

for i = 2:xlen/2
if x(i) ~= 0
for j = 2*i:i:xlen
x(j) = 0;
end
end
end
elapsedTime = toc;
fprintf ('time = %f\n\n', elapsedTime);



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Francesco Potorti`
2004-03-08 18:31:29 UTC
Permalink
Since we are speaking about benchmarks, there is an old benchmark of
mine at <ftp://fly.isti.cnr.it/pub/software/octave/benchmark.m>, and the
companion <ftp://fly.isti.cnr.it/pub/software/octave/bm_results>.

Both are outdated, that is, they list results for machines that are now
obsolete (1997). At the time, I stopped collecting results because of
some inconcistencies caused by compilation options which were not clear.

Anyway, it would be easy to update that bechmark and use it for a quick
assessment of how Octave performs on a given box. Probably, because of
the new Octave features, the set of tests should be updated, which is
fairly easy.

So, if anyone feels like upgrading this, I will put it under the GPL.
--
Francesco Potortì (ricercatore) Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR Fax: +39 050 313 8091
via G. Moruzzi 1, I-56124 Pisa Email: ***@isti.cnr.it
Web: http://fly.isti.cnr.it/ Key: fly.isti.cnr.it/public.key



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Kienzle
2004-03-08 23:41:30 UTC
Permalink
Post by Michael Martin
Now some of the differences are quite likely due to algorithmic
differences. The difference in sorting is certainly likely
algorithmic, though given that my other numbers are slightly better
than the times in Paul Thomas' post, I have to wonder why my sort is
quite so much worse than his.
He is using David Bateman's sort from octave-forge
(http://octave.sf.net).

It is reported to have better performance on partially ordered lists,
but worse on random data compared to matlab.

Hmmm... I wonder if it is worthwhile to run through the data once to see
how ordered it is, then do quicksort or merge sort as needed?
Also, somebody with access to matlab might want to see if
"[y,idx]=sort(x)" is any slower than "y=sort(x)". Maybe they have
a fast stable sort algorithm that we don't know about.

Paul Kienzle
***@users.sf.net



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Dmitri A. Sergatskov
2004-03-09 00:17:30 UTC
Permalink
Post by Paul Kienzle
It is reported to have better performance on partially ordered lists,
but worse on random data compared to matlab.
Well, may be I misunderstood what does "partially ordered means", but here is
what I got.
Post by Paul Kienzle
x1=rand(3000); % random
x2=[1:3000]; % ordered
x3=repmat(x2,3000,1); %
x=x3+2*x1; % partially ordered ?
tic ; sort(x1) ; toc
elapsed_time =

1.4351
Post by Paul Kienzle
tic ; sort(x3) ; toc
elapsed_time =

0.8567
Post by Paul Kienzle
tic ; sort(x) ; toc
elapsed_time =

1.4285
Post by Paul Kienzle
x=x3+1.1*x1;
tic ; sort(x) ; toc
elapsed_time =

1.4355
Post by Paul Kienzle
tic ; [y,idx]=sort(x) ; toc
elapsed_time =

2.2073
Post by Paul Kienzle
tic ; [y,idx]=sort(x) ; toc % So we will not need to allocate y and idx
elapsed_time =

2.0607


(Octave 2.1.56):

octave:1> x1=rand(3000);
octave:2> x2=[1:3000];
octave:3> x3=repmat(x2,3000,1);
octave:4> x=x3+2*x1;
octave:5> tic ; sort(x) ; toc
ans = 2.1968
octave:6> tic ; sort(x1) ; toc
ans = 1.9328
octave:7> tic ; sort(x3) ; toc
ans = 0.58155
octave:8> tic ; [y,idx]= sort(x) ; toc
ans = 3.5003
octave:9> tic ; [y,idx]= sort(x) ; toc
ans = 3.4638
Post by Paul Kienzle
Paul Kienzle
Dmitri.



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Kienzle
2004-03-09 02:46:36 UTC
Permalink
Post by Dmitri A. Sergatskov
Post by Paul Kienzle
It is reported to have better performance on partially ordered lists,
but worse on random data compared to matlab.
Well, may be I misunderstood what does "partially ordered means", but here is
what I got.
Thanks. I don't know the details of the algorithm, but I believe it is
a merge sort variant. These tend to do better if you start with a
small number of long 'runs' rather than a large number of short
'runs', as in the following:

octave:42> w=[1:1000000];
octave:43> x=repmat([1:500000],1,2);
octave:44> y=repmat([1:100000],1,10);
octave:45> z=rand(1000000,1);
octave:46> tic; sort(w); toc
ans = 0.60063
octave:47> tic; sort(x); toc
ans = 0.80005
octave:48> tic; sort(y); toc
ans = 1.0342
octave:49> tic; sort(z); toc
ans = 1.4839

x is an imporant case for table lookups.

It would be nice if the random case were
1x rather 1.5x matlab. Still, 1.5x is loads
better than the 7x it used to be.

Paul Kienzle
***@users.sf.net



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
David Bateman
2004-03-09 10:00:49 UTC
Permalink
Dmitri,

Please find attached my test code for the speed of the sort code. This is
based on a simialr benchmark within Python itself. Code is called like

octave:1> sort_time = testsort('sort');

Test of the sorting function (sort)

\ N |1.00e+02 1.00e+03 1.00e+04 1.00e+05 1.00e+06
Test \ |
*sort |3.82e-05 5.26e-04 6.41e-03 8.65e-02 1.18e+00
\sort |1.15e-05 1.12e-04 1.13e-03 1.75e-02 2.08e-01
/sort |1.10e-05 1.07e-04 1.09e-03 1.49e-02 1.69e-01
3sort |2.06e-05 1.26e-04 1.19e-03 1.79e-02 2.22e-01
+sort |1.69e-05 1.19e-04 1.14e-03 1.66e-02 1.99e-01
=sort |1.09e-05 1.06e-04 1.11e-03 1.52e-02 1.67e-01

Normalized times against Matlab R12 on an IBM T23

\ N |1.00e+02 1.00e+03 1.00e+04 1.00e+05 1.00e+06
Test \ |
*sort |2.25e+00 2.30e+00 2.35e+00 2.40e+00 2.31e+00
\sort |9.94e-01 8.17e-01 6.47e-01 7.78e-01 6.33e-01
/sort |1.03e+00 8.33e-01 6.59e-01 6.75e-01 5.49e-01
3sort |1.88e+00 9.63e-01 7.28e-01 8.20e-01 7.15e-01
+sort |1.12e+00 6.38e-01 5.17e-01 5.43e-01 4.27e-01
=sort |8.97e-01 6.76e-01 5.22e-01 5.64e-01 4.43e-01

octave:2> sort_time_index = testsort('sort, [], 1);

Test of the sorting function (sort) with indexing

\ N |1.00e+02 1.00e+03 1.00e+04 1.00e+05 1.00e+06
Test \ |
*sort |6.39e-05 8.74e-04 1.15e-02 1.80e-01 2.55e+00
\sort |2.50e-05 2.54e-04 2.66e-03 3.95e-02 4.70e-01
/sort |2.46e-05 2.43e-04 2.51e-03 3.33e-02 4.03e-01
3sort |4.03e-05 2.73e-04 2.63e-03 3.66e-02 4.30e-01
+sort |3.34e-05 2.61e-04 2.56e-03 3.52e-02 4.20e-01
=sort |2.49e-05 2.49e-04 2.54e-03 3.40e-02 4.07e-01

Normalized times against Matlab R12 on an IBM T23

\ N |1.00e+02 1.00e+03 1.00e+04 1.00e+05 1.00e+06
Test \ |
*sort |2.69e+00 2.89e+00 3.09e+00 3.12e+00 3.04e+00
\sort |1.54e+00 1.37e+00 1.19e+00 1.24e+00 1.14e+00
/sort |1.75e+00 1.50e+00 1.25e+00 1.18e+00 1.11e+00
3sort |2.67e+00 1.68e+00 1.31e+00 1.30e+00 1.19e+00
+sort |1.59e+00 1.08e+00 8.78e-01 8.42e-01 7.50e-01
=sort |1.28e+00 9.49e-01 6.93e-01 6.62e-01 6.07e-01

Note the times calibrated against Matlab R12 are only valid on my machine.

D.
Post by Dmitri A. Sergatskov
Post by Paul Kienzle
It is reported to have better performance on partially ordered lists,
but worse on random data compared to matlab.
Well, may be I misunderstood what does "partially ordered means", but here is
what I got.
Post by Paul Kienzle
x1=rand(3000); % random
x2=[1:3000]; % ordered
x3=repmat(x2,3000,1); %
x=x3+2*x1; % partially ordered ?
tic ; sort(x1) ; toc
elapsed_time =
1.4351
Post by Paul Kienzle
tic ; sort(x3) ; toc
elapsed_time =
0.8567
Post by Paul Kienzle
tic ; sort(x) ; toc
elapsed_time =
1.4285
Post by Paul Kienzle
x=x3+1.1*x1;
tic ; sort(x) ; toc
elapsed_time =
1.4355
Post by Paul Kienzle
tic ; [y,idx]=sort(x) ; toc
elapsed_time =
2.2073
Post by Paul Kienzle
tic ; [y,idx]=sort(x) ; toc % So we will not need to allocate y
and idx
elapsed_time =
2.0607
octave:1> x1=rand(3000);
octave:2> x2=[1:3000];
octave:3> x3=repmat(x2,3000,1);
octave:4> x=x3+2*x1;
octave:5> tic ; sort(x) ; toc
ans = 2.1968
octave:6> tic ; sort(x1) ; toc
ans = 1.9328
octave:7> tic ; sort(x3) ; toc
ans = 0.58155
octave:8> tic ; [y,idx]= sort(x) ; toc
ans = 3.5003
octave:9> tic ; [y,idx]= sort(x) ; toc
ans = 3.4638
Post by Paul Kienzle
Paul Kienzle
Dmitri.
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
David Bateman
2004-03-09 09:44:15 UTC
Permalink
Michael,

Please do the following

1) Upgrade to 2.1.56 from 2.1.53 due to the bugs in 2.1.53 and some speed-up
2) Please install the latest version of octave-forge where the sort and randn
codes are sped-up, which you are clearly missing.
3) Please check that you are linked against Atlas and FFTW for acceleration of
matrix and fft operations. Use the 'octave_config_info' command and look
for BLAS_LIBS and FFTW_LIBS and see what they are set to.
4) For the 100x case please vectorize your code....

Do this and then we can talk on an even footing about speed....
Post by Michael Martin
The float test is a rather interesting result. The code for it is as
%% Float test ---------------------------------------------------
fprintf('Float test\n');
tic
for j = 1:100
a = 1;
for i = 1:2000
a = tan(atan(exp(log(sqrt(a*a))))) + 1;
end
end
elapsedTime = toc;
fprintf ('a = %f, err = %f, time = %f\n\n', a, a - 2001,
elapsedTime);
Vectorize code wherever possible!!!!! Octave doesn't have JIT which is
where Matlab wins on this case. The above is a null example, since the
core of the algorithm results in a null operation as
'tan(atan(exp(log(sqrt(a*a))))) = a', so a vectorized version is

a = 2001;

But if you are getting such big differences between matlab and octave it
is time to consider how you have written your code, and probably make
similar simplifications to the above.

Regards
David
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Michael Martin
2004-03-09 16:43:39 UTC
Permalink
Post by David Bateman
1) Upgrade to 2.1.56 from 2.1.53 due to the bugs in 2.1.53 and some speed-up
I will do that if I get a chance. My installation is via fink, which
is generally clean and easy, if not always the latest ad greatest.
Since I am in a production mode, I would prefer not to live on the
bleeding edge. Fink is quite acceptable to me. Mac OSX has ATLAS & BLAS
built-in via its Frameworks, and my octave/fink does indeed use them. I
would say that my results show this given the similarity between my
numbers for the matrix operations and Paul Thomas' numbers. Though we
are running on different architectures, the same sorts of numbers are
seen in both or our octave and matlab results. fftw libs is also
required by fink, by the way.

I have experimented over the last several years with compiler
settings, compilers etc. I have run different level of both octave and
matlab, and on different platforms, etc. The results have shown pretty
much the same trends wrt looping. I doubt an upgrade will change the
results significantly.
Post by David Bateman
2) Please install the latest version of octave-forge where the sort and randn
codes are sped-up, which you are clearly missing.
Yes, I will try and do that. I am sure the discrepancy in the sort
times was due to different codes being run. Again, that is consistent
with Paul Thomas' numbers and mine. Ours match pretty closely EXCEPT
for the sort. Ergo different codes are likely the source of the
difference.

Taking the next two out of order....
Post by David Bateman
Vectorize code wherever possible!!!!! Octave doesn't have JIT which is
where Matlab wins on this case. The above is a null example, since the
core of the algorithm results in a null operation as
'tan(atan(exp(log(sqrt(a*a))))) = a', so a vectorized version is
a = 2001;
You have missed the point of the test. If the test had been to
calculate 2001 as quickly as possible, then I would certainly agree!
:-) However, the test was not to see how quickly one can calculate
2001, but to test the speed and numerical stability of ocatve for the
common transcendental functions. Since 2001 and is a nice
human-readable number, it is easy too subtract it from the result get a
feel for the error. The repeated use of the transcendental functions
and inverses will usually quickly show if there are any serious
problems with them.

I always run this test whenever I port code to a new platform or
whenever a platform has seen a significant OS or compiler update.
Usually I use my standard C program and Fortran program that I have had
for what must be close to 20 years now. However, when I am working with
a package such as octave or matlab, I will code it into the package's
langauge and test it there. In the past this test has revealed
significant problems with the code of transcendental functions on more
than one platform/compiler/package. All of my applications make
extensive use of transcendental functions and any significant errors in
them will lead to erroneous results. Call me paranoid :-), but since my
algorithms ends up in manned spacecraft, I would rather know **before**
flight that my codes have problems in them as opposed to **in** flight,
at which time it would invariably be too late. :-)

Now octave certainly passed the numerical stability test, so that is
not an issue. However, the test did reveal that for whatever reasons,
this particular piece of code does not run as fast in octave as opposed
to matlab. My intent on this test was simply to inform the community.
Perhaps someone is currently working on some octave code that may find
the data of use.
Post by David Bateman
4) For the 100x case please vectorize your code....
Regrettably, this is not possible in the particular case I mentioned.
Not all codes can be vectorized as much as one might like. There a
great number of different reasons for this. Let me touch upon just two
of them; namely algorithmic similarity, and heuristic type codes.

In the case of algorithmic similarity, if one has a code that produces
a result independently of other systems, then one does indeed have
great flexibility in coding style, etc. However, if one has a code or
algorithm that one is testing and that will be used in the "real
world", then it is best to keep the code as reasonably close to what
will run in the "real world". In other words, if the goal of the
analysis is to ensure the algorithm will behave properly under a wide
range of input scenarios in the real world, then the algorithm should
be kept as close as possible to the actual algorithm as it will be used
in the real world. Only then can one have confidence that the algorithm
will behave the same in the real world as it is behaving in the
laboratory.

In the case of heuristic type codes, at least the ones I have, there
are no linear/matrix formulations of the problem. Yes, vectors are
involved. Vectors are associated with the control system effectors and
vector/matrix calculations are taking place, so at that level the code
can be vectorized. However, these vectors are searched for suitability
to satisfy the control system requirements. The problem is
combinatorial and cannot really be vectorized. Not only does this
control system algorithm not lend itself very well to vectorization, it
itself is used in a non-vectorized fashion. A great number of cases
with varying control system requirements are run in a non-linear
fashion in order to determine control system performance and robustness
at extrema.

Again, my intent on mentioning looping & arrays (where the
inefficiency seems to lie), was simply to inform the community.
Perhaps someone is currently working on some octave code that may find
the observations of use. Given how well octave does with most of the
common vector and matrix functions, I would propose that if folks are
looking at speed improvement strategies for the octave code, then there
is much gold to mine here with looping & arrays. More so, I would say
then further improvements to matrix & vector functions, since for the
most part they are on par with matlab, if not better.


----------------------------------------------------------------------
Michael W. Martin Phone: (281) 333-2177
Draper Laboratory FAX: (281) 333-5276
2200 Space Park Dr. EMail: ***@simba.jsc.draper.com
Houston, TX 77058 WWW: http://www.jsc.draper.com/
USA Mail Code: EG/Draper
----------------------------------------------------------------------



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
David Bateman
2004-03-09 17:27:33 UTC
Permalink
Post by Michael Martin
Post by David Bateman
1) Upgrade to 2.1.56 from 2.1.53 due to the bugs in 2.1.53 and some speed-up
I will do that if I get a chance. My installation is via fink, which
is generally clean and easy, if not always the latest ad greatest.
Since I am in a production mode, I would prefer not to live on the
bleeding edge. Fink is quite acceptable to me. Mac OSX has ATLAS & BLAS
built-in via its Frameworks, and my octave/fink does indeed use them. I
would say that my results show this given the similarity between my
numbers for the matrix operations and Paul Thomas' numbers. Though we
are running on different architectures, the same sorts of numbers are
seen in both or our octave and matlab results. fftw libs is also
required by fink, by the way.
Humm, as for bleeding-edge 2.1.53 is as bleeding as it gets. Since then
has mostly be bug fixes. Try

octave:1> a(:,:,1) = ones(2,2);
octave:2> a(:,:,2) = 2*ones(2,2)
a =

ans(:,:,1) =

1 1
1 1

ans(:,:,2) =

2 2
2 2


and see what your answer is. There are lots of other fixes, though.

octave as of 2.1.54 needs FFTW3 and not FFTW2. So the presence of FFTW
on your system is not a guarantee that it can be used.
Post by Michael Martin
I have experimented over the last several years with compiler
settings, compilers etc. I have run different level of both octave and
matlab, and on different platforms, etc. The results have shown pretty
much the same trends wrt looping. I doubt an upgrade will change the
results significantly.
Loop performance is related to JIT. Vectorize ALL loops... If you can
avoid the loop, the the code is a strong candidate for conversion to
an oct-file. In this manor I have virtually no performance difference
between matlab/octave.
Post by Michael Martin
Post by David Bateman
2) Please install the latest version of octave-forge where the sort and randn
codes are sped-up, which you are clearly missing.
Yes, I will try and do that. I am sure the discrepancy in the sort
times was due to different codes being run. Again, that is consistent
with Paul Thomas' numbers and mine. Ours match pretty closely EXCEPT
for the sort. Ergo different codes are likely the source of the
difference.
Taking the next two out of order....
Post by David Bateman
Vectorize code wherever possible!!!!! Octave doesn't have JIT which is
where Matlab wins on this case. The above is a null example, since the
core of the algorithm results in a null operation as
'tan(atan(exp(log(sqrt(a*a))))) = a', so a vectorized version is
a = 2001;
You have missed the point of the test. If the test had been to
calculate 2001 as quickly as possible, then I would certainly agree!
:-) However, the test was not to see how quickly one can calculate
2001, but to test the speed and numerical stability of ocatve for the
common transcendental functions. Since 2001 and is a nice
human-readable number, it is easy too subtract it from the result get a
feel for the error. The repeated use of the transcendental functions
and inverses will usually quickly show if there are any serious
problems with them.
I always run this test whenever I port code to a new platform or
whenever a platform has seen a significant OS or compiler update.
Usually I use my standard C program and Fortran program that I have had
for what must be close to 20 years now. However, when I am working with
a package such as octave or matlab, I will code it into the package's
langauge and test it there. In the past this test has revealed
significant problems with the code of transcendental functions on more
than one platform/compiler/package. All of my applications make
extensive use of transcendental functions and any significant errors in
them will lead to erroneous results. Call me paranoid :-), but since my
algorithms ends up in manned spacecraft, I would rather know **before**
flight that my codes have problems in them as opposed to **in** flight,
at which time it would invariably be too late. :-)
Now octave certainly passed the numerical stability test, so that is
not an issue. However, the test did reveal that for whatever reasons,
this particular piece of code does not run as fast in octave as opposed
to matlab. My intent on this test was simply to inform the community.
Perhaps someone is currently working on some octave code that may find
the data of use.
I didn't miss the point as such..... We all know that Matlab is faster in
loops by many orders of magnitude. However, there are no reason to use loops
in virtually all cases.

As a test of numerical stability, sure your test is fine. But its not a
benchmark and this type of abuse of loops should be avoided at all costs
(even in Matlab)
Post by Michael Martin
Post by David Bateman
4) For the 100x case please vectorize your code....
Regrettably, this is not possible in the particular case I
mentioned. Not all codes can be vectorized as much as one might like. There
a great number of different reasons for this. Let me touch upon just two
of them; namely algorithmic similarity, and heuristic type codes.
In the case of algorithmic similarity, if one has a code that
produces a result independently of other systems, then one does indeed have
great flexibility in coding style, etc. However, if one has a code or
algorithm that one is testing and that will be used in the "real
world", then it is best to keep the code as reasonably close to what
will run in the "real world". In other words, if the goal of the
analysis is to ensure the algorithm will behave properly under a wide
range of input scenarios in the real world, then the algorithm should
be kept as close as possible to the actual algorithm as it will be used
in the real world. Only then can one have confidence that the algorithm
will behave the same in the real world as it is behaving in the
laboratory.
In the case of heuristic type codes, at least the ones I have, there
are no linear/matrix formulations of the problem. Yes, vectors are
involved. Vectors are associated with the control system effectors and
vector/matrix calculations are taking place, so at that level the code
can be vectorized. However, these vectors are searched for suitability
to satisfy the control system requirements. The problem is
combinatorial and cannot really be vectorized. Not only does this
control system algorithm not lend itself very well to vectorization, it
itself is used in a non-vectorized fashion. A great number of cases
with varying control system requirements are run in a non-linear
fashion in order to determine control system performance and robustness
at extrema.
Again, my intent on mentioning looping & arrays (where the
inefficiency seems to lie), was simply to inform the community.
Perhaps someone is currently working on some octave code that may find
the observations of use. Given how well octave does with most of the
common vector and matrix functions, I would propose that if folks are
looking at speed improvement strategies for the octave code, then there
is much gold to mine here with looping & arrays. More so, I would say
then further improvements to matrix & vector functions, since for the
most part they are on par with matlab, if not better.
Yeah, I have some viterbi codes that also couldn't be vectorized. In this
case, they should be implemented as oct-files....

The improvement in loops can only come from a complete rewrite of the octave
parser, trying to get it to predict the types of arguments early and spit
out byte-code for stuff it runs several times. Much like what matlab does.
However, the amount of work involved in this is overwhelming and you won't
see anything soon on this front.

Regards
David
--
David Bateman ***@motorola.com
Motorola CRM +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Henry F. Mollet
2004-03-09 22:50:47 UTC
Permalink
Post by David Bateman
Vectorize code wherever possible!!!!! Octave doesn't have JIT which is
where Matlab wins on this case.
Can someone please tell me what JIT is, and explain why Octave does not
have/could not do JIT, if Octave has/does almost everything else that Matlab
has/does?
Henry



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Kienzle
2004-03-10 00:48:14 UTC
Permalink
Post by Henry F. Mollet
Post by David Bateman
Vectorize code wherever possible!!!!! Octave doesn't have JIT which is
where Matlab wins on this case.
Can someone please tell me what JIT is, and explain why Octave does not
have/could not do JIT, if Octave has/does almost everything else that Matlab
has/does?
Just In Time compiler.

For best effect, you want to turn:

function total=sum(x)
total=0; for i=1:n, total+=x(i); end

into something like:

double total = 0;
int i; for (i=0; i < n; i++) total += x[i];
return total;

If that looks easy, consider that x may be real or complex or
a sparse matrix or a galois field or a indefinite precision
number, ... And consider that the size of x may be less than
n. And consider that x may be a function (matlab syntax does
not distinguish between functions and array indexing), which
returns a different value type for each i. You don't know until
you call the function what the type of x is.

The function case is particularly tricky since there are no
guarantees. For example, I can override the builtin sin
function so that it returns strings. Or the function that I'm
calling could replace a builtin function during the loop.

Yes you can build in checks to make sure the results are
as expected. If you are not careful, though, you will end
up rebuilding the octave interpreter, and it won't be any
faster.

We could probably hack in some simple cases which
would make the benchmarks look good. Some of them
would even be practical. I'm sure if you poke at the matlab
interpreter enough you will find places where their
optimization doesn't result in any speed up. Try for example
putting an eval statement in the loop (after eval, all bets
are off).

Paul Kienzle
***@users.sf.net



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Thomas
2004-03-10 06:12:09 UTC
Permalink
And for those of you that live in North America, here is an opportunity
that I found, whilst researching JIT a couple of weeks back:

Work on the Just In Time (JIT) Compiler Team to improve MATLAB performance.

JIT Compiler Team Member

10/14/2003

Natick, MA

The JIT (Just In Time compiler) team is a small group within The
MathWorks charged with improving the performance of MATLAB to near
FORTRAN / C levels. Because MATLAB is a typeless, interactive language,
this effort involves technical challenges rarely seen in other
languages. We are looking for another team member who is experienced,
creative, can work effectively independently as well as in a group, and
is willing to expand the state of the art while delivering value to tens
of thousands of customers.

For the reasons that Paul describes below, "technical challenge" is an
understatement. I prefer the Japanese phrase - chotto muri desu-ka >
"it's slightly impossible"

Paul T
Post by Paul Kienzle
If that looks easy, consider that x may be real or complex or
a sparse matrix or a galois field or a indefinite precision
number, ... And consider that the size of x may be less than
n. And consider that x may be a function (matlab syntax does
not distinguish between functions and array indexing), which
returns a different value type for each i. You don't know until
you call the function what the type of x is.
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------
Paul Thomas
2004-03-12 17:07:56 UTC
Permalink
I have been asking around about destructors and octave classes. I
couldn't get a satisafactory answer, so here are the questions for the
list:

(i) Why do the derived classes at the bottom of the inheritance tree,
like Matrix or ColumnVector, not have destructors?

(ii) The following compiles and doesn't obviously cause any runtime
problems:

Matrix x( irow, icol );
......
delete [] &x ( 0 , 0 );

Is this accomplishing deallocation of the space allocated to x, or does
it cause memory leakage?

If these seem trivial to you, please excuse me, I am still something of
a C++ tyro.


Paul T



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Continue reading on narkive:
Loading...