Array speeds

Previous topic - Next topic

fuzzy70

While reading an old programming book (2003 so not uber old) I got onto a section about memory access, cache access etc & found something I thought was interesting so decided to test it in GLB. The book is about game coding & the code is in C++ but it's the concepts I was reading it for. Most will prob not find it interesting or any use for it but I found it of interest  :whistle:

Basically (no pun intended) it was saying about how accessing arrays in the wrong order can have an effect on speed & cache hits. His results had more dramatic differences then mine but that was probably down the the cpu's of the time & their respective cache sizes, however I did witness noticeable differences with my translated version of the code.

Code (glbasic) Select

LOCAL n%,n2%, i%, j%, k%, TimeStart, TimeEnd, TimeTaken, DummyVar
n=250
n2=10000
DIM TestData3d[n][n][n]




PRINT "Writing Data to the 3d array",0,0

// Fill 3d array in Column order
TimeStart = GETTIMERALL()
FOR k=0 TO n-1
FOR j=0 TO n-1
FOR i=0 TO n-1
TestData3d[i][j][k]=0
NEXT
NEXT
NEXT
TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+TimeTaken,0,10

// Fill 3d array in Row order
TimeStart = GETTIMERALL()
FOR i=0 TO n-1
FOR j=0 TO n-1
FOR k=0 TO n-1
TestData3d[i][j][k]=0
NEXT
NEXT
NEXT
TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+TimeTaken,0,20

SHOWSCREEN

MOUSEWAIT

PRINT "Reading Data from the 3d array",0,0

// Read 3d array in Column order
TimeStart = GETTIMERALL()
FOR k=0 TO n-1
FOR j=0 TO n-1
FOR i=0 TO n-1
DummyVar=TestData3d[i][j][k]
NEXT
NEXT
NEXT
TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+TimeTaken,0,10

// Read 3d array in Row order
TimeStart = GETTIMERALL()
FOR i=0 TO n-1
FOR j=0 TO n-1
FOR k=0 TO n-1
DummyVar=TestData3d[i][j][k]
NEXT
NEXT
NEXT
TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+TimeTaken,0,20

SHOWSCREEN

MOUSEWAIT

// Clear 3d array from mem & create 2d one
DIM TestData3d[0][0][0]
DIM TestData2d[n2][n2]

PRINT "Writing Data to the 2d array",0,0

// Fill 2d array in Column order
TimeStart = GETTIMERALL()
FOR j=0 TO n2-1
FOR i=0 TO n2-1
TestData2d[i][j]=0
NEXT
NEXT

TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+TimeTaken,0,10

// Fill 2d array IN Row order
TimeStart = GETTIMERALL()
FOR i=0 TO n2-1
FOR j=0 TO n2-1
TestData2d[i][j]=0
NEXT
NEXT

TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+TimeTaken,0,20

SHOWSCREEN

MOUSEWAIT

PRINT "Reading Data from the 2d array",0,0

// Read 2d array in Column order
TimeStart = GETTIMERALL()
FOR j=0 TO n2-1
FOR i=0 TO n2-1
DummyVar=TestData2d[i][j]
NEXT
NEXT

TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+TimeTaken,0,10

// Read 2d array in Row order
TimeStart = GETTIMERALL()
FOR i=0 TO n2-1
FOR j=0 TO n2-1
DummyVar=TestData2d[i][j]
NEXT
NEXT

TimeEnd = GETTIMERALL()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+TimeTaken,0,20

SHOWSCREEN

MOUSEWAIT


BTW originally they where separate tests I wrote but then lumped them into one prog & the results where still the same, also trying ints rather than floats produced the same results.

On my system filling the 3d array was on average twice as fast in column order then row order, while reading there was little in it but row order was slightly quicker. The difference does scale with the the size of the array for example the 2d fill with 10000x10000 elements was 3.5 times quicker in column order whereas with 5000x5000 elements around 2 times quicker.

How different the results would be on other platforms like iPhone etc I do not know as not having any of the other platforms I'm unable to test.

If you are filling any large arrays in your project during runtime (except at the initial start) it might be an idea to run this test on the device to see what works quicker if at all. Who knows you may gain a few millisecs that could be used elsewhere or give a boost to updates  :D

Lee
"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)

hardyx

#1
Quote from: fuzzy70 on 2012-Feb-26
On my system filling the 3d array was on average twice as fast in column order then row order, while reading there was little in it but row order was slightly quicker.
Sometimes the C compiler optimizes the loops with pointers. In many languages row data are contiguous in memory. When you use row order you are accesing the addresses: d, d+1, d+2, d+3, etc.  When you access using column order you are accessing the addresses: d, d+rowsize*1, d+rowsize*2, d+rowsize*3, etc. An addition is faster than a multiplication.

I think GLBasic makes the arrays putting first the external indexes and the last index is the most internal, like other languages. But now I don't know with the results of your test.

fuzzy70

This was his original code

Code (glbasic) Select

const int g_n = 250;
float TestData[g_n][g_n][g_n];

inline void column_ordered()
{
   for (int k=0; k<g_n; k++)
      for (int j=0; j<g_n; j++)
         for (int i=0; i<g_n; i++)
            TestData[i][j][k] = 0.0f;
}

inline void row_ordered()
{
   for (int i=0; i<g_n; i++)
      for (int j=0; j<g_n; j++)
         for (int k=0; k<g_n; k++)
            TestData[i][j][k] = 0.0f;
}



& his results

Column Ordered=2817 ms   Row Ordered=298 ms

There is no mention of the cpu used but prob P3 or an early P4, the compiler was Visual C++ .net.

His findings are like you said
QuoteIn many languages row data are contiguous in memory
but as you can see mine were the opposite & column order was faster. How to convert the C++ into something I can compile & run in GCC or Visual C++ express that I have is beyond me at current as do not have that much knowledge of C++, just enough to pick some parts out  :D

Lee

"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)

hardyx

#3
Check this code, you can compile in GCC or Visual C++ like console application.

Code (glbasic) Select
#include <stdio.h>
#include <time.h>

// put ordering functions here
//.....

inline clock_t GetTime() {
  return clock();
}

int main()
{
  clock_t timeinit;

  timeinit = GetTime();
  column_ordered();
  clock_t coltime = GetTime() - timeinit;

  timeinit = GetTime();
  row_ordered()
  clock_t rowtime = GetTime() - timeinit;

  printf("Columns time: %.3f tics\n", (double)coltime);
  printf("Rows time: %.3f tics\n", (double)rowtime);

  return 0;
}


P.D. I fixed the data definitions

fuzzy70

Thank you hardyx for that, very much appreciated as I would not been able to add the bits you posted without some serious headaches, like I said my C++ is far from great but I managed to get it to work by looking at the compile errors & hit & miss additions  :D

This was the code I ended up with

Code (glbasic) Select

#include <stdio.h>
#include <time.h>

// put ordering functions here
//.....
const int g_n = 250;
float TestData[g_n][g_n][g_n];

inline void column_ordered()
{
   for (int k=0; k<g_n; k++)
      for (int j=0; j<g_n; j++)
         for (int i=0; i<g_n; i++)
            TestData[i][j][k] = 0.0f;
}

inline void row_ordered()
{
   for (int i=0; i<g_n; i++)
      for (int j=0; j<g_n; j++)
         for (int k=0; k<g_n; k++)
            TestData[i][j][k] = 0.0f;
}


inline clock_t GetTime() {
  return clock();
}

int main()
{
  float coltime;
  float rowtime;
  clock_t timeinit;

  timeinit = GetTime();
  column_ordered();
  coltime = GetTime() - timeinit;

  timeinit = GetTime();
  row_ordered();
//  clock_t
  rowtime = GetTime() - timeinit;

  printf("Columns time: %.3f tics\n", (double)coltime);
  printf("Rows time: %.3f tics\n", (double)rowtime);

  return 0;
}


I presume I done right as it compiles in GCC & executes,  also I take it that 1 Tic = 1ms?

The results where like the ones from the book in that rows where quicker than columns

Columns time: 378.000 tics
Rows time: 40.000 tics

GLB results
Columns time: 157.785
Row Time: 374.704

As you can see quite a different ratio between columns vs rows.
"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)

dreamerman

Interesting topic, so just for comparison same code in FreeBasic:

Code (glbasic) Select
dim as double TimeStart, TimeEnd, TimeTaken, DummyVar
dim as integer n,n2, i, j, k
n=250
n2=10000
reDIM as integer TestData3d(n,n,n)

cls
sleep 1000
PRINT "Writing Data to the 3d array"

'' Fill 3d array in Column order
TimeStart = timer()
FOR k=0 TO n-1
FOR j=0 TO n-1
FOR i=0 TO n-1
TestData3d(i,j,k)=0
NEXT
NEXT
NEXT
TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+str(TimeTaken)

'' Fill 3d array in Row order
TimeStart = timer()
FOR i=0 TO n-1
FOR j=0 TO n-1
FOR k=0 TO n-1
TestData3d(i,j,k)=0
NEXT
NEXT
NEXT
TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+str(TimeTaken)

sleep 1000
PRINT "Reading Data from the 3d array"

'' Read 3d array in Column order
TimeStart = timer()
FOR k=0 TO n-1
FOR j=0 TO n-1
FOR i=0 TO n-1
DummyVar=TestData3d(i,j,k)
NEXT
NEXT
NEXT
TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+str(TimeTaken)

'' Read 3d array in Row order
TimeStart = timer()
FOR i=0 TO n-1
FOR j=0 TO n-1
FOR k=0 TO n-1
DummyVar=TestData3d(i,j,k)
NEXT
NEXT
NEXT
TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+str(TimeTaken)

'' Clear 3d array from mem & create 2d one
redim TestData3d(0,0,0)
reDIM as integer TestData2d(n2,n2)

sleep 1000
PRINT "Writing Data to the 2d array"

'' Fill 2d array in Column order
TimeStart = timer()
FOR j=0 TO n2-1
FOR i=0 TO n2-1
TestData2d(i,j)=0
NEXT
NEXT

TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+str(TimeTaken)

'' Fill 2d array IN Row order
TimeStart = timer()
FOR i=0 TO n2-1
FOR j=0 TO n2-1
TestData2d(i,j)=0
NEXT
NEXT

TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+str(TimeTaken)

sleep 1000
PRINT "Reading Data from the 2d array"

'' Read 2d array in Column order
TimeStart = timer()
FOR j=0 TO n2-1
FOR i=0 TO n2-1
DummyVar=TestData2d(i,j)
NEXT
NEXT

TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Column order took = "+str(TimeTaken)

'' Read 2d array in Row order
TimeStart = timer()
FOR i=0 TO n2-1
FOR j=0 TO n2-1
DummyVar=TestData2d(i,j)
NEXT
NEXT

TimeEnd = timer()
TimeTaken = TimeEnd - TimeStart
PRINT "Row order took = "+str(TimeTaken)

sleep


And my results on P E2180 (repeated several times):
Code (glbasic) Select
GLB FB
writing to 3d array
column order: 0.170 sec - 0.703 sec
row order: 0.699 sec - 0.089 sec
reading from 3d array
colum order: 0.137 sec - 0.65 sec
row order: 0.120 sec - 0.086 sec
writing to 2d array
column order: 0.986 sec - 8.092 sec
row order: 6.791 sec - 0.808 sec
reading from 2d array
column order: 0.656 sec - 5.20 sec
row order: 0.684 sec - 0.455 sec


So basically in GLB is better to use column order :)
Check my source code editor for GLBasic - link Update: 20.04.2020

hardyx

#6
In the C code the rows ordering is faster, because what I said about the contiguous memory. Is the expected result. Tics duration depends on the sytem, you can use 1/CLOCKS_PER_SEC constant to convert to seconds.

GLBasic results are different because arrays in GLB are dynamic structures (I think) and they may be ordered in memory in a different form than normal C arrays.

fuzzy70

Quote from: Ocean on 2012-Feb-28
I don't get it, why do you call it 'GLB Results', where everything you do in that last snippet is straight C?

Ocean

The GLB Results where from the GLB program at the start of the thread, I mentioned the speed difference but not the timings. my bad

Lee
"Why don't you just make ten louder and make ten be the top number and make that a little louder?"
- "These go to eleven."

This Is Spinal Tap (1984)