Author Topic: FASTMEM2SPRITE  (Read 5083 times)

Offline Qedo

  • Dr. Type
  • ****
  • Posts: 373
  • to program what I have do how should programming?
    • View Profile
FASTMEM2SPRITE
« on: 2017-Nov-11 »
In my application MEM2SPRITE turned out to be rather slow, so I searched the GLBasic forum for a solution, without finding anything.
So I wrote FASTMEM2SPRITE using the OpenGL routines. The result is very fast: in some conditions, at high sprite resolutions, it is up to 6x faster on my computer. On Android the gain is even slightly higher. The syntax is the same, just with FAST prepended to the command name. The only difference is in drawing the sprite: because the OpenGL origin is at the bottom left, you have to use ZOOMSPRITE nsprite, 0, 0, 1, -1 (vertical mirror).
Tested on Windows and Android.
If you have a better solution, I will be happy to see it.
Use it freely and let me know.
Ciao

Offline dreamerman

  • Global Moderator
  • Dr. Type
  • *******
  • Posts: 443
    • View Profile
    • my personal website
Re: FASTMEM2SPRITE
« Reply #1 on: 2017-Nov-13 »
Very nice indeed, short and clean. I don't have a use for it at the moment, but with this speed it can surely be used for some in-game effects.
Another good thing is that you can move the CreateScreen call out of the function, which makes it about two times faster. That's good for effects on already created sprites/textures.
Check my source code editor for GLBasic - link Update: 20.04.2020

Offline Qedo

Re: FASTMEM2SPRITE
« Reply #2 on: 2017-Nov-25 »
Thank you dreamerman for trying FASTMEM2SPRITE.
I have run new, more precise benchmarks. These are the results for 50 cycles:

Win10:
 MEM2SPRITE 17000 millisec
 FASTMEM2SPRITE 500 millisec
 ratio 34

Android 6.0:
 MEM2SPRITE 27000 millisec
 FASTMEM2SPRITE 270 millisec
 ratio 100

Unexpectedly, FASTMEM2SPRITE is faster on Android than on Windows.
Ciao
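
Editor's sketch: the benchmark code itself is not shown in the thread, so the following is only an assumption about how such a "50 cycles" measurement and the ratio could be set up in plain C++. The names time_cycles_ms and speedup_ratio are hypothetical and not part of GLBasic.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `cycles` times and returns the elapsed
// wall-clock time in milliseconds.
double time_cycles_ms(const std::function<void()>& work, int cycles) {
    assert(cycles >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < cycles; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speed-up ratio as used in the results above: slow time divided by fast time.
double speedup_ratio(double slow_ms, double fast_ms) {
    assert(fast_ms > 0.0);
    return slow_ms / fast_ms;
}
```

With the figures quoted above, speedup_ratio(17000, 500) gives 34 and speedup_ratio(27000, 270) gives 100.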

Offline Kitty Hello

  • code monkey
  • Administrator
  • Prof. Inline
  • *******
  • Posts: 10832
  • here on my island the sea says 'hello'
    • View Profile
    • http://www.glbasic.com
Re: FASTMEM2SPRITE
« Reply #3 on: 2017-Nov-25 »
Oooh!  =D. What crap did I program, then!?

Offline r0ber7

  • Prof. Inline
  • *****
  • Posts: 552
    • View Profile
Re: FASTMEM2SPRITE
« Reply #4 on: 2018-Aug-01 »
Thanks, I have exactly the place to use this. :-)

Offline Qedo

Re: FASTMEM2SPRITE
« Reply #5 on: 2022-Aug-17 »
Hi all, after Fastmem2sprite, here at last is a solution for Fastsprite2mem as well.
The combination of the two functions gives the fastest results, almost 400 fps.
Unfortunately Fastsprite2mem (unlike Fastmem2sprite) works only with power-of-two (2^n) dimensions, and it does not work with OpenGL ES because the glGetTexImage function is missing  :rant:
I added an optional mirror parameter to both functions to flip the image, since OpenGL stores it upside down.
Try them and let me know. I think you will use them in your programs  =D
Ciao
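
Editor's sketch: the attached source is not reproduced in the thread, so this is not Qedo's code. It only illustrates what a vertical mirror of a 32-bit pixel buffer amounts to, assuming the pixels are stored row after row. The name flip_rows is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mirrors an image vertically in place.
// `pixels` holds width * height 32-bit pixels, stored row after row.
void flip_rows(std::vector<std::uint32_t>& pixels, std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    // Swap the first row with the last, the second with the second to last, ...
    for (std::size_t top = 0, bottom = height; top + 1 < bottom; ++top) {
        --bottom;
        std::swap_ranges(pixels.begin() + top * width,
                         pixels.begin() + (top + 1) * width,
                         pixels.begin() + bottom * width);
    }
}
```

For a 2x3 image {1,2, 3,4, 5,6} the call flip_rows(img, 2, 3) leaves {5,6, 3,4, 1,2}; the middle row of an odd-height image stays where it is.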

Offline dreamerman

Re: FASTMEM2SPRITE
« Reply #6 on: 2022-Aug-20 »
Great work, thanks for sharing  :good:  This is something that could be used to create interesting 2D effects alongside shaders.
I'm using your FastMem2Sprite (slightly modified) in my Steam API wrapper to copy users' avatar pictures to a sprite, which is why I had some questions about the code responsible for flipping the image. One of the lines in your code is:
Code: (glbasic) [Select]
unsigned char* high = (unsigned  char*) &pixels((height -1) * width);
In C++ examples the last variable isn't 'width' but 'stride' -> 4*width, and I was wondering why it works here. Looking at the rest of the code reminded me that the function argument is a GLB int array, used without casting it to 'unsigned char/byte'. This raised another question: GLB ints are 4 bytes in 32-bit builds, but are they 4 or 8 bytes in 64-bit builds? After compiling for 64-bit Windows it looks like they are still 4 bytes, and the code works without problems. One more thing, I'm not sure if
Code: (glbasic) [Select]
typedef unsigned int size_t;
is really needed: it compiles without it in 32-bit, and in 64-bit it gives an error because size_t is already defined somewhere in the C++ libs. It would be good to have at least some check here.
One thing that gave me a headache (because I had forgotten about it) was the power-of-2 requirement for the texture size. I switched the texture for a 720p image and it wasn't working, and it took me a couple of minutes to figure out that the dimensions need to be powers of 2 :D But that's really not an issue, since for any tricks we can use a larger sprite/texture.
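
Editor's sketch: a minimal way to check the power-of-2 requirement and to pick the "larger sprite/texture" size mentioned above. Both helper names are hypothetical and not part of GLBasic or of the posted functions.

```cpp
#include <cassert>

// True when n is a power of two (1, 2, 4, 8, ...).
bool is_power_of_two(unsigned int n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two that is greater than or equal to n.
unsigned int next_power_of_two(unsigned int n) {
    assert(n > 0 && n <= 0x80000000u);
    unsigned int p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

For a 720p image (1280x720) neither dimension passes is_power_of_two, and next_power_of_two gives 2048 and 1024, so a 2048x1024 texture would be large enough to hold it.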

As for OpenGL ES, yes, unfortunately it lacks glGetTexImage, but a similar result could be achieved with glReadPixels, as in this code: https://stackoverflow.com/questions/53993820/opengl-es-2-0-android-c-glgetteximage-alternative
Did you check something like this? I don't have the Android toolset right now, so I can't play with it, but one of the most desired use cases would be simulating, on Android, some shaders that are not available in GLB there.

Ah, I almost forgot: here are the results of this benchmark on an old i5-4300U with an Intel HD 4400 iGPU:
FastS2M + FastM2S -> 115 cps
GLBS2M + GLBM2S -> 9.8 cps
FastS2M + GLBM2S -> 10.7 cps
GLBS2M + FastM2S -> 54.5 cps
With such speed it could be used for some real-time effects on smaller textures.

Offline Qedo

Re: FASTMEM2SPRITE
« Reply #7 on: 2022-Aug-21 »
Good points, dreamerman, even if there is something I don't understand.
I'll answer where I can.

A) Size of int:
   In both Win32 and Win64, int and unsigned int are 32 bits, see "Range of values": https://en.cppreference.com/w/c/language/arithmetic_types#Data_models
B) Win64:
   You are right, as it stands it does not compile. It is necessary to:
   1) delete typedef unsigned int size_t;
   2) change these 2 lines to:
        extern "C" void * memcpy (void * destination, const void * source, unsigned int num);
        extern "C" void * memset (void * ptr, int value, unsigned int num);

   or:

   delete typedef unsigned int size_t;

   Obviously size_t is already declared in both Win32 and Win64, but I don't understand why it causes problems only in Win64.

C) OpenGL ES:
   Using glReadPixels instead of glGetTexImage could be done, but glReadPixels is notoriously slow, so I didn't even consider it.
   On the internet, besides the example you gave me, the other examples are for OpenGL ES 2.0 at least, with commands incompatible with the version 1.1 used by GLBasic.

D) "unsigned char * high = (unsigned char *) & pixels ((height -1) * width);"
   Unfortunately I'm not much of a C++ expert, but "& pixels ((height -1) * width)" returns the address of the beginning of the last row of an integer array.
   Example: with a 4x2 matrix, "((height -1) * width)" is 4, which with the cast (unsigned char *) becomes a byte offset of 16. So "unsigned char * high" points to the byte at offset 16. At least that's how I reasoned, and it seems to work.
   
   The variable "stride", by contrast, is the length in bytes of one sprite row.

E) What I don't understand is:
   "in c ++ examples the last variable isn't 'width' but 'stride' -> 4 * width, and I was wondering why it's working here, then looking at rest of code reminded me that as function argument you are using glb int array without casting it to 'uint char / byte' "

Sorry if I answered in no particular order and a bit at random, but I hope it's clear.
Thank you
Ciao
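
Editor's sketch on points A) and B) above: C++ allows a typedef to be repeated only when it names the same type. On Win32 size_t is a 32-bit unsigned int, while on Win64 it is a 64-bit type, so that is a plausible reason why "typedef unsigned int size_t;" clashes only in the 64-bit build. The helper names below are hypothetical, and the code assumes a toolchain where the standard header <cstddef> is available.

```cpp
#include <cstddef>   // provides std::size_t, so no hand-written typedef is needed

// Compile-time check of the assumption in A): int is 32 bits wide here.
static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

// True when size_t and unsigned int have the same width (typical 32-bit builds).
// When the widths differ (typical 64-bit builds) they cannot be the same type,
// so a hand-written `typedef unsigned int size_t;` would conflict.
constexpr bool size_t_matches_unsigned_int() {
    return sizeof(std::size_t) == sizeof(unsigned int);
}

// Number of bytes occupied by `count` int elements.
constexpr std::size_t int_bytes(std::size_t count) {
    return count * sizeof(int);
}
```

int_bytes(4) evaluates to 16, which matches the 4x2 example in D): element index 4 of an int array sits at byte offset 16.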

Offline dreamerman

Re: FASTMEM2SPRITE
« Reply #8 on: 2022-Aug-21 »
Don't feel bad, sometimes I just overcomplicate things and write something confusing. :D

A) I just wasn't sure what GLB Integer types really are (I should look into the source :D). I remembered that (maybe in some other Basic language) some built-in types could have a different bit width, and that stuck in my brain :giveup:

B) Really not an issue, one preprocessor command will help with this.

C) Sadly, that's what I suspected :( Thanks for clarifying that.

D) + E) Yes, your explanation is right. My doubt arose because in the original C++ method that I saw and am using, the pixel array is also of unsigned char type, so one element is 1 byte (each ARGB component is a separate array element), while GLB ints are 4 bytes (and contain the whole ARGB color), so you don't need the multiplication in that formula to get the proper address. That's also why I worried about (A), but it's all clear now.
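
Editor's sketch of the point in D) + E): indexing a 32-bit pixel array with (height - 1) * width and indexing a byte array with (height - 1) * stride reach the same address. The function names are hypothetical, and std::uint32_t stands in for the 4-byte GLB int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of the last row when the buffer is indexed as 32-bit pixels:
// the element index is (height - 1) * width and each element is 4 bytes.
std::size_t last_row_offset_via_pixels(const std::uint32_t* pixels, std::size_t width, std::size_t height) {
    assert(height > 0);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(pixels);
    const unsigned char* high = reinterpret_cast<const unsigned char*>(&pixels[(height - 1) * width]);
    return static_cast<std::size_t>(high - base);
}

// Byte offset of the last row when the buffer is indexed as single bytes:
// here the row length must be given in bytes (stride = width * 4).
std::size_t last_row_offset_via_bytes(const unsigned char* bytes, std::size_t width, std::size_t height) {
    assert(height > 0);
    const std::size_t stride = width * 4;   // 4 bytes per pixel
    const unsigned char* high = &bytes[(height - 1) * stride];
    return static_cast<std::size_t>(high - bytes);
}
```

For the 4x2 example both functions return 16. Mixing the two conventions, that is, indexing a 4-byte int array with the byte stride, would land four times too far into the buffer.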

My modified version looks like this, fully inline:
Code: (glbasic) [Select]
INLINE
// Copies a width x height RGBA image (4 bytes per pixel) into the texture of
// sprite_id, starting at (posx, posy). The rows of 'pixels' are flipped in place.
// Returns the id of the texture that was updated.
int FastMem2Sprite_uint(unsigned char *pixels, int sprite_id, unsigned int posx, unsigned int posy, unsigned int width, unsigned int height) {
    int tx_id;

    tx_id = get_sprite_texture(sprite_id);
    if (width == 0 || height == 0) return tx_id;    // nothing to copy

    // code from: https://codereview.stackexchange.com/questions/29618/image-flip-algorithm-in-c
    // flip image
    const unsigned int stride = width*4;      // 4 bytes per pixel
    unsigned char* row = (unsigned  char*) malloc(stride);
    unsigned char* low = (unsigned  char*) &pixels[0];
    unsigned char* high = (unsigned  char*) &pixels[(height-1) * stride];
    for (; low < high; low += stride, high -= stride) {
        memcpy(row, low, stride);
        memcpy(low, high, stride);
        memcpy(high, row, stride);
    }
    free(row);
    // end flip image

    //const void* low1 = (unsigned  char*) &pixels(0);    // not needed
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    // bind this texture to gl context
    glBindTexture(GL_TEXTURE_2D, tx_id);
    // change the image pixel data for the bound image
    glTexSubImage2D(GL_TEXTURE_2D, 0, (int)posx, (int)posy, (int)width, (int)height, GL_RGBA, GL_UNSIGNED_BYTE, &pixels[0]);
    return tx_id;
}
ENDINLINE

Again both functions are great, and FM2S is really handy as it allows to paste one image into specified region of larger texture, so not only 1:1 conversion are possible.

Offline Qedo

Re: FASTMEM2SPRITE
« Reply #9 on: 2022-Aug-22 »
Your solution, which allows pasting a smaller image into a larger one, is really practical. Great  :booze:

But two things don't work on my system:
1) the square brackets in
   unsigned char * low = (unsigned char *) & pixels [0] ;
   unsigned char * high = (unsigned char *) & pixels [(height -1) * width];
 
  With them the program does not compile, giving the error: "no match for 'operator[]'"
 
2) using the variable "stride" crashes the program:
unsigned char * high = (unsigned char *) & pixels [(height-1) * stride];

Offline dreamerman

Re: FASTMEM2SPRITE
« Reply #10 on: 2022-Aug-22 »
Ah yes, it must be called from inline code so that the arguments have the proper types, but it should compile without problems: I pasted it into your project at the end of the *_OGL file and it compiles properly. I'm using it this way:
Code: (glbasic) [Select]
// put Steam Image from img_handle to GLB sprite with specified ID
// posy -> needs to be calculated to OpenGL notation - from left bottom corner upwards
FUNCTION putSteamImageOnSprite%: img_handle%, sprite_id%, posx%, posy%
    INLINE
        uint32 cwidth, cheight;

        SteamAPI_ISteamUtils_GetImageSize(si_steamutils, img_handle, &cwidth, &cheight);
        //printf("image size: %1d x %1d \n", cwidth, cheight);
        const int cSizeInBytes = cwidth * cheight * 4;
        uint8 *pavatarImage = new uint8[cSizeInBytes];
        SteamAPI_ISteamUtils_GetImageRGBA(si_steamutils, img_handle, pavatarImage, cSizeInBytes);
        FastMem2Sprite_uint(pavatarImage, sprite_id, posx, posy - cheight, cwidth, cheight);

        delete[] pavatarImage;
    ENDINLINE
ENDFUNCTION
The main difference from your code is the start x/y position at which the image is pasted, nothing more, and you can add that to the glTexSubImage2D call without other modifications. For me it's sometimes just easier to use inline code like this when working with Steamworks.
As this function is fast and user avatars are small, there are no visible slowdowns when they are fetched and pasted into a larger sprite.
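
Editor's sketch: the comment on posy above says it must be given in OpenGL notation, counted from the bottom-left corner upwards. Assuming, as described earlier in the thread, that row 0 of the texture is the bottom row, a top-left based position could be converted as follows. The helper name is hypothetical and is not part of dreamerman's wrapper.

```cpp
#include <cassert>

// Hypothetical helper: converts a top-left based y position into the
// bottom-left based y offset of a region of height `region_h` inside a
// texture of height `texture_h`.
int top_left_to_gl_y(int texture_h, int top_y, int region_h) {
    assert(top_y >= 0 && region_h >= 0 && top_y + region_h <= texture_h);
    return texture_h - (top_y + region_h);
}
```

For a 64-pixel-high avatar in a 256-pixel-high texture, a top-left y of 0 maps to a y offset of 192, and a top-left y of 192 maps to 0.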