May I suggest a speed-up for x_sprite?
Rather than calculating the view angle for every billboard (X_Sprite) individually, would it be possible to have an option to use one global angle for every billboard in a scene? I realise this can cause some slight visual quirks, but since each x_sprite needs its own view-to-camera angle calculation, this gets expensive when using thousands of billboards (particles).
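To illustrate the idea, here is a rough sketch in plain C. It assumes an OpenGL-style column-major view matrix with no scaling, and all the names (`Vec3`, `BillboardBasis`, `basis_from_view`, `build_quads`) are made up for this example; none of them are part of x_sprite or the engine's API. The camera's right and up axes are read once per frame, and every particle quad is then built from those two shared vectors with no per-sprite angle calculation.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Camera axes shared by every billboard in the frame (hypothetical type). */
typedef struct { Vec3 right, up; } BillboardBasis;

/* Read the camera's right and up axes from a column-major 4x4 view matrix.
   Assumption: the matrix is rotation + translation only, so the rows of its
   upper-left 3x3 block are the camera axes expressed in world space. */
static BillboardBasis basis_from_view(const float m[16])
{
    BillboardBasis b;
    b.right.x = m[0]; b.right.y = m[4]; b.right.z = m[8];
    b.up.x    = m[1]; b.up.y    = m[5]; b.up.z    = m[9];
    return b;
}

/* Build one camera-facing quad (4 corners) per particle using the shared
   basis. 'out' must have room for 4 * count vertices. Corner order:
   bottom-left, bottom-right, top-right, top-left. */
static void build_quads(const BillboardBasis *b, const Vec3 *centers,
                        const float *half_sizes, size_t count, Vec3 *out)
{
    for (size_t i = 0; i < count; ++i) {
        const float s = half_sizes[i];
        const Vec3 c = centers[i];
        /* Scale the shared axes by this particle's half size. */
        const Vec3 r = { b->right.x * s, b->right.y * s, b->right.z * s };
        const Vec3 u = { b->up.x * s,    b->up.y * s,    b->up.z * s };

        out[4 * i + 0] = (Vec3){ c.x - r.x - u.x, c.y - r.y - u.y, c.z - r.z - u.z };
        out[4 * i + 1] = (Vec3){ c.x + r.x - u.x, c.y + r.y - u.y, c.z + r.z - u.z };
        out[4 * i + 2] = (Vec3){ c.x + r.x + u.x, c.y + r.y + u.y, c.z + r.z + u.z };
        out[4 * i + 3] = (Vec3){ c.x - r.x + u.x, c.y - r.y + u.y, c.z - r.z + u.z };
    }
}
```

With this approach all quads are parallel to the view plane rather than each one facing the camera position exactly, which is where the slight visual quirks come from, mostly on sprites near the edges of a wide field of view.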
My particle system is quite efficient, but I'm getting slowdowns at about 1000 particles... in C and straight OpenGL, I can hit 10000 particles...
I'll try to see what I can achieve.