With a 32bpp base set, the Draw function is about 15-20% faster (with an 8bpp base set it is slower). Overall, with a 32bpp base set, run time is about 5% faster.
With a 32bpp base set, the Draw function is about 40% faster than 32bpp-optimized; with 8bpp base sets, it is about 10% faster. That corresponds to about 8% and 1% of total run time, respectively.