Ok, so many of us tried the Alchemy opcodes in AS3 projects years ago. But how many of us kept using them every day after that? Not me. Why? Well, where do we start…
What is it good for?
Adobe introduced these opcodes to implement the memory access operations LLVM required, so that Alchemy could do its thing and Flash Player could run C++ code. But for AS3 people the opcodes did nothing magical. They were just another way to read numbers from, or write them to, one ByteArray at a time – one that happened to be faster than IDataInput/IDataOutput. So if you were doing some number crunching, they might save you a few milliseconds.
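To make the comparison concrete, here is a minimal sketch of the two styles, assuming a toolchain that exposes the opcodes as intrinsics (ASC2 does this via the avm2.intrinsics.memory package; azoth and apparat use their own stand-ins). The ByteArray setup is boilerplate you need either way:

```actionscript
import flash.utils.ByteArray;
import flash.utils.Endian;
import flash.system.ApplicationDomain;
import avm2.intrinsics.memory.li32;
import avm2.intrinsics.memory.si32;

var ba:ByteArray = new ByteArray();
ba.length = 1024; // domainMemory requires a minimum backing length
ba.endian = Endian.LITTLE_ENDIAN;
ApplicationDomain.currentDomain.domainMemory = ba;

// The "classic" way: seek, then read/write through IDataInput/IDataOutput
ba.position = 0;
ba.writeInt(42);
ba.position = 0;
var a:int = ba.readInt();

// The opcode way: direct load/store at a byte address, no position bookkeeping
si32(a + 1, 0);      // store an int at byte offset 0
var b:int = li32(0); // load it back
```

The opcode calls compile down to single AVM2 instructions, which is where the speed difference comes from.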
Placing your class fields in domain memory
In C++ you have pointers, but in AS3 you don't, so placing your class data there is going to be ugly. Still, if you could access those fields normally (via dot syntax) and faster at the same time, why not try it? So I did: using the legacy compiler and azoth, I ran a simple test and… got a domain-memory-based loop running over 50 times slower than plain AS3. WTF?? It turns out domain memory is only fast if you set your ByteArray to LITTLE_ENDIAN byte order (the default is the opposite). With LITTLE_ENDIAN the memory opcodes were about 10 times faster, but that was still not enough to bridge the gap caused by having to go through getters and setters instead of plain variable fields.
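The "ugly" part looks roughly like this hypothetical sketch (the Particle class, its layout, and the addr field are mine, not from any library): every field access becomes a getter or setter that does address arithmetic, and each of those is a method call unless the compiler inlines it.

```actionscript
import avm2.intrinsics.memory.lf64;
import avm2.intrinsics.memory.sf64;

// A pretend "struct" whose fields live in domain memory.
// Layout: two doubles per instance, 16 bytes total.
public final class Particle {
    private var addr:int; // byte offset of this instance's data

    public function Particle(addr:int) { this.addr = addr; }

    public function get x():Number     { return lf64(addr); }
    public function set x(v:Number):void { sf64(v, addr); }

    public function get y():Number     { return lf64(addr + 8); }
    public function set y(v:Number):void { sf64(v, addr + 8); }
}
```

With the legacy compiler these accessors are real method calls, and their overhead is exactly what ate the opcode speedup in the test above.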
Enter ASC2
ASC2 has two features that make it interesting here. First, it can inline method calls. Second, it can emit the memory opcodes without 3rd-party tools. So, after people reported that this was indeed working for my case, I opened my FB 4.7 trial and went to try it myself. I was confused quite a bit by the fact that none of the memory opcode intrinsics are present in playerglobal.swc, but it turns out you can ignore that – the code compiles and runs anyway. This time everything went better than expected – I finally had the domain-memory-based loop running 30 to 50% faster, although timings fluctuated a lot. Update: I bumped the array lengths to get better numbers on desktop; for MBP/Chrome/11.6 they averaged 220 ms vs 110 ms, for Windows/standalone/11.6 – 250 ms vs 190 ms. Finally, for iPad3/ipa-test/AIR 3.7 they were 200 ms vs 275 ms (the memory opcodes loop was slower).
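For reference, the benchmark loops had roughly this shape (a sketch, not the exact test code – N and the setup are illustrative, and to get ASC2 to actually inline accessors you compile with its -inline option and keep the methods final):

```actionscript
import flash.utils.getTimer;
import avm2.intrinsics.memory.li32;

const N:int = 10000000;

// Plain AS3 baseline: summing ints out of a fixed-length Vector
var plain:Vector.<int> = new Vector.<int>(N, true);
var t0:int = getTimer();
var sum:int = 0;
for (var i:int = 0; i < N; i++) sum += plain[i];
trace("Vector loop:", getTimer() - t0, "ms");

// Domain memory version: domainMemory is assumed to be set up
// beforehand with N * 4 bytes of little-endian int data
t0 = getTimer();
sum = 0;
for (i = 0; i < N; i++) sum += li32(i << 2); // address = index * 4
trace("li32 loop:", getTimer() - t0, "ms");
```

The iPad result above is a reminder that the relative cost of these instructions varies per platform, so a loop like this is worth timing on every target you ship to.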
People on Twitter were quick to remind me that apparat has had a similar feature for quite some time. It is probably the best way to use the memory opcodes if you cannot use ASC2. I wanted to include an apparat-based test in this post too, but it turns out I don't have it installed here (nor do I have Scala). It's not that hard to install, but, lazy as I am, that is not something I'm willing to do for the sake of this blog post. So you will have to trust that it works, because clever people made it.