Skein, Threefish, and Mono.Simd — Part 1
Thursday, October 1st, 2009
Mono.Simd is a developing framework introducing SIMD intrinsics into the .NET/Mono framework. Ever since Miguel introduced the world to Mono.Simd, I have been very interested in getting my hands dirty with the new API.
SIMD (Single Instruction, Multiple Data) instructions perform the same or a related operation on several data elements, packed together as a vector, at once. For example, a single instruction can add four integers to four other integers pairwise, as opposed to summing each pair in sequence. On x86 processors, the SSE family of instructions operates on dedicated 128-bit XMM registers. Examples of SIMD instruction sets include the SSE family and AMD’s 3DNow!. SIMD instructions are most effective when processing large data sets in which the same sequence of operations is repeated across the data, as in graphical rendering and some physics simulations.
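To make the four-integer example concrete, here is a minimal sketch that contrasts a scalar loop with a single vector addition. It assumes the Mono.Simd assembly is referenced (for example, compiling with -r:Mono.Simd.dll) and uses its Vector4i type; the class and method names are made up for illustration.

```csharp
using System;
using Mono.Simd; // assumes a reference to Mono.Simd.dll

// Hypothetical illustration class; not part of Mono.Simd or Skein.
static class SimdAddExample
{
    // Scalar version: four separate additions, one pair at a time.
    public static int[] AddScalar(int[] a, int[] b)
    {
        int[] sum = new int[4];
        for (int i = 0; i < 4; i++)
            sum[i] = a[i] + b[i];
        return sum;
    }

    // Vector version: one operator call adds all four lanes.
    // On a runtime that accelerates Mono.Simd, this is expected to
    // map to a single SSE2 packed add.
    public static Vector4i AddSimd(Vector4i a, Vector4i b)
    {
        return a + b;
    }
}
```

Calling AddSimd(new Vector4i(1, 2, 3, 4), new Vector4i(10, 20, 30, 40)) yields a vector whose X, Y, Z, and W lanes hold the same four sums that AddScalar computes element by element.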
Unfortunately for me, I am mainly a front-end applications developer by trade. The opportunity to work with low-level intrinsic functions hadn’t come around in the nine months since their announcement, so I deliberately set out to find a little side project in which I could test out the new instructions.
There is very little out there on the Internet regarding the Mono.Simd API except for Miguel’s blog post, the Mono source code, and the Monodoc. With that in mind, I found a promising candidate for implementing with Mono.Simd in one of the SHA-3 hash algorithm candidates, specifically Skein. The project’s homepage noted that there were already two implementations of the algorithm in .NET, which gave me a good basis to start from.
I chose the in-progress implementation of Skein by Alberto Fajardo as the basis for my work, since it looked pretty clean and, with a few exceptions, fit the BCL conventions for System.Security.Cryptography reasonably well. Fajardo’s implementation completely unrolls the Threefish cipher rounds, so there is no loop overhead in a function that is itself likely to be called from a tight loop. I chose Threefish256 to implement using Mono.Simd due to its relative simplicity. My preliminary tests[1] with Threefish256 showed an average speed of 9.2 ns per encrypt operation.
An example of the first subkey permutation and four rounds of Skein 1.2:
Mix(ref b0, ref b1, 14, k0, k1 + t0);
Mix(ref b2, ref b3, 16, k2 + t1, k3);
Mix(ref b0, ref b3, 52);
Mix(ref b2, ref b1, 57);
Mix(ref b0, ref b1, 23);
Mix(ref b2, ref b3, 40);
Mix(ref b0, ref b3, 5);
Mix(ref b2, ref b1, 37);
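For readers who have not seen the Threefish MIX step, the following is a rough sketch of what the two Mix overloads called above might look like. It follows the MIX definition from the Skein specification (add the two words, rotate the second left, XOR it with the sum). The class name, the RotateLeft64 helper, and the way the five-argument overload adds the key words before mixing are my assumptions for illustration, not necessarily Fajardo’s actual code.

```csharp
using System;

// Hypothetical sketch of the Threefish MIX step; names are illustrative.
static class ThreefishMixSketch
{
    // Rotate a 64-bit word left by r bits (intended for 0 < r < 64).
    static ulong RotateLeft64(ulong v, int r)
    {
        return (v << r) | (v >> (64 - r));
    }

    // Plain MIX: a = a + b (mod 2^64), then b = rotl(b, r) XOR a.
    public static void Mix(ref ulong a, ref ulong b, int r)
    {
        unchecked
        {
            a += b;
            b = RotateLeft64(b, r) ^ a;
        }
    }

    // MIX with subkey words folded in (assumed form): k0 is added to a
    // and k1 to b before the same add, rotate, and XOR sequence.
    public static void Mix(ref ulong a, ref ulong b, int r, ulong k0, ulong k1)
    {
        unchecked
        {
            b += k1;
            a += b + k0;
            b = RotateLeft64(b, r) ^ a;
        }
    }
}
```

Each pair of Mix calls in the listing corresponds to one Threefish256 round over the four state words, with the word permutation expressed by swapping which variables are passed to the next pair of calls.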
Based on this, my working hypothesis is that the SIMD version of the function will run in about 55–75% of the time of the non-SIMD version, or roughly 5–7 ns per encrypt operation. This series will continue in my next post, where I’ll discuss some of the progress I’ve made and the issues I’ve encountered.
[1] Tests were performed on an EeePC 900A, with a 32-bit Intel Atom x86 processor and the Mono 2.4 framework. Tests were run for 100,000,000 iterations, unrolled in sets of 100, and averaged to get the stated results.
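The timing approach described in the footnote can be sketched roughly as follows. This is not the harness I actually used; the class name, the delegate-based interface, and the batch size of 10 (standing in for the sets of 100) are assumptions made to keep the example short. Note that invoking a delegate adds overhead of its own, which matters at the nanosecond scale.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; a sketch of the footnote's methodology.
static class EncryptBenchmark
{
    const int Batch = 10; // stand-in for the unrolled sets of 100

    // Returns the average time per call in nanoseconds.
    // 'iterations' is rounded down to a whole number of batches.
    public static double AverageNanoseconds(Action operation, long iterations)
    {
        long batches = iterations / Batch;
        if (batches < 1)
            throw new ArgumentOutOfRangeException("iterations");

        operation(); // warm-up call so JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (long i = 0; i < batches; i++)
        {
            // Unrolled batch: fewer loop-counter checks per timed call.
            operation(); operation(); operation(); operation(); operation();
            operation(); operation(); operation(); operation(); operation();
        }
        watch.Stop();

        long calls = batches * Batch;
        return watch.Elapsed.TotalMilliseconds * 1000000.0 / calls;
    }
}
```

With a hypothetical cipher object, a call such as EncryptBenchmark.AverageNanoseconds(() => cipher.Encrypt(input, output), 100000000) would give a per-operation figure comparable in spirit to the 9.2 ns quoted above.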







