Performance comparison of streamed encryption and compression
In the last post I discussed encrypting users' data files when they move around in a distributed system, such as Chipster. The big practical question was not answered: Will it be fast enough?
To answer that I ran some performance tests. The first question is the choice of cipher. When using Java, the obvious place to start is Java's default algorithms (offered by the SunJCE provider, which is bundled with J2SE). There are quite a few of them, but only two are really viable: RC4, which is very fast but not very secure, and the newer standard AES, which is also quite fast and is considered safe (until proven otherwise). The other algorithms are either less secure than AES, or slower than RC4 without a significant improvement in security. Blowfish could have been used instead of AES, but I went with AES because it is more popular. For more information on ciphers, see:
There is also the question of key size: 128 bits is not perfect, but larger key sizes might cause trouble because they are regulated by US export laws and are not available in all J2SE bundles. With RC4 the key size does not slow down encryption, but with AES it does. In these tests 128-bit keys were used, with the default padding and the default cipher mode. The future is hard to predict, but 128-bit keys should probably be safe for some decades if used correctly and carefully. For caching data on a file broker, that can be considered enough. For more information on key sizes, see:
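As a rough illustration of the setup described above, the sketch below generates a 128-bit key and wraps a stream so that data is encrypted as it is written. The class and method names are hypothetical and are not taken from Chipster. With the SunJCE provider, asking for plain "AES" selects ECB mode with PKCS5 padding; that this is what "default" meant in these tests is an assumption, and production code would normally name the mode explicitly.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical helper, not part of Chipster: on-the-fly encryption with a 128-bit key.
class StreamCipherSketch {

    // Generates a random 128-bit key for the named algorithm, e.g. "AES".
    static SecretKey newKey(String algorithm) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance(algorithm);
        generator.init(128);
        return generator.generateKey();
    }

    // Everything written to the returned stream is encrypted before it reaches the sink.
    static OutputStream encrypting(OutputStream sink, SecretKey key) throws GeneralSecurityException {
        // Only the algorithm name is given, so the provider picks its default mode and padding.
        Cipher cipher = Cipher.getInstance(key.getAlgorithm());
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return new CipherOutputStream(sink, cipher);
    }

    // Everything read from the returned stream is decrypted on the way in.
    static InputStream decrypting(InputStream source, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(key.getAlgorithm());
        cipher.init(Cipher.DECRYPT_MODE, key);
        return new CipherInputStream(source, cipher);
    }
}
```

Note that the encrypting stream has to be closed before the output is complete, because the final padded block is only written on close.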
With Java it is easy to encrypt data on the fly. It is also easy to add compression on top of that, so I tested on-the-fly compression too. For that I used Java's DeflaterOutputStream (the DEFLATE algorithm, which is also what GZIP uses). Compression was done in BEST_SPEED mode, which compresses practically as well as the stronger modes, but is a lot faster.
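Stacking the two streams could look roughly like the sketch below. It compresses first and encrypts second, so that the cipher only sees the already reduced data. The class name, the method names and the use of plain "AES" are assumptions made for illustration, not the original test code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;

// Hypothetical helper: compress with BEST_SPEED, then encrypt, all while streaming.
class CompressThenEncrypt {

    // Reads plain data, writes compressed and encrypted data to the sink, and closes the sink.
    static void store(InputStream plain, OutputStream sink, SecretKey key)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES"); // provider default mode and padding
        cipher.init(Cipher.ENCRYPT_MODE, key);
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        // Bytes flow: plain -> deflater -> cipher -> sink.
        try (OutputStream out = new DeflaterOutputStream(new CipherOutputStream(sink, cipher), deflater)) {
            plain.transferTo(out);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not released by the stream
        }
    }

    // Reverses store(): decrypts first, then inflates, and writes plain data to plainSink.
    static void load(InputStream stored, OutputStream plainSink, SecretKey key)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        try (InputStream in = new InflaterInputStream(new CipherInputStream(stored, cipher))) {
            in.transferTo(plainSink);
        }
    }
}
```

The order matters: encrypted data looks random and does not compress, so compressing after encryption would gain nothing.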
Before each measurement there was an extensive burn-in to make sure that the disk cache and CPU were "hot". The test iterated through 4 encryption options and 2 compression options (on/off), making a total of 8 parameter combinations. The encryption options were no encryption, RC4, SSL (SSL_RSA_WITH_RC4_128_MD5) and AES. SSL is not directly comparable, because it also decrypts and leaves plain files at the receiving end, but it was included as a reference point.
Each test result was averaged over 5 repeated runs. The tests were run in 3 different environments. The first was disk-to-disk, which tested the theoretical properties of the on-the-fly processing options. SSL was not tested in this case, because it would have required some special arrangements. The second was a fast network, in practice a 100 Mb/s LAN connection to the server. The third was a slow network, a home ADSL connection with slow upstream bandwidth.
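The measurement procedure described above (burn-in, 8 parameter combinations, average of 5 runs) can be sketched as a small harness. Everything here is hypothetical scaffolding rather than the original test code: the Transfer interface stands in for whatever actually moves the file, and the number of warm-up rounds is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical benchmark skeleton; the names are illustrative only.
class TransferBenchmarkSketch {

    enum Encryption { NONE, RC4, SSL, AES }

    // One cell of the test matrix: an encryption option with compression on or off.
    static final class Setup {
        final Encryption encryption;
        final boolean compressed;

        Setup(Encryption encryption, boolean compressed) {
            this.encryption = encryption;
            this.compressed = compressed;
        }
    }

    // Stand-in for one complete file transfer with a given setup.
    interface Transfer {
        void run() throws Exception;
    }

    // Enumerates all 4 x 2 = 8 parameter combinations.
    static List<Setup> allSetups() {
        List<Setup> setups = new ArrayList<>();
        for (Encryption encryption : Encryption.values()) {
            setups.add(new Setup(encryption, false));
            setups.add(new Setup(encryption, true));
        }
        return setups;
    }

    // Runs untimed warm-up rounds first, then returns the mean duration of the timed runs.
    static double averageMillis(Transfer transfer, int warmUps, int runs) throws Exception {
        for (int i = 0; i < warmUps; i++) {
            transfer.run(); // burn-in: results are discarded
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            transfer.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }
}
```

A call such as `averageMillis(transfer, 3, 5)` would match the 5 repeated runs mentioned above; the 3 warm-up rounds are an arbitrary choice.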
The transfer payload was a large scientific data file in text format, so it compresses nicely. The tests took 10-30 minutes to run. For the slow network case, a reduced version of the file was used to keep the running times reasonable.
There was some fluctuation during the test runs, so you should allow a 10-20% tolerance when interpreting the results. Between test sessions the fluctuation was quite large. For that reason, and for better readability, I normalised the results. The raw connection case was given a value of 100 (percent), and the rest of the values on the same row are proportional to that. So a value close to 100 is very good, a value around 50 means half of the raw transfer speed, and so on.
The actual raw transfer speeds were: 139.0 Mb/s for disk-to-disk, 33.4 Mb/s for fast network and 0.1 Mb/s for slow network. For the previously mentioned reasons, these numbers have to be taken with multiple grains of salt.
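The normalisation is simple arithmetic, sketched below for clarity. The raw figure is the fast-network speed quoted above; the measured figure is made up purely for illustration and is not a test result.

```java
// Hypothetical helper showing how a measured speed maps to the normalised scale.
class NormalisedSpeed {

    // Expresses a measured speed as a percentage of the raw speed on the same row.
    static double normalise(double measured, double raw) {
        if (raw <= 0) {
            throw new IllegalArgumentException("raw speed must be positive");
        }
        return measured / raw * 100.0;
    }

    public static void main(String[] args) {
        double raw = 33.4;      // Mb/s, raw fast-network speed
        double measured = 16.7; // Mb/s, invented value for illustration only
        System.out.printf("%.0f%n", normalise(measured, raw)); // prints 50
    }
}
```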
We first look at compression. When transfer speeds are good, compression slows things down, even with highly compressible data. In the slow network case, however, transfers become 10 times faster, which is quite a significant boost. Looking at these numbers alone, compression is not worth it for many uses. With less compressible data the boost will not be as good, while the slowdown should stay about the same. Still, the price you pay is small, and even smaller when encryption is used, because compression reduces the number of bytes to encrypt. When you factor in the saved disk space on the file server, compression starts to look quite nice in a real network environment.
From the encryption results you can see that RC4 is the clear winner, while SSL and AES are tied. In fact, in both network cases RC4 performance is equal to not encrypting at all. The only difference was a CPU load of about 30-50% on one core when the network speed was high; on the slow network there was no significant CPU load.
What SSL actually does in this case is RC4 encryption and decryption, plus a message authentication code (HMAC). With AES, you get stronger encryption for the same performance penalty. Bear in mind, though, that encryption alone does not make a connection tamper-proof. For the best security, I believe you should use SSL and force it to use a secure hash function internally, but no encryption, and then encrypt the contents with strong AES before passing them to SSL. If you use AES inside SSL, the files end up unencrypted at the receiving end, which is what we wanted to avoid.
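To show what the authentication part adds, here is a minimal standalone sketch of an HMAC check over already encrypted data. It is not the SSL configuration suggested above, only an illustration of the same idea outside SSL. The class name, the choice of HmacSHA256 and the use of a separate MAC key are all assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: detects tampering with encrypted data by means of an HMAC tag.
class TamperCheckSketch {

    // Computes an authentication tag over the ciphertext with a key kept apart from the cipher key.
    static byte[] tag(byte[] macKey, byte[] ciphertext) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(macKey, "HmacSHA256"));
        return mac.doFinal(ciphertext);
    }

    // Returns true only if the ciphertext still matches the tag that was sent with it.
    static boolean verify(byte[] macKey, byte[] ciphertext, byte[] expectedTag)
            throws GeneralSecurityException {
        // MessageDigest.isEqual compares in constant time, which avoids leaking timing information.
        return MessageDigest.isEqual(tag(macKey, ciphertext), expectedTag);
    }
}
```

The sender would transmit the tag along with the encrypted file, and the receiver would call `verify` before trusting the contents.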
My conclusion would be that everyone was a winner in this competition. Compression over network connections did fairly well. RC4 provided very good performance with fairly good encryption. AES, on the other hand, provides very good encryption with fairly good performance. If more complete end-to-end security and trust are needed, then SSL with properly chosen settings can be used. You just have to pick the one that suits your needs best. For our specific needs RC4 seems to fit surprisingly well.