A byte-oriented AES-256 implementation

2007 November 11
by Ilya

As you may know, I do cryptographic perversions occasionally. Recently I’ve been asked for a compact implementation of AES-256. Code size must be as small as possible, speed is not important and (here is the catch) no assembler. The requesters have tried various public available implementations before, none were fit. So I did mine. It is a straightforward and rather naïve byte-oriented portable C implementation, where all the lookup tables replaced with “on-the-fly” calculations. Certainly it is slower and more subjective to side-channel attacks in general by nature. But this implementation is exactly what was wanted and it made everybody happy.

So, if you want it then there is the source code:

Reddit this / Add to del.icio.us / Digg this!
39 Comments leave one →
2007 November 21

The source code is not correct, I have checked it against AES256 test vectors and the result is completely different !

2007 November 22

So you’ve checked it wrong, check again. The demo.c in zip archive is exactly the test vector for AES-256 from FIPS-197, Appendix C.3

2008 March 7

Hey, buddy! I’ve looked at the source code… What a nightmare! How did you do it? I am beginning to like EnRUPT even more! Wanna see how many minutes it will take you to put together its 8-bit implementation? ;-)

2008 March 18

Ugh, must admit that it was annoying enough :)
I’ve been busy recently, still need to find a time to put my hands on EnRUPT.

2008 March 22
PaulHolland permalink

Hi, I also tried this code and indeed its NOT correct, simply compiled it in C30 (microchip) and the result is noncense. sorry !.

2008 March 22

Ugh… Once again:
a) this is AES-256, not AES-128
b) demo.c is a bloody test vector from FIPS-197
I’ve no idea on how and where you have mistested it, but you’ve tested it wrong. So the nonsense result is all yours. Sorry.

2008 May 5

[...] Ilyas implementation av AES är Byte-orienterad, vilket gör att den enkelt går att kompilera för 8-bits MCU:er. En annan udda sak med den här implementationen är att den inte innehåller en tabell för S-boxen. Istället räknar programmet ut korrekt substitutionsbyte under körning. Detta tar naturligvis tid, men eliminerar 256 Bytes. Tyvärr öppnar detta troligen även för sidoattacker. [...]

Pingback
2008 September 30
jonathan permalink

hello, sorry by my ignorance because I’m new at this, but I just tested your code and the demo works perfect… my question is that I increment the buffer from 16 to 32 and it only encrypt the first 16 bytes of the buffer.
it’s an error from my end? I can encrypt only 16 bytes strings?
hope you can help me.

Thanks

2008 October 18

@jonathan

No, it is not an error. Any block cipher (including AES) works on a single block. If you have more than one block of plaintext then you must apply cipher on each block with a proper mode. You may read more about block cipher modes at http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation

2008 November 19
visitor permalink

Have you take this code as reference?
http://fp.gladman.plus.com/AES/index.htm

Specialy, http://fp.gladman.plus.com/AES/aes-byte-29-08-08.zip ?

2008 November 19

No. Brian made his “no-tables” implementation 8 months later, after our brief discussion.

2008 November 26
Paulo permalink

Hello, I’m brazilian,
is a cool demo and lib,

I’ve one question, I understand that block size is 128 bits (16 bytes),
but I don’t understand if I’ve 17 bytes, have I extend string with null-bytes? byte 90 (nop) or 00 ?

sorry my broken english, ^^
(if you can put one example I’ll be happy)

thanks

2008 November 26

Paulo,

If you do encryption/decryption for your own purposes then you are free to pad any way you like as long as you able to get the actual data size. Null byte (00) or a running sequence (01, 02, 03…) would do just fine. I prefer to pad with a byte == the number of padding bytes (15 in your example)

If you do encryption to be compatible with some other applications then you must follow the whatever applied standards. Check RFC3602, RFC2406 or RFC3852 for example.

2008 November 27
Paulo permalink

If I encrypt using your mthod and decrypt with http://fp.gladman.plus.com/cryptography_technology/rijndael/ method won’t work ??

2008 November 27
Paulo permalink

Well, I making one shield, and he “client” is wried in pascal and the “server” is writed in C, I’ll use your libe, and other lib for pascal, the lib of pascal use “rejindael” method…

thanks for help ^^

2008 November 28
Paulo permalink

LLya, your lib is RFC3602, RFC2406 or RFC3852 ?

you know one cool dll or lib for pascal with AES ?

2008 November 28
Ondra permalink

Hello,

I am from Czech Republic and I use your method in my project, but when I want decrypt crypted data, in fifth round the script will be closed… I don’t know why. I use Borland C++ 5.02. Could you help me please with it please?

Thanks

2008 December 2
Paulo permalink

I’m with the some error =/

2008 December 3

Paulo,

The code is FIPS-197. Whatever RFC functionality required you must do it yourself. I cannot advise you on any particular DLL or lib for pascal. Any DLL may be used with any language regardless of the language it made of. For an application, having cryptographic functions encapsulated in a separate DLL is a bad idea indeed.

2008 December 3

Ondra,

It does sounds like a buffer overflow for me, but I cannot say anything without seeing your code. You may try to check the output buffer size and usage in your program. Take a closer look where and how you store a decrypted data.

2008 December 4
Anonymous permalink

Hi Ilya,

I wanted to see some RAM/ROM usage data and for including your lib in an embedded system with limited CPU capabilities. It is generally of interest for most guys who want to use the lib like me.

So, here is what I found

Setup:
======
MCU : Freescale 9S12XS128
No real target : Simulated (Full Chip Simulation in Code Warrior IDE)
Selected Clock : 8 MHz (or 8 MIPS)

RAM / ROM Data:
===============
Name Data Code Const
—————————————————————–
MC9S12XS128.c.o 359 0 0
main.c.o 0 92 0
Start12.c.o 0 52 0
rtshc12.c.o (ansixbi.lib) 0 8 0
aes256.c.o 0 1392 0
other 256 12 2

Timing:
=======
Encryption: (Execution of the below 2 functions: 387.50 milli seconds
aes256_init(&ctx, key); ====================
aes256_encrypt_ecb(&ctx, buf);

Decryption: (Execution of the below 2 functions: 33.241 SECONDS
aes256_init(&ctx, key); ==============
aes256_decrypt_ecb(&ctx, buf);

2008 December 5

You might want to double-check that emphasized seconds result of yours. It is certainly not the fastest implementation and, as it said, speed was not important. But you’ve got ridiculous and obviously inadequate timing.

2008 December 16
Olaf permalink

Thx for the implementation. I compiled it for an 8-bit AVR processor using AVR-GCC and it takes up about 2KBytes of space. I haven’t yet tested the speed.

2009 January 20
Alex permalink

Hi,

I experienced that the decryption is much slower than the encryption. Is that reasonable? I don’t think so, since the encryption runs through the same algorithm like the decryption just with another key.
Do you have any speed test values? The decryption of 2000 bytes took about 3 to 5 seconds, which seems too much for me. What do you think?

best regards,
Alex

2009 January 21

Alex,

Decryption in AES generally a bit slower than encryption because at least the inverse MixColumns is slower than MixColumns. Plus in this particular implementation the inverse SubBytes is the real performance killer. If you are looking for speed then you may refer to the Gladman’s implementation instead.

2009 March 22
Hal permalink

This is nice and it’s just what I was looking for, a compact, free implementation.

However I too found that decryption was several hundred times slower than encryption. The problem is the aes_subBytes_inv() function, which tries all possible sbox values to find the inverse. Since computing the sbox already iterates over all possible values to find the mulinv, this is a doubly nested loop and takes a long time.

Studying the AES spec I figured out how to write a fast sbox inversion function. I manually (pencil and paper!) inverted their matrix 5.2, and it comes out that the inverse transform is b_i = b_((i+2) mod 8) xor b_((i+5) mod 8) xor b_((i+7) mod 8). So you do this step and then the mulinv, and it is a relatively fast (faster anyway) sbox inversion:

/* ————————————————————————– */
uint8_t rj_sbox_inv(uint8_t x)
{
uint8_t y, sb;

y = x ^ 0×63;
sb = y = (y<>7);
y = (y<>6); sb ^= y; y = (y<>5); sb ^= y;

return gf_mulinv(sb);
} /* rj_sbox_inv */

Then you can do:

/* ————————————————————————– */
void aes_subBytes_inv(uint8_t *buf)
{
register uint8_t i = 16, j;
while (i–)
{
buf[i] = rj_sbox_inv(buf[i]);
}
} /* aes_subBytes_inv */

and get rid of the double nesting. Now decryption is as fast as encryption, fast enough for my purposes.

However, having done that, it turned out that just putting in the sbox and inverse sbox as tables, 256 bytes each, was no bigger and considerably faster. You get rid of gf_alog(), gf_log(), gf_mulinv(), rj_sbox() and rj_sbox_inv(), and do:

#define rj_sbox(x) sbox[(x)]
#define rj_sbox_inv(x) sboxinv[(x)]

With the code you eliminate, the tables don’t make things any bigger, and least on a 32 bit machine. But I imagine there’s not much difference on most architectures.

Your way is a nice exercise in interpreting the spec, but the table do simplify things. But thanks for this implementation, I will use it!

Feel free to use my changes if you like, under the same licensing terms.

Hal

2009 March 22

Hal, thanks for the valuable comment. I think I’ll include your changes to benefit a generic usage aside of CTR/OFB modes.

2009 May 5

[...] moving onto the HSM side. The HSM itself uses a AES library from literatecode.com which allows me to peform AES encryption and decryption on the Arduino board. Some (and I must [...]

Pingback
2009 May 5

Hey there, Just started using your library as part of a hardware HSM project I’m making using Arduino, and its perfect - its a little slow on the chip but its nice and small in code size, I didn’t have to make any changes or anything.

I realise you replaced the lookup tables with “on the fly” and I’m wondering if you have a version which does not do this - What is the size / performance difference between the two I wonder?

Best Regards
Syn

2009 May 5

Syn, I’ve just uploaded the updated source code, you may take it. Decryption is faster now and the code has an option to be compiled with static tables.

2009 May 5

[...] have updated the source code of my byte-oriented AES-256 implementation. The decryption routine is faster, so this implementation is better for a generic use. You may also [...]

Pingback
2009 July 10
Arif permalink

Hi,

Thanks for the AES implementation.
If I want to change your code to AES-128, which part that I need to change?

Thank you.

2009 July 14

@Arif

These would be a number of rounds and a key schedule.

2009 July 22
James permalink

Absolutely great stuff, thanks a lot Ilya!

2009 October 17

I need to open a ddl which was encrpted using AES256

2009 December 10
Pro permalink

In your demo you call “Init” both before you decode and encode a buffer. Do you need to cal Init before each separate encode or decode? If I’m trying to decode a buffer of size - say 16384 bytes, I assume I have to loop 1024 times and call decode each time, but do I also have to call init each loop?

2009 December 10

@Pro

No, you do not need multiple inits. A demo is just a demo.

2010 January 8
Dillip Kumar permalink

Hi, boss !! done a great job !! :-)

Can you tell me changes to be made to your code available for hard code it to 128-bit Message Block and 128-bit Based Encryption Key for AES ?

i have tried to make some changes to it…
like :

1. Changing :

void aes_expandEncKey(uint8_t *k, uint8_t *rc)
{
register uint8_t i;

k[0] ^= rj_sbox(k[13]) ^ (*rc);
k[1] ^= rj_sbox(k[14]);
k[2] ^= rj_sbox(k[15]);
k[3] ^= rj_sbox(k[12]);
*rc = F( *rc);

for(i = 4; i < 16; i += 4) k[i] ^= k[i-4], k[i+1] ^= k[i-3],
k[i+2] ^= k[i-2], k[i+3] ^= k[i-1];
} /* aes_expandEncKey */

then similarly for:

2. void aes_expandDecKey(uint8_t *k, uint8_t *rc) ;

3. typedef struct {
uint8_t key[16];
uint8_t enckey[16];
uint8_t deckey[16];
} aes256_context;

4. uint8_t key[32]; to uint8_t key[16 ]; in demo.c file…
5. And few More changes…
6. but the input plain text with decrypted buffer varies totally….

I am using your license in my project report and bibliography too !! Dont mind for modification of work !! ;-)

And help me out !!

2010 January 8
Dillip Kumar permalink

And changed no of rounds to 10 for 128 instead 14 for 256 bit key !!

Leave A Comment

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS