## Using a PRP to efficiently iterate randomly over a large list

Iterating over lists of data is one of the most fundamental operations that computer programs need to perform. This sort of operation is commonly done by incrementing a counter or, in a modern programming language, by using an iterator. However, I have often found myself wanting to iterate over lists in a random fashion but have found this to be a big challenge. For a short list, it’s easy enough; simply shuffle the contents of the list. Unfortunately, for very large lists, it can take more memory than your system has available in order to store the state of a shuffled list. What is desired is a way to visit every element of a very large list only once without extensive processor or memory requirements. Here, I will show you how to randomly traverse an arbitrarily large list using a Pseudorandom Permutation (PRP). The method uses a Feistel construction along with a Pseudorandom Function (PRF).

Every block cipher, like AES, is actually a PRP. The only problem is that the block size of AES is much larger than would be required for any set we would practically want to traverse. So, what we need is a PRP that permutes a set of the size we care about. The Feistel construction can be used to take any PRF and use it to create a PRP of the size we need.

## Block Size

We are accustomed to thinking of block ciphers as taking a plaintext as input and returning a ciphertext. When we look at the mathematics of a PRP, however, it is defined in terms of sets. A PRP takes an integer in a set and returns another integer in the same set. So, in order to return a permutation of a large list, we can think of the PRP as taking an index into the list and returning the permuted index into the same list. The block size we want to use, then, is the smallest number of bits that can store the largest index into our list.

In Ruby, we might define this as follows:

@block_size = Math.log2(@num_elements).ceil

For this implementation, I have chosen to use a balanced Feistel network, so I need to make sure that the number of bits is divisible by two.

## Pseudorandom Function

The randomness in the Feistel construction comes from a PRF. In this case, I have chosen to use SHA-256. At each round of the network, we need to generate a random value that is XORd with half the block. Since SHA-256 returns more bits than we need, we simply throw away the extra bits.

def f(i, x) hex_digest = @sha256.hexdigest("#{@seed},#{i},#{x}") return hex_digest.to_i(16) >> (256-(@block_size/2)) end

## Permutations

Each round of the Feistel construction generates a random value seeded by the right side of the block and XORs this with the left side, then the sides are flipped. This should be done at least three times (I’m doing it seven).

def permute(x) xl = x >> (@block_size/2) xr = x - (xl << (@block_size/2)) 0.upto(6) do |i| xl = (xl ^ f(i, xr)) xl, xr = [xl, xr].reverse end return (xl << (@block_size/2)) + xr end

## Fitting the set

Most of the time, the number of items you will want to iterate over will not result in a nice even block size. This will mean that the permutation will work in a set that is larger than your actual set. To handle this case, we simply need to recursively permute until we are in the proper value range.

n = permute(@position) while (n >= @num_elements) n = permute(n) end return n

## Example

Let’s say there is an Internet resource with 10,000,000 elements that you would like to randomly traverse. Here’s how you might implement the traversal.

size = 10_000_000 seed = Random.new.bytes(32) re = RandomEnum.new(seed, size) 0.upto(size-1) do |i| n = re.next puts "next: #{n}" end

## Summary

I have given an overview and a sample implementation of how to use a PRP to traverse a large set or list. The full Ruby source code is provided here

require 'digest' class RandomEnum def initialize(seed, num_elements) @num_elements = num_elements @seed = seed @position = -1 @sha256 = Digest::SHA256.new @block_size = Math.log2(@num_elements).ceil if (@block_size % 2 != 0) @block_size += 1 end end def next @position += 1 n = permute(@position) while (n >= @num_elements) n = permute(n) end return n end def f(i, x) hex_digest = @sha256.hexdigest("#{@seed},#{i},#{x}") return hex_digest.to_i(16) >> (256-(@block_size/2)) end def permute(x) xl = x >> (@block_size/2) xr = x - (xl << (@block_size/2)) 0.upto(6) do |i| xl = (xl ^ f(i, xr)) xl, xr = [xl, xr].reverse end return (xl << (@block_size/2)) + xr end end

## References

[1] How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuﬄe. B. Morris, P. Rogaway, T. Stegers, Crypto 2009

leave a comment