Extra Fox – A blog by Christopher Taylor

Using a PRP to efficiently iterate randomly over a large list

Posted in Uncategorized by extrafox on April 4, 2012

Iterating over lists of data is one of the most fundamental operations that computer programs need to perform. This sort of operation is commonly done by incrementing a counter or, in a modern programming language, by using an iterator. However, I have often found myself wanting to iterate over lists in a random fashion but have found this to be a big challenge. For a short list, it’s easy enough; simply shuffle the contents of the list. Unfortunately, for very large lists, it can take more memory than your system has available in order to store the state of a shuffled list. What is desired is a way to visit every element of a very large list only once without extensive processor or memory requirements. Here, I will show you how to randomly traverse an arbitrarily large list using a Pseudorandom Permutation (PRP). The method uses a Feistel construction along with a Pseudorandom Function (PRF).

Every block cipher, like AES, is actually a PRP. The only problem is that the block size of AES is much larger than would be required for any set we would practically want to traverse. So, what we need is a PRP that permutes a set of the size we care about. The Feistel construction can be used to take any PRF and use it to create a PRP of the size we need.

Block Size

We are accustomed to thinking of block ciphers as taking a plaintext as input and returning a ciphertext. When we look at the mathematics of a PRP, however, it is defined in terms of sets. A PRP takes an integer in a set and returns another integer in the same set. So, in order to return a permutation of a large list, we can think of the PRP as taking an index into the list and returning the permuted index into the same list. The block size we want to use, then, is the smallest number of bits that can store the largest index into our list.

In Ruby, we might define this as follows:

@block_size = Math.log2(@num_elements).ceil

For this implementation, I have chosen to use a balanced Feistel network, so I need to make sure that the number of bits is divisible by two.

Pseudorandom Function

The randomness in the Feistel construction comes from a PRF. In this case, I have chosen to use SHA-256. At each round of the network, we need to generate a random value that is XORd with half the block. Since SHA-256 returns more bits than we need, we simply throw away the extra bits.

def f(i, x)
   hex_digest = @sha256.hexdigest("#{@seed},#{i},#{x}")
   return hex_digest.to_i(16) >> (256-(@block_size/2))
end

Permutations

Each round of the Feistel construction generates a random value seeded by the right side of the block and XORs this with the left side, then the sides are flipped. This should be done at least three times (I’m doing it seven).

def permute(x)
   xl = x >> (@block_size/2)
   xr = x - (xl << (@block_size/2))
   0.upto(6) do |i|
      xl = (xl ^ f(i, xr))
      xl, xr = [xl, xr].reverse
   end
   return (xl << (@block_size/2)) + xr
end

Fitting the set

Most of the time, the number of items you will want to iterate over will not result in a nice even block size. This will mean that the permutation will work in a set that is larger than your actual set. To handle this case, we simply need to recursively permute until we are in the proper value range.

n = permute(@position)
while (n >= @num_elements)
   n = permute(n)
end
return n

Example

Let’s say there is an Internet resource with 10,000,000 elements that you would like to randomly traverse. Here’s how you might implement the traversal.

size = 10_000_000
seed = Random.new.bytes(32)
re = RandomEnum.new(seed, size)
0.upto(size-1) do |i|
   n = re.next
   puts "next: #{n}"
end

Summary

I have given an overview and a sample implementation of how to use a PRP to traverse a large set or list. The full Ruby source code is provided here

require 'digest'

class RandomEnum
   def initialize(seed, num_elements)
      @num_elements = num_elements
      @seed = seed

      @position = -1
      @sha256 = Digest::SHA256.new

      @block_size = Math.log2(@num_elements).ceil
      if (@block_size % 2 != 0)
         @block_size += 1
      end
   end

   def next
      @position += 1
      n = permute(@position)
      while (n >= @num_elements)
         n = permute(n)
      end
      return n
   end

   def f(i, x)
      hex_digest = @sha256.hexdigest("#{@seed},#{i},#{x}")
      return hex_digest.to_i(16) >> (256-(@block_size/2))
   end

   def permute(x)
      xl = x >> (@block_size/2)
      xr = x - (xl << (@block_size/2))
      0.upto(6) do |i|
         xl = (xl ^ f(i, xr))
         xl, xr = [xl, xr].reverse
      end
      return (xl << (@block_size/2)) + xr
   end
end

References

[1] How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle. B. Morris, P. Rogaway, T. Stegers, Crypto 2009