next | previous | forward | backward | up | top | index | toc | Macaulay2 web site
Macaulay2Doc > The Macaulay2 language > hash tables > hashing

hashing

A hash table contains a set of key-value pairs. The access functions for hash tables accept a key and retrieve the corresponding value. Here are the details, together with a discussion of how we designed the hash table system seen in Macaulay2.

The keys and values are stored in the hash table. The hash table consists of a certain number of buckets, each of which can hold an arbitrary number of key-value pairs. The number of buckets is chosen to be large enough that typically one may expect each bucket to hold fewer than three key-value pairs. The key is used as follows to determine in which bucket the key-value pair is to be stored. The function hash is applied to the key to produce, in a deterministic way, an integer called the hash code, and the remainder of the hash code upon division by the number of buckets tells which bucket will be used.

It is essential that the hash code of a key never change, for otherwise the next attempt to find the key in the hash table will have an unpredictable result - the new hash code of the key may or may not lead to the same bucket, and the value may or may not be located.

Some hash tables and lists are mutable, i.e., their contents can be altered by assignment statements. As explained above, the hash code of a mutable thing is not permitted to change when the contents of the thing change. Hence, the algorithm for computing the hash code may not refer to the contents of a mutable thing.

The strong comparison operator === is provided to parrot the equality testing that occurs implicitly when accessing a key in a hash table. The fundamental requirement for this strong comparison operator is that things with different hash codes must also turn out to be different when tested with this comparison operator.

Here we come to a question of design. As discussed above, we must assign hash codes to mutable things in such a way that the hash codes don't depend on their contents. We can do this in various ways.

In Macaulay2, we chose the second approach listed above; we expect to have many mutable things appearing as keys in hash tables, and we need the speed. A counter with initial value 1000000 is incremented each time a mutable thing is created, and its value is taken as the hash code of the thing and stored within it. The strong comparison test cannot depend on the contents of mutable things, and thus such things appear to be containers with opaque walls. For mutable things, the test for equality must be the same as equality of the hash codes.

It is essential to have some hash tables for which equality amounts to equality of the contents. This cannot be achieved for mutable hash tables, but we do achieve it for non-mutable hash tables -- the hash code is computed directly from the contents of the thing in a deterministic way. This allows us to implement the notion of polynomial, say, as a hash table -- the keys can be the monomials (necessarily non-mutable) and the values can be the coefficients. The notion of monomial can be implemented as a hash table where the keys are the variables and the values are the corresponding exponents.

One further comforting remark: the routines that compute hash codes or strong equality do not get into infinite loops, despite the existence of circular structures: any circular structure must come into being by means of changing something, and so the circular loop in the structure necessarily involves a mutable thing, whose contents are not examined by the routines. This provides another argument in favor of taking the second approach listed above.

See also