[OFF TOPIC MATHS QUESTION] - Random Numbers

edit 2 - I think I may have worded this slightly better here :

edit 1 - it looks like what I want is a non-repeating random sequence generator. This seems quite popular. Now if only my head hadn’t exploded… https://en.wikipedia.org/wiki/Mersenne_Twister

I want to specify a range of numbers (could be 1000 to 9999, or 1000000 to 9999999), and I want to be able to retrieve values in that range in an apparently random fashion. Easy enough, you say.

The catch is, I want to ensure I never get a repeated number until they have all been issued. Also, it may be months between fetches, so I want to be able to carry on where I left off last time.

Ideally, I would be able to seed it at the start, give it the last number I retrieved, and have it continue along the same sequence (making the process completely predictable but apparently random).

I’m a complete duffer at maths, and I cannot find the right words to put into google.

My mind is thinking down the path of - given a start number and the range, you could add a certain value (that wraps around at the end of the range) which gives a sort of impression of randomness but is stepping through in a not obviously linear way.

Someone might request a few of these in a row, so a sequence like 1240, 8241, 1241, 8242, etc. might be spotted.

Note I want a purely mathematical solution, not one involving SQL (I can do that already) or one that involves storing the previously selected numbers in any way (except for maybe the last selected number and any seed required). I know, moon on a stick :slight_smile:

Does any of that make sense?

My first thought for this aspect was to use Python’s numpy.random.choice() to retrieve a random number without replacement.

I know there is more to the problem, but I’m on my way to a meeting!

Love this stuff though.

Can’t see how I get the next one in the sequence there if the sequence is not stored.

This looks interesting, trouble is I don’t understand a word of it :
https://www.usenix.org/legacy/events/usenix99/full_papers/deraadt/deraadt_html/node17.html

Hi David,

What do you think of this as a somewhat simple approach?

  • create a list of sequential numbers in some range:
    data=list(range(100))

  • shuffle it randomly “in place”
    random.shuffle(data)

  • now you can select values from the list sequentially. Since it was randomly shuffled, this is effectively the same as randomly selecting values in a list without replacement.

  • since you know the length of the list, you can keep track of the last index that was chosen so you know where you left off as well as when the list has been exhausted.

Could that work?

No, that won’t work because it requires me to store the entire list. I want to be able to feed back the last number retrieved (and possibly any seed) to get the next in the sequence, with the sequence being reproducible given the same parameters elsewhere.

I recall from hash-table design, you can get a guaranteed non-repeating sequence of N numbers if N is prime. However, it’s not necessarily random, or even close to it.

It doesn’t have to be machine-random, just no immediately obvious human-discernible pattern.

Can you require that the number of distinct values be a prime number?

Possibly - the range really can be fairly flexible (with a minimum digit length of 4 and no particularly upper digit length, though 8 might be optimal).

@alcampopiano & @p.colbert - I have edited the question with a link to my stack exchange question which proposes a solution, that quite frankly makes my skull melt :slight_smile:

Yeah, some of those formulas require a couple of semesters of some fairly abstract mathematics before the notation begins to make sense.

Also note that these guys are trying to solve a much more general problem, which makes it harder. You can use the same mechanisms, but you don’t need all that rigor, just something that’s hard for folks to guess, while not repeating.

If you look at the Python code in Wikipedia’s article, you’ve got the core generator right there. The missing pieces are the integers modulus, seed, and a. Once chosen, hold these constant. Then, when you put in a value for c, you get the next value for c.

So the problem now boils down to, how should we choose those three fixed parameters?

  1. modulus is the number of distinct values. It should be a prime number.
  2. seed, roughly speaking, is the average distance between successive values of c. Without further research, I think it should be roughly one half of the modulus.
  3. Since we’re not really interested in mathematical rigor here, a, the multiplier, is something you can experiment with. I’d suggest another prime number, with around half as many digits as modulus.
1 Like

Okey dokey - thanks for that, I shall give it a whirl!

You’re right in that my sticking point was working out what values to feed it.

The other trick might be testing it. :wink: I.e., how do you prove there are no accidental cycles?

I’ll run it through that python code in a big loop, dump it to a file, then “sort -u” and compare the line count :slight_smile:

In C, I’d probably build a bitset, with modulus number of bits, initially all zeroes. For each number generated, set its bit. If I find I’m setting a bit that was already set, then I’ve found a repeat.

But if the generator works as intended, I’ll reach the end of the loop, first, and at the end, all of the bits will be set.

What determines the lower and upper digit length (ie value) in those parameters?

I’ve just run 100k loop with the following start values :

m = 10000019
s = int(m/2)
a = 5653
c = a

and that gives me numbers as small as 3 digits and as big as 7.

You should end up with values as small as 1 digit (0), and as large as modulus - 1.

If you want a consistent number of digits, you could pad with leading zeroes.

I can’t have leading zeros as the value must be represented as an integer.

I also don’t think I can just (say) multiply the value by a multiple of 10 else I may accidentally clash with a to-be-generated number.

Hmm…

Add a N-digit value as an offset? Not to change the value of c, but just as the end result.

What do you guys think about using a python generator for this?

As far as I can see, without storing the full list across sessions, this will return a predictable random integer from a known range, without producing duplicates.

def predictable_random_integer(seed, ind):
    data=list(range(5,10))
    random.seed(a=seed)
    random.shuffle(data)
    print(data) # just here to show the list
    yield data[ind]

#first run
my_gen_obj=predictable_random_integer(seed=0, ind=2)
my_value=next(my_gen_obj)
print(my_value)

[7, 6, 5, 9, 8]
5


# second run
my_gen_obj=predictable_random_integer(seed=0, ind=3)
my_value=next(my_gen_obj)
print(my_value)

[7, 6, 5, 9, 8]
9

# third run
my_gen_obj=predictable_random_integer(seed=0, ind=4)
my_value=next(my_gen_obj)
print(my_value)

[7, 6, 5, 9, 8]
8

1 Like

So for example if the biggest number could be XXXX then add 10000 to it?