Efficient algorithm to randomly find available places inside a list in Python
Efficient algorithm to randomly find available places inside a list in Python
I need to randomly assign a place inside a list to an input. I need to check whether it is not occupied first and then use it. The best algorithm that I can come up with is the following:
def get_random_addr(input_arr):
while True:
addr = random.randrange(1, len(input_arr))
if input_arr[addr] is None:
break
return addr
This is obviously not efficient since as we occupy more slots, the loop takes longer to find an empty slot, and even it may take forever (suppose only one empty slot is left). Do you have any better solutions?
How I did it
Based on the chosen answer, this is how I ended up doing it. It is very fast and efficient compared to the solutions which search through the whole list and find the None
elements and randomly choose from the retrieved set. I think the bottleneck was random.choice
method which seems to be very slow.
None
random.choice
# Create a list of indexes at the beginning when all the values are None
available_index = list(range(1, len(input_arr)))
random.shuffle(available_index)
# To get a random index simply pop from shuffled available index
random_index = available_index.pop()
While this method has extra O(n) memory complexity, in practice it is very efficient and fast.
numpy
None
under the hood numpy still is
O(N)
its just a faster O(N)
– Joran Beasley
Jun 28 at 19:23
O(N)
O(N)
5 Answers
5
If you can't use numpy I'd keep a set of indexes which are known to contain None
. Every time None
is added or removed this set of indexes will be updated
None
None
this is a good answer(+1) ... but i dont think numpy solves it for him ... we think of numpy vector operations as O(1) but
A[A==None]
is still O(N)
under the hood ... the set of known empties is the right answer though i think– Joran Beasley
Jun 28 at 19:19
A[A==None]
O(N)
@JoranBeasley
numpy
is indeed O(N)
(it is still not using magic :) ) however numpy
's under-the-hood-loops are faster (sometimes in several orders of magnitudes)– DeepSpace
Jun 28 at 19:24
numpy
O(N)
numpy
Your function can take arbitrarly long to return. In particular, you will get into an infinite loop if no item is None
.
None
Instead, recover all indices which are None
and use random.choices
to randomly return k
of them.
None
random.choices
k
import random
def get_random_addr(input_arr, k=1, target=None):
return random.choices([i for i, v in enumerate(input_arr) if v is target], k=k)
l = [0, None, 2, 3, None, None]
for i in get_random_addr(l, k=2):
l[i] = i
print(l) # [0, None, 2, 3, 4, 5]
In the long run this is as inefficient as OP's code
– DeepSpace
Jun 28 at 19:12
none_index
should be something that's only calculated once, not each call– user3483203
Jun 28 at 19:13
none_index
@DeepSpace No because OP's answer may choose an index which is not None while this will not
– Olivier Melançon
Jun 28 at 19:13
@OlivierMelançon True, but this code is O(N) as well (it has to iterate over the whole list). BTW, I believe you meant to iterate over
enumerate(input_arr)
– DeepSpace
Jun 28 at 19:15
enumerate(input_arr)
@DeepSpace There is no way not to iterate over the list. You solution omits the fact that if you keep a set of available indices, you add an O(1) operation on every item setting. Depending on which operation is most often used, this could be longer
– Olivier Melançon
Jun 28 at 19:19
Similar to DeepSpace's idea, except with O(1)
memory and O(n)
time, but faster by a constant factor since it only iterates over half the slots in the array.
O(1)
O(n)
1/number_empty_slots
Code:
def get_random_addr(input_arr, num_empty_slots):
# num_empty_slots contains the number of empty slots in input_arr
for index, elem in enumerate(arr):
if elem is None:
if random.random() < 1 / num_empty_slots:
return index
num_empty_slots -= 1
Simply use enumerate
to index your list first, filter out those that are None
, and then use random.choice
to pick an available space.
enumerate
None
random.choice
from random import choice
def get_random_addr(input_arr):
return choice([index for index, value in enumerate(input_arr) if value is None])
print(get_random_addr([None, 1, None, 2]))
This outputs either 0
or 2
randomly, or None
if there is no more space available.
0
2
None
This solution slows down my code significantly! I'm dealing with pretty large lists and the code just halted at this line! Not sure why tho!
– A23149577
Jun 28 at 19:56
@A23149577 It's because I used a lambda function as a filter, which is inefficient in python. I've edited my answer to use list comprehension instead. Thanks for the feedback.
– blhsing
Jun 28 at 20:00
Well, it is still notably slower than my naive solution, but I think that's the cost I should pay to avoid infinite loop.
– A23149577
Jun 28 at 20:18
@A23149577 Your naive solution would certainly be faster when there are plenty of space available, since you don't have to scan through the entire list for available spaces first. But as you also realized, your solution would be much slower when there is less space left, so scanning for available spaces is simply a necessary evil for a predictable performance.
– blhsing
Jun 28 at 20:27
In my approach, I pick an arbitrary address in the target array, and if it is free I add it to the output list, but if it is not, I map that address to an address which does contain None
, nearest to the end of the list. All entries in the array beyond and including this mapped free address are dropped from this list, since they are either nonempty, or already are represented elsewhere in the list. I repeat this process, chopping away at the size of the target list, making it easier and easier to find new empty addresses as it proceeds. There's a few other minor details to make it all work, but I think the code below can explain those better than I can with words.
None
from random import random
def randint(max_val):
return int(random() * max_val)
def assign(values, target):
output =
mapping = dict()
mmax = 0
size = len(target)
for val in values:
idx = randint(size)
while target[idx] != None:
if idx in mapping:
idx = mapping.pop(idx)
mmax = max(mapping or [0])
break
min_size = max(idx, mmax)
try:
size -= target[size-1:min_size:-1].index(None)
except:
size = min_size + 1
if target[size-1] == None:
size -= 1
mapping[idx] = size
if idx > mmax:
mmax = idx
elif size-1 in mapping:
size -= 1
mapping[idx] = mapping.pop(size)
mmax = max(mapping or [0])
idx = randint(size)
target[idx] = val
output.append(idx)
return output
Note that this modifies the target list passed to it. If you don't want to modify it, you really have two options: implement a bit of extra logic to check if the "free" address is already consumed, or copy the entire list (in which case, reverse it and patch the indices, so that the .index()
can work on the list directly, which is the major time sink anyways.
.index()
I'd also recommend verifying that the solutions it produces are valid. I've done some testing on my part, but I very well could have missed something.
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
If you really want to do this efficiently, use
numpy
. Otherwise, keep a list of allNone
indices, by looping once, then just pop from that list and assign your value– user3483203
Jun 28 at 19:10