Efficient algorithm to randomly find available places inside a list in Python


Efficient algorithm to randomly find available places inside a list in Python



I need to randomly assign a place inside a list to an input. I need to check whether it is not occupied first and then use it. The best algorithm that I can come up with is the following:


def get_random_addr(input_arr):

while True:
addr = random.randrange(1, len(input_arr))
if input_arr[addr] is None:
break
return addr



This is obviously not efficient since as we occupy more slots, the loop takes longer to find an empty slot, and even it may take forever (suppose only one empty slot is left). Do you have any better solutions?



How I did it



Based on the chosen answer, this is how I ended up doing it. It is very fast and efficient compared to the solutions which search through the whole list and find the None elements and randomly choose from the retrieved set. I think the bottleneck was random.choice method which seems to be very slow.


None


random.choice


# Create a list of indexes at the beginning when all the values are None
available_index = list(range(1, len(input_arr)))
random.shuffle(available_index)

# To get a random index simply pop from shuffled available index
random_index = available_index.pop()



While this method has extra O(n) memory complexity, in practice it is very efficient and fast.





If you really want to do this efficiently, use numpy. Otherwise, keep a list of all None indices, by looping once, then just pop from that list and assign your value
– user3483203
Jun 28 at 19:10



numpy


None





under the hood numpy still is O(N) its just a faster O(N)
– Joran Beasley
Jun 28 at 19:23


O(N)


O(N)




5 Answers
5



If you can't use numpy I'd keep a set of indexes which are known to contain None. Every time None is added or removed this set of indexes will be updated


None


None





this is a good answer(+1) ... but i dont think numpy solves it for him ... we think of numpy vector operations as O(1) but A[A==None] is still O(N) under the hood ... the set of known empties is the right answer though i think
– Joran Beasley
Jun 28 at 19:19


A[A==None]


O(N)





@JoranBeasley numpy is indeed O(N) (it is still not using magic :) ) however numpy's under-the-hood-loops are faster (sometimes in several orders of magnitudes)
– DeepSpace
Jun 28 at 19:24



numpy


O(N)


numpy



Your function can take arbitrarly long to return. In particular, you will get into an infinite loop if no item is None.


None



Instead, recover all indices which are None and use random.choices to randomly return k of them.


None


random.choices


k


import random

def get_random_addr(input_arr, k=1, target=None):
return random.choices([i for i, v in enumerate(input_arr) if v is target], k=k)


l = [0, None, 2, 3, None, None]

for i in get_random_addr(l, k=2):
l[i] = i

print(l) # [0, None, 2, 3, 4, 5]





In the long run this is as inefficient as OP's code
– DeepSpace
Jun 28 at 19:12






none_index should be something that's only calculated once, not each call
– user3483203
Jun 28 at 19:13


none_index





@DeepSpace No because OP's answer may choose an index which is not None while this will not
– Olivier Melançon
Jun 28 at 19:13





@OlivierMelançon True, but this code is O(N) as well (it has to iterate over the whole list). BTW, I believe you meant to iterate over enumerate(input_arr)
– DeepSpace
Jun 28 at 19:15



enumerate(input_arr)





@DeepSpace There is no way not to iterate over the list. You solution omits the fact that if you keep a set of available indices, you add an O(1) operation on every item setting. Depending on which operation is most often used, this could be longer
– Olivier Melançon
Jun 28 at 19:19



Similar to DeepSpace's idea, except with O(1) memory and O(n) time, but faster by a constant factor since it only iterates over half the slots in the array.


O(1)


O(n)


1/number_empty_slots



Code:


def get_random_addr(input_arr, num_empty_slots):
# num_empty_slots contains the number of empty slots in input_arr
for index, elem in enumerate(arr):
if elem is None:
if random.random() < 1 / num_empty_slots:
return index
num_empty_slots -= 1



Simply use enumerate to index your list first, filter out those that are None, and then use random.choice to pick an available space.


enumerate


None


random.choice


from random import choice
def get_random_addr(input_arr):
return choice([index for index, value in enumerate(input_arr) if value is None])
print(get_random_addr([None, 1, None, 2]))



This outputs either 0 or 2 randomly, or None if there is no more space available.


0


2


None





This solution slows down my code significantly! I'm dealing with pretty large lists and the code just halted at this line! Not sure why tho!
– A23149577
Jun 28 at 19:56





@A23149577 It's because I used a lambda function as a filter, which is inefficient in python. I've edited my answer to use list comprehension instead. Thanks for the feedback.
– blhsing
Jun 28 at 20:00






Well, it is still notably slower than my naive solution, but I think that's the cost I should pay to avoid infinite loop.
– A23149577
Jun 28 at 20:18





@A23149577 Your naive solution would certainly be faster when there are plenty of space available, since you don't have to scan through the entire list for available spaces first. But as you also realized, your solution would be much slower when there is less space left, so scanning for available spaces is simply a necessary evil for a predictable performance.
– blhsing
Jun 28 at 20:27




In my approach, I pick an arbitrary address in the target array, and if it is free I add it to the output list, but if it is not, I map that address to an address which does contain None, nearest to the end of the list. All entries in the array beyond and including this mapped free address are dropped from this list, since they are either nonempty, or already are represented elsewhere in the list. I repeat this process, chopping away at the size of the target list, making it easier and easier to find new empty addresses as it proceeds. There's a few other minor details to make it all work, but I think the code below can explain those better than I can with words.


None


from random import random

def randint(max_val):
return int(random() * max_val)

def assign(values, target):
output =
mapping = dict()
mmax = 0
size = len(target)
for val in values:
idx = randint(size)
while target[idx] != None:
if idx in mapping:
idx = mapping.pop(idx)
mmax = max(mapping or [0])
break

min_size = max(idx, mmax)
try:
size -= target[size-1:min_size:-1].index(None)
except:
size = min_size + 1

if target[size-1] == None:
size -= 1
mapping[idx] = size
if idx > mmax:
mmax = idx
elif size-1 in mapping:
size -= 1
mapping[idx] = mapping.pop(size)
mmax = max(mapping or [0])

idx = randint(size)
target[idx] = val
output.append(idx)
return output



Note that this modifies the target list passed to it. If you don't want to modify it, you really have two options: implement a bit of extra logic to check if the "free" address is already consumed, or copy the entire list (in which case, reverse it and patch the indices, so that the .index() can work on the list directly, which is the major time sink anyways.


.index()



I'd also recommend verifying that the solutions it produces are valid. I've done some testing on my part, but I very well could have missed something.






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Opening a url is failing in Swift

Export result set on Dbeaver to CSV