python find item in list

Python reminds me frequently of zork: "You are in a maze of twisty passages; You are in a twisty maze of passages."

For instance, you'd think that python would have a built-in find first item in a list function, but it doesn't. Simple searches, python tutorials, and python's own docs don't help point towards a good solution to this basic problem.

It turns out, however, that you can use a generator expression to build a succinct, speedy search. Generator expressions are one of the wonderfully useful, but hidden avenues in python -- and the technique could use a bit more advertisement.

First, take a quick look at why some of the more common search techniques fall short.

list.index(), for instance, only works for items that have a pre-defined equality function ( primitive types or custom classes )

list comprehensions ( ex. [ i for in list if ... ] ) are great but comprehensions touch every member of a list; so, not so speedy. filter() likewise has the same issue.

A google search turned up an old post: http://tomayko.com/writings/cleanest-python-find-in-list-function that people seem to reference frequently, but for me, having to define or import a function defeats the desire for simplicity.

The generator expression, however, is short and to the point:
(i for i in list if ... ).next()
Essentially, the bits in parens creates an iterator that yields control every time the if-statement succeeds; next() asks that iterator for a single value.

I wish I could claim credit for the solution, but a link from a link from a link turned up a tip from a commenter on another blog. Which proves once again, that without the web I couldn't write good software for the web. (hrmm.... )

18 comments:

tie said...

This method is great except... if the item is not in the list the generator expression would raise a StopIteration Exception, which requires a separate try/except block to be handled... not exactly clean

ionous said...

you're right tie; i should have included that in my post. the worst case is when you don't know if your item is in the list but you only want to handle when it is.

you *can* collapse lines following ':' onto the previous line though, so it's not a total loss.

try: (i for i in list if ... ).next()
except StopIteration: pass
else:
# do stuff

my personal feeling is that's still cleaner than a list comprehension or a separate find function -- but, definitely the StopIteration issue is worth drawing attention to.

plagiats said...

Thanks, it was usefulicious.
I used it this way :

listFiles = os.listdir(os.path.curdir)
try:
(filename for filename in listFiles if ".rar" in filename).next()
except:
print "Nothing to unrar..."

It's always nicer with a full example ;)

denis

Einar said...

Why not wrapping the expression in a for loop, removing the .next()? That will take care of the StopIteration without try/except.

ionous said...

Einar: in some cases i think that you could.


list= [1,2,3]
for i in ( i for i in list if i == 2 ):
print i
break


it's a little odd looking, but maybe not more so than the try-except syntax.

the one thing to keep in mind is that you'd need to call "break" to short circuit the generator -- otherwise it would touch every item in the list every time.

Lee said...
This comment has been removed by the author.
Lee said...

Here is another refinement that Python 2.6 makes possible: wrap the generator expression in the 'next()' function. For example:

seq = [1, 2, 3]
item = next((n for n in seq if n == 2), None)
if item is not None:
    print item

Choose whatever default (sentinel) value makes sense for your data.

Personally, I prefer the greater flexibility of getting the index of the item, rather than the item itself (as in 'list.index()'). This makes it easy to modify a list; for example, to delete the found item.

index = next((i for i in xrange(len(seq)) if seq[i] == 2), None)
if index is not None:
    del seq[index]

Also, with this approach you can just substitute 'xrange(len(seq) - 1, -1, -1)' to search in reverse, so you can find the index of the last matching item.

John said...

Mazes? Zork?

Showing your age there bro! Alas, I know all too well what you're refering to.

ionous said...

Lee: thanks! that's really cool. I'm still stuck in python2.5 land, but the 2.6 expression looks very nice.

Joe said...

What about:

found = any(n == 10 for n in list)

credit: www.siafoo.net/article/52

Greg said...

All list objects have an "in" operator, and an index() function.

x in mylist returns true if mylist contains x.

mylist.index(x) will return the index of x in mylist, or raises an exception if x is not in mylist. There's also rindex() which will find the last item in a list, and both functions have optional start and end indexes to limit the search.

All this goodness is built in. Check it out in the documentation.

ionous said...

@greg: yes *but* if you read my post and the other comments more closely, the methods that you suggest only works if you have or can define an equality operator for "x"

@joe: any() looks great - i need to migrate to 2.6 i think :)

Pascal said...

Isn't the "index" method of list what you're searching for ?

[1,2,3].index(2) -> 1
(Raises ValueError if value not found)

Seems perfect to me.

ionous said...

@pascal: depends; as i noted in the post it only works for types that have pre-defined equality operators... and only when you want "whole" object equality...

plagiats (above) gives a nice terse example of a case where you don't: searching for the first instance of a file with a name ending in ".rar"

guess since a number of people have asked variations of this question, i should have highlighted the problem better in the post. :}

Yg! said...

You can always use something like this too:

def lfind(haystack, needle):
z=-1
while 1:
try:
z = haystack.index(needle, z+1)
yield z
except:
break

this will return an iterator so you can:

for i in lfind(list, 'element'):
print(i)

or just:

list(lfind(list, 'element'))

so you get a list of all the element indexes on the list...

you can also enhance the search function with full text and other things using py3k...

Yg! said...

a case-insensitive version ( the comments are losing indentation, so re-indent the code first :) ):

def lfind(haystack, needle):
'''Finds something inside a list and returns the indexes as a iterable generator. Use list(lfind(a,b)) to get result as a list.'''
z=-1
hs = list(map(lambda x: x.lower(), haystack))
nd = needle.lower()
while 1:
try:
z = hs.index(nd, z+1)
yield z
except:
break

Codefisher said...

lee said he likes to get the index, as it is more flexible. How about both :)

index, item = next(((i, x) for i, x in enumerate(seq) if x == 2), (None, None))

I love enumerate :)

Eric said...

to yg!: there are plenty of ways of doing this, but we're looking here for the most compact, elegant and efficient one.

I doubt your 10 lines of code meet any of those three criteria.