- Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
- Python's iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
- You can easily define your own iterable container type by implementing the __iter__ method as a generator.
- You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function (see the sketch after this list).
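For example, the container/iterator distinction shows up directly in an interactive session (a minimal illustrative sketch using a plain list):

>>> visits = [15, 35, 80]
>>> iter(visits) is iter(visits)   # A container: a new iterator each call
False
>>> it = iter(visits)
>>> iter(it) is it                 # An iterator: iter returns it unchanged
True
>>> next(it)
15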
Say you want to analyze tourism numbers for the U.S. state of Texas. Imagine the data set is the number of visitors to each city (in millions per year). You'd like to figure out what percentage of overall tourism each city receives.
To do this, you need a normalization function. It sums the inputs to determine the total number of tourists per year. Then it divides each city's individual visitor count by the total to find that city's contribution to the whole.
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
>>> visits = [15, 35, 80]
>>> percentages = normalize(visits)
>>> percentages
[11.538461538461538, 26.923076923076923, 61.53846153846154]
Now suppose the visitor counts are stored in a file, one number per line. A generator can read them lazily:

def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
Surprisingly, normalize returns an empty result when it's given the iterator from read_visits:

>>> it = read_visits('data')
>>> percentages = normalize(it)
>>> percentages
[]

The cause of this behavior is that an iterator produces its results only a single time. If you iterate over an iterator or a generator that has already raised a StopIteration exception, you won't get any results the second time around:

>>> it = read_visits('data')
>>> list(it)
[15, 35, 80]
>>> list(it)  # Already exhausted
[]
One solution is to explicitly exhaust the input iterator and keep a copy of its entire contents in a list. This may not be a good solution, though: the copy of the input iterator's contents could be large, and making it could cause your program to run out of memory and crash.
def normalize_copy(numbers):
    numbers = list(numbers)  # Copy the iterator
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
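This copying version works on the iterator returned by read_visits, at the cost of holding the full data set in memory (a usage sketch, assuming the same 'data' file as above):

>>> it = read_visits('data')
>>> normalize_copy(it)
[11.538461538461538, 26.923076923076923, 61.53846153846154]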
Another workaround, which is not ideal either, is to accept a function that returns a new iterator each time it's called:
def normalize_func(get_iter):
    total = sum(get_iter())      # New iterator
    result = []
    for value in get_iter():     # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result
To use normalize_func, you can pass in a lambda expression that calls the generator and produces a new iterator each time:
percentages = normalize_func(lambda: read_visits('data'))
Though this works, having to pass a lambda function like this is clumsy. A better way to achieve the same result is to provide a new container class that implements the iterator protocol.
The iterator protocol is how Python for loops and related expressions traverse the contents of a container type. When Python sees a statement like for x in foo, it will actually call iter(foo). The iter built-in function calls the foo.__iter__ special method in turn. The __iter__ method must return an iterator object, which itself implements the __next__ special method. The for loop then repeatedly calls the next built-in function on that iterator object until it's exhausted (and raises a StopIteration exception).
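Roughly speaking, a for loop behaves like this hand-written equivalent (a sketch of the protocol, not the exact implementation):

foo = [15, 35, 80]        # Any iterable works here
it = iter(foo)            # for x in foo: first calls foo.__iter__()
while True:
    try:
        x = next(it)      # Then repeatedly calls it.__next__()
    except StopIteration: # Raised when the iterator is exhausted
        break
    print(x)              # The loop body runs with each value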
You can achieve this behavior for your own container type by implementing the __iter__ method as a generator:

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
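This new container type works with the original normalize function without any modification (a usage sketch, assuming the same 'data' file as before):

>>> visits = ReadVisits('data')
>>> percentages = normalize(visits)
>>> percentages
[11.538461538461538, 26.923076923076923, 61.53846153846154]

It works because sum in normalize calls ReadVisits.__iter__ to allocate a new iterator object, and the for loop does the same, so neither pass sees an exhausted iterator. The only downside is that the input data is read twice.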
The protocol states that when an iterator is passed to the iter built-in function, iter returns the iterator itself. In contrast, when a container type is passed to iter, a new iterator object is returned each time. Thus, you can test an input value for this behavior and raise a TypeError to reject bare iterators:
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # An iterator, not a container
        raise TypeError("Must supply a container")
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
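With this check in place, the function accepts containers (including ReadVisits) but rejects bare iterators (a sketch using the examples above):

>>> visits = [15, 35, 80]
>>> normalize_defensive(visits)              # A list container: works
[11.538461538461538, 26.923076923076923, 61.53846153846154]
>>> normalize_defensive(ReadVisits('data'))  # The container class: works
[11.538461538461538, 26.923076923076923, 61.53846153846154]
>>> normalize_defensive(iter(visits))        # A bare iterator: raises TypeError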