So far, we've thought a bit about doing computations. Let's change things up and look at how we might try to represent data instead.

For example, let's say I wanted to keep track of all my friends and their birthdays. How could I do that, so that I could work with it in Python?

Or perhaps, I'm writing a game server where everyone has a unique username and an account associated to it. How could I efficiently look up an account from the username?

Finally, what if I wanted to quickly find all the unique numbers which occur in a list. Is there a nice way to do that?

We'll see that each of these problems can be solved efficiently using a few fundamental data structures: lists, dictionaries and sets.

Lists are, perhaps, the fundamental data structure. They are...well...lists. Most of the time when dealing with data, we don't just have a single item or a couple items, but lots of them. Lists help us organize these into a coherent, ordered collection. For example:

In [2]:

```
numbers = [1, 4, 2, 3, 9]
print numbers
```

Now, suppose I want to talk about particular things in the list. How can I do that?

In [3]:

```
print numbers[0]
print numbers[1]
print numbers[-1]
```

*Note that like range, slicing follows the [,) half open interval convention.*

In [4]:

```
print numbers[1:4]
print numbers[2:5]
print numbers[1:]
print numbers[:-3]
```

I can do something even fancier by "striding" the items!

In [5]:

```
print numbers[::2]
print numbers[1::3]
print numbers[1:3:2]
```

In [6]:

```
numbers = [1, 4, 2, 3, 9]
numbers.append(10)
print numbers
```

In [7]:

```
numbers = [1, 4, 2, 3, 9]
numbers.insert(0, 10)
print numbers
```

In [8]:

```
numbers = [1, 4, 2, 3, 9]
numbers.insert(2, 10)
print numbers
```

I can also remove items from a list.

In [9]:

```
people = ["Hip", "Cindy", "Ken", "Bruce Wayne", "Judge Judy"]
print people.pop(0)
print people
```

In [11]:

```
people = ["Hip", "Cindy", "Ken", "Bruce Wayne", "Judge Judy"]
print people.pop(-1) # note: same as just people.pop()
print people
```

I can also join two existing lists together to form a new list by "adding" or concatenating them.

In [30]:

```
items1 = [1, 2, 3, 4, 5]
items2 = ['A', 'B', 'C']
print items1 + items2
print items2 + items1
```

In [12]:

```
data = ["house key", 7, 3.1, [3, 4, 1], ["lighter", [1, True], "pocket knife"]]
print data
```

In [13]:

```
things = [4, 1, 2, 9, 2, 'a', 2, 'thing', 4, 3.2]
for x in things:
print x
```

Create a function which takes a list of integers and prints whether the entry is positive, negative or zero.

Modify this function so that instead of printing "positive", "negative" or "zero", it returns the list of corresponding strings. (Can you make this more general? Instead of taking a number to the string "postive", "negative" or "zero", what if I want the squares of the numbers? Or each number incremented by one? How would you allow for this?)

Write a function which takes a list of numbers and returns the sum of the cubes of all the elements.

Write a function which finds the largest item in a list of numbers.

Write a function which returns a list but in reverse order.

Now that we've implemented a couple of these kinds of things, it turns out that Python has some built-in functionality that already does this for us. For example, if we want to find the largest element in a list, we can just use:

In [45]:

```
numbers = [2, 4, -1, 9, 10, 3, 20, 6, 2, 9, 9, 2, 3]
print max(numbers)
```

Similarly, if we want to add the elements, we can just use:

In [46]:

```
print sum(numbers)
```

One last, very useful and common operation is being able to sort a list of items:

In [49]:

```
numbers = [2, 4, -1, 9, 10, 3, 20, 6, 2, 9, 9, 2, 3]
print sorted(numbers)
```

*generator*. Although we won't go into depth with these, they become useful for performing small constructions or computations in a very compact way. For example, I can construct a list of squares using the "list-builder" notation:

In [17]:

```
[n**2 for n in range(1, 10)]
```

Out[17]:

In [21]:

```
numbers = [1, 5, 3, 9, 8]
sum(n**3 for n in numbers)
```

Out[21]:

Another common task is pairing up pieces of data. For example, in my list of friends' birthdays, I want a way of keeping a coherent grouping of each friend with their birthday. So something like, a list of pairs of the form (friend, birthday). Tuples are one possible solution to this:

In [14]:

```
birthdays = [
('James', 'Aug 25'),
('Tommy', 'Sept 12'),
('Jason', 'Jun 13'),
('Lexi', 'Feb 1'),
]
```

In [16]:

```
entry = birthdays[1]
print entry
print entry[1]
```

In [17]:

```
for (name, bday) in birthdays:
print bday
```

Further, we can make this even "cleaner" by just using:

In [19]:

```
for name, bday in birthdays:
print bday
```

In [16]:

```
numbers = [1, 5, 3, 6, 1]
for n, xn in enumerate(numbers):
print str(n) + ' -> ' + str(xn)
```

Suppose that I'm playing a card game like poker. What are some possible ways I could represent my current hand in the game?

Create a short list of friends' birthdays like we had above. Print a sorted version of this list. Notice the way Python sorts these? Now, write a function which creates a new birthday list but with the order of the name and the birthday swapped. Print out a sorted version of this list. Notice the difference?

Create a list of the numbers 1 to 10. Create a new list which consists of the consecutive pairs of numbers in the list. For example, [(1, 2), (2, 3), (3, 4), ..., (9, 10)]

Write a function which takes two tuples of (x1, y1) and (x2, y2) pairs and returns the "vector sum" (x1+x2, y1+y2).

*This suggests that we may try to implement an entire collection of support functions for vector algebra. Indeed, this kind of data abstraction is another crucial approach to building programs!*

Consider something like a reverse telephone book we're building for something like a caller ID system. We want to be able to identify a person by telephone number, so we need a data structure which takes a phone number and gives us a name. Second, we want to be able to update the our data by adding, removing or modifying entries as they change.

One way we may do this is to keep a big list of (key, value) pairs. But...to perform all the operations we want efficiently, we'd need to impose some kind of structure on how the elements are organized. If we're not planning on really using this structure aside from a short problem, this is perhaps too much effort to use.

Alternatively, we could try to implement such a data structure as a big list with possibly empty entries for every possible phone number. But, you can imagine, this is likely to be to dense of a representation for what we want...

What we really want is to use a dictionary, instead! These are a built-in data structure in Python and provide exactly the sort of key -> value behavior we would like.

In [4]:

```
phonebook = {}
phonebook['2173218822'] = 'Tommy Cool-guy'
phonebook['2173211122'] = 'Michael Jordan'
phonebook['2183321232'] = 'Bobby Bigmoney Patterson'
```

Alternatively, when initializing a dictionary, we can use the shorthand:

In [2]:

```
phonebook = {
'2173218822': 'Tommy Cool-guy',
'2173211122': 'Michael Jordan',
'2183321232': 'Bobby Bigmoney Patterson',
}
```

To look up an entry, we can simply use the same index notation we did for lists.

In [3]:

```
print phonebook['2173211122']
```

*At this point we should reflect on a major difference between lists and dictionaries. Remember that lists only take integer indicies, whereas dictionaries can use many other types as keys. Related to this, general types of keys don't have a natural ordering, so we should expect any kind of ordering on how the elements in a dictionary are stored.*

If we want to check if a key is in the dictionary, we can use the in and not in statements:

In [4]:

```
print '2173211122' in phonebook
print '2173211122' not in phonebook
```

We can remove entries using the del statement.

In [5]:

```
phonebook['2173211122'] = 'Michael Jordan'
print '2173211122' in phonebook
print phonebook['2173211122'] + ' is retiring...'
del phonebook['2173211122']
print '2173211122' in phonebook
```

Finally, as with lists, we can enumerate the elements.

In [6]:

```
for num, name in phonebook.iteritems(): # <- we discovered this should be just .items() in Python 3
print str(num) + ' -> ' + str(name)
```

- Write a dictionary which translates an abbreviation for each day of the week to a corresponding day number. For example:
Mon -> 1, Tues -> 2, Wed -> 3, Thur -> 4, ...

See if you can use this to build a "reverse" dictionary taking days numbers to day names. - Suppose you have the list:
['dog', 'pencil', 'fence', 'dog', 'apple', 'dog', 'dog', 'dog', 'pear', 'pencil', 'pear', 'pear']

Using a dictionary, compute the numbers of times each word occurs. - Using this, print out the
*two*most frequently occuring items. (Hint: Try converting the dictionary to a list of key-value pairs (or value-key pairs) and sort it.)

The last scenario we'll consider here is something like the following: Suppose that I'm fetching text from webpages and keeping track of the 20 most frequently occuring words on a particular webpage. Now, suppose that I want to check if there are any commonly occuring instances of such words from two different webpages.

How might I do this?

Well, if I stored both the elements in a list I could do something like, for each element in the first list, check if it's in the second.

The problem with this is that, say in the worst case, the two lists are totally disjoint, for each element in the first list, I'm performing a number of steps proportional to the length of the second list. In this case, 20. Then, I do this for each element in my first list. In total, I've now performed 400 steps!

We roughly say that this method scales as $O(n^2)$ of the input size. For our purposes, that's not a deal-breaker. But what if I were to double the data size? Or increase it an order of magnitude? Then we'd start to get into trouble... This is not something we're going to get into much here, but it's something worth keeping in mind when designing solutions to problems.

So, what else can we do? Well, we can use a smarter data structure! This time, a *set*. Python provides a data structure for working with sets in the same way we think of. For example, items will only occur in a set once. We can efficiently compute the union, intersection or difference of two sets. And so on...

In [10]:

```
A = {1, 2, 5, 7, 8, 9}
B = {2, 4, 5, 8, 10}
```

In [11]:

```
print A
print B
print len(A)
```

We can check set membership as follows.

In [12]:

```
print 2 in A
print 3 in A
```

Similar to lists, we can add and remove elements from sets:

In [13]:

```
A = {1, 4, 5}
A.add(3)
A
```

Out[13]:

In [14]:

```
A.remove(5)
A
```

Out[14]:

We can also compute some standard set operations as we would write them mathematically.

In [15]:

```
print A & B
print A | B
print A - B
```

Finally, sets support the standard subset comparisons we're used to:

In [16]:

```
A = {1, 2, 3}
B = {1, 3}
print B < A
print B <= A
```

In [17]:

```
S = {1, 2, 3}
for x in S:
print x
```

In [21]:

```
dice = range(1, 7)
pairs = {(a, b) for a in dice
for b in dice
if a == 2 or b == 2}
```

- Write a function which computes the symmetric difference of two sets.
- Suppose you have a list of numbers like:
[1, 4, 5, 3, 6, 7, 5, 3, 1, 9, 3, 3, 4, 2]

How could you use a set data structure to remove all the duplicates from the list? - Suppose that I'm running a server and I list of "bad" users who I want to make sure can't connect. Write a function which takes a list of users who want to connect and a set of bad users. For example, if I give it:
users = ['tom', 'pam', 'sasha', 'jj', 'que', 'jeff'] bad = set(['jeff', 'sasha'])

It should return an "allowed" list:['tom', 'pam', 'jj', 'que']

- Suppose you roll two fair die. Represent the pairs of die rolls for which at least one entry is either a 2 or a 5 using a set and compute the probability of getting one of these.