Before we take a look at the major building blocks, it's nice to have at least a basic understanding of how memory is arranged. You can think of memory as a long row of cells, each having an address and a payload.

Figure 1 - Simple model of computer memory
Each numbered box in figure 1 represents a memory node, where the number above the node is its address and the contents below represent the actual data we are interested in. The address tells us where to look for the data so we can read or change the data's value. So instead of trying to describe the entire contents of the data every time we access it, all is needed is the numeric address, which is easily searchable. (This numeric search is done in the background in your programming project. In reality, you are referencing the data with an easily readable name of your choice, which is interpreted as a numeric address).
There are two basic ways in which to organize variable data from which to create your own data structures. One is with an array and the other is with a linked list. You can think of these two organizations of memory, arrays and linked lists, as the basic building blocks used to make any kind of data structure you can think of. You can use them alone, or by combining them together in various interesting ways. What makes things really interesting is not just their structure, but the operations performed on their structure. I could have two totally different data structures that both use a linked list as their basis, but act completely differently depending on how I am allowed to interact with them.
The Array
Simply put, an array is a series of variables ordered by address.

Figure 2 - An array

Figure 3 - Adding a node to an array
The Linked List

Figure 4 - A pointer

Figure 5 - Adding node to a linked list
While we gain the ability to easily add or remove nodes, we lose efficient random access. In order to find a particular node, it would be necessary to traverse each pointer one by one until we find what we are looking for.
Example: The Heap
What I would like to do now is describe a data structure known as a heap. After that we'll see how it can be implemented as either an array or a linked list.

Figure 6 - Binary trees
Figure 6 shows us two binary trees. The one on the left is a heap while the one on the right is not. Notice that the heap is completely full (each node has two children) up until the bottom level, where the child noes are filled from left to right. Also notice that each parent node contains a value that is larger than either of its two child nodes. So a heap is a binary tree with 2 special properties:
1) All nodes at the bottom level are as far to the left as possible
2) Each node's value is larger than either of its children
What kind of operations can we perform on a heap? There are just three: inspection of the root node, removal of the root node and insertion of a new node. The interesting thing here is that we can always find the largest value in a heap by inspecting or removing the root node.
From looking at how a heap is put together you might right away think to use a linked list. Let's see how this is done. Instead of a singly linked list, we're going to need each node to have two pointers, one to each of its children. If it happens that a child doesn't exist (as on some of the the bottom level nodes of a heap), the pointer could just be pointing to nothing. We also need to keep track on the root node and the bottom rightmost nodes, so we'll have pointers to each of those as well.
Inspection of the root node is trivial since we are always keeping track of the root node as our starting point into the linked list. Removing the root node and inserting a new node are a little trickier. If we were to remove the root node, some other node would need to take its place. We already know that a heap's bottom level is filled from the left, so taking the rightmost bottom node would maintain the heap property. Now that a node has been found to replace the root node, you may realize that a heap property has been violated. The root node's value is no longer larger than each of its children. To resolve this, we can continually swap the current node with the larger of its two children until the current node has two smaller children or no children at all (figure 7).

Figure 7 - Removing the root node of a heap
When we perform the swapping we are not actually trading the node values, but changing the child pointers of the current node with the child pointers of swapped node. Inserting a new node into the heap is very similar to removing the root node, except that the operation happens in reverse. We place the new node beside the bottom-most right-most node, and continually swap with its parent until the heap property #2 is resolved.
A heap can also be built using an array. Storing a heap in a linear array may seem strange, but it is actually the most efficient way to do it.

Figure 8 - Heap as an array
From figure 9 we can see that the node at position 2 has its children at position 2*2+1=5 and 2*2+2=6. Generalizing this makes any node's position at k have it's child positions at 2*k+1 and 2*k+2. Notice that because the bottom row is always filled from the left, we will never have any gaps in our array. The three heap operations act in much the same way as before except that when we swap nodes we are swapping their values and not any child pointers.
There is no real penalty in using an array over a linked list in the case of a heap, except that each node in the array stores less data (no pointers). When building your own data structures however, you may find certain advantages to using either one. Just remember that arrays are useful for quick random access, while linked lists are useful for quick inserts and removals from the middle. When you think about what you are building, figure out what operations will be used and how a linked list or array will handle each case.
Post Comments