Building Blocks for Data Structures
It's difficult if not impossible to start any programming project without thinking about
variables. Like in mathematics, a variable is a placeholder for things
like numbers that change over the course of a given operation. Now variables can represent things like integers, floating point numbers or
even letters of the alphabet. These types of variables are what you
would normally get for free in any programming language. But what if
you wanted to create your own types of variables to play with? It is
possible, if you are familiar with the basic building blocks of data
structures. The building blocks described in this lesson will allow
you to create variable types ranging from the simple to the
extravagant. A variable type is also known as a data structure, which
describes how a variable is organized and the operations that can be
performed on it.
Before we take a look at the major building blocks, it's nice to have
at least a basic understanding of how memory is arranged. You can think of memory as a long row of cells,
each having an address and a payload.
Figure 1 - Simple model of computer memory
Each numbered box in figure 1 represents a memory node, where the
number above the node is its address and the contents below represent
the actual data we are interested in. The address tells us where to
look for the data so we can read or change the data's value. So
instead of trying to describe the entire contents of the data every
time we access it, all is needed is the numeric address, which is
easily searchable. (This numeric search is done in the background in
your programming project. In reality, you are referencing the data
with an easily readable name of your choice, which is interpreted as a
numeric address).
There are two basic ways in which to organize variable data from which
to create your own data structures. One is with an array and the other
is with a linked list. You can think of these two organizations of
memory, arrays and linked lists, as the basic building blocks used to
make any kind of data structure you can think of. You can use them
alone, or by combining them together in various interesting ways. What
makes things really interesting is not just their structure, but the
operations performed on their structure. I could have two totally
different data structures that both use a linked list as their basis,
but act completely differently depending on how I am allowed to
interact with them.
The Array
Simply put, an array is a series of variables ordered by address.
Figure 2 - An array An array has a starting address and a length. So
if I wanted to get the 3rd variable in the array in figure 2 (the
orange), I'd take the starting address of 4589 and add 2, which would
give me the address of the orange as 4591. Using this simple math it
is easy access any of the fruit data very quickly since we know the
array will always be ordered sequentially. This kind of addressing
system is known as random access. The catch here comes about when we
want to add or remove pieces of fruit from the array. Adding or
removing from the end of the array is not a big deal assuming we've
allocated enough space there. But adding or removing a single piece of
fruit from anywhere else would mean shifting some of the pieces of
fruit over one space. And this gets more expensive to do if our array
had many nodes.
Figure 3 - Adding a node to an array
The Linked List
The linked list solves the problem we were having with the array, where
we had to shift nodes over. To understand this it is necessary to know
a little bit about something called a pointer. Pointers do one
thing: they point to other variables.
Figure 4 - A pointer In
figure 4 we see a pointer as a variable itself that has a numerical
value representing the address of another variable. This address can be anywhere in memory, not just right next to the pointer. What you can then do is chain pointers together so that they form a list of some sort. The
simplest of such lists is known as a singly linked list, where every
node except the last contains a pointer to the next node in the list.
Since with a linked list we don't need the addresses to be sequentially
ordered, it is easy to add or remove nodes from the middle. When a
node is added to a singly linked list, we can just change the left
node's pointer to point to the new node, and set the new node's pointer
to point to the right node (figure 5).
Figure 5 - Adding node to a
linked list While we gain the ability to easily add or remove nodes,
we lose efficient random access. In order to find a particular node,
it would be necessary to traverse each pointer one by one until we find
what we are looking for.
Example: The Heap
What I would like to do now is describe a data structure known as a
heap. After that we'll see how it can be implemented as either an
array or a linked list.
A heap is actually a special case of a binary tree data structure. A
binary tree has a root node with up to 2 child nodes, where each child
can have up to two children as well, and so on forming a tree-like
structure. 
Figure 6 - Binary trees
Figure 6 shows us two binary trees. The one on the left is a heap
while the one on the right is not. Notice that the heap is completely
full (each node has two children) up until the bottom level, where the child noes are filled from left to right. Also notice that each parent node
contains a value that is larger than either of its two child nodes. So a
heap is a binary tree with 2 special properties:
1) All nodes at the bottom level are as far to the left as possible2) Each node's value is larger than either of its children
What kind of operations can we perform on a heap? There are just
three: inspection of the root node, removal of the root node and
insertion of a new node. The interesting thing here is that we can
always find the largest value in a heap by inspecting or removing the
root node.
From looking at how a heap is put together you might right away think
to use a linked list. Let's see how this is done. Instead of a singly
linked list, we're going to need each node to have two pointers, one to
each of its children. If it happens that a child doesn't exist (as on
some of the the bottom level nodes of a heap), the pointer could just be
pointing to nothing. We also need to keep track on the root node and
the bottom rightmost nodes, so we'll have pointers to each of those as
well.
Inspection of the root node is trivial since we are always
keeping track of the root node as our starting point into the linked
list. Removing the root node and inserting a new node are a little
trickier. If we were to remove the root node, some other node would
need to take its place. We already know that a heap's bottom level is
filled from the left, so taking the rightmost bottom node would
maintain the heap property. Now that a node has been found to replace
the root node, you may realize that a heap property has been violated.
The root node's value is no longer larger than each of its children.
To resolve this, we can continually swap the current node with the
larger of its two children until the current node has two smaller
children or no children at all (figure 7).
Figure 7 - Removing the root node of a heap When we
perform the swapping we are not actually trading the node values, but
changing the child pointers of the current node with the child pointers
of swapped node. Inserting a new node into the heap is very similar to
removing the root node, except that the operation happens in reverse.
We place the new node beside the bottom-most right-most node, and
continually swap with its parent until the heap property #2 is resolved.
A heap can also be built using an array. Storing a heap in a linear
array may seem strange, but it is actually the most efficient way to do
it.
Figure 8 - Heap as an array From figure 9 we
can see that the node at position 2 has its children at position 2*2+1=5
and 2*2+2=6. Generalizing this makes any node's position at k have it's child positions at 2*k+1 and 2*k+2.
Notice that because the bottom row is always filled from the left, we
will never have any gaps in our array. The three heap operations act
in much the same way as before except that when we swap nodes we are
swapping their values and not any child pointers.
There is no real penalty in using an array over a linked list in the
case of a heap, except that each node in the array stores less data (no
pointers). When building your own data structures however, you may
find certain advantages to using either one. Just remember that arrays
are useful for quick random access, while linked lists are useful for
quick inserts and removals from the middle. When you think about what
you are building, figure out what operations will be used and how a
linked list or array will handle each case.
-
-
- Post Body
-
Nice First LearnHub Lesson!
-
- Quote
- Posted 5 months ago.
-
-
- Post Body
-
-
- Quote
- Posted 5 months ago.
-
-
- Post Body
-
it neeeds some implementational approach….beside that it is ood
-
- Quote
- Posted about 1 month ago.
-
-
- Post Body
-
i like the pictures of the lesson.
-
- Quote
- Posted about 1 month ago.