Jump To Content
Computer Science

Computer Science


Building Blocks for Data Structures

It's difficult if not impossible to start any programming project without thinking about variables.  Like in mathematics, a variable is a placeholder for things like numbers that change over the course of a given operation.  Now variables can represent things like integers, floating point numbers or even letters of the alphabet.  These types of variables are what you would normally get for free in any programming language.  But what if you wanted to create your own types of variables to play with?  It is possible, if you are familiar with the basic building blocks of data structures.  The building blocks described in this lesson will allow you to create variable types ranging from the simple to the extravagant.  A variable type is also known as a data structure, which describes how a variable is organized and the operations that can be performed on it.

Before we take a look at the major building blocks, it's nice to have at least a basic understanding of how memory is arranged.  You can think of memory as a long row of cells, each having an address and a payload.


Figure 1 - Simple model of computer memory

Each numbered box in figure 1 represents a memory node, where the number above the node is its address and the contents below represent the actual data we are interested in.  The address tells us where to look for the data so we can read or change the data's value.  So instead of trying to describe the entire contents of the data every time we access it, all is needed is the numeric address, which is easily searchable.  (This numeric search is done in the background in your programming project.  In reality, you are referencing the data with an easily readable name of your choice, which is interpreted as a numeric address).

There are two basic ways in which to organize variable data from which to create your own data structures.  One is with an array and the other is with a linked list.  You can think of these two organizations of memory, arrays and linked lists, as the basic building blocks used to make any kind of data structure you can think of.  You can use them alone, or by combining them together in various interesting ways.  What makes things really interesting is not just their structure, but the operations performed on their structure.  I could have two totally different data structures that both use a linked list as their basis, but act completely differently depending on how I am allowed to interact with them.

The Array

Simply put, an array is a series of variables ordered by address.


Figure 2 - An array

An array has a starting address and a length.  So if I wanted to get the 3rd variable in the array in figure 2 (the orange), I'd take the starting address of 4589 and add 2, which would give me the address of the orange as 4591.  Using this simple math it is easy access any of the fruit data very quickly since we know the array will always be ordered sequentially.  This kind of addressing system is known as random access.  The catch here comes about when we want to add or remove pieces of fruit from the array.  Adding or removing from the end of the array is not a big deal assuming we've allocated enough space there.  But adding or removing a single piece of fruit from anywhere else would mean shifting some of the pieces of fruit over one space.  And this gets more expensive to do if our array had many nodes.


Figure 3 - Adding a node to an array

The Linked List


The linked list solves the problem we were having with the array, where we had to shift nodes over.  To understand this it is necessary to know a little bit about something called a pointer.   Pointers do one thing:  they point to other variables. 


Figure 4 - A pointer

In figure 4 we see a pointer as a variable itself that has a numerical value representing the address of another variable.  This address can be anywhere in memory, not just right next to the pointer.  What you can then do is chain pointers together so that they form a list of some sort.  The simplest of such lists is known as a singly linked list, where every node except the last contains a pointer to the next node in the list.  Since with a linked list we don't need the addresses to be sequentially ordered, it is easy to add or remove nodes from the middle.  When a node is added to a singly linked list, we can just change the left node's pointer to point to the new node, and set the new node's pointer to point to the right node (figure 5). 


Figure 5 - Adding node to a linked list

While we gain the ability to easily add or remove nodes, we lose efficient random access.  In order to find a particular node, it would be necessary to traverse each pointer one by one until we find what we are looking for.

Example: The Heap

What I would like to do now is describe a data structure known as a heap.  After that we'll see how it can be implemented as either an array or a linked list.

A heap is actually a special case of a binary tree data structure.  A binary tree has a root node with up to 2 child nodes, where each child can have up to two children as well, and so on forming a tree-like structure. 

Figure 6 - Binary trees

Figure 6 shows us two binary trees.  The one on the left is a heap while the one on the right is not.  Notice that the heap is completely full (each node has two children) up until the bottom level, where the child noes are filled from left to right.  Also notice that each parent node contains a value that is larger than either of its two child nodes.  So a heap is a binary tree with 2 special properties:

1) All nodes at the bottom level are as far to the left as possible
2) Each node's value is larger than either of its children

What kind of operations can we perform on a heap?  There are just three:  inspection of the root node, removal of the root node and insertion of a new node.  The interesting thing here is that we can always find the largest value in a heap by inspecting or removing the root node. 

From looking at how a heap is put together you might right away think to use a linked list.  Let's see how this is done.  Instead of a singly linked list, we're going to need each node to have two pointers, one to each of its children.  If it happens that a child doesn't exist (as on some of the the bottom level nodes of a heap), the pointer could just be pointing to nothing.   We also need to keep track on the root node and the bottom rightmost nodes, so we'll have pointers to each of those as well. 

Inspection of the root node is trivial since we are always keeping track of the root node as our starting point into the linked list.  Removing the root node and inserting a new node are a little trickier.  If we were to remove the root node, some other node would need to take its place.  We already know that a heap's bottom level is filled from the left, so taking the rightmost bottom node would maintain the heap property.   Now that a node has been found to replace the root node, you may realize that a heap property has been violated.  The root node's value is no longer larger than each of its children.  To resolve this, we can continually swap the current node with the larger of its two children until the current node has two smaller children or no children at all (figure 7). 



Figure 7 - Removing the root node of a heap

When we perform the swapping we are not actually trading the node values, but changing the child pointers of the current node with the child pointers of swapped node.  Inserting a new node into the heap is very similar to removing the root node, except that the operation happens in reverse.  We place the new node beside the bottom-most right-most node, and continually swap with its parent until the heap property #2 is resolved.

A heap can also be built using an array.  Storing a heap in a linear array may seem strange, but it is actually the most efficient way to do it. 


Figure 8 - Heap as an array

From figure 9 we can see that the node at position 2 has its children at position 2*2+1=5 and 2*2+2=6.  Generalizing this makes any node's position at k have it's child positions at 2*k+1 and 2*k+2.  Notice that because the bottom row is always filled from the left, we will never have any gaps in our array.  The three heap operations act in much the same way as before except that when we swap nodes we are swapping their values and not any child pointers.

There is no real penalty in using an array over a linked list in the case of a heap, except that each node in the array stores less data (no pointers).  When building your own data structures however, you may find certain advantages to using either one.  Just remember that arrays are useful for quick random access, while linked lists are useful for quick inserts and removals from the middle.  When you think about what you are building, figure out what operations will be used and how a linked list or array will handle each case.


Andrew Brown
  • Authority 512
Post Body
Andrew Brown said:

Nice First LearnHub Lesson!

  • Quote
  • Posted 5 months ago.
nelliemuller
  • Authority 554
Post Body
nelliemuller said:

Excellent lesson!

  • Quote
  • Posted 5 months ago.
mohi4all
  • Authority 20
Post Body
mohi4all said:

it neeeds some implementational approach….beside that it is ood

  • Quote
  • Posted about 1 month ago.
powder
  • Authority 18
Post Body
powder said:

i like the pictures of the lesson.

  • Quote
  • Posted about 1 month ago.
  • Your comment will be modifiable for 10 minutes after posted.

Page Author

Avatar
melancton
Name
melancton

From Here You Can…

Information

Most Recent Related Content

  • Lesson
    Avatar
    Lesson_page_128x128
    Title
    Introduction to XHTML
    Body
    IntroductionThis lesson is aimed to provide a brief introduction into XHTML f...
    Author
    jredse jredse
  • Lesson
    Avatar
    Lesson_page_128x128
    Title
    Unzipping a File
    Body
    Sometimes when downloading a file for Windows, the file has a .zip at the end...
    Author
    anteaya anteaya
  • Lesson
    Avatar
    Lesson_page_128x128
    Title
    C programming
    Body
    C programming is easy and at the same time difficult too. Therefore, it is ad...
    Author
    shubhi shubhi
  • Lesson
    Avatar
    Lesson_page_128x128
    Title
    Union, Typedef, Enums
    Body
    By now, you already know what is an array, what is a structure, and how to us...
    Author
    zainvi_sf zainvi_sf
  • Lesson
    Avatar
    Lesson_page_128x128
    Title
    Using md5 sums
    Body
    Once you have downloaded a file from the internet, it is always nice to check...
    Author
    anteaya anteaya

Published In…

© 2008 Jamie Wheatley, All Rights Reserved.