making noise since 1977

Parsing Hierarchical Data Structures

« | Mon June 16, 2008 | comments and reactions | permanent link | »

We've been recently tasked with designing an arguably complex application, which will rely heavily on storing, updating, and retrieving hierarchical data (think data-tree) in a meaningful way.

Two helpful introductions to the subject are Storing Hierarchical Data in a Database (sitepoint.com), and Four Ways to Work with Hierarchical Data (evolt.org).

If data management geekery like this interests you, you can go view the articles, but in a nutshell, the evolt.org article suggests four strategies:

  1. Recursion
  2. Stack
  3. Flat Table
  4. Modified Preorder Tree Traversal Algorithm

Intuitively, we jumped on our problem with a Recursive approach - one conceptually elegant function which could walk the "tree" and continue until no more children of the parent (root) element were found. Two main advantages of this approach are its relative ease of human comprehension, and the fact that the same recursive algorithm can easily be called with any node of the tree serving as the root.

The main disadvantage appears to be a processing/memory cost that may become unacceptable as the application is asked to scale. We still have to explore some benchmarking in our particular application, but our problem is perhaps compounded by the project's need to not only parse and display the "tree", but to perform mathematical calculations with every element in the requested tree.

Fortunately, our system is for an in-house MIS, which does allow the luxury of being able to ignore, or at least be less concerned with lengthy execution times, than might be demanded of a production-level application with higher demand and hundreds of concurrent users.

Still, with MySQL 5's stored procedure capability, the Stack method may net considerable performance gains. Even the flat table method might be worth exploring. What doesn't at this point seem possible with our system, is the complexly elegant Modified Preorder Tree Traversal Algorithm, which seems to require far too much overhead to keep updated in a system which may well eventually house tens, or even hundreds of thousands of parent>child relationships.

See also: Tree Traversal (Wikipedia)

blog comments powered by Disqus