Skip to content

Full Tree Selects via Materialized Paths

At my internship I have been working on this really awesome in-house analytics systems.  One of the database tables used in this project behaves as a self-referential tree.  Since I am using Ruby on Rails for this project, I decided to go with the acts_as_tree plugin on this particular model.  This plugin adds various tree related methods with a minimal requirement that a foreign key column named “parent_id” exists in the corresponding table.

This plugin functioned as expected and something can certainly be said for sheer simplicity that this particular tree structure employs.  The complication arose when my application called for a full select of all the descendants from a given node.  This particular tree structure doesn’t support an efficient way to grab all of the corresponding descendant records.

So I decided to implement a Materialized Path on top of my simple tree structure by adding a path column to the table. Here is a sample of what my tree table would look like.  For the sake of the example, we’ll call this table Nodes:

id parent_id path
1 NULL
2 NULL
3 1 N1
4 2 N2
5 3 N1.N3
6 4 N2.N4
7 6 N2.N4.N6

Each path is constructed by taking a given node’s parent path and concatenating it with the parent ID.  I like to prefix each ID number with the first letter of the models name, hence the “N” from “Nodes”.  Keep in mind that any piece of unique data can be used, it doesn’t need to be the ID number. My analytics system uses an alternative string of unique data for the materialized path.

Now grabbing all of the descendants for node 2 is quite simple really; here’s a sample SQL query:

SELECT * FROM nodes WHERE path LIKE "N2%";

The Rails code for adding this functionality is equally simple.  A before_create callback in the given model will do the trick.  Here is a quick sample:

before_create do |n|
  self.path = "#{n.parent.path}N#{n.parent.id}"
end

Also note, that depending on the data type used for the path column, you maybe limited on how deep the descendants can go.  For instances, a varchar(255) column only allows 255 characters, so if each node takes up an average 5 characters we’re limited to 51 nodes.  While this may seem like a lot, it’s something to just keep in mind when implementing this solution.

Post a Comment

Your email is never published nor shared. Required fields are marked *
*
*