Data models are an important concept in system design and software engineering in general. A data model is a way to organise data, define relationships within data, and create rules about its use. I recently took a dive into the data models used in software today, and in this post I’ll explain the hierarchical, relational, network, and graph models.
Hierarchical Model #
The hierarchical model is one of the most basic data models. It stores data as a tree, with each record nested inside its parent. A familiar example is JSON, in which elements are nested within other elements. The model’s strengths are its simplicity and its natural fit for one-to-one and one-to-many relationships. However, it has no support for joins or many-to-many relationships, so data must either be denormalised or joined manually in application code. The hierarchical model is best suited to cases where the volume of data is low and there are few relationships.
{
  "school_system": "Tech Academy",
  "departments": [
    {
      "dept_name": "Math",
      "courses": [
        {
          "course_id": "M101",
          "title": "Algebra I"
        }
      ]
    }
  ]
}
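To make the trade-off concrete, here is a minimal Python sketch that loads the document above and walks it. The helper names `course_titles` and `departments_offering` are made up for this illustration. Reading down the tree is trivial, while the reverse lookup has to scan every department, because there is no join to lean on.

```python
import json

# The same document as above, embedded as a string so the sketch is self-contained.
doc = json.loads("""
{
  "school_system": "Tech Academy",
  "departments": [
    {
      "dept_name": "Math",
      "courses": [
        {"course_id": "M101", "title": "Algebra I"}
      ]
    }
  ]
}
""")


def course_titles(system):
    """Walk the tree top-down: system -> departments -> courses."""
    return [
        (dept["dept_name"], course["title"])
        for dept in system["departments"]
        for course in dept["courses"]
    ]


def departments_offering(system, course_id):
    """Reverse lookup: with no joins, every department has to be scanned."""
    return [
        dept["dept_name"]
        for dept in system["departments"]
        if any(c["course_id"] == course_id for c in dept["courses"])
    ]


titles = course_titles(doc)
math_depts = departments_offering(doc, "M101")
print(titles)
print(math_depts)
```

If a second department also offered M101, the course object would have to be copied into that department too, which is the denormalisation mentioned above.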
Relational Model #
This is perhaps the most well-known and popular data model. It is a mature, reliable model that expresses data as rows in tables, with relationships to rows in other tables. Some prevalent databases which use the relational model are SQLite, Postgres, and MySQL. It has dominated the database field for many decades now, but databases using other models, such as document or graph, are slowly gaining traction again. Here is an overview of the benefits and drawbacks of the relational model:
Benefits:
- Declarative Querying - Relational databases generally use declarative query languages like SQL, meaning that the access path to data is determined by the DB engine, not by the programmer. This makes queries far easier to write and maintain. Furthermore, the query optimiser picks an efficient access path for you, although it is not guaranteed to find the best one.
- Easy to add new indexes - Thanks again to the query optimiser, you don’t have to rewrite queries after adding new indexes to a database. The optimiser decides when an index is worth using, so existing queries keep working unchanged and querying stays consistent across databases and use-cases.
- Joins - Joins between tables are great for representing many-to-one and many-to-many relationships. The relational model makes performing joins very easy, allowing for complex relationships to be expressed effortlessly.
- Normalisation - Relationships allow each piece of data to be stored once and referenced from elsewhere, avoiding duplication. This lowers the overall storage required for a database and means an update only has to be made in one place. Normalisation is one of the key pillars of the relational model.
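The benefits above can be sketched with Python’s built-in `sqlite3` module. The table names, column names, and sample rows are invented for this illustration. The query only states what it wants, the many-to-many relationship lives in a normalised link table, and the same query text runs unchanged after an index is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id TEXT PRIMARY KEY, title TEXT);
    -- Many-to-many: one row per (student, course) pair, no duplicated names or titles.
    CREATE TABLE enrolments (
        student_id INTEGER REFERENCES students(id),
        course_id TEXT REFERENCES courses(id)
    );
    INSERT INTO students VALUES (1, 'Aiko'), (2, 'Ben');
    INSERT INTO courses VALUES ('M101', 'Algebra I'), ('P101', 'Physics I');
    INSERT INTO enrolments VALUES (1, 'M101'), (1, 'P101'), (2, 'M101');
""")

# Declarative: we say which rows we want, not how to find them.
QUERY = """
    SELECT s.name
    FROM students s
    JOIN enrolments e ON e.student_id = s.id
    JOIN courses c ON c.id = e.course_id
    WHERE c.title = ?
    ORDER BY s.name
"""

before = conn.execute(QUERY, ("Algebra I",)).fetchall()

# Adding an index changes how the engine may execute the query, not the query itself.
conn.execute("CREATE INDEX idx_enrolments_course ON enrolments(course_id)")
after = conn.execute(QUERY, ("Algebra I",)).fetchall()

print(before, after)
```

Whether the engine actually uses the new index is up to its optimiser; the point of the sketch is only that the application code does not need to change.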
Drawbacks:
- Impedance mismatch - Often the relational model does not match well with the object model of a specific application, requiring an awkward translation layer between objects and tables that is both unintuitive and inefficient.
- Less locality - Since related data is often spread across different tables, it may take several queries or joins to retrieve everything needed for a request. This impedes performance.
- Less flexibility - It is difficult to change the data model within a relational database, since a schema change can mean altering or rebuilding tables and migrating the existing data. This is problematic when there is too much data to migrate in one go, because a gradual migration may take a long time.
- Limited to homogeneous data - This is closely related to the impedance mismatch above. Every row in a table has to follow the same schema, so relational databases handle heterogeneous data poorly: two objects with any difference in their model would typically have to be placed into two different tables, increasing complexity.
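As a rough illustration of the impedance mismatch and locality points, the sketch below stores one nested application object in SQLite. The `Department` class, the `save` and `load` helpers, the table layout, and the "Calculus I" course are all hypothetical. A single object has to be split across two tables on the way in and stitched back together with two queries on the way out.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Department:
    # Hypothetical application object: one department owning a list of course titles.
    name: str
    courses: list = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES departments(id),
        title TEXT
    );
""")


def save(dept):
    """Shred one object into rows in two tables."""
    cur = conn.execute("INSERT INTO departments (name) VALUES (?)", (dept.name,))
    dept_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO courses (dept_id, title) VALUES (?, ?)",
        [(dept_id, title) for title in dept.courses],
    )
    return dept_id


def load(dept_id):
    """Reassemble the object: two queries for what the application sees as one thing."""
    (name,) = conn.execute(
        "SELECT name FROM departments WHERE id = ?", (dept_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT title FROM courses WHERE dept_id = ? ORDER BY id", (dept_id,)
    ).fetchall()
    return Department(name, [title for (title,) in rows])


original = Department("Math", ["Algebra I", "Calculus I"])
restored = load(save(original))
print(restored)
```

Object-relational mappers exist largely to hide this translation, but the extra layer is still there.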
Network Model #
The network model is probably the most obscure model on this list. It is rarely used today, since the graph model can do everything the network model can, but more simply.
The network model was standardised by CODASYL, the Conference on Data Systems Languages, in the 1960s. This model pre-dates the relational model, making it the second oldest model on this list. The network model represents data as records that can each have several parents. This makes inserting new values and representing many-to-many relationships easy, yet querying is very difficult.
The big flaw of the network model is that records cannot be accessed directly. To find a record, you must instead follow an access path through the data structure, starting from a root record and moving along chains of links. This meant that building a functional system required a lot of complex application code, which was also not very performant.
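The access-path problem can be sketched in a few lines of Python. This is a toy model, not CODASYL’s actual interface: the `Record` class, `find_paths`, and the sample records are invented for illustration. Because a record can have several parents, the application has to walk every path from the root itself in order to reach it.

```python
class Record:
    """Toy record that keeps an ordered list of links to its child records."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def link(self, child):
        self.children.append(child)
        return child


def find_paths(record, target_name, path=()):
    """Depth-first walk from the root, returning every access path to the target."""
    path = path + (record.name,)
    found = [path] if record.name == target_name else []
    for child in record.children:
        found.extend(find_paths(child, target_name, path))
    return found


school = Record("Tech Academy")
math_dept = school.link(Record("Math"))
physics_dept = school.link(Record("Physics"))

# One course record with two parents, which a strict tree cannot express.
stats = Record("Statistics I")
math_dept.link(stats)
physics_dept.link(stats)

paths = find_paths(school, "Statistics I")
print(paths)
```

Every new kind of query needs another hand-written traversal like this one, which is where much of the complexity came from.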
Graph Model #
The graph model is a more modern, flexible and simple version of the network model. In a graph, data is represented as nodes, with edges that connect each node to other nodes. This provides very good support for all types of relationships as well as heterogeneous data.
The edges between nodes are directed and can represent an arbitrary relationship. This is immensely useful due to the flexibility it provides. For example, consider a social networking site where people can share their location to an arbitrary degree. Regions may be represented as nodes, with edges that show which larger region each one is part of and, read in the opposite direction, which smaller regions it contains. The Ueno district would have a “district of” edge to the Tokyo node, which in turn might have a “city of” edge to Japan. This means that users can easily find other users within certain areas, since to find people in an area you just have to follow edges back from that area’s node.
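Here is a minimal in-memory sketch of the region example, written in plain Python rather than against any particular graph database. The `Graph` class, the "lives in" edge label, the Osaka node, and the user names are assumptions added for illustration. Finding everyone in an area is a matter of following edges backwards from that area’s node.

```python
from collections import defaultdict


class Graph:
    """Toy directed graph with labelled edges, indexed in both directions."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # source -> [(label, target)]
        self.in_edges = defaultdict(list)   # target -> [(label, source)]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def users_in(self, region):
        """Follow edges backwards from a region, collecting 'lives in' sources."""
        users, stack, seen = set(), [region], {region}
        while stack:
            node = stack.pop()
            for label, source in self.in_edges.get(node, []):
                if label == "lives in":
                    users.add(source)
                elif source not in seen:
                    # A smaller region contained in this one: keep walking.
                    seen.add(source)
                    stack.append(source)
        return users


g = Graph()
g.add_edge("Ueno", "district of", "Tokyo")
g.add_edge("Tokyo", "city of", "Japan")
g.add_edge("Osaka", "city of", "Japan")
g.add_edge("user:hana", "lives in", "Ueno")
g.add_edge("user:ken", "lives in", "Tokyo")
g.add_edge("user:mei", "lives in", "Osaka")

in_tokyo = g.users_in("Tokyo")
in_japan = g.users_in("Japan")
print(sorted(in_tokyo), sorted(in_japan))
```

Note that users and regions are different kinds of node in the same graph, and that a user can attach to a district, a city, or a country without any schema change.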
There are several key differences between the network and graph models.
- The network model has a schema which regulates which record types can be nested within which others, a limitation that the graph model does not have.
- To access data in the network model, you must traverse to it. In a graph database, you can find nodes directly by their unique identifiers, and nodes can also be indexed, making them much easier to find.
- The network model stores the children of a record as an ordered set, whereas in the graph model nodes and edges are unordered.
Another advantage of the graph model is that many tools and databases have already been built for it. Key databases include:
- Amazon Neptune
- Neo4j
- TigerGraph
Conclusion #
These are the four major data models that have defined database systems over the past several decades. It is important to understand the differences between the models when designing a system and picking a database, as picking the wrong one may significantly complicate the building of a new system.