Data Modelling is a critical component of any comprehensive data strategy. Defining, analysing and understanding how your data currently fits together and designing and building the ideal data model off the back of this, provides the best foundation for wider data programmes and reveals opportunities for truly unlocking value. We help many of our clients with data modelling and have laid out some of our top tips below, on how to build and maintain a robust data model which in turn helps the enterprise function efficiently and profitably.
Definition of data modelling
Data modelling is the process of discovering, analysing and scoping data requirements, and then representing and communicating these data requirements in a precise form called the Data Model. Data Models depict and enable an organization to understand its data assets. Data modelling is a critical component of data management. The modelling process requires that organizations discover and document how their data fits together. The modelling process itself designs how data fits together.
Shown below is the SIPOC diagram for Data Modelling (image sources DAMA DMBOK).
* SIPOC diagram is a Six Sigma tool used for documenting business processes. The word SIPOC stands for Suppliers, Inputs, Process, Outputs, and Customers which form the columns of the diagram. SIPOC diagram visually documents a business process from beginning to end.
Tip 1: Consider the level of detail required carefully
Models exist at three levels of detail: conceptual, logical, and physical.
- Conceptual Model: A Conceptual Model at this level is the representation of business concepts at the enterprise level and the relationship of the business entities. This model should always be used when interacting with C-level executives. They will understand the conceptual model better than any other model because it represents the data assets across the breadth of the enterprise at its highest level. In the conceptual model these data assets will usually map to business capabilities in its current state or in its target state.
- Logical Model: A Logical Model is a more detailed version of a data model than a conceptual model but not at its most granular level. Most often a logical model is focussed on a particular subject area within the enterprise. It is a platform-agnostic representation of data requirements and business rules governing that subject area. Product owners, programme/project managers or business analysts would typically fall into the category who might best understand the logical model.
- Physical Model: The Physical Model represents a technical solution of the logical data model adapted to work with a specific set of hardware, software and network tools. It is the physical model which gets translated into a physical implementation in a database. Because the physical data model accommodates technology limitations, structures are often combined (denormalized) to improve retrieval performance, as shown in this example with Student and School. These are targeted more towards DBAs, Database Developers, ETL Developers or Support staff.
Tip 2: Always have a top down approach when designing a data model
It is always a good practice to start with the conceptual model to understand what is most important to the business. Eventually, all the data assets which are crucial for smooth running of the business processes should be mapped into the conceptual model. Ensuring that all the important data entities are mapped into the conceptual model minimises the risk of having a gap in the enterprise data model (EDM). This will save a lot of time and effort in the long run and will avoid rework when going into detailed designs.
There are frequently physical databases on distributed platforms which move data from system to system and the modelers are asked to design the physical model first. In effect, they are asked to take a bottom-up approach to map all the existing databases, tables and other data assets into the physical data model. This physical model is then rolled up into a logical and a conceptual model. What transpires from that is a conceptual model which might have gaps in it (in terms of the business capabilities), or the model may have misrepresentation of entities or have badly defined relationships. For example, for an investment bank one business area might say a customer is any individual or organisation which has at least enquired about the products, whereas another business area might define a customer as a revenue-generating entity.
All these issues can be resolved when approaching the task of data modelling from a top-down perspective.
Tip 3: Keep the model as simple as possible
It is very easy for a data model to become a web of endless entities and data assets at any level of detail, especially in the physical model. A good data modeler should always strive to keep the model as simple as possible and easily readable across lots of different roles and purposes. The more self-explanatory a model is, the easier it is to maintain. An end-user will always want to use the data as easily and quickly as possible. For a data warehouse, a de-normalised model will be the user’s first preference because the data will be readily available without too much hassle. This means that catering to a specific requirement is a good practice to keep the model simple and readable.
Tip 4: The ability to evolve is the ultimate test of a good data model
A data model is never a static deliverable. It is an ever evolving and expanding artefact. Many times, a small business change can have a big impact on the existing implementation. Trying to be pro-active and keeping room for expansion is always better than dissecting a good model and stitching it in a number of places. This is easy to say in theory but can be difficult to implement practically, but sometimes a clear data model can emerge and tells us the best way to go about building it. For example, suppose we identify a Customer entity as a single individual and build relationships to this customer entity across our model, what would happen when a group of individuals (like an organisation) want to act as a single customer? A data modeler should be pro-active enough to gauge this and instead of building relationships to this customer entity, the modeler can create a super-type called ‘Party’ and create sub-types ‘Customer’ and ‘Organisation’ and then create relationships from this ‘Party’ super-type to the rest of the model.
Tip 5: A data model should always be supported by robust documentation
When it comes to documentation, the more the better but despite this, rarely do we see solid documentation around any deliverable. Having sufficient and updated documentation is necessary to a greater extent when it comes to data models because a model can be interpreted by the reader as they wish. It would be tragic to build a very good data model and have the implementation fall flat just because the DBA couldn’t understand the data model. There are plenty of documents which can support a good data model for example Data Dictionary, Business Glossary and Data Catalogue. Putting that effort into building a good documentation will go a long way in keeping that data model accurate and relevant.
With the right Data Model in place, you can expect to reduce costs and complexity, improve processes and collaboration and drive efficiencies within your organisation. If you are curious about how data modelling works in practice or would like help unlocking the promise of data modelling within your organisation, please get in touch with our experts.