Previously, I outlined our simplified approach to leveraging data as a strategic asset to create a competitive advantage. This methodology groups activities into three distinct technology focus areas: Data Integration, Data Management, and Data Presentation. Successfully delivering these capabilities to the enterprise also requires a best-practices approach to handling the organizational dynamics involved.
This article focuses on Data Management, the second part of this methodology.
Data Management is about recognizing your data as an asset and one of your organization’s most valuable resources. You must thoughtfully collect, store, model, and govern your data in a way that optimizes the performance of your data-driven applications. Together, these tasks streamline your data lifecycle and enable a data-driven organization by moving data closer to the point of action.
Data Collection is about deciding which data you collect and how you manage it once it is collected. At first glance, many customers will say that they simply want to collect all data and keep it forever. While this may be a noble goal, it is rarely the most effective approach, because it quickly turns into an attempt to boil the ocean. As a big believer in agile development practices, which deliver value incrementally over many iterations, I always recommend starting small: focus on the data that is essential to the business process, and decide what the retention policy for each type of that data should be.
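To make "decide what the retention policy should be" more concrete, here is a minimal sketch of retention rules keyed by data classification. The classification names, the retention periods, and the `Record`, `is_expired`, and `apply_retention` names are all hypothetical placeholders for illustration, not recommended values or any particular product's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data classification (illustrative assumptions only).
RETENTION_DAYS = {
    "transactional": 7 * 365,
    "operational": 2 * 365,
    "telemetry": 90,
}


@dataclass
class Record:
    record_id: str
    classification: str  # must be a key in RETENTION_DAYS
    collected_on: date


def is_expired(record: Record, today: date) -> bool:
    """True when the record is older than its classification's retention period."""
    limit = timedelta(days=RETENTION_DAYS[record.classification])
    return today - record.collected_on > limit


def apply_retention(records: list[Record], today: date) -> list[Record]:
    """Keep only the records that are still inside their retention window."""
    return [r for r in records if not is_expired(r, today)]


if __name__ == "__main__":
    today = date(2024, 1, 1)
    records = [
        Record("a", "telemetry", date(2023, 12, 1)),      # 31 days old: kept
        Record("b", "telemetry", date(2023, 6, 1)),       # 214 days old: expired
        Record("c", "transactional", date(2020, 1, 1)),   # about 4 years old: kept
    ]
    print([r.record_id for r in apply_retention(records, today)])  # ['a', 'c']
```

Starting with a short, explicit table like `RETENTION_DAYS` fits the "start small" advice: new classifications can be added iteratively as the business identifies more essential data. An unknown classification raises a `KeyError`, which forces every collected record to be classified before it is retained.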
Data Storage is about supporting the data lifecycle by deciding how best to store data so that it can be processed further and can support business activities. This means choosing the best storage method for these processes based on several variables, while applying business rules based on data classifications. A few variables that affect the data storage strategy are: Is the data structured or unstructured? At what point in the data lifecycle is it being stored? What additional processing, if any, will the data need? Even in a simplified data process, you may have multiple storage points supporting the data throughout its lifecycle as it moves closer to the point of action. For example, you could initially store data in a data lake, apply data integration techniques and transformations to load it into a data warehouse, and then move and further process it into a data mart or analytics application that is specific to a business process and closer to the point of action. When deciding where and how to store data, it is extremely important to consider not only how the data will be used today but also how it could be leveraged in the future.
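The lake, warehouse, and mart progression can be sketched with in-memory stand-ins for each storage point. This is a simplified illustration under stated assumptions: the event shape, the "sales by region" mart, and the function names `land_in_lake`, `load_warehouse`, and `build_sales_mart` are hypothetical, and a real implementation would use actual lake, warehouse, and mart technologies rather than Python lists and dicts.

```python
from collections import defaultdict

# In-memory stand-ins for three storage points (assumption: real systems would use
# object storage for the lake, a warehouse engine, and a mart or analytics tool).


def land_in_lake(raw_events: list[dict]) -> list[dict]:
    """Data lake: keep every event as received, with no schema enforced."""
    return list(raw_events)


def load_warehouse(lake: list[dict]) -> list[tuple[str, str, float]]:
    """Data warehouse: promote well-formed sales events as (region, product, amount) rows."""
    rows = []
    for event in lake:
        if event.get("type") != "sale":
            continue  # non-sales events stay in the lake for possible future use
        try:
            rows.append((event["region"].upper(), event["product"], float(event["amount"])))
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # malformed events are not promoted
    return rows


def build_sales_mart(warehouse: list[tuple[str, str, float]]) -> dict[str, float]:
    """Data mart: total revenue by region, shaped for one business process."""
    totals: dict[str, float] = defaultdict(float)
    for region, _product, amount in warehouse:
        totals[region] += amount
    return dict(totals)


if __name__ == "__main__":
    events = [
        {"type": "sale", "region": "east", "product": "widget", "amount": "10.5"},
        {"type": "sale", "region": "west", "product": "widget", "amount": 4},
        {"type": "sale", "region": "East", "product": "gadget", "amount": 2},
        {"type": "click", "page": "/home"},
        {"type": "sale", "region": "west", "product": "gadget"},  # missing amount
    ]
    mart = build_sales_mart(load_warehouse(land_in_lake(events)))
    print(mart)  # {'EAST': 12.5, 'WEST': 4.0}
```

Note that the click event and the malformed sale remain in the lake even though they never reach the mart; keeping raw data available is one way to leave room for how it "could be leveraged in the future."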
Data Modeling produces a structured representation of the data, tailored to the context in which the data will be used. The model must be easy to use and understand, meet the needs of business users, and support the associated business processes. Effective data models must be designed around the process they are meant to support. This is another area where agile development practices play an important role: again, I recommend starting small and progressively enhancing the model to make it more complete.
In summary, data management plays a critical role in supporting the data lifecycle and in building the foundation that makes streamlining that lifecycle possible. Organizations should carefully plan their data management strategy and consider how it can support the broader needs of the organization.