SoftScience Group, Inc.

Information Naturalization Principles

Information Naturalization uncovers the natural structure of declarative knowledge

By David Neal
President, SoftScience Group, Inc.

Knowledge has structure. Computer systems are designed as machine representations of knowledge. The form of those representations of knowledge within a computer system should be conceptually understood in new ways in order to better design reusable knowledge components.

Broadly speaking, there are two significant types of human knowledge: declarative knowledge and procedural knowledge.

There is also a distinction between data and information. The difference between data and information is that information has context. Information can be thought of as collections of related data stored in such a manner that the relationships that those data groupings have with other instances of data groupings are preserved.

Declarative knowledge is the portion of information that is descriptive of some real world entity. Declarative knowledge always has one or more state values, meaning it always has data values associated with the instances of real world entities. Those state (data) values need to be preserved in the temporal space (over time). The idea of preserving data values over time is called persistence. Declarative knowledge requires persistence.

Declarative Knowledge

(Credit ‐ illustration by SoftScience Web Media)

Procedural knowledge is descriptive of some real-world process. Procedural knowledge transforms real world entities from one form to another. Procedural knowledge transforms the declarative knowledge associated with real world entities from one form to another. Procedural knowledge is a consumer of declarative knowledge at the beginning and during the process and is a producer of declarative knowledge at the back end of the process.

Procedural Knowledge

(Credit ‐ illustration by SoftScience Web Media)

Since knowledge is an asset of any human organization or endeavor, the ability to reuse and redeploy knowledge is an important strategy for human productivity.

Computers have extended our reach in terms of knowledge management. Over time, software inventions have given us the tools to improve the organization and management of the knowledge that we possess. Higher levels of abstraction in programming languages and environments have made it easier to model procedural knowledge. Relational algebra and other computational inventions have revolutionized the storage of information, providing the needed persistence and structure for declarative knowledge.

We are now at the point that we should begin to understand the structure of knowledge in even broader, more universal ways in order to better reuse and redeploy our knowledge asset base.

Information Naturalization is a term that we first began to use in 1991 to describe this idea of the natural structure of declarative knowledge and how to recognize, define, and describe it. Declarative knowledge naturally comes in related groups of data items which collectively describe and define a particular instance of some object in the real world. Information Naturalization is a set of observations about how to determine the natural groupings of declarative knowledge and translate those knowledge groupings into appropriate units for persistent storage and future use.

During that same period, the late 1980s and early 1990s, the value of reusable procedural knowledge structures was becoming understood as part of the evolving programming language paradigms. The design of these reusable procedural knowledge structures was termed object-oriented design.

Object-oriented design (and programming) was essentially the procedural knowledge equivalent of Information Naturalization. By examining the real world instances of objects and looking at their behaviors and internal processes along with their dependent declarative knowledge (consumptive inputs and resultant outputs), computer scientists realized that procedural knowledge could be efficiently modeled in reusable software code components called objects. We termed the study of how to break this type of transformative procedural knowledge down into more basal component structures Information Functionalization.

Since object-oriented design and programming are now widely understood, we will focus more on Information Naturalization than on Information Functionalization. Both are critical to the reuse of knowledge, but declarative knowledge has always been thought of as the dependent sibling of procedural knowledge.

Information Naturalization is based on an unconventional way of looking at the problem domain of real world systems. Our research asserts that declarative knowledge and procedural knowledge are not co-equal in importance in the real world. Neither is declarative knowledge subordinate to procedural knowledge.

We assert that the declarative knowledge component is actually the primary independent portion of knowledge. Software systems should be designed based on the idea that procedural knowledge operates on a subset of the declarative knowledge rather than continuing to view the declarative knowledge as simply a set of raw materials for the process(es). Declarative knowledge should be understood first and its stored structure should be designed for maximum reuse within an enterprise's software architecture.

Procedural knowledge instances are logically subordinate to the existence of the declarative knowledge. A thing exists first before it can be consumed / used by a process. Understanding the declarative knowledge associated with something, including its identification and relationships to other things, is a necessary precondition for the understanding of the procedural component of knowledge.

We further assert that other modeling approaches, including most object-oriented modeling strategies, start with a bias toward the procedural knowledge component because software is fundamentally about performing some process(es). The analyst/designer of software is naturally influenced to focus on elements of declarative knowledge that directly impact the process(es) envisioned for the software system and ignore the pieces of related declarative knowledge that are not immediately useful to the task (or process) at hand.

The designer's bias towards process is understandable. Software, because of the early hardware limitations, started in the service of automating smaller repetitive tasks (processes) with limited declarative knowledge consumption and production. As computing machines became more powerful and persistent storage more feasible, software evolved into more complex sets of procedural knowledge requiring more complex sets of declarative knowledge in order to perform their functions.

By stepping out for a much wider analysis view first, and using that more holistic view in design, the reusability of the declarative knowledge component of computer systems can be dramatically expanded. The author believes that fully leveraging the declarative knowledge component of a business entity is actually the most significant information technology goal that a small business should pursue in order to successfully traverse the business formalization interval of an organization's lifecycle.

During the business formalization interval, as an enterprise grows, the increasing complexity of the decision-making information flow can overwhelm the existing intellectual capital of the key personnel.

(Credit ‐ Shutterstock)

The declarative knowledge of the business is generally the most important portion of its intellectual capital. Declarative knowledge not captured and retained is effectively intellectual capital squandered. It is for this reason that Information Naturalization puts emphasis on the persistence aspect of the declarative knowledge and the state aspect of the declarative knowledge contained in the instances of real world object classes.

What all this really means is that there should be a much more sophisticated decoupling of data from process in software systems. Procedural knowledge should be reusable (reapplied) across multiple instances of declarative knowledge, and declarative knowledge should be viewed as independent so that it can be reused by multiple distinct instances of procedural knowledge (different processes).
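As a minimal sketch of this two-way reuse (the record fields, function names, and sample values below are illustrative assumptions, not part of any SoftScience design), a plain data record can be consumed by several independent processes, and each process can be applied to many records:

```python
from dataclasses import dataclass


# Declarative knowledge: a plain record holding state, with no
# process-specific behavior attached. Field names are hypothetical.
@dataclass(frozen=True)
class Person:
    name: str
    phone: str
    city: str


# Procedural knowledge: two independent processes that consume the
# same kind of record without owning it.
def dialing_entry(person: Person) -> str:
    """Process 1: build one line of a phone list."""
    return f"{person.name}: {person.phone}"


def mailing_region(person: Person) -> str:
    """Process 2: derive a regional grouping for mailings."""
    return person.city.upper()


people = [
    Person("Ann Lee", "555-0100", "Tulsa"),
    Person("Bob Ray", "555-0101", "Dallas"),
]

# One process reused across many instances of declarative knowledge.
phone_list = [dialing_entry(p) for p in people]

# One instance of declarative knowledge reused by a different process.
region = mailing_region(people[0])
```

Neither function needs to know that the other exists, and the `Person` record does not need to change when a new process is added.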

Object-orientation is appropriate for modeling activities; Information Naturalization is appropriate for modeling information

Object-orientation is appropriate for modeling activities, i.e. procedural knowledge. Systems for managing persistent declarative knowledge need a different model. Information Naturalization provides that framework.

The declarative knowledge persists in a natural structure with groupings of related information and associations between those groupings operating in a layer that is independent of the actions or activities performed using that declarative knowledge.

Information Naturalization provides guidance about where to draw the informational (system) boundary around related groupings of information such that the group of related information items operates as an instance of a discrete real world entity. These entity instances make up the data/declarative knowledge layer.

The activities and processes (procedural knowledge) performed using the data/declarative knowledge layer are organized in terms of interfaces designed to model real-world actors who would perform logically related tasks in a particular contextual manner (business function/departmental function/job task set, etc.).

The reason this alternative/extension to object-oriented modeling is needed is that real world business information systems require persistence of significant amounts of declarative knowledge (which can't be derived procedurally).

The object-oriented paradigm fundamentally does not handle persistence well, particularly associations between instances of objects, in part because the methodologies are based first on modeling processes and activities rather than the structure and associations between instances of persistent declarative knowledge.

Data management paradigms

Data Mining – Capture all the unstructured transactional data and use algorithms to infer meaning from that data later.

Information Naturalization – Capture the transactional data and all its contextual relationships at the time of capture, whether or not these relationships are relevant to a particular process.

Data Normalization

(Credit ‐ illustration by SoftScience Web Media)

The critical difference between these paradigms is that information naturalization captures relationships that are factual, whether or not they are used, while data mining assumes it can infer relationships after the fact from the data instances.

Data mining of unstructured data can never be factual to the same degree as information captured and contemporaneously stored in a naturalized structure. Data mining can be useful for inferring behavior profiles but would not be particularly reliable for doing hard quantitative processes such as business and financial tracking or design and engineering analysis.

Information Naturalization Example

Consider the example of a simple application to keep track of the supplier contact numbers for a small construction firm. A simple process that allows the user to enter and look up phone numbers for supplier contacts could be built using a very simple data model that includes three entities: company, person, and phone. This declarative knowledge portion of the application would work perfectly well for the limited process envisioned.

ER Diagram - simple business contacts model

(Credit ‐ illustration by SoftScience Web Media)
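The simple three-entity model might be sketched as follows. The diagram itself is not reproduced here, so the nesting (each person belongs to exactly one company, each phone to exactly one person) is an assumption about what such a minimal model would look like, and all names and numbers are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phone:
    number: str


@dataclass
class Person:
    name: str
    phones: List[Phone] = field(default_factory=list)


@dataclass
class Company:
    name: str
    # Assumption: a person is recorded under exactly one company.
    people: List[Person] = field(default_factory=list)


def lookup_numbers(companies: List[Company], company_name: str,
                   person_name: str) -> List[str]:
    """Return every phone number stored for a named supplier contact."""
    numbers = []
    for company in companies:
        if company.name != company_name:
            continue
        for person in company.people:
            if person.name == person_name:
                numbers.extend(phone.number for phone in person.phones)
    return numbers


suppliers = [
    Company("Acme Lumber", [Person("Pat Doe", [Phone("555-0100")])]),
    Company("Best Concrete", [Person("Sam Roe", [Phone("555-0101"),
                                                  Phone("555-0102")])]),
]

found = lookup_numbers(suppliers, "Best Concrete", "Sam Roe")
```

This is enough for entering and looking up numbers, but the structure has no place to record that the same person also deals with a second company or acts in a second capacity.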

As the construction firm grows, it opens up more locations and does business with more vendors. It also decides that it needs to more formally track contact information for its subcontractors, customers, potential customers, and employees. Rather than maintaining a separate contacts application for each of these purposes, the firm now wants an integrated software solution for managing contacts.

Utilizing that contact data across multiple functions within the construction firm, without having to reenter the data in multiple systems, will significantly leverage the organization's intellectual capital and improve the company's operational efficiency. Now that the construction firm is larger and more complex, its founders have realized that managing their contact (and other) data is critical to their continued growth. They understand now that, just like their Bobcat track loaders, the accumulated knowledge needed to run their business is a valuable asset that they must take proper care of.

After looking at several larger software design solutions, they realize that the declarative knowledge requirements for a more robust contact management system are more complex than the simple data items that they have been collecting. All of the contact data that their employees have entered in the past is incomplete in terms of the functionality that they now wish to deploy. In order to utilize all the accumulated contact data, someone is going to have to research the existing contacts and enter the additional data items needed to make the enhanced system functional.

If the original software solution used by the young construction firm had been designed with a separate declarative knowledge data layer built using information naturalization principles, the information that they now find missing could easily have been entered at the time the company name, person name, and phone number were being entered into the simple original system.

By looking holistically at the real world use of contact information and the relationships between people, organizations, locations, and communications modalities, a naturalized data model for contact information could be built.

ER Diagram - 'Naturalized' business contacts model

(Credit ‐ illustration by SoftScience Web Media)

By capturing all the associations (possible relationships) between people, companies, locations, and their possible roles, the contacts knowledgebase can be extended to support multiple departmental business functions ranging from supply chain management (SCM) to human resources (HR) to sales (POS) to customer relationship management (CRM).

Individual people can be part of multiple relationships with a company and can have relationships with multiple companies. People can fill multiple roles within different organizations or even within the same organization. A naturalized data layer makes a single instance of the contact information for that person available to all the individual relationships that the person participates in.
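One way to picture this is a small relational sketch in which the relationship itself is stored as its own record. The table and column names, the sample rows, and the use of SQLite are all illustrative assumptions and are not taken from the naturalized model in the diagram:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    phone TEXT
);
CREATE TABLE organization (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Association entity: one row per relationship between a person
-- and an organization, so a person can appear many times.
CREATE TABLE affiliation (
    person_id INTEGER NOT NULL REFERENCES person(id),
    organization_id INTEGER NOT NULL REFERENCES organization(id),
    role TEXT NOT NULL,
    PRIMARY KEY (person_id, organization_id, role)
);
""")

conn.execute("INSERT INTO person VALUES (1, 'Pat Doe', '555-0100')")
conn.executemany(
    "INSERT INTO organization VALUES (?, ?)",
    [(1, "Acme Lumber"), (2, "Best Concrete")],
)
conn.executemany(
    "INSERT INTO affiliation VALUES (?, ?, ?)",
    [
        (1, 1, "sales contact"),    # two roles within one organization
        (1, 1, "billing contact"),
        (1, 2, "estimator"),        # a role at a second organization
    ],
)

# The contact information lives in a single person row, so one update
# is visible through every relationship the person participates in.
conn.execute("UPDATE person SET phone = '555-0199' WHERE id = 1")

rows = conn.execute("""
    SELECT o.name, a.role, p.phone
    FROM affiliation AS a
    JOIN person AS p ON p.id = a.person_id
    JOIN organization AS o ON o.id = a.organization_id
    ORDER BY o.name, a.role
""").fetchall()
```

Each departmental application (SCM, HR, CRM, and so on) would query the same `person` row through whichever affiliations it cares about, rather than keeping its own copy of the phone number.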

The naturalized contacts data model could still be used in a simplified application where not all the possible contacts knowledge is initially captured, if the cost of acquiring the unused data elements cannot currently be justified. If the simple data initially goes into a naturalized database structure, however, the additional data can be more easily back-filled later when needed.

On the other hand, if the database structure supports retention of the more complex naturalized knowledge and the application software interfaces are sophisticated enough to lower the cost of capturing the additional information, the construction firm would be able to grow its information asset organically and would already have a richer data set of contact information available when their business needs reach the point of requiring the additional depth of contacts information.

Information Naturalization provides perspectiveless contextual structure

The goal of Information Naturalization is the construction of a process-neutral contextual structure for information. Something that can be described exists before you can do anything with it. Its descriptive attributes are declarative knowledge that documents the instance of a thing and its relative associations with other things in the real world.

Information systems that can properly store the declarative knowledge about the thing must have a structure rich enough to handle all the natural interactions and associations with other things that could reasonably occur over time. Populating such a naturalized data layer as completely as possible when the thing first interacts with the information system's interface creates a contextual structure for that thing's existence in the real world and in the software model of the real world that the information system seeks to provide.

With current technology, by applying the principles of Information Naturalization to the design of persistent declarative knowledge storage, we can build systems that approach the natural world richness in terms of possible information entity interactions. Information is no longer understood only from the perspective of the enterprise that deploys the information systems. The same information can be shared and understood across multiple enterprises because the contextual structure of the data layers is based on the 'natural' structure of the possible associations that can exist in the real world between instances of the things being modeled.

One of the key aspects of the de-coupling of declarative knowledge from specific processes is that it allows natural understanding of the possible interactions between groupings of related information (information entities) from multiple perspectives. The essence and richness of the information entity is not tied to the perspective of a particular process or set of processes. Rather, analysis using Information Naturalization principles tries to examine how that information entity interacts with other information entities in the real world.

Expanding the examination scope to the real world space makes the understanding of the information more perspectiveless. For example, in the business contacts model discussed earlier for a person holding a position with a company, the description of that position from the employer's perspective (such as job title) may be different from the description of the person's position from an outside customer company's perspective (contact role).

Floor trusses.

(Credit ‐ SoftScience Web Media)

For a truss manufacturer, a person may hold the job title of 'sales engineer' which describes the role the individual plays within the truss company. For customers of the truss company, the role(s) that the individual fills from their perspective could be several distinct contact/service functions that are important for the organization to track separately.

A particular customer company business process may need to know which individual at the supplier company to call to procure engineering documents about a truss system, so that the documents can be included with a building permit application for an upcoming project. A different customer business process may need to know which person at the truss supplier handles the logistics and scheduling for a project that is operational. From the truss company's perspective, the same 'sales engineer' may be tasked with both responsibilities.

A naturalized information architecture would make the important distinction between a job title and a functional role. The design would also recognize that an individual with a job title may have multiple roles and that the role(s) that the person fills may be different depending on the perspective of the party utilizing the information in their own business process.
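The distinction between a job title and a perspective-dependent functional role can be sketched as two separate record types. This is only an illustration under assumed names (`Position`, `ContactRole`, the company names, and the role labels are all hypothetical), not the design of any actual SoftScience data layer:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Position:
    """The employer's own description of what a person does."""
    person: str
    employer: str
    job_title: str


@dataclass(frozen=True)
class ContactRole:
    """A functional role as seen by one particular organization."""
    position: Position
    observer: str  # the organization whose perspective this reflects
    role: str


def roles_seen_by(observer: str, contact_roles: List[ContactRole]) -> List[str]:
    """List the roles that a given organization tracks, in sorted order."""
    return sorted(r.role for r in contact_roles if r.observer == observer)


# One position at the truss manufacturer, with a single job title.
sales_engineer = Position("Pat Doe", "Truss Co", "sales engineer")

# The same position fills different roles for different customers.
contact_roles = [
    ContactRole(sales_engineer, "Builder A", "engineering documents contact"),
    ContactRole(sales_engineer, "Builder A", "delivery scheduling contact"),
    ContactRole(sales_engineer, "Builder B", "quote contact"),
]

builder_a_view = roles_seen_by("Builder A", contact_roles)
```

Because the job title lives on the position and the roles live on separate records keyed by observer, adding a new customer perspective does not require changing how the truss company describes its own employee.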

Obviously, there are practical and economic limits to an open-ended study of how everything could relate to everything else in the real world, but expanding one's understanding of the declarative knowledge use beyond the immediate task at hand opens up the possibility for an easy move into the adjacent possible as the needs of an enterprise change.

Data as an asset requires its reuse

In the age of production automation, just-in-time supply chains, digital customer relationship management, and many other modern automated business practices, an increasingly important component of a business enterprise's assets is its information/knowledge base. Even if it makes a physical product, an organization's internal information resource is possibly its most important asset because it drives everything else the business does.

Effective reuse of declarative knowledge across multiple applications allows the management of a business to better leverage its critical intellectual capital. For this reason, reuse of the declarative knowledge base of a business is a critical component of effective asset deployment for that business.

From its inception in 1991, SoftScience Group, Inc. has been designing and building naturalized data layers that support most small business functions. We have also developed specialized extensions to those naturalized data layers for specific industries. These extensions expand the functionality of the enterprise data layer into specialized industry-specific areas of operation.

This architecture allows businesses to leverage their intellectual capital as they expand by having a place to capture the declarative knowledge the business generates as it grows. By utilizing this rich data layer with appropriate applications added as business conditions warrant, the processes of the business can be automated at the appropriate time. This mass customization approach to software deployment allows the information technology side of the business to grow organically. The key is to capture and store enterprise knowledge in a common naturalized data layer at the earliest possible opportunity.

SoftScience Group, Inc. has also been using principles of Information Naturalization and Information Functionalization for more than two and a half decades to produce sophisticated software solutions for some of the largest companies in the United States. Our solutions have been deployed within the petroleum industry, the construction industry, and for numerous smaller businesses.