by Daniel Lambert

Data is everywhere. There were about 21.6 billion devices (phones, cameras, and sensors) of all types connected to the Internet in 2020 [1]. This was about 2.6 devices per person worldwide [2], a number that should grow to 5.0 devices per person by 2025. In terms of data, the numbers are astronomical. In 2020, 64.2 ZB of data was created or replicated worldwide, according to IDC [3]. This is about 3.2 TB of data per connected device per year, or 8.2 TB per person per year. Per day, it represents 22.5 GB of created and replicated data per person in 2020; this number should grow to over 60 GB per person per day by 2025. In comparison, one hour of standard video streaming on Netflix uses about 1 GB [4]. No wonder a growing share of corporate resources is allocated to managing, planning, and architecting data.
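The per-person figures above can be reproduced with a few lines of arithmetic. The sketch below is only a rough check: it assumes decimal units (1 ZB = 10^21 bytes), the world population of 7.8 billion used in footnote [2], and 366 days because 2020 was a leap year. The variable names are illustrative.

```python
# Rough check of the per-person data volumes quoted in the text.
# Assumptions: decimal SI units and a 2020 world population of 7.8 billion.
ZB = 10**21  # bytes in a zettabyte
TB = 10**12  # bytes in a terabyte
GB = 10**9   # bytes in a gigabyte

data_created_2020 = 64.2 * ZB   # IDC estimate cited in the text [3]
world_population_2020 = 7.8e9   # figure used in footnote [2]
days_in_2020 = 366              # 2020 was a leap year

bytes_per_person = data_created_2020 / world_population_2020
per_person_per_year_tb = bytes_per_person / TB
per_person_per_day_gb = bytes_per_person / days_in_2020 / GB

# At roughly 1 GB per hour of standard streaming [4], the daily volume
# equals this many hours of video per person.
streaming_hours_per_day = per_person_per_day_gb / 1.0

print(f"{per_person_per_year_tb:.1f} TB per person per year")  # 8.2
print(f"{per_person_per_day_gb:.1f} GB per person per day")    # 22.5
print(f"about {streaming_hours_per_day:.1f} hours of streaming per day")
```

In other words, the data created or replicated per person each day in 2020 was roughly equivalent to streaming standard video almost around the clock.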

What is Data Architecture?

Data architects create artifacts of a data environment that aligns with the goals and objectives of their organization and its distinctive contextual requirements. According to TOGAF, “Data Architecture describes the structure of an organization's logical and physical data assets and data management resources” [5]. It is an offshoot of enterprise architecture.

Data architecture should not be confused with data modeling, which is the “process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model” [6]. Data modeling takes a more focused view of specific applications, systems, or business cases.

Data architecture is not limited to the set of products and tools an organization operates to manage its external and internal data. There is much more to it. Data architects define the methods used to obtain, transform, and deliver usable data to the organization’s clients and business users. Most importantly, they identify the stakeholders who will use that data and their distinctive requirements. As Wayne Eckerson puts it, “A good data architecture flows right to left: from data consumers to data sources—not the other way” [7].

Modern Data Architecture

Organizations no longer limit themselves to static, IT-driven data architectures, commonly called data warehouses, which take too many resources to implement and change. Today’s data architecture needs to be ready for speed, flexibility, and innovation. The key to a successful data architecture upgrade is agility. As shown in Figure 1 above, a modern data architecture may still include a data warehouse and data marts, but they need to be more flexible, adaptable, and agile. Their use will be limited to generating reports, dashboards, diagrams, and smart applications that only a few casual users view for analysis.

A data warehouse and data marts should be only one of the many elements of a modern data architecture. They are just a portion of a data lake environment, which is closer to a data ecosystem: one that uncovers and rapidly responds to changes, continuously understands and adapts, and delivers governed, tailored access to every stakeholder. These stakeholders range from those working with operational applications to power analytics users and artificial intelligence applications looking for ways to fine-tune the organization’s operations.

A modern data architecture environment should not be confused with a data platform. Data architecture refers to all the engines and data applications that move, shape, secure, and validate data. A data platform is about the database engines (e.g., relational, Hadoop, OLAP, OLTP) that process and integrate data so that data engineers from IT and business stakeholders can collaboratively create datasets for business applications and systems.

Organizations are moving quickly to deploy new data tools alongside legacy infrastructure to drive client-driven innovations such as more personalized digital approaches, real-time alerts, and predictive maintenance. These technical additions, including data lakes, client analytics platforms, and stream processing, have hugely amplified the complexity of data architecture. Without a more modern approach to data architecture, the proliferation and variety of data extracted from just about everywhere in a business’s environment significantly impede its ongoing ability to deliver new business capabilities that provide value, maintain current infrastructures, and safeguard the integrity of the tagged raw data necessary to build artificial intelligence (AI) models.

The steady flow of rapid market changes makes it very costly for organizations to delay moving to a more modern data architecture. Amazon, Facebook, and Google, among others, have been successfully investing in AI innovations that are rapidly disrupting traditional business models, forcing laggards to reshape some facets of their own offering to keep up. Most cloud providers now offer serverless data platforms that can be used instantly, enabling early adopters to benefit from a faster time to market. Data analytics now relies on automated model-deployment platforms that enable quicker use of new models. More and more businesses are adopting application programming interfaces (APIs) to share and synchronize data between disparate systems and applications within their data lakes. This gives them a real-time view of what is really going on in their ecosystem, so that they can rapidly understand new insights and integrate them directly into their operational applications.

Six Foundational Data Architecture Shifts

Antonio Castro, Jorge Machado, Matthias Roggendorf, and Henning Soller have identified six foundational shifts that organizations need to make toward a modern data architecture in order to deliver new business capabilities more rapidly and simplify their current architectural model. “Even though organizations can implement some shifts while leaving their core technology stack intact, many require careful re-architecting of the existing data platform and infrastructure, including both legacy technologies and newer technologies previously bolted on” [8]. These six shifts are as follows:

  • From on-premises to cloud-based data platforms,
  • From batch to real-time data processing,
  • From pre-integrated commercial solutions to modular, best-of-breed platforms,
  • From point-to-point to decoupled data access,
  • From an enterprise warehouse to domain-based architecture, and
  • From rigid data models toward flexible, extensible data schemas.
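As one illustration, the shift from point-to-point to decoupled data access can be sketched in a few lines. This is a minimal sketch, not the approach described in the cited article: the class names, the `get_customer` method, and the sample data are all hypothetical. The idea is that consumers depend on a shared interface rather than on each source system directly, so a source can be replaced without changing the consumers.

```python
from typing import Protocol


class CustomerSource(Protocol):
    """Interface that data consumers depend on (hypothetical name)."""

    def get_customer(self, customer_id: str) -> dict: ...


class LegacyCrmSource:
    """Stand-in for a legacy system; rows are hard-coded for illustration."""

    def __init__(self) -> None:
        self._rows = {"c-1": ("Ada", "ada@example.com")}

    def get_customer(self, customer_id: str) -> dict:
        name, email = self._rows[customer_id]
        return {"id": customer_id, "name": name, "email": email}


class CloudStoreSource:
    """Stand-in for a newer cloud-based store with the same interface."""

    def __init__(self) -> None:
        self._docs = {
            "c-1": {"id": "c-1", "name": "Ada", "email": "ada@example.com"}
        }

    def get_customer(self, customer_id: str) -> dict:
        return dict(self._docs[customer_id])


def build_greeting(source: CustomerSource, customer_id: str) -> str:
    """A consumer that only knows the interface, not the system behind it."""
    customer = source.get_customer(customer_id)
    return f"Hello, {customer['name']}"


# The consumer code is identical regardless of which source is plugged in.
print(build_greeting(LegacyCrmSource(), "c-1"))
print(build_greeting(CloudStoreSource(), "c-1"))
```

In practice the interface would usually be exposed as an API rather than an in-process class, but the design choice is the same: each consumer integrates once with the access layer instead of once per source system.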

Information Architecture - the Forgotten Part of Data Architecture

Most business and enterprise architects understand business capabilities and their supporting applications. To build a modern data architecture, they should also examine the information concepts that are created, modified, and/or used by business capabilities and stored in one or several databases, as shown in Figure 2 below.

Information architecture is “the structural design of shared information environments. (…) It is a subset of data architecture where usable data (or information concept) is constructed in and designed or arranged in a fashion most useful or empirically holistic to the users of this data” [9]. Information architecture is also about mapping a single source of truth for a domain of information, which is then used to plan software development, customize software applications, build websites, etc. In business architecture, information concepts are standard business terms and semantics. Information concepts are usable data created, modified, and used by business capabilities. In information technology, a database can support or store one or several information concepts.
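The relationships just described (capabilities create, modify, and use information concepts, and databases store them) can be captured in a small data structure. The following is a minimal sketch; the class names, the `databases_for` helper, and the example capability, concepts, and databases are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InformationConcept:
    """A standard business term, such as "Customer" (example only)."""

    name: str
    databases: set[str] = field(default_factory=set)  # where it is stored


@dataclass
class BusinessCapability:
    """A capability and the concepts it creates, modifies, and uses."""

    name: str
    creates: set[str] = field(default_factory=set)
    modifies: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)

    def concepts(self) -> set[str]:
        """All concept names this capability touches in any way."""
        return self.creates | self.modifies | self.uses


def databases_for(
    capability: BusinessCapability, catalog: dict[str, InformationConcept]
) -> set[str]:
    """Databases holding any concept the capability depends on."""
    result: set[str] = set()
    for concept_name in capability.concepts():
        result |= catalog[concept_name].databases
    return result


catalog = {
    "Customer": InformationConcept("Customer", {"crm_db"}),
    "Order": InformationConcept("Order", {"sales_db", "warehouse_db"}),
}
order_management = BusinessCapability(
    "Order Management",
    creates={"Order"},
    modifies={"Order"},
    uses={"Customer"},
)

print(sorted(databases_for(order_management, catalog)))
# ['crm_db', 'sales_db', 'warehouse_db']
```

A mapping like this makes it possible to answer questions such as which databases a capability depends on, or which concepts are stored in more than one place and may lack a single source of truth.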

Information mapping allows the creation of visual representations of the usable data required to ensure that a business capability is performing well. As pointed out by Sam Forouzi, to succeed in examining your information concepts, you will need to dig deeper and ask the following questions. “What information … to have? … to capture? … to create? … to share? … may be public? … must be private? … must be logged or audited? … to see? … to sell? …. Is needed in the future? Who … needs it? … creates it? … enters it? (…) Where does the information need to be … captured? … created? … used? … shared? When does information need to be … captured? … created? … used? … shared? How does the information need to be … captured? … created? … used? … shared?” [10] Completing the information relating to business capabilities will also allow better, smoother, and quicker planning of business process modeling (BPM) and UX design.

Conclusion

More and more digital transformations need to include the modernization of the organization’s data architecture. That architecture needs to be agile and allow more speed, higher flexibility, and easier implementation of innovations. A modern data architecture should also include information architecture to accelerate planning for software development, customized software applications, websites, etc. Without a proper, modern data architecture, traditional organizations run a greater risk of becoming laggards and increasingly insignificant in tomorrow’s world.

_______________________________________________________________________________________________________
[1] This number is extracted from this table entitled “Internet of Things (IoT) and non-IoT active device connections worldwide from 2010 to 2025” from a Statista article published in November 2020.
[2] This number is 20.6 billion devices divided by 7.8 billion people worldwide in 2020. This second number is extracted from this Worldometers' webpage.
[3] Data extracted from the IDC Global DataSphere and StorageSphere Forecasts published in March 2021 by IDC.
[4] Data extracted from this article entitled “How much data are you using to stream your favorite Netflix shows?” published in March 2020.
[5] Data architecture definition according to TOGAF.
[6] This definition is from DMBOK v2, written by the Global Data Management Community.
[7] Quote from this article entitled “Ten Characteristics of a Modern Data Architecture” published in November 2018.
[8] The 6 shifts to modernize data architecture are described in detail in this article entitled “How to Build a Data Architecture to Drive Innovation—Today and Tomorrow” written by Antonio Castro, Jorge Machado, Matthias Roggendorf, and Henning Soller in June 2020 in McKinsey & Co.
[9] Definition of Information Architecture according to Wikipedia.
[10] Quote extracted from an article entitled “Information Architecture - The (Forgotten) Part of Architecture” written by Sam Forouzi in April 2021 on LinkedIn.