At a recent meeting of lawyers at a local bar association, the topic of discussion was cyber and information security.
With the Facebook and Cambridge Analytica scandal, Google trustbusting, Russian election tampering, and, most recently, the hacking of Social Security numbers, bank details, and medical data of nearly 12 million Quest Diagnostics patients all in the news, the timing was perfect.
The host of the event, the CEO of a local cybersecurity company, asked the question: “How many people in the crowd know the difference between structured and unstructured data, and how that difference impacts cybersecurity?” You could hear the proverbial pin drop.
“Can I see a show of hands?” the host asked. There were none, showing that for lawyers, there remains a steep learning curve ahead. An important part of that curve is understanding the difference between structured and unstructured data, because each class of data presents unique challenges to cyber and information security.
What is structured data?
It’s about the type of database. Normally, structured data is the kind of data that is kept in relational databases and laid out visually in rows and columns, Excel-sheet style. Stored this way, with every value sitting in a predefined field, the data is easier for algorithms and data mining tools to search, access, and analyze.
There are common examples of its usage that are easy for a lay person (lay in the IT sense, not the legal sense) to appreciate. These include debit card and banking activity; the whole terrain of sales, inventory, and delivery control systems; the online milieu of customer relations; and reservation systems for trains, airlines, hotels and the like.
Taken for granted today, the totals and sub-totals of this kind of structured data traditionally are digested and analyzed by companies big and small to make all sorts of decisions, allow or deny permissions, and generally to formulate business policy on a micro and macro scale.
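For readers who want to see the rows-and-columns idea in action, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and amounts are all invented for illustration; a real banking or sales system would be far larger, but the principle is the same.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_activity (account TEXT, merchant TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO card_activity VALUES (?, ?, ?)",
    [
        ("A-100", "Grocer", 4250),
        ("A-100", "Airline", 31900),
        ("B-200", "Hotel", 18000),
    ],
)

# Because every value sits in a named column, a sub-total per account
# is a single, precise query.
totals = dict(
    conn.execute(
        "SELECT account, SUM(amount_cents) FROM card_activity GROUP BY account"
    ).fetchall()
)
print(totals)
```

The point is not the particular query but that the machine never has to guess where the amount or the account number lives.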
How is unstructured data different?
Data that’s deemed unstructured is, ironically, the data of communication, privilege, and proprietary concern. It’s also the stuff that’s not organized into tables, charts and spreadsheets, but instead stored in easily reachable and easily shared formats. For well over a decade now, most of us have used these formats daily, including text and other messaging (WhatsApp, Facebook, etc.), emails, Word documents and PDF files, social media posts (Facebook, Twitter, etc.), and all the various and sundry shared photos, videos, and audio files.
Why is the difference important in terms of cybersecurity?
The human mind and the machine mind are fundamentally different creatures with different strengths and weaknesses. Always keep this in mind when considering any matter of cyber or information security. In a tongue-in-cheek way, it’s man versus machine.
The very thing that makes communications more understandable to the human mind (human language) makes them more enigmatic to machines (which work through algorithms) and thus harder to protect. The flip side is that this in turn makes it easier for a malicious party to hack, access and steal communications (i.e., unstructured data).
How it works.
Some of this tends to the obvious, but some does not. Because unstructured data is not ‘natural’ to machines (it being human language), humans have created technology to aid the machines, i.e., the computer and operating systems and software, but this technology and the associated solutions are relatively new.
For unstructured data, it starts with aggregating all available data relevant to the task involved. Once this is done, the patterns and relationships between the letters read by the technology are identified and named. Databases for structured data, by contrast, confine the data received and allocate it to previously defined fields that they already understand.
The elusive quality of human language creates the challenge for unstructured data. No two people write or speak alike. There are billions of ways to communicate things, many with nuanced or contextual meanings not necessarily apparent in the words selected.
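The contrast described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record, the field name `ssn`, the two sample messages, and the simple pattern used to spot a Social Security number are invented, and the pattern is nowhere near what a real data-protection tool would use.

```python
import re

# Structured: the sensitive value lives in a predefined, named field,
# so finding it is a simple lookup. (Hypothetical record and field names.)
record = {"name": "J. Doe", "ssn": "123-45-6789"}
SENSITIVE_FIELDS = {"ssn"}
sensitive_keys = sorted(key for key in record if key in SENSITIVE_FIELDS)

# Unstructured: the same fact is buried in free text, so the machine
# has to guess at patterns. This regex only matches the NNN-NN-NNNN form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

messages = [
    "Per our call, the client's SSN is 123-45-6789.",
    "Her social is one two three, 45, then 6789.",
]

# True where the pattern finds something, False where it does not.
flagged = [bool(SSN_PATTERN.search(message)) for message in messages]
print(sensitive_keys, flagged)
```

A human reader sees the same sensitive number in both messages. The pattern catches only the first, which is the gap that makes unstructured data harder to protect.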
Different structure means different vulnerabilities and challenges.
In the end, all cyber-discussions boil down to vulnerabilities and how we close or protect against their exploitation.
With today’s technology, because structured data is confined and stored in precisely located databases, it is relatively easy to secure.