Say you need a chair and your friend has one that he would be happy to give you. Before you agree to haul it away in a borrowed pick-up, you want to know more about the chair. Is it comfortable? Is it hard to clean? Will it match your décor? Is it the right size for your space?

So you ask your friend a few questions to get more information. What kind of upholstery does the chair have? How long is it? Does it recline? What color is it? All of these questions are meant to provide context for you to decide whether the chair suits your needs. In other words, you are gathering data about the chair. If you think of the chair as a single piece of data (or stated another way, as a data set containing just one record), then the description of the chair is its metadata. Put simply, metadata is data about data.

Whether your data is a physical object, a digital file, or a row in a spreadsheet, metadata is necessary to understand the value, function, and history of your data. If you learned that your friend’s chair was the Ron Arad Big Easy Chair (image at left) you might think twice before taking it. More realistically, you might learn that the chair reclines, has vinyl upholstery, and is 84” wide.
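To make the idea concrete, here is a minimal sketch of the chair's description written down as a metadata record. The class name `ChairMetadata`, its field names, and the `fits` helper are all hypothetical, chosen only for illustration; the values come from the example above.

```python
from dataclasses import dataclass, asdict


@dataclass
class ChairMetadata:
    """Hypothetical metadata record describing one chair."""
    reclines: bool
    upholstery: str
    width_in: int  # width in inches


# The chair itself is the data; this record is the data about it.
chair = ChairMetadata(reclines=True, upholstery="vinyl", width_in=84)


def fits(meta: ChairMetadata, max_width_in: int) -> bool:
    """Decide from the metadata alone whether the chair fits a space."""
    return meta.width_in <= max_width_in


print(asdict(chair))
print(fits(chair, max_width_in=72))
```

The point of the sketch is that the decision (will it fit?) is made from the description alone, without ever seeing the chair.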

Similarly, imagine reading an undated news article about an outbreak of a deadly disease in a nearby city. Without knowing the article’s publication date, you wouldn’t know if the outbreak is spreading rapidly or if it occurred 20 years ago! Suppose you later learn that the article came from The Onion, a satirical newspaper. You would then assume the article is exaggerated or untrue.

Metadata grows in importance with the quantity of data under consideration. If you have 100,000 chairs to choose from, you need ways to differentiate between them without examining each individually. Identifying metadata common to many chairs would help distinguish collections of like data within your data set. For instance, you might only be interested in recliners or accent chairs. Many chairs will fall into multiple collections, leaving it up to you to decide which are of primary importance.
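As a rough sketch of how shared metadata sorts a large data set into collections, the example below groups a handful of chairs by a style tag. The records, the `styles` field, and the `group_by_style` function are assumptions made up for illustration; a real catalog would carry far more records and fields.

```python
from collections import defaultdict

# Hypothetical chair records; each may carry more than one style tag.
chairs = [
    {"id": 1, "styles": ["recliner"]},
    {"id": 2, "styles": ["accent"]},
    {"id": 3, "styles": ["recliner", "accent"]},
    {"id": 4, "styles": ["office"]},
]


def group_by_style(records):
    """Map each style tag to the ids of the chairs that carry it."""
    collections = defaultdict(list)
    for record in records:
        for style in record["styles"]:
            collections[style].append(record["id"])
    return dict(collections)


collections = group_by_style(chairs)
print(collections["recliner"])
print(collections["accent"])
```

Chair 3 lands in both the recliner and the accent collections, which is the situation described above: the metadata narrows the field, but you still decide which collection matters most.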

Savanna, Thetus’s multi-source analysis solution, incorporates metadata throughout the analysis workspace. Users can capture details about people, organizations, places, and concepts in preformatted metadata fields or freeform narratives. Options for describing connections between different types of data allow for easy identification of related content. Privacy settings specify rules about who can access specific data. Cumulatively, Savanna’s metadata tools enable users to contextualize information and thus reap greater insight into complex problems.

Different types of metadata gain prominence depending on the user and what kind of information they need. You asked your friend questions relevant to a chair, but your questions would have been different if you were discussing a song, a map, or a car. Similarly, different Savanna users present varied forms of information relevant to their problem space. A geospatial analyst in the military assessing locations of weapons stockpiles cares about different data (and thus metadata) than an emergency planner assessing earthquake hazard mitigation strategies. In most cases, users have predefined metadata based on organizational or disciplinary standards.

Regardless of users’ specific needs, the importance of metadata remains. Metadata can be simple and intuitive, as when thinking about the characteristics of a chair, or it can be formalized, detailed, and discipline-specific. Given the potential for metadata to profoundly influence your interpretation of information, you should treat metadata as an integral part of any data set.

~ Rebecca Davies, Analyst
