While metadata serves a very important role in computing, you need to remember that it can disclose information about you and your business.
“Data about data.”
That simple statement describes the essence of metadata. Breaking it down further, data about data refers to the information used to describe specific content (also data).
Viewing the Data
For example, if you right-click any saved file in Windows Explorer, such as a text (.txt) file on your hard drive, you can select “Properties” and see additional information about that file. In this case, the information is about the file itself and includes such things as the file name, what program it can be opened with, when it was created, last modified and last accessed, the file size, the full path name of the directory it is stored in, who created it, who the system owner is, and so on.
This additional information you can obtain about the file is the metadata. The metadata you can see when using Windows Explorer properties is specifically called file system metadata. Metadata is associated with almost every type of electronic file available today. Even your email headers and attachments contain metadata. Most metadata is hidden, and you have to know how to access it to change or limit the information provided.
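File system metadata can also be read programmatically rather than through a properties dialog. The following is a minimal Python sketch using only the standard library. The function name `file_metadata` and the sample file `notes.txt` are hypothetical, chosen for illustration. It covers only a few of the fields listed above, and the meaning of one timestamp differs between operating systems, as the comment notes.

```python
import tempfile
import time
from pathlib import Path


def file_metadata(path):
    """Return a dictionary of basic file system metadata for `path`."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "directory": str(p.resolve().parent),
        "size_bytes": info.st_size,
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on Unix-like systems.
        "created_or_changed": time.ctime(info.st_ctime),
    }


if __name__ == "__main__":
    # Create a throwaway file in a temporary folder so the example runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "notes.txt"
        sample.write_text("hello", encoding="utf-8")
        for key, value in file_metadata(sample).items():
            print(f"{key}: {value}")
```

None of these values are stored in the text of the file itself. They are kept by the file system alongside it, which is why they are easy to overlook.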
Try right-clicking an image on your hard drive. Photographers who capture the perfect shot but can’t remember the camera settings can view the metadata attached to the picture to find out. This works especially well for JPEG images, although metadata is available in a wide variety of image file formats. The metadata stored with an image is in-depth, ranging from the author to the white balance to the camera lens manufacturer, and it may include information you don’t want someone viewing the image to know.
Microsoft Office and Metadata
Microsoft Office files, like other types of digital data, also carry metadata, called document metadata. The metadata stored along with your saved Office documents can include your name, initials, company name, computer name, the disk or network server the file was stored on, file properties, revisions, hidden text, deleted comments, and much more. Microsoft Office documents are frequently passed among co-workers, clients and contractors, so when the documents are shared, quite often so are large amounts of metadata.
The problem is often referred to as the metadata risk, and that risk is the disclosure of private information, usually unknowingly, because this metadata is hidden from plain view and users simply are not aware of it. When a document is sent outside your office to a client or contractor, the associated metadata may not stay hidden if the receiving party knows where and how to look for it. One critical area of interest is the capability to track changes made to the document. In Microsoft Word, metadata stores information about changed text, the name of the author making changes, and the date and time those changes were made. This information may be something those outside your company shouldn’t have access to.
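To see how easily a recipient can read this kind of information, consider the sketch below. It assumes the document is saved in one of the XML-based Office 2007 formats (.docx, .xlsx, .pptx), which are ZIP archives that keep core properties in a part named `docProps/core.xml`. It does not apply to the older binary .doc format. The function name `document_metadata` and the field labels are hypothetical, and the sketch reads only a handful of properties, not tracked changes or comments.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML file.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative labels mapped to the elements that hold each value.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def document_metadata(path):
    """Read core properties from an Office Open XML file (.docx, .xlsx, .pptx)."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        # Skip properties that are absent or empty in this document.
        if element is not None and element.text:
            result[label] = element.text
    return result
```

Calling `document_metadata("report.docx")` on a hypothetical file would return whichever of these properties the document carries, such as the author name and the name of the last person to save it.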
When using Microsoft Office or any other application, it’s important to learn which of the program’s tools let you reveal normally hidden markup and remove this metadata, so you can ensure that this associated information about the content is not shared when you share the file itself. The removal of this type of data is often called data scrubbing or data cleansing.
In the latest version, Microsoft Office 2007, Microsoft included a Document Inspector feature in Word 2007, Excel 2007, and PowerPoint 2007 that can help you find and remove hidden data and personal information in your Office documents.
If you use Office 2003/XP, Microsoft offers an add-in you can download that enables users to permanently remove hidden data. With this Remove Hidden Data add-in, you can run the tool on individual files from within your Office application, or run it on multiple files from the command line. Additionally, a Google search will produce a wide array of results for third-party tools and software that you can use to wipe out metadata from various files created using different programs.
The Importance of Metadata
With so many privacy concerns surrounding metadata, you may wonder why it exists. Metadata is actually useful for searching and controlling content. For example, consider metadata on web pages. Search engines often place a higher priority on metadata tags such as page title, keywords and description than they do on the actual contents of the page. To those searching the Web, this metadata is useful for finding relevant pages. Metadata is also important for faster and more accurate database search and retrieval and for information stored in data warehouses.
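As an illustration of the web page metadata described above, the following sketch collects a page's title and its named meta tags using Python's standard `html.parser` module. The class name `MetaTagParser`, the function `page_metadata`, and the sample page are hypothetical. This shows one simple way a program could read these tags, not how any particular search engine processes them.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            if name:
                # Store tags such as "description" and "keywords" by name.
                self.metadata[name.lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def page_metadata(html):
    """Return a dictionary of the title and named meta tags found in `html`."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.metadata


# Hypothetical sample page used only for demonstration.
SAMPLE_PAGE = """<html><head>
<title>What Is Metadata?</title>
<meta name="description" content="Data about data.">
<meta name="keywords" content="metadata, privacy">
</head><body><p>Page content.</p></body></html>"""

if __name__ == "__main__":
    for key, value in page_metadata(SAMPLE_PAGE).items():
        print(f"{key}: {value}")
```

None of the collected values appear in the visible body of the page, which is the sense in which they are data about the page rather than its content.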
So while metadata does serve a very important role in computing, you do need to remember that it can disclose information about you and your business, information that you may not even realize exists.
Did You Know…
In a landmark 2004 case, the U.S. District Court ruled that electronic documents must be produced “in native format” and “with their metadata intact” (Williams v. Sprint). Metadata includes message attributes such as file owner, creation date, routing details, the sender, receivers, and subject line. [Source: The New Federal Rules of Civil Procedure: IT Obligations For Email]
Based in Nova Scotia, Vangie Beal has been writing about technology for more than a decade. She is a frequent contributor to EcommerceGuide and managing editor at Webopedia. You can tweet her online @AuroraGG.
This article was originally published on August 10, 2007