Data refers to distinct pieces of information, usually formatted and stored in a way that suits a specific purpose.
What does data look like?
Data can exist in various forms: as numbers or text recorded on paper, as bits or bytes stored in electronic memory, or as facts living in a person’s mind. Since the advent of computer science in the mid-1900s, however, data most commonly refers to information that is transmitted or stored electronically.
Grammatically, data is the plural form of the singular datum, but in practice, data is widely used as a mass noun, like sand or water. For example, one might say “the data proves something to be true”; in this case, “data” refers to many pieces of information that are being used collectively to validate a claim. Not all writers accept the popular mass noun usage, however. Some academic and technical editors insist on the Latin plural and singular distinction (“the data prove” and “one datum proves”).
What is the difference between data and information?
While data is a collection of individual statistics or facts, information is knowledge gained through research, study, instruction, or communication. A key difference between the two is that data can be unorganized, unrelated, or raw, while information is organized. Data on its own might not have any meaning and might need to be sorted, analyzed, or interpreted to become information.
When trying to make a decision, data alone might not be enough, but decisions can be made based solely on information. Businesses that can collect and use data can gain valuable information from it that will help them make faster and smarter business decisions. Computers are extremely useful tools for turning data into information using software applications, formulas, and scripts.
Machine-readable vs. human-readable data
All data can be categorized as machine-readable, human-readable, or both. Human-readable data uses natural language formats (such as an ASCII text file or a PDF document), whereas machine-readable data uses formally structured formats (Parquet, Avro, etc.) designed to be read by computer systems or software. Some data is readable by both machines and humans, as in the case of CSV, HTML, or JSON.
The line between machine- and human-readable data is becoming increasingly blurred because so many formats that are prevalent today are accessible enough to be navigated by a human yet structured enough to be processed by a machine. This is largely the result of artificial intelligence, machine learning, and automation, which streamline tasks and workflows so that manual data entry and analysis are done by a machine rather than a human. However, these processes need to maintain their human readability in case the programming needs to be adjusted. Most data in these cases also exists in a vacuum and does not have much meaning without context from a human perspective.
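As a rough illustration of formats that both people and programs can read, the sketch below stores one record as CSV and as JSON and parses each with Python's standard library. The record and its field names (name, city, age) are assumptions made up for this example, not part of any real data set.

```python
import csv
import io
import json

# Hypothetical sample record; the field names are illustrative only.
csv_text = "name,city,age\nAda,London,36\n"
json_text = '{"name": "Ada", "city": "London", "age": 36}'

# A person can read both strings directly; a program can parse them too.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_record = json.loads(json_text)

# CSV carries no type information, so every parsed value is a string.
print(type(csv_rows[0]["age"]).__name__, csv_rows[0]["age"])

# JSON distinguishes numbers from strings, so the age is parsed as an integer.
print(type(json_record["age"]).__name__, json_record["age"])
```

Binary formats such as Parquet or Avro trade this direct readability for compactness and stricter schemas, which is why they are usually inspected through software rather than a text editor.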
Example of data vs. information
- The response of an individual in a customer service survey is a single point of data. It might not have any meaning by itself, but when combined with several responses or the responses of multiple individuals, the combined data can be used to form information, which can be used to make some conclusions or develop insights about the customer service.
- The number of likes on a social media post is data, but when combined with other data such as comments, shares, and audience demographics, it becomes information that can be used to recalibrate the social media post to increase audience engagement.
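The survey example above can be sketched in a few lines of Python. The ratings list and the `summarize` helper are hypothetical and exist only for illustration; the point is that individual ratings say little on their own, while the aggregated summary starts to support conclusions.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: each rating (1 to 5) is a single data point.
ratings = [5, 4, 2, 5, 3, 4, 5, 1]

def summarize(values):
    """Aggregate raw ratings into a small summary (the 'information')."""
    return {
        "responses": len(values),
        "average": mean(values),
        # How many respondents chose each rating, ordered by rating.
        "distribution": dict(sorted(Counter(values).items())),
        # Share of respondents who gave a rating of 4 or higher.
        "share_satisfied": sum(v >= 4 for v in values) / len(values),
    }

summary = summarize(ratings)
print(summary)
```

What counts as "satisfied" (here, a rating of 4 or higher) is itself an interpretive choice, which is part of what turns data into information.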
Data phrases in technology
Data has moved to the forefront of many mainstream conversations about technology. New innovations constantly draw commentary on data, how we use and analyze it, and the broader implications of that use. As a result, the popular IT vernacular has come to include a number of phrases new and old:
- Big data: A massive volume of structured and unstructured data that is too large to process using traditional database and software technologies.
- Big data analytics: The process of collecting, organizing, and synthesizing large sets of data to discover patterns or other useful information.
- Data center: Physical or virtual infrastructure used by enterprises to house computer, storage, and networking systems and components for the company’s IT needs.
- Data integrity: The validity of data, which can be compromised in a number of ways including human error or transfer errors.
- Data miner: A software application that monitors and/or analyzes the activities of a computer, and subsequently its user, to collect information.
- Data mining: A class of database applications that look for hidden patterns in a group of data that can be used to predict/anticipate future behavior.
- Data warehouse: A data management system that uses data from multiple sources to promote business intelligence.
- Database: A collection of data points organized in a way that is easily maneuvered by a computer system.
- Metadata: Summary information about a data set.
- Raw data: Information that has been collected but not formatted or analyzed.
- Structured data: Any data that resides in a fixed field within a record or file, including data contained in relational databases and spreadsheets.
- Unstructured data: Information that does not reside in a traditional column-row database like structured data.
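To make the distinction between structured and unstructured data in the list above more concrete, here is a minimal Python sketch. The order records, the customer message, and the four-digit order-number pattern are all hypothetical assumptions chosen for illustration.

```python
import re

# Structured data: every record has the same fixed fields (hypothetical example).
structured = [
    {"order_id": 1001, "customer": "Ada", "total": 25.50},
    {"order_id": 1002, "customer": "Grace", "total": 40.00},
]

# Unstructured data: free text with no fixed fields.
unstructured = "Ada emailed about order 1001 and said the package arrived late."

# Fixed fields can be queried directly.
total_revenue = sum(record["total"] for record in structured)

# Free text needs some interpretation first; here, a simple pattern
# pulls out anything that looks like a four-digit order number.
mentioned_orders = [int(m) for m in re.findall(r"\b\d{4}\b", unstructured)]

print(total_revenue)
print(mentioned_orders)
```

The pattern match is deliberately naive; real unstructured data usually calls for more careful parsing before it can be linked back to structured records.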
The history of data
In the modern world, data is often thought of as computer or digital data, but data has a long and rich history dating back to the ancient world. The Ishango bone, dated to around 19,000 BC, is thought to have been used as a tally stick. Several ancient and early civilizations used different forms of data, including quantitative and qualitative data.
In the 1600s, John Graunt used data to study death records, and in the 1800s Herman Hollerith used punch cards to process data with his tabulating machine, known as the Hollerith desk. In the 1900s, the problem of processing large amounts of data was addressed through automation and by inventors such as Fritz Pfleumer, whose magnetic tape became a medium for collecting and storing data. Data evolved rapidly through the 1990s as a result of the internet, which created an entirely new universe of data. The evolution of data continues into the realm of artificial intelligence, machine learning, and more.