Data Hunches: Incorporating Personal Knowledge into Visualizations

Original link: http://vis.pku.edu.cn/blog/data-hunches/

The data shown in a visualization is not necessarily perfect and may contain errors. Often only domain experts are aware of such data quality issues. When experts spot a problem in a visualization, they can report it in writing or in conversation, but these feedback channels are inefficient. The paper introduced here first classifies data quality issues and then designs a hand-drawn-style visualization for each type, allowing experts to incorporate their personal knowledge directly into the original visualization.

The authors first propose the concept of a data hunch: an analyst’s knowledge about how and why the data is an imperfect and partial representation of the phenomena of interest. To help experts visualize data hunches, the authors further classify them into three types.

  • Value hunch: Represents a discrepancy between what the expert believes a particular data value to be and the value recorded in the dataset. Value hunches apply to numeric, categorical, and text/label data. For example, an expert may believe that a data item should belong to category A rather than category B.
  • Structural hunch: Indicates that certain data points or relationships should not be included in the dataset, or that data items or relationships are missing. For example, some edges may be missing from a network dataset.
  • Assessment hunch: Expresses confidence in, or a judgment about the quality of, a dataset or an individual data item, or provides context about a dataset. For example, an expert may assess the credibility of a dataset.
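
The three types above can be sketched as a small data model. This is only an illustrative sketch, not the paper's implementation: the names `HunchKind`, `DataHunch`, and `count_by_kind`, the field layout, and the item ids are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HunchKind(Enum):
    """The three hunch types from the classification (names are hypothetical)."""
    VALUE = "value"            # a recorded value should be something else
    STRUCTURAL = "structural"  # an item or relationship is missing or should be excluded
    ASSESSMENT = "assessment"  # confidence, quality, or context for an item or dataset


@dataclass(frozen=True)
class DataHunch:
    kind: HunchKind
    target: Optional[str]  # id of the data item; None means the whole dataset
    content: str           # what the expert believes
    reason: str            # why the expert believes it


def count_by_kind(hunches: List[DataHunch]) -> Counter:
    """Count how many hunches of each kind were recorded."""
    return Counter(h.kind for h in hunches)


# Made-up examples mirroring the three bullet points (ids are hypothetical).
hunches = [
    DataHunch(HunchKind.VALUE, "item-7", "category A", "recorded as category B"),
    DataHunch(HunchKind.STRUCTURAL, "edge-3-9", "missing edge", "absent from the network"),
    DataHunch(HunchKind.ASSESSMENT, None, "low credibility", "collection method unclear"),
]
```

Keeping `target` optional lets an assessment hunch refer either to a single item or to the dataset as a whole, which matches the description of assessment hunches above.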

Based on this classification, the authors design a different visual form for each type of data hunch.

Figure 1: A prototype system for visualizing experts’ data hunches. The hand-drawn-style marks are the experts’ data hunches, and the red annotations explain each hunch.

As shown in Figure 1, the original visualization in this prototype system shows the predicted number of deaths due to COVID-19 in each country within a week. Each row represents a country, and the color indicates the control policy adopted by that country. The hand-drawn-style annotations overlaid on the original visualization represent the experts’ data hunches, that is, the experts’ evaluation of the data in the visualization. These include changes to numerical values, changes to a country’s category, the addition and deletion of countries, and comments on the visualization as a whole.

Figure 2: Ways to help experts record data hunches.

Figure 2 lists three ways in which experts can incorporate their knowledge into the original visualization. Experts can specify a revised value through a data table or a formula, directly drag and drop the original data items, or rate or comment on the reliability of data items.
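
The three input methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the linear pixel scale starting at zero, and the five-point rating are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecordedHunch:
    item: str     # which data item the hunch refers to
    method: str   # how the expert entered it
    value: float  # revised value, or normalized rating for "rating"


def from_formula(item: str, original: float,
                 formula: Callable[[float], float]) -> RecordedHunch:
    """Derive the hunch value by applying a formula to the recorded value."""
    return RecordedHunch(item, "formula", formula(original))


def from_drag(item: str, pixel_x: float, pixels_per_unit: float) -> RecordedHunch:
    """Convert a dragged bar-end position back to data units.

    Assumes a linear scale whose origin is at pixel 0.
    """
    return RecordedHunch(item, "drag", pixel_x / pixels_per_unit)


def from_rating(item: str, stars: int, max_stars: int = 5) -> RecordedHunch:
    """Store a reliability rating as a fraction of the maximum score."""
    if not 1 <= stars <= max_stars:
        raise ValueError("rating out of range")
    return RecordedHunch(item, "rating", stars / max_stars)


# Made-up values: the expert believes the recorded 100.0 is 50% too low.
by_formula = from_formula("Country A", 100.0, lambda v: v * 1.5)
by_drag = from_drag("Country A", pixel_x=300.0, pixels_per_unit=2.0)
by_rating = from_rating("Country A", stars=2)
```

In this sketch the formula and the drag arrive at the same revised value by different routes, while the rating records an assessment without proposing a new value.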

Finally, the authors summarize guidelines for designing data hunch visualizations. The important ones include:

  • The original data must not be changed. A data hunch is an expert’s perception of the data shown in the visualization and is distinct from the raw data. The data hunch is therefore always displayed on top of the original visualization, which itself remains unchanged.
  • Differentiate data hunches from the original visualization. The authors use a hand-drawn style to distinguish data hunches clearly from the original visualization.
  • Keep the visualization of a data hunch similar to the original visualization. Similarity makes it easier to compare the hunch’s value with the value in the original visualization.
  • Provide a reason and a level of credibility for each data hunch. A data hunch is an evaluation of the original data, so a specific explanation is needed to make the evaluation more credible.
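
The guidelines above suggest an overlay structure, sketched below under stated assumptions. The class `HunchLayer`, its methods, the `"solid"` and `"sketchy"` style labels, and the numbers are hypothetical and only illustrate the idea; they do not describe how the prototype is built.

```python
from types import MappingProxyType
from typing import Dict, List


class HunchLayer:
    """Keeps the original data read-only and stores hunches beside it."""

    def __init__(self, original: Dict[str, float]) -> None:
        # Copy, then wrap in a read-only view so the original cannot be edited.
        self.original = MappingProxyType(dict(original))
        self._hunches: List[dict] = []

    def add(self, item: str, value: float, reason: str, confidence: float) -> None:
        """Record a hunch; a reason and a confidence in [0, 1] are required."""
        if not reason.strip():
            raise ValueError("a hunch needs a reason")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self._hunches.append(
            {"item": item, "value": value, "reason": reason, "confidence": confidence}
        )

    def render_rows(self) -> List[dict]:
        """Pair each original mark with the hunch marks drawn on top of it."""
        rows = []
        for item, value in self.original.items():
            overlays = [dict(h, style="sketchy") for h in self._hunches if h["item"] == item]
            rows.append({"item": item, "original": value, "style": "solid", "hunches": overlays})
        return rows


# Made-up data for illustration only.
layer = HunchLayer({"Country A": 120.0, "Country B": 80.0})
layer.add("Country A", 150.0, reason="reporting lag", confidence=0.7)
rows = layer.render_rows()
```

Each hunch keeps the same shape as the original row (an item and a value), which supports direct comparison, while the separate style label keeps the two visually distinct.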

This work proposes visualizations for the different types of data hunch. However, different visualization forms require different data hunch designs. In addition, there is a tension between improving the credibility of data hunches and protecting the privacy of users.

References:

Haihan Lin, Derya Akbaba, Miriah Meyer, and Alexander Lex. Data Hunches: Incorporating Personal Knowledge into Visualizations. IEEE VIS 2022.
