CALVI: Critical Thinking Assessment for Literacy in Visualizations

Original link: http://vis.pku.edu.cn/blog/calvi/

While visualizations can convey information effectively, they can also mislead readers; this situation is called visualization misinformation. In the face of misinformation, a natural question is: to what extent can the public recognize misleading visualizations? Answering it first requires a measure of people's ability to read, interpret, and reason about erroneous or potentially misleading visual information. This paper [1] presents CALVI, a systematic test that measures the critical thinking aspect of visualization literacy (Fig. 1). CALVI contains 45 items drawn from a design space of misleader types and chart types, where each item is a multiple-choice question about a visualization.

Figure 1: Overview of CALVI.

CALVI can be broken down into four components. The first is the misleader: a design decision made during the construction of a visualization that may lead readers to conclusions not supported by the underlying data. The second is the chart type. The third is the visualization itself, generated from a combination of a misleader and a chart type. The fourth is the item, the multiple-choice question about that visualization mentioned above.
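
These four components can be pictured as a simple data structure. The sketch below is a hypothetical illustration, not code from the paper; all field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """One CALVI test item (illustrative fields, not the paper's schema)."""
    misleader: str            # e.g., "inappropriate use of scale"
    chart_type: str           # e.g., "pie"
    visualization: str        # the chart built from the two fields above
    question: str             # the multiple-choice question text
    options: list[str] = field(default_factory=list)  # answer choices
    correct: int = 0          # index of the correct option
```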

Based on two existing classifications of misleading visualizations [2, 3], CALVI identifies nine types of misleaders (Fig. 2). Four criteria guided the merging and filtering of the two classifications. First, misleaders should be visually detectable. Second, readers' cognitive biases do not count as misleaders, because they are not part of the visualization construction process. Third, a misleader must generalize across different chart types. Finally, misleaders must be self-contained, so they must not require domain-specific knowledge.

Figure 2: Nine Misleaders of CALVI.

Starting from the twelve chart types enumerated by VLAT [4], CALVI merges and filters chart types according to two principles: realism and diversity. First, it removes treemaps, which appear less often in real life than the other forms. Then, following the diversity principle, it merges bar charts with histograms, and scatter plots with bubble charts, given their similarity. This leaves nine chart types.

CALVI then generates visualizations from the misleaders and chart types. It first produces easy, medium, and hard visualizations for each feasible combination, then filters out visualizations that are unlikely to occur in practice and keeps only one of each group of similar visualizations. This process ultimately yielded 52 visualizations.
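
To make the structure of this design space concrete, here is a minimal sketch of the enumerate-then-filter step. The chart-type labels follow the list derived above from VLAT, but they are assumptions rather than the paper's exact names, and the `plausible` function is a stand-in for the authors' manual curation:

```python
from itertools import product

# Nine chart types: VLAT's twelve minus treemaps, with bar/histogram and
# scatter/bubble merged. Labels are assumed, not the paper's exact names.
chart_types = ["line", "bar", "stacked_bar", "stacked_bar_100", "pie",
               "area", "stacked_area", "scatter", "choropleth_map"]
misleaders = [f"misleader_{i}" for i in range(1, 10)]  # placeholders for Fig. 2's taxonomy
difficulties = ["easy", "medium", "hard"]

def plausible(misleader: str, chart: str) -> bool:
    """Stand-in for the authors' manual filtering of combinations
    that are unlikely to occur in real-world charts."""
    return True  # in the paper, this judgment is made by hand

candidates = [(m, c, d)
              for m, c, d in product(misleaders, chart_types, difficulties)
              if plausible(m, c)]
print(len(candidates))  # 9 * 9 * 3 = 243 raw cells before curation
# After filtering and deduplicating similar designs, CALVI keeps 52 visualizations.
```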

Reflecting the fact that "people may encounter misleading visualizations mixed in with well-formed ones", CALVI includes two kinds of items: misleading items and normal items. Misleading items are based on the 52 visualizations designed above, while normal items are based on well-formed visualizations. Each misleading item offers three kinds of answer options: the correct answer, incorrect answers related to the misleader, and incorrect answers unrelated to the misleader.

To identify possible ambiguities and misinterpretations in the visualizations and items, and to determine the sample size needed for the pilot phase of the test, the authors first recruited thirty participants for a pre-experiment. Each participant answered thirty items: half randomly selected misleading items and half normal items. The responses revealed some ambiguities and inconsistencies. As shown in Figure 3, items A and B involve the misleader "inappropriate use of scale": the sizes of the pie sectors do not match the percentages written on them. This inconsistency means the visualization does not convey reliable information, so the correct answer should be that nothing can be inferred. However, some participants indicated that they noticed the conflict but decided to trust one of the two encodings. An item about line charts and misleading labeling had a similar problem: the chart's title contradicts the trend of the line (item C in Figure 3). Based on these inconsistencies, the authors added an open-ended question section asking participants to justify their answers to these three items. In addition, through simulations they found that 500 participants would be enough for the subsequent models to converge.
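
The paper does not reproduce this simulation code, but a minimal sketch of the idea, assuming a standard two-parameter logistic (2PL) generative model with standard-normal abilities and lognormal discriminations, might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_participants: int, n_items: int) -> np.ndarray:
    """Simulate binary responses under a two-parameter logistic (2PL) model."""
    theta = rng.normal(0.0, 1.0, size=n_participants)   # participant ability
    a = rng.lognormal(0.0, 0.5, size=n_items)           # item discrimination
    b = rng.normal(0.0, 1.0, size=n_items)              # item difficulty
    p_correct = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random((n_participants, n_items)) < p_correct).astype(int)

# For each candidate N, one would fit the Bayesian 2PL model to data like
# this and check convergence diagnostics (e.g., R-hat) and parameter
# recovery; the paper reports that 500 participants sufficed.
responses = simulate_2pl(500, 30)
print(responses.shape, responses.mean())
```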

Figure 3: Three ambiguous and inconsistent items found in the pilot experiment.

In the pilot phase of the test, the authors recruited 497 participants and used the same procedure as the pre-experiment to validate the visualizations and items. Descriptive statistics showed that participants were only about half as accurate on misleading items (M = 0.39, SD = 0.16) as on normal items (M = 0.80, SD = 0.13), and most participants completed the test in about twenty minutes (M = 19.80, SD = 10.25, in minutes).

The authors then further refined the items using Item Response Theory (IRT), the open-ended feedback, and validity measures. First, they characterized each item along two dimensions, difficulty and discrimination, with a two-parameter IRT model whose parameter values were obtained via Bayesian estimation (Fig. 4). Based on these values, they removed items T7 and T41 because 1) they were the most difficult and had very low discrimination, and 2) both had alternative visualizations. Next, based on participants' answers to the open-ended questions, the authors kept items A and B, since participants could not give convincing arguments for the other answer choices and both items showed good levels of difficulty and discrimination; item C was removed, however, because it concerned misleading labeling and most participants simply ignored the contradictory information in the title. Finally, to verify that the measured ability is what drives people's performance, the authors used two validity measures: the incorrect-and-misleader-related score and the content validity index (CVI). These measures flagged six items; the authors removed four of them, keeping items A and B.
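
For reference, a standard two-parameter logistic model (a common formulation; the paper's exact parameterization and priors may differ) gives the probability that participant j with ability \theta_j answers item i correctly as

P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp(-a_i(\theta_j - b_i))}

where a_i is item i's discrimination (how sharply it separates high- from low-ability respondents) and b_i is its difficulty (the ability level at which the probability of a correct answer is 50%).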

Figure 4: Difficulty and discrimination values for each item.

1. Lily W. Ge, Yuan Cui, and Matthew Kay. CALVI: Critical Thinking Assessment for Literacy in Visualizations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), pages 1-18, 2023.

2. Andrew M. McNutt, Gordon L. Kindlmann, and Michael Correll. Surfacing Visualization Mirages. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20), pages 1-16, 2020.

3. Leo Yu-Ho Lo, Ayush Gupta, Kento Shigyo, Aoyu Wu, Enrico Bertini, and Huamin Qu. Misinformed by Visualization: What Do We Learn From Misinformative Visualizations? Computer Graphics Forum, 41(3):515-525, 2022.

4. Sukwon Lee, Sung-Hee Kim, and Bum Chul Kwon. VLAT: Development of a Visualization Literacy Assessment Test. IEEE Transactions on Visualization and Computer Graphics, 23(1):551-560, 2017.
