Progress in visualization history research: a dataset of early visualizations from around the world to be presented at IEEE VIS 2023

Original link: http://vis.pku.edu.cn/blog/oldvisonline/

Recently, researchers from the University of Oxford and Yuan Xiaoru’s research group at the School of Intelligence, Peking University, working closely with researchers from the Hong Kong University of Science and Technology, Fudan University, Huawei, and other institutions, systematically collected visualizations created in the early history of countries around the world and built a corresponding dataset. The work has made important progress: the resulting paper was accepted by IEEE VIS 2023, the flagship international conference in the field of visualization. The dataset covers more than 13,000 early visualizations and provides a foundation for future research at the intersection of visualization and the humanities.

People have made visualizations to express data for a long time. Many famous visualizations created hundreds of years ago have been passed down to the present day (Figure 1) and provide valuable examples for later visualization research. Researchers today can study the social background, artistic conventions, and technical level of past eras through early visualizations, and visualization educators use outstanding early works to introduce students to the field and show them both the function and the art of visualization. Over this long history, some visualizations have survived, while others have been lost for various reasons. With the development and spread of digital technology, many surviving early visualizations have been digitized and made publicly browsable through digital libraries. However, because these resources are scattered across many collections, even most visualization professionals know only part of the historical visualizations that survive today, and the value of these works has not yet been fully realized. This historical visual heritage urgently needs to be collected and organized in one place, so that the public and researchers pay more attention to this underexplored field.

Figure 1: Notable early visualizations

To broaden the influence of historical visualization and make it an important part of how people understand both visualization and history, researchers from the University of Oxford, Yuan Xiaoru’s research group at the School of Intelligence, Peking University, the Hong Kong University of Science and Technology, Fudan University, and Huawei built a dataset of early historical visualizations. The work collected early visualizations from seven online digital libraries, assembled a dataset of more than 13,000 of them, and provides an interface for browsing the images (Figure 2).

Figure 2: OldVisOnline (https://oldvis.github.io/gallery/), the image browsing interface for the historical visualization dataset

The first problem in collecting historical visualizations is choosing the data sources. This work selected seven online digital libraries for retrieval based on three criteria (data availability, metadata richness, and data quality) and obtained historical visualization images and their metadata from them. Building a unified dataset from multiple heterogeneous sources, however, raises several challenges. First, different digital libraries use different schemas and fields to describe their collections. Second, visualizations are sparsely distributed in digital libraries, so retrieval inevitably returns a large number of irrelevant items. Finally, the sheer volume of the collections makes manual annotation a huge workload. To address the inconsistent data formats, this work proposes a data specification (Figure 3) for storing records from different collections in a uniform way.

Figure 3: Metadata storage specification
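
Figure 3 is not reproduced here, so the field names below are not the paper’s. As a minimal sketch, assuming two hypothetical libraries (library_a and library_b) whose records use different field names, the idea of a unified specification can be illustrated by mapping each library’s fields onto one shared record type while keeping the original metadata verbatim:

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical unified record; the fields of the real specification (Figure 3) may differ.
@dataclass
class UnifiedRecord:
    uid: str
    source: str                      # digital library the record came from
    title: str
    authors: List[str] = field(default_factory=list)
    publish_date: Optional[str] = None
    url: Optional[str] = None
    raw: Dict[str, Any] = field(default_factory=dict)  # original metadata, kept verbatim

# Hypothetical per-library mappings: unified field name -> source field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "library_a": {"title": "Title", "authors": "Creator", "publish_date": "Date", "url": "Link"},
    "library_b": {"title": "name", "authors": "contributors", "publish_date": "created", "url": "image_url"},
}

def normalize(source: str, uid: str, record: Dict[str, Any]) -> UnifiedRecord:
    """Convert one source-specific record into the unified format."""
    mapping = FIELD_MAPS[source]
    authors = record.get(mapping["authors"], [])
    # Some sources store several creators in one semicolon-separated string.
    if isinstance(authors, str):
        authors = [a.strip() for a in authors.split(";") if a.strip()]
    return UnifiedRecord(
        uid=uid,
        source=source,
        title=record.get(mapping["title"], ""),
        authors=list(authors),
        publish_date=record.get(mapping["publish_date"]),
        url=record.get(mapping["url"]),
        raw=dict(record),
    )

# Illustrative records, not real library data.
a = normalize("library_a", "a-001", {"Title": "Example map", "Creator": "Doe, J.; Roe, R.", "Date": "1855"})
b = normalize("library_b", "b-042", {"name": "Example chart", "contributors": ["Doe, J."], "created": "1786"})
print(asdict(a)["authors"], b.title)
```

Keeping the raw record next to the normalized fields is a choice of this sketch: later cleaning steps can revisit source-specific details without querying the library again.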

To filter out irrelevant items and speed up annotation, this work proposes a semi-automatic annotation method: deep learning classifiers are trained to predict the type of each image, and their predictions are then corrected in an annotation system, which accelerates dataset construction. Taking the characteristics of historical visualizations into account, the authors first manually annotated all images from one data source, recording “whether it contains a visualization”, “whether it contains a map”, and “whether most of the content is text”. They then trained three classification models that predict, respectively, “whether the image contains only text”, “whether it contains a map”, and “whether it is a visualization other than a map”. To ensure annotation quality, the models’ automatic classifications were checked manually, and misclassified images were reassigned the correct labels (Figure 4).

Figure 4: Classification annotation interface
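
The article does not say how the three model outputs are combined or how the manual check is organized, so the following is only one plausible arrangement, not the authors’ pipeline. It assumes each classifier outputs a probability and uses a hypothetical threshold of 0.5; it keeps an image when a map or other visualization is predicted and the image is not text only, orders images for review by how close their least confident prediction is to the threshold, and applies annotators’ corrections on top of the automatic labels. The deep learning models themselves are left out; their outputs are represented by the hypothetical Prediction type, filled with illustrative probabilities.

```python
from dataclasses import dataclass
from typing import Dict, List

THRESHOLD = 0.5  # assumed decision threshold for every classifier

@dataclass
class Prediction:
    image_id: str
    p_text_only: float  # classifier 1: image contains only text
    p_map: float        # classifier 2: image contains a map
    p_other_vis: float  # classifier 3: image is a visualization other than a map

def auto_label(p: Prediction) -> Dict[str, bool]:
    """Turn the three probabilities into binary labels."""
    return {
        "text_only": p.p_text_only >= THRESHOLD,
        "map": p.p_map >= THRESHOLD,
        "other_vis": p.p_other_vis >= THRESHOLD,
    }

def is_kept(labels: Dict[str, bool]) -> bool:
    # Assumed rule: keep maps and other visualizations, drop text-only pages.
    return (labels["map"] or labels["other_vis"]) and not labels["text_only"]

def uncertainty(p: Prediction) -> float:
    # Distance of the least confident classifier from the threshold (smaller = less certain).
    return min(abs(x - THRESHOLD) for x in (p.p_text_only, p.p_map, p.p_other_vis))

def review_queue(preds: List[Prediction]) -> List[str]:
    # Least certain images first, so annotators meet likely errors early.
    return [p.image_id for p in sorted(preds, key=uncertainty)]

def apply_corrections(
    labels: Dict[str, Dict[str, bool]], corrections: Dict[str, Dict[str, bool]]
) -> Dict[str, Dict[str, bool]]:
    # Manual labels override automatic ones; the input dict is left unchanged.
    merged = {image_id: dict(lab) for image_id, lab in labels.items()}
    for image_id, fixed in corrections.items():
        merged[image_id].update(fixed)
    return merged

preds = [
    Prediction("img1", 0.10, 0.90, 0.20),
    Prediction("img2", 0.80, 0.10, 0.45),
    Prediction("img3", 0.05, 0.10, 0.95),
]
labels = {p.image_id: auto_label(p) for p in preds}
queue = review_queue(preds)
fixed = apply_corrections(labels, {"img2": {"text_only": False, "other_vis": True}})
```

Ordering the review queue by uncertainty is an assumption of this sketch; the article states only that misclassified results were corrected through manual inspection.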

Using this semi-automatic annotation method, the study obtained more than 13,000 historical visualization images with their metadata. The paper then discusses the research value of historical visualization data, to encourage researchers to make full use of the dataset in future work. Application scenarios include, but are not limited to, design inspiration, historical research, machine learning, and education. For example, the dataset lets researchers compare and cross-check different versions of a visualization: users who search for John Snow in the OldVisOnline interface can retrieve the versions of his cholera map published in 1855 and 1936 (Figure 5). The 1936 version contains a typesetting error in which two boxes representing cases are merged into one. This finding offers a possible explanation for the discrepancies in cholera map data reported in the literature.

Figure 5: Comparison of different versions of cholera maps

The rich content of the dataset can also serve as a sample set that inspires researchers to design novel visualization forms. As shown in Figure 6, the annotation interface supports in-depth analysis of the dataset: users can label the visualization type and visual mapping of each historical visualization to analyze the design space of historical visualizations. Through these usage scenarios, we hope to inspire researchers to explore the dataset further.

Figure 6: Annotation interface for the design space of early visualizations
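
To illustrate what labeling type and mapping makes possible, here is a minimal sketch, assuming annotations are stored as (visualization type, visual mapping) pairs with a hypothetical label vocabulary. Counting the pairs gives a simple view of the design space, and listing combinations with no examples points to gaps in it. This is not the paper’s analysis procedure, and the annotations below are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple

# (visualization type, visual mapping); the label vocabulary here is hypothetical.
Annotation = Tuple[str, str]

def design_space(annotations: Iterable[Annotation]) -> Counter:
    """Count how often each (type, mapping) combination was annotated."""
    return Counter(annotations)

def empty_cells(counts: Counter, types: List[str], mappings: List[str]) -> List[Annotation]:
    """List combinations with no annotated example, i.e. gaps in the design space."""
    return [cell for cell in product(types, mappings) if counts[cell] == 0]

annotations = [
    ("bar chart", "length"),
    ("bar chart", "length"),
    ("map", "color"),
    ("pie chart", "angle"),
]
counts = design_space(annotations)
gaps = empty_cells(counts, ["bar chart", "map"], ["length", "color"])
```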

This work is the first to use deep learning to systematically collect early historical visualizations on a global scale, and it discusses the role and value of these visualizations. With this dataset of early visualizations from around the world, scholars can pursue in-depth exploration in their own research directions. Because of the limits of the available data sources, most of the visualizations collected in this work originate from the West. China, on the other hand, also has a long tradition of visualization. The research team noticed that many ancient Chinese books and other materials record a wide variety of ancient Chinese visualizations, which differ markedly from early Western visualizations in form and meaning. This related work is progressing well and will be released to the public in the near future.

Zhang Yu, the first author of the paper, graduated from Peking University in 2017 and obtained his PhD from the University of Oxford in 2022. The other authors are Jiang Ruike and Liu Can from Peking University, Xie Liwenhan from the Hong Kong University of Science and Technology, Zhao Yuheng and Chen Siming from Fudan University, and Ding Tianhong from Huawei. The corresponding author is Yuan Xiaoru of the School of Intelligence, Peking University. This work was supported by the National Natural Science Foundation of China project “Research on Sample-driven Visual Design Space Exploration” (NSFC 62272012).
