I made a GameJam library in Notion

When Notion meets games, what kind of sparks do you think will fly?

In my case, the collision produced a game library.

As a game lover, I find myself always drawn to small games: the indie games that players often talk about, for example, or the works born in GameJams.

A GameJam can be understood as a fast-paced game development event that encourages creative expression. Participants, alone or in teams, have to make a game from scratch within a set time, expressing a specific theme (usually designated by the event organizer) as well as they can in the game, and outstanding works are selected at the end. (For details, please refer to this introduction by Ye Zitao.)

I once tried some GameJam works at an offline game exhibition. Although their degree of completion is hard to compare with the games sold on Steam, most of them have ingenious gameplay design, and from time to time I came across real gems. I was amazed at the creativity and talent of the developers.

However, offline exhibitions are limited after all. Because of distance, opening hours, crowds and other factors, it was hard for me to freely play the games I was interested in, which left me with regrets. I kept following similar GameJam events and wanted to play more of the games born in them, but for various reasons I never managed to.

Then, not long ago, I saw an event called BOOOM being promoted in the Gcores (机核) app. I suddenly realized that this was the very GameJam that had left me with regrets. Excited, I set myself the goal of “experiencing all the games”.

But to experience all the games, I first had to know which games there were in total so that I could track my progress. After some tinkering, I built a Notion game library within a day and collected the information for every game in this BOOOM.

The game library built in Notion contains the information of all the works in this year’s BOOOM.

In this article, I will review how I built this game library. If you are interested in data collection, productivity or information management, it may show you something new, or a new use for tools you already know.

Evaluate implementation options

How to collect data?

Since Gcores already provides a list of BOOOM games, the data source is clear: it can be taken directly from the official page. The first questions are how to collect this data, and how it will be stored and presented afterwards.

List of games on the official BOOOM page

If there were only a dozen or so games, I could consider collecting them by hand, but a glance at the BOOOM page shows that more than 100 works were submitted this time. At 1 minute per game, manual collection would take about two hours. It is also highly repetitive work, and I could hardly guarantee that my motivation and interest would last to the end, so I had better find a more efficient solution.

As it happens, I know a little about Python crawlers, so I thought of using one to collect the data. That way I would not only learn how many works there are, but could also capture the name, type, link and other information for each one. Whether and how to crawl, however, depends on the target web page.

So I needed to evaluate BOOOM’s event page from a crawler development perspective. The points I usually focus on are:

What data is needed: text, links, pictures and so on, which affects how the data is extracted and how it will be stored later
Which crawler library to choose: this depends on how the page loads its data (static/dynamic), whether login is required, and how complex the crawling logic is (load more/automatic paging/jumping to new pages). The libraries I usually consider are scrapy, selenium and requests
Whether a crawler can be avoided altogether: ideally, the request that returns the data can be found in the Network panel of Chrome's developer tools and then constructed by hand, which yields clean JSON data and saves trouble

After some observation, I have a preliminary understanding:

The official BOOOM game list scrolls vertically, loads dynamically, and seems to be sorted randomly
The list loads only a limited number of times: after scrolling to the bottom a few times, everything has been loaded and nothing new appears
Each game in the list shows a title, a cover image (static/animated) and tags, and clicking it opens the game details page
The details page contains more pictures (even videos), text (game introduction, download instructions, acknowledgements), a download button (missing for some games) and developer information (incomplete for some)

In conclusion, crawling is feasible, but the crawler has to simulate scrolling the list, because that data is loaded dynamically and more of it appears each time the page is scrolled to the bottom. (For details, please see below.)

I also clarified what information to crawl and what each piece would be used for:

Game name
Cover image: makes games quick to recognize visually, and looks good as a cover
Tags: a reference for finding works I am interested in, and possibly useful for analysis
Link: for opening the details page to read the introduction, download the game and write a comment later

The information to collect is essentially what each game card shows

How to store and manage the data?

Collecting the data can be left to the crawler, but where should the data live? How will I view and use it?

I quickly thought of Notion, which I use regularly. (Most sspai readers probably know it, but I will introduce it just in case.) It is a Swiss Army knife of a note-taking app that integrates writing, planning and management. Its database feature provides a rich set of data types and preset views and also supports filtering, sorting and searching, which matches my needs for managing BOOOM game information almost perfectly.

Judging from the tutorial topics in the community, people have found every kind of use for Notion: taking notes, making plans, managing tasks and tracking habits are all no problem.

With a plan for data collection and storage in place, I have listed the next things to do:

Write a crawler to collect data
Import data into Notion
Configure Views in Notion

Write a crawler to collect data

As mentioned earlier, the official game list loads more entries each time it is scrolled to the bottom, so the crawler has to simulate scrolling the list to make sure all game data is collected.

BOOOM’s official game list loads dynamically

That is why I chose the selenium library, which is often used to simulate human actions and test web pages, and is a natural fit for crawling dynamic pages.

With the technical approach settled, I did not start writing code right away. Instead, I first broke the task down from top to bottom, much like a Work Breakdown Structure (WBS) in project management: starting from the purpose of the script, I split it into the problem to be solved at each step:

Open the BOOOM web page
Simulate scrolling the page and load all the data
Iterate over the list to extract the required information
Save the data to a local file

With the task of each step clear, I started Visual Studio Code (the software I use to write code), created a new Python file, and used comments to state the purpose of the script and the problem each step solves. This follows Google’s Python commenting conventions. In my own practice, I have found that it really does avoid a lot of confusion when revisiting code.

A docstring at the top of the script states the purpose of the whole script, and block comments in the rest state the purpose of each section of code
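As a rough illustration of this commenting style, a skeleton for such a script might look like the following. It is only a sketch: the function name `main` and the wording of the comments are hypothetical, and the step bodies are deliberately left empty.

```python
"""Collect the information of all BOOOM games and save it to a CSV file.

Hypothetical skeleton: only the comments matter here, the steps are empty.
"""


def main():
    # Step 1: open the BOOOM web page.
    # Purpose: the list is loaded dynamically, so a real browser is needed.

    # Step 2: scroll the page until all games are loaded.
    # Purpose: make sure no game is missing before extraction starts.

    # Step 3: iterate over the list and extract the required information.
    # Purpose: keep only title, tags, detail link and cover link.

    # Step 4: save the data to a local file.
    # Purpose: produce a CSV that a later import script can read.
    return None
```

Each comment states why the step exists rather than restating what the code does, so the skeleton can be read as a plan before any logic is written.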

With this skeleton in place, the problems of each step can be solved one by one. I will skip the specific technical details here; it was basically search-engine-driven programming. Since some readers may be interested, here is a summary of how each step ended up being implemented:

Open the BOOOM web page: start a Chrome browser with selenium and open the BOOOM event page
Simulate scrolling the page and load all the data: execute JavaScript with selenium to get the scroll height of the page and scroll down, repeating until the scroll height no longer increases
Iterate over the list to extract the required information: use XPath expressions (a language for locating nodes in an XML tree) to locate each block of information on the page, go through them one by one, extract the text, links and so on with further XPath expressions, and store the results as a list of dictionaries
Save the data to a local file: export the data to a CSV file using pandas (a library commonly used for data manipulation)
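The scrolling step can be sketched roughly as follows. This is a minimal sketch rather than the original script: the function name `scroll_until_loaded`, the pause length and the `max_rounds` safety limit are assumptions. It only relies on the `execute_script` method that a selenium WebDriver provides.

```python
import time


def scroll_until_loaded(driver, pause_seconds=1.0, max_rounds=100):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with an `execute_script` method, such as a
    selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom of the page to trigger the next load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # assumed wait; give new items time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new appeared, the list is complete
        last_height = new_height
    return max_rounds  # safety limit (assumption) against endless pages
```

With selenium installed, this would be called after `driver = webdriver.Chrome()` and `driver.get(...)` on the event page. A fixed pause is the simplest choice; a slow connection may need a longer one.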

In the end, I wrote a little over 50 lines of crawler code (many of them comments and blank lines for layout), and after running it I had the data for all the BOOOM games.

Crawler code may not be as complicated as you think; a few dozen lines are usually enough

The crawled data was exported to a local CSV file, which contains the following fields for 113 games:

title: text, the game name
tags: list, all the tags attached to the game
game_url: text, link to the game details page
img_url: text, link to the game cover image

Data exported by the crawler to CSV
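To make the extraction and export steps more concrete, here is a small sketch that produces the same four columns. Everything about the markup is hypothetical, since the real structure of the BOOOM page is not shown in this article: the `card` and `tag` class names and the example.com URLs are made up. To keep it runnable without extra installs, it uses the standard library's `xml.etree.ElementTree` (which supports a subset of XPath) and `csv` in place of selenium and pandas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the loaded game list.
SAMPLE_MARKUP = """
<div id="list">
  <a class="card" href="/games/101">
    <img src="https://example.com/covers/101.png"/>
    <h3>Sample Game A</h3>
    <span class="tag">Puzzle</span>
    <span class="tag">2D</span>
  </a>
  <a class="card" href="/games/102">
    <img src="https://example.com/covers/102.gif"/>
    <h3>Sample Game B</h3>
    <span class="tag">Rhythm</span>
  </a>
</div>
"""

FIELDS = ["title", "tags", "game_url", "img_url"]


def extract_games(markup):
    """Locate every game card with XPath and return a list of dictionaries."""
    root = ET.fromstring(markup.strip())
    games = []
    for card in root.findall(".//a[@class='card']"):
        games.append({
            "title": card.findtext("h3"),
            "tags": [tag.text for tag in card.findall("span[@class='tag']")],
            "game_url": card.get("href"),
            "img_url": card.find("img").get("src"),
        })
    return games


def to_csv(games):
    """Serialize the list of dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(games)  # the tags list is written as its text form
    return buffer.getvalue()
```

Note that a list of tags cannot be stored in a CSV cell as a real list; it ends up as text such as `['Puzzle', '2D']` and has to be parsed again when the file is read back.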

After crawling the data, according to the previous plan, the next step is to import the data into Notion.

Import data into Notion

Those who have used Notion may know that its database feature supports importing CSV files directly. On import, it automatically creates any missing columns and sets matching data types. Unfortunately, I could not use this feature.

The problem lies in the format of the data I collected: some of it, such as the tags and image links, is not recognized by Notion the way I need. An image link imported directly into Notion is treated as a link (URL property) rather than the image I want to see (Files & media property). That would mean manually setting the image more than 100 times afterwards, and I am someone who is thoroughly tired of repetitive work.

When the CSV is imported directly into Notion, the tags become plain text and the image links do not display the corresponding images

So a new problem arose: how to import this batch of data into Notion without the repetitive work. I quickly thought of the Notion API, a set of interfaces officially published by Notion so that developers can write programs, connect Notion with third-party tools and automate tasks. By then I had already accumulated some experience with it and wrapped commonly used functions, so writing a new script to import the data was not difficult.

Notion’s official introduction to its API: connect Notion pages and databases with the tools you use every day to create powerful workflows

As with the crawler, I broke this stage down into steps. Importing data into Notion takes two steps:

Read the data from the local file
Go through the data row by row and create a new page in the specified Notion database for each row

Then I wrote the code for each step:

Read the data from the local file: read the CSV file with pandas to get the data as a DataFrame
Go through the data row by row and create a new page in the specified Notion database: for each row, extract the information and package it into the data format the Notion API accepts for creating pages

The troublesome part here is converting the data format. Each raw value read from the file is a single variable, but for the Notion API to accept it, it must be repackaged into nested dictionaries and lists that strictly follow the official format.

The data format packaged in the code clearly has many more layers of nesting, which is dizzying to look at.
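To give an idea of the nesting involved, the sketch below packages one row into the kind of body the Notion API expects when creating a page in a database. It is an illustration under assumptions, not the original script: the property names `Name`, `Tags`, `Link` and `Cover` are hypothetical and must match the columns of the actual database, and `row` is a plain dictionary with the four CSV fields.

```python
import ast


def parse_tags(raw):
    """Turn the CSV text "['a', 'b']" back into a Python list."""
    return ast.literal_eval(raw) if raw else []


def build_page_payload(database_id, row):
    """Package one CSV row as the request body for creating a Notion page."""
    return {
        "parent": {"database_id": database_id},
        "properties": {
            # Title property: a list of rich text objects.
            "Name": {"title": [{"text": {"content": row["title"]}}]},
            # Multi-select property: one object per tag.
            "Tags": {
                "multi_select": [{"name": tag} for tag in parse_tags(row["tags"])]
            },
            # URL property: a plain string.
            "Link": {"url": row["game_url"]},
            # Files property: an externally hosted file, shown as an image.
            "Cover": {
                "files": [
                    {
                        "name": row["title"],
                        "type": "external",
                        "external": {"url": row["img_url"]},
                    }
                ]
            },
        },
    }
```

A single text value such as the title becomes a dictionary holding a list holding another dictionary, which is where the extra layers come from. The resulting dictionary would then be sent as the JSON body of a create-page request, one request per row.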

Fortunately, Notion provides fairly complete documentation for developers using the API. For example, this document lists format examples for the various data types, which showed me how to insert pictures into a Notion database through the Files property.

Still, the code did not work on the first try, and I hit a pitfall at the step of inserting pictures. I had written it according to the official documentation, yet no page was added successfully. In the end I found that the problem was the image links: the Notion API cannot use image links that carry a suffix, so I added a regular-expression step to deal with it.

The cover image links I crawled are followed by a string of suffixes, apparently used to crop and scale the original image. After using a regular expression to keep everything up to the image file extension, the image links are accepted by the Notion API
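A regular expression for this kind of cleanup could look like the sketch below. The exact suffix format of the crawled links is not shown here, so the example URL and the list of extensions are assumptions; the idea is simply to keep everything up to and including the first image file extension.

```python
import re

# Keep the shortest prefix that ends in a known image extension (assumed list).
IMAGE_URL = re.compile(r"^(.*?\.(?:png|jpe?g|gif|webp))", re.IGNORECASE)


def strip_image_suffix(url):
    """Drop whatever follows the image file extension, if there is one."""
    match = IMAGE_URL.match(url)
    return match.group(1) if match else url


# Hypothetical link with a resize suffix appended after the file name.
cleaned = strip_image_suffix("https://example.com/covers/101.png?resize=w_300")
```

Links without a recognizable extension are returned unchanged, so running the function over every row is safe even if some links are already clean.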

After many rounds of testing and fixes large and small, the data was finally imported into Notion.

Game data successfully imported into Notion

Configure Views in Notion

So far, the Notion database stores this information:

Game name
Tags
Game details page link
Cover image link

Although the data the game library needs has been imported, actual use involves different use cases, so the corresponding views have to be designed around different needs.

Starting from the goal of “experiencing all the games in this BOOOM”, I listed these use cases:

Track works by trial progress
Browse works at random and look for inspiration
View the ratings of the works I have tried

After listing them, I found that a new database was needed: use cases 1 and 3 both involve rating the individual works I have tried, while the existing database manages game works, not ratings. Keeping both in the same database would make information management too bloated.

So I created a new “evaluation record” database and designed its data model around the needs of trying out the games:

Game: relation, linked to the corresponding game in the other database, so its fields can be looked up through the relation
Expressiveness: number, for scoring
Innovation: number, for scoring
Fit with the theme: number, for scoring
Liking: number, for scoring
Evaluation start: time, for recording
Evaluation end: time, for recording
General review: text, gameplay summary + strengths and weaknesses + conclusion
Evaluation time: formula returning a number, the minutes played calculated from the start and end times
Total score: formula returning a number, calculated from the scores above, with equal weights for now
Total score (icons): formula returning text, converting the total score into N ⭐

The data/properties an evaluation record contains
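The three formula properties can be illustrated in Python as follows. This is only a sketch of the logic, not the Notion formulas themselves. It assumes that each score is on a 1 to 5 scale, that "equal weights" means a plain average, and that the star text rounds the total to the nearest whole number; none of these details are spelled out above.

```python
from datetime import datetime


def evaluation_minutes(start, end):
    """Minutes played, from the recorded start and end times."""
    return round((end - start).total_seconds() / 60)


def total_score(expressiveness, innovation, theme_fit, liking):
    """Equal-weight average of the four scores (assumed interpretation)."""
    return (expressiveness + innovation + theme_fit + liking) / 4


def star_label(score):
    """Convert a score into a run of star icons (rounding is assumed)."""
    return "⭐" * round(score)


minutes = evaluation_minutes(datetime(2022, 7, 1, 20, 0), datetime(2022, 7, 1, 20, 45))
score = total_score(4, 3, 5, 3)
label = star_label(score)
```

Keeping the numeric total and the star text as separate properties means the number can still be used for sorting while the icons are used for display.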

Back in the game database, referring to the use cases listed above, I combined Notion’s view types, filters, sorts and visible properties to create 4 different views:

Random walk: for “browsing works at random and looking for inspiration”; presented as cards showing cover, title, tags and link, in pseudo-random order (updated every minute)
Evaluation Kanban: for “tracking works by trial progress”; a board grouped by progress, showing cover, title and tags
Completed: for “viewing the ratings of the works I have tried”; a table filtered to completed works, sorted by rating in descending order, showing all properties
General table: for ad hoc lookups; a table showing all properties

Views: Random walk, Evaluation Kanban, Completed, General table

In the random walk view, I used a pseudo-random algorithm of my own devising, just to reshuffle the list every minute. The principle is simple. The link of each work contains a unique numeric ID, which I extract with a regular expression. In a formula, I take the current timestamp (its precision is milliseconds, but it only updates once a minute), divide the ID by the timestamp and take the remainder. Each work then gets a number that changes every minute and follows no fixed order, and the view is sorted by this value.

Pseudo-random number calculated from the ID; the Notion formula that implements the pseudo-random number
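The idea can be sketched in Python as below. The actual Notion formula is not reproduced here, so this is an interpretation: it assumes the remainder is taken as timestamp modulo ID, because taking a small ID modulo a millisecond timestamp would simply return the ID. The function name and the sample values are hypothetical.

```python
import time


def shuffle_key(game_id, now_ms=None):
    """Sort key that changes once a minute and differs from game to game."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Millisecond precision, but truncated so it only changes every minute.
    minute_stamp = now_ms - now_ms % 60_000
    return minute_stamp % game_id  # assumed direction: timestamp mod ID


def random_walk_order(game_ids, now_ms=None):
    """Return the IDs in the pseudo-random order for the given moment."""
    return sorted(game_ids, key=lambda game_id: shuffle_key(game_id, now_ms))
```

Within the same minute the key stays constant, so the order is stable while browsing and only reshuffles when the minute changes. It is not random in any statistical sense, but it is enough to vary the order of a gallery.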

Sharing with the community

At this point, the database I needed for my own trial was set up, but I decided to make an additional version for sharing, because I remembered the pain points of using the official page:

It is hard to find a specific work; I often had to load the complete list and then press Ctrl+F (at the time, I did not know I could use Gcores’ site search)
Browsing is easily interrupted, because the random sorting changes the order of the works from time to time

Since I had these pain points, others might have had the same experience, and this game library could help them too and deliver more value.

So I duplicated the existing database and configured a new set of views:

Cards: meant to fix the interrupted-browsing pain point; presented as cards in a fixed order, showing cover, title, tags and link
Random walk: for random exploration; reuses the existing view unchanged (to be honest, this goes back to the official display logic)
Table: for searching; presented as a table showing all properties

Card view of the shared game library

Then, using Notion’s sharing feature, I published this new game library to the web with comments and search engine indexing turned on, and posted the share link in a status update on Gcores.

The Gcores status update in which I shared the game library

Finally

After publishing that post, I also had some unexpected gains.

I received praise from a developer in this BOOOM. Blasin-Ree, the developer of the game “Cato”, commented on the post that the Notion version of the game library was more convenient than the official one, which made me really happy for a while.

Praise from a GameJam developer

After that, SleepyJeff, the developer of “TRAiLS”, also got in touch. He helped forward the link to this game library to the BOOOM developer group, where it turned out that one work was missing, possibly because that team submitted relatively late and the crawler did not pick it up.

When I received feedback that a work was not included, my first reaction was that there was a problem with the code I wrote.

Once I heard about it, I investigated and confirmed that the work was still not in the official list at that time, so I added its information by hand. The work is “Sbalasi”. After trying it, I found it to be a rhythm game with great art and a high degree of completion. I hope this late addition helps them.

If you like rhythm games or funny themes, I recommend playing “Sbalasi”, a very interesting rhythm game

Another windfall was that I was followed by someone named Simon. I later learned that he is the founder of Gcores.

Looking back on building this BOOOM game library, I learned these things:

If data collected by a crawler needs to be accessed and used on an ongoing basis and the volume is not large, consider importing it into Notion
Stating the purpose in code comments helps more when revisiting code than stating what was done
Wrapping common functions around a usage scenario makes later development easier
When the same operations on a Notion database have to be repeated, consider automating them with the Notion API
The product of a personal project may also help others, so consider sharing more

Thanks to Blasin-Ree for the praise, SleepyJeff for enthusiastically reaching out, and Simon for the follow. Through this practice, I found that I not only took a step forward in sharing what I know, but also built a closer connection with people in the indie game community.

If you are also interested in this BOOOM and want to try the games, you can visit the Gcores event page, or find a game you like through my game library.

This is also my first article on sspai. If it was helpful to you, I hope you will like and bookmark it; that is the greatest support for me to keep writing.

If there is anything else you want to discuss, please leave a comment.

This article is reproduced from: https://sspai.com/post/74002
This site is for inclusion only, and the copyright belongs to the original author.