I wrote a paper analyzing my husband’s language changes before and after marriage, and the conclusion is…

July 10 this year was my first wedding anniversary with my husband. Since the beginning of the year, I had been thinking about what to give him: something creative that he would also like. At the beginning of April, a post on Xiaohongshu, “Xueba (a top student) wrote a paper to relieve his girlfriend’s body anxiety”, gave me a sudden inspiration: I wanted to write a paper for my husband too!

There is also a pun in this: in some cultures the first wedding anniversary is called the paper anniversary, and the couple give each other gifts made of paper, so an academic paper fits the theme perfectly.

Paper anniversary gifts are often cards or a book | Unsplash

Out of love for my husband and enthusiasm for academics, I began three months of work on the paper. In that time I went through the whole process: settling on a topic and research method, teaching myself Chinese natural language processing, doing the research and analysis, writing the first draft, asking friends for peer review, and finally presenting it.

Determining the topic and method

All beginnings are hard.

Writing papers is nothing new to me; it is etched in my DNA. From the start of my undergraduate degree I did research under a professor. In graduate school I completed research projects independently and went through numerous defenses and academic talks.

But writing a paper without any direction is not easy. The hardest part was identifying a viable research topic. This small project was different from any research I had done before: I had to come up with a topic and a research goal “out of thin air” before I could settle on a specific method.

I opened Google Scholar, searched for literature on “marriage relationship and happiness”, and browsed about ten articles. After reading and sorting them, I found that almost all of the relevant papers come from a sociological background, and that most of their methods and conclusions rest on interviews with participants.

A gift depends on surprise, so interviews were obviously not feasible. How could I get a comparable result without my husband taking part or noticing?

At this point it occurred to me that I could use natural language processing to run a text analysis on our chat records. This would avoid involving him directly and tipping him off (all the chat records are stored on my phone), and because it is closely related to my job, it should also come more easily to me.

This process draws on many skills I already have. For natural language processing I chose Python. As a data scientist I am familiar with it, and the Simplified Chinese language packages I could find on GitHub are almost all Python-based.

The paper itself was written in Overleaf, an online LaTeX editor. Most of my past academic papers were written in LaTeX, which typesets automatically and gives the whole paper a polished, professional look. The final result did turn out that way: at first glance it looks like an officially published paper.

The beginning of the paper | Photo courtesy of the author

Data analysis

The most important part of the whole paper is the data analysis.

The first step is collecting data, which is often the biggest headache for data scientists. Raw data is usually bulky and disorganized, like an old warehouse in disrepair, and it has to be cleaned to extract the parts that are actually needed.

My initial idea was to extract all our WeChat chat records from 2021, but unfortunately I found that WeChat has no function for exporting data directly, so I had to fall back on the most primitive method: manual copy and paste.

It was a long job: manually copying and pasting 100 randomly chosen messages from before the wedding and 100 from after. The whole time I was clicking away at the mouse and keyboard. My hands got sore and I wanted to give up and find another way, but out of love for my husband and for data science, I chose to keep going…

After two weeks I had collected 200 messages. This is actually quite typical of how time is spent in a data analysis project: in the related projects I have done before, nearly half of the time went to data collection and processing.

The next step is to turn the raw data into a small dataset that can be analyzed. The two hundred messages contain a lot of distracting information that is of no use for the analysis and needs to be removed.

I used some of the methods commonly used in natural language processing.

The cleaning has three main steps. The first is tokenization, that is, splitting a sentence into separate words so that each word can be analyzed as a unit of information. This step is needed because of how Chinese is written: unlike English, where words come ready-made, Chinese text is a continuous run of characters, and to a machine the word boundaries are indistinguishable.

After a lot of searching, I found an existing Chinese word segmentation library on GitHub called “Jieba” to handle the segmentation for me.

The second step is stop-word removal, that is, removing words that occur very frequently but do not help the analysis, such as “yes”, “good”, and “also”. In the third step, I also removed words consisting of a single Chinese character (for example: I, you, ok). These single-character words help form complete sentences, but in my judgment they contribute little to the analysis.

These steps gave me a meaningful dataset. About 500 content words were left after cleaning, and the analysis could begin.
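
The stop-word and single-character filters described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the paper: the function name `clean_tokens`, the tiny stop-word set, and the sample tokens are all made up here, and the input is assumed to have already been segmented into words (for example by Jieba).

```python
def clean_tokens(tokens, stop_words):
    """Drop stop words and single-character tokens from one segmented message."""
    kept = []
    for token in tokens:
        token = token.strip()
        if not token:
            continue  # skip empty strings and pure whitespace
        if token in stop_words:
            continue  # step 2: frequent words that carry little meaning
        if len(token) == 1:
            continue  # step 3: single-character words
        kept.append(token)
    return kept


# Hypothetical stop-word set; a real list would be much longer.
STOP_WORDS = {"的", "也", "是的", "好的"}

# Hand-written tokens standing in for the output of a word segmenter.
segmented = ["宝宝", "我", "也", "回家", "了", "好的"]

cleaned = clean_tokens(segmented, STOP_WORDS)
print(cleaned)
```

On this toy input only the two multi-character content words, 宝宝 (baby) and 回家 (going home), survive the cleaning.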

The data analysis has two main parts. The first looks at the most frequently used words (top words) and compares the high-frequency words before and after marriage. For this part I used word clouds to display the high-frequency words visually.
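
The top-word comparison can be illustrated with Python's standard `collections.Counter`. The sketch below is an assumption about how such a comparison might be set up, not the analysis from the paper: the helper `top_words` is hypothetical, and the two small token lists are invented for the example.

```python
from collections import Counter


def top_words(messages, n=10):
    """Count tokens across cleaned messages and return the n most frequent."""
    counts = Counter()
    for tokens in messages:
        counts.update(tokens)
    return counts.most_common(n)


# Invented, already-cleaned messages (one list of tokens per message).
before = [["宝宝", "回家"], ["辛苦", "回家"], ["宝宝", "照顾"]]
after = [["老婆", "回家"], ["老婆", "宝宝"], ["老婆", "爱你"]]

top_before = top_words(before, n=3)
top_after = top_words(after, n=3)

# Words that are high-frequency after marriage but were not before.
new_after = {word for word, _ in top_after} - {word for word, _ in top_before}

print(top_before)
print(top_after)
print(new_after)
```

The `(word, count)` pairs returned by `most_common` are also the kind of input a word-cloud or bar-chart library would need for the visualization step.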

The second part is sentiment analysis of the text. I used a pre-trained open-source model to predict the sentiment of each word. The model scores words on a scale of 0 to 1, with 0 being the most negative sentiment and 1 the most positive.

Next, with the predicted sentiment scores, I used a t-test, a common statistical test, to compare the mean sentiment scores before and after marriage and see whether the difference was significant.
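
To make the comparison of means concrete, here is a minimal sketch of a two-sample t statistic using only the Python standard library. The article does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the function name `welch_t` and the scores are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test of mean(b) against mean(a)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_b) - mean(sample_a)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented sentiment scores on the model's 0-to-1 scale.
before_scores = [0.2, 0.4, 0.6]
after_scores = [0.6, 0.8, 1.0]

t_stat, dof = welch_t(before_scores, after_scores)
print(t_stat, dof)
```

A positive t means the second sample (after marriage) has the higher mean. Turning t and the degrees of freedom into a p-value requires the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) provides.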

The training data for this open-source model comes from an e-commerce review website | Giphy

It is worth mentioning that, when I examined the distribution of the raw data, I found that it is not normally distributed in the strict sense, so strictly speaking a t-test is not appropriate. However, given my husband’s limited knowledge of statistics, I still used this most common test so that he could follow it.

The results showed that some high-frequency words appear both before and after marriage, such as “baby”, “going home”, “dog” (referring to our pet dogs Fuwa and Waffle), and “care” (usually referring to him taking care of me).

The difference is that after marriage he has a new nickname for me, “Cutie”, and he calls me “wife” noticeably more often. Now that I am officially his wife, he can call me that with full justification, and it is also his most obvious way of being playfully affectionate and showing his love for me. 🙂

Visualization of the analysis results of high-frequency words, before marriage on the left and after marriage on the right | Photo courtesy of the author

Another point worth noting is that negative words such as “hard work” no longer appear in the list of high-frequency words after marriage, while “I love you” appears more often. Overall, the average sentiment score after marriage was also higher than before.

Research conclusion: marriage has made us happier and full of confidence about our future life, and it has made me even more certain that marrying my husband was the most important and most correct decision of my life.

Histogram of high-frequency words, before marriage on the left and after marriage on the right | Photo courtesy of the author

Peer review

After writing the first draft, I followed academic convention and asked my friends to peer review it. At first I worried that they would decline for one reason or another, since I was asking them to read a fairly academic paper and make suggestions. But to my surprise, everyone was very interested. They not only helped me fix the grammar but also offered many suggestions on research directions. In the end, even my manager at work joined the peer review team. I was really touched.

Sentiment scores of the words used, before marriage on the left and after marriage on the right | Photo courtesy of the author

Presenting the results

Before officially showing my husband the paper, I gave him a few clues on a date.

I asked him to guess, and hinted that the gift was related to our anniversary theme (paper), but even so he never guessed it. When I finally told him that the gift was an academic paper, he was surprised, moved, and thoroughly puzzled, and it took him several minutes to recover. I had been secretly preparing this gift at night for several months, and he had not noticed at all.

The night before our wedding anniversary, after we came home from the date, the time and atmosphere were just right, and I felt it was time to show him the results of those three months.

Although he knew the format in advance, my husband was still pleasantly surprised when he saw the paper. He is a particularly emotional person, so when he read the summary he was moved to tears (a slight exaggeration, but the corners of his eyes were moist). He read the whole paper carefully and said it was the best and most unique paper he had ever read.

We are graduating! | Photo courtesy of the author

Postscript

Our getting to know each other has always been closely intertwined with academics. We met on campus: we were both undergraduates at the University of Toronto, I in Industrial Engineering and he in Mechanical Engineering.

In the summer of 2015, we met by chance at a seminar we both attended, and happened to learn that we were both preparing for the driver’s license test.

When we were students, he was always keen to be my “audience”, listening to me rehearse before a talk. Whenever I used academic terminology in front of him, he thought it was amazing even though he did not understand it.

Later, when we finished our undergraduate studies, I chose to go on to graduate school and am now a data scientist at the Ontario Ministry of Health; he chose to go straight to work and is now an environmental noise and acoustics engineer at an engineering consulting firm.

July 10, 2021 was our sixth anniversary together. Because of the pandemic, we could not return to China and our family members there could not come, so my husband and I decided to hold a small wedding in Toronto. With a small number of friends in Canada present, we shared the good news with relatives in China through a live stream. From then on, we were officially husband and wife.

The wedding scene with our two dogs | Photo courtesy of the author

Completing this paper gave me a great sense of accomplishment. It is still far from publishable and is really more of a love letter to him, but it let me experience the process of defining a research topic on my own and then carrying it out step by step. It also made me discover for the first time that data science and papers can be so romantic!

Both natural language processing and statistical analysis fall under the umbrella of data science, and my own degree and job are data-related. Through this project I practiced old skills and also learned a lot that is new to me, which helps my career development too. That was a pleasant surprise.

I shared the story of writing a paper for my husband on Xiaohongshu (@爱豆沙包的BeanPaper), and it unexpectedly became popular, with nearly 150,000 views and nearly 7,000 likes and favorites. Many netizens showed an interest in “writing papers”.

I believe that, with or without scientific background, everyone can use the structure and form of a paper to “measure” their own life, because everything we do can be summarized in terms of cause/process/effect/summary.

People call the first anniversary the paper anniversary because a marriage at that point is seen as plain and fragile but full of possibilities. After it come cotton, leather, linen… and eventually silver and gold anniversaries, which symbolize feelings that grow stronger and stronger. So I think that in ten years I could write a ten-year retrospective study, looking back at the road we have traveled and at our state of mind along each stretch of it.

References

[1] Neutrino. (2020). jieba. https://ift.tt/vfAFDLk. GitHub.

[2] Wang, R. (2020). Snownlp. https://ift.tt/VsEoSuI. GitHub.

[3] Gove, W. R., Hughes, M., & Style, C. B. (1983). Does marriage have positive effects on the psychological well-being of the individual? Journal of Health and Social Behavior, 122–131.

[4] McDowall, D., McCleary, R., & Bartos, B. J. (2019). Interrupted time series analysis. Oxford University Press.

Author: Bean

Editor: Weng Yang

This article is from Nutshell and may not be reproduced without authorization.

This article is reproduced from: http://www.guokr.com/article/461931/
This site is for inclusion only, and the copyright belongs to the original author.
