Evaluating the Validity of Simplified Chinese Version of LIWC in Detecting Psychological Expressions in Short Texts on Social Network Services

Nan Zhao; Dongdong Jiao; Shuotian Bai; Tingshao Zhu

doi:10.1371/journal.pone.0157947

Abstract

The growing need for automated analysis of web texts, especially short texts on Social Network Services (SNS), places new demands on computerized text analysis instruments. Sound psychometric properties are the basis for the extensive use of instruments such as the Linguistic Inquiry and Word Count (LIWC). In this study, Sina Weibo statuses were analyzed via rater coding and the Simplified Chinese version of LIWC (SCLIWC) in order to evaluate the validity of SCLIWC in detecting psychological expressions in Weibo statuses (n = 60) and in identifying the psychological meaning of a single Weibo status (n = 11). Significant correlations between human ratings and SCLIWC scores, together with the high sensitivity of SCLIWC in capturing single statuses that raters identified as containing certain expressions, supported the validity of SCLIWC in detecting psychological expressions. The results also suggested that SCLIWC could detect psychological expressions in SNS short texts more efficiently with the status count scoring method than with the word count method commonly used with LIWC. However, SCLIWC may not perform well in identifying the psychological meaning of a single piece of SNS short text because it over-identifies target expressions. This study provides preliminary evidence of the validity of SCLIWC, as well as guidance on using it efficiently on SNS short texts.

Citation: Zhao N, Jiao D, Bai S, Zhu T (2016) Evaluating the Validity of Simplified Chinese Version of LIWC in Detecting Psychological Expressions in Short Texts on Social Network Services. PLoS ONE 11(6): e0157947. https://doi.org/10.1371/journal.pone.0157947

Editor: Xuchu Weng, Hangzhou Normal University, CHINA

Received: December 7, 2015; Accepted: June 7, 2016; Published: June 20, 2016

Copyright: © 2016 Zhao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by the National Basic Research Program of China (973 Program) (2014CB744600), National High-tech R&D Program of China (2013AA01A606), Key Research Program of Chinese Academy of Sciences (KJZD-EWL04), and the Scientific Foundation of the Institute of Psychology, Chinese Academy of Sciences (Y4CX143005). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

People’s daily language profoundly reflects their psychological worlds [1]. With the explosion of online textual data arising naturally and spontaneously from people’s daily lives, the need to interpret the psychological aspects encoded by language in online communication [2] and the need for valid computer-based methods of rapidly analyzing texts have been increasingly highlighted. While a range of general-purpose computerized text analysis programs have been developed in psychology, such as General Inquirer [3], Wordnet [4] and Opinion Finder [5], the Linguistic Inquiry and Word Count (LIWC) [6] is now often the preferred automated text analysis method in psychology, and an important choice for natural language processing in computer science. LIWC was developed in the early 1990s [7] to map psychological and linguistic dimensions of written expression, and it has been kept updated since then. Composed of a text processing program and a set of dictionaries, LIWC calculates the percentage of words falling into 80 psychologically or linguistically meaningful categories. These categories cover several important psychological aspects of an individual, including emotion, cognition, social contact and personal concerns. Another significant benefit of LIWC is its transparency as a text analysis method: the way output variables are computed is fully visible to users, and users can extend the lexicons or even add new categories to meet their needs.

In the past 20 years, LIWC has been used in hundreds of studies exploring the relationships between psychological processes and the word categories in daily language. The language features depicted by LIWC word categories have been found to reflect users’ attentional focus [8], emotionality [9], social status and hierarchy [10], social coordination and group processes [11], deception [12], close relationships [13], cognitive styles [14], mental health status [15] and other individual differences [16]. Tausczik and Pennebaker’s review [17] provides a detailed list of studies using this approach. LIWC has also been used in computer science as a natural language processing tool to extract computable features from online textual data, especially in the recent boom of social media research. With LIWC word categories as part of the feature sets for computational prediction models, researchers could predict users’ personality [18–20], personal values [21], tie strength [22], mental health status [23–24], subjective well-being [24–25], and even political election results [26] based on textual data from social media and other sources.

For either approach, the validity of LIWC is a crucial issue. When words from a category are used, does the user actually express the meaning that the category defines? The answer to this question largely determines the interpretation of the relationship between word categories and psychological processes, as well as the effectiveness of the word categories as feature sets. Considering the large amount of work that uses LIWC as a tool, independent studies of its psychometric properties are quite few, especially for the validity of categories other than emotional expression. One direct form of evidence for LIWC validity is the comparison between human ratings and LIWC variables. Pennebaker and Francis [7] asked judges to rate essays written by college students along 12 dimensions of LIWC using a 7-point unipolar scale, and validity was represented by the correlations between the judges’ ratings of each category and the corresponding LIWC variable. Their results showed that, for categories of emotional processes and some cognitive processes, there were medium to high correlations between human ratings and LIWC variables. A similar method was also used to provide evidence of LIWC validity in other reports [27–28]. Another way of comparison was reported by Bantum and Owen [29], in which raters reviewed each word and coded it into a specific emotion category or as lacking emotion, and the codes were then compared with the LIWC variables. The LIWC emotion lexicons were found to be quite effective in detecting emotional expressions. This method used signal-detection indices to quantify the validity of the lexicon and was also implemented in a recent study [30]. In addition, other validity data, such as comparisons of LIWC value patterns across different studies or comparisons of LIWC values among different types of texts, were also reported [27].

Although there is some evidence that LIWC is valid in processing many types of English texts, more work is needed as LIWC becomes more widely used. First, when LIWC is translated into another language, validity data in that language are a necessary basis for applications of the translated version. Second, the validity of LIWC on texts from Social Network Services (SNS) is worth evaluating. Compared with the essays and forum discussions used to test LIWC validity in previous studies, SNS today is often inundated with short texts, such as tweets and Weibo statuses. When faced with a collection of independent short texts on various topics, can LIWC perform as well as it does on long texts with a single topic? This is a valuable question, since there is a huge demand for processing SNS texts with instruments like LIWC.

To meet the need for processing Simplified Chinese texts, especially web texts, Gao et al. [31] developed a Simplified Chinese version of Linguistic Inquiry and Word Count (SCLIWC) based on LIWC [28] and the Traditional Chinese version of LIWC (CLIWC) [32]. The SCLIWC was first translated from CLIWC, and then each word was checked for its category through the same method used to develop LIWC [28]. Since there is no wildcard in Chinese, high-frequency words extracted from Chinese SNS were added to the lexicon according to the LIWC categories in order to improve the word capture rate of the SCLIWC lexicon on the Chinese used by the general public today [31]. However, validity data are still needed before SCLIWC can be applied. In this study, we analyzed Simplified Chinese web text with the aim of answering two questions: how accurately SCLIWC detects psychological expressions in different web texts, and how to use SCLIWC more efficiently to detect psychological expressions in SNS short texts. To accomplish these aims, we compared human ratings with LIWC variables, as in several previous studies, and used three different types of web text: Sina Weibo statuses as the SNS short texts, Renren blogs as the SNS long texts, and news comments as the traditional web texts.

Study 1

This study aimed to evaluate the validity of SCLIWC for identifying psychological expressions in web text. Three types of web text (Weibo statuses, Renren blogs, and news comments) were processed by both SCLIWC and human raters. The validities of SCLIWC with different scoring methods on Weibo statuses of different time spans were assessed and compared in order to reveal the more effective way of identification.

Method

Participants and Materials.

Sina Weibo is a popular social media site in China that is similar to Twitter. The Weibo statuses (like tweets) posted from April 1, 2012 to April 30, 2012 by 60 Weibo users (30 males and 30 females, based on the gender information they provided in their Weibo profiles) were used in this study. These users were randomly selected from the users in our active Weibo user pool [33] who met the following requirements:

From Jun 1, 2012 to Jun 30, 2012, each user posted 90–110 valid statuses.
From Jun 1, 2012 to Jun 7, 2012 (the first week), each user posted 20–26 valid statuses.
On Jun 1, 2012, each user posted 3–4 valid statuses.

Here a valid status means one whose word count was larger than 0 after deleting links, reposted content (“//@username:” or marked in the “retweeted_status” field of the downloaded data object from the Sina Weibo API), mentions (“@username”) and emotion icons. We downloaded the texts of these 60 users’ 5,931 valid statuses in April 2012. A Weibo status may include links, mentions, emotion icons, pictures, audio or video clips, and reposted content. Since the aim of this study was to evaluate the validity of SCLIWC in processing texts, content beyond the LIWC processing scope, such as links, emotion icons, pictures, and audio or video clips, was removed. We wanted to focus on the expression of the Weibo users themselves; because mentions were other people’s usernames and reposted content was usually mixed with non-personal expression such as advertisements and news, the mentions and reposted content were also dropped, leaving only the original text written by the Weibo user in each status. The average text length of such a “cleaned” status was 25.2 Chinese characters, with a range from 1 to 140 characters (the upper limit of one Sina Weibo status). In this sample, male users posted almost the same number of statuses as females did (98.7 vs. 99.0), but their statuses were a little shorter than females’ (23.8 vs. 26.6). These cleaned statuses were used for rater coding, and for SCLIWC scoring the mentions were further removed.
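The cleaning steps described above can be sketched as follows. This is a minimal illustration and not the authors' actual pipeline: the regular expressions, the bracketed form assumed for emotion icons, the function names clean_status and is_valid, and the use of character length as a stand-in for word count are all assumptions made for the example.

```python
import re

# All patterns below are illustrative assumptions; the paper does not
# publish its exact cleaning rules.
REPOST = re.compile(r"//@.*$", re.S)        # reposted content after "//@username:"
LINK = re.compile(r"https?://\S+")          # links
MENTION = re.compile(r"@[\w\-]+")           # mentions such as "@username"
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # emotion icons, assumed to look like "[name]"


def clean_status(text: str) -> str:
    """Return only the user's own text from a raw status."""
    text = REPOST.sub("", text)
    text = LINK.sub("", text)
    text = MENTION.sub("", text)
    text = EMOTICON.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def is_valid(text: str) -> bool:
    """A status counts as valid if any text remains after cleaning.

    Character length is used here as a proxy for the word count,
    which in the study is obtained after word segmentation.
    """
    return len(clean_status(text)) > 0
```

Note that the mention pattern stops only at a non-word character, so a mention followed directly by Chinese text without a space would be removed together with that text; a production cleaner would need a stricter rule.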

Renren is a popular social networking site in China which is similar to Facebook, and many users post blogs through their Renren accounts. Sixty Renren blogs by 60 users (30 males, 30 females) were selected for the current study; all were about the experiences, thoughts and feelings of the writers, with 1552.1 characters (SD = 562.4) each on average. The 60 news comments, published during 2012–2014, were selected from several of China’s mainstream media websites (such as Xinhua and Sina); their topics covered current politics, the economy and social affairs, with 1603.9 characters (SD = 444.1) on average.

The Weibo statuses and Renren blogs were used in this study with participants’ electronic informed consent. As described in detail in Li et al.’s study [33], the participants were recruited online through an informed-consent web page with two buttons, “I agree” and “I disagree”. Only if a user clicked “I agree” to give informed consent to participate in this study could we download and use his/her Weibo statuses or Renren blogs. This research plan was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.

Rater coding.

Eighteen categories of SCLIWC were selected to be assessed in this study, covering personal pronouns (First Person), social processes (Family, Friends), affective processes (Positive Emotion, Negative Emotion, Anxiety, Anger, Sadness), cognitive processes (Insight, Causation, Discrepancy, Tentative), biological processes (Biological Processes) and personal concerns (Work, Achievement, Leisure, Money, Death). The category First Person here was created by merging the SCLIWC categories First Person Singular and First Person Plural. The selected categories of personal pronouns, social processes, affective processes, cognitive processes and biological processes had been assessed by human judges in previous studies [28], and most of them were found to be relevant to some psychological outcomes [17]. The objects and events people care about may also be an important reflection of their minds, so we added some categories of personal concerns to our list.

For each given text, the raters in our study decided whether, or how much, it was characterized by each of the 18 categories. The definitions of these categories followed Pennebaker and Francis’s study [7] and Bantum and Owen’s study [29]. Our raters were required to evaluate the meaning of the whole content rather than detect certain words. For example, if a Weibo status obviously stated the author’s behavior or feelings, it would be characterized by the category First Person, whether or not there was a personal pronoun in the status. Similarly, statuses using the Chinese character for “death” in a colloquial way to express a strong attitude, rather than discussing real death, would not be identified with the category Death. When rating affective processes, we also excluded statements that used emotional words just to express a preference, e.g., “Nan likes ice cream of green tea flavor”, in order to identify the “real” emotional expressions.

We trained 3 graduate students from the psychology institute as the raters for this study. They then independently coded all the texts while blind to the SCLIWC scoring results. Cleaned Weibo statuses were presented in a single worksheet for each Weibo user, without any supplementary information except the time each status was posted. For each Weibo user, the raters judged in sequence how much the first day’s, first week’s and whole month’s Weibo statuses could be characterized by each category, on a scale from “1” (“none”) to “7” (“quite a lot of”). For each Renren blog and news comment, similar judgments were made for the whole article. The average of the 3 raters’ scores was the final human rating score used in further analysis. Inter-rater reliability was measured using Cronbach’s alpha, which was acceptably high for each category on each of the three text types, ranging from .78 (Discrepancy on Weibo statuses) to .99 (Sadness on news comments).
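As an illustration of the reliability index and the averaging step, the sketch below computes Cronbach's alpha with raters treated as items, using the standard formula alpha = k/(k-1) * (1 - sum of per-rater variances / variance of summed scores). The function names and the toy ratings are hypothetical; the study's own computation was done in SPSS and may differ in detail.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per text, one column per rater."""
    k = len(ratings[0])                       # number of raters
    columns = list(zip(*ratings))             # ratings grouped by rater
    rater_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - rater_var / total_var)


def final_rating(row):
    """Final human rating for one text: the mean of the raters' scores."""
    return mean(row)


# Toy data (not from the study): 4 texts rated by 3 raters on the 1-7 scale.
toy = [[1, 2, 1], [4, 4, 5], [7, 6, 7], [3, 3, 2]]
alpha = cronbach_alpha(toy)
scores = [final_rating(row) for row in toy]
```

When the raters agree perfectly the ratio of variances is 1/k and alpha equals 1; disagreement among raters lowers it.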

SCLIWC scoring.

Because Chinese, unlike English, has no spaces marking word boundaries, all the texts used in this study were first segmented into single words with the Language Technology Platform (LTP) [34]. SCLIWC was then used to count the words of different categories in the texts. For the sets of Weibo statuses, Renren blogs and news comments, the word count of each category (the SCLIWC word count score) was calculated directly. For single Weibo statuses, if one or more words of a SCLIWC category appeared in a status, the status was labeled with that category. For example, “I feel depressed today” was labeled as a sad status because the word “depressed” was in the SCLIWC sad category. The number of statuses in each category (the SCLIWC status count score) was then calculated for each Weibo user. These two SCLIWC scores, as well as the human ratings, were entered into SPSS 15.0 for further analysis.
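The two scoring methods can be contrasted with a small sketch. It assumes the statuses have already been segmented into word lists (the study used LTP for this step) and uses a tiny made-up lexicon; the category keys, the lexicon entries and the function names are illustrative and are not taken from the SCLIWC dictionary. The word count score is expressed here as a percentage of all words, which is how it is compared with human ratings later in the paper.

```python
from collections import Counter

# Hypothetical toy lexicon; the real SCLIWC dictionary is far larger.
LEXICON = {
    "sad": {"难过", "郁闷"},
    "posemo": {"开心", "喜欢"},
}


def word_count_score(statuses, lexicon=LEXICON):
    """Percentage of all words in a user's statuses that fall in each category."""
    total = sum(len(tokens) for tokens in statuses)
    hits = Counter()
    for tokens in statuses:
        for token in tokens:
            for category, words in lexicon.items():
                if token in words:
                    hits[category] += 1
    return {c: (100.0 * hits[c] / total if total else 0.0) for c in lexicon}


def status_count_score(statuses, lexicon=LEXICON):
    """Number of statuses containing at least one word of each category."""
    return {
        category: sum(1 for tokens in statuses if words & set(tokens))
        for category, words in lexicon.items()
    }


# One user's statuses, already segmented into words (toy data).
user_statuses = [["我", "今天", "很", "郁闷"], ["今天", "开心", "开心"], ["吃饭"]]
```

In the toy data the second status contains a positive emotion word twice: it contributes two hits to the word count score but only one to the status count score, which is the practical difference between the two methods.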

Results and Discussion

The validity of SCLIWC in detecting psychological expressions in Weibo statuses, Renren blogs, and news comments.

Following the usual way of analyzing texts with the LIWC lexicon, the proportion of each category’s word count in the total number of words in the text was first calculated and compared with human ratings. Table 1 shows the percentage of total words identified for Weibo statuses, Renren blogs and news comments in our study. In this table the inspected SCLIWC categories are broadly divided into 4 groups: self and others (mentioning persons), affective processes, cognitive processes, and concerned contents (mentioning objects other than persons). The word percentages of most of these categories are similar to the means listed by LIWC’s authors for analyses of multiple texts written under different instructions [28]. Only for the category First Person is the word percentage much lower than in Pennebaker et al.’s results [28], which may reflect a characteristic of Chinese utterances. Meanwhile, the word percentages on different text types show discrepancies consistent with the features of each type: news comments express opinions from an objective perspective and in a rational manner, so they use far fewer first-person words and more causation words than Weibo statuses and Renren blogs; as for content, Weibo and Renren texts were more personal while news comments focused on public topics such as the economy and policies, so news comments contained fewer words of Biological Processes and Leisure, but many more words of Work, Achievement and Money.

Table 1. The percentage of total words detected by the SCLIWC categories on Weibo statuses, Renren blogs and news comments.

https://doi.org/10.1371/journal.pone.0157947.t001

To examine the validity of SCLIWC in detecting psychological expressions in Weibo statuses, Renren blogs and news comments, we conducted Pearson correlation analysis between the SCLIWC word count scores and the corresponding human ratings, as Pennebaker and Francis [7] did (Table 2). For the categories about self and others, SCLIWC scores were significantly correlated with human ratings, but the correlations were small or medium, except that the two categories Family and Friends achieved high correlations on news comments. For affective processes, the correlations between SCLIWC scores and human ratings were similar across the three text types, and the small to medium degree of correlation was consistent with previous studies [7, 27]. The correlations between SCLIWC scores and human ratings were not significant for most categories of cognitive processes on Weibo statuses and news comments, but were significant and reached a medium level for Insight, Causation and Tentative on Renren blogs. Most of the correlations for concerned contents were significant, except for the Death category on Weibo statuses, and most of them reached a moderate or even high level on Renren blogs and news comments.
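For readers who want to reproduce this kind of comparison outside SPSS, the Pearson coefficient between one category's SCLIWC scores and its human ratings can be computed as in the sketch below. The function name and the sample numbers are hypothetical, and significance testing is left out.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy example (not study data): one value per user for a single category.
sclwic_scores = [0.5, 1.2, 0.0, 2.1, 0.9]   # word count scores, in percent
human_ratings = [2.0, 3.3, 1.0, 5.7, 2.7]   # mean of the three raters, 1-7 scale
r = pearson_r(sclwic_scores, human_ratings)
```

In the study one such coefficient is reported per category and per text type, with each user (or article) contributing one pair of values.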

Table 2. Correlations between human ratings and SCLIWC word count scores on Weibo statuses, Renren blogs and news comments.

https://doi.org/10.1371/journal.pone.0157947.t002

The current results support the validity of the SCLIWC word count score in detecting psychological expressions in Weibo statuses, Renren blogs, and news comments. The similarity between the word percentage profile of the selected categories in our study and Pennebaker et al.’s [28] results, as well as the differences in the profiles among text types, supports the construct validity of SCLIWC. Moreover, the significant correlations between SCLIWC scores and human ratings are direct evidence of the concurrent validity of SCLIWC. As shown in Table 2, validity varies across categories and text types: the validities of Family and Friends are higher on news comments than on Weibo statuses and Renren blogs; the validities of the cognitive process categories are quite low on Weibo statuses and news comments (except Tentative), but much better on Renren blogs (except Discrepancy); for the categories about concerned contents, the validities on Renren blogs and news comments are higher than on Weibo statuses. In general, however, the validity of SCLIWC in the current study reaches the level of the validities of LIWC in previous studies [7, 27].

Comparing the validities of SCLIWC with different counting methods on Weibo statuses of different time spans.

The results above confirm the validity of SCLIWC in detecting psychological expressions in a considerable amount of Weibo statuses (a month’s statuses), whose total word count reached the level of a typical essay (such as a blog or a media commentary) on the internet. However, as the length of the text drops, the likelihood of errors made by the psychological semantic dictionary may increase. To examine the validity of SCLIWC on texts with fewer words, we conducted Pearson correlation analysis on human ratings and SCLIWC scores for a day’s, a week’s and a month’s Weibo statuses (Table 3). In general, the correlations between human ratings and SCLIWC scores did not change much across different quantities of Weibo statuses. The number of categories with significant correlations rose slightly as the number of statuses increased from a day’s to a week’s, and from a week’s to a month’s. It seems that SCLIWC did perform better in detecting psychological expressions when analyzing texts with a larger number of words; nevertheless, it also showed high validity in many categories on texts as short as a day’s Weibo statuses (3–4 statuses) in the current study.

Table 3. Correlations between human ratings and SCLIWC scores (word count/status count) on Weibo statuses of different time spans.

https://doi.org/10.1371/journal.pone.0157947.t003

Besides word count, which is the most common scoring method for psychological semantic dictionaries, status count is another available scoring method when analyzing a group of Weibo statuses with SCLIWC. We also calculated the correlation coefficients between human ratings and the status count scores of each category on a day’s, a week’s and a month’s Weibo statuses. As shown in Table 3, for most of the categories in which there was a significant correlation between human ratings and SCLIWC scores, the coefficients became higher with the status count scoring method, especially for the categories of cognitive processes. The consistent trend in the correlation coefficients across all four groups of SCLIWC categories implied that, when analyzing Weibo statuses, the status count scoring method may work better than word count, and this finding held for different amounts of Weibo statuses. When evaluating a month’s Weibo statuses with the status count method, the correlations between human ratings and SCLIWC scores were significant or marginally significant in all the categories selected in our study, and most of them reached a medium level, which supported the view that the SCLIWC dictionary is a valid tool for detecting psychological expressions in Weibo statuses if used in a proper way.

Study 2

On social media such as Sina Weibo, a status is the natural unit for expressing a complete thought, as well as the unit for interpreting social media users’ expressions. In Study 1, the strong performance of the SCLIWC status count score in detecting psychological expressions in Weibo statuses raised a further question: could this method be used to judge the psychological meaning of a single Weibo status? If a single status could be classified into certain SCLIWC categories automatically and accurately based on its psychological meaning, the scope of application of SCLIWC would be further expanded. To answer this question, we conducted Study 2, which aimed to evaluate the validity of SCLIWC for identifying the psychological meaning of a single Weibo status.

With reference to Bantum and Owen’s method [29], we used signal-detection theory [35] and signal-detection indices to quantify the accuracy of SCLIWC identification. For our purpose of estimating whether a Weibo status containing a word of a SCLIWC category does express the meaning of that category, a signal is the expression of the psychological meaning of a certain category, and noise is the lack of such expression. Four signal-detection indices (sensitivity, specificity, positive predictive value and negative predictive value) were used in this study. Sensitivity was the probability that a status that is actually representative of the psychological expression of a SCLIWC category would be identified by SCLIWC as belonging to that category. Specificity was the probability that a status not expressing the meaning of a SCLIWC category would be identified by SCLIWC as not belonging to that category. Positive predictive value was the probability that a status characterized by SCLIWC as expressing the meaning of a category is truly representative of the meaning of that category, and negative predictive value was the probability that a status characterized by SCLIWC as not being indicative of a category is, in fact, absent of the meaning of that category.
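The four indices defined above follow directly from the counts of true and false positives and negatives, with the raters' judgment treated as the ground truth. The sketch below is a minimal illustration for a single category; the function name, the 0/1 label encoding and the toy labels are assumptions made for the example.

```python
def detection_indices(rater_labels, sclwic_labels):
    """Signal-detection indices for one category.

    Both arguments are sequences of 0/1 flags, one per status:
    rater_labels marks statuses the raters judged to express the category,
    sclwic_labels marks statuses SCLIWC labeled with the category.
    """
    pairs = list(zip(rater_labels, sclwic_labels))
    tp = sum(1 for r, s in pairs if r == 1 and s == 1)
    fn = sum(1 for r, s in pairs if r == 1 and s == 0)
    fp = sum(1 for r, s in pairs if r == 0 and s == 1)
    tn = sum(1 for r, s in pairs if r == 0 and s == 0)

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positive statuses).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy labels (not study data): SCLIWC flags every true case plus two extra ones.
raters = [1, 1, 0, 0, 0, 0]
sclwic = [1, 1, 1, 1, 0, 0]
indices = detection_indices(raters, sclwic)
```

The toy labels mimic over-identification: every status the raters marked is caught, so sensitivity and negative predictive value are high, while the extra flagged statuses pull down specificity and positive predictive value.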