
Improve word embedding using both writing and pronunciation

Abstract

Text representation maps text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weights, dependencies, morphology, etc., to optimize the training. However, from a linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training so that both speech and writing contribute to capturing meaning. This paper uses Chinese, English and Spanish as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms baseline models that do not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages.

Introduction

Word representation plays a very important role in natural language processing (NLP). The key issue is how to obtain word semantics. The earliest word representations, which simply assign each word a discrete identifier, cannot capture semantics. Later models capture semantics by exploiting the distributional hypothesis of Harris [1], which assumes that words with similar contexts have similar meanings. The most efficient word embedding models are based on this concept.

Neural networks are important machine learning models that show superiority in many areas. The application of neural networks to word embedding models was proposed by Bengio et al. as early as 2003 [2]. Since then, neural network language modeling has gained attention in word representation and is now a popular technique [3–5]. However, the computation involved in neural networks is intensive. Mikolov et al. [6] proposed the continuous bag-of-words model (CBOW) and the Skip-gram model, which are highly efficient and can be trained on large-scale corpora.

To improve the quality of word embedding, researchers have focused on morphology, which refers to the elements that compose a word, such as prefixes and suffixes in English or the characters that make up a word in Chinese. Various researchers [7–9] have incorporated morphology into word embedding training. As such, word embedding models have been developed and have shown superior performance in a variety of NLP tasks, including dialog systems [10], sentiment analyses [11], machine translation [12], and text classification [13].

However, most existing models have focused on the writing itself, ignoring the fact that spoken language expresses meaning directly, whereas writing is simply a way to record speech. One of the basic principles of modern linguistic theory holds that only spoken words truly reflect concepts and that writing is simply a record of spoken language, analogous to a phonograph recording [14, 15].

In speech processing systems, Bengio and Heigold utilize the speech signal directly [16]. They project the signal and words into an embedding space where words that sound alike are nearby in the Euclidean sense. Kamper et al. [17] and He et al. [18] conducted deeper studies based on this work. Levin et al. [19] applied acoustic segment embeddings to zero-resource query-by-example keyword search. These acoustic embeddings, which mainly capture phonetic structure, are different from word embeddings.

Building on this view, this paper proposes the pronunciation-enhanced word embedding model (PWE), which incorporates speech information into the model. Pinyin and phonetic symbols are both direct descriptions of word pronunciation, and they capture aspects of speech not present in the writing system. Therefore, this paper incorporates speech information by adding phonetic symbols or pinyin to the model. The PWE is highly scalable from two perspectives. First, the PWE can easily be combined with existing models. Second, the PWE can be applied to most languages.

This paper presents several methods of combining words and speech information for Chinese, English and Spanish to construct the PWE. The PWE outperforms the baseline models in word similarity and text classification experiments. In addition, by revealing the semantic correlation between word embeddings and pronunciation embeddings, this paper finds that the pronunciation embedding captures semantics and that the word embedding contains speech information. This paper confirms that including pronunciation improves the quality of word embedding.

Related work

Word2vec and related models

Word2vec is an efficient neural-network word embedding model proposed by Mikolov et al. [6]; it includes the CBOW and the Skip-gram model. The CBOW predicts a target word from its context, and the Skip-gram model uses a word to predict its context. Context is acquired by a sliding window. Given a word sequence D = {x_1, x_2, …, x_N}, the CBOW maximizes the following average log probability:

\[ \frac{1}{N}\sum_{t=1}^{N}\log p(x_t \mid x_{t-j},\ldots,x_{t+j}) \tag{1} \]

whereas the objective of the Skip-gram model is

\[ \frac{1}{N}\sum_{t=1}^{N}\sum_{-j \le i \le j,\; i \ne 0}\log p(x_{t+i} \mid x_t) \tag{2} \]

Here, j is the context window size. Word2vec uses the following softmax function to calculate the probability:

\[ p(x_O \mid x_I) = \frac{\exp\left({v'_{x_O}}^{\top} v_{x_I}\right)}{\sum_{x \in W}\exp\left({v'_{x}}^{\top} v_{x_I}\right)} \]

where W denotes the dictionary, and v_x and v'_x are the input and output word embeddings of word x, respectively. Because the training corpus is large, hierarchical softmax and negative sampling are used to improve training efficiency [20].
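To make the prediction step concrete, the following is a minimal numpy sketch (not the authors' implementation; all names are illustrative) of the CBOW softmax above: the context embeddings are averaged and scored against every output embedding in the dictionary W.

```python
import numpy as np

def cbow_softmax_prob(context_ids, target_id, V_in, V_out):
    """P(target | context) under the CBOW softmax.

    V_in  : |W| x d matrix of input embeddings  (v_x)
    V_out : |W| x d matrix of output embeddings (v'_x)
    """
    h = V_in[context_ids].mean(axis=0)              # average the context embeddings
    scores = V_out @ h                              # one score per word in the dictionary W
    scores -= scores.max()                          # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax
    return probs[target_id]

# toy usage: a 10-word vocabulary with 5-dimensional embeddings
rng = np.random.default_rng(0)
V_in, V_out = rng.normal(size=(10, 5)), rng.normal(size=(10, 5))
print(cbow_softmax_prob([1, 3, 4, 6], target_id=2, V_in=V_in, V_out=V_out))
```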

Subsequently, many models have been proposed to improve word2vec. Le and Mikolov [21] modified word2vec to represent sentences and documents. Qiu et al. [22] considered proximity and ambiguity to improve word2vec. Levy and Goldberg [23] generalized the Skip-gram model, moving the focus from linear bag-of-words contexts to arbitrary word contexts.

Improvement from morphology

Several researchers have added morphology to word embedding models to improve the quality of word embedding. Botha and Blunsom [7] assumed that word vectors are a linear function of arbitrary sub-elements of the word, e.g., the surface form, stem, affixes, and other latent information. For example, the word “unbelievable” consists of “un”, “believ” and “able.” Xu and Liu [8] observed that morphemes carry meaning; for example, words ending with the suffix “able” carry the meaning of “capable.” Their model therefore replaces morphemes with their meanings.

For languages such as Chinese, the characters in a word also contain rich semantics. In Chinese, the meaning of the word “教室” (classroom) can be extracted from its two characters, “教” (teach) and “室” (room). Many similar words exist in Chinese. Therefore, Chen et al. [9] proposed the character-enhanced word embedding model (CWE), which integrates characters into training to jointly learn character and word embeddings. Suppose that the character sequence of word x_t is {c_1, c_2, …, c_k}, where c_j denotes the jth character of x_t. The modified word embedding is defined as

\[ \hat{v}_{x_t} = v_{x_t} + \frac{1}{k}\sum_{j=1}^{k} v_{c_j} \]

where v_{x_t} is the original word embedding, k is the number of characters, and v_{c_j} represents the embedding of the jth character. In addition, the CWE proposes several methods to resolve character ambiguity, including position-based, cluster-based and nonparametric cluster-based character embeddings. The similarity-based character-enhanced word embedding (SCWE) improves the CWE by including the semantic contributions of characters to a word [24].

Writing is symbolized language

Linguists have long discussed the relationship between writing and language. At the beginning of the 20th century, Saussure [14] proposed that language is a storehouse of sound-images and that writing is a tangible form of those images. Language is primarily an auditory symbol system, whereas the written forms of language are secondary symbols (symbols of symbols) that represent the spoken symbols [25]. For a linguist, writing is, except for certain matters of detail, merely an external preservation device, similar to a phonograph, that stores observations about features of historical speech [15]. According to these basic principles of linguistics, writing is therefore a symbolization of the symbols of language.

Chinese, English and Spanish differ in the relationship between words and their pronunciations. In Chinese, there is no direct connection between a word's written form and its pronunciation. In English, because of its complex pronunciation rules, the pronunciation can only be guessed from the spelling of a word. In Spanish, almost every character has a fixed pronunciation, so the pronunciation of a word can be obtained directly from its spelling. Therefore, this paper uses Chinese, English and Spanish as examples to construct the PWE. In addition, pinyin is used to represent the pronunciation of Chinese characters. For an introduction to pinyin, please refer to https://en.wikipedia.org/wiki/Pinyin.

The PWE

The basic model

The PWE reflects the linguistic theory of the relationship between writing and language: spoken language is the direct expression of semantics, and writing is a record of spoken language. Therefore, the PWE integrates word pronunciations into the word embedding model. Suppose p is the pronunciation of word w, vector v_w represents the word w, and vector v_p represents the pronunciation p. Then, the modified word embedding that includes the pronunciation is defined as follows:

\[ \hat{v}_w = v_w + v_p \]

This is the basic idea for obtaining a modified word embedding, and it can be adapted to specific circumstances. After acquiring the modified embedding \hat{v}_w, other existing word embedding models can be used to jointly train the pronunciation and word embeddings. The following sections introduce concrete PWEs based on specific word embedding models.
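As an illustration of this composition, the following minimal Python sketch builds the modified embedding \hat{v}_w = v_w + v_p from lookup tables. The vectors here are random placeholders for jointly learned embeddings, and the example words are illustrative; the point is that homophones share a single pronunciation embedding.

```python
import numpy as np

dim = 100
rng = np.random.default_rng(0)

# lookup tables; the values here are random stand-ins for jointly learned vectors
word_emb = {"right": rng.normal(size=dim), "write": rng.normal(size=dim)}
pron_emb = {"/raɪt/": rng.normal(size=dim)}   # one shared embedding for the homophone sound

def modified_embedding(word, pron):
    """The basic PWE composition: add the pronunciation embedding to the word embedding."""
    return word_emb[word] + pron_emb[pron]

v_right = modified_embedding("right", "/raɪt/")   # both words receive the same v_p
v_write = modified_embedding("write", "/raɪt/")
```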

The PWE based on word2vec

Word2vec includes two models. This section uses the CBOW as an example to introduce the PWE based on word2vec.

The CBOW and the PWE based on the CBOW (denoted as the CBOW+P) are shown in Fig 1. The difference between the two lies in how the word embedding is constructed: the CBOW+P adds the pronunciation embedding to the word embedding. Both models predict a target word from its context, and the CBOW+P does not attempt to predict the target pronunciation. Therefore, the CBOW+P and the CBOW share the same objective, as shown previously in Eq (1).

Fig 1. The CBOW and the CBOW+P.

The structure of the CBOW and the CBOW+P given a word sequence {w1, w2, w3} in which word w2 is predicted by words w1 and w3. In the CBOW+P, p1 and p3 are the pronunciations of words w1 and w3, respectively.

https://doi.org/10.1371/journal.pone.0208785.g001

Let W denote the dictionary and P denote the pronunciation set. A word w_i ∈ W is represented by the word embedding v_{w_i}, and its pronunciation p_i ∈ P is represented by the pronunciation embedding v_{p_i}. Assume that p_t is the pronunciation of word w_t. Then, the modified word embedding of the CBOW+P is defined as

\[ \hat{v}_{w_t} = v_{w_t} + v_{p_t} \tag{3} \]

After acquiring the modified word embedding, the CBOW+P can be trained similarly to the CBOW; however, the CBOW+P jointly learns both the word embedding and the pronunciation embedding. This study used hierarchical softmax to improve the training efficiency.
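The sketch below illustrates how one such joint update reaches both embedding tables. It is an assumption for illustration only: it uses negative sampling rather than the hierarchical softmax used in this study, and all names and indices are invented.

```python
import numpy as np

def cbow_p_step(context_word_ids, context_pron_ids, target_id, neg_ids,
                V_word, V_pron, V_out, lr=0.025):
    """One gradient step of CBOW+P with negative sampling (the paper itself uses
    hierarchical softmax). Gradients reach both the word table and the shared
    pronunciation table, so both embeddings are learned jointly."""
    # modified context embeddings v_hat = v_w + v_p, then averaged as in the CBOW
    h = (V_word[context_word_ids] + V_pron[context_pron_ids]).mean(axis=0)

    ids = np.array([target_id] + list(neg_ids))
    labels = np.zeros(len(ids)); labels[0] = 1.0        # 1 = true target, 0 = negative samples
    scores = V_out[ids] @ h
    preds = 1.0 / (1.0 + np.exp(-scores))               # sigmoid
    g = preds - labels                                  # gradient of the loss w.r.t. the scores

    grad_h = g @ V_out[ids]                             # backpropagate to the hidden vector
    V_out[ids] -= lr * np.outer(g, h)                   # update output embeddings
    n = len(context_word_ids)
    V_word[context_word_ids] -= lr * grad_h / n         # ... and both input tables
    V_pron[context_pron_ids] -= lr * grad_h / n

# toy usage with invented indices
rng = np.random.default_rng(0)
Vw, Vp, Vo = (rng.normal(scale=0.1, size=(1000, 100)) for _ in range(3))
cbow_p_step([3, 7, 21, 40], [5, 5, 9, 13], target_id=11, neg_ids=[88, 402, 777],
            V_word=Vw, V_pron=Vp, V_out=Vo)
```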

The PWE based on the Skip-gram model (denoted as the Skip-gram+P) also uses Eq (3) to modify the word embedding, and its objective is Eq (2).

The PWE based on the CWE

The CWE assumes that in languages such as Chinese, a word is usually composed of several characters and contains rich internal information [9]. Therefore, the CWE integrates character information into the word embedding model. This section describes a basic construction method for the PWE based on the CWE, and the following section presents another construction method for the Chinese language example.

In this basic method, the word embedding consists of three parts: the word, the word's characters and the word's pronunciation. Let W denote the dictionary, C denote the set of characters and P denote the set of pronunciations. Suppose that the character sequence of word w_t ∈ W is D = {c_1, c_2, …, c_k}, where c_i ∈ C, and that p_t ∈ P is the pronunciation of word w_t. The modified word embedding \hat{v}_{w_t} is then

\[ \hat{v}_{w_t} = v_{w_t} + \frac{1}{k}\sum_{j=1}^{k} v_{c_j} + v_{p_t} \]

Here, v_{w_t} is the original word embedding, v_{c_j} is the embedding that corresponds to the jth character in the word, and v_{p_t} is the embedding that corresponds to p_t.

After acquiring the modified word embedding, the PWE based on the CWE can be trained similarly to the CWE; however, the PWE based on the CWE jointly learns the word embedding, the character embedding and the pronunciation embedding.

The PWE in different languages

This paper implemented the PWE based on word2vec for Chinese, English and Spanish. In addition, this study implemented the PWE based on the CWE for Chinese.

For word2vec in Chinese, let P denote the pinyin set, where pinyin p_j ∈ P is represented by the embedding v_{p_j}, and let W denote the dictionary, where word w_t ∈ W is represented by the word embedding v_{w_t}. Suppose word w_t includes k characters and the corresponding pinyin sequence is D = {p_1, p_2, …, p_k}. The modified word embedding that adds the pronunciation is

\[ \hat{v}_{w_t} = v_{w_t} + \frac{1}{k}\sum_{j=1}^{k} v_{p_j} \]

For English and Spanish, suppose p_t is the pronunciation of word w_t; the modified word embedding is

\[ \hat{v}_{w_t} = v_{w_t} + v_{p_t} \]

This paper proposes two methods of constructing the PWE based on the CWE for Chinese. The first method directly adds the pronunciation vectors to the CWE composition, as described in the previous section, and is denoted as the CWE+P1. The modified word embedding consists of word, character and pronunciation embeddings. Assume that the character sequence of word w_t ∈ W is D = {c_1, c_2, …, c_k} and that its corresponding pinyin sequence is H = {p_1, p_2, …, p_k}, where p_i is the pinyin of character c_i. The modified word embedding is defined as follows:

\[ \hat{v}_{w_t} = v_{w_t} + \frac{1}{k}\sum_{j=1}^{k}\left(v_{c_j} + v_{p_j}\right) \]

where v_{w_t} is the original word embedding, and v_{c_j} and v_{p_j} are the embeddings of character c_j and pinyin p_j, respectively. For example, the word “教室” (classroom) includes two characters, “教” (teach) and “室” (room). The pinyin of “教” (teach) is “jiao4” and the pinyin of “室” (room) is “shi4.” The modified word embedding is

\[ \hat{v}_{教室} = v_{教室} + \frac{1}{2}\left(v_{教} + v_{jiao4} + v_{室} + v_{shi4}\right) \]

The second method creates an embedding for each pronunciation of each character. Based on the specific pronunciation of a character in the word, this method adds the corresponding embedding to the word embedding to obtain the modified word embedding; this method is denoted as the CWE+P2. A Chinese character c_i may have n different pronunciations {p_1, p_2, …, p_n}; therefore, the CWE+P2 creates n embeddings, one for each of those pronunciations. Suppose that the character sequence of word w_t is D = {c_1, c_2, …, c_k} and the corresponding pinyin sequence is H = {p_1, p_2, …, p_k}, where p_i is the pinyin of character c_i in the word w_t. The modified word embedding is defined as follows:

\[ \hat{v}_{w_t} = v_{w_t} + \frac{1}{k}\sum_{j=1}^{k}\left(v_{c_j} + v_{c_j}^{p_j}\right) \]

where the embedding v_{c_j}^{p_j} corresponds to the pinyin of the jth character in the word. For the word “教室” (classroom), embeddings for the pinyin “jiao4” of character “教” (teach) and the pinyin “shi4” of character “室” (room) are created and denoted as v_{教}^{jiao4} and v_{室}^{shi4}, respectively. The modified word embedding for the word “教室” (classroom) is defined as follows:

\[ \hat{v}_{教室} = v_{教室} + \frac{1}{2}\left(v_{教} + v_{教}^{jiao4} + v_{室} + v_{室}^{shi4}\right) \]
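A minimal sketch contrasting the two compositions, under the formulas above and with random placeholder vectors standing in for jointly learned tables, is shown below: the CWE+P1 keys pronunciation embeddings by pinyin alone, whereas the CWE+P2 keys them by (character, pinyin) pairs.

```python
import numpy as np

dim = 100
rng = np.random.default_rng(0)
new_vec = lambda: rng.normal(size=dim)     # random stand-in for a learned vector

word_emb   = {"教室": new_vec()}
char_emb   = {"教": new_vec(), "室": new_vec()}
pinyin_emb = {"jiao4": new_vec(), "shi4": new_vec()}                  # CWE+P1: one vector per pinyin
char_pron  = {("教", "jiao4"): new_vec(), ("室", "shi4"): new_vec()}  # CWE+P2: one per (character, pinyin)

def cwe_p1(word, chars, pinyins):
    parts = [char_emb[c] + pinyin_emb[p] for c, p in zip(chars, pinyins)]
    return word_emb[word] + np.mean(parts, axis=0)

def cwe_p2(word, chars, pinyins):
    parts = [char_emb[c] + char_pron[(c, p)] for c, p in zip(chars, pinyins)]
    return word_emb[word] + np.mean(parts, axis=0)

v1 = cwe_p1("教室", ["教", "室"], ["jiao4", "shi4"])
v2 = cwe_p2("教室", ["教", "室"], ["jiao4", "shi4"])
```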

Experiments

Datasets and tools

Using Chinese, English and Spanish as examples, this study conducted experiments to evaluate the model, selecting the Chinese, English and Spanish Wikipedia dumps as the training corpora; the corpora are available from http://download.wikipedia.com/zhwiki/, http://download.wikipedia.com/enwiki/ and http://download.wikipedia.com/eswiki/, respectively. For Chinese, word segmentation is necessary; ANSJ (available from https://github.com/NLPchina/ansj_seg) was used as the word segmentation tool. ANSJ supports Chinese name recognition, includes a user-defined dictionary, can process approximately one million words per second and reaches a word segmentation accuracy greater than 96%. This paper uses HanLP (refer to http://hanlp.linrunsoft.com/) to convert Chinese words into pinyin. HanLP is a comprehensive Chinese NLP tool implemented in Java; its pinyin conversion tool recognizes polyphones at millisecond response speeds. The context window size is set to 5, and the embedding dimension is set to 100.
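For readers who want to reproduce the pinyin preprocessing step in Python rather than Java, the third-party pypinyin package (an assumption for illustration; it is not part of the authors' toolchain, which uses HanLP) can produce the same tone-numbered form:

```python
# pip install pypinyin   (third-party package; the paper's own pipeline uses HanLP in Java)
from pypinyin import lazy_pinyin, Style

def word_to_pinyin(word):
    """Convert a Chinese word to tone-numbered pinyin, e.g. "教室" -> ['jiao4', 'shi4']."""
    return lazy_pinyin(word, style=Style.TONE3)

print(word_to_pinyin("教室"))
```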

Word similarity

In this experiment, the cosine similarity between word embeddings is used to indicate the semantic relatedness of a given word pair. A word similarity experiment evaluates the quality of word embeddings by comparing the semantic relatedness computed by the models with human judgments. For this paper, wordsim-240 and wordsim-296 [26] were used as the evaluation datasets for Chinese; for English, MTurk-771 [27], MEN [28], WS-353-SIM and WS-353-REL [29] were used; for Spanish, WS-353 [30] and RG-65 [31] were used. The numbers of word pairs in these datasets are shown in Table 1.

This paper uses the Spearman correlation ρ to evaluate the agreement between the model results and human judgments; the model's performance in the word similarity experiment is then evaluated according to ρ. In this experiment, word pairs containing out-of-vocabulary words are ignored. Each model is trained at least 10 times, yielding at least 10 results per model; Tables 2–4 show the averaged results for the different languages.
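A sketch of this evaluation procedure, assuming a dict mapping words to trained vectors and a list of human-scored word pairs (names are illustrative), might look like the following:

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs, emb):
    """pairs: iterable of (word1, word2, human_score); emb: dict word -> vector.
    Pairs containing out-of-vocabulary words are skipped, as in the experiment."""
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in emb and w2 in emb:
            v1, v2 = emb[w1], emb[w2]
            model_scores.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
            human_scores.append(gold)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho * 100    # the tables report Spearman rho * 100
```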

Table 2. Experimental results of word similarity (ρ * 100) for Chinese.

https://doi.org/10.1371/journal.pone.0208785.t002

Table 3. Experimental results of word similarity (ρ * 100) for English.

https://doi.org/10.1371/journal.pone.0208785.t003

Table 4. Experimental results of word similarity (ρ * 100) for Spanish.

https://doi.org/10.1371/journal.pone.0208785.t004

A number of observations can be made from the experimental results of the models based on the CBOW. Regardless of language and dataset, the best results are obtained by the CBOW+P, which outperforms the CBOW on every dataset. For Chinese, the ρ of the CBOW+P increases by 4.7% on wordsim-296. For Spanish, the ρ of the CBOW+P increases by 6.8% on RG-65. The models that include pronunciation information generally perform better than the benchmark models: for Chinese, the CBOW+P and the CWE+P1 outperform the corresponding benchmark models, and for English and Spanish, the CBOW+P outperforms the CBOW.

From the experimental results of the models based on the Skip-gram model, we observe that the PWE based on the Skip-gram model also achieves good results. For English, the ρ of the Skip-gram+P model is better than that of the Skip-gram model on MTurk-771, WS-353-SIM and WS-353-REL. For Spanish, the ρ of the Skip-gram+P model increases by 4.6% on RG-65. However, the results based on the Skip-gram model are somewhat weaker than those based on the CBOW. A likely explanation is that it is difficult to predict the surrounding sounds from a single sound, whereas it is much easier to guess a missing sound from the surrounding sound sequence; this would explain why the CBOW-based results are better than the Skip-gram-based results. Overall, this experiment shows that the PWE performs well across different languages.

Text classification

For the text classification experiments, this paper adopted the Fudan (refer to http://www.datatang.com/data/44139), Sogou (refer to http://download.labs.sogou.com/) and Netease (refer to http://www.datatang.com/data/1196) corpora for Chinese, 20Newsgroups (refer to http://qwone.com/~jason/20Newsgroups/) for English and TASS 2017 [32] for Spanish as the experimental datasets. The Fudan corpus contains 20 categories, with the number of documents per category ranging from tens to thousands. For the experiment presented in this paper, 5 categories were selected, each containing more than 1,000 documents; Table 5 shows the categories and the corresponding numbers of documents. The documents include various types of papers and news reports. Sogou and Netease are news classification datasets: Sogou includes nine categories with 1,990 documents each, and Netease includes six categories with 4,000 documents each. 20Newsgroups contains 20 newsgroups, some of which are very closely related to each other. This paper extracted six categories following the groupings on the dataset's home page, as shown in Table 6; the category column indicates the extracted categories, and the sub-categories column indicates the newsgroups included in each extracted category. TASS 2017 is a Twitter-based sentiment classification dataset with four categories: “P”, “N”, “NEU” and “NONE”. The “P” and “N” categories have the most samples, so this paper chose “P” and “N” for the experiment.

Table 5. Categories and their sizes selected from the Fudan corpus.

https://doi.org/10.1371/journal.pone.0208785.t005

This experiment used the average of the word embeddings in a document to represent the document. The text classifier was trained with LIBLINEAR [33]. For corpora that do not distinguish between a training set and a test set, 5-fold cross-validation was used. The accuracies of the models for the different languages are shown in Table 7, and their F-scores for each category of the different corpora are shown in Tables 8–12, respectively. The F-score is a metric that considers both precision and recall.
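A rough sketch of this pipeline is given below. It uses scikit-learn's LinearSVC, which is backed by LIBLINEAR, rather than the LIBLINEAR package directly, and its inputs (tokenized documents, labels, and a trained embedding dict) are assumed.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def doc_vector(tokens, emb, dim=100):
    """Represent a document by the average of its word embeddings (OOV tokens skipped)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def classification_accuracy(docs, labels, emb):
    """docs: list of token lists; labels: their categories; emb: dict word -> vector."""
    X = np.vstack([doc_vector(d, emb) for d in docs])
    y = np.array(labels)
    clf = LinearSVC()                               # linear classifier backed by LIBLINEAR
    return cross_val_score(clf, X, y, cv=5).mean()  # 5-fold cross-validation accuracy
```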

Table 8. F-score (%) on each category from the Fudan corpus.

https://doi.org/10.1371/journal.pone.0208785.t008

Table 9. F-score (%) on selected categories from the Sogou corpus.

https://doi.org/10.1371/journal.pone.0208785.t009

Table 10. F-score (%) on selected categories from the Netease corpus.

https://doi.org/10.1371/journal.pone.0208785.t010

Table 11. F-score (%) on selected categories from the TASS 2017 corpus.

https://doi.org/10.1371/journal.pone.0208785.t011

Table 12. F-score (%) on selected categories from the 20Newsgroups corpus.

https://doi.org/10.1371/journal.pone.0208785.t012

Several observations can be made from the preceding tables. (1) For almost all languages and corpora, the best accuracy is obtained by adding pronunciation information to the model. (2) Regardless of language and corpus, the best F-score in each category is generally obtained by a model that adds pronunciation information. (3) After adding the pronunciation embedding, the accuracy and the F-score for each category are generally better than the benchmarks. For Chinese, the PWE based on word2vec generally outperforms word2vec, and the PWE based on the CWE is also generally better than the CWE. For English, the accuracy and the F-scores of the PWE based on word2vec are generally better than those of word2vec on 20Newsgroups. For Spanish, the accuracy and the F-score on the negative (“N”) category of the CBOW+P model are better than those of the CBOW. These three points demonstrate that including pronunciation information improves the performance of the word embedding model across different languages.

Qualitative analysis

This section evaluates the quality of the word embeddings and pronunciation embeddings by finding the words most similar to a given sound and the sounds most similar to a given word in Chinese. All embeddings were trained with the CBOW+P. Cosine similarity is used to find the 4 most similar embeddings. The experimental results are shown in Tables 13 and 14.
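The nearest-neighbor queries behind Tables 13 and 14 can be sketched as follows, assuming word_emb and pinyin_emb are dicts of trained vectors (the names are illustrative, not the authors' code):

```python
import numpy as np

def top_k_neighbors(query_vec, emb_table, k=4):
    """Return the k entries of emb_table (name -> vector) closest to query_vec by cosine similarity."""
    names = list(emb_table)
    M = np.vstack([emb_table[n] for n in names])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = M @ q
    best = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in best]

# e.g. the 4 pinyin most similar to the word "投降" (surrender), given trained tables
# word_emb (word -> vector) and pinyin_emb (pinyin -> vector):
#   top_k_neighbors(word_emb["投降"], pinyin_emb)
```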

Table 13. 4 most similar pinyin of the target word in diminishing order of similarity from Pinyin 1 to Pinyin 4.

https://doi.org/10.1371/journal.pone.0208785.t013

Table 14. 4 most similar words of the target word in diminishing order of similarity from Word 1 to Word 4.

https://doi.org/10.1371/journal.pone.0208785.t014

Table 13 shows that a semantic correlation exists between words and the pinyin that are the most similar to the word. For example, the 4 most similar pinyin of the word “投降” (surrender) are “che4”, “tui4”, “jun1” and “bai4”, where “che4” and “tui4” mean “撤退” (withdraw), “jun1” means “军队” (army) and “bai4” means “败” (fail). The words “财富” (wealth) and “体育” (sports) also have semantic correlations to similar pinyin.

According to Table 14, the words that are most similar to a pinyin contain characters with that pinyin. For example, the 4 most similar words to the pinyin “bai4” include the characters “败” (fail), “拝” (bow) and “拜” (bow), whose pinyin are all “bai4”. The 4 most similar words to the pinyin “shui4” include the characters “睡” (sleep) and “税” (tax), whose pinyin are both “shui4”. The 4 most similar words to the pinyin “cai2” include the characters “才” (just) and “财” (wealth), whose pinyin are both “cai2”. This result demonstrates that the word embedding obtained by the PWE contains rich sound information.

Conclusions

According to linguistic principles, spoken language is a direct expression of semantics, and written language is a record of spoken language. This paper proposes the PWE, which integrates word pronunciation into the word embedding model. The PWE is highly extensible in two respects. First, the PWE can easily be combined with existing models such as word2vec and the CWE. Second, language is a storehouse of sound-images; therefore, the PWE can be applied to most languages. This paper introduces a variety of PWEs based on different existing models for different languages. Word similarity and text classification experiments demonstrate that the quality of word embedding improves after adding sound information, which benefits training. In addition, a qualitative analysis revealed that the word embeddings contain rich sound information and that the pronunciation embeddings also contain semantic information. However, this paper simply adds the pronunciation embedding to the word embedding; this use of sound information is relatively basic and could be exploited further.

References

  1. Harris Z S. Distributional structure. Word, 1954, 10(2–3): 146–162.
  2. Bengio Y, Ducharme R, Vincent P, Jauvin C. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003, 3(6): 1137–1155.
  3. Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. International Conference on Machine Learning, 2008: 160–167.
  4. Mnih A, Hinton G. A scalable hierarchical distributed language model. International Conference on Neural Information Processing Systems, 2008: 1081–1088.
  5. Mikolov T, Kombrink S, Burget L, Cernocky J H. Extensions of recurrent neural network language model. IEEE International Conference on Acoustics, Speech and Signal Processing, 2011: 5528–5531.
  6. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  7. Botha J, Blunsom P. Compositional morphology for word representations and language modelling. International Conference on Machine Learning, 2014: 1899–1907.
  8. Xu Y, Liu J. Implicitly Incorporating Morphological Information into Word Embedding. arXiv preprint arXiv:1701.02481, 2017.
  9. Chen X, Xu L, Liu Z, Sun M, Luan H B. Joint Learning of Character and Word Embeddings. IJCAI, 2015: 1236–1242.
  10. Ryu S, Kim S, Choi J, Yu H, Lee G G. Neural sentence embedding using only in-domain sentences for out-of-domain sentence detection in dialog systems. Pattern Recognition Letters, 2017, 88: 26–32.
  11. Dos Santos C N, Gatti M. Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. COLING, 2014: 69–78.
  12. Zhang J, Liu S, Li M, Zhou M, Zong C. Bilingually-constrained Phrase Embeddings for Machine Translation. ACL (1), 2014: 111–121.
  13. Chen J, Zhang C, Niu Z. Identifying Helpful Online Reviews with Word Embedding Features. International Conference on Knowledge Science, Engineering and Management, 2016: 123–133.
  14. De Saussure F. Course in General Linguistics. New York: Philosophical Library, 1959.
  15. Bloomfield L. Language. New York: Holt, 1933.
  16. Bengio S, Heigold G. Word embeddings for speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
  17. Kamper H, Wang W, Livescu K. Deep convolutional acoustic word embeddings using word-pair side information. IEEE International Conference on Acoustics, Speech and Signal Processing, 2016: 4950–4954.
  18. He W, Wang W, Livescu K. Multi-view recurrent neural acoustic word embeddings. arXiv preprint arXiv:1611.04496, 2016.
  19. Levin K, Jansen A, Van Durme B. Segmental acoustic indexing for zero resource keyword search. IEEE International Conference on Acoustics, Speech and Signal Processing, 2015: 5828–5832.
  20. Mikolov T, Sutskever I, Chen K, Corrado G S, Dean J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems, 2013, 26: 3111–3119.
  21. Le Q V, Mikolov T. Distributed Representations of Sentences and Documents. International Conference on Machine Learning, 2014: 1188–1196.
  22. Qiu L, Cao Y, Nie Z, Yu Y, Rui Y. Learning Word Representation Considering Proximity and Ambiguity. AAAI Conference on Artificial Intelligence, 2014: 1572–1578.
  23. Levy O, Goldberg Y. Dependency-Based Word Embeddings. Annual Meeting of the Association for Computational Linguistics, 2014: 302–308.
  24. Xu J, Liu J, Zhang L, Li Z, Chen H. Improve Chinese Word Embeddings by Exploiting Internal Structure. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 1041–1050.
  25. Sapir E. Language: An Introduction to the Study of Speech. New York: Harcourt, Brace, 1921.
  26. Jin P, Wu Y. SemEval-2012 task 4: evaluating Chinese word similarity. Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, 2013: 374–377.
  27. Halawi G, Dror G, Gabrilovich E, Koren Y. Large-scale learning of word relatedness with constraints. KDD, 2012.
  28. Bruni E, Tran N K, Baroni M. Multimodal distributional semantics. Journal of Artificial Intelligence Research (JAIR), 2014, 49: 1–47.
  29. Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A. A study on similarity and relatedness using distributional and WordNet-based approaches. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009: 19–27.
  30. Hassan S, Mihalcea R. Cross-lingual semantic relatedness using encyclopedic knowledge. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.
  31. Camacho-Collados J, Pilehvar M T, Navigli R. A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), Short Papers, Beijing, China, 2015.
  32. Martínez-Cámara E, Díaz-Galiano M C, Cumbreras M Á G, García-Vega M, Villena-Román J. Overview of TASS 2017. TASS 2017: Workshop on Sentiment Analysis at SEPLN, 2017.
  33. Fan R E, Chang K W, Hsieh C J, Wang X R, Lin C J. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 2008, 9: 1871–1874.