Computational Models of Consumer Confidence from Large-Scale Online Attention Data: Crowd-Sourcing Econometrics

Xianlei Dong; Johan Bollen

doi:10.1371/journal.pone.0120039

Abstract

Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting.

Citation: Dong X, Bollen J (2015) Computational Models of Consumer Confidence from Large-Scale Online Attention Data: Crowd-Sourcing Econometrics. PLoS ONE 10(3): e0120039. https://doi.org/10.1371/journal.pone.0120039

Academic Editor: Tobias Preis, University of Warwick, UNITED KINGDOM

Received: August 27, 2014; Accepted: January 19, 2015; Published: March 31, 2015

Copyright: © 2015 Dong, Bollen. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data required to replicate our findings are available at the following URL: https://github.com/jbollen/Dong_Bollen_2014_CCI.

Funding: Xianlei Dong was supported by the Chinese Scholarship Council from 2013–2014. Johan Bollen was supported by NSF Award 0914939. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The growth of most modern economies is driven by consumer spending [1]. Therefore, consumer confidence levels can have significant effects on economic growth. Consumer Confidence Indices (CCI) are designed to measure the degree of confidence that consumers have with respect to the state of the economic system. The basis for many CCIs lies in behavioral science where evidence has accumulated that individual consumer behavior is influenced by a number of emotional and social factors [2, 3] that interact with the consumer agents’ socio-economic context. In other words, the emotional state of consumers as well as their assessment of that of other consumers will shape their subsequent individual consumption patterns [4, 5]. In the aggregate, as consumers collectively lose or gain confidence in the state of the economy, this is assumed to affect their collective consumption patterns and thus economic growth yielding a complex interaction between consumer confidence and economic conditions. This interplay between the complex behavior of individual agents and the emergent properties of their collective behavior is analogous to those seen in many other large-scale socio-technical systems [6, 7].

Accurate, valid, and timely measures of consumer confidence are thus of pivotal importance to policy-makers and econometric forecasting. However, as a social and abstract construct “consumer confidence” is difficult to measure. Researchers have turned to social science methods such as surveys and questionnaires which are expensive and time-consuming to conduct, and are possibly subject to a number of personal, cultural, and social biases, e.g. social conformity bias [8] which will confound measures of consumer confidence with cultural and linguistic propensities to divulge or withhold accurate information concerning one’s level of confidence. The latter also makes it difficult to compare consumer confidence across different linguistic and cultural regions.

Here we investigate a computational approach that leverages large-scale search engine query volumes to gauge consumer confidence. We start from the assumption that search engine volumes reflect the issues that a population is contemporaneously pre-occupied with [9], congruent with recent work in the area of market modeling [10–14]. Hence, consumer confidence may be manifested in the volume of certain web searches such as “taxes”, “investment”, and “stocks”, but not others, e.g. “cloud” and “cat”. We focus on China since it provides an interesting case for Consumer Confidence studies given its unique linguistic and cultural background, and the important role that the consumption patterns of its burgeoning middle-class are now playing in the global economy [15].

We obtain Google query volume time series for a number of Chinese characters that are likely to express various facets of Chinese Consumer Confidence given their use in existing surveys of consumer confidence in China. Using a principal component analysis, we isolate the queries that are the main indicators of Chinese consumer confidence [16], and define a Chinese Consumer Confidence Index (C3I) from a linear combination of the respective search volume data. We cross-validate the C3I against existing gauges of consumer confidence, demonstrating its ability to offer an accurate, timely, and informative view on consumer confidence in a region that has been historically underserved with regard to econometric indices. Our results indicate that the C3I yields new information on the nature of Chinese Consumer Confidence. Our work may thus contribute to the science of modeling the social construct of consumer confidence and its socio-economic correlates that shape the emergent properties of economies as large-scale socio-technical systems [7].

Materials and Methods

In our investigation we rely on the following data sources:

Consumer Confidence data from the Chinese Consumer Confidence Index (CCI) and the Economist’s Confidence Questionnaire (ECQ) surveys for the period under consideration.
Google Trends data for a specific number of search queries corresponding to the same time period.

Given the different construction of the CCI and ECQ, we use the first as an official indicator of Chinese Consumer Confidence and the latter as a source from which to extract consumer confidence topics that are subsequently translated into Google queries.

Chinese Consumer Confidence Index (CCI) survey

The Chinese Consumer Confidence Index (CCI) is reported by the National Bureau of Statistics of China (NBSC) on a monthly basis. Its methodology consists of asking 3,500 individuals (after November 2009) about their confidence levels regarding the present and the future. It consists of a questionnaire of about 5 simple questions, each pertaining to what is assumed to be a specific component of consumer confidence, e.g. “How do you see your current employment conditions?”. Subjects’ responses are recorded on a 5-point scale. We obtained historical monthly data of the Chinese CCI from the National Bureau of Statistics of China for the period of January 2006 to June 2013, i.e. 90 months, as shown in Fig 1. It must be noted that the CCI numbers reported by the NBSC may be affected by changing data normalization practices and other adjustments over time [17].

Fig 1. Monthly time series of Chinese CCI provided by the National Bureau of Statistics of China for the period of January 2006 to June 2013.

https://doi.org/10.1371/journal.pone.0120039.g001

Economist’s Confidence Questionnaire (ECQ) topic extraction

The CCI is designed to be succinct and fast to administer. Hence it consists of short questions designed to be answered in terms that are directly evaluative of the question, e.g. “How do you see your current employment conditions?” answered by either “positive” or “negative”. However, we are looking to model the notion of Chinese Consumer Confidence as exhaustively as possible so we can determine its correlates in online indicators.

The Economist’s Confidence Questionnaire (ECQ) contains 31 open questions such as “What do you presently consider the greatest threat to the Chinese economy?”, with a number of possible responses provided that can range from a few items to more than 15. Given the more open and exhaustive nature of the ECQ, we manually extract the core topics of the ECQ’s questions and answers, and corresponding Chinese characters, to define an initial set of terms that can be reliably transformed to specific Google search queries. The volume of the latter is then taken to indicate the level of online attention with respect to that particular topic. For example, ECQ Question 13 is “How do you think the dollar value may change in the next 6 months?”. From this question we manually extract the Chinese character for “dollar trend”, and add it to the set of topics that we deem to be indicative of consumer confidence. We then retrieve Google Trends data for each such individual topic.

As shown in S1 Table (Supplementary materials) and Fig 2, we extract a total of 44 topics from the ECQ’s questions ranging from large-scale macro-economic concepts such as “inflation” to more personal notions such as “food price”. Out of those 44 topics, only 34 have sufficient Google query volumes and are thus included as variables in our later analysis.

Fig 2. Topics and Variable Names extracted from CCI survey.

Note: only the first 34 topics (x₁-x₃₄) are used as variables in our model since other topics did not have sufficient query volumes in Google Trends.

https://doi.org/10.1371/journal.pone.0120039.g002

Google Trends data

Google Trends (www.google.com/trends/) is an online service offered by the Google search engine; it allows researchers to retrieve weekly/monthly normalized search volume data for any user-provided search query, provided the query has non-zero search volume. For example, a user can enter the query “good” and Google Trends will return a weekly time series whose values represent the volume of searches for that query recorded by Google in that period of time on a weekly basis. An example of the Google Trends data for the Chinese character “Hao” (en: “good”) is shown in Fig 3.

Fig 3. Google Trends graph showing weekly fluctuations of search volume for “Hao” (en: “good”).

https://doi.org/10.1371/journal.pone.0120039.g003

As such, we obtain Google Trends data for the 34 above-mentioned topics that produce non-zero search volumes from January 2006 to June 2013, thereby matching the date range of our CCI data. Since Google Trends data can be weekly and CCI data is released monthly, we convert all weekly Google Trends time series to monthly time series by means of a 4-week moving average. Since some months are longer than 4 weeks, where necessary, we move data points at the end of the month’s last week to the next month.
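The weekly-to-monthly conversion can be illustrated with a minimal sketch. It is one possible reading of the procedure, not the authors' code: it computes a trailing 4-week moving average and keeps, for each calendar month, the value at the last week ending in that month. The function name `weekly_to_monthly`, the week-ending dates, and the sample values are all hypothetical, and the handling of weeks that straddle two months is an assumption.

```python
from datetime import date, timedelta

def weekly_to_monthly(week_ends, values, window=4):
    """Trailing `window`-week moving average, sampled at the last week ending in each month."""
    monthly = {}
    for i in range(window - 1, len(values)):
        avg = sum(values[i - window + 1 : i + 1]) / window
        key = (week_ends[i].year, week_ends[i].month)
        monthly[key] = avg  # later weeks in the same month overwrite earlier ones
    return monthly

# Hypothetical weekly series: 12 weeks ending on Saturdays, starting 7 January 2006
week_ends = [date(2006, 1, 7) + timedelta(weeks=k) for k in range(12)]
values = [float(v) for v in range(10, 22)]  # 10, 11, ..., 21
monthly = weekly_to_monthly(week_ends, values)
# monthly maps (year, month) to one averaged value per month
```

Sampling the moving average at month end keeps exactly one observation per month, which is what alignment with a monthly survey series requires.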

Methodological overview

Our research objectives are four-fold:

1. We model Chinese Consumer Confidence from the covariances between 34 ECQ topic time series.
2. We define a new Chinese Consumer Confidence Index (C3I) based on the principal components of (1).
3. We compare our C3I to the CCI using a stepwise regression model that fits the C3I components to the CCI, including a determination of whether or not one indicator leads the other.
4. We conduct a preliminary test of our model against new Google Trends data that was not included in the original data used for model construction (July 2013 to May 2014).

In Fig 4 we show an overview of our multi-phased methodology which is further explained in subsequent sections.

Fig 4. Methodological overview.

(a) We study the relationship between China’s official CCI data (Y) and Google Trends Data (X). (b) We use a PCA to determine the principal components of X, followed by a Granger test and VAR to determine the lead or lag relations between X and Y. (c) PCA⁻¹ denotes the inverse operation of PCA to obtain the fitted values of our original model.

https://doi.org/10.1371/journal.pone.0120039.g004

Results and Discussion

Principal Component Analysis of ECQ topic covariances

Each of the 34 Google trends time series (corresponding to the ECQ questionnaire topics) can be taken as independent variables, representing a certain facet of consumer confidence. However, we need to determine the degree of multicollinearity to investigate whether each variable independently represents consumer confidence, and to ensure the validity of later regression models used to fit a potential C3I based on these 34 independent variables to the CCI.

Therefore we perform a principal components analysis (PCA) [18] to ensure the orthogonality of our components and to avoid the issue of multicollinearity in future regression models. Furthermore, this procedure reduces dimensionality and may provide information on the underlying components of the covariances of our 34 Google trends time series. We list the 10 highest ranked components with their loadings in Table 1 and provide a scree plot in Fig 5. A Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) test [19] and Squared Multiple Correlation (SMC) test [20] indicate that the PCA was indeed a suitable procedure with the large majority of values well above 0.8 (1.0 is optimal).
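To make the role of the PCA concrete, the following sketch runs a principal component analysis on synthetic, strongly correlated series using only NumPy. It assumes the series are standardized first (the text does not say whether the covariance or the correlation matrix was used), and the data, dimensions, and function name `pca` are illustrative stand-ins rather than the Google Trends series.

```python
import numpy as np

def pca(data):
    """PCA of a (T x p) matrix via eigendecomposition of the correlation matrix."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize columns
    corr = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                         # component time series
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)
T, p = 90, 6
common = rng.normal(size=(T, 1))
data = common + 0.5 * rng.normal(size=(T, p))   # synthetic collinear series

eigvals, eigvecs, scores = pca(data)
explained = eigvals / eigvals.sum()             # proportion of variance per component
score_cov = np.cov(scores, rowvar=False)        # diagonal: components are uncorrelated
```

The covariance matrix of the component scores is diagonal, which is the property that removes multicollinearity from a regression on the components.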

Table 1. Principal Components of our 34 Google trends time series with proportion of variance covered.

https://doi.org/10.1371/journal.pone.0120039.t001

Fig 5. Scree plot showing the eigenvalue distribution of our Principal Component Analysis of 34 Google trends time series.

https://doi.org/10.1371/journal.pone.0120039.g005

Judging from the scree plot, we arbitrarily retain the first 9 PCA components since they represent the majority of information on the original topic covariances (about 85%), thus ensuring we retain all relevant information for accurate modeling. Not all 9 components need to be included in our transitional model since each carries increasingly less information. In fact, whether we choose 8, 9, or 10 components should be of little significance to our transitional model.
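The retention rule can be sketched as choosing the smallest number of components whose cumulative share of variance reaches a target, here 85% as quoted above. The eigenvalues below are hypothetical and are not those of Table 1; the helper name `n_components_for` is likewise an assumption.

```python
import numpy as np

def n_components_for(eigvals, threshold=0.85):
    """Smallest k whose leading k eigenvalues cover at least `threshold` of total variance."""
    ordered = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(ordered) / ordered.sum()
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return k, cumulative

# Hypothetical eigenvalues for illustration only
eigvals = [4.0, 2.0, 1.5, 1.2, 0.6, 0.4, 0.2, 0.1]
k, cumulative = n_components_for(eigvals)
```

Because later eigenvalues are small, moving the cut-off by one component changes the covered variance only slightly, which is consistent with the remark that retaining 8, 9, or 10 components should matter little.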

We project our topic variables on the selected 9 components, (C₁, C₂, …, C₉), and define C_i = c_i^T X (i = 1, 2, …, 9), where X = (x₁, x₂, …, x₃₄)^T refers to our 34 topic time series and c_i refers to the entries of the 9 component vectors as listed in Table 2.

Table 2. Entries of the first 9 PCA components.

https://doi.org/10.1371/journal.pone.0120039.t002

To avoid spurious regression results [21, 22], we must determine whether our time series are stationary (I(0)) or have co-integrated relationships. After we extract the first 9 components, we conduct a KPSS test [23] and an ADF test [24, 25] to check the variables’ stationarity. As shown in Table 3, only C₁ is not stationary. We therefore define DC_{1,t} = C_{1,t} − C_{1,t−1} (first difference), and find that it is stationary. Consequently, all 10 variables (CCI, DC₁, C_{i ∈ {2, 3, ⋯, 9}}) are stationary.
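The effect of first differencing can be illustrated with a simplified Dickey-Fuller regression. This is a sketch under stated assumptions: it omits the augmentation lags of the full ADF test and does not implement KPSS, the random walk is synthetic and merely stands in for C₁, and the function name `dickey_fuller_stat` is hypothetical.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on rho in dy_t = a + rho * y_{t-1} + e_t (no augmentation lags)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
c1 = np.cumsum(rng.normal(size=500))   # synthetic random walk standing in for C1
dc1 = np.diff(c1)                      # DC_{1,t} = C_{1,t} - C_{1,t-1}

stat_level = dickey_fuller_stat(c1)
stat_diff = dickey_fuller_stat(dc1)
# A statistic below about -2.86 (asymptotic 5% critical value with a constant)
# rejects the unit-root null hypothesis.
```

The differenced series yields a strongly negative statistic, so the unit-root null is rejected for it, mirroring the finding that DC₁ is stationary while C₁ is not.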

Table 3. Stationary test results.

https://doi.org/10.1371/journal.pone.0120039.t003

Model definition

After determining the principal components of our Google trend time series data, i.e. the components that best describe consumer confidence as indicated from Google query volume with respect to our 34 survey topics, we perform a Vector Auto-regression (VAR) [26] to determine the degree of auto-correlation in our CCI data. As shown in Table 4, we find a considerable degree of auto-correlation, indicating the necessity to include CCI at lag 1 and C₃ at lag 2 as independent variables in future analysis. This finding is intuitive, since consumers may factor previous confidence into their assessment of future conditions along with other present information.

Table 4. Vector Auto-regression Results.

https://doi.org/10.1371/journal.pone.0120039.t004

We conduct a Granger Causality test [27] between our independent variables, C_{i ∈ {2, 3, ⋯, 9}} and DC₁, and one dependent variable, namely CCI_t, to look for Granger-causative relationships between the CCI and the independent variables. The results in Table 5 indicate that independent variable C₂ is Granger causative of the CCI. Results in behavioral science [28] indicate that people tend to discount older information in favor of newer information. We therefore choose variables that were lagged one and two units for testing. Referring to the results, we add C₂ at lag 1 and lag 2 to our independent variables.
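A Granger test with two lags can be sketched as an F-test comparing an autoregression of the target with and without lags of the candidate predictor. The series below are synthetic, with `y` built to depend on the previous value of `x`; the helper names are hypothetical, and turning the statistic into a p-value would additionally need an F distribution, which is omitted here.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags], aligned with targets t = lags..T-1."""
    T = len(series)
    return np.column_stack([series[lags - k : T - k] for k in range(1, lags + 1)])

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(y, x, lags=2):
    """F-statistic for H0: lags of x add nothing to an autoregression of y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, lags)])
    unrestricted = np.hstack([restricted, lagged(x, lags)])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()  # y follows x with a one-step delay

f_xy = granger_f(y, x)   # does x help predict y?
f_yx = granger_f(x, y)   # does y help predict x?
```

A large statistic in one direction and a small one in the other is the pattern that would be read as one series leading the other.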

Table 5. Results of Granger Test of CCI vs. DC₁ and C_{{1, 2, ⋯, 9}}.

https://doi.org/10.1371/journal.pone.0120039.t005

A normalization of CCI data in reference to 1996 data [17] ended in November 2009, leading to an apparent discontinuity in the CCI data in 2009–2010 as shown in Fig 1. To determine whether our CCI data is biased by structural changes or not, we conduct a Structural Change test [29]. The results are summarized in Table 6; the null hypothesis that no structural change occurred must be rejected. In other words, the results indicate a structural change is likely to have occurred in November 2009, possibly because of the use of new survey or normalization methods [17].
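One common way to test for a break at a known date is a Chow test, sketched below. The text does not name the specific tests reported in Table 6, so the choice of a Chow test is an assumption, as are the synthetic data and the function name `chow_f`; only the sample split (47 observations before the break, 89 in total) is taken from the text.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X, y, t0):
    """Chow F-statistic for a break after the first t0 observations."""
    n, k = X.shape
    rss_pooled = rss(X, y)
    rss_split = rss(X[:t0], y[:t0]) + rss(X[t0:], y[t0:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

rng = np.random.default_rng(3)
n, t0 = 89, 47
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
level = np.where(np.arange(n) < t0, 0.0, 5.0)      # synthetic level shift after t0
y_break = level + 1.0 * x + 0.5 * rng.normal(size=n)
y_stable = 1.0 * x + 0.5 * rng.normal(size=n)

f_break = chow_f(X, y_break, t0)     # large: one regression cannot fit both periods
f_stable = chow_f(X, y_stable, t0)   # small: no evidence of a break
```

When the statistic is large, fitting separate coefficients per period (for example through a period dummy) is the usual remedy.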

Table 6. Results of Structural Change Tests.

https://doi.org/10.1371/journal.pone.0120039.t006

As indicated in Table 6, all three tests imply there is a structural change in the time series, which may have resulted from the NBSC standardization in November 2009. We therefore add dummy variable D to all the independent variables of our model, where the first time period comprises 47 months and the second time period comprises 42 months. (1)

Then, our transitional model (i.e. Model 2 in Fig 4) can be written as follows: (2) where t₀ = 47.

Results

We then proceed with a Stepwise Regression [30] as follows:

1. Set an appropriate significance level of 0.05.
2. Fit Eq 2 by Ordinary Least Squares (OLS).
3. If all the parameters pass the significance test, stop; otherwise, proceed to step 4.
4. Select the least significant variable (the one with the highest p-value) and drop it. Fit the new equation, without that variable, by OLS.
5. Repeat steps 3 and 4 until all variables pass the test.
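The elimination loop above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: p-values use a normal approximation to the t distribution, the intercept is always retained, and the regressors are synthetic. The names `ols_pvalues` and `backward_stepwise` are hypothetical.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS coefficients with two-sided p-values from a normal approximation."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta, p

def backward_stepwise(X, y, alpha=0.05):
    """Drop the least significant regressor until all remaining p-values are <= alpha."""
    kept = list(range(X.shape[1]))
    while kept:
        A = np.column_stack([np.ones(len(y)), X[:, kept]])
        beta, p = ols_pvalues(A, y)
        p_vars = p[1:]                      # skip the intercept, which is always kept
        worst = int(np.argmax(p_vars))
        if p_vars[worst] <= alpha:
            return kept, beta
        kept.pop(worst)
    return kept, np.array([y.mean()])

rng = np.random.default_rng(4)
n = 89
X = rng.normal(size=(n, 5))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=n)

kept, beta = backward_stepwise(X, y)   # indices of retained columns, fitted coefficients
```

Only one variable is removed per iteration because dropping a regressor changes the p-values of those that remain.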

As shown in Table 7, the resulting model exhibits a good fit as indicated by a high adjusted R² (0.923). We conduct a White Test [31] to determine whether the regression exhibits heteroscedasticity. As indicated by the results shown in Table 8, this is the case. Therefore, we re-run our stepwise model with robust standard errors [32]. The results are shown in Table 9. The results improve considerably: all parameters pass the test and we observe an improved adjusted R². However, we must point out that although the use of robust standard errors improves the estimate, we cannot guarantee that the regression is free of heteroscedasticity.
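Heteroscedasticity-consistent standard errors in the sense of White [31] can be sketched with the basic HC0 sandwich estimator. The specific variant used for Table 9 is not stated, so HC0 is an assumption, and the data below are synthetic with an error variance that grows with the regressor. The function name `ols_with_robust_se` is hypothetical.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS estimates with classical and White (HC0) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    classical = np.sqrt(np.diag(xtx_inv * (resid @ resid) / (n - k)))
    meat = X.T @ (X * resid[:, None] ** 2)       # sum of e_i^2 * x_i x_i'
    robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, classical, robust

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.normal(size=n)       # error variance grows with x

beta, classical, robust = ols_with_robust_se(X, y)
```

Robust standard errors leave the OLS point estimates unchanged; they alter only the significance tests, so in a stepwise procedure they can change which variables are retained.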

Table 7. Regression results for model represented by Eq (2).

https://doi.org/10.1371/journal.pone.0120039.t007

Table 8. White test results.

https://doi.org/10.1371/journal.pone.0120039.t008

Table 9. Regression results for model represented by Eq (2).

https://doi.org/10.1371/journal.pone.0120039.t009

Using the regression results we can model C3I as shown in Eq (3). (3)

This fitted equation preserves the major components of the PCA (C₂−C₆) to avoid significant information loss. We can formulate our final fitted model using the original indices as shown in Eq (4). (4) where X^T = (x₁, x₂, …, x₃₄); and the entries of A, B, and C are provided in Table 10.

Table 10. Topic Effect on C3I according to matrix A, B and C.

https://doi.org/10.1371/journal.pone.0120039.t010

This result indicates that the C3I is partially shaped by its own previous values. We speculate that people may extrapolate their present confidence to an assessment of future economic confidence, in addition to other relevant information.

The first part of Eq (4), i.e. t ≤ 47 corresponds to the period before December 2009. Matrix A, shown in Table 11, can be split into 2 categories of topics, namely those that contribute positively to C3I and those that contribute negatively according to their coefficients. Note that the topics themselves do not contribute to C3I. The attention they receive in the population, measured by Google trends volume, is used as an indicator of the population’s pre-occupation with the topic in relation to the C3I. The topics in Table 11 thus reveal the internal topical structure of this particular measurement of consumer confidence through a behavioral measure and which topics contribute negatively or positively to our estimation of C3I. As shown in Tables 11, 12, and 13 we see that a number of topics that contribute positively to our estimation of C3I change polarity after November 2009. This change may indicate that the population changed its assessment of these topics, leading to a different contribution to their consumer confidence, or potentially a change in how the CCI is measured. For example, when a large number of individuals search for “over capacity” this might occur because of the perception of over capacity as a negative issue, while some years later, people might search for the same topic from the position that over capacity is improving, hence making a positive contribution to their consumer confidence.

Table 11. Matrix A: Parameters of topics’ current effect on C3I before December 2009.

https://doi.org/10.1371/journal.pone.0120039.t011

Table 12. Matrix B: Parameters of topics’ current effect on C3I after November 2009.

https://doi.org/10.1371/journal.pone.0120039.t012

Table 13. Matrix C: Parameters of topics’ future effect on C3I.

https://doi.org/10.1371/journal.pone.0120039.t013

Matrices A, B, and C indeed reveal significant changes in the structure of the C3I over time. In Tables 11, 12 and 13, we show how certain topics contribute positively or negatively to C3I values. In particular we see that before December 2009 (Table 11) positive topics include “stocks”, “CPI”, and “trade balance”. Negative topics notably include “prices”, e.g. “housing”, “fuel”, “food”, “over capacity”, and concerns about “economic transition”. Examining Table 12 we find that these are not influencing C3I as strongly after November 2009. Rather, the top ranked positively contributing topics are now “over capacity”, “real estate”, and “housing prices”. We do note that the negatively contributing topics continue to include “exchange rates” and “foreign exchange”.

Comparing Tables 11 and 12 with 13 reveals that the “future” influence of topics in our C3I model might overall be less than their current influence. Positive topics such as “real estate sales” and “population aging”, and negative topics such as “crude or food price” and “exchange rate”, have much lower parameter values in Table 13. In addition, these results show that social media influence on C3I in our model increases after November 2009, possibly indicating that the public is increasingly expressing its outlook through online activity.

As shown in Fig 6, our Google Trends data indicates a consistent downward trend in consumer confidence from 2007 to the present which is not mirrored by official CCI data. However, Google Trends data presumably provides only a partial indicator of the factors that shape consumer confidence. We can therefore not conclude that our Google Trends model indicates an actual downtrend in consumer confidence. This result does point to an interesting divergence between two different, but related measures of consumer confidence. We also note that after the observed discontinuity, CCI does exhibit a slight downward trend.

Fig 6. The contribution of Google Trends Data to our C3I model plotted over time reveals a downward trend possibly indicating that the public are losing economic confidence as judged from search engine queries.

https://doi.org/10.1371/journal.pone.0120039.g006

Finally, we compare C3I values generated by our model to the actual CCI values in Fig 7, which highlights the strong degree of correspondence between our model and actual CCI values as reported by the Chinese National Bureau of Statistics. In fact, after conducting our original analysis, we obtained new Google Trends data for the period July 2013 to May 2014, nearly a year, and re-applied the model developed from the original data to this new Google Trends data. As shown in Fig 7, our model outcomes match the new C3I values quite well, in spite of the renormalization that Google applies to each new data request, indicating that the C3I model is robust to minor changes in the underlying Google Trends data.
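The logic of this hold-out check can be sketched as fitting on the original 90 months only and then scoring the 11 later months. Everything below is synthetic and illustrative: the three features stand in for model regressors, and the coefficients, noise level, and helper names `fit_ols` and `rmse` are assumptions rather than values from the study.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rmse(actual, predicted):
    """Root mean squared error between two series."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(6)
n_train, n_test = 90, 11          # Jan 2006 to Jun 2013, then Jul 2013 to May 2014
n = n_train + n_test
features = rng.normal(size=(n, 3))                 # synthetic stand-ins for regressors
X = np.column_stack([np.ones(n), features])
y = 100.0 + features @ np.array([2.0, -1.0, 0.5]) + 0.3 * rng.normal(size=n)

beta = fit_ols(X[:n_train], y[:n_train])           # fit on the original window only
pred_test = X[n_train:] @ beta                     # estimate the held-out months
test_rmse = rmse(y[n_train:], pred_test)
```

Keeping the later months out of the fit is what makes the comparison a test of robustness rather than a restatement of in-sample fit.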

Fig 7. Graph overlay of CCI values estimated by our C3I model vs. official CCI values reported by the National Bureau of Statistics of China.

Note that the period from July 2013 to May 2014 represents an estimation of CCI values on the basis of new Google Trends data not included in the data used to generate our model, to determine model robustness.

https://doi.org/10.1371/journal.pone.0120039.g007

Conclusions

We model Chinese Consumer Confidence by analyzing the relationship between Chinese CCI data and Google Trends time series for query topics derived from official CCI questionnaires. Our model manages to approximate historical CCI values, as well as new C3I values obtained after analysis of the original data, by relying merely on Google Trends data and lagged CCI data. This finding suggests that the results of expensive and time-consuming Consumer Confidence surveys might be complemented by more economical and time-efficient methods that leverage online behavioral indicators and may additionally reveal the deeper structure of the abstract notion of consumer confidence as measured by the CCI. Rather than merely approximating official CCI data, the use of Google Trends data might complement the assessment of consumer confidence by incorporating additional relevant dimensions of consumer confidence and avoiding structural measurement changes.

We do caution that our use of the CCI as ground truth implies that any biases or deficiencies of the CCI will impact the validity of our own model as well. By extracting separate terms and topics from the CCI’s survey questions we may not fully capture the essence of how it semantically expresses Chinese consumer confidence. Furthermore, our selection is restricted to the CCI and may thus not comprehensively capture the full extent of actual Chinese consumer confidence. Future work will be directed at a more exhaustive and principled translation of the notion of consumer confidence to a set of search engine query terms, possibly from a variety of other sources. In fact, our reliance on Google Trends data may introduce a number of other issues. If a particular aspect of consumer confidence cannot be gauged from search engine volume, our method will not capture it. As suggested by [34], the validity and accuracy of our model could thus be improved by the inclusion of other related indicators of consumer behavior such as social media feeds, blog volume, newspaper data, etc.

In spite of the deficiencies of our present approach, we have demonstrated the feasibility of modeling large-scale socio-economic phenomena such as consumer confidence from behavioral online data such as Google search queries. This opens new possibilities for more exhaustive, accurate, and finer-grained models of complex dynamic socio-technical systems such as a nation’s economy, which is shaped by the interactions of large numbers of autonomous agents that respond to individual and collective conditions, as well as global systemic information such as financial news, economic growth forecasts, GDP numbers, and inflation numbers.

Supporting Information

S1 Table. Translation of ECQ Questionnaire with model topics marked in bold.

https://doi.org/10.1371/journal.pone.0120039.s001

(PDF)

Acknowledgments

We are grateful for the support of the Chinese Scholarship Council. We thank Jingwen Li and Jun Guan for their help and early comments on our work. We thank Giovanni Luca Ciampaglia for the insightful discussion and comments on the manuscript.

Author Contributions

Conceived and designed the experiments: XD JB. Performed the experiments: XD JB. Analyzed the data: XD. Wrote the paper: XD JB.

References

1. Deaton A, Muellbauer J. Economics and consumer behavior. New York: Cambridge University Press; 1980.
2. Kietzmann JH, Hermkens K, McCarthy IP, Silvestre BS. Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons. 2011 May;54:241–251.
- View Article
- Google Scholar
3. Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. Journal of Computational Science. 2011 Mar;2:1–8.
- View Article
- Google Scholar
4. Frijda NH. Varieties of affect: Emotions and episodes, moods, and sentiments. The nature of emotions: Fundamental questions. 1994;p. 197–202.
5. Shi Z, Rui H, Whinston AB. Content sharing in a social broadcasting environment: evidence from twitter. MIS Quarterly. 2014;38:123–142.
- View Article
- Google Scholar
6. Garcia D, Tessone C, Mavrodiev P, Perony N. The digital traces of bubbles: feedback cycles between socio-economic signals in the Bitcoin economy. Journal of the Royal Society Interface. 2014 2014;11(99).
- View Article
- Google Scholar
7. Vespignani A. Modelling dynamical processes in complex socio-technical systems. Nat Phys. 2012;8:32–39.
- View Article
- Google Scholar
8. Nickerson RS. Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Review of General Psychology. 1998;22:175–220.
- View Article
- Google Scholar
9. Scott S, Varian H. Predicting the present with bayesian structural time series. International Journal of Mathematical Modelling. 2014;5:1–21.
- View Article
- Google Scholar
10. preis T, Reith D, Stanley HE. Complex dynamics of our economic life on different scales: insights from search engine query data. Philosophical Transactions of the Royal Society A. 2010;368:5707–5719. Available from:
- View Article
- Google Scholar
11. Bordino I, Battiston S, Caldarelli G, Cristelli M, Ukkonen A, Weber I. Web Search Queries Can Predict Stock Market Volumes. PLoS ONE. 2012 07;7(7):e40014. Available from: pmid:22829871
- View Article
- PubMed/NCBI
- Google Scholar
12. Kristoufek L. BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era. Scientific Reports. 2013;3(3415). Available from: pmid:24301322
- View Article
- PubMed/NCBI
- Google Scholar
13. Curme C, Preis T, Stanley HE, Moat HS. Quantifying the semantics of search behavior before stock market moves. Proceedings of the National Academy of Sciences. 2014;111:11600–11605.
- View Article
- Google Scholar
14. Preis T, Moat HS, Stanley EH. Quantifying Trading Behavior in Financial Markets Using Google Trends. Sci Rep. 2013;3.
- View Article
- Google Scholar
15. Webman J. 2014 Market Outlook. OPPENHEIMERFUNDS; 2014.
16. Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6. 1901;2:559–572.
- View Article
- Google Scholar
17. Center CEMA. Adjustment on the Historical Data of China’s Consumer Confidence Index. China Monthly Economic Indecators. 2010 Feb;p. 210.
18. Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24:417–441.
- View Article
- Google Scholar
19. Cureton EE, D’Agostino RB. Factor Analysis: an applied approach. Hillside: NJ: Lawrence Erlbaum Associates; 1983.
20. Abdi H. Multiple Correlation Coefficient. In: Salkind NJ, editor. Encyclopedia of Measurement and Statistics. SAGE Publications, Inc.; 2007. p. 649–652.
21. Yule GU. Why do we Sometimes get Nonsense-Correlations between Time-Series?–A Study in Sampling and the Nature of Time-Series. Journal of the Royal Statistical Society. 1926 Jan;89:1–63.
22. Granger CWJ, Newbold P. Spurious regressions in econometrics. Journal of Econometrics. 1974 Jul;2:111–120.
23. Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y. Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics. 1992;54:159–178.
24. Dickey DA, Fuller WA. Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association. 1979;74:427–431.
25. Fuller WA. Introduction to Statistical Time Series. New York: John Wiley & Sons; 1976.
26. Sims C. Macroeconomics and reality. Econometrica: Journal of the Econometric Society. 1980;48:1–48.
27. Granger CWJ. Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica. 1969;37:424–438.
28. Barberis N, Shleifer A, Vishny RW. A Model of Investor Sentiment. Journal of Financial Economics. 1998;49:307–343.
29. Pasinetti LL. Structural Change and Economic Growth. Cambridge: Cambridge University Press; 1981.
30. Draper NR, Smith H. Applied Regression Analysis. 3rd ed. New York: John Wiley & Sons; 1998.
31. White H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica. 1980;48:817–838.
32. King G, Roberts ME. How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis. 2012;p. 1–21.
33. Gujarati DN. Basic Econometrics. 4th ed. Erdenekhuu; 2008.
34. Lazer D, Kennedy R, King G, Vespignani A. The Parable of Google Flu: Traps in Big Data Analysis. Science. 2014;343:1203–1205. pmid:24626916


