We get news from many media sources, and also through our friends, online and offline. By the time the news reaches us, it may have been retold in interesting ways, which so far have typically not been quantified. Normally it would be difficult to tell how the information that reaches us differs from its original source, because the sharing of the information is dispersed, or the situation itself is evolving. However, in a few cases the source is better defined, for example when a public entity issues a press release.
In a recent study, we collected a sample of press releases by the U.S. Federal Open Market Committee, published speeches by President Barack Obama, and press releases from several tech companies and universities. We then gathered de-identified Facebook data, analyzed in aggregate, on shares of the news articles covering each source and on the comments on those shares, as shown in the diagram above.
Once the source is known, one can make several observations about how information from the source makes its way into news media and social media, and how it is discussed there.
The analysis included 85 sources, covered by an average of 184 news articles, which were in turn shared 22K times on average, and garnered an average of 20K comments. We discuss these findings in greater detail below, and in the forthcoming paper to be presented at the International Conference on Weblogs and Social Media (ICWSM’16).
By comparing the words in the original press release against the words used in the news articles covering it, we can estimate how much of the source each article covers. While no individual article covers a majority of the words in the source (the average is a bit above 20%), several articles combined do.
Caption: News article coverage of words contained in the source. Max denotes the single article out of the randomly chosen set with the most words from the original source. The cumulative curve shows the coverage obtained by combining words in all the articles in the sample.
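To make the coverage measure concrete, here is a minimal sketch. It assumes that coverage is the fraction of distinct source words that also appear in an article, using plain lowercase tokens with no stemming or stopword removal. The paper may tokenize or define coverage differently. The function names and example sentences are hypothetical.

```python
import re

def tokenize(text):
    # Lowercased word tokens; a simplifying assumption (no stemming, no stopword removal).
    return re.findall(r"[a-z']+", text.lower())

def coverage(source, article):
    """Fraction of distinct source words that also appear in the article."""
    src = set(tokenize(source))
    if not src:
        return 0.0
    return len(src & set(tokenize(article))) / len(src)

def cumulative_curve(source, articles):
    """Coverage after pooling the words of the first k articles, for k = 1..n."""
    src = set(tokenize(source))
    seen = set()
    curve = []
    for article in articles:
        seen |= set(tokenize(article))
        curve.append(len(src & seen) / len(src) if src else 0.0)
    return curve

# Made-up example: each article covers part of the source, together they cover all of it.
source = "Rates remain unchanged; inflation is expected to rise."
articles = ["Rates remain unchanged.", "Inflation is expected to rise."]
per_article = [coverage(source, a) for a in articles]
print(max(per_article))                    # 0.625 (the "Max" article)
print(cumulative_curve(source, articles))  # [0.375, 1.0]
```

Pooling words across articles before intersecting with the source is what lets the cumulative curve reach full coverage even though no single article does.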
Since coverage from a news article is typically only partial, one can ask whether the source is sometimes shared directly, e.g., sharing a transcript of the President’s speech directly on Facebook, as opposed to sharing a news article about the speech. In the vast majority of cases, what is shared is a news article, especially for presidential speeches and university press releases:
Caption: Percentage of Facebook shares that link directly to the source (“politics”: U.S. presidential speeches, “science”: university press releases, “tech”: press releases from tech companies, “finance”: statements from the U.S. Federal Open Market Committee).
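The per-category percentage can be computed by checking whether each shared link points at the source itself. The sketch below assumes a share counts as "direct" when its URL matches the source URL after light normalization: the host is lowercased, a leading "www." is removed, and a trailing slash is dropped. That matching rule, the tuple layout, and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def normalize(url):
    # Host (lowercased, without "www.") plus path (without trailing slash); query strings are ignored.
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def direct_share_rate(shares):
    """shares: iterable of (category, shared_url, source_url) tuples."""
    totals = defaultdict(int)
    direct = defaultdict(int)
    for category, shared_url, source_url in shares:
        totals[category] += 1
        if normalize(shared_url) == normalize(source_url):
            direct[category] += 1
    return {c: direct[c] / totals[c] for c in totals}

shares = [
    ("politics", "https://www.example.org/speech/", "https://example.org/speech"),
    ("politics", "https://news.example.com/story", "https://example.org/speech"),
    ("tech", "https://news.example.net/launch", "https://blog.example.org/launch"),
]
print(direct_share_rate(shares))  # {'politics': 0.5, 'tech': 0.0}
```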
A further question concerns the timeliness of news coverage and discussion. While some news articles appear at the same time as the press release, potentially because of interviews given in advance of the announcement, a second wave of articles, along with the majority of shares and comments, arrives about half a day later.
Caption: Fraction of articles, shares, and comments occurring in each hour after the first post.
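A curve like the one in the figure can be built by bucketing timestamps into whole hours after the first post and dividing by the total count. In this sketch, events outside the chosen window still count toward the total, so the fractions can sum to less than 1. The window length, function name, and timestamps are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_fractions(timestamps, first_post, hours=48):
    """Fraction of all events falling in each whole hour after first_post.

    Events before first_post or after the window are excluded from the bins
    but still counted in the denominator.
    """
    counts = Counter()
    for t in timestamps:
        h = int((t - first_post).total_seconds() // 3600)
        if 0 <= h < hours:
            counts[h] += 1
    total = len(timestamps)
    return [counts[h] / total if total else 0.0 for h in range(hours)]

# Made-up shares: one right away, then a second wave about half a day later.
first = datetime(2016, 1, 1, 9, 0)
shares = [first + timedelta(minutes=10),
          first + timedelta(hours=12, minutes=5),
          first + timedelta(hours=12, minutes=40),
          first + timedelta(hours=13)]
fractions = hourly_fractions(shares, first, hours=24)
print(fractions[0], fractions[12], fractions[13])  # 0.25 0.5 0.25
```

Running the same function separately over article, share, and comment timestamps gives one curve per layer.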
Because information propagates through several layers, some facts and ideas from the source can be amplified while others fade. For example, when speaking about a drone strike that killed two hostages, American Warren Weinstein and Italian Giovanni Lo Porto, President Obama emphasized the families. The news articles and subsequent coverage, however, emphasized that people had been killed.
Caption: An example of word clouds generated from the source, news articles, shares, and comments on President Obama’s speech about the deaths of Warren Weinstein and Giovanni Lo Porto. Green words are positive and red words are negative according to the LIWC dictionary. Word size represents word frequency.
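One simple way to quantify which words are amplified or fade between layers is to compare each word's relative frequency in a later layer with its relative frequency in the source. The sketch below uses a smoothed frequency ratio, which is an illustrative choice and not necessarily what the paper does. The example texts are invented and are not quotes from the speech or its coverage.

```python
import re
from collections import Counter

def rel_freq(text):
    # Relative frequency of each lowercase word token in the text.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def rank_by_shift(source, later, eps=1e-3):
    """Words sorted from most amplified to most faded between two layers.

    The score is a smoothed ratio of relative frequencies; eps keeps a word
    missing from one layer from producing a zero or infinite ratio.
    """
    p, q = rel_freq(source), rel_freq(later)
    ratio = {w: (q.get(w, 0.0) + eps) / (p.get(w, 0.0) + eps) for w in set(p) | set(q)}
    # Descending ratio; ties broken alphabetically so the output is reproducible.
    return sorted(ratio, key=lambda w: (-ratio[w], w))

source = "Our thoughts are with the families. Families who lost loved ones. We stand with families."
later = "Drone strike killed two hostages; both killed, killed."
ranked = rank_by_shift(source, later)
print(ranked[0], ranked[-1])  # killed families
```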
One way of preserving information from the source directly is to quote it. We find that university press releases and presidential speeches are most likely to be quoted, perhaps because a presidential speech consists entirely of the President’s own words, and university press releases typically already contain quotes.
Caption: Fraction of news articles quoting the source, by source category.
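Whether an article "quotes the source" can be approximated by extracting the text inside double quotation marks and checking whether it appears word for word in the source. This sketch makes several assumptions, and the paper's actual criterion may differ:

* Quoted spans are delimited by straight or curly double quotes.
* A match must be at least five words long, which filters out short scare quotes.
* Comparison ignores case and punctuation.

The example texts are invented.

```python
import re
from collections import defaultdict

# Text between straight or curly double quotes.
QUOTE_RE = re.compile(r'[“"]([^”"]+)[”"]')

def _norm(text):
    # Lowercase words joined by single spaces, padded so substring checks respect word boundaries.
    return " " + " ".join(re.findall(r"[a-z0-9']+", text.lower())) + " "

def quotes_source(article, source, min_words=5):
    """True if some quoted span of at least min_words words appears verbatim in the source."""
    src = _norm(source)
    for span in QUOTE_RE.findall(article):
        q = _norm(span)
        if len(q.split()) >= min_words and q in src:
            return True
    return False

def quote_rate(items):
    """items: iterable of (category, article_text, source_text) tuples."""
    totals, quoted = defaultdict(int), defaultdict(int)
    for category, article, source in items:
        totals[category] += 1
        quoted[category] += quotes_source(article, source)  # bool counts as 0 or 1
    return {c: quoted[c] / totals[c] for c in totals}

source = "We will never forget the sacrifice of these families."
items = [
    ("politics", "The president said “we will never forget the sacrifice” on Thursday.", source),
    ("politics", 'Officials "declined to comment" on the strike.', source),
]
print(quote_rate(items))  # {'politics': 0.5}
```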
As the example above shows, the number of subjective words can vary across layers. We measure subjectivity using two established sentiment dictionaries, LIWC and Vader (see the paper for details). In general, we find that news articles use the fewest subjective words, consistent with an aim to present news objectively. The source material itself tends to be more positive on average, while shares and comments tend to contain more negative terms. Facebook conventions should be kept in mind when interpreting these findings. For example, likes are not included in this analysis, yet they are a common way to express approval on Facebook (this analysis was done before the launch of Reactions). As a result, comparing positive and negative comments alone may not give a full picture of responses.
Caption: Relative (left) subjectivity and (right) sentiment scores in different layers.
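Dictionary-based scoring works by counting how many words in a text appear in positive and negative word lists. This sketch treats subjectivity as the share of words found in either list and sentiment as (positive − negative) divided by word count. It then averages both scores per layer. The word lists are tiny hypothetical stand-ins, not the LIWC or Vader dictionaries. The paper reports relative scores, and the normalization it uses is not reproduced here.

```python
import re

# Toy stand-in word lists; hypothetical, NOT the LIWC or Vader dictionaries.
POSITIVE = {"good", "great", "hope", "strong", "proud"}
NEGATIVE = {"bad", "killed", "tragic", "weak", "sad"}

def scores(text):
    """Return (subjectivity, sentiment) as per-word rates."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + neg) / len(words), (pos - neg) / len(words)

def layer_averages(layers):
    """layers: dict mapping a layer name (e.g. "news", "comments") to a list of texts."""
    result = {}
    for name, texts in layers.items():
        pairs = [scores(t) for t in texts]
        n = len(pairs) or 1
        result[name] = (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)
    return result

print(scores("We are proud and hopeful"))                   # (0.2, 0.2)
print(scores("Two people were killed in a tragic strike"))  # (0.25, -0.25)
```

Real dictionaries such as LIWC also match word stems (e.g. "kill*"), which exact set lookup does not capture.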
One may ask why subjectivity increases in shares and comments compared to news articles. There are two possible explanations. Individuals may focus on the subjective parts already present in news articles when spreading the information, or they may bring in novel perspectives or content that is subjective. We find that individuals do not magnify the existing subjectivity of the corresponding news article at all, whereas the novel words they introduce in shares are twice as subjective as the article itself.
Caption: The subjectivity of words in the article (“article”), words in share text that also occur in the article (“existing”), and words that are original to the share text (“novel”).
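The "existing" versus "novel" split in the caption can be sketched as follows:

* Tokens of the share text that also occur in the article's vocabulary are "existing".
* All other share tokens are "novel".
* Each group is scored by the fraction of its tokens found in a subjectivity lexicon.

The lexicon, helper names, and texts are hypothetical stand-ins, not LIWC or Vader.

```python
import re

# Hypothetical toy lexicon of subjective words (a stand-in for LIWC/Vader).
SUBJECTIVE = {"sad", "awful", "killed", "tragic", "hope", "great"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def rate(words):
    # Share of tokens that appear in the subjective lexicon.
    return sum(w in SUBJECTIVE for w in words) / len(words) if words else 0.0

def existing_vs_novel(share_text, article_text):
    article_words = tokenize(article_text)
    vocab = set(article_words)
    share_words = tokenize(share_text)
    existing = [w for w in share_words if w in vocab]   # words carried over from the article
    novel = [w for w in share_words if w not in vocab]  # words the sharer added
    return {"article": rate(article_words), "existing": rate(existing), "novel": rate(novel)}

article = "Officials confirmed two hostages were killed in the strike"
share = "So sad and awful, two hostages killed"
print(existing_vs_novel(share, article))
```

In this made-up example the novel words ("so", "sad", "and", "awful") are far more subjective than the article, while the carried-over words mostly repeat its factual content.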
Since different news articles provide varying coverage, one can ask whether any of the above variables predicts which of the articles covering the same source gets shared more. We found no correlation between sharing and variables such as sentiment or coverage. Being posted early carried a very slight advantage. The only major factor that does matter is the prior number of shares of other articles from the same news site. Interestingly, however, the most-shared article for one source rarely comes from the same news site as the most-shared article for the next source.
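One way to test whether a variable "predicts" which article gets shared more is a pairwise comparison. For every pair of articles covering the same source, check whether the article with the higher value of the variable is also the more-shared one. A score near 0.5 means no predictive power. This is an illustrative technique, not necessarily the paper's method. The field names (including `site_prior_shares`) and the numbers are hypothetical.

```python
from itertools import combinations

def pairwise_accuracy(articles, feature):
    """articles: list of dicts with "source", "shares", and the feature key.

    Only pairs covering the same source, with different share counts and
    different feature values, are compared.
    """
    correct = total = 0
    for a, b in combinations(articles, 2):
        if a["source"] != b["source"] or a["shares"] == b["shares"] or a[feature] == b[feature]:
            continue
        total += 1
        if (a[feature] > b[feature]) == (a["shares"] > b["shares"]):
            correct += 1
    return correct / total if total else float("nan")

articles = [
    {"source": "s1", "shares": 100, "site_prior_shares": 5000, "coverage": 0.1},
    {"source": "s1", "shares": 10, "site_prior_shares": 200, "coverage": 0.3},
    {"source": "s1", "shares": 50, "site_prior_shares": 1000, "coverage": 0.2},
    {"source": "s2", "shares": 70, "site_prior_shares": 300, "coverage": 0.4},
]
print(pairwise_accuracy(articles, "site_prior_shares"))  # 1.0
print(pairwise_accuracy(articles, "coverage"))           # 0.0
```

Comparing articles only within the same source controls for how newsworthy the source itself is, which would otherwise dominate raw share counts.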
We traced information from its source through news articles to shares and comments on Facebook. Some things get lost in propagation, and individual news articles cover only a fraction of the words in the source. Collectively, however, the articles provide comprehensive coverage. News articles also contain the fewest subjective words. Sentiment appears most negative in comments, but this is potentially skewed. In this layer a “like” expresses agreement and positive sentiment, while disagreement could only be expressed in comments (the study was completed before Facebook introduced Reactions). We also saw that emphasis can shift, as some words become more prominent in later layers. We hope that this study sheds some light on these and other interesting aspects of news cycles in social media.
Chenhao Tan, Adrien Friggeri, Lada Adamic. “Lost in propagation? Unfolding news cycles from the source.” ICWSM’16.