Title: | Access to Twitter Streaming API via R |
---|---|
Description: | Functions to access Twitter's filter, sample, and user streams, and to parse the output into data frames. |
Authors: | Pablo Barbera |
Maintainer: | Pablo Barbera |
License: | GPL-2 |
Version: | 0.4.5 |
Built: | 2025-01-21 04:25:02 UTC |
Source: | https://github.com/cran/streamR |
This package provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames.
Pablo Barbera
See also: `filterStream`, `sampleStream`, `userStream`, `readTweets`, `parseTweets`
This function generates an OAuth token using the consumer key, consumer secret, access token and access token secret available in the "Keys and Access Token" tab of the "Application Management" website on Twitter's developer website.
createOAuthToken(consumerKey, consumerSecret, accessToken, accessTokenSecret)
Argument | Description |
---|---|
consumerKey | Consumer key for OAuth token |
consumerSecret | Consumer secret for OAuth token |
accessToken | Access token for OAuth token |
accessTokenSecret | Access token secret for OAuth token |
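Besides `createOAuthToken`, the package examples also pass credentials as a plain list with the fields `consumer_key`, `consumer_secret`, `access_token` and `access_token_secret`. A minimal sketch of building that list without hard-coding secrets in scripts follows. It assumes the secrets are stored in environment variables; the variable names and the `make_oauth_list` helper are hypothetical, not part of streamR.

```r
# Hypothetical helper (not part of streamR): build the list-form OAuth
# credentials used in the package examples, reading them from
# environment variables instead of hard-coding them.
make_oauth_list <- function(prefix = "TWITTER_") {
  fields <- c("consumer_key", "consumer_secret",
              "access_token", "access_token_secret")
  # e.g. "consumer_key" -> "TWITTER_CONSUMER_KEY" (assumed variable names)
  vars <- paste0(prefix, toupper(fields))
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  # named list with the same shape as the my_oauth list in the examples
  as.list(setNames(values, fields))
}
```

The same values can be handed to the documented constructor, e.g. `createOAuthToken(tok$consumer_key, tok$consumer_secret, tok$access_token, tok$access_token_secret)`.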
A character vector containing ten sample tweets in plain text.
data(example_tweets)
Source: http://www.twitter.com/twitterapi
`filterStream` opens a connection to Twitter's Streaming API that will return public statuses that match one or more filter predicates. Tweets can be filtered by keywords, users, language, and location. The output can be saved as an object in memory or written to a text file.
filterStream(file.name = NULL, track = NULL, follow = NULL, locations = NULL, language = NULL, timeout = 0, tweets = NULL, oauth = NULL, verbose = TRUE)
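Argument | Description |
---|---|
file.name | string, name of the file where tweets will be written. "" indicates output to the console, which can be redirected to an R object (see examples). If the file already exists, tweets will be appended (not overwritten). |
track | string or string vector containing keywords to track. See the `track` parameter information in the Streaming API documentation for details: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters. |
follow | string or numeric, vector of Twitter user IDs, indicating the users whose public statuses should be delivered on the stream. See the `follow` parameter information in the Streaming API documentation for details: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters. |
locations | numeric, a vector of longitude, latitude pairs (with the southwest corner coming first) specifying sets of bounding boxes to filter public statuses by. See the `locations` parameter information in the Streaming API documentation for details: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters. |
language | string or string vector containing a list of BCP 47 language identifiers. If not `NULL` (default), only tweets detected as written in the specified languages will be returned. This argument must be combined with another filter (keywords or locations). |
timeout | numeric, maximum length of time (in seconds) of connection to stream. The connection will be automatically closed after this period. For example, setting `timeout = 600` closes the connection after ten minutes. The default, `0`, keeps the connection open indefinitely. |
tweets | numeric, maximum number of tweets to be collected when the function is called. After that number of tweets has been captured, the function will stop. If set to `NULL` (default), the connection will stay open for the number of seconds specified in `timeout`. |
oauth | an OAuth token object containing the access credentials for the user's Twitter session; see the examples for ways to create one. |
verbose | logical, default is `TRUE`, which prints information about the capturing process to the R console. |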
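Because `locations` is a flat vector of longitude, latitude pairs, it is easy to list a corner in the wrong order. The base R sketch below validates one or more boxes and flattens them into the vector expected by `locations`. `bbox_vector` is a hypothetical helper, not part of streamR, and its checks are assumptions: it requires the southwest corner first and valid coordinate ranges, so it also rejects boxes that cross the 180th meridian.

```r
# Hypothetical helper (not part of streamR): validate bounding boxes and
# flatten them into the numeric vector used by the `locations` argument.
# Each box is c(sw_lon, sw_lat, ne_lon, ne_lat): southwest corner first.
bbox_vector <- function(boxes) {
  out <- numeric(0)
  for (i in seq_along(boxes)) {
    b <- as.numeric(boxes[[i]])
    if (length(b) != 4) stop("box ", i, " must contain exactly 4 numbers")
    lons <- b[c(1, 3)]
    lats <- b[c(2, 4)]
    if (any(abs(lons) > 180) || any(abs(lats) > 90)) {
      stop("box ", i, " has an invalid longitude or latitude")
    }
    # the southwest corner must come before the northeast corner
    if (b[1] >= b[3] || b[2] >= b[4]) {
      stop("box ", i, " must list the southwest corner first")
    }
    out <- c(out, b)
  }
  out
}

# Usage (not run, requires credentials):
# filterStream(file.name = "tweets_geo.json",
#              locations = bbox_vector(list(nyc = c(-74, 40, -73, 41))),
#              timeout = 600, oauth = my_oauth)
```

The Streaming API documentation describes `locations` as a comma-separated list, which corresponds to `paste(bbox_vector(...), collapse = ",")`, e.g. `"-74,40,-73,41"` for the New York City box used in the examples.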
`filterStream` provides access to the statuses/filter Twitter stream. It returns public statuses that match the keywords given in the `track` argument, that were published by the users specified in the `follow` argument, or that were sent within the location bounding boxes declared in the `locations` argument, optionally restricted to the languages specified in the `language` argument.
Note that location bounding boxes do not act as filters for other filter parameters. In the fourth example below, we capture all tweets containing the term rstats (even non-geolocated tweets) OR coming from the New York City area. For more information on how the Streaming API request parameters work, check the documentation at: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters.
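To make the OR behavior of `track` and `locations` concrete, the following base R sketch applies the same logic to a small, made-up set of tweets. It is a local simulation, not how streamR or the Streaming API is implemented: keyword matching here is a simple case-insensitive whole-word regex, whereas Twitter's own `track` matching rules are more involved. `matches_stream` is a hypothetical name.

```r
# Local simulation (not streamR code): a tweet is kept if it matches any
# keyword OR its coordinates fall inside the bounding box.
# Non-geolocated tweets have NA coordinates and can only match by keyword.
matches_stream <- function(text, lon, lat, track = NULL, box = NULL) {
  by_track <- rep(FALSE, length(text))
  for (kw in track) {
    # whole-word, case-insensitive match (a simplification)
    by_track <- by_track | grepl(paste0("\\b", kw, "\\b"), text,
                                 ignore.case = TRUE, perl = TRUE)
  }
  by_box <- rep(FALSE, length(text))
  if (!is.null(box)) {
    # box is c(sw_lon, sw_lat, ne_lon, ne_lat)
    by_box <- !is.na(lon) & !is.na(lat) &
      lon >= box[1] & lon <= box[3] & lat >= box[2] & lat <= box[4]
  }
  by_track | by_box
}

tw <- data.frame(
  text = c("Loving #rstats today", "Lunch in Manhattan",
           "Lunch in Paris", "rstats meetup"),
  lon = c(NA, -73.98, 2.35, 2.35),
  lat = c(NA, 40.75, 48.86, 48.86),
  stringsAsFactors = FALSE
)
keep <- matches_stream(tw$text, tw$lon, tw$lat,
                       track = "rstats", box = c(-74, 40, -73, 41))
```

Here the first and last tweets are kept through the keyword (one of them has no coordinates at all) and the second through the New York City box, mirroring the fourth example below.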
Also note that the `language` parameter needs to be used in combination with another filter option (either keywords or location).
If any of these arguments is left empty (e.g. no user filter is specified), the function will return all public statuses that match the other filters. At least one predicate parameter must be specified.
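The argument rules above can be checked locally before opening a connection. The sketch below is a hypothetical pre-flight check, not part of streamR; it encodes the rules exactly as stated in this section (at least one of `track`, `follow` or `locations`, and `language` only together with `track` or `locations`).

```r
# Hypothetical pre-flight check (not part of streamR) mirroring the rules
# stated for filterStream's filter arguments.
check_predicates <- function(track = NULL, follow = NULL,
                             locations = NULL, language = NULL) {
  if (is.null(track) && is.null(follow) && is.null(locations)) {
    stop("specify at least one of track, follow or locations")
  }
  if (!is.null(language) && is.null(track) && is.null(locations)) {
    stop("`language` must be combined with track or locations")
  }
  invisible(TRUE)
}

# Usage: check_predicates(language = "es", locations = c(-74, 40, -73, 41))
```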
Note that when no file name is provided, tweets are written to a temporary file, which is loaded in memory as a string vector when the connection to the stream is closed.
The total number of actual tweets that are captured might be lower than the number of tweets requested because blank lines, deletion notices, and incomplete tweets are included in the count of tweets downloaded.
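One way to see why the number of usable tweets can fall short of `tweets` is to classify the lines of a captured file. The heuristic below is an assumption for illustration, not streamR's parser: it treats lines starting with `{"delete"` as deletion notices, empty lines as blank, and any other line that does not end in `}` as an incomplete record. `classify_stream_lines` is a hypothetical name.

```r
# Rough heuristic (not streamR's implementation): label each captured
# line as a tweet, a blank line, a deletion notice or an incomplete record.
classify_stream_lines <- function(lines) {
  trimmed <- trimws(lines)
  type <- rep("tweet", length(lines))
  # a complete JSON object should end with a closing brace
  type[!grepl("\\}$", trimmed)] <- "incomplete"
  # deletion notices look like {"delete":{"status":{...}}}
  type[grepl('^\\{"delete"', trimmed)] <- "deletion"
  type[trimmed == ""] <- "blank"
  type
}

# Usage (not run): table(classify_stream_lines(readLines("tweets_rstats.json")))
```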
Pablo Barbera
See also: `sampleStream`, `userStream`, `parseTweets`
## Not run:
## An example of an authenticated request using the ROAuth package,
## where consumer key and consumer secret are fictitious.
## You can obtain your own at dev.twitter.com
library(ROAuth)
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
my_oauth <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = requestURL,
                             accessURL = accessURL,
                             authURL = authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
## Alternatively, it is also possible to create a token without the handshake:
my_oauth <- list(consumer_key = "CONSUMER_KEY",
                 consumer_secret = "CONSUMER_SECRET",
                 access_token = "ACCESS_TOKEN",
                 access_token_secret = "ACCESS_TOKEN_SECRET")
## capture 10 tweets mentioning the "rstats" hashtag
filterStream(file.name = "tweets_rstats.json", track = "rstats",
             tweets = 10, oauth = my_oauth)
## capture tweets published by Twitter's official account
filterStream(file.name = "tweets_twitter.json", follow = "783214",
             timeout = 600, oauth = my_oauth)
## capture tweets sent from New York City in Spanish only, saving them as an object in memory
tweets <- filterStream(file.name = "", language = "es",
                       locations = c(-74, 40, -73, 41),
                       timeout = 600, oauth = my_oauth)
## capture tweets mentioning the "rstats" hashtag or sent from New York City
filterStream(file.name = "tweets_rstats.json", track = "rstats",
             locations = c(-74, 40, -73, 41),
             timeout = 600, oauth = my_oauth)
## End(Not run)
This function parses tweets downloaded using `filterStream`, `sampleStream` or `userStream` and returns a data frame. If a tweet contains 280-character text, the complete text is returned rather than only the first 140 characters.
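In the v1.1 tweet payload, a tweet longer than 140 characters is marked as truncated and carries its complete text under `extended_tweet$full_text`. The sketch below shows that selection for a single tweet held as a nested R list. `full_text_of` is a hypothetical name, and this is not necessarily how `parseTweets` implements it.

```r
# Sketch (assumption: one tweet parsed into a nested R list, e.g. an
# element of the list returned by readTweets). Prefer the extended text
# when present, otherwise fall back to the standard "text" field.
full_text_of <- function(tweet) {
  # [[ ]] avoids the partial name matching that $ performs on lists
  full <- tweet[["extended_tweet"]][["full_text"]]
  if (!is.null(full)) full else tweet[["text"]]
}
```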
parseTweets(tweets, simplify = FALSE, verbose = TRUE, legacy = FALSE)
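Argument | Description |
---|---|
tweets | A character string naming the file where tweets are stored, or the name of the object in memory where the tweets were saved as strings. |
simplify | If `TRUE`, returns a data frame with only tweet and user fields (i.e., no geographic information or URL entities). Default is `FALSE`. |
verbose | logical, default is `TRUE`, which prints to the console the number of tweets that have been parsed. |
legacy | logical, default is `FALSE`. |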
`parseTweets` parses tweets downloaded using the `filterStream`, `sampleStream` or `userStream` functions and returns a data frame where each row corresponds to one tweet and each column represents a different field for each tweet (id, text, created_at, etc.).
The total number of tweets that are parsed might be lower than the number of lines in the file or object that contains the tweets because blank lines, deletion notices, and incomplete tweets are ignored.
To parse the JSON into a list of tweets instead, see `readTweets`. That function can be significantly faster for large files when only a few fields are required.
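When only a few top-level fields are needed, a list produced by `readTweets` can be flattened directly. The hypothetical `fields_to_df` below is a sketch, not streamR code. It assumes each tweet is a named list (as in the `readTweets` example) and fills missing or non-scalar fields with `NA`.

```r
# Sketch (not streamR code): build a data frame from selected top-level
# fields of a list of parsed tweets, using NA for missing values.
fields_to_df <- function(tweets.list, fields) {
  cols <- lapply(fields, function(f) {
    vapply(tweets.list, function(tw) {
      v <- tw[[f]]
      # keep only scalar values; nested objects become NA
      if (is.null(v) || length(v) != 1) NA_character_ else as.character(v)
    }, character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Usage (not run):
# fields_to_df(readTweets("tweets_rstats.json"), c("id_str", "text", "created_at"))
```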
Note also that the `retweet_count` field contains the number of times a given tweet had been retweeted at the time it was captured from the API or, for automatic retweets, the number of times the original tweet was retweeted.
Pablo Barbera
See also: `filterStream`, `sampleStream`, `userStream`
## The dataset example_tweets contains 10 public statuses published
## by @twitterapi in plain text format. The code below converts the object
## into a data frame that can be manipulated by other functions.
data(example_tweets)
tweets.df <- parseTweets(example_tweets, simplify = TRUE, legacy = TRUE)
## Not run:
## A more complete example that shows how to capture a user's home timeline
## for one hour using authentication via OAuth, and then parse the tweets
## into a data frame.
library(ROAuth)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
my_oauth <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = reqURL,
                             accessURL = accessURL,
                             authURL = authURL)
my_oauth$handshake()
userStream(file.name = "my_timeline.json", with = "followings",
           timeout = 3600, oauth = my_oauth)
tweets.df <- parseTweets("my_timeline.json")
## End(Not run)
This function parses tweets downloaded using `filterStream`, `sampleStream` or `userStream` and returns a list.
readTweets(tweets, verbose = TRUE)
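Argument | Description |
---|---|
tweets | A character string naming the file where tweets are stored, or the name of the object in memory where the tweets were saved as strings. |
verbose | logical, default is `TRUE`, which prints to the console the number of tweets that have been parsed. |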
This function is the first step in the `parseTweets` function and is now provided as an independent function for convenience. In cases where only one field is needed, it can be faster to extract it directly from the JSON data read into R as a list. It can also be useful for extracting fields that are not parsed by `parseTweets`, such as hashtags or mentions.
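As an example of a field that `parseTweets` does not return, hashtags can be pulled out of the list returned by `readTweets`. This sketch assumes the standard v1.1 tweet layout, where `entities$hashtags` is an array of objects with a `text` field, and that JSON arrays of objects arrive in R as lists of lists; `extract_hashtags` is a hypothetical name.

```r
# Sketch (not streamR code): one character vector of hashtags per tweet,
# assuming tweet$entities$hashtags is a list of objects with a "text" field.
extract_hashtags <- function(tweets.list) {
  lapply(tweets.list, function(tw) {
    tags <- tw[["entities"]][["hashtags"]]
    if (length(tags) == 0) return(character(0))
    vapply(tags, function(h) h[["text"]], character(1))
  })
}

# Usage (not run): table(unlist(extract_hashtags(readTweets(example_tweets))))
```

Mentions can be extracted the same way from `entities$user_mentions`, using the `screen_name` field of each object.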
The total number of tweets that are parsed might be lower than the number of lines in the file or object that contains the tweets because blank lines, deletion notices, and incomplete tweets are ignored.
Pablo Barbera
## The dataset example_tweets contains 10 public statuses published
## by @twitterapi in plain text format. The code below converts the object
## into a list and extracts only the text.
data(example_tweets)
tweets.list <- readTweets(example_tweets)
only.text <- unlist(lapply(tweets.list, '[[', 'text'))
## The same can be done with an explicit loop:
only.text <- character(length(tweets.list))
for (i in seq_along(tweets.list)) {
  only.text[i] <- tweets.list[[i]][["text"]]
}
print(only.text)
`sampleStream` opens a connection to Twitter's Streaming API that will return a small random sample of public statuses, around 1% at any given time.
sampleStream(file.name, timeout = 0, tweets = NULL, oauth = NULL, verbose = TRUE)
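Argument | Description |
---|---|
file.name | string, name of the file where tweets will be written. "" indicates output to the console, which can be redirected to an R object. If the file already exists, tweets will be appended (not overwritten). |
timeout | numeric, maximum length of time (in seconds) of connection to stream. The connection will be automatically closed after this period. For example, setting `timeout = 600` closes the connection after ten minutes. The default, `0`, keeps the connection open indefinitely. |
tweets | numeric, maximum number of tweets to be collected when the function is called. After that number of tweets has been captured, the function will stop. If set to `NULL` (default), the connection will stay open for the number of seconds specified in `timeout`. |
oauth | an OAuth token object containing the access credentials for the user's Twitter session; see the examples for ways to create one. |
verbose | logical, default is `TRUE`, which prints information about the capturing process to the R console. |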
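The interaction between `timeout` and `tweets` amounts to a stopping rule. The simulation below assumes, consistent with the default `timeout = 0`, that a timeout of 0 means no time limit, and that every received line counts toward `tweets`, which is why blank lines and deletion notices reduce the number of usable tweets. `collect_lines` and its `next_line` argument are hypothetical; the real function reads from an HTTP connection.

```r
# Simulation (not streamR code) of the documented stopping rule.
# next_line: a function returning the next raw line from the stream.
collect_lines <- function(next_line, timeout = 0, tweets = NULL) {
  start <- Sys.time()
  out <- character(0)
  repeat {
    # stop once the requested number of lines has been received
    if (!is.null(tweets) && length(out) >= tweets) break
    # stop once the time limit has passed (0 means no limit)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (timeout > 0 && elapsed >= timeout) break
    out <- c(out, next_line())
  }
  out
}
```

With `timeout = 0` and `tweets = NULL` the loop never ends, matching a connection that is kept open until interrupted.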
For more information, check the documentation at: https://developer.twitter.com/en/docs/tweets/sample-realtime/overview/GET_statuse_sample
Note that when no file name is provided, tweets are written to a temporary file, which is loaded in memory as a string vector when the connection to the stream is closed.
The total number of actual tweets that are captured might be lower than the number of tweets requested because blank lines, deletion notices, and incomplete tweets are included in the count of tweets downloaded.
Pablo Barbera
See also: `filterStream`, `userStream`, `parseTweets`
## Not run:
## An example of an authenticated request using the ROAuth package,
## where consumer key and consumer secret are fictitious.
## You can obtain your own at dev.twitter.com
library(ROAuth)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
my_oauth <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = reqURL,
                             accessURL = accessURL,
                             authURL = authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
## Alternatively, it is also possible to create a token without the handshake:
my_oauth <- list(consumer_key = "CONSUMER_KEY",
                 consumer_secret = "CONSUMER_SECRET",
                 access_token = "ACCESS_TOKEN",
                 access_token_secret = "ACCESS_TOKEN_SECRET")
## capture a random sample of tweets
sampleStream(file.name = "tweets_sample.json", oauth = my_oauth)
## End(Not run)
`userStream` opens a connection to Twitter's Streaming API that will return statuses specific to the authenticated user. The output can be saved as an object in memory or written to a text file.
userStream(file.name = NULL, with = "followings", replies = NULL, track = NULL, locations = NULL, timeout = 0, tweets = NULL, oauth = NULL, verbose = TRUE)
Argument | Description |
---|---|
file.name | string, name of the file where tweets will be written. "" indicates output to the console, which can be redirected to an R object. If the file already exists, tweets will be appended (not overwritten). |
with | string, default is "followings", which will stream messages from accounts the authenticated user follows. If set to "user", only messages from the authenticated user will be streamed. See the `with` parameter information in the Streaming API documentation for details. |
replies | string, default is `NULL`. See the `replies` parameter information in the Streaming API documentation for details. |
track | string or string vector containing keywords to track. See the `track` parameter information in the Streaming API documentation for details: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters. |
locations | numeric, a vector of longitude, latitude pairs (with the southwest corner coming first) specifying sets of bounding boxes to filter statuses by. See the `locations` parameter information in the Streaming API documentation for details: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters. |
timeout | numeric, maximum length of time (in seconds) of connection to stream. The connection will be automatically closed after this period. For example, setting `timeout = 600` closes the connection after ten minutes. The default, `0`, keeps the connection open indefinitely. |
tweets | numeric, maximum number of tweets to be collected when the function is called. After that number of tweets has been captured, the function will stop. If set to `NULL` (default), the connection will stay open for the number of seconds specified in `timeout`. |
oauth | an OAuth token object containing the access credentials for the user's Twitter session; see the examples for ways to create one. |
verbose | logical, default is `TRUE`, which prints information about the capturing process to the R console. |
This function provides access to messages for a single user. The set of messages returned can include the user's tweets and/or replies, and public statuses published by the accounts the user follows, as well as replies to those accounts.
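The effect of the `with` argument on which authors' statuses are delivered can be simulated locally. This sketch is not streamR or Twitter code; it assumes, as the paragraph above suggests, that "followings" covers both the user's own statuses and those of followed accounts, while "user" covers only the user's own. `stream_authors`, `me` and `following` are hypothetical names.

```r
# Local simulation (not streamR code): which statuses a user stream would
# deliver, based only on the author ID, for the two documented `with` values.
stream_authors <- function(authors, me, following, with = "followings") {
  if (identical(with, "user")) {
    authors == me
  } else if (identical(with, "followings")) {
    authors == me | authors %in% following
  } else {
    stop("`with` must be \"followings\" or \"user\"")
  }
}

# Usage: stream_authors(c("42", "7", "99"), me = "42", following = "7")
```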
Tweets can also be filtered by keywords and location, using the `track` and `locations` arguments.
The total number of actual tweets that are captured might be lower than the number of tweets requested because blank lines, deletion notices, and incomplete tweets are included in the count of tweets downloaded.
Note that when no file name is provided, tweets are written to a temporary file, which is loaded in memory as a string vector when the connection to the stream is closed.
Pablo Barbera
See also: `filterStream`, `sampleStream`, `parseTweets`
## Not run:
## The following example shows how to capture a user's home timeline
## with the Streaming API and using authentication via the ROAuth
## package, with fictitious consumer key and consumer secret.
## You can obtain your own at dev.twitter.com
library(ROAuth)
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
my_oauth <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = requestURL,
                             accessURL = accessURL,
                             authURL = authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
## Alternatively, it is also possible to create a token without the handshake:
my_oauth <- list(consumer_key = "CONSUMER_KEY",
                 consumer_secret = "CONSUMER_SECRET",
                 access_token = "ACCESS_TOKEN",
                 access_token_secret = "ACCESS_TOKEN_SECRET")
## Capturing 10 tweets from a user's timeline
userStream(file.name = "my_timeline.json", with = "followings",
           tweets = 10, oauth = my_oauth)
## End(Not run)