Title: | Wrapper for the 'mediacloud.org' API |
---|---|
Description: | API wrapper to gather news stories, media information and tags from the 'mediacloud.org' API, based on a multilevel query <https://mediacloud.org/>. A personal API key is required. |
Authors: | Dix Jan [cre, aut] |
Maintainer: | Dix Jan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1.9000 |
Built: | 2024-10-31 03:19:24 UTC |
Source: | https://github.com/jandix/mediacloudr |
extract_meta_data
extracts native, Open Graph and Twitter meta data
from HTML documents. The meta data include url, title, description and image.
The HTML document is parsed inside the function, so pass the raw HTML as a character string.
extract_meta_data(html_doc)
html_doc |
Character string including the html document. |
List with three sublists for native, open graph and twitter.
## Not run:
library(httr)
url <- "https://bits.blogs.nytimes.com/2013/04/07/the-potential-and-the-risks-of-data-science"
response <- GET(url)
html_document <- content(response, type = "text", encoding = "UTF-8")
meta_data <- extract_meta_data(html_doc = html_document)
## End(Not run)
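For an offline check that does not depend on a live page, the function can be given a small hand-written document. The following is a minimal sketch: the HTML snippet and its values are made up for illustration, and the call is guarded so the block still runs when 'mediacloudr' is not installed. The names of the three sublists are not documented here, so the result is inspected with str() rather than indexed by name.

```r
# Hypothetical HTML document with native, Open Graph and Twitter meta tags.
html_document <- paste(
  "<html><head>",
  "<title>Example title</title>",
  "<meta name=\"description\" content=\"Native description\">",
  "<meta property=\"og:title\" content=\"Open Graph title\">",
  "<meta property=\"og:image\" content=\"https://example.org/image.png\">",
  "<meta name=\"twitter:title\" content=\"Twitter title\">",
  "</head><body></body></html>",
  sep = "\n"
)

# extract_meta_data() expects a single character string.
stopifnot(is.character(html_document), length(html_document) == 1L)

# Only call the package function when the package is available.
if (requireNamespace("mediacloudr", quietly = TRUE)) {
  meta_data <- mediacloudr::extract_meta_data(html_doc = html_document)
  str(meta_data)  # inspect the three sublists
}
```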
get_media_source
returns a media source by its id. A media source
is one publisher. Every story that can be collected via get_story or get_story_list
belongs to one media source.
get_media_source(media_id, api_key = Sys.getenv("MEDIACLOUD_API_KEY"))
media_id |
Positive integer that contains a valid media id. |
api_key |
Character string with the API key you get from mediacloud.org. A key is required. If the argument is omitted, the key is read from the MEDIACLOUD_API_KEY environment variable. |
Data frame with results. See https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md#media for field descriptions.
## Not run:
media_source <- get_media_source(media_id = 604L)
## End(Not run)
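Because api_key defaults to Sys.getenv("MEDIACLOUD_API_KEY"), the key can be set once per session instead of being passed on every call. A minimal sketch follows; the key value is a placeholder, not a real key.

```r
# Placeholder value; replace it with the personal key from mediacloud.org.
Sys.setenv(MEDIACLOUD_API_KEY = "your-api-key")

# This is the value the default api_key argument picks up.
api_key <- Sys.getenv("MEDIACLOUD_API_KEY")

# TRUE when a non-empty key is available.
nzchar(api_key)
```

To keep the key across sessions, the same variable can be placed in the user's .Renviron file as MEDIACLOUD_API_KEY=your-api-key.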
get_story
returns news stories by their id. One story represents
one online publication. Each story refers to a single URL from any feed
within a single media source.
get_story(story_id, api_key = Sys.getenv("MEDIACLOUD_API_KEY"))
story_id |
Positive numeric that contains a valid story id. |
api_key |
Character string with the API key you get from mediacloud.org. A key is required. If the argument is omitted, the key is read from the MEDIACLOUD_API_KEY environment variable. |
Data frame with results. See https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md#stories for field descriptions.
## Not run:
story <- get_story(story_id = 604L)
## End(Not run)
get_story_list
returns a list of stories based on a multifaceted query. One
story represents one online publication. Each story refers to a single URL
from any feed within a single media source.
get_story_list(last_process_stories_id = 0L, rows = 100, feeds_id = NULL, q = NULL, fq = NULL, sort = "processed_stories_id", wc = FALSE, show_feeds = FALSE, api_key = Sys.getenv("MEDIACLOUD_API_KEY"))
last_process_stories_id |
Return stories in which the processed_stories_id is greater than this value. |
rows |
Number of stories to return, max 1000. |
feeds_id |
Return only stories that match the given feeds_id, sorted by descending publish date. |
q |
If specified, return only results that match the given Solr query. Only one q parameter may be included. |
fq |
If specified, filter results by the given Solr query. More than one fq parameter may be included. |
sort |
Sort order of the returned results. Supported values: processed_stories_id, random. |
wc |
If set to TRUE, include a 'word_count' field with each story that includes a count of the most common words in the story. |
show_feeds |
If set to TRUE, include a 'feeds' field with a list of the feeds associated with this story. |
api_key |
Character string with the API key you get from mediacloud.org. A key is required. If the argument is omitted, the key is read from the MEDIACLOUD_API_KEY environment variable. |
Data frame with results. See https://github.com/berkmancenter/mediacloud/blob/master/doc/api_2_0_spec/api_2_0_spec.md#stories for field descriptions.
## Not run:
stories <- get_story_list()
stories <- get_story_list(q = "Trump")
## End(Not run)
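Since rows is capped at 1000 and last_process_stories_id returns only stories with a greater processed_stories_id, larger result sets can be collected page by page. The sketch below illustrates one way to do that and is not part of the package: collect_stories, fetch_page, max_pages and fake_fetch are hypothetical names, and it assumes the returned data frame has a processed_stories_id column and that the default sort order is kept. A stand-in function replaces get_story_list so the loop can be tried without an API key.

```r
# Collect up to `max_pages` pages of stories.
# `fetch_page` is any function that accepts the same arguments as
# get_story_list(); it is a parameter so the loop works without network access.
collect_stories <- function(fetch_page, rows = 100, max_pages = 5, ...) {
  last_id <- 0L
  pages <- list()
  for (i in seq_len(max_pages)) {
    page <- fetch_page(last_process_stories_id = last_id, rows = rows, ...)
    # Stop when the API has nothing more to return.
    if (is.null(page) || nrow(page) == 0L) break
    pages[[length(pages) + 1L]] <- page
    # Assumption: the result has a processed_stories_id column.
    last_id <- max(page$processed_stories_id)
    # A short page means this was the last one.
    if (nrow(page) < rows) break
  }
  if (length(pages) == 0L) return(data.frame())
  # rbind() assumes every page has the same plain columns.
  do.call(rbind, pages)
}

# Stand-in for get_story_list(), backed by 250 fake stories.
fake_fetch <- function(last_process_stories_id = 0L, rows = 100, ...) {
  ids <- seq_len(250L)
  ids <- ids[ids > last_process_stories_id]
  data.frame(processed_stories_id = head(ids, rows))
}

stories <- collect_stories(fake_fetch, rows = 100)
nrow(stories)  # 250 with the stand-in above
```

With the package installed and a key set, the same loop could be pointed at the real function, for example collect_stories(mediacloudr::get_story_list, rows = 1000, q = "Trump"). Real results may contain nested columns, in which case the final rbind() step may need adjusting.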
extract_meta_data
An HTML document with basic meta tags for Open Graph, Twitter and native meta data.
meta_data_html
An object of class character of length 1.