Talkwalker API Overview
Talkwalker Search API Overview & Example
How it works
The Talkwalker Search API allows you to retrieve up to 500 sorted results for a given timeframe within the last 30 days. In addition, a histogram of the number of results can also be returned. You can sort the results by publication time, indexing time, engagement or other metrics. A single search query can support up to 50 operands. To create complex queries, operands may be combined using Boolean operators.
A few words about the results
Search results can be sorted by engagement, time or other metrics and be restricted to specific attribute value ranges (for example only return results published in a certain timerange). When no special filters are applied, a single search request will return results from all media types and all languages over the past 30 days sorted by engagement by default. You don’t need to execute one search request for each language and media type separately. To get a smaller set of results, you can either get only the highest ranked results or get a random sample set.
A brief example (Search)
The Talkwalker API search results endpoint (https://api.talkwalker.com/api/v1/search/results) is used to search on the Talkwalker API. (For testing purposes the access token demo can be used. Setting pretty=true will return formatted results.)
curl -XGET 'https://api.talkwalker.com/api/v1/search/results?access_token=demo&q=cats&pretty=true'
response (all responses are UTF-8):
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/results?access_token=demo&q=cats&pretty=true",
"pagination" : {
"next" : "GET /api/v1/search/results?access_token=demo&q=cats&pretty=true&offset=10",
"total" : 298138
},
"result_content" : {
"data" : [ {
"data" : {
"url" : "http://example.blogspot.com/cats",
"indexed" : 1417999367498,
"search_indexed" : 1417999504832,
"published" : 1417999319393,
"title" : "Something cats",
"content" : "Welcome to my colorful little island (...)",
"title_snippet" : "Something with cats",
"root_url" : "http://example.blogspot.com/",
"domain_url" : "http://blogspot.com/",
"host_url" : "http://example.blogspot.com/",
"parent_url" : "http://example.blogspot.com/cats",
"lang" : "en",
"porn_level" : 0,
"fluency_level" : 90,
"spam_level" : 20,
"sentiment" : 5,
"source_type" : [ "BLOG", "BLOG_OTHER" ],
"post_type" : [ "TEXT" ],
"tokens_title" : [ "Something", "Something", "Cats", "Cats" ],
"tokens_content" : [ "Bead Hoarder Blog", "Bead Hoarder Blog"],
"tokens_mention" : [ "@yahoo" ],
"tags_internal" : [ "isQuestion" ],
"article_extended_attributes" : {
"num_comments" : 3
},
"source_extended_attributes" : {
"alexa_pageviews" : 0
},
"extra_article_attributes" : {
"world_data" : { }
},
"extra_author_attributes" : {
"world_data" : { },
"id" : "ex:example.blogspot.com-698904645",
"name" : "view my complete profile",
"gender" : "MALE"
},
"extra_source_attributes" : {
"world_data" : {
"continent" : "North America",
"country" : "United States",
"region" : "District of Columbia",
"city" : "Washington, D.C.",
"longitude" : -77.0094185808,
"latitude" : 38.8995493765,
"country_code" : "us"
},
"id" : "ex:example.blogspot.com",
"name" : "http://example.blogspot.com/"
},
"engagement" : 3,
"reach" : 0
}
}, {
"data" : {
"url" : "http://example.wordpress.com/2014/12/06/high-rez-snobbery-715-winter-trend-ice/",
... // truncated
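The pagination object in each response carries a ready-made next link, so paging through results is just a matter of following it. A minimal sketch (iter_results is a hypothetical helper, and fetch stands in for whatever HTTP client you use to retrieve a request path as decoded JSON):

```python
def iter_results(fetch, first_path):
    """Follow pagination.next links until none remains.

    `fetch` is any callable mapping a request path to the decoded
    JSON response (e.g. a thin wrapper around an HTTP client).
    """
    path = first_path
    while path:
        response = fetch(path)
        for item in response.get("result_content", {}).get("data", []):
            yield item["data"]
        # "next" looks like "GET /api/v1/search/results?...&offset=10"
        next_link = response.get("pagination", {}).get("next")
        path = next_link.removeprefix("GET ").strip() if next_link else None
```

Note that each returned result costs a credit, so a full pagination loop over a large total can be expensive.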
more on the Talkwalker Search Results API
Talkwalker Streaming API Overview & Example
How it works
The Talkwalker Streaming API delivers real-time data through a persistent connection to our servers. Configure your stream with a set of filtering rules, connect to the stream and new results will be delivered in real time, as soon as they are found by our crawlers. You will not need to do any polling to receive new data.
You set up and configure the Streaming API by defining rules (Boolean query, language, media types, etc.). The Streaming API then finds and collects all relevant data and adds it to your data stream, with individually highlighted snippets per matched rule. This feature allows you to gather data from many rules through a single stream while easily matching the results back to your predefined rules.
Each rule allows filtering by title, content, author, language, URL, country, media type, and more parameters, using the same syntax as in our Talkwalker Search interface. You can also apply a list of sources to be included or excluded from the stream, to give you even further possibilities to narrow down the results you will get. A single rule can support up to 50 operands. To create complex rules, operands may be combined using Boolean Operators.
The documents are streamed in the order they are found by our crawlers and added to Talkwalker (i.e. by search_indexed timestamp). Custom sorting is not possible with the Streaming API (it is, however, possible with the Search API). The documents are grouped in timeframes which contain all documents that were indexed between the given start and end time of the timeframe.
Each result (regardless of how many rules match) is counted as 1 credit.
A brief example (Streaming)
The Talkwalker API streaming endpoint (https://api.talkwalker.com/api/v3/stream) is used to stream results from Talkwalker.
Creating a Stream
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/teststream?access_token=demo' \
  -d '{ "rules" : [{ "rule_id": "rule-1", "query": "cats" }] }' \
  -H 'Content-Type: application/json; charset=UTF-8'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "PUT /api/v3/stream/s/teststream?access_token=demo",
"result_stream" : {
"data" : [{
"stream_id" : "teststream",
"rules" : [{
"rule_id" : "rule-1",
"query" : "cats"
}]
}]
}
}
Streaming
curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/teststream/results?access_token=demo'
The response is a stream of chunks; chunks contain either metadata on the Talkwalker stream (CT_CONTROL) or search results (CT_RESULT).
{
"chunk_type" : "CT_CONTROL",
"chunk_control" : {
"stream" : [ {
"id" : "teststream",
"status" : "active"
} ],
"connection_id" : "#piv2tmbz2yxu#"
}
}
{
"chunk_type": "CT_RESULT",
"chunk_result": {
"data" : {
"data" : {
"url" : "http://example.blogspot.com/cats",
"indexed" : 1417999367498,
"search_indexed" : 1417999504832,
"published" : 1417999319393,
"title" : "Something cats",
"content" : "Welcome to my colorful little island (...)",
"title_snippet" : "Color and Light Inspirations in Jewelry: SUNNY RINGS :)",
"root_url" : "http://example.blogspot.com/",
"domain_url" : "http://blogspot.com/",
"host_url" : "http://example.blogspot.com/",
"parent_url" : "http://example.blogspot.com/2014/12/sunny-rings.html",
"lang" : "en",
... // truncated
}
}
}
}
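A consumer therefore has to branch on chunk_type when reading the stream. A minimal dispatcher sketch (handle_chunk is a hypothetical helper, assuming each chunk arrives as one complete JSON document):

```python
import json

def handle_chunk(raw_chunk):
    """Dispatch one decoded stream chunk by its chunk_type."""
    chunk = json.loads(raw_chunk)
    if chunk["chunk_type"] == "CT_CONTROL":
        # Stream metadata: connection id, per-stream status, ...
        return ("control", chunk["chunk_control"].get("connection_id"))
    if chunk["chunk_type"] == "CT_RESULT":
        # An actual matched document.
        doc = chunk["chunk_result"]["data"]["data"]
        return ("result", doc["url"])
    return ("unknown", None)
```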
more on the Talkwalker Streaming API
API Account
Access Token
Demo
To try the Talkwalker API, you can use the access token demo (access_token=demo).
With this token you can try the Search API (results and histogram) and the Streaming API.
The demo token allows only one of the following pre-defined queries to be used: cats, dogs and cats AND dogs.
Accessing the Talkwalker API with this token will not return any social media results; only results from blogs, forums and news are returned.
This token can be used for testing only.
Your own Access Token
To use the Talkwalker API with the topics from your Talkwalker project, or to get results from social media, you need to apply for and obtain your own access tokens.
- read_only access tokens are necessary for search.
- read_write access tokens are necessary for search, for updating and deleting documents in a project, and for creating streams, deleting streams, setting panels and setting rules.
To get an access token please contact us.
Credits / Pricing
Monthly Reset of Credits
Credits are reset every month, on the day of the subscription, at 03:00 UTC. (Note that the monthly new results in Talkwalker projects are reset on the first day of each month at 00:00 UTC.)
Remaining Credits Endpoint
The endpoint https://api.talkwalker.com/api/v1/status/credits is used to get an overview of consumed credits and API calls.
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/status/credits?access_token=demo",
"result_creditinfo" : {
"used_credits_monthly" : 0,
"used_credits_onetime" : 0,
"remaining_credits_monthly" : 0,
"remaining_credits_onetime" : 0,
"next_billing_period" : 1419634800000,
"estimate_credits_used_until_end_of_billing_period" : 0,
"monthly_total" : 0
}
}
Talkwalker Search API
Talkwalker Search Results API
https://api.talkwalker.com/api/v1/search/results
How it works
The Talkwalker Search API allows you to retrieve up to 500 sorted results for a given timeframe within the last 30 days. In addition, a histogram of the number of results can also be returned. You can sort the results by publication time, indexing time, engagement or other metrics. A single search query can support up to 50 operands. To create complex queries, operands may be combined using Boolean operators.
A few words about the results
Search results can be sorted by engagement, time or other metrics and be restricted to specific attribute value ranges (for example only return results published in a certain timerange). When no special filters are applied, a single search request will return results from all media types and all languages over the past 30 days sorted by engagement by default. You don’t need to execute one search request for each language and media type separately. To get a smaller set of results, you can either get only the highest ranked results or get a random sample set.
Parameters
parameter | description | required? | default value
---|---|---|---
access_token | API access token | required |
q | The query to search for | required |
offset | Number of results to skip (for paging) | optional | default: 0 / maximum: 500
hpp | Number of hits per page (for paging) | optional | default: 10 / maximum: 500
sort_by | Criteria for sorting the results | optional | default: engagement
sort_order | Sorting order (ascending or descending) | optional | default: desc
highlight | Turns highlighting on or off | optional | default: true
pretty | Formatted JSON for testing | optional | default: false
Credits
1 credit per returned result, at least 10 credits per call (e.g. 100 results = 100 credits, 10 results = 10 credits and 0 results = 10 credits).
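For budgeting, the pricing rule above can be encoded in a one-line helper (a sketch; search_call_cost is not part of the API):

```python
def search_call_cost(returned_results: int) -> int:
    """Credits charged for one Search API call:
    1 credit per returned result, with a 10-credit minimum."""
    return max(10, returned_results)
```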
Examples
Get 100 results containing the words "cats" and "dogs"
Set the query cats AND dogs with q=cats%20AND%20dogs (note: in URLs, spaces are replaced by %20) and set the number of hits per page to 100 with hpp=100.
curl 'https://api.talkwalker.com/api/v1/search/results?access_token=demo&q=cats%20AND%20dogs&hpp=100'
More on the Talkwalker Query Syntax
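Rather than percent-encoding query strings by hand, a URL library can do it. A sketch in Python, using the same parameter values as the example above (urlencode with quote_via=quote encodes spaces as %20 instead of +):

```python
from urllib.parse import urlencode, quote

params = {
    "access_token": "demo",
    "q": "cats AND dogs",  # spaces become %20 when quoted
    "hpp": 100,
}
url = ("https://api.talkwalker.com/api/v1/search/results?"
       + urlencode(params, quote_via=quote))
print(url)
```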
Get results containing the word "cats" sorted from new to old
To sort the results by date, set sort_by to published (to sort by the date of publication); to get the newest results first, set sort_order=desc.
curl 'https://api.talkwalker.com/api/v1/search/results?access_token=demo&q=cats&sort_by=published&sort_order=desc'
All options for sort_by are: reach, facebook_shares, facebook_likes, twitter_shares, twitter_retweets, twitter_followers, youtube_likes, youtube_dislikes, youtube_views, cluster_size, comment_count, published, search_indexed, trending_score.
You can find additional information on these document fields, except for trending_score, which is described next.
What is the meaning of Trending Score?
The Trending Score evaluates the acceleration of the engagement of a specific story over time. Our algorithms look at the speed at which the engagement is growing and assign each result a relative score out of 10, so mentions that are rapidly gaining engagement receive a higher score, enabling you to discover breaking stories on specific topics. The scores are categorised in the following way:
- 0-3: the article is not trending (or not trending anymore)
- 4-6: the article is slightly trending
- 7-10: the article is really trending, right now
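These buckets translate directly into code; a sketch (trending_category is a hypothetical helper, not part of the API):

```python
def trending_category(score: int) -> str:
    """Map a Trending Score (0-10) onto the documented buckets."""
    if not 0 <= score <= 10:
        raise ValueError("trending_score is between 0 and 10")
    if score <= 3:
        return "not trending"
    if score <= 6:
        return "slightly trending"
    return "really trending"
```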
Get results containing the word "cats" sorted by their Trending Score
curl 'https://api.talkwalker.com/api/v1/search/results?access_token=demo&q=cats&sort_by=trending_score&sort_order=desc&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/results?access_token=demo&q=cats&sort_by=trending_score&sort_order=desc&pretty=true",
"pagination" : {
"next" : "GET /api/v1/search/results?access_token=demo&q=cats&sort_by=trending_score&sort_order=desc&pretty=true&offset=10",
"total" : 5169304
},
"result_content" : {
"data" : [ {
"data" : { ...
//truncated
"word_count" : 421,
"trending_score" : 6
} ...
// truncated
} ]
}
}
Talkwalker Search Histogram API
https://api.talkwalker.com/api/v1/search/histogram/<type>
How it works
With the Talkwalker Search Histogram API, you can retrieve the distribution of the number of search results for a given search query.
Histograms can be made for distribution over time or over specific metrics (number of comments, number of shares, reach, retweets etc.).
By setting min and max, a histogram can be limited to a specific range (min_include and max_include control whether those bounds are included).
This can be a time range for published/search_indexed, or an upper and lower cap for e.g. engagement histograms.
interval defines the width of the bins; the accepted values are long integers for metrics, or duration values (like 7d for 7 days) for published and search_indexed dates.
When using a bin size of entire days, timezone allows you to set a timezone that specifies the beginning and end of each day.
The 30-day limitation on global data also holds for histograms.
Types
type | Description | Representation
---|---|---
published | Timestamp of publication (epoch time in milliseconds) | Histogram
search_indexed | Timestamp of indexation in Talkwalker (epoch time in milliseconds) | Histogram
reach | The reach of an article/post represents the number of people who were reached by this article/post. | Histogram
engagement | The engagement of an article/post is the sum of actions made by others on that article/post. | Histogram
facebook_shares | Number of Facebook shares an article has | Histogram
facebook_likes | Number of Facebook likes an article has | Histogram
twitter_retweets | Number of Twitter retweets an article has | Histogram
twitter_shares | Number of Twitter shares an article has | Histogram
twitter_likes | Number of Twitter likes an article has | Histogram
twitter_followers | Number of Twitter followers a source has | Histogram
instagram_likes | Number of Instagram likes an article has | Histogram
youtube_views | Number of YouTube views a video has | Histogram
youtube_likes | Number of YouTube likes a video has | Histogram
youtube_dislikes | Number of YouTube dislikes a video has | Histogram
comment_count | Number of comments an article has | Histogram
language | Number of documents written in a language | Top-N Distribution
source_country | Number of documents with a source from a certain country | Top-N Distribution
source_region | Number of documents with a source from a certain region, depends on geolocation resolution | Top-N Distribution
source_city | Number of documents with a source from a certain city, depends on geolocation resolution | Top-N Distribution
gender | Number of documents written by an author of a particular gender | Top-N Distribution
age | Number of documents written by an author in a predefined age group | Distribution
unique_author | Total number of different authors | Distribution
hashtag | Number of documents containing a particular hashtag | Top-N Distribution
emoji | Number of documents containing a particular emoji code | Top-N Distribution
theme_cloud | Percent of documents containing a particular word or hashtag | Top-N Distribution
interest | Number of documents within a particular interest group | Top-N Distribution
occupation | Number of documents within a particular occupation group | Top-N Distribution
sentiment | Number of documents with a particular sentiment | Distribution
Parameters
parameter | description | required? | allowed values | default value
---|---|---|---|---
access_token | A read/write token specified in the API application | required | |
q | The query to search for | required | Talkwalker query syntax |
min | Minimum value for bins | optional | Long integer value |
max | Maximum value for bins | optional | Long integer value |
min_include | Include min value | optional | true / false | default: true
max_include | Include max value | optional | true / false | default: false
interval | Bin interval | optional | Duration for published/search_indexed, long integer for metrics |
timezone | Timezone (for interval) | optional | tz database timezone name (e.g. Asia/Tokyo) |
breakdown | Nested histogram | optional | sentiment, sourcetype, country |
value_type | Nested metric for time-based histograms | optional | Metric histogram types |
top_n | Size limiter for demographic distribution | optional | Integer value in ]0, 100] | default: 10
Possible values for interval when creating a histogram over published or search_indexed are: year, quarter, month, week, day, hour, minute, second, as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds) (e.g. 5d for 5 days or 2w for 2 weeks).
The maximum number of histogram bins is 400; if the min, max and interval parameters result in a larger number of bins, an error message (HTTP 400) is returned. Try reducing the range or increasing the interval.
value_type allows specifying a type for nested statistics per bin in a histogram over published or search_indexed.
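The 400-bin cap can be checked client-side before issuing a request. A sketch assuming millisecond timestamps and the numeric-unit intervals listed above (keyword intervals like day are not handled here; bin_count is a hypothetical helper):

```python
MS_PER_UNIT = {
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
}

def bin_count(min_ms: int, max_ms: int, interval: str) -> int:
    """Number of bins for a time histogram, e.g. interval='6h'."""
    amount, unit = int(interval[:-1]), interval[-1]
    width = amount * MS_PER_UNIT[unit]
    # Round up: a partially covered bin still counts.
    return -(-(max_ms - min_ms) // width)
```

For example, the 4-day window used later in this document (min=1601510400000, max=1601856000000) yields 4 one-day bins, well under the limit.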
Since some parameters are only used by certain histogram types, the following table provides an overview of all working combinations.
type | access_token, q | min, max, min_include, max_include, interval | timezone | breakdown, value_type | top_n
---|---|---|---|---|---
published, search_indexed | yes | yes | yes | yes |
metric histograms (engagement, reach, facebook_shares, ...) | yes | yes | | |
Top-N distributions (language, source_country, theme_cloud, ...) | yes | | | | yes
distributions (sentiment, age, unique_author) | yes | | | |
Histogram Examples
Get a histogram over the last 8 days of results containing the word "cats" for Australian time
Set the query to cats. For type published, the Talkwalker Search Histogram API returns results over the last seven days by default.
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&timezone=Australia/Perth&q=cats&interval=day&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram?access_token=demo&q=cats&interval=day",
"result_histogram" : {
"header" : {
"v" : [ "Number Results" ]
},
"data" : [ {
"t" : 1417478400000,
"v" : [ 4366.0 ]
}, {
"t" : 1417564800000,
"v" : [ 3385.0 ]
}, {
"t" : 1417651200000,
"v" : [ 4233.0 ]
}, {
"t" : 1417737600000,
"v" : [ 4071.0 ]
}, {
"t" : 1417824000000,
"v" : [ 2571.0 ]
}, {
"t" : 1417910400000,
"v" : [ 2191.0 ]
}, {
"t" : 1417996800000,
"v" : [ 3275.0 ]
}, {
"t" : 1418083200000,
"v" : [ 1140.0 ]
} ]
}
}
t indicates the time-based lower bound of the current bucket, while v is the number of elements inside that bucket.
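Reading such a histogram back is straightforward; for instance, totalling the counts over all buckets (total_results is a hypothetical helper, shown here on the first few buckets of the response above):

```python
def total_results(histogram: dict) -> float:
    """Sum the first value series over all buckets of a time histogram."""
    return sum(bucket["v"][0] for bucket in histogram["data"])

week = {
    "data": [
        {"t": 1417478400000, "v": [4366.0]},
        {"t": 1417564800000, "v": [3385.0]},
        {"t": 1417651200000, "v": [4233.0]},
    ]
}
```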
Get a histogram with a resolution of 6 hours over the last 7 days of results containing the word "cats"
Set interval to 6h for 4 values per day.
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&interval=6h'
The interval parameter accepts the values year, quarter, month, week, day, hour, minute, second, as well as numeric values with the units w (week), d (day), h (hours), m (minutes), and s (seconds).
Get a histogram over a specific time window
Due to the 30-day limitation on global data, please replace the timestamps in the following example with recent values.
Set min to 1601510400000 and max to 1601856000000 to get a histogram of results published between 01.10.2020 and 05.10.2020, with the start timestamp included and the end timestamp excluded (default values).
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000",
"result_histogram": {
"header": {
"v": [
"Number Results"
]
},
"data": [
{
"t": 1601510400000,
"v": [
19123.0
]
},
{
"t": 1601596800000,
"v": [
18855.0
]
},
{
"t": 1601683200000,
"v": [
20678.0
]
},
{
"t": 1601769600000,
"v": [
14820.0
]
}
]
}
}
The min and max parameters accept timestamps in epoch format (milliseconds after 1.1.1970 UTC).
Special attention needs to be paid when working with the timezone parameter.
In the above example, we get one result value for each started day in the respective timezone, amounting to a total of 4 values.
We repeat the call from before, but this time we set timezone to Asia/Tokyo (UTC+9).
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/published?access_token=demo&q=cats&min=1601510400000&max=1601856000000&timezone=Asia%2FTokyo",
"result_histogram": {
"header": {
"v": [
"Number Results"
]
},
"data": [
{
"t": 1601478000000,
"v": [
10329.0
]
},
{
"t": 1601564400000,
"v": [
19244.0
]
},
{
"t": 1601650800000,
"v": [
22390.0
]
},
{
"t": 1601737200000,
"v": [
15045.0
]
},
{
"t": 1601823600000,
"v": [
6468.0
]
}
]
}
}
This time, we get 5 result values.
This is due to min and max, which resolve to different times.
In the first example, min and max resolve to 01.10.2020 00:00:00 UTC and 05.10.2020 00:00:00 UTC respectively.
In this second example, they resolve to 01.10.2020 09:00:00 JST and 05.10.2020 09:00:00 JST respectively.
In other words, we get 15 instead of 24 hours' worth of data for 01.10.2020, and we get 9 hours' worth of data for 05.10.2020.
When changing timezone, min and max need to be adjusted accordingly.
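One way to keep min and max aligned with the chosen timezone is to compute them from local calendar dates. A sketch using Python's zoneinfo (day_start_ms is a hypothetical helper); note how Asia/Tokyo places local midnight of 01.10.2020 nine hours earlier on the epoch axis than UTC does, matching the first bucket in the response above:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def day_start_ms(year: int, month: int, day: int, tz: str) -> int:
    """Epoch milliseconds for local midnight of the given date."""
    local = datetime(year, month, day, tzinfo=ZoneInfo(tz))
    return int(local.timestamp() * 1000)
```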
Get a histogram and statistics over engagement
For types other than published and search_indexed, the Histogram API also returns statistics (average, minimum, maximum and sum) over every bin.
(For published and search_indexed, an additional metric for statistics can be specified with the value_type parameter.)
We can use the min and max parameters to only consider documents whose engagement lies within a specific range.
curl 'https://api.talkwalker.com/api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/engagement?access_token=demo&q=cats&min=1000&max=2000",
"result_histogram": {
"data": [
{
"v": [
33.0
],
"k": 1000.0,
"val": [
{
"count": 33,
"min": 1002.0,
"max": 1097.0,
"avg": 1051.121212121212,
"sum": 34687.0
}
]
}, (...) {
"v": [
14.0
],
"k": 1900.0,
"val": [
{
"count": 14,
"min": 1906.0,
"max": 1992.0,
"avg": 1944.9285714285713,
"sum": 27229.0
}
]
}
]
}
}
k indicates the number-based lower bound of the current bucket, while v is the number of elements inside that bucket.
val contains additional information about the elements of that bucket.
Get a histogram with a breakdown over sentiment
For time-based histograms (type published or search_indexed), it is possible to add a breakdown parameter, set to either sentiment, sourcetype or country.
The header then contains the different values for the chosen breakdown type, while the data field v lists, in the same order, the number of matching elements from a bucket.
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&breakdown=sentiment&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 3944.0, 10488.0, 1732.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 922.0, 2814.0, 573.0 ]
} ]
}
}
Top N distribution examples
N can be specified using the top_n parameter; the default value is 10.
The output for Top N distributions is very similar to the histogram output; the main differences are:
* The total hit number is contained in the header
* The key is stored in the ks field
Get the Top 3 languages
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/language?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 550242.0 ],
"ks" : "en"
}, {
"v" : [ 6918.0 ],
"ks" : "de"
}, {
"v" : [ 1882.0 ],
"ks" : "ja"
}, {
"v" : [ 1711.0 ],
"ks" : "fr"
} ],
"total_hits" : 570382
},
"request_id" : "#qx0h3vup7r8d#"
}
ks serves as the string-based indicator of a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
Get the Top 3 themes
The top N themes differ from other Top N distributions in that they are calculated based on a sample of the total documents. Thus, the result does not contain the number of documents from the sample that include a certain token, but the percentage.
curl 'https://api.talkwalker.com/api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/theme_cloud?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 0.996 ],
"ks" : "cats"
}, {
"v" : [ 0.265 ],
"ks" : "cat"
}, {
"v" : [ 0.193 ],
"ks" : "dogs"
} ],
"total_hits" : 570399
},
"request_id" : "#qx0h7rc0i29n#"
}
ks serves as the string-based indicator of a bucket, while v represents the percentage of elements inside the results.
The number of total results is provided in total_hits.
Get the Top 3 emoji
curl 'https://api.talkwalker.com/api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v1/search/histogram/emoji?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram": {
"data": [
{
"v": [ 5289.0 ],
"ks": "❤"
},
{
"v": [ 3349.0 ],
"ks": "😂"
},
{
"v": [ 2428.0 ],
"ks": "🐱"
}
],
"total_hits": 570083
},
"request_id": "#qx0br7wh57pp#"
}
ks serves as the string-based indicator of a bucket (encoded as in Java source code), while v represents the number of elements inside.
The number of total results is provided in total_hits.
Get the Top 3 regions
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/source_region?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 2006.0 ],
"key" : {
"country_code" : "us",
"region" : "California",
"short_id" : "california_us"
}
}, {
"v" : [ 1599.0 ],
"key" : {
"country_code" : "ph",
"region" : "Isabela (province)",
"short_id" : "isabela_ph"
}
}, {
"v" : [ 786.0 ],
"key" : {
"country_code" : "us",
"region" : "New York",
"short_id" : "newyork_us"
}
} ],
"total_hits" : 17914
},
"request_id" : "#qyc8lsx8lof6#"
}
The key object (here identified by region) indicates a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
Get the Top 3 cities
curl 'https://api.talkwalker.com/api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/source_city?access_token=demo&q=cats&top_n=3&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 1598.0 ],
"key" : {
"country_code" : "ph",
"region" : "Isabela (province)",
"city" : "Jones, Isabela",
"short_id" : "jones_isabela_ph"
}
}, {
"v" : [ 948.0 ],
"key" : {
"country_code" : "us",
"region" : "California",
"city" : "San Francisco",
"short_id" : "sanfrancisco_california_us"
}
}, {
"v" : [ 737.0 ],
"key" : {
"country_code" : "us",
"region" : "South Dakota",
"city" : "Sioux Falls, South Dakota",
"short_id" : "siouxfalls_southdakota_us"
}
} ],
"total_hits" : 12971
},
"request_id" : "#qyc8uo6qqxiz#"
}
The key object (here identified by city) indicates a bucket, while v represents the number of elements inside.
The number of total results is provided in total_hits.
Distribution examples
In contrast to the Top N distributions, no top_n parameter is required. Other than that, both share the same structure:
- The total hit number is contained in the header
- The key is stored in the ks field
Get the sentiment distribution
curl 'https://api.talkwalker.com/api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/sentiment?access_token=demo&q=cats&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 327698.0 ],
"ks" : "NEUTRAL"
}, {
"v" : [ 149214.0 ],
"ks" : "POSITIVE"
}, {
"v" : [ 93485.0 ],
"ks" : "NEGATIVE"
}, {
"v" : [ 0.0 ],
"ks" : "NONE"
} ],
"total_hits" : 570397
},
"request_id" : "#qx0hczxspbq8#"
}
Get the number of unique authors
While the number of unique authors is a single value, it shares the same structure as distributions, hence the categorisation.
curl 'https://api.talkwalker.com/api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/unique_author?access_token=demo&q=cats&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 306021.0 ],
"ks" : "value"
} ],
"total_hits" : 570392
},
"request_id" : "#qx0hf66t2vwz#"
}
Multiple query examples
It is possible to enter multiple queries when working with histograms.
All histogram types are compatible with multiple queries.
The result structure is similar to that of a histogram with a breakdown parameter: query contains all entered queries, while v (and in some cases val) contains the respective results in the same order.
Simple example
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=cats&q=dogs&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "Number Results" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 123.0, 456.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 789.0, 1011.0 ]
} ],
"query" : [ "cats", "dogs" ]
}
}
Top N example
Top N histogram types like language can also be used with multiple queries.
The result contains all top N values for the different queries, which may be more than N in total, as shown in the example below.
curl 'https://api.talkwalker.com/api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&top_n=3'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/language?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"data" : [ {
"v" : [ 377.0, 483.0 ],
"ks" : "en"
}, {
"v" : [ 215.0, 251.0 ],
"ks" : "ja"
}, {
"v" : [ -1.0, 123.0 ],
"ks" : "ar"
}, {
"v" : [ 87.0, -1.0 ],
"ks" : "es"
} ],
"total_hits" : 1700,
"total_query_hits" : [ 721, 1234 ],
"query" : [ "dogs", "cats" ]
}
}
Searching for the top 3 languages for each query, the third place differs between the two queries, so the result contains 4 keys in total.
A -1.0 marker is placed where one query does not have a value belonging to its top N.
The results are sorted by the sum of the values, not taking the -1.0 markers into consideration.
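This sorting rule can be reproduced from the response data; a sketch using the bucket values from the example above (topn_sort_key is a hypothetical helper):

```python
def topn_sort_key(bucket: dict) -> float:
    """Sum of per-query values, skipping the -1.0 'not in top N' markers."""
    return sum(v for v in bucket["v"] if v != -1.0)

buckets = [
    {"v": [87.0, -1.0], "ks": "es"},
    {"v": [377.0, 483.0], "ks": "en"},
    {"v": [-1.0, 123.0], "ks": "ar"},
    {"v": [215.0, 251.0], "ks": "ja"},
]
ranked = sorted(buckets, key=topn_sort_key, reverse=True)
```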
Multiple queries + breakdown example
curl 'https://api.talkwalker.com/api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&breakdown=sentiment'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v1/search/histogram/published?access_token=demo&q=dogs&q=cats&pretty=true",
"result_histogram" : {
"header" : {
"v" : [ "POSITIVE", "NEUTRAL", "NEGATIVE" ]
},
"data" : [ {
"t" : 1577923200000,
"v" : [ 41.0, 38.0, 44.0, 152.0, 140.0, 164.0 ]
} ...
// truncated
{
"t" : 1578528000000,
"v" : [ 200.0, 250.0, 339.0, 333.0, 350.0, 328.0 ]
} ],
"query" : [ "dogs", "cats" ]
}
}
The content of v is ordered by query first and breakdown second.
For two queries q1, q2 and a breakdown over sentiment, this results in the following ordering:
[ <q1, POSITIVE>, <q1, NEUTRAL>, <q1, NEGATIVE>, <q2, POSITIVE>, <q2, NEUTRAL>, <q2, NEGATIVE> ]
As an illustration, compare the values for t
= 1577923200000 in the table below with the example above.
 | Dogs | Cats |
---|---|---|
Positive | 41.0 | 152.0 |
Neutral | 38.0 | 140.0 |
Negative | 44.0 | 164.0 |
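In client code, the flat v array can be reshaped per query using this ordering; a small Python sketch (the helper name is ours):

```python
# Split the flat "v" array into one mapping per query, following the
# "query first, breakdown second" ordering described above.
def split_by_query(v, queries, header):
    size = len(header)
    return {
        q: dict(zip(header, v[i * size:(i + 1) * size]))
        for i, q in enumerate(queries)
    }

header = ["POSITIVE", "NEUTRAL", "NEGATIVE"]
v = [41.0, 38.0, 44.0, 152.0, 140.0, 164.0]
out = split_by_query(v, ["dogs", "cats"], header)
print(out["cats"]["NEGATIVE"])  # 164.0
```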
Talkwalker Search API and Talkwalker Projects
https://api.talkwalker.com/api/v1/search/p/<project_id>/results?access_token=<access_token>
How it works
Talkwalker users can use the topics defined in their project with the Talkwalker API.
Topics can be used with the Search Results API by setting the parameter topic to one or more topic IDs.
This allows Talkwalker users to use the queries from their projects and to retrieve the documents from their Talkwalker project, including changes and tags made in Talkwalker.
In addition to the 30 days of search, the full history of a Talkwalker project is available in the Search API when used in combination with that project.
Talkwalker users can also retrieve the datasets defined in their Customer Intelligence project (also called VoC project) using the parameter dataset. A dataset is a set of documents disjoint from topic and channel documents, imported in a customized data format. Datasets older than 7 days can no longer be queried.
Parameters
Same as the Search API parameters, with the additional parameters topic, filter, channel, panel and dataset:
parameter | description | required? | default value |
---|---|---|---|
access_token | API access token | required | |
q | The query to search for | required | |
offset | Number of results to skip (for paging) | optional | default: 0 / maximum: see below this table |
hpp | Number of hits per page (for paging) | optional | default: 10 / maximum: 500 |
 | Criteria for sorting the results | optional | default: engagement |
 | Sorting order (ascending or descending) | optional | default: desc |
 | Turns highlighting on or off | optional | default: true |
pretty | Formatted JSON for testing | optional | false |
topic | One or more topics that are defined in the Talkwalker project | optional, multiple | |
filter | One or more filters that are defined in the Talkwalker project | optional, multiple | |
channel | One or more channels that are defined in the Talkwalker project | optional, multiple | |
panel | One or more source panels that are defined in the Talkwalker project | optional, multiple | |
dataset | One or more datasets that are defined in the Talkwalker Customer Intelligence project | optional, multiple | |
The maximum offset value depends on the hits per page: hpp + offset can’t be greater than 10,000.
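A paging loop therefore has to stop before hpp + offset exceeds 10,000, even when more results exist; a Python sketch of the page enumeration (the helper name is ours):

```python
# Enumerate (offset, hpp) pairs for paging, honouring the constraint
# that hpp + offset must not exceed 10,000.
def pages(total, hpp=500, limit=10_000):
    offset = 0
    while offset < total and offset + hpp <= limit:
        yield offset, hpp
        offset += hpp

# With 298,138 total results, only the first 10,000 are reachable:
print(len(list(pages(total=298138, hpp=500))))  # 20
```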
Limit the results by topic
In order to only include results that match either topic_a or topic_b when calling the search endpoint, we can add these topics as parameters.
https://api.talkwalker.com/api/v1/search/p/<project_id>/results?access_token=<access_token>&topic=topic_a&topic=topic_b
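Repeated parameters like topic are encoded by passing the query string as a list of pairs; a Python sketch using the standard library (the demo token stands in for a real one):

```python
from urllib.parse import urlencode

# Build the request URL with the topic parameter repeated once per topic.
base = "https://api.talkwalker.com/api/v1/search/p/<project_id>/results?"
params = [("access_token", "demo"), ("topic", "topic_a"), ("topic", "topic_b")]
url = base + urlencode(params)
print(url)
```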
Get a list of all projects linked to an API application
Use the private access_token from your API application on the https://api.talkwalker.com/api/v1/search/info endpoint to get the list of all linked projects.
More on Talkwalker Resources API to retrieve the resources configured in a project.
curl 'https://api.talkwalker.com/api/v1/search/info?access_token=<access_token>'
Talkwalker Search Histogram API and Talkwalker Projects
https://api.talkwalker.com/api/v1/search/p/<project_id>/histogram/<type>
How it works
Talkwalker users can use the topics defined in their project with the Talkwalker API.
The Project Search Histogram API can be used with the same parameters
and types
as the Search Histogram API.
Additionally, in order to query a specific topic of a Talkwalker project, the parameter topic can be set to one or more topic IDs.
Talkwalker users can also retrieve the datasets defined in their Customer Intelligence project using the parameter dataset. Datasets older than 7 days can no longer be queried.
The same types as the Talkwalker Search Histogram API types are supported, with the same usage; see Histogram Examples.
Parameters
Same as the Histogram API parameters, with the additional parameters topic, filter, channel, panel and dataset:
parameter | description | required? | allowed values | default value |
---|---|---|---|---|
access_token | a read/write token specified in the API application | required | | |
q | The query to search for | required | Talkwalker query syntax | |
 | Minimum value for bins | optional | Long integer value | |
 | Maximum value for bins | optional | Long integer value | |
 | Include min value | optional | | |
 | Include max value | optional | | |
 | Bin interval | optional | Duration | |
 | Timezone (for interval) | optional | tz database timezone name | |
breakdown | Nested histogram | optional | | |
 | Nested metric for time-based histograms | optional | metric histogram types | |
top_n | Size limiter for demographic distribution | optional | Integer value in [0, 100) | |
topic | One or more topics that are defined in the Talkwalker project | optional, multiple | | |
filter | One or more filters that are defined in the Talkwalker project | optional, multiple | | |
channel | One or more channels that are defined in the Talkwalker project | optional, multiple | | |
panel | One or more source panels that are defined in the Talkwalker project | optional, multiple | | |
Talkwalker Streaming API
Overview
In general, the results obtained through the Talkwalker Streaming API depend on three factors: the dataset, the connection and the rules.
There are two datasets, which can be accessed:
- Global data: Contains publicly available data
- Project data: Consists of monitored and uploaded private data, which is unavailable outside of the project
The connection to the Streaming API can also be established in two ways:
- Volatile: Data is only matched while the client is connected. Disconnecting results in missing data.
→ Use case: Data is immediately consumed (e.g. a live dashboard)
- Persisted: By setting up a collector instead of a direct connection, the results are buffered for 7 days. Data is available in real time, and contrary to volatile data, a client can disconnect and reconnect without losing any data.
→ Use case: Handling a large amount of data and having a complete dataset. Generally the recommended option.
In the following sections, we present three ways of accessing data:
- Through a stream: Volatile, global data
- Through a project: Volatile, project and global data
- Through a collector: Persisted, global data (and project data depending on setup)
A special use case of collectors is accessing past data, which is presented separately in the final section.
Introduction
The following table shows parameters common to all endpoints that will be used in the Talkwalker Streaming API documentation.
parameter | description | required? | default value |
---|---|---|---|
access_token | a read/write token specified in the API application | required | |
pretty | receive a prettified JSON response if set to true | optional | false |
Stream IDs, rule IDs, collector IDs, etc. used in these endpoints can only contain lowercase letters, numbers and the characters - and _. They have to start with a lowercase letter.
There are three types of chunks in the response of a request: CT_RESULT (containing the result), CT_CONTROL (containing information about the connection and possibly about the next result chunk) and CT_ERROR (for a possible error message).
Credits
Each new result (regardless of how many rules match) is counted as 1 credit. Updated documents come at no cost. The field appearance under matched provides this information.
More on Result chunk
If no credits are left, the data access is stopped and a control chunk containing the resume_offset of the end (needed for resuming) is sent.
API calls which don’t return any results are not counted.
The documents are billed after every completed chunk.
If a stream gets disconnected, a non-completed chunk will not be billed, since it needs to be restarted when resuming.
If an export task gets interrupted, all exported results are counted. Repeating the task means exporting duplicates, which count both times as 1 credit.
Project data access
When accessing data through a Talkwalker project, either a GET or a POST HTTP request with several optional parameters can be used (see Table Optional parameters).
curl -XGET 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/results?access_token=<access_token>'
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/results?access_token=<access_token>'
-H 'Content-Type: application/json; charset=UTF-8'
-d '{"q": "<query_string>"}'
The POST request accepts a query in either its body or its URL.
The max_hits parameter is only accepted in the URL and stops the data access after the given number of results.
parameter | description |
---|---|
q | query to match |
max_hits | stops the data access after the given number of hits |
topic, panel | One or more topics or panels that are defined in the Talkwalker project |
dataset | One or more datasets that are defined in the Talkwalker Customer Intelligence project |
Limit the results by topic
In order to only include results that match either topic_a or topic_b when calling the stream endpoint, we can add these topics as parameters.
https://api.talkwalker.com/api/v3/stream/p/<project_id>/results?access_token=<access_token>&topic=topic_a&topic=topic_b
Result chunk
The result chunk for project data access has the form:
{
"chunk_type" : "CT_RESULT",
"chunk_result" : {
"data" : {
"data" : {
"url" : "<url>",
...
"matched" : {
"appearance" : "<appearance>"
},
...
}
}
}
}
The url is not provided in every situation, e.g. for Twitter results.
While new results count towards the max_hits parameter and require 1 credit each, it is also possible to obtain updated results. The field appearance under matched provides this information with the following values: UNKNOWN, NEW and UPDATED. Updated results come at no cost.
Similar to when credits run out, only the specified maximum number of results is billed when the max_hits parameter is set and that amount of results is reached before the end of a chunk. Still, the full chunk is delivered.
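A consumer can use the appearance field to track credit usage; a Python sketch over already-parsed chunks (field paths follow the example above; the helper name is ours, and how UNKNOWN results are billed is not specified here):

```python
# Count chunks that consume a credit: CT_RESULT chunks whose
# matched.appearance is "NEW". Control/error chunks and UPDATED
# results are free, as described above.
def billable(chunks):
    count = 0
    for chunk in chunks:
        if chunk.get("chunk_type") != "CT_RESULT":
            continue
        matched = chunk["chunk_result"]["data"]["data"]["matched"]
        if matched["appearance"] == "NEW":
            count += 1
    return count

chunks = [
    {"chunk_type": "CT_CONTROL"},
    {"chunk_type": "CT_RESULT",
     "chunk_result": {"data": {"data": {"matched": {"appearance": "NEW"}}}}},
    {"chunk_type": "CT_RESULT",
     "chunk_result": {"data": {"data": {"matched": {"appearance": "UPDATED"}}}}},
]
print(billable(chunks))  # 1
```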
Streams
Receiving results from a stream works in two steps: First, the stream is created, then the data is accessed.
A stream is a collection of rules; at least one rule must be provided for the stream to return results. The stream creation request should therefore contain at least one rule in its body.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>'
-d '{"rules" : [{"rule_id" : "<rule_id>", "query":"<query>"}]}'
-H "Content-Type: application/json; charset=UTF-8"
For the set of optional parameters for data access of a stream, consider Table Optional parameters.
curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>&q=<query>'
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>'
-d '<query_data>'
-H 'Content-Type: application/json; charset=UTF-8'
parameter | description |
---|---|
q | the query to search for |
max_hits | stops the stream after the given number of hits |
Each result consumes 1 credit, regardless of the number of matched rules.
When the parameter max_hits is set, only the specified maximum number of results will be billed, even if the entire timeframe gets streamed after reaching the limit.
Collectors
A collector allows users to define a hybrid collection of rules based on projects, streams and queries/filters; at least one of these needs to be defined.
All results matching the collector’s setup are buffered for 7 days on the server and consume 1 credit each, no matter how many times they are read. In other words, data can be downloaded multiple times without additional cost.
A special use case of collectors lies in the access of past data, which is presented separately in the next section.
In this section, we present how to download the results of a collector and provide a list of operations (including the definition of a collector) along with examples.
Downloading the results of a collector
The search results of a collector can be accessed via a GET HTTP request, which allows several optional parameters (see Table Optional parameters).
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>'
parameter | description | possible values | default value |
---|---|---|---|
resume_offset | position to resume the data access from; can be retrieved from control chunks | "earliest", "latest", <resume_token> | "earliest" |
 | what to do when the most recent result is reached | "stop", "wait" | "wait" |
Operations on the definition of a collector
The Talkwalker Streaming API allows users to create/replace, retrieve and delete a collector, using the endpoint https://api.talkwalker.com/api/v3/stream/c/<collector_id>.
Definition of a collector
The definition of a collector consists of a collector_query (with parameters such as streams, queries, project, project_topics and filters) and an optional state.
The collector_query must never include both project and project_topics; only one of the two is allowed. The collector_query supports only one project ID for both project and project_topics.
To create an active collector, at least one parameter is required in the collector_query (e.g. project, stream or project_topics).
An empty collector query can be used to create a paused collector for past data export tasks; see the section Creation of export tasks based on query parameter.
For all list parameters except filters (e.g. streams, queries, topics in project_topics), only one element needs to match (OR between the elements of the list of stream IDs, topic IDs, etc.).
All filters need to be matched (AND between different filter IDs).
If multiple parameters are provided (e.g. project_topics and filters), they must all be matched (AND between different parameters).
Examples
-
Definition of stream "stream-1"
{ "stream_id" : "stream-1", "rules" : [{ "rule_id": "<rule_id>", "query": "<query>" }] }
-
Definition of collector "collector-1"
{ "collector_query" : { "streams" : ["stream-1"] }
All documents which match the stream remain in the collector for 7 days.
{
"collector_query" : {
"streams" : ["stream-1"],
"queries" : [{
"id" : "<q1>",
"query" : "<query>"
}]
}
}
This collector collects all documents that match "stream-1" AND q1.
{
"collector_query" : {
"project" : "<p1>",
"queries" : [{
"id" : "<q1>",
"query" : "<query_1>"
}, {
"id" : "<q2>",
"query" : "<query_2>"
}]
}
}
This collector collects all documents that match p1 AND (q1 OR q2).
{
"collector_query" : {
"queries" : [{
"id" : "<q1>",
"query" : "<query_1>"
}, {
"id" : "<q2>",
"query" : "<query_2>"
}],
"project_topics": {
"project": "<p1>",
"topics": ["<t1>", "<t2>"]
},
"filters" : ["<f1>", "<f2>"]
}
}
This collector collects all documents that match (q1 OR q2) AND (t1 OR t2) AND f1 AND f2.
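These AND/OR rules can be sketched as a small local evaluator (plain Python; the document sets and the helper name are illustrative, not part of the API):

```python
# OR between the elements of each list parameter, AND between different
# parameters; filters are the exception: every filter must match.
def matches(collector_query, doc):
    for param, wanted in collector_query.items():
        have = doc.get(param, set())
        if param == "filters":
            if not all(f in have for f in wanted):
                return False
        elif not any(w in have for w in wanted):
            return False
    return True

query = {"queries": {"q1", "q2"}, "topics": {"t1", "t2"}, "filters": {"f1", "f2"}}
doc = {"queries": {"q2"}, "topics": {"t1"}, "filters": {"f1", "f2"}}
print(matches(query, doc))  # True
```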
Create/update a collector
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '<collector_definition>'
-H 'Content-Type: application/json; charset=UTF-8'
For a <collector_definition>, the field state should not be set (it is set to ACTIVE automatically), and at least a project, a stream or a query must be set in the field collector_query.
A collector can include only one project but multiple queries and streams. The number of allowed queries and streams is not limited.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true'
-d '{"collector_query" : {"streams" : ["stream-1"], "queries" : [{"id" : "q-1", "query" : "lang:en"}]}}'
-H 'Content-Type: application/json; charset=UTF-8'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "PUT /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream" : {
"collectors": [ {
"state": "ACTIVE",
"collector_id": "collector-1"
} ]
}
}
Retrieve the definition of a collector
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>&pretty=true'
In the response, the state of the collector is included, which can assume the following values: UNKNOWN, ACTIVE, ERROR, DELETED, PAUSED, NO_CREDITS.
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream" : {
"collectors" : [{
"collector_id" : "collector-1",
"state" : "ACTIVE",
"query" : {
"streams" : ["stream-1"],
"queries" : [{
"id" : "q-1",
"query" : "lang:en"
}]
}
}]
}
}
Delete a collector
Deleting a collector permanently removes it and its content. A new collector with the same name can be created, but it will not include the old collector’s results. In contrast, updating a collector with a new query without deleting it keeps the old data.
curl -XDELETE 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "DELETE /api/v3/stream/c/collector-1?access_token=<access_token>&pretty=true",
"result_stream" : {
"collectors" : [ {
"collector_id" : "collector-1",
"state" : "DELETED"
} ]
}
}
Pause a collector
When calling this endpoint, a collector’s state changes to "PAUSED". A collector does not collect any real-time data while it is paused. When resuming a paused collector, all previously collected data is still included. A paused collector that is chosen as target for an export task still receives all exported data.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/pause?access_token=<access_token>&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "POST /api/v3/stream/c/collector-1/pause?access_token=<access_token>&pretty=true",
"result_stream" : {
"collectors" : [ {
"collector_id" : "collector-1",
"state" : "PAUSED"
} ]
}
}
Resume a collector
Resuming a collector shifts its state from "PAUSED" to "ACTIVE". All incoming data from the point of resuming the collector onwards is stored again.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/resume?access_token=<access_token>&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "POST /api/v3/stream/c/collector-1/resume?access_token=<access_token>&pretty=true",
"result_stream" : {
"collectors" : [ {
"collector_id" : "collector-1",
"state" : "ACTIVE"
} ]
}
}
Retrieve the information of all streams and collectors
curl -XGET 'https://api.talkwalker.com/api/v3/stream/info?access_token=<access_token>&pretty=true'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/stream/info?access_token=<access_token>&pretty=true",
"result_stream" : {
"streams" : [ {
"stream_id" : "stream-1",
"enabled" : true
} ],
"collectors" : [ {
"collector_id" : "collector-1",
"state" : "ACTIVE"
} ]
}
}
Past data export through collectors
Exports allow you to create asynchronous tasks that copy a selection of past data into a collector, through which the data can then be accessed.
There are three options to define the source of past data:
- existing stream definitions (via a stream ID)
- Talkwalker projects (via a project ID)
- specifying a query in the request body
An ongoing task can be checked or aborted by using its task ID, which is included in the response.
There are 3 POST endpoints, presented in the following subsections, which can execute an export task. These 3 endpoints share the following parameters, set inside the body:
parameter | description | required? | default |
---|---|---|---|
start | timestamp (milliseconds since 1.1.1970, e.g. 1539302400000) or date of the timeframe’s start (e.g. 2018-10-12) | required | |
stop | timestamp or date of the timeframe’s end | optional | |
target | ID of the target collector | required | |
query | the query to search for (conjunctive to existing queries, i.e. matching all) | optional | |
 | the maximum number of results to export before interrupting | optional | 1,000,000 |
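Since start/stop accept either form, converting a date to the expected millisecond timestamp is straightforward (Python sketch; the helper name is ours):

```python
from datetime import datetime, timezone

# Convert a YYYY-MM-DD date to milliseconds since 1.1.1970 (UTC).
def to_millis(date_string):
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_millis("2018-10-12"))  # 1539302400000
```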
Each exported result consumes 1 credit. Exporting the same result multiple times due to overlapping export tasks therefore requires multiple credits.
Creation of export tasks for Talkwalker projects
An export task for a Talkwalker project is started with a POST request to the /api/v3/stream/p/<project_id>/export
endpoint.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>
-d '{"start": "<date>", "stop":<timestamp>, "target":"<target>"}'
Furthermore, for Talkwalker project export tasks it is possible to narrow down the result set further: if not the complete project but only a selection of its topics should be matched, this can be specified using the topics parameter.
parameter | description | required? |
---|---|---|
topics | IDs of the topics taken into consideration | optional |
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>
-d '{"start": "2018-11-15", "stop":1545127673884, "target":"testcollector", "topics":["topic1_id","topic2_id"]}'
Tags can also be included or excluded by using them in the query parameter. In this case, the IDs of the tags should be provided:
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>
-d '{"start": "2018-11-15", "stop":1545127673884, "target":"testcollector", "topics":["topic1_id","topic2_id"], "query":"tag:tag_id"}'
It is also possible to retrieve the datasets that are defined in the Talkwalker Customer Intelligence project. This can be specified using the datasets parameter. Datasets older than 7 days can no longer be queried.
parameter | description | required? |
---|---|---|
datasets | IDs of the datasets defined in the Talkwalker Customer Intelligence project | optional |
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<CI_project_id>/export?access_token=<access_token>
-d '{"start": "2022-01-01", "stop":1641830679000, "target":"testcollector", "topics":["datasets1_id","datasets2_id"]}'
Creation of export tasks for existing streams
An export of data based on an existing stream definition is done, similarly to projects, by sending a POST request to the /api/v3/stream/s/<stream_id>/export endpoint.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/export?access_token=<access_token>
-d '{"start": "<date>", "target":<collector_id>}'
Creation of export tasks based on query parameter
With a third endpoint, it is possible to create an export task without providing a project or stream ID. This endpoint depends on the query parameter, which consequently becomes required instead of optional.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/export?access_token=<access_token>
-d '{"start": "<date>", "target":<collector_id>, "query":"<query>"}'
Example
In this example, we wish to export all data from September 2018.
We start by creating an empty collector.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token=<access_token>'
-d '{"collector_query" : {}}'
-H "Content-Type: application/json; charset=UTF-8"
The newly created collector is then used as target for the export task, where the time frame is limited to September 2018 using the start (as date) and stop (as timestamp, without quotes) parameters in the request body.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/s/stream-1/export?access_token=<access_token>
-d '{"start": "2018-01-09", "stop":1538352000000, "target":"collector-1"}'
{
"status_code" : "0",
"status_message" : "OK",
"request" : "POST /api/v3/stream/export?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"creation_date" : "2018-12-31T15:24:34.069Z",
"type" : "export",
"id" : "task-1",
"status" : "queued",
"processed" : 0,
"progress" : 0.0,
"target" : "collector-1"
}]
}
}
In the response, the state of the export task is included, which can assume the following values: UNKNOWN, QUEUED, RUNNING, FINISHED, FAILED, DELETED, ABORTED, RESULT_LIMIT_REACHED.
Best practice: If results for longer time periods are to be exported, it makes sense to split the export into multiple smaller tasks (e.g. one per month when exporting results for half a year). This allows for a better estimation of the credit cost and of the amount of results for the remaining time frame.
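Splitting a long timeframe into one task per calendar month can be sketched as follows (plain Python; the helper name is ours):

```python
from datetime import date

# Produce (start, stop) date-string pairs, one per calendar month.
def monthly_ranges(start, stop):
    ranges = []
    current = start
    while current < stop:
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        ranges.append((current.isoformat(), min(nxt, stop).isoformat()))
        current = nxt
    return ranges

# Half a year -> six export tasks:
print(monthly_ranges(date(2018, 1, 1), date(2018, 7, 1)))
```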
Status of an export
Using the task ID, which can be obtained from the response when creating a new task, the status of an export can be accessed.
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/tasks/export/task-1?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"creation_date" : "2018-03-21T08:23:00.335Z",
"type" : "export",
"id" : task-1,
"status" : "finished",
"processed" : 3,
"progress" : 1.0,
"target" : "coll-01"
}]
}
}
The same request will return the list of all recent tasks if the task ID is omitted.
curl -XGET 'https://api.talkwalker.com/api/v3/tasks/export?access_token=<access_token>
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/tasks/export?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"creation_date" : "2018-03-21T08:28:35.469Z",
"type" : "export",
"id" : task-1,
"status" : "queued",
"processed" : 0,
"progress" : 0.0,
"target" : "collector-1"
},
{...},
{...}]
}
}
Abort a task
Using the task ID, a currently running export task can be aborted.
curl -XDELETE 'https://api.talkwalker.com/api/v3/tasks/export/<task_id>?access_token=<access_token>
{
"status_code" : "0",
"status_message" : "OK",
"request" : "DELETE /api/v3/tasks/export/task-1?access_token=<access_token>",
"result_tasks" : {
"tasks" : [{
"id" : task-1,
"status" : "deleted"
}]
}
}
Modifying documents with the Talkwalker API
Any modifications of documents done via the Talkwalker API will overwrite changes done in Talkwalker. All earlier changes (manual or via export/import) in the same project are lost.
https://api.talkwalker.com/api/v2/docs/p/<project_id>/<operation>
Parameters
parameter | description | required? | values |
---|---|---|---|
access_token | a read/write token specified in the API application | required | |
 | Talkwalker API recalculates the set field based on the updated content | optional | sentiment |
 | Specifies if the modified document should be returned | optional | hide (default), show |
Advanced Parameters
All parameters in this section are optional.
parameter | description | values |
---|---|---|
 | Specifies when and which tokens are generated | all_if_new (default), all_always_if_possible |
 | Specifies when the language is automatically detected | only_if_new (default), always_if_new_content |
 | Special behavior for ellipsis in string fields | no_special_handling, ignore_if_start_same (default) |
 | Fallback language | (default: en) |
 | Language annotation | annotate_if_not_set (default), use_default |
 | Sentiment annotation | annotate_if_not_set (default), use_default |
Single Documents
To change result documents, use the https://api.talkwalker.com/api/v2/docs/p/<project_id>/<operation>
endpoint.
New documents are created with the create operation; existing documents are updated with the update operation.
Using the upsert operation, a document is created if it is not present yet; otherwise it is updated.
Deletion and un-deletion of documents is done with the delete and undelete operations respectively.
The fields url, published, and content are required.
When left empty, some fields (for example source_type, post_type and lang) will be filled automatically with default values or automatically extracted values if possible.
A complete overview on writable fields can be found in the chapter Talkwalker Documents.
source_type defaults to OTHER, so in order for a document with the default source_type to be displayed in the app, adapt the project settings accordingly.
Examples
Create
When executing create for a document that already exists, the request fails and the existing document remains unchanged. Only documents that match at least one topic in the project can be imported.
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/create?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html",
"title" : "This is a title",
"content" : "Example content. Really not that much.",
"tags_marking" : ["read"],
"published" : "1430136532000",
"extra_source_attributes": {
"geo":"de"
}
}' -H 'Content-Type: application/json; charset=UTF-8'
Please see the section Talkwalker Documents for a list of all writable fields that can be imported/modified with the Talkwalker API.
Update
When executing update and the document does not yet exist, the request fails and no new document is created.
The following example updates the title, adds the important tag, and removes the read tag:
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/update?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html",
"title" : "This is a new title",
"content" : "Example content. Really not that much.",
"+tags_marking" : ["important"],
"-tags_marking" : ["read"],
"extra_author_attributes" : {
"name" : null
},
"published" : "1430136532000"
}' -H 'Content-Type: application/json; charset=UTF-8'
Fields of type array can be updated in three ways: using "<fieldname>" to replace the whole array, "+<fieldname>" to add an item to the array, and "-<fieldname>" to remove an item.
Fields can be cleared by explicitly setting them to null.
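The effect of these operators on a local copy of a document can be sketched like this (plain Python simulation, not an API call; the helper name is ours):

```python
# Apply "<field>", "+<field>", "-<field>" and explicit null semantics
# to a local document dictionary, as described above.
def apply_update(doc, update):
    for key, value in update.items():
        if key.startswith("+"):
            doc[key[1:]] = doc.get(key[1:], []) + value
        elif key.startswith("-"):
            doc[key[1:]] = [v for v in doc.get(key[1:], []) if v not in value]
        elif value is None:
            doc.pop(key, None)  # explicit null clears the field
        else:
            doc[key] = value
    return doc

doc = {"tags_marking": ["read"], "title": "old"}
apply_update(doc, {"+tags_marking": ["important"], "-tags_marking": ["read"], "title": None})
print(doc)  # {'tags_marking': ['important']}
```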
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/update?access_token=<application_access_token>' -d '
{
"url" : "http://www.news_site.com/news/news1.html",
"+customer_entities" : [ {
"type": "Category",
"id": ["Sports","Football"]
}, {
"type": "Place",
"id": ["USA", "Austin TX"]
} ]
}' -H 'Content-Type: application/json; charset=UTF-8'
type is the entity type (e.g. Person, Brand, Category, etc.); id is the actual entity name or hierarchy (e.g. Barack Obama, BMW, News, etc.).
Types are used for grouping entities in theme clouds; IDs are the displayed themes in the theme clouds.
Hierarchical IDs are defined as an array (the order is important!).
When multiple different entities have the same name (e.g. two persons with the same name), a unique identifier can be added after two underscores, e.g. Max Mustermann__1, Max Mustermann__2 or Max Mustermann__politician. Only the part of the ID before the underscores will be displayed in the Talkwalker user interface.
The Talkwalker user interface will only show the first 2 levels of IDs.
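The displayed name is therefore everything before the first double underscore; in Python:

```python
# Only the part of an entity ID before the "__" suffix is displayed.
def display_name(entity_id):
    return entity_id.split("__", 1)[0]

print(display_name("Max Mustermann__politician"))  # Max Mustermann
```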
Upsert
The first upsert operation works like a create operation.
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/upsert?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html",
"title" : "This is a title",
"content" : "Example content. Really not that much.",
"tags_marking" : ["read"],
"published" : "1430136532000",
"extra_source_attributes": {
"geo":"de"
}
}' -H 'Content-Type: application/json; charset=UTF-8'
The second upsert operation works like an update operation: content is set to a new value.
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/upsert?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html",
"content" : "Updated content."
}' -H 'Content-Type: application/json; charset=UTF-8'
Delete
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/delete?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html"
}' -H 'Content-Type: application/json; charset=UTF-8'
Undelete
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/undelete?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html"
}' -H 'Content-Type: application/json; charset=UTF-8'
Multiple Documents
Multiple documents can be manipulated using the https://api.talkwalker.com/api/v2/docs/p/<project_id>
endpoint.
The execution order of the given document operations is not guaranteed (multiple operations on a single document in a single request should be avoided).
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>?access_token=<access_token>' -d '
[{
"create": {
"url": "http://www.example.com/docs/doc1.html",
"title" : "This is the title of doc 1",
"content" : "and this is the content of doc 1",
"published" : "1430136532000"
}
}, {
"update": {
"url": "http://www.example.com/docs/doc2.html",
"title" : "This is the title of doc 2",
"content" : "and this is the content of doc 2"
}
}, {
"delete": {
"url": "http://www.example.com/docs/doc3.html"
}
}]' -H 'Content-Type: application/json; charset=UTF-8'
Please see the section Talkwalker Documents for a list of all writable fields that can be imported/modified with the Talkwalker API.
If one or more operations fail, the response will have the status code 49
and will include details of the failures.
The HTTP status code of the response is still 200, even if some operations failed. This means that every document modification in the response needs to be checked separately for a (partial) failure.
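Because the HTTP status stays 200 on partial failure, a client has to inspect the body. A minimal sketch in Python; only the top-level status code "49" is taken from the description above, and the sample body below is a hypothetical illustration, not a verbatim API response:

```python
import json

def has_partial_failure(response_body: str) -> bool:
    """Return True if a bulk document request reports failed operations.

    Talkwalker signals partial failure with top-level status_code "49"
    while still returning HTTP 200, so the body must always be checked.
    """
    body = json.loads(response_body)
    return body.get("status_code") == "49"

print(has_partial_failure('{"status_code": "49", "status_message": "partial failure"}'))  # True
print(has_partial_failure('{"status_code": "0", "status_message": "OK"}'))                # False
```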
Custom Fields
In order to create a new document that includes custom fields, these custom fields have to be added to the project first (see Project Settings/Scoring Engine/Custom Metrics in the Talkwalker application).
We assume a project that has two custom fields: One decimal number double_1
and one String text_1
.
A document like the following, including these two custom fields, can be created for this project:
curl -XPOST 'https://api.talkwalker.com/api/v2/docs/p/<project_id>/create?access_token=<access_token>' -d '
{
"url" : "http://www.example.com/docs/doc1.html",
"title" : "This is a match",
"content" : "Example content. Really not that much.",
"tags_marking" : ["read"],
"published" : "1430136532000",
"custom": {
"text_1" : "Example text. Also not that much.",
"double_1" : 0.75
}
}' -H 'Content-Type: application/json; charset=UTF-8'
When matching a custom field, the result has the following form.
{
"chunk_type" : "CT_RESULT",
"chunk_result" : {
"data" : {
"data" : {
...
"custom" : {
"text_1" : "Example text. Also not that much."
}
},
"highlighted_data" : [ {
"title_snippet" : "This is a <b>match</b>",
"content_snippet" : "Example content. Really not that much.",
"matched" : {
"project_id" : "<project_id>",
"project_profiles" : [ {
"id" : "jl124veg_32fssfsd9238",
"type" : "topic",
"title" : "Match"
} ]
}
} ]
}
}
}
Talkwalker Resources API
Resources
Resources are data retrieval settings from a Talkwalker project. These can be search topics, filters, monitored pages, source panels, events, or saved objects for embedding in external tools.
To get a list of the resources defined in a Talkwalker project use the project_id
and the access_token
on the https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources
endpoint.
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>'
Parameters
parameter | description | required? | values |
---|---|---|---|
access_token |
a read/write token specified in the API application |
required |
|
type |
filter on the type of resources |
optional |
search, filter, page, event, panel |
If no type is specified, all resources will be returned |
Types
Type | description |
---|---|
search |
List of topic IDs and titles |
filter |
List of custom filter IDs and titles |
page |
List of channels and monitored pages |
event |
List of event IDs and titles |
panel |
List of panel IDs and titles |
Examples
List of all topics of a project
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search",
"result_resources": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project title>",
"topics": [
{
"id": "default",
"title": "",
"nodes": [
{
"id": "<topic ID 1>",
"title": "<topic title 1>"
},
{
"id": "<topic ID 2>",
"title": "<topic title 2>"
}
]
},
{
"id": "<topic Group ID 1>",
"title": "<topic Group title 1>",
"nodes": [
{
"id": "<topic ID 3>",
"title": "<topic title 3>"
}
]
}
]
}
]
}
}
Group ID default is the group "Ungrouped".
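The nested topics response above can be flattened into simple rows for further processing. A Python sketch (not an official client; the `sample` below is a trimmed stand-in for a real response):

```python
def list_topics(response: dict) -> list:
    """Flatten a ?type=search resources response into
    (group title, topic id, topic title) tuples.

    The group with id "default" is the "Ungrouped" group.
    """
    rows = []
    for project in response["result_resources"]["projects"]:
        for group in project.get("topics", []):
            title = group["title"] or ("Ungrouped" if group["id"] == "default" else group["id"])
            for node in group.get("nodes", []):
                rows.append((title, node["id"], node["title"]))
    return rows

sample = {"result_resources": {"projects": [{"id": "p1", "title": "Demo", "topics": [
    {"id": "default", "title": "", "nodes": [{"id": "t1", "title": "Topic 1"}]},
    {"id": "g1", "title": "Group 1", "nodes": [{"id": "t2", "title": "Topic 2"}]},
]}]}}
print(list_topics(sample))
# [('Ungrouped', 't1', 'Topic 1'), ('Group 1', 't2', 'Topic 2')]
```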
List of all channels and monitored pages of a project
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=page'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=page",
"result_resources": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project title>",
"channels": [
{
"id": "<Channel category ID>",
"title": "<Channel category title>",
"nodes": [
{
"id": "<channel ID 1>",
"title": "<www.example.com>"
}
]
}
]
}
]
}
}
List of all custom filters of a project
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=filter'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=filter",
"result_resources": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project title>",
"filters": [
{
"id": "<Filter category ID>",
"title": "<Filter category title>",
"nodes": [
{
"id": "<Filter ID>",
"title": "<Filter title>"
}
]
},
{
"id": "default_filter",
"title": ""
}
]
}
]
}
}
List of all events of a project
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=event'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=event",
"result_resources": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project title>",
"events": [
{
"id": "<Event ID>",
"title": "<Event title>"
}
]
}
]
}
}
List of all panels of a project
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=panel'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=panel",
"result_resources": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project title>",
"panels": [
{
"id": "favourites",
"title": ""
},
{
"id": "star1",
"title": ""
},
{
"id": "<Custom panel ID 1>",
"title": "<Custom panel title 1>"
}
]
}
]
}
}
Tags
To get a list of the tag IDs defined in a Talkwalker project use the project_id
and the access_token
on the https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/tags
endpoint.
Tag IDs can be used in the query syntax.
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/tags?access_token=<access_token>'
Parameters
parameter | description | required? |
---|---|---|
access_token |
a read/write token specified in the API application |
required |
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/tags?access_token=<access_token>",
"result_tags": {
"tags": [
{
"label": "<Tags label 1>",
"id": "<Tags ID 1>"
},
{
"label": "<Tags label 2>",
"id": "<Tags ID 2>"
}
]
}
}
Tag labels support hierarchy using / .
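Since tag labels support hierarchy using /, a label can be split into its levels client-side. A trivial sketch (the label "Region/Europe/France" is an invented example):

```python
def tag_levels(label: str) -> list:
    """Split a hierarchical tag label into its levels."""
    return label.split("/")

print(tag_levels("Region/Europe/France"))  # ['Region', 'Europe', 'France']
```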
Dashboards, reports and alerts
To get a list of the dashboard/report/alert IDs defined in a Talkwalker project use the project_id
and the access_token
on the https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/views
endpoint.
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/views?access_token=<access_token>'
Parameters
parameter | description | required? |
---|---|---|
access_token |
a read/write token specified in the API application |
required |
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/views?access_token=<access_token>",
"result_views": {
"projects": [
{
"id": "<Project ID>",
"title": "<Project Title>",
"dashboards": [
{
"id": "<Dashboard ID>",
"title": "<Dashboard title>",
"last_update": 1626941639367
}
],
"reports": [
{
"id": "<Report ID 1>",
"title": "<Report title 1>",
"last_update": 1628585633695
},
{
"id": "<Alert ID 1>",
"title": "<Alert title 1>",
"last_update": 1628585747167
}
]
}
]
}
}
Alerts and reports are both listed under the reports array.
Modifying topics with Talkwalker API
Overview
To create or edit topics, use the endpoint
https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/import
To delete topics, use the endpoint
https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/delete
To get a list of all topics with their details, use the endpoint
https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/list
Definition of a topic
The definition of a topic consists of the following parameters:
parameter | description | required? | values |
---|---|---|---|
topic_id |
unique topic ID |
optional |
generated automatically if absent or left empty |
topic_title |
topic title |
required |
|
topic_description |
topic description |
optional |
|
category_title |
category title |
required if category_id is absent |
name of the group under which the topic will be created |
category_id |
category ID |
required if category_title is absent |
|
override |
override the query |
required |
|
query |
the topic query |
required |
|
included_query |
to determine if the query added is to be included or excluded |
required |
|
topic_type |
specifies the topic type |
optional |
|
Create/edit a topic
curl -XPOST 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/import?access_token=<access_token>'
-d '<topic_definition>'
-H 'Content-Type: application/json; charset=UTF-8'
{
"topic_type": "search",
"topic_line_import": [
{
"topic_title": "my new topic",
"topic_description": "This is a topic description",
"category_title": "my new group",
"override": false,
"query": "cats",
"included_query": true
}
]
}
{
"topic_type": "search",
"topic_line_import": [
{
"topic_title": "my new ungrouped topic",
"topic_description": "This is a topic description",
"category_id": "default",
"override": false,
"query": "cats",
"included_query": true
}
]
}
{
"topic_type": "search",
"topic_line_import": [
{
"topic_id": "<topic_id>",
"topic_title": "changed topic name",
"topic_description": "This is an updated topic description",
"category_id": "default",
"override": true,
"query": "changed query",
"included_query": true
},
{
"topic_id": "<topic_id>",
"topic_title": "changed topic name",
"topic_description": "This is an updated topic description",
"category_id": "default",
"override": true,
"query": "exclude these words",
"included_query": false
}
]
}
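The import payloads above all share one shape, so a small helper can build them. A Python sketch using only the fields documented in the table above (not an official client library):

```python
import json

def topic_import_payload(title, query, *, category_title=None,
                         category_id=None, topic_id=None,
                         description="", override=False, included=True):
    """Build the JSON body for /topics/import from the documented fields.

    Exactly one of category_title / category_id should be given; topic_id
    is only needed when editing an existing topic.
    """
    line = {
        "topic_title": title,
        "topic_description": description,
        "override": override,
        "query": query,
        "included_query": included,
    }
    if topic_id:
        line["topic_id"] = topic_id
    if category_id:
        line["category_id"] = category_id
    elif category_title:
        line["category_title"] = category_title
    return json.dumps({"topic_type": "search", "topic_line_import": [line]})

print(topic_import_payload("my new topic", "cats", category_title="my new group"))
```

The resulting string can be passed as the `-d` body of the curl call shown above.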
Delete a topic
curl -XPOST 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/delete?access_token=<access_token>'
-d '{
"bulk_topics": [
{
"id": "<topic1_id>",
"category_id": "default",
"topic_type": "FILTER"
},
{
"id": "<topic2_id>",
"category_id": "<category1_id>"
}
]
}'
-H 'Content-Type: application/json; charset=UTF-8'
To make sure that the topic has been successfully deleted, check that the success field in the call response is set to true.
{
"status_code": "0",
"status_message": "OK",
"request": "POST /api/v2/talkwalker/p/<project_id>/topics/delete?access_token=<access_token>",
"result_topic_deletion": {
"bulk_answer": [
{
"id": "kowmxxxx_5imm9xxxxxxx",
"category_id": "default",
"success": true
},
{
"id": "exampleid123",
"category_id": "examplecategoryid123",
"success": true
}
]
}
}
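Checking the success flags in the bulk_answer array can be automated. A Python sketch (not an official client; it only reads the response shape shown above):

```python
def failed_deletions(response: dict) -> list:
    """Return the IDs of topics whose deletion did not succeed.

    Each entry of result_topic_deletion.bulk_answer carries a success flag.
    """
    return [
        answer["id"]
        for answer in response["result_topic_deletion"]["bulk_answer"]
        if not answer.get("success")
    ]
```

An empty list means every requested topic was deleted.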
Retrieve the definition of all topics
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/list?access_token=<access_token>'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/p/<project_id>/topics/list?access_token=<access_token>",
"result_topics": {
"topic_categories": [
{
"id": "default",
"title": "",
"description": "",
"query_topics": [
{
"id": "<topic_id>",
"title": "topic name",
"description": "",
"included_queries": [
"query"
],
"excluded_queries": [
"query"
]
}
]
},
{
"id": "<category_id>",
"title": "category name",
"description": "",
"query_topics": [
{
"id": "<topic_id>",
"title": "topic name",
"description": "",
"included_queries": [
"query"
]
}
]
}
],
"filter_categories": [
{
"id": "default_filter",
"title": "",
"description": "",
"query_topics": [
{
"id": "<filter_id>",
"title": "filter name",
"description": "",
"included_queries": [
"query"
]
}
]
},
{
"id": "<converted_from_topic>",
"title": "",
"description": "",
"query_topics": [
{
"id": "<filter_id>",
"title": "filter name",
"description": "",
"included_queries": [
"query"
]
}
]
},
{
"id": "<category_id>",
"title": "filter category name",
"description": "",
"query_topics": [
{
"id": "<filter_id>",
"title": "filter name",
"description": "",
"included_queries": [
"query"
]
}
]
}
]
}
}
Retrieve the definition of a specific topic
To get the definition of a single topic, append the topic ID
to the endpoint.
curl 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/topics/list/<topic_id>?access_token=<access_token>'
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/talkwalker/<project_id>/topics/list/<topic_id>?access_token=<access_token>",
"result_topic_details": {
"id": "<topic_id>",
"title": "topic name",
"description": "",
"category_id": "<category_id>",
"category_title": "",
"category_description": "",
"included_queries": [
"query"
],
"excluded_queries": [
"query"
],
"type": "SEARCH"
}
}
Talkwalker Image API
The Talkwalker Image API can be used to detect features or entities in images.
Images can be provided either as a URL or as a file-upload (a POST
request containing multipart/form-data
).
curl -XGET 'https://api.talkwalker.com/api/v2/detect/images/<type>?access_token=<access_token>&image_url=<image_url>'
curl -XPOST 'https://api.talkwalker.com/api/v2/detect/images/<type>?access_token=<access_token>' -F 'image_file=@image.jpg'
image_url
is a URL-encoded URL (e.g. http%3A%2F%2Fwww.my-url.com%2Fimage.jpg).
type
defines the detection mode; the options are logo
, object
or scene
.
The image_url parameter can only be used with specific URLs,
for example URLs that start with a specific prefix (http://www.my-url.com in the example).
Please contact Talkwalker to have your account configured.
{
"status_code": "0",
"status_message": "OK",
"request": "GET /api/v2/detect/images/logo?access_token=<access-token>&image_url=http%3A%2F%2Fwww.my-url.com%2Fimage.jpg",
"result_processing": {
"match": [
{
"top": 848,
"left": 678,
"right": 983,
"bottom": 1155,
"confidence": 0.9657765030860901,
"id": "starbucks"
},
{
"top": 880,
"left": 1520,
"right": 1826,
"bottom": 1209,
"confidence": 0.9366655349731445,
"id": "starbucks"
},
{
"top": 672,
"left": 239,
"right": 521,
"bottom": 916,
"confidence": 0.9254658222198486,
"id": "starbucks"
}
],
"height": 1365,
"width": 2048
}
}
By adding the parameter detect
, one or multiple Talkwalker image IDs can be
specified to restrict the detection to only these.
match
contains the list of matched logos, each with their Talkwalker image ID and a confidence
value from 0 to 1.
If the selected type is logo
, the position of the match is given by the top
, left
, right
and bottom
edges of the bounding box.
Additional logos, scenes and objects can be added on request.
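The bounding-box edges can be turned into pixel sizes and filtered by confidence. A Python sketch over the response shape shown above (not an official helper; the threshold value is an arbitrary choice):

```python
def confident_logos(result: dict, threshold: float = 0.9) -> list:
    """Keep logo matches at or above a confidence threshold and derive
    the pixel size of each bounding box from its edges."""
    boxes = []
    for m in result["result_processing"]["match"]:
        if m["confidence"] >= threshold:
            boxes.append({
                "id": m["id"],
                "width": m["right"] - m["left"],    # horizontal extent in px
                "height": m["bottom"] - m["top"],   # vertical extent in px
            })
    return boxes
```

Applied to the first match in the sample response above, this yields a 305 x 307 pixel box for "starbucks".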
Talkwalker Query Syntax
A single search query can support up to 50 operands and be up to 4096 characters long. To create complex queries, operands may be combined using Boolean operators.
All queries are executed in their unaccented and case-insensitive form, so a search for "éléVE" will also match all documents containing the word "eleve". No language stemming is done, so a search for "children" won’t return results with the word "child". Non-Latin characters and emojis can also be used in queries.
Note: The lists below are continuously extended, as we add new fields on an irregular basis.
Special Transformations
These transformations apply when a query contains no operators from the query syntax (quotes, AND
, OR
, wildcards etc, see below).
Words with only capital letters (and special chars +-&
) are executed as exact (case sensitive) raw data search (ABC
= ++"ABC"
, A&B
= ++"A&B"
).
Screen names (@name
), hashtags (#hashtag
), cashtags ($cashtag
) as well as words containing a dash (-
), a plus (+
) or an ampersand (&
) are executed as (case insensitive) raw data search (@username
= +"@username"
, p&t
= +"p&t"
).
If a query contains multiple simple words (no special characters like #@+-&
), no operators, and is not all capital letters, it is executed as a proximity search.
The maximum number of jumps is set to (#words - 1) * 10 (cat dog mouse bird
= "cat dog mouse bird"~30
).
To prevent this behaviour, use the explicit query syntax below (instead of cat dog mouse
use cat OR dog OR mouse
, cat AND dog AND mouse
or "cat dog mouse"
to search for one of the words, all of the words, or the exact phrase).
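The implicit proximity rewrite can be reproduced exactly from the rule stated above, max jumps = (#words - 1) * 10. A Python sketch of that rule, not Talkwalker's actual parser:

```python
def implicit_proximity(query: str) -> str:
    """Rewrite a plain multi-word query the way Talkwalker executes it:
    as a proximity search with (#words - 1) * 10 maximum jumps."""
    words = query.split()
    if len(words) < 2:
        return query  # single words are not rewritten
    return '"%s"~%d' % (" ".join(words), (len(words) - 1) * 10)

print(implicit_proximity("cat dog mouse bird"))  # "cat dog mouse bird"~30
```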
Boolean Operators
AND |
|
|
AND NOT |
|
|
OR |
|
|
Exclusion of Keywords |
Negative filters can be created by using the operator |
|
Phrase Search |
Quotes |
|
Combinations |
Brackets |
|
Wildcard Search |
The Wildcard operator |
|
Wildcard Search – one character |
The question mark |
|
Proximity Search |
The tilde symbol |
|
Fuzzy X Search |
The tilde symbol |
|
Fuzzy Search |
The tilde symbol |
|
Raw Data Search |
A simple |
|
Exact Raw Data Search |
Two |
|
NEAR/x |
The |
|
ONEAR/x |
Same as |
|
Sentence Search |
The SENTENCE operator works similarly to the NEAR/x operator. It searches for keywords that appear in the same sentence. SENTENCE can also be used with multiple terms. |
|
Ordered Sentence Search |
Same as |
|
Note:
In phrase search and raw data phrase search (""
or +""
) the number and type of white space characters are ignored.
For example "BMW series"
(one space) will also match documents which contain "BMW  series"
(two spaces) and vice versa.
White space characters include spaces, tabs and new line characters; transitions between letters and special characters are also treated as whitespace.
For example +"P&T"
will match P&T
but also P& T
and P & T
.
Advanced Search Options:
Single Keyword Search |
Search for simple brands, products, keywords, etc. |
|
Title Search |
It searches within the title of an article. |
|
Content Search |
It searches within the content of an article. |
|
Author Search |
It searches for authors of articles. |
|
Author Short Name Search |
Search for a specific author short name (case sensitive). |
|
Mention Search |
Search for a mention of a user (case sensitive). |
|
Hashtag Search |
Search for a hashtag (case sensitive). |
|
Language Search |
It searches for languages of articles. |
|
Author Gender Search |
Search only for |
|
Author ID Search |
Search for posts by the author with a certain ID. |
|
Source ID Search |
Search for posts, that were released at the source with a certain ID. |
|
Author Description Search |
Search in the description (biography) of the author with |
|
Source Country Restriction |
It searches for the country of origin of sources. |
|
Author Country Restriction |
Search for articles by authors from a specific country, |
|
Source Type Restriction |
|
|
Comments Search |
Find only comments by setting |
|
Retweets Search |
Find only retweets with |
|
Twitter Reply Search |
Find only tweets that are replies to other tweets |
|
Verified Author Search |
Find only documents from verified authors |
|
Verified Source Search |
Find only documents from verified sources |
|
Questions Search |
Search for questions. |
|
Has Image Search |
|
|
Has Audio Search |
|
|
Has Video Search |
|
|
Talkwalker Tags Search |
|
|
Score Search |
|
|
Post Type Search |
|
|
Device Search |
Find only documents from certain devices. Possible values are BOT, MOBILE_IOS, MOBILE_ANDROID, MOBILE_WINDOWS, MOBILE_BLACKBERRY, MOBILE_OTHER, TABLET_IOS, TABLET_ANDROID, TABLET_WINDOWS, TABLET_OTHER, PC, OFFICIAL_WEBSITE, EXTERNAL_WEBSITE. |
|
Image Search |
|
|
Demographic Search |
|
|
Customer Entity Search |
|
|
Custom Field Search |
|
|
For sourcetype
, posttype
, lang
, sourcegeo
, image
(objects and scenes) and demographic
(family-status, occupation and interest) see Value options or use the suggestions in the query editor.
Brand related images (logos) and cities are only suggested by the query editor.
Url based Search
Url Search |
|
|
Parent Url Search |
|
|
Host Url Restriction |
|
|
Domain Url Restriction |
|
|
Resolved Url Restriction |
|
|
Resolved Domain Url Restriction |
|
|
Site Search |
|
|
In Urls Search |
|
|
All URL based search features are case sensitive.
Metric (Minimum / Maximum) Restrictions
metric_name:>n
, metric_name:<n
and metric_name:n
return only documents which match a specific value or range of a metric.
The following table explains the possible metrics:
metric_name | Description | Example |
---|---|---|
|
The reach of an article/post represents the number of people who were reached by this article/post. |
|
|
The engagement of an article/post is the sum of actions made by others on that article/post. |
|
|
Number of views a Dailymotion video has |
|
|
Number of Facebook shares an article has |
|
|
Number of Facebook likes an article has |
|
|
Number of Twitter retweets an article has |
|
|
Number of Twitter shares an article has |
|
|
Number of Twitter followers a source has |
|
|
Number of Twitter likes an article has |
|
|
Number of YouTube views a video has |
|
|
Number of YouTube likes a video has |
|
|
Number of YouTube dislikes a video has |
|
|
Number of circulated copies of the source |
|
|
Number of printed copies of the source |
|
|
Number of sold copies of the source |
|
|
Number of Alexa page views |
|
|
Number of Alexa unique visitors |
|
|
Number of Instagram likes a post has |
|
|
Number of Instagram followers a post has |
|
|
Number of comments an article has |
|
|
Timestamp of publication (epoch time in milliseconds) |
|
|
Timestamp of indexation in Talkwalker (epoch time in milliseconds) Note: The metric |
|
|
Get a random sample of the results (percent of the total number of results, i.e. setting 25 will return one in four documents) values: 1-100 |
|
|
Similar to sample, but with higher precision (i.e. setting 2000 will return one in 500 documents) values: 1-1000000 |
|
|
Porn level of an article (0-100) |
|
|
Spam level of an article (0-100) |
|
|
The detected sentiment of the article (values -5 (negative) to 5 (positive)). |
|
|
The number of words in the content |
|
|
Based on review ratings. Please note that Talkwalker review ratings go up to 10 in order to accommodate half-steps. A 5-star review on Amazon equals a rating of 10. |
|
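Since the Talkwalker rating scale runs to 10 to accommodate half-steps, a star rating converts by doubling. A one-line sketch of the arithmetic implied above:

```python
def stars_to_rating(stars: float) -> int:
    """Convert a 0-5 star review (half-steps allowed) to the 0-10 scale."""
    return int(round(stars * 2))

print(stars_to_rating(5))    # 10
print(stars_to_rating(3.5))  # 7
```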
Geographic Restrictions
Note: Some documents have precise geographic data in the form of GPS-measured coordinates provided by the source. For other documents this data is based on source metadata, with a certain precision level.
These levels (ordered from lowest precision to highest) are: country
, region
and city
(extracted data) and coordinates
(exact data).
The coordinates for lower-precision geographic data are set to those of the corresponding capital.
Geographic restrictions exist for
sourcegeo : location of the source of the articles (e.g. a page, a site, a publication).
authorgeo : general location of the author of the article.
articlegeo : location where the article was written.
Restriction | Description | Example |
---|---|---|
|
Restricts the results to a geographic area defined either by coordinates ( |
|
|
Restricts to documents that have a minimum precision level of location data. Possible levels are |
|
|
Restricts to documents that have a specific geo detection accuracy. Options are |
|
Example: Search for documents that are in a box that roughly corresponds to Luxembourg and have exact coordinates.
Luxembourg’s north end is at around 50.3°
, south is at 49.4°
, west at 5.7°
and east at 6.5°
, the upper left corner is 50.3,5.7
the lower right corner is 49.4,6.5
. The final query is:
sourcegeo:50.3,5.7;49.4,6.5 AND sourcegeo_resolution:coordinates
.
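The box query above can be assembled from the corner coordinates. A Python sketch of the query construction described in this example (the helper name and parameter order are invented for illustration):

```python
def geo_box_query(field, north, west, south, east, exact=True):
    """Build a bounding-box restriction from corner coordinates.

    The upper-left corner is (north, west), the lower-right (south, east).
    With exact=True, results are further restricted to GPS-precise data.
    """
    q = f"{field}:{north},{west};{south},{east}"
    if exact:
        q += f" AND {field}_resolution:coordinates"
    return q

print(geo_box_query("sourcegeo", 50.3, 5.7, 49.4, 6.5))
# sourcegeo:50.3,5.7;49.4,6.5 AND sourcegeo_resolution:coordinates
```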
Special Query Modifiers
All queries are executed in their unaccented and case insensitive form on the content and the title of documents. To change this behaviour, use flag:<modifier_name>
to enable special query modes.
Modifier Name | Description | Example |
---|---|---|
|
Query will also match URLs and links (e.g. in |
|
|
Query will also match author field |
|
|
Query will also match author description field |
|
|
Use Raw data search as default. All keywords are considered as case-insensitive exact character strings including special characters and punctuation. |
|
|
Use Exact raw data search as default. All keywords are considered as case-sensitive exact character strings including special characters and punctuation. |
|
The special modifiers can be combined: carsharing flag:matchauthor flag:matchfuzzywords
searches for words like carsharing, car sharing or car-sharing in the fields title
, content
and author_name
.
Note: When matchinurls
or matchauthor
is set, API results will not have highlighting in snippets when one of these fields is matched.
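Combining modifiers is plain string concatenation of flag:<modifier_name> terms. A trivial sketch:

```python
def with_flags(query: str, *modifiers: str) -> str:
    """Append flag:<modifier_name> terms to a query."""
    return " ".join([query] + [f"flag:{m}" for m in modifiers])

print(with_flags("carsharing", "matchauthor", "matchfuzzywords"))
# carsharing flag:matchauthor flag:matchfuzzywords
```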
Talkwalker Documents
Note: The lists below are continuously extended, as we add new fields on an irregular basis.
Fields
field_name | datatype | deprecated⁴ | accepted dataformat | required | writable | default | Example |
---|---|---|---|---|---|---|---|
url |
string |
no |
url¹ |
yes |
yes |
- |
|
published |
long |
no |
timestamp in ms |
yes |
yes |
- |
|
title |
string |
no |
<500 chars |
no |
yes |
- |
|
content |
string |
no |
<50,000 chars |
yes |
yes |
- |
|
indexed |
long |
no |
- |
no |
no |
- |
|
search_indexed |
long |
no |
- |
no |
no |
- |
|
title_snippet |
string |
no |
- |
no |
no |
- |
|
content_snippet |
string |
no |
- |
no |
no |
- |
|
root_url |
string |
no |
- |
no |
no |
extracted from url |
|
domain_url |
string |
no |
- |
no |
no |
extracted from url |
|
host_url |
string |
no |
- |
no |
no |
extracted from url |
|
parent_url |
string |
no |
- |
no |
yes |
- |
|
lang |
string |
no |
2 char iso |
no |
yes |
detected from content |
|
porn_level |
integer |
no |
0..100 |
no |
yes |
- |
|
fluency_level |
integer |
yes |
0..100 |
no |
yes |
- |
|
spam_level |
integer |
yes |
0..100 |
no |
yes |
- |
|
noise_level |
integer |
no |
0..100 |
no |
yes |
- |
|
noise_category |
string |
no |
see list² |
no |
yes |
- |
|
sentiment |
integer |
no |
-5..5 |
no |
yes |
0 |
|
reach |
integer |
no |
>0 |
no |
yes |
- |
|
engagement |
integer |
no |
>0 |
no |
yes |
- |
|
rating |
integer |
no |
0..10 |
no |
yes |
- |
|
fakenews_level |
integer |
no |
0..100 |
no |
yes |
- |
|
provider |
string |
no |
a-z0-9_ <100 chars |
no |
yes |
- |
|
source_type |
list of string |
no |
see list² |
no |
yes |
"OTHER" |
|
post_type |
list of string |
no |
see list² |
no |
yes |
"TEXT" |
|
cluster_id |
string |
no |
- |
no |
no |
- |
|
meta_cluster_id |
string |
no |
- |
no |
no |
- |
|
tags_internal |
list of string |
no |
- |
no |
no |
- |
|
tags_marking |
list of string |
no |
see list² |
no |
yes |
- |
|
tags_customer |
list of string |
no |
see ³ |
no |
yes |
- |
|
tags_plugin |
list of string |
no |
see ³ |
no |
yes |
- |
|
matched_query |
string |
no |
no |
no |
- |
||
matched_profile |
string |
no |
no |
no |
- |
||
images |
list of image |
no |
see below |
no |
see below |
- |
|
videos |
list of video |
no |
see below |
no |
see below |
- |
|
article_extended_attributes |
article_extended_attributes |
no |
see below |
no |
see below |
- |
|
source_extended_attributes |
source_extended_attributes |
no |
see below |
no |
see below |
- |
|
extra_article_attributes |
extra_article_attributes |
no |
see below |
no |
see below |
- |
|
extra_author_attributes |
extra_author_attributes |
no |
see below |
no |
see below |
- |
|
extra_source_attributes |
extra_source_attributes |
no |
see below |
no |
see below |
- |
|
customer_entities |
list of customer_entity |
no |
see below |
no |
see below |
- |
|
entity_url |
list of entities |
no |
no |
no |
- |
|
|
word_count |
integer |
no |
>0 |
no |
yes |
- |
|
copyright |
string |
no |
no |
yes |
- |
|
See the chapter on Protocols, Encodings and Value Field Options for possible values for the fields sourcetype
, lang
, or geo
.
¹ Cannot be changed after creating a new document.
² See list of value options.
³ tags_customer
: a-zA-Z0-9-
or space, supports hierarchy using /
, can only be set in project specific documents, not in general document import.
³ tags_plugin: must be in the form <vendor_id><vendor_field>:<value>
⁴ Values of deprecated fields are no longer used by the backend.
These fields may be removed in a future release.
Twitter exclusive fields
Unlike other result types, some fields (url, title…) are not filled for Twitter results, since Twitter does not allow exporting this data.
Instead, results with "source_type" : [ "SOCIALMEDIA", "SOCIALMEDIA_TWITTER" ] include three special fields unique to Twitter results.
field_name | datatype | accepted dataformat | required | writable | Example |
---|---|---|---|---|---|
external_provider |
string |
- |
no |
no |
|
external_id |
string |
- |
no |
no |
|
external_author_id |
string |
- |
no |
no |
|
{
"matched_profile": [
...
],
"indexed": 1622576620359,
"search_indexed": 1622576629341,
"lang": "en",
"fluency_level": 80,
"sentiment": 0,
"source_type": [ "SOCIALMEDIA", "SOCIALMEDIA_TWITTER" ],
"post_type": [ "TEXT" ],
"external_provider": "twitter",
"external_id": "<twitter id>",
"external_author_id": "<twitter author id>"
}
Content
Talkwalker provides result snippets for all content. In all cases, the content
field only contains the first words of the document; in addition, we provide
the part of the document which matches the query in the content_snippet
field.
In the Streaming API a snippet is provided for every matching rule.
URLs
To filter on specific websites in a query, the fields domain_url
and host_url
can be used.
host_url
is used for specific hosts like www.talkwalker.com
or blog.talkwalker.com
, while domain_url
would filter on all hosts in a specific domain
(i.e. domain_url:blog.talkwalker.com
would return all results of the domain talkwalker.com
, including those from www.talkwalker.com
, while host_url:blog.talkwalker.com
would return only results
from blog.talkwalker.com
, not from www.talkwalker.com
).
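The host/domain distinction can be illustrated with standard URL parsing. A Python sketch (the two-label domain heuristic is an assumption of this example and is wrong for multi-part public suffixes like .co.uk; Talkwalker's own extraction is not specified here):

```python
from urllib.parse import urlparse

def host_and_domain(url: str) -> tuple:
    """Extract the host (host_url) and a naive domain (domain_url) from a URL.

    The domain is approximated as the last two labels of the host.
    """
    host = urlparse(url).netloc.lower()
    domain = ".".join(host.split(".")[-2:])
    return host, domain

print(host_and_domain("http://blog.talkwalker.com/post"))
# ('blog.talkwalker.com', 'talkwalker.com')
```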
Sentiment
Talkwalker uses natural language processing (NLP) to compute a general sentiment for the documents in our index. The accuracy of automatic detection is limited by irony, sarcasm and misspellings in the documents.
Reach
The reach of an article/post represents the number of people who were reached by this article/post.
Note that reach only gets set to a proper value if the host of the URL is either a domain (like theguardian.com) or a domain with a well-known third-level subdomain in front (mainly www, e.g. www.theguardian.com).
Reach is set to 0 for other hosts, i.e. hosts with other third-level subdomains like foobar.blogspot.com, as using the Alexa views of the domain would otherwise assign far too high a reach to mere sub-hosts.
For imported documents reach
can be set via the Talkwalker API.
Reach is calculated in the following ways:
Blogs, news sites, forums: number of page views
Facebook: the number of fans of the page (Note: only available for public pages monitored by Talkwalker; we don’t collect fan counts for user profiles)
Twitter: the number of followers of the author
image object
field_name | datatype | accepted dataformat | required | writable | Example |
---|---|---|---|---|---|
url |
string |
normalized url |
yes |
yes |
|
legend |
string |
<1000 chars |
no |
yes |
|
width |
integer |
no |
yes |
||
height |
integer |
no |
yes |
"images" : [{
"url" : "http://www.example.com/image1.jpg",
"legend" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
}]
video object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| url | string | normalized url | yes | yes |
| legend | string | <1000 chars | no | yes |
| width | integer | - | no | yes |
| height | integer | - | no | yes |
"videos" : [{
"url" : "http://www.example.com/video1.mpg",
"legend" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
}]
customer_entity object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| type | string | person, place, organization, keyword, iptc, source_distribution_type | yes | yes |
| id | list of string (ordered) | <100 chars | yes | yes |
The Talkwalker user interface will only show the first two levels of ids.
The list of ids defines a hierarchy where each id defines a superset of the following ids.
See the list of id options for `source_distribution_type` in the chapter "formats".
Examples:
"customer_entities" : [{
"type": "Place",
"id": ["USA", "Austin TX"]
}]
"customer_entities" : [{
"type": "organization",
"id": ["UNO"]
}, {
"type": "organization",
"id": ["WHO"]
}]
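To make the hierarchy concrete, the sketch below builds a `customer_entities` payload in which the ordered `id` list goes from the broadest entity to the most specific one, mirroring the Place example above. The payload structure follows this document; serializing it with `json.dumps` is just one way to produce the request body.

```python
import json

# Ordered id list: each id is a superset of the following one ("USA" contains "Austin TX").
entities = [{"type": "place", "id": ["USA", "Austin TX"]}]

payload = json.dumps({"customer_entities": entities})
print(payload)
```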
Attributes
These fields are only set for certain post types.
Article extended attributes fields will be updated for up to 1 month.
The source extended attributes represent the exact value at publication.
Not all urls will have all metadata, e.g.:

- Blog, news and messageboard posts (not their comments) will only have facebook_shares and twitter_shares set.
- All the other types will only be set if the sourcetype is of the same type and if the data is available.
article_extended_attributes object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| facebook_shares | long | >0 | no | yes |
| facebook_likes | long | >0 | no | yes |
| twitter_retweets | long | >0 | no | yes |
| twitter_likes | long | >0 | no | yes |
| twitter_shares | long | >0 | no | yes |
| url_views | long | >0 | no | yes |
| pinterest_likes | long | >0 | no | yes |
| pinterest_pins | long | >0 | no | yes |
| pinterest_repins | long | >0 | no | yes |
| youtube_views | long | >0 | no | yes |
| youtube_likes | long | >0 | no | yes |
| youtube_dislikes | long | >0 | no | yes |
| linkedin_shares | long | >0 | no | yes |
| linkedin_likes | long | >0 | no | yes |
| dailymotion_views | long | >0 | no | yes |
| vkontakte_likes | long | >0 | no | yes |
| vkontakte_shares | long | >0 | no | yes |
| instagram_likes | long | >0 | no | yes |
source_extended_attributes object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| alexa_pageviews | long | >0 | no | yes |
| facebook_followers | long | >0 | no | yes |
| twitter_followers | long | >0 | no | yes |
| instagram_followers | long | >0 | no | yes |
| pinterest_followers | long | >0 | no | yes |
| linkedin_followers | long | >0 | no | yes |
| print_run_print | long | >0 | no | yes |
| print_run_sales | long | >0 | no | yes |
| print_run_circulation | long | >0 | no | yes |
extra_article_attributes object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| world_data/continent | string | - | no | no |
| world_data/country | string | - | no | no |
| world_data/region | string | - | no | no |
| world_data/city | string | - | no | no |
| world_data/longitude | double | - | no | no |
| world_data/latitude | double | - | no | no |
| world_data/country_code | string | - | no | no |
| world_data/resolution | string | - | no | no |
| geo | string | see below | no | yes |
| id | string | - | no | no |
| type | string | - | no | no |
| name | string | - | no | no |
| gender | string | - | no | no |
| image_url | string | - | no | no |
| short_name | string | - | no | no |
| url | string | - | no | no |
`geo` is a write-only field that is used to set the `world_data` fields and cannot be read afterwards.
extra_author_attributes object
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| world_data/continent | string | - | no | no |
| world_data/country | string | - | no | no |
| world_data/region | string | - | no | no |
| world_data/city | string | - | no | no |
| world_data/longitude | double | - | no | no |
| world_data/latitude | double | - | no | no |
| world_data/country_code | string | - | no | no |
| world_data/resolution | string | - | no | no |
| geo | string | see below | no | yes |
| id | string | - | no | no |
| type | string | - | no | no |
| name | string | - | no | yes |
| gender | string | - | no | yes |
| image_url | string | url | no | no |
| short_name | string | no space allowed | no | yes |
| url | string | url | no | yes |
`geo` is a write-only field that is used to set the `world_data` fields and cannot be read afterwards.
extra_source_attributes
| field_name | datatype | accepted dataformat | required | writable |
|---|---|---|---|---|
| world_data/continent | string | - | no | no |
| world_data/country | string | - | no | no |
| world_data/region | string | - | no | no |
| world_data/city | string | - | no | no |
| world_data/longitude | double | - | no | no |
| world_data/latitude | double | - | no | no |
| world_data/country_code | string | - | no | no |
| world_data/resolution | string | - | no | no |
| geo | string | see below | no | yes |
| id | string | - | no | no |
| type | string | - | no | no |
| name | string | - | no | yes |
| gender | string | - | no | no |
| image_url | string | url | no | yes |
| short_name | string | - | no | no |
| url | string | url | no | yes |
`geo` is a write-only field that is used to set the `world_data` fields and cannot be read afterwards.
Evolution and stability of document fields
The structure of the documents will not be changed: existing fields will not be removed and their formatting will not be changed. Occasionally, new fields will be added to the documents and the order of fields can change; please take this into account when implementing a custom client.
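One way to honor this guarantee is to parse defensively: read only the fields you know and silently ignore the rest. The sketch below is illustrative only; the sample document and the `extract_known` helper are not part of the API.

```python
import json

# A sample result document; "some_future_field" stands in for a field added later.
sample = json.loads("""
{"url": "http://example.blogspot.com/cats",
 "published": 1417999319393,
 "title": "Something cats",
 "some_future_field": {"unknown": true}}
""")

def extract_known(doc):
    # Unknown keys are ignored; missing known keys fall back to None,
    # so neither new fields nor reordered fields break the client.
    return {k: doc.get(k) for k in ("url", "title", "published")}

print(extract_known(sample))
```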
Streaming
(repeated extra entries for each matching rule, available in streaming only)
highlighted_data object
On streaming, this information is present in a `highlighted_data` object.
| field | Name | Write access through API | Comment |
|---|---|---|---|
| | matched rule | - | ID of matched rule |
| | matched rule | - | Query of matched rule (when ID is not set) |
| | matched stream | - | ID of matched stream |
| | matched panel | - | ID of matched panel |
| | matched project | - | ID of matched Talkwalker project |
| | matched profiles | - | Profiles which matched (if Talkwalker project) |
| | Title Snippet | - | If a match occurred in the title, this field will contain the snippet related to the query set in the datafeed. |
| | Content Snippet | - | If a match occurred in the article, this field will contain the snippet related to the query set in the datafeed. |
Protocols, Encodings and Value Field Options
Protocols and Encodings
The Talkwalker API uses HTTP protocol 1.1. The Streaming API streams documents using the HTTP 1.1 Chunked transfer encoding mechanism.
The data is compressed using gzip: "Accept-Encoding:gzip" must be set in the header.
The encoding used is UTF-8.
The maximum size for POST and PUT requests is 5120 kB.
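A minimal sketch of a request honoring these requirements is shown below. Only the request object is built here; no network call is made. The header name and value follow the section above, and the decompression note describes what a real client would do with the gzip-compressed UTF-8 response body.

```python
import urllib.request

# Build a GET request with the required Accept-Encoding header.
req = urllib.request.Request(
    "https://api.talkwalker.com/api/v1/search/results?access_token=demo&q=cats",
    headers={"Accept-Encoding": "gzip"},
)

# A real client would read the response body, decompress it with
# gzip.decompress(body), and decode the result with .decode("utf-8").
print(req.get_header("Accept-encoding"))
```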
Evolution of JSON fields
The structure of the JSON responses will not be changed: existing fields will not be removed and their formatting will not be changed. However, new fields will be added to the responses and the order of fields can change; please take this into account when implementing a custom client.
Value options
The following tables contain possible options and formats for certain fields.
The lists below are continuously extended, as new fields are added on an irregular basis.
Source Type Options
| Media Source Types |
|---|
| All news sites |
| Printed magazines online sites |
| Printed newspaper online sites |
| Results from sites that publish press releases |
| Results from TV or radio station sites |
| News agency sites |
| News results that do not fall under any of the other news categories |
| All print articles |
| Articles from printed magazines |
| Articles from printed newspapers |
| Other printed articles |
| Newsletters |
| All blog sites |
| All forums and message boards |
| All social media sites |
| Results from Twitter |
| Results from Facebook |
| Results from YouTube |
| Results from LinkedIn |
| Results from Google+ |
| Results from Flickr |
| Results from Foursquare |
| Results from Instagram |
| Results from Pinterest |
| Results from Mixcloud |
| Results from SoundCloud |
| Results from Vimeo |
| Results from Dailymotion |
| Results from Weibo |
| Results from vk.com |
| Results from Vine |
| Results from Disqus (cannot be delivered via API, blocked by compliance) |
| Results from Ekşi Sözlük |
| Results from Twitch |
| Results from TikTok |
| Podcasts |
| Results from review sites (cannot be delivered via API, blocked by compliance) |
| All articles from broadcast |
| Articles from radio |
| Articles from TV |
| Everything else which does not fit into the above listed categories |
Language Options
| Languages | | |
|---|---|---|
| ABKHAZIAN | HERERO | OSSETIAN |
| AFAR | HINDI | PALI |
| AFRIKAANS | HIRI MOTU | PANJABI |
| AKAN | HUNGARIAN | PERSIAN |
| ALBANIAN | ICELANDIC | POLISH |
| AMHARIC | IDO | PORTUGUESE |
| ARABIC | IGBO | PUSHTO |
| ARAGONESE | INDONESIAN | QUECHUA |
| ARMENIAN | INTERLINGUA | RAETO ROMANCE |
| ASSAMESE | INTERLINGUE | ROMANIAN |
| AVARIC | INUKTITUT | RUNDI |
| AVESTAN | INUPIAQ | RUSSIAN |
| AYMARA | IRISH | SAMOAN |
| AZERBAIJANI | ITALIAN | SANGO |
| BAMBARA | JAPANESE | SANSKRIT |
| BASHKIR | JAVANESE | SARDINIAN |
| BASQUE | KANNADA | SCOTTISH GAELIC |
| BELARUSIAN | KANURI | SERBIAN |
| BENGALI | KASHMIRI | SHONA |
| BIHARI | KAZAKH | SICHUAN YI |
| BISLAMA | KHMER | SINDHI |
| BOSNIAN | KIKUYU | SINHALESE |
| BRETON | KINYARWANDA | SLOVAK |
| BULGARIAN | KIRGHIZ | SLOVENIAN |
| BURMESE | KOMI | SOMALI |
| CATALAN | KONGO | SOUTHERN SOTHO |
| CHAMORRO | KOREAN | SOUTH NDEBELE |
| CHECHEN | KURDISH | SPANISH |
| CHINESE SIMPLIFIED | KWANYAMA | SUNDANESE |
| CHINESE TRADITIONAL | LAO | SWAHILI |
| CHURCH SLAVIC | LATIN | SWATI |
| CHUVASH | LATVIAN | SWEDISH |
| CORNISH | LIMBURGISH | TAGALOG |
| CORSICAN | LINGALA | TAHITIAN |
| CREE | LITHUANIAN | TAJIK |
| CROATIAN | LUBA KATANGA | TAMIL |
| CZECH | LUXEMBOURGISH | TATAR |
| DANISH | MACEDONIAN | TELUGU |
| DIVEHI | MALAGASY | THAI |
| DUTCH | MALAY | TIBETAN |
| DZONGKHA | MALAYALAM | TIGRINYA |
| ENGLISH | MALTESE | TONGA |
| ESPERANTO | MANX | TSONGA |
| ESTONIAN | MAORI | TSWANA |
| EWE | MARATHI | TURKISH |
| FAROESE | MARSHALLESE | TURKMEN |
| FIJIAN | MOLDAVIAN | TWI |
| FINNISH | MONGOLIAN | UIGHUR |
| FRENCH | NAURU | UKRAINIAN |
| FRISIAN | NAVAJO | URDU |
| FULAH | NDONGA | UZBEK |
| GALLEGAN | NEPALI | VENDA |
| GANDA | NORTHERN SAMI | VIETNAMESE |
| GEORGIAN | NORTH NDEBELE | VOLAPUK |
| GERMAN | NORWEGIAN | WALLOON |
| GREEK | NORWEGIAN BOKMAL | WELSH |
| GREENLANDIC | NORWEGIAN NYNORSK | WOLOF |
| GUARANI | NYANJA | XHOSA |
| GUJARATI | OCCITAN | YIDDISH |
| HAITIAN | OJIBWA | YORUBA |
| HAUSA | ORIYA | ZHUANG |
| HEBREW | OROMO | ZULU |
Country Options
| Countries | | |
|---|---|---|
| AFGHANISTAN | GIBRALTAR | PALESTINE |
| ALAND ISLANDS | GREECE | PANAMA |
| ALBANIA | GREENLAND | PAPUA NEW GUINEA |
| ALGERIA | GRENADA | PARAGUAY |
| AMERICAN SAMOA | GUADELOUPE | PERU |
| ANDORRA | GUAM | PHILIPPINES |
| ANGOLA | GUATEMALA | PITCAIRN |
| ANGUILLA | GUERNSEY | POLAND |
| ANTARCTICA | GUINEA | PORTUGAL |
| ANTIGUA AND BARBUDA | GUINEA BISSAU | PUERTO RICO |
| ARGENTINA | GUYANA | QATAR |
| ARMENIA | HAITI | REUNION |
| ARUBA | HEARD ISLAND AND MCDONALD ISLANDS | ROMANIA |
| AUSTRALIA | HONDURAS | RUSSIA |
| AUSTRIA | HONG KONG | RWANDA |
| AZERBAIJAN | HUNGARY | SAINT BARTHELEMY |
| BAHAMAS | ICELAND | SAINT HELENA |
| BAHRAIN | INDIA | SAINT KITTS AND NEVIS |
| BANGLADESH | INDONESIA | SAINT LUCIA |
| BARBADOS | IRAN | SAINT MARTIN |
| BELARUS | IRAQ | SAINT PIERRE AND MIQUELON |
| BELGIUM | IRELAND | SAINT VINCENT AND THE GRENADINES |
| BELIZE | ISLE OF MAN | SAMOA |
| BENIN | ISRAEL | SAN MARINO |
| BERMUDA | ITALY | SAO TOME AND PRINCIPE |
| BHUTAN | JAMAICA | SAUDI ARABIA |
| BOLIVIA | JAPAN | SENEGAL |
| BONAIRE SINT EUSTASIUS AND SABA | JERSEY | SERBIA |
| BOSNIA AND HERZEGOVINA | JORDAN | SERBIA AND MONTENEGRO |
| BOTSWANA | KAZAKHSTAN | SEYCHELLES |
| BOUVET ISLAND | KENYA | SIERRA LEONE |
| BRAZIL | KIRIBATI | SINGAPORE |
| BRITISH INDIAN OCEAN TERRITORY | KUWAIT | SINT MAARTEN |
| BRITISH VIRGIN ISLANDS | KYRGYZSTAN | SLOVAKIA |
| BRUNEI | LAOS | SLOVENIA |
| BULGARIA | LATVIA | SOLOMON ISLANDS |
| BURKINA FASO | LEBANON | SOMALIA |
| BURUNDI | LESOTHO | SOUTH AFRICA |
| CAMBODIA | LIBERIA | SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS |
| CAMEROON | LIBYA | SOUTH KOREA |
| CANADA | LIECHTENSTEIN | SOUTH SUDAN |
| CAPE VERDE | LITHUANIA | SPAIN |
| CAYMAN ISLANDS | LUXEMBOURG | SRI LANKA |
| CENTRAL AFRICAN REPUBLIC | MACAO | SUDAN |
| CHAD | MACEDONIA | SURINAME |
| CHILE | MADAGASCAR | SVALBARD AND JAN MAYEN |
| CHINA | MALAWI | SWAZILAND |
| CHRISTMAS ISLAND | MALAYSIA | SWEDEN |
| COCOS ISLANDS | MALDIVES | SWITZERLAND |
| COLOMBIA | MALI | SYRIA |
| COMOROS | MALTA | TAIWAN |
| CONGO | MARSHALL ISLANDS | TAJIKISTAN |
| COOK ISLANDS | MARTINIQUE | TANZANIA |
| COSTA RICA | MAURITANIA | THAILAND |
| COTE DIVOIRE | MAURITIUS | THE DEMOCRATIC REPUBLIC OF CONGO |
| CROATIA | MAYOTTE | TIMOR LESTE |
| CUBA | MEXICO | TOGO |
| CURACAO | MICRONESIA | TOKELAU |
| CYPRUS | MOLDOVA | TONGA |
| CZECH REPUBLIC | MONACO | TRINIDAD AND TOBAGO |
| DENMARK | MONGOLIA | TUNISIA |
| DJIBOUTI | MONTENEGRO | TURKEY |
| DOMINICA | MONTSERRAT | TURKMENISTAN |
| DOMINICAN REPUBLIC | MOROCCO | TURKS AND CAICOS ISLANDS |
| ECUADOR | MOZAMBIQUE | TUVALU |
| EGYPT | MYANMAR | UGANDA |
| EL SALVADOR | NAMIBIA | UKRAINE |
| EQUATORIAL GUINEA | NAURU | UNITED ARAB EMIRATES |
| ERITREA | NEPAL | UNITED KINGDOM |
| ESTONIA | NETHERLANDS | UNITED STATES |
| ETHIOPIA | NETHERLANDS ANTILLES | UNITED STATES MINOR OUTLYING ISLANDS |
| FALKLAND ISLANDS | NEW CALEDONIA | URUGUAY |
| FAROE ISLANDS | NEW ZEALAND | US VIRGIN ISLANDS |
| FIJI | NICARAGUA | UZBEKISTAN |
| FINLAND | NIGER | VANUATU |
| FRANCE | NIGERIA | VATICAN |
| FRENCH GUIANA | NIUE | VENEZUELA |
| FRENCH POLYNESIA | NORFOLK ISLAND | VIETNAM |
| FRENCH SOUTHERN TERRITORIES | NORTHERN MARIANA ISLANDS | WALLIS AND FUTUNA |
| GABON | NORTH KOREA | WESTERN SAHARA |
| GAMBIA | NORWAY | YEMEN |
| GEORGIA | OMAN | ZAMBIA |
| GERMANY | PAKISTAN | ZIMBABWE |
| GHANA | PALAU | |
Noise Category Options
| Noise category Types |
|---|
| Social Media Account Promotion |
| Financial News |
| Hate Speech |
| Promotions |
| Job Offers |
| Real Estate |
| Diet and Pharma |
| Free content |
| SEO and Scam |
Objects
| Name | Name |
|---|---|
| airplane | apple |
| backpack | barbecue |
| bed | bench |
| bicycle | bird |
| boat | book |
| bottle | bowl |
| bus | cake |
| can | car |
| cat | cell phone |
| chair | cigarette |
| clock | couch |
| cow | cup |
| dining table | dog |
| donut | elephant |
| fire hydrant | foreo espada |
| foreo iris | foreo issa |
| foreo luna | fork |
| hair drier | handbag |
| horse | keyboard |
| kite | knife |
| laptop | lipstick |
| microwave | motorcycle |
| mouse | oven |
| parking meter | payment terminal |
| person | pizza |
| potted plant | Red Bull can |
| refrigerator | sandwich |
| scissors | sink |
| skateboard | skis |
| snowboard | spoon |
| sports ball | stadium scoreboard |
| stop sign | suitcase |
| surfboard | teeth |
| tennis racket | tie |
| toaster | toilet |
| toothbrush | traffic light |
| train | truck |
| tv | umbrella |
| vase | wine glass |
Scenes
| Name | Name |
|---|---|
| airport terminal | amusement park |
| anechoic chamber | athletic field |
| atrium | auditorium |
| badminton court | balcony veranda |
| ballroom | bar |
| baseball field | basketball court |
| bathroom | batters box |
| bazaar indoor | beach |
| bedroom | boardwalk |
| boat deck | booth |
| bowling alley | boxing ring |
| bridge | building |
| bus interior | cafeteria |
| campsite | campus |
| canal | car interior |
| casino indoor | castle |
| cavern indoor | cemetery |
| chalet | church |
| church indoor | closet |
| coast | cockpit |
| conference center | conference room |
| construction site | corral |
| courthouse | courtroom |
| courtyard | covered bridge |
| creek | desert |
| desert sand | diner |
| discotheque | door |
| doorway | driveway |
| engine room | escalator |
| exhibition | field |
| fire station | fishpond |
| football field | forest |
| forest road | fountain |
| garage | garbage dump |
| gas station | general store indoor |
| golf course | gym |
| harbor | highway |
| home office | hospital |
| house | ice skating rink |
| igloo | indoor |
| industrial area | islet |
| jacuzzi | kitchen |
| lake | laundromat |
| library | lido deck |
| lighthouse | living room |
| lobby | locker room |
| market outdoor | marsh |
| martial arts gym | moat |
| mosque | mountain |
| movie theater | nursery |
| ocean | office |
| oilrig | open nature |
| orchard | outdoor |
| palace | panel table |
| pantry | park garden |
| parking indoor | parking outdoor |
| pavilion | phone booth |
| plaza | podium |
| pond | pub |
| racecourse | raceway |
| raft | restaurant |
| riding arena | ring |
| river | rostrum |
| ruin | runway |
| sandbox | sauna |
| school | server room |
| shopping mall | skatepark |
| ski | skyscraper |
| squash court | stable |
| stage | staircase |
| store | store outdoor |
| street | subway |
| subway interior | swamp |
| swimming pool | tennis |
| theater | tower |
| train interior | train station |
| valley | vegetable garden |
| village | vineyard |
| volcano | volleyball court |
| volleyball court indoor | waterfall |
| wave | wind farm |
| windmill | wine cellar bottle storage |
| winter nature | yard |
Family Status
| Name | Name |
|---|---|
| Married | Parents |
| Senior | Single |
Occupations
| Name | Name |
|---|---|
| Accountant | Actor |
| Ambulanceman | Architect |
| Artist/ Art | Author/Writer |
| Blogger | Celebrity |
| Comedian | Communication |
| Construction Worker | Consultant |
| Customer service | Designer |
| DJ | Engineer |
| Entrepreneur | Executive manager |
| Financial Analyst | Firefighter |
| Health worker | Hospitality |
| Human Resources Professional | Investor |
| IT professional | Journalist |
| Kitchen staff | Lawyer |
| Manufacturing | Marketing |
| Military | Model |
| Musician | Photographer |
| Policeman | Promoter |
| Public Service Worker | Publisher |
| Realtor | Sales |
| Scientist | Security officer |
| Social Media | Aid worker |
| Sportsperson | Student |
| Stylist | Teacher |
| Trainer/ Coach | Transportation |
| TV/radio host | |
Interests
| Name | Name |
|---|---|
| Advertising & Marketing | Animals |
| Apparel | Art |
| Automotives General | Boats & Watercraft |
| Business News | Business Services |
| Celebrities & Entertainment News | Classic Vehicles |
| Colleges & Universities | Comics & Animation |
| Computer Hardware | Consumer Electronics |
| Crafts | Discount & Outlet Stores |
| Employment | Face & Body Care |
| Family and Parenting | Fantasy Sports |
| Fashion world | Finance |
| Fitness & Health | Food & Drinks |
| Fun | Games & Puzzles |
| Gardening & Landscaping | General beauty |
| General Education | Global News |
| Government | Home Furnishings and Improvement |
| Humor | Legal |
| Literature/Books | Motorcycles |
| Movies | Music & Audio |
| Online/ Video Games | Outdoors |
| Primary & Secondary Schooling (K 12) | Programming |
| Real Estate | Science |
| Social Media | Sports |
| Toys | Travel |
| TV | Vehicles General |
| Weather | |
Devices
| Mapped Id |
|---|
| MOBILE_ANDROID |
| MOBILE_IOS |
| MOBILE_WINDOWS |
| MOBILE_BLACKBERRY |
| MOBILE_OTHER |
| TABLET_ANDROID |
| TABLET_IOS |
| TABLET_WINDOWS |
| TABLET_OTHER |
| PC |
| BOT |
| EXTERNAL_WEBSITE |
FAQ
How to get all new documents from a Talkwalker project?
The following command accesses data from a Talkwalker project. The data access takes place in real time and applies the rules defined in the given Talkwalker project.
curl -XGET 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/result?access_token=<access_token>'
Using a POST request instead of a GET request allows for additional filtering of the results, for example by setting a query.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/result?access_token=<access_token>'
-d '{"q":"<query>"}'
-H "Content-Type: application/json; charset=UTF-8"
What does an empty collector query mean?
It is possible to leave the collector_query
empty when creating a new collector:
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {}}'
-H "Content-Type: application/json; charset=UTF-8"
This collector will not collect any live data; instead, it only serves as a target for export tasks.
How to get all documents from a Talkwalker project for a certain time period in the past?
In order to access past results from a Talkwalker project, an export task is used, and the exported results are sent to a collector. First, an empty collector is created. Empty collectors do not collect data in real-time, so the newly created collector remains paused while receiving all data from the export task.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {}}'
-H "Content-Type: application/json; charset=UTF-8"
Once the collector is created, it is used as target of a project export task. Export tasks require a start date or timestamp and can take an optional stop date or timestamp.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"yyyy-MM-dd['T'HH:mm:ss[.SSS][XXX]]", "stop":"yyyy-MM-dd['T'HH:mm:ss[.SSS][XXX]]","target":"<collector_id>"}'
-H "Content-Type: application/json; charset=UTF-8"
An example can be found in the Streaming API documentation.
How to get past and live data for my project?
One collector can collect both past and live data by combining two approaches.
First, a new collector is created for the project in question:
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {"project" : <project_id>}}'
-H "Content-Type: application/json; charset=UTF-8"
This collector, from the moment of its creation, collects live data. Thus, the only part missing is the past data.
In order to access the past data, an export task is needed, which is given the newly created collector as target:
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"yyyy-MM-dd['T'HH:mm:ss[.SSS][XXX]]", "target":"<collector_id>"}'
-H "Content-Type: application/json; charset=UTF-8"
What to do when out of credits during an export task?
If all credits are consumed during an export task, the task is interrupted. All data exported up to that point is sent to the target collector, so the amount of exported data always matches the amount of consumed credits.
How to get an estimation for the number of results of an export task?
There is no way to estimate the number of results beforehand.
However, the cost of the full export task can be estimated by subdividing the complete time period into multiple chunks. Exporting a fraction of the whole first provides a rough picture of the dimension of the whole. Similarly, one can go chunk by chunk through a larger time frame and refine the estimate after each finished task.
Imagine an export task that is supposed to export result data for a period of two years:
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"2016-09-01", "stop":"2018-09-01","target":"<collector_id>"}'
-H "Content-Type: application/json; charset=UTF-8"
This can also be subdivided into 24 tasks of one month each:
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"2016-09-01", "stop":"2016-10-01","target":"<collector_id>"}'
-H "Content-Type: application/json; charset=UTF-8"
Still, the resulting estimation is never guaranteed to fully reflect the real outcome. Spikes in the data, for example, can’t be estimated this way.
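The chunking described above can be automated. The helper below is hypothetical: it splits a date range into month-sized `(start, stop)` pairs in the `yyyy-MM-dd` format the export endpoint accepts, one pair per export task.

```python
from datetime import date

def month_chunks(start, stop):
    # Yield (start, stop) ISO date pairs, one calendar month per chunk.
    chunks = []
    y, m = start.year, start.month
    while (y, m) < (stop.year, stop.month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        chunks.append((date(y, m, 1).isoformat(), date(ny, nm, 1).isoformat()))
        y, m = ny, nm
    return chunks

chunks = month_chunks(date(2016, 9, 1), date(2018, 9, 1))
print(len(chunks))   # 24 tasks of one month each
print(chunks[0])     # ('2016-09-01', '2016-10-01')
```

Each pair would then be used as the `start` and `stop` values of one export request.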
Is it possible to get results for multiple projects with one collector?
No, this is not possible. In order to receive results for multiple projects, one collector per project is needed.
How to limit the number of results of an export task?
When creating an export task, it is possible to add the limit
parameter to the body:
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"2016-09-01","target":"<collector_id>", "limit":1000}'
-H "Content-Type: application/json; charset=UTF-8"
If there are 1000 results or fewer, the export task succeeds.
Otherwise, it fails with status `result_limit_reached`; the first 1000 results are written to the collector and 1000 credits are consumed.
How to get the results of a stream from a certain point onward?
When creating a collector, a stream can be connected to it. All data found by that stream from this point on is collected by the collector.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>'
-d '{"rules" : [{"rule_id" : "<some_rule_id>", "query":"<some_query>"}]}'
-H "Content-Type: application/json; charset=UTF-8"
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {"streams" : ["<stream_id>"]}}'
-H "Content-Type: application/json; charset=UTF-8"
When displaying the results, there are two types of chunks: CT_RESULT, containing the matching documents, and CT_CONTROL, containing control information.
{
"chunk_type": "CT_CONTROL",
"chunk_control": {
"connection_id": "#pch41wmpsxsh#",
"resume_offset": "<resume_token>",
"collector_id": "<collector_id>"
}
}
Among others, each CT_CONTROL chunk contains an offset value `resume_offset`, which can be used as a request parameter.
The result access will then start at the beginning of the slice with that resume token and return all results from that point on.
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=<resume_token>'
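A client typically tracks the latest offset while reading the stream, so it can reconnect without losing results. The sketch below scans a list of already-parsed chunks for the most recent `resume_offset`; the sample chunks are illustrative, with field names taken from the CT_CONTROL example above.

```python
import json

# Two sample chunks: one result chunk, one control chunk.
chunks = [
    json.loads('{"chunk_type": "CT_RESULT", "chunk_result": {}}'),
    json.loads('{"chunk_type": "CT_CONTROL", "chunk_control": '
               '{"connection_id": "#pch41wmpsxsh#", "resume_offset": "abc123"}}'),
]

resume_offset = None
for chunk in chunks:
    if chunk["chunk_type"] == "CT_CONTROL":
        # Persist the latest offset; pass it as &resume_offset=... on reconnect.
        resume_offset = chunk["chunk_control"]["resume_offset"]

print(resume_offset)
```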
How to get only certain topics from a project?
The following command creates a collector that returns in real time all documents from a list of topics of a Talkwalker project.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {"project_topics" : {"project": <project_id>, "topics":["<topic_id>"]}}}'
-H "Content-Type: application/json; charset=UTF-8"
Topics that were removed from the project but included in the request are not taken into consideration.
How to get the IDs of Talkwalker topics?
To get a list of the topics defined in a Talkwalker project use the project_id
and the access_token
on the https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources
endpoint.
Optionally, the filter type
can be set if we want to obtain only search-topics: type=search
curl -XGET 'https://api.talkwalker.com/api/v2/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search'
The result, using the above filter, has the form:
{
"status_code" : "0",
"status_message" : "OK",
"request" : "GET /api/v3/talkwalker/p/<project_id>/resources?access_token=<access_token>&type=search",
"result_resources" : {
"projects" : [ {
"id" : "<project_id>",
"title" : "Air France",
"topics" : [ {
"id" : "2p1nevfo_121244b12ade",
"title" : "Category 1",
"nodes" : [ {
"id" : "l9gb1vj7_9utd4cawszq7",
"title" : "topic 1"
}, {
"id" : "g8wf5sd4_8svs0cfghje8",
"title" : "topic 2"
} ]
}, {
"id" : "kj241kj4_h214jhv21l2a",
"title" : "Category 2",
"nodes" : [ {
"id" : "w6fc8sf4_4fds6hdgsjd1",
"title" : "topic 1"
} ]
} ]
} ]
}
}
To get results for all projects in 'search', use `search` as topic ID.
To use a single topic, use the ID of the topic (for example `w6fc8sf4_4fds6hdgsjd1` for topic 1 of category 2).
How to eliminate comments from a stream?
To remove comments and retrieve only the original documents, add `-is:comment` to the rules of a stream.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>'
-d '{"rules" : [{"rule_id" : "<rule_id>", "query":"-is:comment"}]}'
-H "Content-Type: application/json; charset=UTF-8"
Another possibility is to use `-is:comment` as a query parameter when reading results from a stream.
curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>&q=-is:comment'
How to get past documents of a Talkwalker project that include special keywords?
First, an empty collector is created.
curl -XPUT 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>?access_token=<access_token>'
-d '{"collector_query" : {}}'
-H "Content-Type: application/json; charset=UTF-8"
This collector is set as target of a project export task.
curl -XPOST 'https://api.talkwalker.com/api/v3/stream/p/<project_id>/export?access_token=<access_token>'
-d '{"start":"<date>", "stop":"<date>","target":"<collector_id>", "query":"keyword-1 AND keyword-2"}'
-H "Content-Type: application/json; charset=UTF-8"
The above export task sends to the collector all documents that were published in the given timeframe and match the query. The query in this example requires documents to include two keywords.
How to use a single stream for multiple applications / clients?
In order to use one single stream to retrieve data for more than one application/client, set one separate rule per application.
curl -XPUT https://api.talkwalker.com/api/v3/stream/s/<stream_id>?access_token=<access_token>
-d '{"rules":[{"rule_id" : "rule-app-1", "query" : "<query>"},{"rule_id" : "rule-app-2", "query" : "<query>"}]}'
After creating this stream, we receive in real time the results that match either query.
curl -XGET 'https://api.talkwalker.com/api/v3/stream/s/<stream_id>/results?access_token=<access_token>'
The returned results are in the format below.
The documents can be separated using the `matched` field, which indicates which rule a result belongs to and thus which application is concerned.
{
"chunk_type" : "CT_RESULT",
"chunk_result" : {
"data" : {
"data" : { <default result data (see simple search)> },
"highlighted_data" : [ {
"matched" : {
"rule_id" : "rule-app-1",
"stream_id" : "<stream_id>"
},
"title_snippet" : "<title_snippet_for_rule>",
"content_snippet" : "<content_snippet_for_rule>"
} ]
}
}
}
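The routing logic can be sketched as below: results are dispatched to per-application buckets based on the matched `rule_id`. The chunk structure follows the format above; the sample documents themselves are hypothetical.

```python
from collections import defaultdict

# Two sample CT_RESULT chunks, one per rule, mimicking the format shown above.
chunks = [
    {"chunk_type": "CT_RESULT",
     "chunk_result": {"data": {"data": {"title": "doc A"},
                               "highlighted_data": [{"matched": {"rule_id": "rule-app-1"}}]}}},
    {"chunk_type": "CT_RESULT",
     "chunk_result": {"data": {"data": {"title": "doc B"},
                               "highlighted_data": [{"matched": {"rule_id": "rule-app-2"}}]}}},
]

by_app = defaultdict(list)
for chunk in chunks:
    if chunk["chunk_type"] != "CT_RESULT":
        continue  # skip CT_CONTROL chunks
    result = chunk["chunk_result"]["data"]
    # One result may match several rules; route it to each matching application.
    for hit in result["highlighted_data"]:
        by_app[hit["matched"]["rule_id"]].append(result["data"]["title"])

print(dict(by_app))
```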
How to get the number of results grouped by media types?
The Talkwalker API provides only documents and histograms. To group results into custom sets, you have to get all the results and then compute those sets locally. Alternatively you can perform separate searches (or histograms) for each of the groups you want to create (use the Talkwalker query syntax to restrict the results to those matching a single group).
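Computing such sets locally is a one-liner once the results are fetched. The sketch below counts results per media type with `collections.Counter`; the sample documents are hypothetical, and the `post_type` field name is an assumption based on typical Talkwalker result documents.

```python
from collections import Counter

# Hypothetical sample of fetched result documents.
results = [
    {"title": "a", "post_type": "NEWS"},
    {"title": "b", "post_type": "BLOG"},
    {"title": "c", "post_type": "NEWS"},
]

# Group locally: one pass over the fetched results.
counts = Counter(doc["post_type"] for doc in results)
print(counts)
```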
Code Examples
Streaming Client Examples
PHP
This example requires the PHP cURL extension and PHP 5.5 or later.
<?php
class TalkwalkerApiStreamingV3ClientExample {
const READ_BEFORE_TERMINATE = 3;
private $url;
private $token;
private $streamId;
private $collectorId;
private $resumeOffset;
private $ruleId;
private $finished = false;
private $unprocessedData = '';
private $headerSize = -1;
private $header = '';
private $waitForRetry = 0;
private $errorData = '';
public function __construct($url, $token, $streamId, $collectorId, $resumeOffset, $ruleId) {
$this->url = $url;
$this->token = $token;
$this->streamId = $streamId;
$this->collectorId = $collectorId;
$this->resumeOffset = $resumeOffset;
$this->ruleId = $ruleId;
}
public function run() {
$this->deleteStream();
$this->deleteCollector();
$this->createStream();
$this->createCollectorWithStream();
while (!$this->finished) {
$this->unprocessedData = '';
$this->errorData = '';
$this->headerSize = -1;
$this->header = '';
$url = $this->url . '/v3/stream/c/' . $this->collectorId . '/results?access_token='
. $this->token;
if (!empty($this->resumeOffset)) {
$url .= '&resume_offset=' . $this->resumeOffset;
}
$ch = curl_init($url);
$this->setDefaultCurlOptions($ch);
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$this, "readStream"]);
curl_exec($ch);
$httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if (curl_errno($ch) == 0 && $httpStatus == 200) {
$this->finished = true;
}
// else: error occurred
if ($httpStatus > 0 && $httpStatus != 200) {
$this->onStatusError($this->errorData);
}
curl_close($ch);
if (!$this->finished) {
if ($this->waitForRetry > 0) {
echo "SERVICE UNAVAILABLE \n";
echo "WAITING " . $this->waitForRetry . "s UNTIL RETRYING\n";
sleep($this->waitForRetry);
$this->waitForRetry = 0;
} else {
sleep(60);
}
}
}
$this->deleteCollector();
$this->deleteStream();
}
private function readStream($ch, $data) {
$resultsToRead = TalkwalkerApiStreamingV3ClientExample::READ_BEFORE_TERMINATE;
$httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$this->unprocessedData = $this->unprocessedData . $data;
// read the header when it is complete
if ($this->headerSize < $headerSize) {
$this->headerSize = $headerSize;
$headerComplete = false;
} else {
$headerComplete = true;
}
$partialHeader = substr($this->unprocessedData, 0, $headerSize);
if ($headerComplete && $this->header == '') {
$this->header = substr($this->unprocessedData, 0, $headerSize);
$this->unprocessedData = substr($this->unprocessedData, $headerSize);
}
if ($headerComplete && $httpStatus == 200) {
$arrayData = explode("\r\n", $this->unprocessedData);
$this->unprocessedData = '';
$count = count($arrayData);
for ($i = 0; $i < $count; $i++) {
$line = $arrayData[$i];
// try parse json
if (strlen($line) > 0) {
$json = json_decode($line);
if ($json == NULL) {
// put it back only if last element
if ($i == $count - 1) {
$this->unprocessedData = $line;
} else {
$this->finished = true;
$this->handleParseError($line);
return 0;
}
} else {
if (isset($json->chunk_type)) {
switch ($json->chunk_type) {
case "CT_ERROR":
$this->handleStreamError($json->chunk_error);
break;
case "CT_CONTROL":
if (isset($json->chunk_control)) {
$this->resumeOffset = $json->chunk_control->resume_offset;
}
$this->handleStreamControl($json->chunk_control);
break;
case "CT_RESULT":
$this->handleStreamResult($json->chunk_result);
$resultsToRead--;
if ($resultsToRead == 0) {
$this->finished = true;
return 0;
}
break;
default:
$this->unhandledStreamChunk($json);
break;
}
} else {
$this->unhandledStreamChunk($json);
break;
}
}
}
}
} elseif ($httpStatus == 503) {
$headerArray = $this->parseHeader($partialHeader);
if (array_key_exists('Retry-After', $headerArray)) {
$this->waitForRetry = $headerArray['Retry-After'];
}
} else {
$this->errorData = $this->errorData . $data;
}
return strlen($data);
}
private function createStream() {
echo "CREATING STREAM\n";
$url = $this->url . '/v3/stream/s/' . $this->streamId . '?access_token=' . $this->token;
if ($this->performRequest($url, [
CURLOPT_CUSTOMREQUEST => 'PUT',
CURLOPT_POSTFIELDS => json_encode([
'rules' => [
[
'rule_id' => $this->ruleId,
'query' => 'published:>0'
]
]
])
])) {
echo 'CREATED STREAM: ' . $this->streamId . "\n";
}
}
private function createCollectorWithStream() {
echo "CREATING COLLECTOR\n";
$url = $this->url . '/v3/stream/c/' . $this->collectorId . '?access_token=' . $this->token;
if ($this->performRequest($url, [
CURLOPT_CUSTOMREQUEST => 'PUT',
CURLOPT_POSTFIELDS => json_encode([
'collector_query' => [
'streams' => [$this->streamId]
]
])
])) {
echo 'CREATED COLLECTOR: ' . $this->collectorId . "\n";
}
}
private function deleteStream() {
$url = $this->url . '/v3/stream/s/' . $this->streamId . '?access_token=' . $this->token;
if ($this->performRequest($url, [
CURLOPT_CUSTOMREQUEST => 'DELETE'
])) {
echo 'DELETED STREAM: ' . $this->streamId . "\n";
}
}
private function deleteCollector() {
$url = $this->url . '/v3/stream/c/' . $this->collectorId . '?access_token=' . $this->token;
if ($this->performRequest($url, [
CURLOPT_CUSTOMREQUEST => 'DELETE'
])) {
echo 'DELETED COLLECTOR: ' . $this->collectorId . "\n";
}
}
private function performRequest($url, $curlOptions) {
$ch = curl_init($url);
$this->setDefaultCurlOptions($ch);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt_array($ch, $curlOptions);
$result = curl_exec($ch);
$answer = json_decode($this->getPayload($ch, $result));
curl_close($ch);
if ($answer != null && $answer->status_code != '0') {
echo 'ERROR: ' . $answer->status_message . "\n";
return false;
}
return true;
}
private function getPayload($ch, $data) {
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
return trim(substr($data, $headerSize));
}
private function parseHeader($header) {
$headers = [];
foreach (explode("\r\n", $header) as $i => $line) {
if ($i === 0) {
$headers['http_code'] = $line;
} else {
if ($line != '') {
list ($key, $value) = explode(': ', $line);
$headers[$key] = $value;
}
}
}
return $headers;
}
private function onStatusError($str) {
echo "START ERROR \n{$str}\n";
}
private function handleParseError($str) {
echo "Could not parse '{$str}'\n";
}
private function handleStreamError($err) {
echo "ERROR\n";
print_r($err);
}
private function handleStreamControl($ctrl) {
echo 'CONTROL: ' . json_encode($ctrl) . "\n";
echo 'UPDATED RESUME_OFFSET TO ' . $this->resumeOffset . "\n";
}
private function handleStreamResult($res) {
$url = empty($res->data->data->url)
? 'URL not available. Provider: ' . $res->data->data->external_provider
: $res->data->data->url;
echo "RESULT: $url\n";
}
private function unhandledStreamChunk($json) {
echo "UNHANDLED\n";
print_r($json);
}
private function setDefaultCurlOptions($ch) {
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_TIMEOUT, 90);
curl_setopt($ch, CURLOPT_FAILONERROR, false);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'PhpExampleClient/1.0.0');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Cache-Control: no-cache', 'Pragma: no-cache',
'Content-Language: en-US', 'Content-Type: application/json']);
}
}
echo "START\n";
$options = getopt('u:t:s:c:o:r:');
$url = $options['u'];
$token = $options['t'];
$streamId = $options['s'];
$collectorId = $options['c'];
$resumeOffset = $options['o'];
$ruleId = $options['r'];
$example = new TalkwalkerApiStreamingV3ClientExample($url, $token, $streamId, $collectorId,
$resumeOffset, $ruleId);
$example->run();
echo "DONE\n";
Java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import org.apache.commons.io.IOUtils;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;
/**
 * Example streaming client for the Talkwalker API v3 (streams and collectors).
 */
public class TalkwalkerApiStreamingV3ClientExample {
private final String url;
private final String token;
private final String streamId;
private final String collectorId;
private String resumeOffset;
private final String ruleId;
private static final int READ_BEFORE_TERMINATE = 3;
TalkwalkerApiStreamingV3ClientExample(String url, String token, String streamId,
String collectorId, String resumeOffset,
String ruleId) {
this.url = url;
this.token = token;
this.streamId = streamId;
this.collectorId = collectorId;
this.resumeOffset = resumeOffset;
this.ruleId = ruleId;
}
public void run() throws InterruptedException, IOException {
try {
delete("/v3/stream/c/", collectorId);
} catch (FileNotFoundException e) {
System.out.println("COLLECTOR " + collectorId + " NOT FOUND, NO DELETION NEEDED");
}
try {
delete("/v3/stream/s/", streamId);
} catch (FileNotFoundException e) {
System.out.println("STREAM " + streamId + " NOT FOUND, NO DELETION NEEDED");
}
createStream();
createCollectorWithStream();
boolean finished = false;
while (!finished) {
try {
String _url = url + "/v3/stream/c/" + collectorId + "/results?access_token=" + token;
_url = resumeOffset == null || resumeOffset.isEmpty() ? _url : _url + "&resume_offset="
+ resumeOffset;
URLConnection connection = createConnection(_url);
HttpURLConnection httpConnection = (HttpURLConnection) connection;
httpConnection.setRequestMethod("GET");
httpConnection.setRequestProperty("User-Agent", "JavaExampleClient/1.0.0");
httpConnection.setRequestProperty("Accept-Encoding", "gzip");
httpConnection.connect();
int httpCode = httpConnection.getResponseCode();
// getting the correct input stream
if (httpCode == 200) {
try (InputStream is = httpConnection.getInputStream()) {
try {
finished = readStream(httpConnection, is);
} catch (IOException ioe) {
//stream or connection was interrupted, retry with next iteration
}
}
} else if (httpCode == 503) {
// the service is currently unavailable
int secondsToWait = httpConnection.getHeaderFieldInt("Retry-After", 60);
System.out.println("TEMPORARILY UNAVAILABLE");
System.out.println("WAITING " + secondsToWait + "s UNTIL RETRYING");
Thread.sleep(secondsToWait * 1000);
} else {
// when encountering an error, we exit loop
try (InputStream is = httpConnection.getErrorStream()) {
readError(httpConnection, is);
} catch (IOException e) {
e.printStackTrace();
} finally {
finished = true;
}
}
} catch (IOException ex) {
// try again
ex.printStackTrace();
// sleep a minute
Thread.sleep(60 * 1000);
}
}
delete("/v3/stream/c/", collectorId);
delete("/v3/stream/s/", streamId);
}
private void readError(HttpURLConnection httpConnection, InputStream errorInputStream)
throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] dataBuf = new byte[1024 * 1024];
// read answer
while (true) {
int read = errorInputStream.read(dataBuf, 0, dataBuf.length);
if (read == -1) {
break;
}
bos.write(dataBuf, 0, read);
}
InputStream is = new ByteArrayInputStream(bos.toByteArray());
if ((httpConnection.getContentEncoding() != null) && (httpConnection.getContentEncoding()
.equals("gzip"))) {
is = new GZIPInputStream(is);
}
// read json using jackson json (another library may be used here)
JsonFactory factory = new JsonFactory();
ObjectMapper mapper = new ObjectMapper(factory);
TypeReference<HashMap<String, Object>> typeRef = new TypeReference<HashMap<String, Object>>() {
};
mapper.readValue(is, typeRef);
}
private boolean readStream(HttpURLConnection httpConnection, InputStream inputStream)
throws IOException {
// reading the stream and invoking the listener
InputStream is = inputStream;
if ((httpConnection.getContentEncoding() != null) && (httpConnection.getContentEncoding()
.equals("gzip"))) {
is = new GZIPInputStream(is);
}
int resultsToRead = READ_BEFORE_TERMINATE;
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"),
100);
String line;
while ((line = reader.readLine()) != null) {
if (line.isEmpty()) {
continue;
}
// parse json (use an available json parser)
JsonFactory factory = new JsonFactory();
ObjectMapper mapper = new ObjectMapper(factory);
TypeReference<HashMap<String, Object>> typeRef =
new TypeReference<HashMap<String, Object>>() {
};
HashMap<String, Object> o = mapper.readValue(line, typeRef);
Object oType = o.get("chunk_type");
if (oType != null && oType instanceof String) {
String type = (String) oType;
switch (type) {
case "CT_ERROR":
Map<String, Object> errorChunk = getAsMap(o, "chunk_error");
handleStreamError(errorChunk);
break;
case "CT_CONTROL":
Map<String, Object> controlChunk = getAsMap(o, "chunk_control");
if (controlChunk != null) {
resumeOffset = getAsT(controlChunk, "resume_offset", String.class);
}
handleStreamControl(controlChunk);
break;
case "CT_RESULT":
Map<String, Object> resultChunk = getAsMap(o, "chunk_result");
handleStreamResult(resultChunk);
resultsToRead--;
if (resultsToRead == 0) {
return true;
}
break;
default:
unhandledStreamChunk(o);
break;
}
} else {
unhandledStreamChunk(o);
}
}
return false;
}
static Map<String, Object> getAsMap(Map<String, Object> o, String key) {
if (o != null) {
Object oRet = o.get(key);
if (oRet != null && oRet instanceof Map) {
return (Map<String, Object>) oRet;
}
}
return null;
}
static <T> T getAsT(Map<String, Object> o, String key, Class<T> clz) {
if (o != null) {
Object oRet = o.get(key);
if (oRet != null && clz.isInstance(oRet)) {
return (T) oRet;
}
}
return null;
}
private void handleStreamError(Map<String, Object> errorChunk) {
System.out.println("ERROR: " + errorChunk);
}
protected void handleStreamControl(Map<String, Object> controlChunk) {
System.out.println("CONTROL: " + controlChunk);
System.out.println("UPDATED RESUME_OFFSET TO " + resumeOffset);
}
protected void handleStreamResult(Map<String, Object> resultChunk) {
Map<String, Object> resultData = getAsMap(resultChunk, "data");
Map<String, Object> entryData = getAsMap(resultData, "data");
String url = getAsT(entryData, "url", String.class) == null
? "URL not available. Provider: " + getAsT(entryData, "external_provider", String.class)
: getAsT(entryData, "url", String.class);
System.out.println("RESULT: " + url);
}
private void unhandledStreamChunk(Map<String, Object> unhandledChunk) {
System.out.println("UNHANDLED: " + unhandledChunk);
}
private void createStream() throws IOException {
System.out.println("CREATING " + streamId);
String _url = url + "/v3/stream/s/" + streamId + "?access_token=" + token;
URLConnection connection = createConnection(_url);
HttpURLConnection httpConnection = (HttpURLConnection) connection;
httpConnection.setRequestMethod("PUT");
httpConnection.setRequestProperty("User-Agent", "JavaExampleClient/1.0.0");
httpConnection.setRequestProperty("charset", "utf-8");
httpConnection.setDoOutput(true);
httpConnection.setDoInput(true);
DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
JsonNodeFactory factory = JsonNodeFactory.instance;
ObjectNode arrayEntry = factory.objectNode().put("rule_id", ruleId)
.put("query", "published:>0");
ObjectNode on = factory.objectNode();
on.putArray("rules").add(arrayEntry);
System.out.println(on.toString());
wr.writeBytes(on.toString());
wr.flush();
wr.close();
httpConnection.connect();
int httpCode = httpConnection.getResponseCode();
if (httpCode == 200) {
System.out.println("CREATED " + streamId);
} else {
System.out.println("ERROR");
System.out.println(IOUtils.toString(httpConnection.getErrorStream(), "UTF-8"));
}
}
protected void createCollectorWithStream() throws IOException {
System.out.println("CREATING " + collectorId);
String _url = url + "/v3/stream/c/" + collectorId + "?access_token=" + token;
URLConnection connection = createConnection(_url);
HttpURLConnection httpConnection = (HttpURLConnection) connection;
httpConnection.setRequestMethod("PUT");
httpConnection.setRequestProperty("User-Agent", "JavaExampleClient/1.0.0");
httpConnection.setRequestProperty("charset", "utf-8");
httpConnection.setDoOutput(true);
httpConnection.setDoInput(true);
DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
JsonNodeFactory factory = JsonNodeFactory.instance;
ObjectNode on = factory.objectNode();
on.putObject("collector_query").putArray("streams").add(streamId);
System.out.println(on.toString());
wr.writeBytes(on.toString());
wr.flush();
wr.close();
httpConnection.connect();
int httpCode = httpConnection.getResponseCode();
if (httpCode == 200) {
System.out.println("CREATED " + collectorId);
} else {
System.out.println("ERROR");
System.out.println(IOUtils.toString(httpConnection.getErrorStream(), "UTF-8"));
}
}
private void delete(String path, String id) throws IOException {
String _url = url + path + id + "?access_token=" + token;
URLConnection connection = createConnection(_url);
HttpURLConnection httpConnection = (HttpURLConnection) connection;
httpConnection.setRequestMethod("DELETE");
httpConnection.setRequestProperty("User-Agent", "JavaExampleClient/1.0.0");
httpConnection.setRequestProperty("charset", "utf-8");
httpConnection.setDoOutput(true);
httpConnection.setDoInput(true);
httpConnection.connect();
int httpCode = httpConnection.getResponseCode();
if (httpCode == 200) {
System.out.println("DELETED " + id);
} else {
System.out.println(IOUtils.toString(httpConnection.getErrorStream(), "UTF-8"));
}
}
private URLConnection createConnection(String url) throws IOException {
URL request = new URL(url);
URLConnection connection = request.openConnection();
connection.setConnectTimeout(30000);
connection.setReadTimeout(90000);
connection.setUseCaches(false);
connection.setRequestProperty("Content-Language", "en-US");
return connection;
}
}
API Examples for Postman
The following file is a Postman collection containing example API calls. It can be imported directly into Postman to facilitate testing.
After importing the file, three variables need to be created: token (mandatory for all calls), project_id, and topic_id.
{
"info": {
"_postman_id": "8f0f8295-8e3d-4949-8c05-23ecb51e7c7c",
"name": "Talkwalker API calls examples",
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
},
"item": [
{
"name": "Remaining credits",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/status/credits?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"status",
"credits"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Search API",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/results?access_token={{token}}&q=Cats&sort_by=trending_score&sort_order=desc",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"results"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "q",
"value": "Cats"
},
{
"key": "sort_by",
"value": "trending_score"
},
{
"key": "sort_order",
"value": "desc"
}
]
}
},
"response": []
},
{
"name": "Histogram API - published over specific time window",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/histogram/published?access_token={{token}}&q=Cats&min=1617804000000&max=1617890400000",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"histogram",
"published"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "q",
"value": "Cats"
},
{
"key": "min",
"value": "1617804000000"
},
{
"key": "max",
"value": "1617890400000"
}
]
}
},
"response": []
},
{
"name": "Histogram API - engagement",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/histogram/engagement?access_token={{token}}&q=Cats",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"histogram",
"engagement"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "q",
"value": "Cats"
}
]
}
},
"response": []
},
{
"name": "Histogram API - published with a breakdown over sentiment",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/histogram/published?access_token={{token}}&q=Cats&breakdown=sentiment",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"histogram",
"published"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "q",
"value": "Cats"
},
{
"key": "breakdown",
"value": "sentiment"
}
]
}
},
"response": []
},
{
"name": "Histogram API - Top N distribution",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/histogram/language?access_token={{token}}&q=Cats&top_n=3",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"histogram",
"language"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "q",
"value": "Cats"
},
{
"key": "top_n",
"value": "3"
}
]
}
},
"response": []
},
{
"name": "List of resources of project",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v2/talkwalker/p/{{project_id}}/resources?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v2",
"talkwalker",
"p",
"{{project_id}}",
"resources"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "List of all projects linked to an API application",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/info?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"info"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Search API - Results limited by topic",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v1/search/p/{{project_id}}/results?access_token={{token}}&topic={{topic_id}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v1",
"search",
"p",
"{{project_id}}",
"results"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "topic",
"value": "{{topic_id}}"
}
]
}
},
"response": []
},
{
"name": "List of all tag IDs",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v2/talkwalker/p/{{project_id}}/tags?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v2",
"talkwalker",
"p",
"{{project_id}}",
"tags"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "List of all streams and collectors created",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/info?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"info"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Create a stream",
"request": {
"method": "PUT",
"header": [],
"body": {
"mode": "raw",
"raw": "{\"rules\" : [{\"rule_id\" : \"rule-1\", \"query\":\"Cats\"}]}"
},
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/s/stream-1?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"s",
"stream-1"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Create a collector",
"request": {
"method": "PUT",
"header": [],
"body": {
"mode": "raw",
"raw": "{\"collector_query\" : {\"streams\" : [\"stream-1\"], \"queries\" : [{\"id\" : \"q-1\", \"query\" : \"lang:en\"}]}}"
},
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"c",
"collector-1"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Retrieve the definition of a collector",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/c/collector-1?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"c",
"collector-1"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Downloading the results of a collector",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/c/collector-1/results?access_token={{token}}&end_behaviour=stop",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"c",
"collector-1",
"results"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
},
{
"key": "end_behaviour",
"value": "stop"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Pause a collector",
"request": {
"method": "POST",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/c/collector-1/pause?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"c",
"collector-1",
"pause"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Resume a collector",
"request": {
"method": "POST",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/c/collector-1/resume?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"c",
"collector-1",
"resume"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Create an export task",
"request": {
"method": "POST",
"header": [],
"body": {
"mode": "raw",
"raw": "{\"start\": \"2021-04-08\", \"stop\": \"2021-04-09\", \"target\":\"collector-1\", \"topics\":[\"topic_id\"]}"
},
"url": {
"raw": "https://api.talkwalker.com/api/v3/stream/p/{{project_id}}/export?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"stream",
"p",
"{{project_id}}",
"export"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Status of an export",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/tasks/export/<task_ID>?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"tasks",
"export",
"<task_ID>"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - List of recent task IDs",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/tasks/export?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"tasks",
"export"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Streaming API - Abort a task",
"request": {
"method": "DELETE",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v3/tasks/export/<task_ID>?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v3",
"tasks",
"export",
"<task_ID>"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "List of dashboards created in the project",
"request": {
"method": "GET",
"header": [],
"url": {
"raw": "https://api.talkwalker.com/api/v2/talkwalker/p/{{project_id}}/views?access_token={{token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v2",
"talkwalker",
"p",
"{{project_id}}",
"views"
],
"query": [
{
"key": "access_token",
"value": "{{token}}"
}
]
}
},
"response": []
},
{
"name": "Add new topic",
"request": {
"method": "POST",
"header": [],
"body": {
"mode": "raw",
"raw": "{\r\n \"topic_type\": \"search\",\r\n \"topic_line_import\": [\r\n {\r\n \"topic_id\": \"\",\r\n \"topic_title\": \"my new topic\",\r\n \"category_title\": \"\",\r\n \"category_id\": \"default\",\r\n \"override\": false,\r\n \"query\": \"Cats\",\r\n \"included_query\": true\r\n }\r\n ]\r\n}",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "https://api.talkwalker.com/api/v2/talkwalker/p/{{project_id}}/topics/import?access_token={{access_token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v2",
"talkwalker",
"p",
"{{project_id}}",
"topics",
"import"
],
"query": [
{
"key": "access_token",
"value": "{{access_token}}"
}
]
}
},
"response": []
},
{
"name": "Edit existing topic",
"request": {
"method": "POST",
"header": [],
"body": {
"mode": "raw",
"raw": "{\r\n \"topic_type\": \"search\",\r\n \"topic_line_import\": [\r\n {\r\n \"topic_id\": \"<topic_id>\",\r\n \"topic_title\": \"changed topic name\",\r\n \"category_title\": \"\",\r\n \"category_id\": \"default\",\r\n \"override\": true,\r\n \"query\": \"changed query\",\r\n \"included_query\": true\r\n },\r\n {\r\n \"topic_id\": \"<topic_id>\",\r\n \"topic_title\": \"changed topic name\",\r\n \"category_title\": \"\",\r\n \"category_id\": \"default\",\r\n \"override\": true,\r\n \"query\": \"exclude these words\",\r\n \"included_query\": false\r\n }\r\n ]\r\n}",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "https://api.talkwalker.com/api/v2/talkwalker/p/{{project_id}}/topics/import?access_token={{access_token}}",
"protocol": "https",
"host": [
"api",
"talkwalker",
"com"
],
"path": [
"api",
"v2",
"talkwalker",
"p",
"{{project_id}}",
"topics",
"import"
],
"query": [
{
"key": "access_token",
"value": "{{access_token}}"
}
]
}
},
"response": []
}
],
"event": [
{
"listen": "prerequest",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
},
{
"listen": "test",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
}
]
}
Troubleshooting
Error Codes
| http code | status code | message | description |
|---|---|---|---|
| | | OK | Default answer. |
| | | Internal Server Error | An unexpected exception was encountered. |
| | | Search Execution Exception | An unexpected exception was encountered, related to the search. |
| | | Parameter Missing | Required parameters are missing. The missing parameters are provided in the key 'params'. |
| | | Error in query | Could not parse the query. The details can be found under 'details'. |
| | | Invalid parameter value | A parameter has an unacceptable value. The parameter is listed under 'param' and the details under 'details'. |
| | | Invalid, missing or inactive access token | The access token is either missing or the provided value is invalid. |
| | | Call limit exceeded for this endpoint | The called endpoint has a limited call frequency; the values should be cached by the client. |
| | | No credits left | The account ran out of credits. |
| | | API application is inactive | The API account is inactive. 'appId' gives the ID of that account. |
| | | No such application linked | The provided ID is not linked in the API to any project or application. |
| | | Linked application inactive or deleted | The linked application is inactive or deleted. |
| | | Access denied: Insufficient access rights | The used access token does not have sufficient access rights. 'rights_req' lists the required access rights; 'rights_got' lists the access rights provided by that access token. |
| | | Wrong stream id. No such stream defined | A non-existing stream was accessed. |
| | | Invalid operation on document | The search document modification operation is not supported. 'reason' and 'details' provide more information. |
| | | Could not parse json | The JSON passed via POST could not be properly interpreted (it was not in the expected format). |
| | | Invalid operation on stream | Modifying a stream failed. See 'reason' for details. |
| | | Number of rules to set exceeds maximum number of rules | Exceeded the maximum allowed rules for this API account. 'number_max' is the limit, 'number_available' the number that can still be saved, and 'number_saving' the number that was attempted. |
| | | Cannot create any more streams | Exceeded the maximum number of streams ('number_max'). |
| | | A stream with this name already exists | The stream 'stream_id' is already defined. |
| | | Number of sources to set exceeds maximum number of sources | Exceeded the maximum allowed sources (whitelist or blacklist) for this API account. 'number_max' is the limit, 'number_available' the number that can still be saved, and 'number_saving' the number that was attempted. |
| | | Stream has no rules defined | Raised when trying to stream from a stream that has no rule defined. |
| | | Stream got disconnected because newer stream running | A new stream (same 'stream_id') was connected, so the old stream is disconnected. |
| | | Stream got disconnected | The stream was disconnected for the given reason. |
| | | Endpoint or action not found | The called endpoint was not found. |
| | | Connection is not secure, must use HTTPS | Authentication API endpoints must be called using HTTPS. |
| | | User was not found in this application | This user ID does not exist or is not linked to this project. |
| | | Access to this project is forbidden | This project cannot be accessed with the given access_token. |
| | | Limit of maximum concurrent streams reached | Too many streams running in parallel for this account. |
| | | Could not find rule with id | A rule with the given ID could not be found. |
| | | Could not find panel with id | A panel with the given ID could not be found. |
| | | Panel is still referenced | This panel could not be deleted; it is still used in a stream. |
| | | HTTP Version Not Supported | The Talkwalker Streaming API supports HTTP 1.1 or newer. |
| | | Url is malformed | The given URL for channel monitoring is malformed. |
| | | Could not execute action in Talkwalker | Error connecting to a Talkwalker project. |
| | | Access prohibited | Access prohibited due to access restriction settings. |
| | | Cannot create any more panels | The maximum number of panels has been reached. |
| | | Cannot find a project with this id | The project with this ID could not be found or is not accessible. |
| | | Request entity too large | Request entity too large. |
| | | Global search is disabled for this account | Global search is disabled for this account. |
| | | Some or all requests of this bulk request failed | Some or all requests of this bulk request failed. |
| | | Request entity too large | The sent PUT or POST request is too large (maximum 5120 kB). |
| | | Service Temporarily Unavailable | The Talkwalker servers are restarting due to an update. |
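A client can dispatch on these errors uniformly. The sketch below assumes only what the response examples in this document show: every response body carries a "status_code" (the string "0" on success, as in the search example above) and a "status_message". The exception class and helper names are illustrative, not part of any Talkwalker client library.

```python
# Minimal client-side error handling for Talkwalker API responses.
# Assumes every JSON body carries "status_code" ("0" on success) and
# "status_message"; error bodies may also carry keys such as 'params',
# 'details', 'reason', ... depending on the error (see the table above).

class TalkwalkerApiError(Exception):
    def __init__(self, status_code, status_message, body):
        super().__init__(f"{status_code}: {status_message}")
        self.status_code = status_code
        self.status_message = status_message
        self.body = body  # full payload, for inspecting 'params', 'details', ...

def check_response(body):
    """Return the body unchanged on success, raise TalkwalkerApiError otherwise."""
    if body.get("status_code") == "0":
        return body
    raise TalkwalkerApiError(body.get("status_code"),
                             body.get("status_message"), body)
```

Calling `check_response` on every decoded JSON body keeps error handling in one place; callers can then catch `TalkwalkerApiError` and inspect `body` for the error-specific keys listed in the table.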
Error Handling
Streaming API
Resuming an interrupted stream
A stream collector can be interrupted for several reasons: the configured maximum number of hits (max_hits) was reached, the end_behaviour was configured accordingly, no credits are left, server issues, or connection problems.
To resume a disconnected stream, set the parameter resume_offset to the resume_offset of the last CT_CONTROL chunk that was received. Streaming then continues from exactly that position.
{
"chunk_type": "CT_CONTROL",
"chunk_control": {
"connection_id": "<some_connection_id>",
"resume_offset": "<resume_offset>",
"collector_id": "<some_collector_id>"
}
}
curl -XGET 'https://api.talkwalker.com/api/v3/stream/c/<collector_id>/results?access_token=<access_token>&resume_offset=<resume_offset>'
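In practice this means a client should remember the resume_offset of the most recent CT_CONTROL chunk as it consumes the stream. The sketch below does exactly that, under the assumptions above; the helper functions and the sample chunk values are illustrative, not part of the Talkwalker client libraries.

```python
# Sketch of resuming an interrupted stream: remember the resume_offset
# from each CT_CONTROL chunk, then rebuild the results URL with it.
# Chunk shapes follow the CT_CONTROL example above; helper names and
# sample values are illustrative.

def last_resume_offset(chunks):
    """Return the resume_offset of the most recent CT_CONTROL chunk, or None."""
    offset = None
    for chunk in chunks:
        if chunk.get("chunk_type") == "CT_CONTROL":
            offset = chunk["chunk_control"]["resume_offset"]
    return offset

def resume_url(collector_id, access_token, offset):
    """Build the stream results URL that continues from `offset`."""
    return ("https://api.talkwalker.com/api/v3/stream/c/"
            f"{collector_id}/results"
            f"?access_token={access_token}&resume_offset={offset}")

# Example: two chunks received before the connection dropped.
chunks = [
    {"chunk_type": "CT_CONTROL",
     "chunk_control": {"connection_id": "c1",
                       "resume_offset": "off-42",
                       "collector_id": "col-1"}},
]
print(resume_url("col-1", "demo", last_resume_offset(chunks)))
```

Persisting the offset (to disk or a database) after each CT_CONTROL chunk makes the resume robust against process crashes, not just dropped connections.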
Document API
Document import fails with "Does not match any xyz" Error
Documents that do not match a project's queries cannot be imported into that project. The details of the error message explain which part of the project setup was not matched. The provided document must match the settings of the project (languages, countries, source types and blocked sources) and the query of at least one topic.
When importing documents from a specific domain, an extra topic similar to domainurl:"http://my-site.com/"
can help to match all uploaded documents.
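A cheap local pre-check can catch most of these failures before uploading. The sketch below is not a Talkwalker API call; it merely approximates what a catch-all domainurl:"..." topic would match (same host, URL path under the topic's prefix), so off-domain documents can be filtered out client-side.

```python
# Illustrative pre-check (not a Talkwalker API call): verify locally that
# each document's url would match a catch-all topic like
# domainurl:"http://my-site.com/", to avoid "Does not match any ..."
# import errors. The matching rule here is an approximation.

from urllib.parse import urlparse

def matches_domainurl(doc_url, domainurl):
    """Same host as the topic URL, and path under the topic's path prefix."""
    doc, topic = urlparse(doc_url), urlparse(domainurl)
    return doc.netloc == topic.netloc and doc.path.startswith(topic.path)

docs = [
    {"url": "http://my-site.com/blog/cats", "title": "Something cats"},
    {"url": "http://other-site.com/dogs", "title": "Off-domain"},
]
importable = [d for d in docs if matches_domainurl(d["url"], "http://my-site.com/")]
print([d["url"] for d in importable])  # → ['http://my-site.com/blog/cats']
```

Documents that fail the pre-check can be dropped or routed to a different project instead of producing a failed import.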