Article
An article represents a piece of content created in the content management
system. Different types of content, like text or video articles, share
the same message structure; they can be distinguished by the Article.Type field.
Text articles (type = Article.Type.ARTICLE) also have an Article.SubType
to differentiate their purpose and form.
Teaser
To improve the performance of database access and network transmission, tapir
uses a lightweight representation of Article in some places.
Depending on the service used to retrieve an article, the Article message might
only contain data required on section pages:
- Article.body set to null
- Article.elements filtered by Element.relations to only contain TEASER, but neither OPENER nor SOCIAL
Thus, it does not contain any data that is only required on detail pages.
This lightweight representation is sometimes referred to as Teaser.
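The reduction described above can be sketched on the JSON form of an article. This is a minimal illustration, not tapir's actual implementation: the function name `to_teaser` is hypothetical, the body is emptied via the `bodies` field of the message definition, and trimming each kept element's relations down to `TEASER` is an assumption about how the filter behaves.

```python
from copy import deepcopy


def to_teaser(article: dict) -> dict:
    """Return a lightweight copy of an article (JSON form) for section pages."""
    teaser = deepcopy(article)
    # Detail-page content is dropped entirely.
    teaser["bodies"] = []
    # Keep only elements that are usable as a teaser (assumption: an element
    # qualifies if TEASER is among its relations).
    kept = []
    for element in teaser.get("elements", []):
        if "TEASER" in element.get("relations", []):
            element["relations"] = ["TEASER"]
            kept.append(element)
    teaser["elements"] = kept
    return teaser
```

The input is copied first, so a cached full article is not modified when a teaser is derived from it.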
Field name | Type | Description |
---|---|---|
id | int64 | Unique ID of the article defined by the content management system (required). |
type | Type | Main content type of the article (required). See enum Type for the list of supported types. |
sub_type | SubType | Subtype of the article. For ARTICLE this field holds a sub_type, for others like GALLERY it may not. |
section_tree | Reference | Hierarchical section tree information of the article (required). |
fields | map<string, string> | Generic map containing general content and configuration information of the article (required). See fields |
bodies | repeated Body | Recursive textual body of the article to be rendered on detail pages. May be null for a Teaser. |
metadata | Metadata | The article's Metadata, containing state and various timestamps. |
elements | repeated Element | Elements required to render the teaser, such as IMAGE, VIDEO or AUTHOR. |
keywords | repeated Keyword | Extracted keywords from the article body like persons, locations, organizations etc. |
onwards | repeated int64 [deprecated] | IDs of articles related to this article. Related articles are defined manually in the content management system by the editorial department. |
variants | map<string, Article> | Variants of this article, e.g. for headline testing. |
entities | repeated string [deprecated] | Extracted entities from the article body like persons, locations, organizations etc. Deprecated, use keywords instead. |
authors | repeated Author | Authors and/or agencies for this content. |
related_articles | repeated Article | Editorial articles which are related to the main article. May only be an empty unresolved article (not all services will resolve these). |
references | repeated Reference | References, e.g. URLs belonging to this article. |
message Article {
int64 id = 1;
Type type = 2;
SubType sub_type = 3;
stroeer.core.v1.Reference section_tree = 4;
map<string, string> fields = 5;
repeated Body bodies = 6;
Metadata metadata = 7;
repeated Element elements = 8;
repeated Keyword keywords = 9;
repeated int64 onwards = 10 [deprecated = true];
map<string, Article> variants = 11;
repeated Author authors = 12;
repeated Article related_articles = 13;
repeated Reference references = 14;
repeated string entities = 100 [deprecated = true];
}
fields
The entry set is defined by the content management system and will vary depending on the main type of the article.
⚠ Clients must be resilient to unknown or missing entries. ⚠
For Article.Type.ARTICLE this map will contain the following data:
key | mandatory | description |
---|---|---|
headline | * | the headline for this content |
top_line | * | the top line ("Dachzeile") shown above the headline |
ref_path | * | URL path for this article e.g. /${section_tree}/id_${id}/${title}.html |
ref_canonical | * | Canonical URL of this article, may differ if external, e.g. https://www.example.com/external.html |
summary | | summary for this content |
teaser_text | | used on teasers, overrides summary |
meta_robots | | |
social_headline | | used for social markup, overrides headline |
headline_short | | used for "Schlagzeilen" (headlines), overrides headline |
meta_title | | HTML <meta title> |
expert_line | | |
social_description | | |
meta_description | | HTML <meta description> |
reading_time_minutes | | estimated reading time in minutes |
flag:hidden | | this content must be excluded from automated curations (CMS: no auto-content / manuell kuratieren) |
For Article.Type.GALLERY this map will contain the following data:
key | mandatory | description |
---|---|---|
headline | * | the headline for this content |
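Because clients must tolerate unknown or missing entries, reading the map with explicit fallbacks is usually safer than indexing it directly. The sketch below resolves the override rules from the tables above on a plain dictionary; the function name `resolve_texts` and the assumption that `flag:hidden` carries the string `"true"` when set are illustrative only.

```python
from typing import Optional


def resolve_texts(fields: dict) -> dict:
    """Resolve display texts from Article.fields, honouring the override rules."""

    def first(*keys: str) -> Optional[str]:
        # Return the first non-empty value among the given keys, if any.
        for key in keys:
            value = fields.get(key)
            if value:
                return value
        return None

    return {
        "headline": first("headline"),
        # teaser_text overrides summary
        "teaser_text": first("teaser_text", "summary"),
        # social_headline and headline_short both override headline
        "social_headline": first("social_headline", "headline"),
        "headline_short": first("headline_short", "headline"),
        # Assumption: flags are transported as the string "true".
        "hidden": fields.get("flag:hidden") == "true",
    }
```

Unknown keys are simply ignored, and a missing optional key falls back to the entry it would have overridden.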
enum Type
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
ARTICLE | A text article, usually subtyped |
IMAGE | [deprecated] An image article, unused, deprecated |
VIDEO | A video article, contains HLS videos, as well as external live streams |
GALLERY | A gallery article |
EMBED | An embed article including an oembed or edge_side_include element |
AUTHOR | An author article, currently not implemented |
AGENCY | [deprecated] An agency article, unused, deprecated |
EXTERNAL | An external article (teaser-like external article) |
INTERNAL | Used for internal purposes only. |
CLUSTER | A thematically grouped cluster of a varying number of articles; resolved Trails are embedded via related_articles |
enum Type {
TYPE_UNSPECIFIED = 0;
ARTICLE = 1;
IMAGE = 2 [deprecated = true];
VIDEO = 3;
GALLERY = 4;
EMBED = 5;
AUTHOR = 6;
AGENCY = 7 [deprecated = true];
EXTERNAL = 8;
CLUSTER = 9;
INTERNAL = 100;
}
enum SubType
Content with Type.ARTICLE is usually subtyped to alter its form and purpose.
Enum value | Description |
---|---|
SUB_TYPE_UNSPECIFIED | unspecified |
NEWS | Meldung/Nachricht — this is the default |
COLUMN | Kolumne |
COMMENTARY | Kommentar |
INTERVIEW | Interview |
CONTROVERSY | Pro und Kontra/Streitgespräch |
TAGESANBRUCH | Tagesanbruch |
EVERGREEN | Evergreen |
AGENCY_IMPORT | Content originally imported from agency/tickers by the CMS |
ADVERTORIAL | Advertorial |
QUIZ | Quiz |
GAME | (Browser)Game |
COMPLIANCE | Internal company articles like an imprint or contact forms |
RECIPE | Cooking recipe |
enum SubType {
SUB_TYPE_UNSPECIFIED = 0;
NEWS = 1;
COLUMN = 2;
COMMENTARY = 3;
INTERVIEW = 4;
CONTROVERSY = 5;
TAGESANBRUCH = 6;
EVERGREEN = 7;
AGENCY_IMPORT = 8;
ADVERTORIAL = 9;
QUIZ = 10;
GAME = 11;
COMPLIANCE = 12;
RECIPE = 13;
}
Article ۰ Body
The Body represents a basic block. Each Body is self-contained and holds
all the data required for rendering within its data structures.
Common use cases for this are Type.BODY, where the textual article body can be found,
and Type.ARTICLE_SOURCES, where onward articles are referenced.
Field name | Type | Description |
---|---|---|
children | repeated BodyNode | Recursive/Nested structure that usually represents the textual body / Markup / HTML |
type | Type | Type of this Body, which tells the consumer how to interpret its children. See Type |
message Body {
repeated BodyNode children = 1;
Type type = 2;
}
Type
Each Body has a Body.Type to help the consumer correctly interpret
the content of its BodyNodes.
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
BODY | The textual article body including all inline elements such as IMAGE , VIDEO and EMBED |
ARTICLE_SOURCES | A wrapper for all article sources ("Quellenapparat"). There can only be one of these per article. |
DISCLAIMER | An article disclaimer with important or legal notes, e.g. "medizinischer Hinweis" (medical notice) on all medical articles |
TRUST_BOX | Information about what type the current article is (e.g. opinion article). There can only be one of these per article. |
TABLE_OF_CONTENTS | Table of contents for this article, consists of anchors which refer to sub headlines within the BODY |
enum Type {
TYPE_UNSPECIFIED = 0;
BODY = 1;
ARTICLE_SOURCES = 2;
DISCLAIMER = 3;
TRUST_BOX = 4;
TABLE_OF_CONTENTS = 5;
}
BodyNode
Recursive structure representing all types of possible nodes inside an article.
One use case is to represent HTML-like markup in tapir, but it
is also used to map custom elements that require a strict
positional placement within the textual body. Things that are not part of the
textual article body are represented as individual Body parts so they
can be rendered independently if required.
Clients must be resilient to unknown or missing nodes.
message BodyNode {
string type = 1;
string text = 2;
map<string, string> fields = 3;
repeated BodyNode children = 4;
repeated Element elements = 5;
Reference reference = 6;
}
Field name | Type | Description |
---|---|---|
type | string | Type of the node (required). |
text | string | Text of the node, only set for text nodes (type == 'text'). |
fields | map<string, string> | Additional information for the node depending on its type, e.g. href for nodes of type a. See fields |
children | repeated BodyNode | Nested items, e.g. the text of a <p> or an <a>. |
elements | repeated Element | Elements of the node, e.g. video, image, gallery, embed, ... |
fields
HTML like
type | description |
---|---|
text | most basic type; its text value can be found in the text field. The word_count can be found in the BodyNode.fields for each BodyNode[type=text] |
p | paragraph / <p> |
span | <span> |
sub_headline | a sub headline, may be part of the table of contents |
a | anchor / <a>, link target can be found in the repeated Reference[] structure |
strong | strong / <strong> |
em | emphasis / <em> |
sub | subscript / <sub> |
sup | superscript / <sup> |
hr | horizontal rule / <hr> |
br | line break / <br> |
ul | unordered list / <ul> |
ol | ordered list / <ol> |
li | list item / <li> |
table | table / <table> |
thead | table head / <thead> |
tbody | table body / <tbody> |
tfoot | table footer / <tfoot> |
th | table header / <th> |
tr | table row / <tr> |
td | table data cell / <td> |
Custom
type | description |
---|---|
image | inline image element, check elements |
video | inline video element, check elements |
gallery | inline gallery element, check elements |
oembed | inline oEmbed element, check elements |
esi | inline edge side include element, check elements |
quote | inline quotation element, check elements |
infobox | inline box, consists of textual content in children and optional elements |
pros_and_cons | pros and cons box, consists of elements and structured text in children |
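To illustrate the recursive structure and the requirement to tolerate unknown nodes, the following sketch collects the plain text of a BodyNode given in its JSON form. The function name `plain_text` is hypothetical, and mapping `br` to a newline is an illustrative choice rather than documented behaviour.

```python
def plain_text(node: dict) -> str:
    """Collect the text of a BodyNode (JSON form) and all of its descendants."""
    node_type = node.get("type", "")
    if node_type == "text":
        # Only text nodes carry a value in the text field.
        return node.get("text", "")
    if node_type == "br":
        return "\n"
    # Unknown or custom node types are not an error: descend into children,
    # which may be missing entirely.
    return "".join(plain_text(child) for child in node.get("children", []))
```

A renderer would dispatch on `type` in the same way, falling back to rendering only the children for types it does not know.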
Article ۰ Element
Elements are self-contained objects that represent structured data that
is usually too complex to fit into our usual workhorse, the Body.
Elements can appear in multiple places within the Article:
- Article.elements: Elements of the article which are not part of the textual body, e.g. author, opener and teaser. Those elements should be used to render the article as a teaser, e.g. on section pages.
- BodyNode.elements: This is where Elements are most commonly used. They come in various types and are rendered in place, thus breaking up the textual body.
- Element.children: Some more sophisticated Elements make use of nesting to make their API representation more concise and to help structure things hierarchically:
  - video uses nesting to model its optional poster image, which itself is a normal image.
  - galleries have their individual images nested within.
Different types of elements, like images or videos, share
the same message structure and are distinguished by the Element.Type field.
See the Samples section.
Field name | Type | Description |
---|---|---|
type | Type | type of this Element , see Element.Type |
relations | repeated Relation | The usages (relations) of an element. See Relation |
assets | repeated Asset | Assets describing this Element , See Samples |
children | repeated Element | nested Elements , e.g. for Element of type gallery and video |
message Element {
Type type = 1;
repeated Relation relations = 2;
repeated Asset assets = 3;
repeated Element children = 4;
}
Element.Type
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
ARTICLE | unused |
IMAGE | image, containing further Assets . Sample |
VIDEO | video, containing nested Assets and an optional nested image Element. Sample |
GALLERY | gallery, consists of many nested image Elements. |
OEMBED | oEmbed, contains one metadata Asset. Todo: sample |
AUTHOR | author, contains one metadata Asset and an optional image Element. Todo: sample |
AGENCY | agency, contains one metadata Asset |
EDGE_SIDE_INCLUDE | <esi:include> that must be resolved server-side for SEO reasons, otherwise similar to OEMBED |
CITATION | citation, contains one metadata Asset. Todo: sample |
INTERNAL_WIDGET | widget or embed that is handled directly by the front end rendering Todo: sample |
AUDIO | audio element Todo: sample |
enum Type {
TYPE_UNSPECIFIED = 0;
ARTICLE = 1;
IMAGE = 2;
VIDEO = 3;
GALLERY = 4;
OEMBED = 5;
AUTHOR = 6;
AGENCY = 7;
EDGE_SIDE_INCLUDE = 8;
CITATION = 9;
INTERNAL_WIDGET = 10;
AUDIO = 11;
}
Element.Relation
Enum value | Description |
---|---|
RELATION_UNSPECIFIED | unspecified |
OPENER | As an opener element (within the content) |
TEASER | As a teaser element (when externally viewed) |
SOCIAL | Use as social element (mostly images), e.g. <og:image> or JSON-LD |
enum Relation {
RELATION_UNSPECIFIED = 0;
OPENER = 1;
TEASER = 2;
SOCIAL = 3;
}
Samples
For details on certain fields or usages of Assets, see the Article ۰ Element ۰ Asset section.
image element
- usually consists of one Asset[@type=METADATA] and several Asset[@type=IMAGE], one for each crop.
{
"type": "IMAGE",
"relations": [ "OPENER", "TEASER" ],
"assets": [{
"fields": {
"media_id": "90635672v2",
"caption": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
"alt_text": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
"description": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
"source": "/Reuters-bilder"
},
"type": "METADATA",
"metadata": {
"state": "STATE_UNSPECIFIED",
"start_time": { "seconds": "-62135596800" },
"end_time": { "seconds": "253402300799" }
}
},
{
"type": "IMAGE",
"fields": {
"crop": "original",
"url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/90635672v2/fit-in/0x0/annalena-baerbock-und-joschka-fischer-bei-einer-wahlkampfveranstaltung-die-gruenen-kanzlerkandidatin-fordert-von-der-bundesregierung-mindestens-10000-menschen-aus-afghanistan-aufzunehmen.jpg",
"width": "1920",
"height": "1280"
}
},
{
"type": "IMAGE",
"fields": {
"crop": "16:9",
"url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/90635672v2/0x0:1920x1077/fit-in/0x0/annalena-baerbock-und-joschka-fischer-bei-einer-wahlkampfveranstaltung-die-gruenen-kanzlerkandidatin-fordert-von-der-bundesregierung-mindestens-10000-menschen-aus-afghanistan-aufzunehmen.jpg",
"width": "1920",
"height": "1077"
}
}
]
}
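Since an image element carries one asset per crop, a consumer typically selects the asset matching the crop it wants to render. The sketch below does this on the JSON form shown above; the function name `image_asset` and the fallback to the `original` crop are assumptions made for illustration.

```python
from typing import Optional


def image_asset(element: dict, crop: str, fallback: str = "original") -> Optional[dict]:
    """Return the IMAGE asset for the requested crop, else the fallback crop."""
    by_crop = {}
    for asset in element.get("assets", []):
        if asset.get("type") == "IMAGE":
            # Assets without a crop entry are tolerated and skipped.
            name = asset.get("fields", {}).get("crop")
            if name is not None:
                by_crop[name] = asset
    return by_crop.get(crop) or by_crop.get(fallback)
```

For the sample above, `image_asset(element, "16:9")` would return the asset whose `width` is `"1920"` and `height` is `"1077"`.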
video element
- usually consists of one Asset[@type=METADATA] and one Asset[@type=VIDEO].
- If the video has a poster image, it can be found as a nested child Element[type=IMAGE] within children[]
{
"relations": [ "OPENER" ],
"type": "VIDEO",
"assets": [
{
"type": "METADATA",
"fields": {
"media_id": "0DgeZjJtJ8EC",
"caption": "Eine Statue der Justitia mit einer Waage und einem Schwert in ihren Händen.",
"frame_capture:url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/thumbnails/maas-haben-500-menschen-aus-kabul-ausgeflogen_thumb.0000031.jpg",
"frame_capture:numerator": "1",
"frame_capture:denominator": "5"
},
"metadata": {
"state": "STATE_UNSPECIFIED",
"start_time": { "seconds": "-62135596800" },
"end_time": { "seconds": "253402300799" }
}
},
{
"type": "VIDEO",
"fields": {
"duration_seconds": "157.576",
"mime_type": "application/vnd.apple.mpegurl",
"url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/hls/maas-haben-500-menschen-aus-kabul-ausgeflogen.m3u8",
"height": "1080",
"width": "1920"
}
}
],
"children": [
{
"type": "IMAGE",
"assets": [
{
"fields": {
"media_id": "amOyEe-u5llZ"
},
"type": "METADATA",
"metadata": {
"start_time": { "seconds": "-62135596800" },
"end_time": { "seconds": "253402300799" }
}
},
{
"type": "IMAGE",
"fields": {
"crop": "original",
"url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/amOyEe-u5llZ/fit-in/0x0/image.png",
"width": "1500",
"height": "844"
}
}
]
}
]
}
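A consumer of the video element above usually needs two things: the stream URL from the VIDEO asset and the optional poster image from the nested children. A minimal sketch on the JSON form follows; both function names are hypothetical.

```python
from typing import Optional


def stream_url(video: dict) -> Optional[str]:
    """Return the URL of the first VIDEO asset of a video element, if any."""
    for asset in video.get("assets", []):
        if asset.get("type") == "VIDEO":
            return asset.get("fields", {}).get("url")
    return None


def poster_image(video: dict) -> Optional[dict]:
    """Return the nested image Element used as poster, if the video has one."""
    for child in video.get("children", []):
        if child.get("type") == "IMAGE":
            return child
    return None
```

Both helpers return `None` instead of raising, because the poster image is optional and assets may be missing.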
gallery element
- usually consists of one Asset[@type=METADATA] describing the gallery itself
- several nested Element[@type=IMAGE] within children[], one for each image of this gallery
{
"relations": [ "OPENER" ],
"type": "GALLERY",
"assets": [
{
"type": "METADATA",
"fields": {
"headline": "Gallery ipsum dolor",
"ref_path": "/test-playground/id_100000067/gallery-ipsum-dolor.html",
"ref_canonical": "https://www.t-online.de/test-playground/id_100000067/gallery-ipsum-dolor.html",
"url": "/test-playground/id_100000067/gallery-ipsum-dolor.html"
}
}
],
"children": [
{
"type": "IMAGE",
"assets": [
{
"type": "METADATA",
"fields": {
"media_id": "82333994v1",
"caption": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
"alt_text": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
"description": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
"source": "Patrick Seeger/dpa-tmn/dpa"
},
"metadata": {
"start_time": { "seconds": "-62135596800" },
"end_time": { "seconds": "253402300799" }
}
},
{
"type": "IMAGE",
"fields": {
"crop": "original",
"url": "https://di7yufqc6mgnl.cloudfront.net/2021/05/82333994v1/fit-in/0x0/wer-unterwegs-aepfel-pflueckt-kann-sich-strafbar-machen.jpg",
"width": "640",
"height": "360"
}
}
]
}
]
}
Article ۰ Element ۰ Asset
Asset of an Element.
An asset's configuration depends on its use and may vary depending
on its type field.
Field name | Type | Description |
---|---|---|
type | Type | Type of the asset. |
fields | map<string, string> | Generic map containing general content and configuration information of the asset. Clients must be resilient to unknown or missing entry sets. |
metadata | Metadata | Only present for assets of Type.METADATA. Technical metadata for the parent element (state, validity, ...). See Metadata |
reference | Reference | Reference, e.g. URL belonging to this asset. |
message Asset {
Type type = 1;
map<string, string> fields = 2;
Metadata metadata = 3;
Reference reference = 4;
}
enum Type
Type of an asset.
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
IMAGE | image asset with a resizable template URL and some image stats (width, height, cropping). See samples |
VIDEO | internal video asset, expect (m3u8/HLS) URLs and some video stats (width, height, duration) within fields |
EXTERNAL_VIDEO | holds (m3u8/HLS) URLs to external videos, such as live streams and glomex |
METADATA | holds Metadata for the parent element and fields that also depend on the parent Element.Type |
LINK | additional link (href, reference) asset for the parent Element, e.g. an image with an optional link target. |
AUDIO | internal audio asset, expect (mp3) URLs |
enum Type {
TYPE_UNSPECIFIED = 0;
IMAGE = 1;
VIDEO = 2;
EXTERNAL_VIDEO = 3;
METADATA = 4;
LINK = 5;
AUDIO = 6;
}
Samples
Image Asset
{
"type": "IMAGE",
"fields": {
"crop": "16:9",
"url": "https://${CDN_URL}/89670804v20/0x37:1920x1079/fit-in/0x0/das-covid-19-dashboard-vom-robert-koch-institut-symbolbild-die-corona-inzidenz-in-muenchen-ist-deutlich-gesunken.jpg",
"width": "1920",
"x": "0",
"y": "37",
"height": "1079"
}
}
field | description |
---|---|
url | the URL for this cropped image without scaling. If scaling is desired, replace the /0x0/ with the desired dimensions; /fit-in/ will make sure that the cropped image fits inside this rectangle. |
crop | this cropped image's logical name, e.g. original, 16:9, custom |
x | x-offset of this crop within the original image |
y | y-offset of this crop within the original image |
width | the width of this cropped image, before scaling |
height | the height of this cropped image, before scaling |
NOTES:
- x + width <= width(original_image), otherwise the image generator will fail
- y + height <= height(original_image), otherwise the image generator will fail
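The scaling rule and the two bounds from the notes can be sketched as follows. The `WIDTHxHEIGHT` format of the replacement is an assumption derived from the `0x0` placeholder, and both function names are hypothetical.

```python
def scaled_url(url: str, width: int, height: int) -> str:
    """Replace the unscaled /0x0/ placeholder with the desired bounding box."""
    marker = "/fit-in/0x0/"
    if marker not in url:
        raise ValueError("URL does not contain the /fit-in/0x0/ placeholder")
    # Assumption: dimensions are written as WIDTHxHEIGHT, like the placeholder.
    return url.replace(marker, f"/fit-in/{width}x{height}/", 1)


def crop_fits(fields: dict, original_width: int, original_height: int) -> bool:
    """Check both bounds from the notes for one crop's fields (string values)."""
    x, y = int(fields.get("x", "0")), int(fields.get("y", "0"))
    width, height = int(fields["width"]), int(fields["height"])
    return x + width <= original_width and y + height <= original_height
```

All values in `fields` are strings, so they are converted before any arithmetic; missing offsets are treated as `0` here, which is an assumption.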
Video Asset
{
"type": "VIDEO",
"fields": {
"duration_seconds": "157.576",
"mime_type": "application/vnd.apple.mpegurl",
"url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/hls/maas-haben-500-menschen-aus-kabul-ausgeflogen.m3u8",
"height": "1080",
"width": "1920"
}
}
field | description |
---|---|
url | the URL of this asset, usually an m3u8 playlist URL |
mime_type | the MIME type of this asset, usually m3u8/HLS |
duration_seconds | video duration in seconds |
height | the height of the original video, may differ from the transcoded video. |
width | the width of the original video, may differ from the transcoded video. |
Video Metadata Asset
{
"type": "METADATA",
"fields": {
"media_id": "0DgeZjJtJ8EC",
"caption": "Eine Statue der Justitia mit einer Waage und einem Schwert in ihren Händen.",
"frame_capture:url": "https://example.com/thumbnails/thumb.0000031.jpg",
"frame_capture:numerator": "1",
"frame_capture:denominator": "5"
}
}
field | description |
---|---|
media_id | alpha-numeric CMS id of the media |
caption | the video's caption |
frame_capture:url | if frame capture was enabled during transcoding, this is the URL of the last frame capture |
frame_capture:numerator | frame capture images are numbered, starting at 0000000, which can be used as the poster image. |
frame_capture:denominator | numerator and denominator can be used to calculate which frame capture image must be displayed at a given time. |
Example: for numerator=1 and denominator=5, we have to increment the frame capture every 5 / 1 == 5 seconds:
00:00.000 --> 00:05.000
/thumbnails/thumb.0000000.jpg
00:05.000 --> 00:10.000
/thumbnails/thumb.0000001.jpg
00:10.000 --> 00:15.000
/thumbnails/thumb.0000002.jpg
00:15.000 --> 00:20.000
/thumbnails/thumb.0000003.jpg
...
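The mapping from playback time to frame capture can be sketched like this. It assumes that every capture URL differs from `frame_capture:url` only in its zero-padded number, as the listing above suggests; the function names are hypothetical.

```python
import math
import re


def frame_capture_index(seconds: float, numerator: int, denominator: int) -> int:
    """Index of the capture to show: one capture every denominator/numerator seconds."""
    return math.floor(seconds * numerator / denominator)


def frame_capture_url(last_url: str, index: int) -> str:
    """Derive the URL of capture `index` from the URL of the last capture."""
    match = re.search(r"\.(\d+)(\.\w+)$", last_url)
    if match is None:
        raise ValueError("no numbered frame capture suffix found")
    digits = match.group(1)
    # Clamp to the known range: 0 up to the number of the last capture.
    index = max(0, min(index, int(digits)))
    return f"{last_url[:match.start(1)]}{index:0{len(digits)}d}{match.group(2)}"
```

With `numerator=1` and `denominator=5`, a playback position of 12.5 seconds yields index 2, which corresponds to `thumb.0000002.jpg` in the listing above.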
Article ۰ Keyword
Extracted keywords from the article body like persons, locations, organizations etc.
Field name | Type | Description |
---|---|---|
value | string | Unique value of this keyword. |
type | string | Type/Category of this keyword like location , organization , person |
score | float | Score for the relevance of this keyword set by the engine |
message Keyword {
string value = 1;
string type = 2;
float score = 3;
}
Article ۰ Metadata
Article metadata like publication state and technical timestamps.
Field name | Type | Description |
---|---|---|
state | State | State of the article in the content management system. See enum State |
start_time | Timestamp | Manually set editorial timestamp (Gültig von) at which the article is valid to deliver on digital platforms in seconds of UTC time since Unix epoch. |
end_time | Timestamp | Manually set editorial timestamp (Gültig bis) till the article is valid to deliver on digital platforms in seconds of UTC time since Unix epoch. |
publish_time | Timestamp | Editorial timestamp (Publikationsdatum) of the first publication of the article in seconds of UTC time since Unix epoch. This date will be set automatically by the content management system. |
update_time | Timestamp | Editorial timestamp (Aktualisierungsdatum) at which the article was updated in seconds of UTC time since Unix epoch. On first publication this timestamp matches publish_time . Afterwards it's either updated manually in the content management system or automatically if the article content changed significantly. |
transformation_time | Timestamp | Technical timestamp at which the article was transformed in the API layer in seconds of UTC time since Unix epoch. |
transformation_errors | int64 | Number of errors that occurred while fetching and/or transforming optional article components (e.g. embeds or nested documents) to an article message. |
last_modification_time | Timestamp | Technical timestamp at which the article was published regardless of the amount and significance of the change. |
event_source | EventSource | Source of the event that caused this item to be transformed and to be written into the DB. |
seo_score | double | The article score (originates from team data's Content Engine, higher scores are better) |
publication_id | int64 | The unique publication_id provided by the CMS, can be used to correlate the state of documents in tapir with the corresponding CMS publication event. |
related_article_source | string | Source of this article, if embedded in another article as a related article. |
tenant | string | The tenant this article belongs to. e.g. www , berlin or such |
message Metadata {
State state = 1;
google.protobuf.Timestamp start_time = 2;
google.protobuf.Timestamp end_time = 3;
google.protobuf.Timestamp publish_time = 4;
google.protobuf.Timestamp update_time = 5;
google.protobuf.Timestamp transformation_time = 6;
int64 transformation_errors = 7;
google.protobuf.Timestamp last_modification_time = 8;
EventSource event_source = 9;
double seo_score = 10;
int64 publication_id = 11;
string related_article_source = 12;
string tenant = 13;
}
enum State
State of the item (Article, Element) in the content management system. The state
in combination with start_time and end_time determines whether or not this item should be
rendered; this must be respected by all consumers, especially
when content is duplicated or cached.
The terms deleted (articles) and archived (media lib) are synonymous.
This enum combines those two into State.DELETED. An Article is in State.DELETED
if it was deleted in the content management system, or if its end_time
has been reached.
An Article is in State.DRAFT if it has never been published, or if the
start_time lies in the future.
Enum value | description |
---|---|
STATE_UNSPECIFIED | unspecified |
PUBLISHED | published content which is currently within its validity dates |
DELETED | this content is deleted or expired in the CMS |
DRAFT | this content was never published in the CMS |
enum State {
STATE_UNSPECIFIED = 0;
PUBLISHED = 1;
DELETED = 2;
DRAFT = 3;
}
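A consumer that caches content has to re-evaluate validity at render time, because a cached PUBLISHED item can expire without a new message arriving. The sketch below checks the JSON form of Metadata; the function name `is_renderable`, the inclusive start and exclusive end boundary, and treating missing timestamps as unbounded are assumptions.

```python
import time
from typing import Optional

# Sentinel bounds as they appear in the samples (years 0001 and 9999).
MIN_SECONDS = -62135596800
MAX_SECONDS = 253402300799


def _seconds(timestamp: Optional[dict], default: int) -> int:
    # Timestamps are serialized as {"seconds": "<int as string>"}.
    if not timestamp:
        return default
    return int(timestamp.get("seconds", default))


def is_renderable(metadata: dict, now: Optional[int] = None) -> bool:
    """True if the item is PUBLISHED and `now` lies within its validity window."""
    now = int(time.time()) if now is None else now
    if metadata.get("state") != "PUBLISHED":
        return False
    start = _seconds(metadata.get("start_time"), MIN_SECONDS)
    end = _seconds(metadata.get("end_time"), MAX_SECONDS)
    return start <= now < end
```

Passing `now` explicitly keeps the check deterministic in tests and lets a cache evaluate validity for a point in the future.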
enum EventSource
Even more detail about the circumstances of transformation for this article.
The EventSource will be of type:
- PRIMARY in case this article was directly updated and published
- SECONDARY in case this article was indirectly updated. This can be caused by updates of nested elements, such as videos that may expire at some point. Another source of change may be scheduled events, e.g. this item becoming valid or invalid at some point in time after the item's original publication time.
Enum value | description |
---|---|
EVENT_SOURCE_UNSPECIFIED | unspecified |
PRIMARY | this article's transformation was caused by a direct change in the CMS |
SECONDARY | this article's transformation was caused by a transitive update |
CONTENT_ENGINE | this article's transformation was caused by an external system (Content Engine) |
enum EventSource {
EVENT_SOURCE_UNSPECIFIED = 0;
PRIMARY = 1;
SECONDARY = 2;
CONTENT_ENGINE = 3;
}
Author
This represents an author (or agency). The entity may be the main content on author pages or simply indicate the author of an Article.
Field name | Type | Description |
---|---|---|
id | int64 | The unique identifier (cms id) of the author. |
type | Author.Type | The type of the author entity. |
fields | map<string, string> | The fields of the author. This is a map of key-value pairs. The keys are the field names and the values are the field values. |
elements | repeated Article.Element | The elements of the author, e.g. the author's profile picture. |
work_history | repeated HistoryEntry | The career entries of the author. |
education | repeated Reference | The education entries of the author. |
social_profiles | repeated Reference | The social profiles of the author. |
areas_of_expertise | repeated string | List of topics where the author possesses extraordinary knowledge |
references | repeated Reference | References, e.g. URLs belonging to this author. |
message Author {
int64 id = 1;
Type type = 2;
map<string, string> fields = 3; // migrate from Asset[type=metadata]
repeated Article.Element elements = 4; // profile picture
repeated HistoryEntry work_history = 5;
repeated Reference education = 6;
repeated Reference social_profiles = 7;
repeated string areas_of_expertise = 8;
repeated Reference references = 9;
}
enum Type
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
AUTHOR | The author is a person. |
AGENCY | The author is an agency or company. |
enum Type {
TYPE_UNSPECIFIED = 0;
AUTHOR = 1;
AGENCY = 2;
}
HistoryEntry
Lists previous jobs and details about the author's career.
Field name | Type | Description |
---|---|---|
role | string | The role of the author for this occupation. |
description | string | A description of the author's role. |
message HistoryEntry {
string role = 1;
string description = 2;
}
Sample Author
{
"id": 100000001,
"type": "AUTHOR",
"fields": {
"flag:hidden": "true",
"role": "Hier steht ein Titel",
"academic_degree": "Prof.",
"last_name": "Doe",
"short_name": "jdoe",
"headline": "Autorenseite von John Doe",
"first_name": "John",
"ignore_vg_wort": "true",
"url": "/author/id_100000001/john-doe.html"
},
"elements": [
{ "//": "Author Image Element removed for better readability" }
],
"work_history": [
{
"role": "Dummy",
"description": "Hält nur als pseudo Autor her, John Doe eben ;)"
},
{
"role": "Chief Executive Officer of ACME",
"description": "Very important"
}
],
"education": [
{
"children": [],
"fields": {},
"type": "",
"label": "John Doe Acedamy",
"href": "https://www.john.doe.acedamy.com"
},
{
"children": [],
"fields": {},
"type": "",
"label": "ACME university",
"href": "https://www.acmemilano.it/"
}
],
"social_profiles": [
{
"children": [],
"fields": {},
"type": "",
"label": "MySpace",
"href": "https://myspace.com/johndoe"
},
{
"children": [],
"fields": {},
"type": "",
"label": "Instagram",
"href": "https://www.instagram.com/johndoe.x/?hl=en"
}
],
"areas_of_expertise": [
"Dummy",
"ACME",
"Example",
"no-op",
"Cyber",
"PDP-11-Assembly",
"Tetris"
]
}
Reference
A Reference represents a link to another entity, for example an Article,
a Section or an external website, or a whole tree structure, for example
a section tree or breadcrumb navigation.
Field name | Type | Description |
---|---|---|
type | string | The type is used for filtering in a list of references. It describes a use-case, which usually has a defined render position. See type |
label | string | The label of the reference. |
href | string | The href of the reference. It can be relative or absolute. |
fields | map<string, string> | Contains all optional attributes of the reference. Clients must be resilient to unknown or missing entries. See fields |
children | repeated Reference | Hierarchically structured references for representing a navigation or tree. |
message Reference {
string type = 1;
string label = 2;
string href = 3;
map<string, string> fields = 4;
repeated Reference children = 5;
}
type
Example entries:
unspecified
textstage_title
stage_themenbereiche
stage_header_links
stage_top_themen
stage_tag_category
fields
Contains one or more optional attributes of the reference:
target
rel
flag:internal
layout
Samples
{
"label": "Home",
"href": "/",
"children": [
{
"label": "Spielwiese (Tests)",
"href": "/test-playground/"
}
]
}
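Because a Reference can nest further References in `children`, rendering a navigation or breadcrumb means walking the tree. A minimal depth-first sketch on the JSON form follows; the function name `walk` is hypothetical.

```python
from typing import Iterator, Tuple


def walk(reference: dict, trail: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (label trail, href) for a Reference and all nested children."""
    trail = trail + (reference.get("label", ""),)
    yield trail, reference.get("href", "")
    # children may be absent; treat that like an empty list.
    for child in reference.get("children", []):
        yield from walk(child, trail)
```

For the sample above this yields `(("Home",), "/")` followed by `(("Home", "Spielwiese (Tests)"), "/test-playground/")`.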
Stage
A stream stage with companions and the main content area. Embedded items can be editorial articles, advertisement and/or stages (only one level deep).
message Stage {
Configuration configuration = 1;
repeated Item stream_items = 2;
repeated Item companion_items = 3;
}
⚙︎ ArticlePageService
service ArticlePageService {
// Returns the requested article with editorial render-relevant data for the user and SEO bots.
rpc GetArticlePage (GetArticlePageRequest) returns (GetArticlePageResponse) {}
}
Description
Request message to get an article page.
message GetArticlePageRequest {
// ID of the article defined by the content management system (required).
int64 id = 1;
}
Response message for an article page request.
Status codes:
OK | article exists and is published |
NOT_FOUND | article doesn't exist or is not published according to its Metadata |
message GetArticlePageResponse {
// Article page with all render-relevant data for the user and SEO bots.
stroeer.page.article.v1.ArticlePage article_page = 1;
}
Status/Error scenarios
scenario found
description | article was found in the datastore and is published and valid according to its metadata |
gRPC status | OK (0) |
gRPC error payload | none |
HTTP status | 200 (OK) |
cacheable | yes |
scenario invalid id
description | article id is invalid |
gRPC status | INVALID_ARGUMENT (3) |
gRPC error payload | google.rpc.Bad |
HTTP status | 400 (BAD REQUEST) |
cacheable | yes |
scenario not found
description | article was not found in the datastore |
gRPC status | NOT_FOUND (5) |
gRPC error payload | none |
HTTP status | 404 (NOT FOUND) |
cacheable | yes |
scenario not yet valid
description | article was found in the datastore, but is not valid yet according to its metadata.start_time |
gRPC status | NOT_FOUND (5) |
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code |
HTTP status | 404 (NOT FOUND) |
cacheable | yes |
scenario not published
description | article was found in the datastore, but its state is neither State.DELETED nor State.PUBLISHED |
gRPC status | NOT_FOUND (5) |
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code |
HTTP status | 404 (NOT FOUND) |
cacheable | yes |
scenario expired
description | article was found in the datastore, but is expired according to metadata.end_time |
gRPC status | NOT_FOUND (5) |
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code |
HTTP status | 410 (GONE) |
cacheable | yes |
scenario deleted/archived
description | article was found in the datastore, but its state is State.DELETED |
gRPC status | NOT_FOUND (5) |
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code |
HTTP status | 410 (GONE) |
cacheable | yes |
scenario internal
description | internal error processing the article |
gRPC status | INTERNAL (13) |
gRPC error payload | none |
HTTP status | 500 (INTERNAL SERVER ERROR) |
cacheable | no |
scenario timeout
description | timeout loading and processing the article |
gRPC status | DEADLINE_EXCEEDED (4) |
gRPC error payload | none |
HTTP status | 504 (GATEWAY TIMEOUT) |
cacheable | no |
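The scenarios above can be condensed into a small lookup on the client or gateway side. This is only a sketch: it assumes the recommended HTTP status carried in the google.rpc.ResourceInfo description has already been parsed into an integer by the caller, since the exact format of that description is not specified here. The names HTTP_DEFAULTS and to_http are hypothetical.

```python
from typing import Optional, Tuple

# Default HTTP status and cacheability per gRPC status, taken from the scenarios above.
HTTP_DEFAULTS = {
    "OK": (200, True),
    "INVALID_ARGUMENT": (400, True),
    "NOT_FOUND": (404, True),
    "INTERNAL": (500, False),
    "DEADLINE_EXCEEDED": (504, False),
}


def to_http(grpc_status: str, recommended_status: Optional[int] = None) -> Tuple[int, bool]:
    """Return (http_status, cacheable) for a gRPC status name."""
    http_status, cacheable = HTTP_DEFAULTS[grpc_status]
    # Expired and deleted articles also arrive as NOT_FOUND; only the
    # error payload distinguishes them (410 instead of 404).
    if grpc_status == "NOT_FOUND" and recommended_status is not None:
        http_status = recommended_status
    return http_status, cacheable
```

For example, to_http("NOT_FOUND", 410) covers the expired and deleted/archived scenarios, while to_http("NOT_FOUND") covers an article that is simply missing.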
⚙︎ CoreArticleService
Core service to either query a single article (rpc GetArticle()) identified by its id or to query multiple articles (rpc ListArticles()) by providing a query.
All results returned from this service are unfiltered by default, hence they may contain elements that are expired, not yet valid or whose state is not PUBLISHED.
This behaviour can be changed by providing a RequestSettings object.
service ArticleService {
rpc GetArticle(GetArticleRequest) returns (stroeer.core.v1.Article) {}
rpc BatchGetArticles(BatchGetArticlesRequest) returns (BatchGetArticlesResponse) {}
rpc ListArticles(ListArticlesRequest) returns (ListArticlesResponse) {}
// Allow Empty as request param
// buf:lint:ignore RPC_REQUEST_STANDARD_NAME
rpc ListSections(google.protobuf.Empty) returns (ListSectionsResponse) {}
}
⚙︎ GetArticle
rpc GetArticle (GetArticleRequest) returns (stroeer.core.v1.Article) {}
Returns a single stroeer.core.v1.Article if the given id exists, otherwise an Error. (todo: describe errors)
Field name | Type | Description |
---|---|---|
id | int64 | [required] Unique id of the article to be fetched. |
message GetArticleRequest {
int64 id = 1;
RequestSettings request_settings = 2;
}
⚙︎ BatchGetArticles
Returns multiple stroeer.core.v1.Article for the given ids. The ordering of items will be the same as the ordering of the ids requested.
If an id does not exist, it is omitted from the result (no error will be raised).
There is a maximum of 100 items that can be queried in one batch.
Field name | Type | Description |
---|---|---|
ids | repeated int64 | [required] A list of ids of the articles to be fetched |
message BatchGetArticlesRequest {
repeated int64 ids = 1;
RequestSettings request_settings = 2;
}
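The limit of 100 items per batch means that callers with longer id lists have to split them. The sketch below shows one way to do that while keeping the request order. fetch_batch is a hypothetical stand-in for the BatchGetArticles call, and fake_fetch is an in-memory substitute used only for illustration; how the service reacts to more than 100 ids is not stated here, so the sketch simply never sends more.

```python
from typing import Callable, Dict, Iterator, List

MAX_BATCH_SIZE = 100  # limit stated above


def chunk_ids(ids: List[int], size: int = MAX_BATCH_SIZE) -> Iterator[List[int]]:
    """Split ids into consecutive chunks of at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def get_many(ids: List[int], fetch_batch: Callable[[List[int]], List[Dict]]) -> List[Dict]:
    """Call fetch_batch once per chunk and concatenate the results in request order."""
    articles: List[Dict] = []
    for chunk in chunk_ids(ids):
        articles.extend(fetch_batch(chunk))
    return articles


# Hypothetical backend: only even ids exist, unknown ids are omitted without an error.
def fake_fetch(chunk: List[int]) -> List[Dict]:
    return [{"id": i} for i in chunk if i % 2 == 0]


result = get_many(list(range(1, 251)), fake_fetch)
```

Because unknown ids are omitted rather than reported, the result can be shorter than the id list; callers that need to detect missing articles have to compare the returned ids with the requested ones.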
⚙︎ ListArticles
Returns a ListArticlesResponse with articles matching the query. If the results exceed 100 Articles or 1 MB, the response is paginated and additional results can be obtained with the page token.
ListArticlesRequest
Field name | Type | Description |
---|---|---|
query | Query | [required] find items based on query values |
filters | Filters | [optional] A filter expression is applied after a Query finishes, but before the results are returned. |
page_size | int32 | [optional] Limits the results per page. Default is 10; maximum is 100 (or fewer if the result exceeds 1 MB). Values above 100 will be coerced to 100. If results get truncated, you can use pagination. |
page_token | string | [optional] A page token, received from a previous ListArticles call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListArticles must match the call that provided the page token. |
message ListArticlesRequest {
Query query = 1;
Filters filters = 2;
int32 page_size = 3;
string page_token = 4;
RequestSettings request_settings = 5;
}
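Pagination with page_token and next_page_token follows the usual loop: call, consume, and repeat with the returned token until it is empty. The sketch below assumes a hypothetical list_page callable that wraps ListArticles with a fixed query, filters and page_size (these must not change between pages) and returns the articles together with next_page_token. The PAGES dictionary is a made-up backend for illustration.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A page is (articles, next_page_token); an empty token means there are no further pages.
Page = Tuple[List[Dict], str]


def iter_articles(list_page: Callable[[str], Page]) -> Iterator[Dict]:
    """Yield every article by following next_page_token until it is empty."""
    token = ""  # an unset proto3 string is the empty string
    while True:
        articles, token = list_page(token)
        yield from articles
        if not token:
            break


# Hypothetical backend with three pages keyed by page token.
PAGES = {
    "": ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], "p3"),
    "p3": ([{"id": 4}], ""),
}

ids = [a["id"] for a in iter_articles(lambda token: PAGES[token])]
```

Writing the loop as a generator lets callers stop early without fetching the remaining pages.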
Query
Specify the search criteria. The list-API is built around sections, which come in two flavors:
home_section: find all articles that reside within that exact section. The home_section is equal to the settings found in the CMS, e.g. /nachrichten/wissen/
root_section: this property is derived from the home_section path by retaining only the root folder, e.g. for /nachrichten/wissen/ the root_section becomes /nachrichten/
In most cases using the root_section should yield better results since it will also find content in nested sections, whereas home_section would only return content which was curated into the exact section that was queried.
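The derivation of root_section from home_section can be illustrated in a few lines. This is a sketch of the rule as described above, not the service's implementation; the function name is hypothetical, and the treatment of the homepage path / is an assumption.

```python
def root_section(home_section: str) -> str:
    """Return the first path level of a section path, keeping the surrounding slashes."""
    if not (home_section.startswith("/") and home_section.endswith("/")):
        raise ValueError("section paths need a leading and a trailing slash")
    parts = [p for p in home_section.split("/") if p]
    # Assumption: the homepage "/" has no folder and is treated as its own root.
    return f"/{parts[0]}/" if parts else "/"
```

With this rule, /nachrichten/wissen/ and /nachrichten/ both map to /nachrichten/, which is why a ROOT_SECTION query also finds content from nested sections.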
Field name | Type | Description |
---|---|---|
path | string | [required] path , with leading and trailing slash (e.g. /nachrichten/ ) |
type | Type | [required] query type, either Type.HOME_SECTION or Type.ROOT_SECTION |
sort_by | SortBy | [required] sorting of the result set, either SortBy.UPDATE_TIME or SortBy.PUBLISH_TIME |
order | Order | [optional] sorting direction for the results regarding the sort_by field, default is Order.ASCENDING |
from_time | Timestamp | [optional] time constraint that refers to the sort_by field. |
to_time | Timestamp | [optional] time constraint that refers to the sort_by field. |
message Query {
string path = 1;
Type type = 2;
SortBy sort_by = 3;
Order order = 4;
google.protobuf.Timestamp from_time = 5;
google.protobuf.Timestamp to_time = 6;
}
Type
Enum value | Description |
---|---|
TYPE_UNSPECIFIED | unspecified |
HOME_SECTION | query by exact home section which is configured in the CMS |
ROOT_SECTION | query by exact root section which is derived from home section when only retaining the first level of the path |
See the description above for why these query types exist; also see Reference for how section information is stored.
enum Type {
TYPE_UNSPECIFIED = 0;
HOME_SECTION = 1;
ROOT_SECTION = 2;
}
SortBy
Enum value | Description |
---|---|
SORT_BY_UNSPECIFIED | unspecified |
UPDATE_TIME | sort by the content's update_time |
PUBLISH_TIME | sort by the content's publish_time |
enum SortBy {
SORT_BY_UNSPECIFIED = 0;
UPDATE_TIME = 1;
PUBLISH_TIME = 2;
}
Order
order of index traversal, default: ascending.
Enum value | Description |
---|---|
ORDER_UNSPECIFIED | unspecified |
ASCENDING | ascending order index traversal |
DESCENDING | descending order index traversal |
enum Order {
ORDER_UNSPECIFIED = 0;
ASCENDING = 1;
DESCENDING = 2;
}
Filters
If you need to further refine the Query results, you can optionally provide a filter expression. A filter expression determines which items within the Query results should be returned to you. All of the other results are discarded.
A filter expression is applied after a Query finishes, but before the results are returned. Therefore, a Query consumes the same amount of read capacity, regardless of whether a filter expression is present.
Field name | Type | Description |
---|---|---|
type_includes | ContentType | type to include into the result set |
type_excludes | ContentType | type to exclude from the result set |
sub_type_includes | ContentSubType | sub_type to include into the result set |
sub_type_excludes | ContentSubType | sub_type to exclude from the result set |
message Filters {
repeated Article.Type type_includes = 1;
repeated Article.Type type_excludes = 2;
repeated Article.SubType sub_type_includes = 3;
repeated Article.SubType sub_type_excludes = 4;
}
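To make the include/exclude semantics concrete, here is a client-side sketch of the same idea applied to an already fetched result list. It rests on assumptions that this document does not confirm: an empty include list means "no restriction", and excludes are applied in addition to includes. The sub_type values NEWS and COLUMN are placeholders, not necessarily real enum values.

```python
from typing import Dict, Iterable, List, Sequence


def apply_filters(
    articles: Iterable[Dict],
    type_includes: Sequence[str] = (),
    type_excludes: Sequence[str] = (),
    sub_type_includes: Sequence[str] = (),
    sub_type_excludes: Sequence[str] = (),
) -> List[Dict]:
    """Keep articles that pass the include lists and are not in any exclude list."""
    kept = []
    for article in articles:
        a_type, a_sub = article.get("type"), article.get("sub_type")
        # Assumption: an empty include list means "no restriction".
        if type_includes and a_type not in type_includes:
            continue
        if sub_type_includes and a_sub not in sub_type_includes:
            continue
        if a_type in type_excludes or a_sub in sub_type_excludes:
            continue
        kept.append(article)
    return kept


# Placeholder query result; the sub_type names are made up for illustration.
query_result = [
    {"id": 1, "type": "ARTICLE", "sub_type": "NEWS"},
    {"id": 2, "type": "GALLERY", "sub_type": None},
    {"id": 3, "type": "ARTICLE", "sub_type": "COLUMN"},
]
```

As noted above, filtering happens after the Query has finished, so a page can contain fewer items than page_size even when more matching articles exist on later pages.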
RequestSettings
Alters the behavior of the request in a way that filters or alters the result or parts of the result based on validity of the article or its elements.
You can also alter the view mode of the article, selecting either the full article or a limited version of it (called teaser or trail).
RequestSettings.ArticleViewMode
Enum value | Description |
---|---|
ARTICLE_VIEW_MODE_UNSPECIFIED | unspecified, defaults to ARTICLE_VIEW_MODE_DEFAULT |
ARTICLE_VIEW_MODE_DEFAULT | full article including body and all elements. |
ARTICLE_VIEW_MODE_TEASER | elements that are not required when teasering the article are removed (e.g. body). |
RequestSettings.ArticleValidity
Enum value | Description |
---|---|
ARTICLE_VALIDITY_UNSPECIFIED | unspecified, defaults to ARTICLE_VALIDITY_IGNORE |
ARTICLE_VALIDITY_VALID | Return only articles that are considered valid and allowed to be accessed publicly. |
ARTICLE_VALIDITY_IGNORE | Ignore the article validity and return everything as is, even deleted or expired content. |
RequestSettings.ElementValidity
Enum value | Description |
---|---|
ELEMENT_VALIDITY_UNSPECIFIED | unspecified, defaults to ELEMENT_VALIDITY_IGNORE |
ELEMENT_VALIDITY_VALID | Remove invalid elements from its parent article, such as expired images or videos. |
ELEMENT_VALIDITY_IGNORE | Ignore the element validity and return everything as is, even deleted or expired content. |
message RequestSettings {
ArticleViewMode article_view_mode = 1;
ArticleValidity article_validity = 2;
ElementValidity element_validity = 3;
enum ArticleViewMode {
ARTICLE_VIEW_MODE_UNSPECIFIED = 0;
ARTICLE_VIEW_MODE_DEFAULT = 1;
ARTICLE_VIEW_MODE_TEASER = 2;
}
enum ArticleValidity {
ARTICLE_VALIDITY_UNSPECIFIED = 0;
ARTICLE_VALIDITY_VALID = 1;
ARTICLE_VALIDITY_IGNORE = 2;
}
enum ElementValidity {
ELEMENT_VALIDITY_UNSPECIFIED = 0;
ELEMENT_VALIDITY_VALID = 1;
ELEMENT_VALIDITY_IGNORE = 2;
}
}
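Since every enum has an UNSPECIFIED zero value with a documented fallback, an empty RequestSettings is equivalent to the full, unfiltered view. The sketch below resolves those fallbacks using enum names as strings; the names UNSPECIFIED_DEFAULTS and effective_settings are hypothetical helpers, not part of the API.

```python
from typing import Dict

# Fallbacks for unspecified enum values, as listed in the tables above.
UNSPECIFIED_DEFAULTS = {
    "ARTICLE_VIEW_MODE_UNSPECIFIED": "ARTICLE_VIEW_MODE_DEFAULT",
    "ARTICLE_VALIDITY_UNSPECIFIED": "ARTICLE_VALIDITY_IGNORE",
    "ELEMENT_VALIDITY_UNSPECIFIED": "ELEMENT_VALIDITY_IGNORE",
}


def effective_settings(
    article_view_mode: str = "ARTICLE_VIEW_MODE_UNSPECIFIED",
    article_validity: str = "ARTICLE_VALIDITY_UNSPECIFIED",
    element_validity: str = "ELEMENT_VALIDITY_UNSPECIFIED",
) -> Dict[str, str]:
    """Replace every UNSPECIFIED value with its documented default."""
    requested = {
        "article_view_mode": article_view_mode,
        "article_validity": article_validity,
        "element_validity": element_validity,
    }
    return {field: UNSPECIFIED_DEFAULTS.get(value, value) for field, value in requested.items()}
```

A client that renders public pages would typically set both validity fields to their VALID variants explicitly rather than rely on the IGNORE defaults.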
ListArticlesResponse
Field name | Type | Description |
---|---|---|
articles | repeated Article | List of articles that match both the query and the filter; empty if nothing matches. |
next_page_token | string | A token that can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages. |
message ListArticlesResponse {
repeated stroeer.core.v1.Article articles = 1;
string next_page_token = 2;
}
⚙︎ ListSections
Lists the available root sections.
ListSectionsResponse
Lists all available root_sections that can be used in the query above.
message ListSectionsResponse {
repeated string sections = 1;
}
⚙︎ CurationService
This service allows querying curations within the CMS. In the CMS domain this is implemented as Lists, which usually contain one or more Articles.
service CurationService {
rpc GetCuration(GetCurationRequest) returns (GetCurationResponse) {}
rpc BatchGetCuration(BatchGetCurationRequest) returns (BatchGetCurationResponse) {}
}
⚙︎ GetCuration
Fetch a curation by its id and return the repeated stroeer.core.v1.Article this curation contains. The response may be empty in case the curation does not contain any items.
A NOT_FOUND status code indicates that the curation id does not exist.
GetCurationRequest
Field name | Type | Description |
---|---|---|
id | int64 | [required] id of the list to be fetched |
message GetCurationRequest {
int64 id = 1;
}
GetCurationResponse
Field name | Type | Description |
---|---|---|
id | int64 | the id of this list |
label | string | the label of this list |
update_time | Timestamp | Technical timestamp at which the curation was updated in seconds UTC time since Unix epoch. |
articles | repeated Article | curated items of this list |
message GetCurationResponse {
int64 id = 1;
string label = 2;
google.protobuf.Timestamp update_time = 3;
repeated stroeer.core.v1.Article articles = 4;
}
⚙︎ BatchGetCuration
Fetch multiple curations by their ids and return the repeated stroeer.core.v1.Article those curations contain. The response may be empty in case the curation does not contain any items.
The ordering of items will be the same as the ordering of the ids requested.
BatchGetCurationRequest
Field name | Type | Description |
---|---|---|
ids | repeated int64 | the ids of the lists to be fetched |
message BatchGetCurationRequest {
repeated int64 ids = 1;
}
BatchGetCurationResponse
Field name | Type | Description |
---|---|---|
curations | repeated GetCurationResponse | Response items that correspond to the ids this service was called with. |
message BatchGetCurationResponse {
repeated GetCurationResponse curations = 1;
}
⚙︎ SectionPageService
Message to provide parameters when requesting data for a section page, currently only the path of the page.
Correct paths have a leading and a trailing slash, like /nachrichten/unterhaltung/
The homepage has the path /.
message GetSectionPageRequest {
// valid section_path, with leading and trailing slash
string section_path = 1;
// use to page through sections. If unspecified, it will default to `1`.
// Paging is 1-based (1 is the first page, there is no page `0`)
//
// Due to underlying mechanisms and seo requirements, page-size is fixed at 30
// The service may return fewer than this value.
int32 page = 2;
}
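The path and paging rules above lend themselves to a cheap client-side check before a request is sent. The following is a sketch under assumptions: the function name is hypothetical, the request is modelled as a plain dictionary, and rejecting negative page numbers is a choice made here rather than documented service behaviour.

```python
def build_section_page_request(section_path: str, page: int = 0) -> dict:
    """Validate the path shape client-side and apply the documented page default."""
    if not section_path:
        raise ValueError("section_path must not be empty")
    if not (section_path.startswith("/") and section_path.endswith("/")):
        raise ValueError("section_path needs a leading and a trailing slash")
    if page < 0:
        # Assumption: negative pages are treated as a client error.
        raise ValueError("page must not be negative")
    # An unset int32 is 0 in proto3; paging is 1-based, so 0 means "first page".
    return {"section_path": section_path, "page": page or 1}
```

For example, build_section_page_request("/") describes the first page of the homepage, and build_section_page_request("/nachrichten/unterhaltung/", 3) the third page of that section.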
Response message when requesting data for a section page.
Responds with NOT_FOUND if an unknown path is requested, or the path is incorrect.
message GetSectionPageResponse {
stroeer.page.section.v1.SectionPage section_page = 1;
// Total number of pages in this `section_path`
int32 total_pages = 2;
PaginationType pagination_type = 3;
enum PaginationType {
// Not specified.
PAGINATION_TYPE_UNSPECIFIED = 0;
// The default pagination type.
FIXED_BLOCK = 1;
// Pagination type for Evergreen-Ressorts.
GHOST_BLOCK = 2;
}
}
Service to fetch all data needed to render a section page, like the homepage or "/politik/"
Status/Error scenarios
scenario: found
description | all data for the section page was found |
gRPC status | OK (0) |
gRPC error payload | none |
HTTP status | 200 (OK) |
cacheable | yes |
scenario: section path is empty
description | client did not provide a section path |
gRPC status | INVALID_ARGUMENT (3) |
gRPC error payload | google.rpc.Bad |
HTTP status | 400 (BAD REQUEST) |
cacheable | yes |
scenario: section path is invalid
description | client provided an invalid section path |
gRPC status | INVALID_ARGUMENT (3) |
gRPC error payload | google.rpc.Bad |
HTTP status | 400 (BAD REQUEST) |
cacheable | yes |
scenario: section path is unknown
description | client provided an unknown section path |
gRPC status | NOT_FOUND (5) |
gRPC error payload | none |
HTTP status | 404 (NOT FOUND) |
cacheable | yes |
scenario partial section data
description | artificial internal error processing parts of this section (no data but valid section) |
gRPC status | INTERNAL (13) |
gRPC error payload | none |
HTTP status | 500 (INTERNAL SERVER ERROR) |
cacheable | no |
scenario internal
description | internal error processing the section |
gRPC status | INTERNAL (13) |
gRPC error payload | none |
HTTP status | 500 (INTERNAL SERVER ERROR) |
cacheable | no |
scenario timeout
description | timeout loading and processing the section |
gRPC status | DEADLINE_EXCEEDED (4) |
gRPC error payload | none |
HTTP status | 504 (GATEWAY TIMEOUT) |
cacheable | no |
Scenarios about incomplete section data need to be defined. No section data results in an internal server error while incomplete section data might be returned.
⚙︎ StageService
Description
Get single stages by requesting them via well-known ids, e.g. "schlagzeilen" or "meistgelesen".
Status/Error scenarios
scenario: found
description | service responded without encountering exceptions |
gRPC status | OK |
gRPC error payload | none |
HTTP status | OK |
cacheable | yes |
scenario: internal
description | internal error while loading data |
gRPC status | INTERNAL |
gRPC error payload | none |
HTTP status | 500 |
cacheable | no |
scenario: timeout
description | timeout while loading data |
gRPC status | DEADLINE_EXCEEDED |
gRPC error payload | none |
HTTP status | 504 |
cacheable | no |