Article

An article represents a piece of content created in the content management system. Different types of content, such as text or video articles, share the same message structure and are distinguished by the Article.Type field. Text articles (type = Article.Type.ARTICLE) also carry an Article.SubType that differentiates their purpose and form.

Teaser

To improve the performance of database access and network transmission, tapir uses a lightweight representation of Article in some places.

Depending on the service used to retrieve an article, the Article message might only contain data required on section pages:

  • Article.bodies set to null
  • Article.elements filtered by Element.relations to only contain TEASER, but neither OPENER nor SOCIAL

The message then contains no data that is only required on detail pages. This lightweight representation is sometimes referred to as Teaser.
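
The reduction described above can be sketched in a few lines of Python. This is a minimal illustration, not tapir's implementation: it assumes the article is available as a plain dict that mirrors the JSON samples in this document, `to_teaser` is a hypothetical helper name, and keeping every element that carries the TEASER relation (even if it also carries OPENER) is one possible reading of the filter rule.

```python
from copy import deepcopy


def to_teaser(article: dict) -> dict:
    """Return a lightweight copy of an article dict (hypothetical helper)."""
    teaser = deepcopy(article)
    # Detail-page content is dropped entirely.
    teaser["bodies"] = []
    # Assumption: an element qualifies as soon as TEASER is among its relations.
    teaser["elements"] = [
        element
        for element in teaser.get("elements", [])
        if "TEASER" in element.get("relations", [])
    ]
    return teaser


article = {
    "id": 1,
    "bodies": [{"type": "BODY", "children": []}],
    "elements": [
        {"type": "IMAGE", "relations": ["OPENER", "TEASER"]},
        {"type": "IMAGE", "relations": ["SOCIAL"]},
    ],
}
teaser = to_teaser(article)
```

The original article is left untouched, so the same object can still be used to render a detail page.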

| Field name | Type | Description |
| --- | --- | --- |
| id | int64 | Unique ID of the article defined by the content management system (required). |
| type | Type | Main content type of the article (required). See enum Type. |
| sub_type | SubType | Subtype of the article. For ARTICLE this field holds a sub_type, for others like GALLERY it may not. |
| section_tree | Reference | Hierarchical section tree information of the article (required). |
| fields | map<string, string> | Generic map containing general content and configuration information of the article (required). See fields. |
| bodies | repeated Body | Recursive textual body of the article to be rendered on detail pages. May be null for Teaser. |
| metadata | Metadata | The article's Metadata, containing state and various timestamps. |
| elements | repeated Element | Elements required to render the teaser, such as IMAGE, VIDEO or AUTHOR. |
| keywords | repeated Keyword | Extracted keywords from the article body like persons, locations, organizations etc. |
| onwards | repeated int64 [deprecated] | IDs of articles related to this article. Related articles are defined manually in the content management system by the editorial department. |
| variants | map<string, Article> | Variants of this article, e.g. for headline testing. |
| entities | repeated string [deprecated] | Extracted entities from the article body like persons, locations, organizations etc. Deprecated: use keywords instead. |
| authors | repeated Author | Authors and/or agencies for this content. |
| related_articles | repeated Article | Editorial articles related to the main article. Entries may be empty, unresolved articles (not all services resolve these). |
| references | repeated Reference | References, e.g. URLs belonging to this article. |

message Article {
  int64 id = 1;
  Type type = 2;
  SubType sub_type = 3;
  stroeer.core.v1.Reference section_tree = 4;
  map<string, string> fields = 5;
  repeated Body bodies = 6;
  Metadata metadata = 7;
  repeated Element elements = 8;
  repeated Keyword keywords = 9;
  repeated int64 onwards = 10 [deprecated = true];
  map<string, Article> variants = 11;
  repeated Author authors = 12;
  repeated Article related_articles = 13;
  repeated Reference references = 14;
  repeated string entities = 100 [deprecated = true];
}

fields

The entry set is defined by the content management system and will vary depending on the main type of the article.

⚠ Clients must be resilient to unknown or missing entries. ⚠

For Article.Type.ARTICLE

this map will contain the following data:

| key | mandatory | description |
| --- | --- | --- |
| headline | * | the headline for this content |
| top_line | * | the kicker above the headline ("Dachzeile") |
| ref_path | * | URL path for this article, e.g. /${section_tree}/id_${id}/${title}.html |
| ref_canonical | * | Canonical URL of this article, may differ if external, e.g. https://www.example.com/external.html |
| summary | | summary for this content |
| teaser_text | | used on teasers, overrides summary |
| meta_robots | | |
| social_headline | | used for social markup, overrides headline |
| headline_short | | used for headline lists ("Schlagzeilen"), overrides headline |
| meta_title | | HTML <meta title> |
| expert_line | | |
| social_description | | |
| meta_description | | HTML <meta description> |
| reading_time_minutes | | estimated reading time in minutes |
| flag:hidden | | this content must be excluded from automated curations (CMS: no auto-content / curate manually) |
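
The override rules in the table, combined with the requirement to tolerate missing entries, can be sketched as a small lookup helper. This is only an illustration under stated assumptions: `pick` and the three wrapper functions are hypothetical names, the fields map is taken to be a plain dict, and treating an empty string like a missing entry is an assumption.

```python
def pick(fields: dict, *keys: str, default: str = "") -> str:
    """Return the first non-empty value among keys, tolerating missing entries."""
    for key in keys:
        value = fields.get(key)
        if value:  # assumption: an empty string counts as "not set"
            return value
    return default


def teaser_text(fields: dict) -> str:
    # teaser_text overrides summary
    return pick(fields, "teaser_text", "summary")


def short_headline(fields: dict) -> str:
    # headline_short overrides headline
    return pick(fields, "headline_short", "headline")


def social_headline(fields: dict) -> str:
    # social_headline overrides headline
    return pick(fields, "social_headline", "headline")


fields = {"headline": "Long headline", "headline_short": "Short", "summary": "Sum"}
```

Unknown keys in the map are simply ignored, so new entries added by the content management system do not break this kind of client code.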

For Article.Type.GALLERY

this map will contain the following data:

| key | mandatory | description |
| --- | --- | --- |
| headline | * | the headline for this content |

enum Type

| Enum value | Description |
| --- | --- |
| TYPE_UNSPECIFIED | unspecified |
| ARTICLE | A text article, usually sub typed |
| IMAGE | [deprecated] An image article, unused |
| VIDEO | A video article, contains HLS videos as well as external live streams |
| GALLERY | A gallery article |
| EMBED | An embed article including an oembed or edge_side_include element |
| AUTHOR | An author article, currently not implemented |
| AGENCY | [deprecated] An agency article, unused |
| EXTERNAL | An external article (teaser-like external article) |
| INTERNAL | Used for internal purposes only. |
| CLUSTER | A thematically grouped cluster of a varying number of articles; resolved Trails are embedded via related_articles |

enum Type {
TYPE_UNSPECIFIED = 0;
ARTICLE = 1;
IMAGE = 2 [deprecated = true];
VIDEO = 3;
GALLERY = 4;
EMBED = 5;
AUTHOR = 6;
AGENCY = 7 [deprecated = true];
EXTERNAL = 8;
CLUSTER = 9;
INTERNAL = 100;
}


enum SubType

Content with Type.ARTICLE is usually sub typed to alter its form and purpose.

| Enum value | Description |
| --- | --- |
| SUB_TYPE_UNSPECIFIED | unspecified |
| NEWS | News report (Meldung/Nachricht); this is the default |
| COLUMN | Column (Kolumne) |
| COMMENTARY | Commentary (Kommentar) |
| INTERVIEW | Interview |
| CONTROVERSY | Pros and cons / debate (Pro und Kontra/Streitgespräch) |
| TAGESANBRUCH | Tagesanbruch |
| EVERGREEN | Evergreen |
| AGENCY_IMPORT | Content originally imported from agency/tickers by the CMS |
| ADVERTORIAL | Advertorial |
| QUIZ | Quiz |
| GAME | (Browser) game |
| COMPLIANCE | Internal company articles like an imprint or contact forms |
| RECIPE | Cooking recipe |

enum SubType {
SUB_TYPE_UNSPECIFIED = 0;
NEWS = 1;
COLUMN = 2;
COMMENTARY = 3;
INTERVIEW = 4;
CONTROVERSY = 5;
TAGESANBRUCH = 6;
EVERGREEN = 7;
AGENCY_IMPORT = 8;
ADVERTORIAL = 9;
QUIZ = 10;
GAME = 11;
COMPLIANCE = 12;
RECIPE = 13;
}


Article ۰ Body

The Body represents a basic block. Each Body is self-contained and holds all the data required for rendering within its data structures.

Common use cases are Type.BODY, which holds the textual article body, and Type.ARTICLE_SOURCES, where onward articles are referenced.

| Field name | Type | Description |
| --- | --- | --- |
| children | repeated BodyNode | Recursive/nested structure that usually represents the textual body / markup / HTML |
| type | Type | Type of this Body, see Type. |

message Body {
  repeated BodyNode children = 1;
  Type type = 2;
}

Type

Each Body has a Body.Type to help the consumer to correctly interpret the BodyNode's content.

| Enum value | Description |
| --- | --- |
| TYPE_UNSPECIFIED | unspecified |
| BODY | The textual article body including all inline elements such as IMAGE, VIDEO and EMBED |
| ARTICLE_SOURCES | A wrapper for all article sources ("Quellenapparat"). There can only be one of these per article. |
| DISCLAIMER | An article disclaimer with important or legal notes, e.g. the medical notice ("medizinischer Hinweis") on all medical articles |
| TRUST_BOX | Information about what kind of article this is (e.g. opinion article). There can only be one of these per article. |
| TABLE_OF_CONTENTS | Table of contents for this article, consists of anchors which refer to sub headlines within the BODY |

enum Type {
TYPE_UNSPECIFIED = 0;
BODY = 1;
ARTICLE_SOURCES = 2;
DISCLAIMER = 3;
TRUST_BOX = 4;
TABLE_OF_CONTENTS = 5;
}


BodyNode

Recursive structure representing all types of possible nodes inside an article.

One use-case is to represent HTML-like markup in tapir, but it is also used to map custom elements that require a strict positional placement within the textual body. Things that are not part of the textual article body are represented as individual Body parts so they can be rendered independently if required.

Clients must be resilient to unknown or missing nodes.

message BodyNode {
string type = 1;
string text = 2;
map<string, string> fields = 3;
repeated BodyNode children = 4;
repeated Element elements = 5;
Reference reference = 6;
}


| Field name | Type | Description |
| --- | --- | --- |
| type | string | Type of the node (required). |
| text | string | Text of the node, only set for text nodes (type == 'text'). |
| fields | map<string, string> | Additional information for the node depending on its type, e.g. href for nodes of type a. See fields. |
| children | repeated BodyNode | Nested items, e.g. the text of a <p> or an <a>. |
| elements | repeated Element | Elements of the node, e.g. video, image, gallery, embed, ... |
| reference | Reference | Reference of the node, e.g. the link target of a node of type a. |

fields

HTML like

| type | description |
| --- | --- |
| text | most basic type, its text value can be found in the text field. The word_count can be found in the BodyNode.fields for each BodyNode[type=text] |
| p | paragraph / <p> |
| span | <span> |
| sub_headline | a sub headline, may be part of the table of contents |
| a | anchor / <a>, link target can be found in BodyNode.reference |
| strong | strong / <strong> |
| em | emphasis / <em> |
| sub | subscript / <sub> |
| sup | superscript / <sup> |
| hr | horizontal rule / <hr> |
| br | line break / <br> |
| ul | unordered list / <ul> |
| ol | ordered list / <ol> |
| li | list item / <li> |
| table | table / <table> |
| thead | table head / <thead> |
| tbody | table body / <tbody> |
| tfoot | table footer / <tfoot> |
| th | table header / <th> |
| tr | table row / <tr> |
| td | table data cell / <td> |

Custom

| type | description |
| --- | --- |
| image | inline image element, check elements |
| video | inline video element, check elements |
| gallery | inline gallery element, check elements |
| oembed | inline oEmbed element, check elements |
| esi | inline edge side include element, check elements |
| quote | inline quotation element, check elements |
| infobox | inline box, consists of textual content in children and optional elements |
| pros_and_cons | pros and cons box, consists of elements and structured text in children |
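
Because BodyNode is recursive and clients must tolerate unknown node types, a consumer typically walks the tree and falls back to the children of anything it does not recognise. The following Python sketch shows one way to extract plain text; it assumes nodes are plain dicts shaped like the BodyNode message, and both `node_text` and the set of block-level types are hypothetical choices made for this example.

```python
# Assumption: these node types end a line when flattened to plain text.
BLOCK_TYPES = {"p", "sub_headline", "li", "br", "tr"}


def node_text(node: dict) -> str:
    """Flatten a BodyNode-like dict to plain text, ignoring unknown types."""
    if node.get("type") == "text":
        return node.get("text", "")
    # Unknown or custom types are not an error: descend into their children.
    inner = "".join(node_text(child) for child in node.get("children", []))
    if node.get("type") in BLOCK_TYPES:
        return inner + "\n"
    return inner


body = {
    "type": "BODY",
    "children": [
        {
            "type": "p",
            "children": [
                {"type": "text", "text": "Hello "},
                {"type": "strong", "children": [{"type": "text", "text": "world"}]},
                {"type": "some_future_type", "children": [{"type": "text", "text": "!"}]},
            ],
        }
    ],
}
plain = node_text(body)
```

A real renderer would map each known type to markup instead of text, but the fallback for unknown types can stay the same.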

Article ۰ Element

Elements are self-contained objects that represent structured data that is usually too complex to fit into our usual workhorse which is the Body.

Elements can appear in multiple places within the Article:

  1. Article.elements

Elements of the article which are not part of the textual body, e.g. author, opener and teaser. These elements should be used to render the article as a teaser, e.g. on section pages.

  2. BodyNode.elements

This is where Elements are most commonly used. They come in various types and are rendered in place, thus breaking up the textual body.

  3. Element.children

Some more sophisticated Elements use nesting to make their API representation more concise and to structure things hierarchically:

  • video uses nesting to model its optional poster image, which itself is a normal image.

  • galleries have their individual images nested within.

Different types of elements, such as images or videos, share the same message structure and are distinguished by the Element.Type field.

See the Samples section.

| Field name | Type | Description |
| --- | --- | --- |
| type | Type | Type of this Element, see Element.Type |
| relations | repeated Relation | The usages (relations) of an element. See Relation |
| assets | repeated Asset | Assets describing this Element. See Samples |
| children | repeated Element | Nested Elements, e.g. for Elements of type gallery and video |

message Element {
  Type type = 1;
  repeated Relation relations = 2;
  repeated Asset assets = 3;
  repeated Element children = 4;
}

Element.Type

| Enum value | Description |
| --- | --- |
| TYPE_UNSPECIFIED | unspecified |
| ARTICLE | unused |
| IMAGE | image, containing further Assets. See Samples |
| VIDEO | video, containing nested Asset and an optional nested image Element. See Samples |
| GALLERY | gallery, consists of many nested image Elements. |
| OEMBED | oEmbed, contains one metadata Asset. Todo: sample |
| AUTHOR | author, contains one metadata Asset and an optional image Element. Todo: sample |
| AGENCY | agency, contains one metadata Asset |
| EDGE_SIDE_INCLUDE | <esi:include> that must be resolved server-side for SEO reasons, otherwise similar to OEMBED |
| CITATION | citation, contains one metadata Asset. Todo: sample |
| INTERNAL_WIDGET | widget or embed that is handled directly by the front end rendering. Todo: sample |
| AUDIO | audio element. Todo: sample |

enum Type {
TYPE_UNSPECIFIED = 0;
ARTICLE = 1;
IMAGE = 2;
VIDEO = 3;
GALLERY = 4;
OEMBED = 5;
AUTHOR = 6;
AGENCY = 7;
EDGE_SIDE_INCLUDE = 8;
CITATION = 9;
INTERNAL_WIDGET = 10;
AUDIO = 11;
}


Element.Relation

| Enum value | Description |
| --- | --- |
| RELATION_UNSPECIFIED | unspecified |
| OPENER | As an opener element (within the content) |
| TEASER | As a teaser element (when externally viewed) |
| SOCIAL | Use as social element (mostly images), e.g. <og:image> or JSON-LD |

enum Relation {
RELATION_UNSPECIFIED = 0;
OPENER = 1;
TEASER = 2;
SOCIAL = 3;
}


Samples

For details on certain fields or usages of Assets, see the Asset section.

image element

  • usually consists of one Asset[@type=METADATA] and several Asset[@type=IMAGE], one for each crop.
{
  "type": "IMAGE",
  "relations": [ "OPENER", "TEASER" ],
  "assets": [{
    "fields": {
      "media_id": "90635672v2",
      "caption": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
      "alt_text": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
      "description": "Annalena Baerbock und Joschka Fischer bei einer Wahlkampfveranstaltung: Die Grünen-Kanzlerkandidatin fordert von der Bundesregierung, mindestens 10.000 Menschen aus Afghanistan aufzunehmen.",
      "source": "/Reuters-bilder"
    },
    "type": "METADATA",
    "metadata": {
      "state": "STATE_UNSPECIFIED",
      "start_time": { "seconds": "-62135596800" },
      "end_time": { "seconds": "253402300799" }
    }
  },
  {
    "type": "IMAGE",
    "fields": {
      "crop": "original",
      "url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/90635672v2/fit-in/0x0/annalena-baerbock-und-joschka-fischer-bei-einer-wahlkampfveranstaltung-die-gruenen-kanzlerkandidatin-fordert-von-der-bundesregierung-mindestens-10000-menschen-aus-afghanistan-aufzunehmen.jpg",
      "width": "1920",
      "height": "1280"
    }
  },
  {
    "type": "IMAGE",
    "fields": {
      "crop": "16:9",
      "url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/90635672v2/0x0:1920x1077/fit-in/0x0/annalena-baerbock-und-joschka-fischer-bei-einer-wahlkampfveranstaltung-die-gruenen-kanzlerkandidatin-fordert-von-der-bundesregierung-mindestens-10000-menschen-aus-afghanistan-aufzunehmen.jpg",
      "width": "1920",
      "height": "1077"
    }
  }
  ]
}

video element

  • usually consists of one Asset[@type=METADATA] and one Asset[@type=VIDEO].
  • If the video has a poster image, it can be found as a nested child Element[type=IMAGE] within children[]
{
  "relations": [ "OPENER" ],
  "type": "VIDEO",
  "assets": [
    {
      "type": "METADATA",
      "fields": {
        "media_id": "0DgeZjJtJ8EC",
        "caption": "Eine Statue der Justitia mit einer Waage und einem Schwert in ihren Händen.",
        "frame_capture:url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/thumbnails/maas-haben-500-menschen-aus-kabul-ausgeflogen_thumb.0000031.jpg",
        "frame_capture:numerator": "1",
        "frame_capture:denominator": "5"
      },
      "metadata": {
        "state": "STATE_UNSPECIFIED",
        "start_time": { "seconds": "-62135596800" },
        "end_time": { "seconds": "253402300799" }
      }
    },
    {
      "type": "VIDEO",
      "fields": {
        "duration_seconds": "157.576",
        "mime_type": "application/vnd.apple.mpegurl",
        "url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/hls/maas-haben-500-menschen-aus-kabul-ausgeflogen.m3u8",
        "height": "1080",
        "width": "1920"
      }
    }
  ],
  "children": [
    {
      "type": "IMAGE",
      "assets": [
        {
          "fields": {
            "media_id": "amOyEe-u5llZ"
          },
          "type": "METADATA",
          "metadata": {
            "start_time": { "seconds": "-62135596800" },
            "end_time": { "seconds": "253402300799" }
          }
        },
        {
          "type": "IMAGE",
          "fields": {
            "crop": "original",
            "url": "https://di7yufqc6mgnl.cloudfront.net/2021/08/amOyEe-u5llZ/fit-in/0x0/image.png",
            "width": "1500",
            "height": "844"
          }
        }
      ]
    }
  ]
}

gallery element

  • usually consists of one Asset[@type=METADATA] describing the gallery itself
  • several nested Element[@type=IMAGE] within children[], one for each image of this gallery
{
  "relations": [ "OPENER" ],
  "type": "GALLERY",
  "assets": [
    {
      "type": "METADATA",
      "fields": {
        "headline": "Gallery ipsum dolor",
        "ref_path": "/test-playground/id_100000067/gallery-ipsum-dolor.html",
        "ref_canonical": "https://www.t-online.de/test-playground/id_100000067/gallery-ipsum-dolor.html",
        "url": "/test-playground/id_100000067/gallery-ipsum-dolor.html"
      }
    }
  ],
  "children": [
    {
      "type": "IMAGE",
      "assets": [
        {
          "type": "METADATA",
          "fields": {
            "media_id": "82333994v1",
            "caption": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
            "alt_text": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
            "description": "Wer unterwegs Äpfel pflückt, kann sich strafbar machen.",
            "source": "Patrick Seeger/dpa-tmn/dpa"
          },
          "metadata": {
            "start_time": { "seconds": "-62135596800" },
            "end_time": { "seconds": "253402300799" }
          }
        },
        {
          "type": "IMAGE",
          "fields": {
            "crop": "original",
            "url": "https://di7yufqc6mgnl.cloudfront.net/2021/05/82333994v1/fit-in/0x0/wer-unterwegs-aepfel-pflueckt-kann-sich-strafbar-machen.jpg",
            "width": "640",
            "height": "360"
          }
        }
      ]
    }
  ]
}
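
The three samples share one shape: image URLs live in Asset[@type=IMAGE] entries, either on an image element itself or on image elements nested in children. A consumer can therefore collect them with one recursive walk. The sketch below assumes elements are plain dicts shaped like the samples above; `image_urls` is a hypothetical helper, and the example.com URLs are placeholders.

```python
from typing import List


def image_urls(element: dict, crop: str = "original") -> List[str]:
    """Collect URLs of the given crop from an element and its nested children."""
    urls: List[str] = []
    if element.get("type") == "IMAGE":
        for asset in element.get("assets", []):
            fields = asset.get("fields", {})
            # Only IMAGE assets carry crops; METADATA assets are skipped.
            if asset.get("type") == "IMAGE" and fields.get("crop") == crop:
                url = fields.get("url")
                if url:
                    urls.append(url)
    # Videos nest their poster image, galleries nest one element per image.
    for child in element.get("children", []):
        urls.extend(image_urls(child, crop))
    return urls


video = {
    "type": "VIDEO",
    "assets": [{"type": "VIDEO", "fields": {"url": "https://example.com/v.m3u8"}}],
    "children": [
        {
            "type": "IMAGE",
            "assets": [
                {"type": "METADATA", "fields": {"media_id": "x"}},
                {"type": "IMAGE", "fields": {"crop": "original", "url": "https://example.com/poster.png"}},
            ],
        }
    ],
}
poster_urls = image_urls(video)
```

For a gallery, the same call returns one URL per nested image that offers the requested crop.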


Article ۰ Element ۰ Asset

Asset of an Element.

An asset's configuration depends on its use and may vary with its type field.

| Field name | Type | Description |
| --- | --- | --- |
| type | Type | Type of the asset. |
| fields | map<string, string> | Generic map containing general content and configuration information of the asset. Clients must be resilient to unknown or missing entries. |
| metadata | Metadata | Only present for assets of Type.METADATA. Technical metadata for the parent element (state, validity, ...). See Metadata |
| reference | Reference | Reference, e.g. URL belonging to this asset. |

message Asset {
  Type type = 1;
  map<string, string> fields = 2;
  Metadata metadata = 3;
  Reference reference = 4;
}

enum Type

Type of an asset.

| Enum value | Description |
| --- | --- |
| TYPE_UNSPECIFIED | unspecified |
| IMAGE | image asset with a resizable template URL and some image stats (width, height, cropping). See samples |
| VIDEO | internal video asset, expect (m3u8/HLS) URLs and some video stats (width, height, duration) within fields |
| EXTERNAL_VIDEO | holds (m3u8/HLS) URLs to external videos, such as live streams and glomex |
| METADATA | holds Metadata for the parent element and fields that also depend on the parent Element.Type |
| LINK | additional link (href, reference) asset for the parent Element, e.g. an image with an optional link target. |
| AUDIO | internal audio asset, expect (mp3) URLs |

enum Type {
TYPE_UNSPECIFIED = 0;
IMAGE = 1;
VIDEO = 2;
EXTERNAL_VIDEO = 3;
METADATA = 4;
LINK = 5;
AUDIO = 6;
}


Samples

Image Asset

{
  "type": "IMAGE",
  "fields": {
    "crop": "16:9",
    "url": "https://${CDN_URL}/89670804v20/0x37:1920x1079/fit-in/0x0/das-covid-19-dashboard-vom-robert-koch-institut-symbolbild-die-corona-inzidenz-in-muenchen-ist-deutlich-gesunken.jpg",
    "width": "1920",
    "x": "0",
    "y": "37",
    "height": "1079"
  }
}

| field | description |
| --- | --- |
| url | the URL for this cropped image without scaling. If scaling is desired, replace the /0x0/ with the desired dimensions; /fit-in/ will make sure that the cropped image fits inside this rectangle. |
| crop | this cropped image's logical name, e.g. original, 16:9, custom |
| x | x-offset of this crop within the original image |
| y | y-offset of this crop within the original image |
| width | the width of this cropped image, before scaling |
| height | the height of this cropped image, before scaling |

NOTES:

  • x + width <= width(original_image) otherwise the image generator will fail
  • y + height <= height(original_image) otherwise the image generator will fail
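
The scaling rule and the two bounds checks from the notes can be sketched as follows. This is an illustration under assumptions, not the image generator's actual contract: the asset is a plain dict shaped like the sample, `scaled_url` and `crop_fits` are hypothetical helpers, the WIDTHxHEIGHT notation for the replaced segment is inferred from the 0x0 placeholder, and cdn.example.com is a placeholder host.

```python
def scaled_url(asset: dict, width: int, height: int) -> str:
    """Replace the unscaled /0x0/ segment after /fit-in/ with target dimensions."""
    url = asset["fields"]["url"]
    # Assumption: dimensions are written as WIDTHxHEIGHT, like the 0x0 placeholder.
    return url.replace("/fit-in/0x0/", f"/fit-in/{width}x{height}/", 1)


def crop_fits(asset: dict, original_width: int, original_height: int) -> bool:
    """Check x + width and y + height against the original image size."""
    fields = asset["fields"]
    x = int(fields.get("x", 0))
    y = int(fields.get("y", 0))
    return (
        x + int(fields["width"]) <= original_width
        and y + int(fields["height"]) <= original_height
    )


asset = {
    "type": "IMAGE",
    "fields": {
        "crop": "16:9",
        "url": "https://cdn.example.com/89670804v20/0x37:1920x1079/fit-in/0x0/image.jpg",
        "width": "1920",
        "x": "0",
        "y": "37",
        "height": "1079",
    },
}
thumbnail = scaled_url(asset, 640, 360)
```

Note that all values in fields are strings, so numeric fields have to be parsed before any arithmetic.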

Video Asset

{
  "type": "VIDEO",
  "fields": {
    "duration_seconds": "157.576",
    "mime_type": "application/vnd.apple.mpegurl",
    "url": "https://d1q9f0uk9ts7gc.cloudfront.net/2021/08/0DgeZjJtJ8EC/hls/maas-haben-500-menschen-aus-kabul-ausgeflogen.m3u8",
    "height": "1080",
    "width": "1920"
  }
}

| field | description |
| --- | --- |
| url | the URL of this asset, usually an m3u8 playlist URL |
| mime_type | the mime type of this asset, usually m3u8/HLS |
| duration_seconds | video duration in seconds |
| height | the height of the original video, may differ from the transcoded video. |
| width | the width of the original video, may differ from the transcoded video. |

Video Metadata Asset

{
  "type": "METADATA",
  "fields": {
    "media_id": "0DgeZjJtJ8EC",
    "caption": "Eine Statue der Justitia mit einer Waage und einem Schwert in ihren Händen.",
    "frame_capture:url": "https://example.com/thumbnails/thumb.0000031.jpg",
    "frame_capture:numerator": "1",
    "frame_capture:denominator": "5"
  }
}

| field | description |
| --- | --- |
| media_id | alphanumeric CMS id of the media |
| caption | the video's caption |
| frame_capture:url | if frame capture was enabled during transcoding, this is the URL of the last frame capture |
| frame_capture:numerator | frame capture images are numbered, starting at 0000000, which can be used as the poster image. |
| frame_capture:denominator | numerator and denominator can be used to calculate which frame capture image must be displayed at a given time. |

Example: for numerator=1 and denominator=5, we have to increment the frame capture every 5 / 1 == 5 seconds:

00:00.000 --> 00:05.000
/thumbnails/thumb.0000000.jpg

00:05.000 --> 00:10.000
/thumbnails/thumb.0000001.jpg

00:10.000 --> 00:15.000
/thumbnails/thumb.0000002.jpg

00:15.000 --> 00:20.000
/thumbnails/thumb.0000003.jpg

...
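
The schedule above follows from a single division: one frame capture covers denominator / numerator seconds. A small Python sketch of that calculation is given below. `frame_index` and `frame_url` are hypothetical helpers, and the assumption that the capture number is the seven-digit group directly before the file extension is derived only from the sample URL.

```python
import math
import re
from fractions import Fraction


def frame_index(seconds: float, numerator: int, denominator: int) -> int:
    """Index of the frame capture that covers the given playback position."""
    interval = Fraction(denominator, numerator)  # seconds per frame capture
    return math.floor(Fraction(seconds) / interval)


def frame_url(last_frame_url: str, index: int) -> str:
    """Swap the capture number in frame_capture:url for another index."""
    # Assumption: the number is the 7-digit group right before the extension.
    return re.sub(
        r"\.(\d{7})(\.\w+)$",
        lambda match: f".{index:07d}{match.group(2)}",
        last_frame_url,
    )


url = frame_url(
    "https://example.com/thumbnails/thumb.0000031.jpg",
    frame_index(12, numerator=1, denominator=5),
)
```

With numerator=1 and denominator=5, a playback position of 12 seconds falls into the 00:10.000 to 00:15.000 window, so the third capture (index 2) is selected.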


Article ۰ Keyword

Extracted keywords from the article body like persons, locations, organizations etc.

| Field name | Type | Description |
| --- | --- | --- |
| value | string | Unique value of this keyword. |
| type | string | Type/category of this keyword like location, organization, person |
| score | float | Score for the relevance of this keyword set by the engine |

message Keyword {
string value = 1;
string type = 2;
float score = 3;
}


Article ۰ Metadata

Article metadata like publication state and technical timestamps.

| Field name | Type | Description |
| --- | --- | --- |
| state | State | State of the article in the content management system. See enum State |
| start_time | Timestamp | Manually set editorial timestamp (valid from, "Gültig von") from which the article is valid to deliver on digital platforms, in seconds of UTC time since Unix epoch. |
| end_time | Timestamp | Manually set editorial timestamp (valid until, "Gültig bis") until which the article is valid to deliver on digital platforms, in seconds of UTC time since Unix epoch. |
| publish_time | Timestamp | Editorial timestamp (publication date, "Publikationsdatum") of the first publication of the article, in seconds of UTC time since Unix epoch. This date is set automatically by the content management system. |
| update_time | Timestamp | Editorial timestamp (update date, "Aktualisierungsdatum") at which the article was updated, in seconds of UTC time since Unix epoch. On first publication this timestamp matches publish_time. Afterwards it is either updated manually in the content management system or automatically if the article content changed significantly. |
| transformation_time | Timestamp | Technical timestamp at which the article was transformed in the API layer, in seconds of UTC time since Unix epoch. |
| transformation_errors | int64 | Number of errors that occurred while fetching and/or transforming optional article components (e.g. embeds or nested documents) to an article message. |
| last_modification_time | Timestamp | Technical timestamp at which the article was published, regardless of the amount and significance of the change. |
| event_source | EventSource | Source of the event that caused this item to be transformed and written into the DB. |
| seo_score | double | The article score (originates from team data's Content Engine, higher scores are better) |
| publication_id | int64 | The unique publication_id provided by the CMS; can be used to correlate the state of documents in tapir with the corresponding CMS publication event. |
| related_article_source | string | Source of this article, if embedded in another article as a related article. |
| tenant | string | The tenant this article belongs to, e.g. www or berlin |

message Metadata {
  State state = 1;
  google.protobuf.Timestamp start_time = 2;
  google.protobuf.Timestamp end_time = 3;
  google.protobuf.Timestamp publish_time = 4;
  google.protobuf.Timestamp update_time = 5;
  google.protobuf.Timestamp transformation_time = 6;
  int64 transformation_errors = 7;
  google.protobuf.Timestamp last_modification_time = 8;
  EventSource event_source = 9;
  double seo_score = 10;
  int64 publication_id = 11;
  string related_article_source = 12;
  string tenant = 13;
}

enum State

State of the item (Article, Element) in the content management system. The state in combination with start_time and end_time determines whether or not this item should be rendered; this must be respected by all consumers especially when content is duplicated or cached.

The terms deleted (articles) and archived (media lib) are synonyms. This enum combines the two into State.DELETED. An Article is in State.DELETED if it was deleted in the content management system, or if its end_time has been reached.

An Article is in State.DRAFT if it has never been published, or if the start_time lies in the future.

| Enum value | description |
| --- | --- |
| STATE_UNSPECIFIED | unspecified |
| PUBLISHED | published content which is currently within its validity dates |
| DELETED | this content is deleted or expired in the CMS |
| DRAFT | this content was never published in the CMS |

enum State {
STATE_UNSPECIFIED = 0;
PUBLISHED = 1;
DELETED = 2;
DRAFT = 3;
}
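
Consumers that cache or duplicate content need to re-evaluate state, start_time and end_time at render time. The following Python sketch shows such a check for an article's Metadata. It assumes the metadata is a plain dict in the JSON form used by the samples (timestamps as objects with a seconds string); `is_renderable` is a hypothetical helper, and treating end_time as exclusive and a missing timestamp as "no limit" are assumptions.

```python
import time
from typing import Optional


def _seconds(timestamp: Optional[dict]) -> Optional[int]:
    """Read the seconds value of a Timestamp-like dict, if present."""
    if not timestamp or "seconds" not in timestamp:
        return None
    return int(timestamp["seconds"])


def is_renderable(metadata: dict, now: Optional[float] = None) -> bool:
    """True if the article is PUBLISHED and inside its validity window."""
    now = time.time() if now is None else now
    if metadata.get("state") != "PUBLISHED":
        return False
    start = _seconds(metadata.get("start_time"))
    end = _seconds(metadata.get("end_time"))
    if start is not None and now < start:
        return False  # not yet valid
    if end is not None and now >= end:  # assumption: end_time is exclusive
        return False  # expired
    return True


metadata = {
    "state": "PUBLISHED",
    "start_time": {"seconds": "100"},
    "end_time": {"seconds": "200"},
}
```

Passing `now` explicitly keeps the check deterministic in tests; in production code the current time is used.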


enum EventSource

Further detail about the circumstances under which this article was transformed.

The EventSource will be of type:

  • PRIMARY in case this article was directly updated and published
  • SECONDARY in case this article was indirectly updated. This can be caused by updates of nested elements, such as Videos that may expire at some point. Another source of change are scheduled events, e.g. the item becoming valid or invalid at some point in time after its original publication.

| Enum value | description |
| --- | --- |
| EVENT_SOURCE_UNSPECIFIED | unspecified |
| PRIMARY | this article's transformation was caused by a direct change in the CMS |
| SECONDARY | this article's transformation was caused by a transitive update |
| CONTENT_ENGINE | this article's transformation was caused by an external system (Content Engine) |

enum EventSource {
EVENT_SOURCE_UNSPECIFIED = 0;
PRIMARY = 1;
SECONDARY = 2;
CONTENT_ENGINE = 3;
}


Author

This represents an author (or agency). The entity may be the main content on author pages or simply indicate the author of an Article.

| Field name | Type | Description |
| --- | --- | --- |
| id | int64 | The unique identifier (CMS id) of the author. |
| type | Author.Type | The type of the author entity. |
| fields | map<string, string> | The fields of the author. This is a map of key-value pairs. The keys are the field names and the values are the field values. |
| elements | repeated Article.Element | The elements of the author, e.g. the author's profile picture. |
| work_history | repeated Author.HistoryEntry | The career entries of the author. |
| education | repeated Reference | The education entries of the author. |
| social_profiles | repeated Reference | The social profiles of the author. |
| areas_of_expertise | repeated string | List of topics in which the author has exceptional knowledge |
| references | repeated Reference | References, e.g. URLs belonging to this author. |

message Author {
  int64 id = 1;
  Type type = 2;
  map<string, string> fields = 3; // migrate from Asset[type=metadata]
  repeated Article.Element elements = 4; // profile picture

  repeated HistoryEntry work_history = 5;
  repeated Reference education = 6;
  repeated Reference social_profiles = 7;
  repeated string areas_of_expertise = 8;

  repeated Reference references = 9;
}

enum Type

| Enum value | Description |
| --- | --- |
| TYPE_UNSPECIFIED | unspecified |
| AUTHOR | The author is a person. |
| AGENCY | The author is an agency or company. |

enum Type {
TYPE_UNSPECIFIED = 0;
AUTHOR = 1;
AGENCY = 2;
}


HistoryEntry

Lists previous jobs and details about the author's career.

| Field name | Type | Description |
| --- | --- | --- |
| role | string | The role of the author for this occupation. |
| description | string | A description of the author's role. |

message HistoryEntry {
string role = 1;
string description = 2;
}


Sample Author

{
  "id": 100000001,
  "type": "AUTHOR",
  "fields": {
    "flag:hidden": "true",
    "role": "Hier steht ein Titel",
    "academic_degree": "Prof.",
    "last_name": "Doe",
    "short_name": "jdoe",
    "headline": "Autorenseite von John Doe",
    "first_name": "John",
    "ignore_vg_wort": "true",
    "url": "/author/id_100000001/john-doe.html"
  },
  "elements": [
    { "//": "Author Image Element removed for better readability" }
  ],
  "work_history": [
    {
      "role": "Dummy",
      "description": "Hält nur als pseudo Autor her, John Doe eben ;)"
    },
    {
      "role": "Chief Executive Officer of ACME",
      "description": "Very important"
    }
  ],
  "education": [
    {
      "children": [],
      "fields": {},
      "type": "",
      "label": "John Doe Acedamy",
      "href": "https://www.john.doe.acedamy.com"
    },
    {
      "children": [],
      "fields": {},
      "type": "",
      "label": "ACME university",
      "href": "https://www.acmemilano.it/"
    }
  ],
  "social_profiles": [
    {
      "children": [],
      "fields": {},
      "type": "",
      "label": "MySpace",
      "href": "https://myspace.com/johndoe"
    },
    {
      "children": [],
      "fields": {},
      "type": "",
      "label": "Instagram",
      "href": "https://www.instagram.com/johndoe.x/?hl=en"
    }
  ],
  "areas_of_expertise": [
    "Dummy",
    "ACME",
    "Example",
    "no-op",
    "Cyber",
    "PDP-11-Assembly",
    "Tetris"
  ]
}


Reference

A Reference represents a link to another entity, for example an Article, a Section or an external website, or a whole tree structure, for example a section tree or breadcrumb navigation.

| Field name | Type | Description |
| --- | --- | --- |
| type | string | The type is used for filtering in a list of references. It describes a use-case, which usually has a defined render position. See type |
| label | string | The label of the reference. |
| href | string | The href of the reference. It can be relative or absolute. |
| fields | map<string, string> | Contains all optional attributes of the reference. Clients must be resilient to unknown or missing entries. See fields |
| children | repeated Reference | Hierarchically structured references for representing a navigation or tree. |

message Reference {
string type = 1;
string label = 2;
string href = 3;
map<string, string> fields = 4;
repeated Reference children = 5;
}


type

Example entries:

  • unspecified text
  • stage_title
  • stage_themenbereiche
  • stage_header_links
  • stage_top_themen
  • stage_tag_category

fields

Contains one or more optional attributes of the reference:

  • target
  • rel
  • flag:internal
  • layout

Samples

{
  "label": "Home",
  "href": "/",
  "children": [
    {
      "label": "Spielwiese (Tests)",
      "href": "/test-playground/"
    }
  ]
}
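
Since children makes a Reference a tree, rendering a navigation or breadcrumb usually means walking it depth-first. A minimal Python sketch, assuming the reference is a plain dict shaped like the sample above (`flatten` is a hypothetical helper):

```python
from typing import List, Tuple


def flatten(reference: dict, depth: int = 0) -> List[Tuple[int, str, str]]:
    """Depth-first list of (depth, label, href) for a Reference tree."""
    rows = [(depth, reference.get("label", ""), reference.get("href", ""))]
    for child in reference.get("children", []):
        rows.extend(flatten(child, depth + 1))
    return rows


home = {
    "label": "Home",
    "href": "/",
    "children": [
        {"label": "Spielwiese (Tests)", "href": "/test-playground/"},
    ],
}
rows = flatten(home)
```

The depth value can be used for indentation in a navigation, or to pick the path from root to leaf for a breadcrumb.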


Stage

A stream stage with companions and the main content area. Embedded items can be editorial articles, advertisement and/or stages (only one level deep).

message Stage {
  Configuration configuration = 1;
  repeated Item stream_items = 2;
  repeated Item companion_items = 3;
}

⚙︎ ArticlePageService

service ArticlePageService {
  // Returns the requested article with editorial render relevant data for the user and SEO bots.
  rpc GetArticlePage (GetArticlePageRequest) returns (GetArticlePageResponse) {}
}

Description


Request message to get an article page.

message GetArticlePageRequest {
  // ID of the article defined by the content management system (required).
  int64 id = 1;
}

Response message for an article page request.

Status codes:

  • OK | article exists and is published
  • NOT_FOUND | article doesn't exist or is not published according to its Metadata

message GetArticlePageResponse {
 // Article page with all render relevant data for the user and SEO bots.
 stroeer.page.article.v1.ArticlePage article_page = 1;
}

[~]

Status/Error scenarios

scenario found

description | article was found in the datastore and is published and valid according to its metadata
gRPC status | OK (0)
gRPC error payload | none
HTTP status | 200 (OK)
cacheable | yes

scenario invalid id

description | article id is invalid
gRPC status | INVALID_ARGUMENT (3)
gRPC error payload | google.rpc.Bad
HTTP status | 400 (BAD REQUEST)
cacheable | yes

scenario not found

description | article was not found in the datastore
gRPC status | NOT_FOUND (5)
gRPC error payload | none
HTTP status | 404 (NOT FOUND)
cacheable | yes

scenario not yet valid

description | article was found in the datastore, but is not valid yet according to its metadata.start_time
gRPC status | NOT_FOUND (5)
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code
HTTP status | 404 (NOT FOUND)
cacheable | yes

scenario not published

description | article was found in the datastore, but its state is neither State.DELETED nor State.PUBLISHED
gRPC status | NOT_FOUND (5)
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code
HTTP status | 404 (NOT FOUND)
cacheable | yes

scenario expired

description | article was found in the datastore, but is expired according to metadata.end_time
gRPC status | NOT_FOUND (5)
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code
HTTP status | 410 (GONE)
cacheable | yes

scenario deleted/archived

description | article was found in the datastore, but its state is State.DELETED
gRPC status | NOT_FOUND (5)
gRPC error payload | google.rpc.ResourceInfo, check description field for recommended http status code
HTTP status | 410 (GONE)
cacheable | yes

scenario internal

description | internal error processing the article
gRPC status | INTERNAL (13)
gRPC error payload | none
HTTP status | 500 (INTERNAL SERVER ERROR)
cacheable | no

scenario timeout

description | timeout loading and processing the article
gRPC status | DEADLINE_EXCEEDED (4)
gRPC error payload | none
HTTP status | 504 (GATEWAY TIMEOUT)
cacheable | no
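
The scenarios above can be condensed into a small lookup on the client side. The following Python sketch is illustrative only: the function names are hypothetical, the 500 fallback for unlisted statuses is an assumption, and so is the idea that the ResourceInfo description carries the recommended HTTP status as a plain number such as "410" (the source only says to check the description field).

```python
from typing import Optional

# Default gRPC status -> HTTP status mapping, taken from the scenarios above.
DEFAULT_HTTP_STATUS = {
    "OK": 200,
    "INVALID_ARGUMENT": 400,
    "NOT_FOUND": 404,
    "INTERNAL": 500,
    "DEADLINE_EXCEEDED": 504,
}

# Statuses whose scenarios are marked as not cacheable above.
NON_CACHEABLE = {"INTERNAL", "DEADLINE_EXCEEDED"}


def http_status(grpc_status: str, resource_info_description: Optional[str] = None) -> int:
    """Pick an HTTP status for a gRPC status name.

    Assumption: if a google.rpc.ResourceInfo payload is present, its
    description contains the recommended HTTP status as a number, e.g. "410".
    """
    if grpc_status == "NOT_FOUND" and resource_info_description:
        for token in resource_info_description.split():
            if token.isdigit() and 400 <= int(token) <= 599:
                return int(token)
    # Assumption: statuses not listed above are treated as server errors.
    return DEFAULT_HTTP_STATUS.get(grpc_status, 500)


def is_cacheable(grpc_status: str) -> bool:
    """True for the listed statuses whose scenarios are marked cacheable."""
    return grpc_status in DEFAULT_HTTP_STATUS and grpc_status not in NON_CACHEABLE
```

With this sketch, a NOT_FOUND without payload maps to 404, while a NOT_FOUND whose description mentions 410 maps to 410 (expired or deleted content).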
[~]

[~]

⚙︎ CoreArticleService

Core service to either query a single article (rpc GetArticle()) identified by its id or to query multiple articles (rpc ListArticles()) by providing a query.

All results returned from this service are unfiltered by default, hence they may contain elements that are expired, not yet valid or whose state is not PUBLISHED. This behaviour can be changed by providing a RequestSettings object.


service ArticleService {
rpc GetArticle(GetArticleRequest) returns (stroeer.core.v1.Article) {}
rpc BatchGetArticles(BatchGetArticlesRequest) returns (BatchGetArticlesResponse) {}
rpc ListArticles(ListArticlesRequest) returns (ListArticlesResponse) {}
// Allow Empty as request param
// buf:lint:ignore RPC_REQUEST_STANDARD_NAME
rpc ListSections(google.protobuf.Empty) returns (ListSectionsResponse) {}
}

[~]

⚙︎ GetArticle

rpc GetArticle (GetArticleRequest) returns (stroeer.core.v1.Article) {}

Returns a single stroeer.core.v1.Article if the given id exists, otherwise an error. (todo: describe errors)

Field name | Type | Description
id | int64 | [required] Unique id of the article to be fetched.
message GetArticleRequest {
int64 id = 1;
RequestSettings request_settings = 2;
}

[~]

⚙︎ BatchGetArticles

Returns multiple stroeer.core.v1.Article for the given ids. The items are returned in the same order as the requested ids. If an id does not exist, it is omitted from the result (no error will be raised).

There is a maximum of 100 items that can be queried in one batch.

Field name | Type | Description
ids | repeated int64 | [required] A list of ids of the articles to be fetched
message BatchGetArticlesRequest {
repeated int64 ids = 1;
RequestSettings request_settings = 2;
}
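
Because a batch is limited to 100 ids and unknown ids are silently omitted, a caller with a longer id list has to split it and check for gaps itself. The sketch below shows one way to do that in Python. `fetch_batch` stands for whatever performs the actual BatchGetArticles call; it, the dict-shaped articles and the in-memory `store` are assumptions made for this example.

```python
from typing import Callable, Dict, Iterator, List

MAX_BATCH_SIZE = 100  # limit stated above

Article = Dict[str, int]  # simplified stand-in for stroeer.core.v1.Article


def chunks(ids: List[int], size: int = MAX_BATCH_SIZE) -> Iterator[List[int]]:
    """Split the id list into consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_get_all(ids: List[int], fetch_batch: Callable[[List[int]], List[Article]]) -> List[Article]:
    """Fetch all ids batch by batch; the request order is kept across batches."""
    articles: List[Article] = []
    for chunk in chunks(ids):
        articles.extend(fetch_batch(chunk))
    return articles


def missing_ids(requested: List[int], articles: List[Article]) -> List[int]:
    """Ids that were requested but omitted from the result."""
    found = {article["id"] for article in articles}
    return [i for i in requested if i not in found]


# Demo with an in-memory fake instead of a real gRPC call; id 7 does not exist.
store = {i: {"id": i} for i in range(1, 251) if i != 7}
batch_sizes: List[int] = []


def fake_fetch(chunk: List[int]) -> List[Article]:
    batch_sizes.append(len(chunk))
    return [store[i] for i in chunk if i in store]


requested = list(range(1, 251))
result = batch_get_all(requested, fake_fetch)
```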

[~]

⚙︎ ListArticles

Returns a ListArticlesResponse with articles matching the query. If the results exceed 100 articles or 1 MB, the response can be paginated to obtain additional results.

ListArticlesRequest

Field name | Type | Description
query | Query | [required] find items based on query values
filters | Filters | [optional] A filter expression is applied after a Query finishes, but before the results are returned.
page_size | int32 | [optional] limit the results per page, default is 10; max is 100 (or fewer if the result would exceed 1 MB). Values above 100 will be coerced to 100. If results get truncated, you can use pagination.
page_token | string | [optional] A page token, received from a previous ListArticles call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListArticles must match the call that provided the page token.
message ListArticlesRequest {
Query query = 1;
Filters filters = 2;
int32 page_size = 3;
string page_token = 4;
RequestSettings request_settings = 5;
}
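
The pagination contract described above (page_size, page_token, next_page_token) can be sketched as a simple loop. This is a minimal Python illustration, not the actual client: `list_articles` stands for the real RPC, requests and responses are plain dicts, `fake_list_articles` is an in-memory stand-in whose token format is invented, and treating non-positive page sizes as "unset" is an assumption.

```python
from typing import Callable, Dict, Iterator, List

DEFAULT_PAGE_SIZE = 10   # documented default
MAX_PAGE_SIZE = 100      # documented maximum


def effective_page_size(page_size: int) -> int:
    """Apply the documented rules: unset -> 10, values above 100 -> 100."""
    if page_size <= 0:  # assumption: non-positive values count as unset
        return DEFAULT_PAGE_SIZE
    return min(page_size, MAX_PAGE_SIZE)


def iter_articles(list_articles: Callable[[Dict], Dict], request: Dict) -> Iterator[int]:
    """Follow next_page_token until it is empty, keeping all other parameters fixed."""
    page_token = ""
    while True:
        response = list_articles({**request, "page_token": page_token})
        yield from response["articles"]
        page_token = response.get("next_page_token", "")
        if not page_token:
            return


def fake_list_articles(request: Dict) -> Dict:
    """In-memory stand-in: 25 articles, token = start offset (invented format)."""
    data: List[int] = list(range(25))
    size = effective_page_size(request.get("page_size", 0))
    start = int(request["page_token"] or 0)
    end = start + size
    return {
        "articles": data[start:end],
        "next_page_token": str(end) if end < len(data) else "",
    }


all_ids = list(iter_articles(fake_list_articles, {"page_size": 10}))
```

Clients should treat a real page token as opaque; the numeric token here only exists to keep the fake short.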

[~]

Query

Specify the search criteria. The list API is built around sections, which come in two flavors:

  1. home_section: find all articles that reside within that exact section. The home_section is equal to the settings found in the CMS, e.g. /nachrichten/wissen/
  2. root_section: this property is derived from the home_section path by retaining only the root folder, e.g. for /nachrichten/wissen/ the root_section becomes /nachrichten/

In most cases using the root_section should yield better results since it will also find content in nested sections whereas home_section would only return content which was curated into the exact section that was queried.
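
The derivation of root_section from home_section described above can be sketched in a few lines of Python. The function name is hypothetical, and mapping the homepage path `/` to itself is an assumption, since the source does not cover that case.

```python
def root_section(home_section: str) -> str:
    """Derive the root_section by keeping only the first path segment."""
    segments = [segment for segment in home_section.split("/") if segment]
    if not segments:
        return "/"  # assumption: the homepage path maps to itself
    return f"/{segments[0]}/"
```

For example, `root_section("/nachrichten/wissen/")` yields `/nachrichten/`, matching the example in the list above.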

Field name | Type | Description
path | string | [required] path, with leading and trailing slash (e.g. /nachrichten/)
type | Type | [required] query type, either Type.HOME_SECTION or Type.ROOT_SECTION
sort_by | SortBy | [required] sorting of the result set, either SortBy.UPDATE_TIME or SortBy.PUBLISH_TIME
order | Order | [optional] sorting direction for the results regarding the sort_by field, default is Order.ASCENDING
from_time | Timestamp | [optional] time constraint that refers to the sort_by field.
to_time | Timestamp | [optional] time constraint that refers to the sort_by field.
message Query {
string path = 1;
Type type = 2;
SortBy sort_by = 3;
Order order = 4;
google.protobuf.Timestamp from_time = 5;
google.protobuf.Timestamp to_time = 6;
}

[~]

Type

Enum value | Description
TYPE_UNSPECIFIED | unspecified
HOME_SECTION | query by exact home section which is configured in the CMS
ROOT_SECTION | query by root section, which is derived from the home section by retaining only the first level of the path

See the description above for why these query types exist; also see Reference for how section information is stored.

enum Type {
TYPE_UNSPECIFIED = 0;
HOME_SECTION = 1;
ROOT_SECTION = 2;
}

[~]

SortBy

Enum value | Description
SORT_BY_UNSPECIFIED | unspecified
UPDATE_TIME | sort by the content's update_time
PUBLISH_TIME | sort by the content's publish_time
enum SortBy {
SORT_BY_UNSPECIFIED = 0;
UPDATE_TIME = 1;
PUBLISH_TIME = 2;
}

[~]

Order

order of index traversal, default: ascending.

Enum value | Description
ORDER_UNSPECIFIED | unspecified
ASCENDING | ascending order index traversal
DESCENDING | descending order index traversal
enum Order {
ORDER_UNSPECIFIED = 0;
ASCENDING = 1;
DESCENDING = 2;
}

[~]

Filters

If you need to further refine the Query results, you can optionally provide a filter expression. A filter expression determines which items within the Query results should be returned to you. All of the other results are discarded.

A filter expression is applied after a Query finishes, but before the results are returned. Therefore, a Query consumes the same amount of read capacity, regardless of whether a filter expression is present.

Field name | Type | Description
type_includes | repeated Article.Type | types to include in the result set
type_excludes | repeated Article.Type | types to exclude from the result set
sub_type_includes | repeated Article.SubType | sub_types to include in the result set
sub_type_excludes | repeated Article.SubType | sub_types to exclude from the result set
message Filters {
repeated Article.Type type_includes = 1;
repeated Article.Type type_excludes = 2;
repeated Article.SubType sub_type_includes = 3;
repeated Article.SubType sub_type_excludes = 4;
}
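
To make the post-query filtering concrete, here is a small Python sketch of how include and exclude lists could be evaluated against already queried items. The exact semantics are assumptions: an empty include list is treated as "no restriction" and excludes are applied in addition to includes. The sub type values `NEWS` and `COLUMN` are hypothetical placeholders, and articles are simplified to dicts.

```python
from typing import Dict, Iterable, List, Sequence


def matches(
    article: Dict[str, str],
    type_includes: Sequence[str] = (),
    type_excludes: Sequence[str] = (),
    sub_type_includes: Sequence[str] = (),
    sub_type_excludes: Sequence[str] = (),
) -> bool:
    """Return True if the article passes all four filter lists."""
    article_type = article.get("type", "")
    sub_type = article.get("sub_type", "")
    # Assumption: an empty include list means "do not restrict".
    if type_includes and article_type not in type_includes:
        return False
    if article_type in type_excludes:
        return False
    if sub_type_includes and sub_type not in sub_type_includes:
        return False
    if sub_type in sub_type_excludes:
        return False
    return True


def apply_filters(articles: Iterable[Dict[str, str]], **filters: Sequence[str]) -> List[Dict[str, str]]:
    """Discard every queried item that does not match the filter expression."""
    return [article for article in articles if matches(article, **filters)]


# Hypothetical query result; the sub_type values are placeholders.
query_result = [
    {"id": "1", "type": "ARTICLE", "sub_type": "NEWS"},
    {"id": "2", "type": "ARTICLE", "sub_type": "COLUMN"},
    {"id": "3", "type": "GALLERY", "sub_type": ""},
]
only_articles = apply_filters(query_result, type_includes=["ARTICLE"])
no_columns = apply_filters(query_result, sub_type_excludes=["COLUMN"])
```

Since the filter runs after the query, a heavily filtered page may contain fewer items than page_size even though more pages follow.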

[~]

RequestSettings

Alters the behavior of the request in a way that filters or alters the result or parts of the result based on validity of the article or its elements.

You can also alter the view mode of the article, selecting either the full article or a limited version of it (called teaser or trail).

RequestSettings.ArticleViewMode

Enum value | Description
ARTICLE_VIEW_MODE_UNSPECIFIED | unspecified, defaults to ARTICLE_VIEW_MODE_DEFAULT
ARTICLE_VIEW_MODE_DEFAULT | full article including body and all elements.
ARTICLE_VIEW_MODE_TEASER | elements that are not required when teasering the article are removed (e.g. body).

RequestSettings.ArticleValidity

Enum value | Description
ARTICLE_VALIDITY_UNSPECIFIED | unspecified, defaults to ARTICLE_VALIDITY_IGNORE
ARTICLE_VALIDITY_VALID | Return only articles that are considered valid and allowed to be accessed publicly.
ARTICLE_VALIDITY_IGNORE | Ignore the article validity and return everything as is, even deleted or expired content.

RequestSettings.ElementValidity

Enum value | Description
ELEMENT_VALIDITY_UNSPECIFIED | unspecified, defaults to ELEMENT_VALIDITY_IGNORE
ELEMENT_VALIDITY_VALID | Remove invalid elements from their parent article, such as expired images or videos.
ELEMENT_VALIDITY_IGNORE | Ignore the element validity and return everything as is, even deleted or expired content.
message RequestSettings {
ArticleViewMode article_view_mode = 1;
ArticleValidity article_validity = 2;
ElementValidity element_validity = 3;

enum ArticleViewMode {
ARTICLE_VIEW_MODE_UNSPECIFIED = 0;
ARTICLE_VIEW_MODE_DEFAULT = 1;
ARTICLE_VIEW_MODE_TEASER = 2;
}

enum ArticleValidity {
ARTICLE_VALIDITY_UNSPECIFIED = 0;
ARTICLE_VALIDITY_VALID = 1;
ARTICLE_VALIDITY_IGNORE = 2;
}

enum ElementValidity {
ELEMENT_VALIDITY_UNSPECIFIED = 0;
ELEMENT_VALIDITY_VALID = 1;
ELEMENT_VALIDITY_IGNORE = 2;
}
}
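
The notion of validity used by ARTICLE_VALIDITY_VALID is not spelled out here. Based on the ArticlePageService scenarios (state PUBLISHED, metadata.start_time, metadata.end_time), it might look roughly like the following Python sketch. The `Metadata` dataclass, the function names and the boundary handling (start inclusive, end exclusive) are assumptions, not the service's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Metadata:
    """Simplified stand-in for the article Metadata message."""
    state: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def is_valid(metadata: Metadata, now: datetime) -> bool:
    """Published, already started and not yet expired (boundaries are assumed)."""
    if metadata.state != "PUBLISHED":
        return False
    if metadata.start_time is not None and now < metadata.start_time:
        return False
    if metadata.end_time is not None and now >= metadata.end_time:
        return False
    return True


def apply_article_validity(items: List[Metadata], validity: str, now: datetime) -> List[Metadata]:
    """Filter only for ARTICLE_VALIDITY_VALID; UNSPECIFIED and IGNORE return everything."""
    if validity == "ARTICLE_VALIDITY_VALID":
        return [item for item in items if is_valid(item, now)]
    return list(items)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
published = Metadata("PUBLISHED", start_time=now - timedelta(days=1))
expired = Metadata("PUBLISHED", end_time=now - timedelta(hours=1))
scheduled = Metadata("PUBLISHED", start_time=now + timedelta(hours=1))
deleted = Metadata("DELETED")
items = [published, expired, scheduled, deleted]
```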

[~]

ListArticlesResponse

Field name | Type | Description
articles | repeated Article | list of articles that match the query and also the filter, otherwise empty.
next_page_token | string | A token that can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.
message ListArticlesResponse {
repeated stroeer.core.v1.Article articles = 1;
string next_page_token = 2;
}

[~]

⚙︎ ListSections

list the available root sections

ListSectionsResponse

list all available root_sections that can be used in the query above.

message ListSectionsResponse {
repeated string sections = 1;
}

[~]

[~]

⚙︎ CurationService

This service allows querying curations within the CMS. In the CMS domain these are implemented as Lists, which usually contain one or more Articles.

service CurationService {
rpc GetCuration(GetCurationRequest) returns (GetCurationResponse) {}
rpc BatchGetCuration(BatchGetCurationRequest) returns (BatchGetCurationResponse) {}
}

[~]

⚙︎ GetCuration

Fetch a curation by its id and return the repeated stroeer.core.v1.Article this curation contains. The response may be empty if the curation does not contain any items.

A NOT_FOUND status code indicates that the curation id does not exist.

GetCurationRequest

Field name | Type | Description
id | int64 | [required] id of the list to be fetched
message GetCurationRequest {
int64 id = 1;
}

[~]

GetCurationResponse

Field name | Type | Description
id | int64 | the id of this list
label | string | the label of this list
update_time | Timestamp | Technical timestamp at which the curation was updated in seconds UTC time since Unix epoch.
articles | repeated Article | curated items of this list

message GetCurationResponse {
int64 id = 1;
string label = 2;
google.protobuf.Timestamp update_time = 3;
repeated stroeer.core.v1.Article articles = 4;
}

[~]

⚙︎ BatchGetCuration

Fetch multiple curations by their ids and return the repeated stroeer.core.v1.Article those curations contain. The response may be empty if a curation does not contain any items. The items are returned in the same order as the requested ids.

BatchGetCurationRequest

Field name | Type | Description
ids | repeated int64 | the ids of the lists to be fetched
message BatchGetCurationRequest {
repeated int64 ids = 1;
}

[~]

BatchGetCurationResponse

Field name | Type | Description
curations | repeated GetCurationResponse | response items that correspond to the ids this service was called with.
message BatchGetCurationResponse {
repeated GetCurationResponse curations = 1;
}

[~]

[~]

⚙︎ SectionPageService

Message to provide parameters when requesting data for a section page, currently only the path of the page. Correct paths have a leading and a trailing slash, like /nachrichten/unterhaltung/. The homepage has the path /.

message GetSectionPageRequest {
  // valid section_path, with leading and trailing slash
  string section_path = 1;

  // use to page through sections. If unspecified, it will default to `1`.
  // Paging is 1-based (1 is the first page, there is no page `0`)
  //
  // Due to underlying mechanisms and seo requirements, page-size is fixed at 30
  // The service may return fewer than this value.
  int32 page = 2;
}
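
A client can check the path shape and normalise the page number before sending the request. The Python sketch below illustrates this. The regular expression is only one possible reading of "leading and trailing slash" (the service may apply stricter or different rules), the function names are hypothetical, and mapping negative page values to 1 is an assumption.

```python
import re

# "/" alone, or one or more non-empty segments each followed by a slash.
_SECTION_PATH = re.compile(r"/(?:[^/\s]+/)*")


def is_well_formed_section_path(path: str) -> bool:
    """True if the path has a leading and a trailing slash and no empty segments."""
    return _SECTION_PATH.fullmatch(path) is not None


def effective_page(page: int) -> int:
    """Paging is 1-based; an unspecified page (0) defaults to 1."""
    return page if page >= 1 else 1  # assumption: negatives are treated like unset
```

A path that fails this check would presumably end up in the INVALID_ARGUMENT scenarios, so rejecting it locally saves a round trip.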

[~]

Response message when requesting data for a section page. Responds with NOT_FOUND if an unknown path is requested, or the path is incorrect.

message GetSectionPageResponse {
stroeer.page.section.v1.SectionPage section_page = 1;

 // Total number of pages in this `section_path`
int32 total_pages = 2;

PaginationType pagination_type = 3;

 enum PaginationType {
   // Not specified.
   PAGINATION_TYPE_UNSPECIFIED = 0;

   // The default pagination type.
   FIXED_BLOCK = 1;

   // Pagination type for Evergreen-Ressorts.
   GHOST_BLOCK = 2;
 }
}

[~]

Service to fetch all data needed to render a section page, like the homepage or "/politik/" [~]

Status/Error scenarios

scenario: found

description | all data for the section page was found
gRPC status | OK (0)
gRPC error payload | none
HTTP status | 200 (OK)
cacheable | yes

scenario: section path is empty

description | client did not provide a section path
gRPC status | INVALID_ARGUMENT (3)
gRPC error payload | google.rpc.Bad
HTTP status | 400 (BAD REQUEST)
cacheable | yes

scenario: section path is invalid

description | client provided an invalid section path
gRPC status | INVALID_ARGUMENT (3)
gRPC error payload | google.rpc.Bad
HTTP status | 400 (BAD REQUEST)
cacheable | yes

scenario: section path is unknown

description | client provided an unknown section path
gRPC status | NOT_FOUND (5)
gRPC error payload | none
HTTP status | 404 (NOT FOUND)
cacheable | yes

scenario: partial section data

description | artificial internal error processing parts of this section (no data but valid section)
gRPC status | INTERNAL (13)
gRPC error payload | none
HTTP status | 500 (INTERNAL SERVER ERROR)
cacheable | no

scenario: internal

description | internal error processing the section
gRPC status | INTERNAL (13)
gRPC error payload | none
HTTP status | 500 (INTERNAL SERVER ERROR)
cacheable | no

scenario: timeout

description | timeout loading and processing the section
gRPC status | DEADLINE_EXCEEDED (4)
gRPC error payload | none
HTTP status | 504 (GATEWAY TIMEOUT)
cacheable | no

Scenarios for incomplete section data still need to be defined. Currently, missing section data results in an internal server error, while incomplete section data might be returned. [~]

[~]

⚙︎ StageService

Description

Get single stages by requesting them via well-known ids, e.g. "schlagzeilen" or "meistgelesen".

Status/Error scenarios

scenario: found

description | service responded without encountering exceptions
gRPC status | OK
gRPC error payload | none
HTTP status | OK
cacheable | yes

scenario: internal

description | internal error while loading data
gRPC status | INTERNAL
gRPC error payload | none
HTTP status | 500
cacheable | no

scenario: timeout

description | timeout while loading data
gRPC status | DEADLINE_EXCEEDED
gRPC error payload | none
HTTP status | 504
cacheable | no

[~]

[~]