16 Jun '17

UnifiedViews Pipeline for annotating metadata and data using PoolParty Extractor and GraphSearch

Posted by adequate_admin

We announce the UnifiedViews pipeline for ingesting Portal Monitor data and metadata into PoolParty GraphSearch. In the process, both data and metadata are annotated with concepts from the Adequate PoolParty thesauri by means of PoolParty Extractor, which makes it possible to search for datasets by thesaurus concept. As a result, GraphSearch can be used as a service for several purposes: full-text search, filtering by various facets, and keyword-based search.

In more detail, the workflow of the pipeline is as follows. The pipeline fetches CSV data from the Portal Monitor and extracts its content. At the same time, it calls PoolParty Extractor, which annotates that content with concepts from the Adequate thesauri.
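To make the data step more concrete, here is a minimal Python sketch of the idea: flatten a CSV file into text, then tag that text with thesaurus concepts. It is not the pipeline's actual code. The names `csv_to_text`, `annotate` and `TOY_THESAURUS` are hypothetical, the concept URIs are made up, and the simple word matching only stands in for PoolParty Extractor, whose API is not shown here.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mini-thesaurus mapping a label to a concept URI.
# The real pipeline uses the Adequate thesauri through PoolParty Extractor;
# these entries are invented for illustration only.
TOY_THESAURUS: Dict[str, str] = {
    "budget": "http://example.org/concept/budget",
    "transport": "http://example.org/concept/transport",
}


def csv_to_text(raw_csv: str) -> str:
    """Flatten every non-empty cell of a CSV document into one text string."""
    rows = csv.reader(io.StringIO(raw_csv))
    return " ".join(cell.strip() for row in rows for cell in row if cell.strip())


def annotate(text: str, thesaurus: Dict[str, str]) -> List[str]:
    """Toy stand-in for concept extraction: match labels as whole words."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(uri for label, uri in thesaurus.items() if label in words)


sample = "city,topic,amount\nVienna,Transport,100\nGraz,Budget,200\n"
text = csv_to_text(sample)
concepts = annotate(text, TOY_THESAURUS)
```

In this sketch `concepts` holds the URIs of the two toy concepts found in the sample. A real extractor would also handle multi-word labels, synonyms and language-specific stemming, which plain word matching does not.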

The pipeline also fetches the metadata from the Portal Monitor as JSON-LD. It first transforms the metadata to RDF and then uses SPARQL to extract the relevant parts, such as title and description. PoolParty Extractor then annotates these parts with the Adequate thesauri concepts, in the same way as for the data.

In the end, the data content, the metadata, and the annotations for both are sent to GraphSearch.
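As a rough illustration of what one such combined record could look like, the following Python sketch bundles the four pieces into a single JSON document. The `SearchDocument` class, its field names and the `to_payload` helper are assumptions made for this example. The post does not describe the actual format that GraphSearch expects, so treat this only as a way to picture the result.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SearchDocument:
    """Hypothetical record combining content, metadata and both annotation sets."""
    dataset_id: str
    content: str        # text extracted from the CSV data
    title: str          # metadata field extracted from the RDF
    description: str    # metadata field extracted from the RDF
    content_concepts: List[str] = field(default_factory=list)
    metadata_concepts: List[str] = field(default_factory=list)


def to_payload(doc: SearchDocument) -> str:
    """Serialize the record as JSON, ready to hand to an indexing client."""
    return json.dumps(asdict(doc), sort_keys=True)


doc = SearchDocument(
    dataset_id="example-dataset",
    content="Vienna Transport 100",
    title="Transport spending",
    description="Yearly transport spending per city",
    content_concepts=["http://example.org/concept/transport"],
    metadata_concepts=["http://example.org/concept/transport"],
)
payload = to_payload(doc)
```

Keeping the concepts found in the content separate from those found in the metadata lets a search front end weight or facet them differently, although whether the real pipeline does so is not stated in this post.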