Faceted search with Search API Solr

The following documentation describes a use case for the Drupal modules Search API, Search API Solr and Facets in WissKI to create a faceted search and a full-text search. The use case is based on the project Objekte im Netz (OiN, http://objekte-im-netz.fau.de/) and the specific example of a faceted search using Solr on the page of the Sammlungsportal (http://objekte-im-netz.fau.de/portal/). All project-specific settings are marked with the keyword OiN. They either explain certain points in detail or describe optional settings.

Installation of Solr Server, Core and Drupal modules

Installation on the server (Do it yourself or ask your admin!)

  • Read the documentation and follow the installation instructions (e.g. https://lucene.apache.org/solr/guide/7_0/installing-solr.html). To some extent, you can also refer to tutorials on setting up Solr Search for Drupal 8, e.g.: https://www.ostraining.com/blog/drupal/apache-solr/
  • Notes on memory usage with Solr:
    • After installing the server, the Solr Dashboard can be accessed via the respective domain on port 8983 (the Solr default). The dashboard displays the memory usage of Solr on the corresponding server (physical memory, swap, JVM memory).
    • Solr tries to use as much RAM (physical memory) as is freely available.
    • JVM memory (Java Virtual Machine memory) is set to 512 MB by default. A value that is too low can slow Solr down or lead to an OutOfMemoryError. Therefore, depending on the use case, you should consider whether this value is sufficient or has to be increased (https://lucene.apache.org/solr/guide/6_6/jvm-settings.html#jvm-settings).
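
The JVM memory figures shown in the dashboard can also be read from a script. The following is a minimal sketch, assuming that Solr's system info endpoint (/solr/admin/info/system) reports raw byte values under jvm > memory > raw, as recalled from Solr 7; the base URL and the sample values are assumptions for illustration only.

```python
import json
from urllib.request import urlopen


def system_info_url(base_url):
    """Build the URL of Solr's system info endpoint with JSON output."""
    return base_url.rstrip("/") + "/admin/info/system?wt=json"


def jvm_heap_usage(info):
    """Return the used JVM heap as a fraction of the maximum heap (0.0 to 1.0)."""
    raw = info["jvm"]["memory"]["raw"]
    return raw["used"] / raw["max"]


def fetch_system_info(base_url):
    """Download and decode the system info document (needs a running Solr)."""
    with urlopen(system_info_url(base_url), timeout=10) as response:
        return json.load(response)


# Offline example with a hand-written response fragment (values are made up).
sample = {"jvm": {"memory": {"raw": {"used": 384 * 1024**2, "max": 512 * 1024**2}}}}
usage = jvm_heap_usage(sample)
print(f"JVM heap usage: {usage:.0%}")
```

Against a live server, jvm_heap_usage(fetch_system_info("http://localhost:8983/solr")) would give the same kind of figure; a value that stays close to 1.0 suggests that the heap should be increased.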


Install Drupal modules

The following modules are required for a faceted search:

  • Search API
  • Language (Drupal core module)
  • Solr search (module Search API Solr)
  • Facets: For the Range/Slider widget an additional library has to be installed. The necessary steps are described in the following issue (https://www.drupal.org/project/facets/issues/2927550) and in the readme of the module.
  • Knurg-Patch: The patch from https://www.drupal.org/project/search_api/issues/3031621 should be re-applied after every update of the Search API module.
    • Note: The developers of Search API have so far included these adjustments only in the dev version (OiN currently uses Search API version 1.13.0).

 

Create a Solr Core 

What is a core? 

In Solr, the term core or collection is used to refer to a single index and its associated configuration files. A Solr server can have multiple cores so that data with different structures can be indexed on the same server. (OiN: A core called portal is created for the Sammlungsportal.)

  • Enter the following command in the console to create a new core named portal: bin/solr create -c portal
  • Restart Solr (service solr restart) and check whether the Core portal is displayed in the dashboard under Core Admin or Core Selector.
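
Besides looking at the dashboard, the presence of the core can be checked through Solr's CoreAdmin API (action STATUS). The sketch below assumes that a core unknown to Solr is reported with an empty status entry; the base URL and the sample responses are illustrative assumptions.

```python
from urllib.parse import urlencode


def core_status_url(base_url, core):
    """Build the CoreAdmin STATUS request for a single core."""
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def core_exists(status_response, core):
    """Return True if the STATUS response contains details for the core."""
    return bool(status_response.get("status", {}).get(core))


# Hand-written sample responses (assumed layout, values are made up).
found = {"status": {"portal": {"name": "portal", "instanceDir": "/var/solr/data/portal"}}}
missing = {"status": {"portal": {}}}

print(core_status_url("http://localhost:8983/solr", "portal"))
print(core_exists(found, "portal"), core_exists(missing, "portal"))
```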

Create server in WissKI


  • Go to Configuration > Search and metadata > Search API via the admin menu.
  • Click on Add Server
    • Assign a server name (e.g. oin solr server)
    • Select a Connector (e.g. Standard). Further settings (backend, path, core name, index timeout, etc.) can then be entered under Configure Solr backend. These settings are application-dependent and must be filled out by the user.
    • Add the core name (e.g. portal)
    • Under Advanced, activate Retrieve result data from Solr and Retrieve highlighted snippets.
    • If Drupal was set up on the server as a multi-site, you should activate Retrieve results for this site only under Multi-Site compatibility.

  • Before an index is built, the configuration files for the core must be stored in the correct directory so that Drupal can properly recognize and index the core.

    • There is one directory per core on the server. Its location depends on the server and installation settings. On an Ubuntu server the directories of the Solr cores are located under /var/solr/data/corename. Each core contains the two directories conf and data.

    • The correct core config files can be downloaded as config.zip under Configuration > Search and metadata > Search API > Server. All files except elevate.xml must be placed in the directory conf. The writeable elevate.xml must be in the directory data.

    • Afterwards Solr must be restarted. 

    • If the core in the Solr Dashboard still throws an error message (as for example in the issue https://www.drupal.org/project/search_api_solr/issues/3015993), in the file solrcore.properties the variable solr.install.dir may have to be adapted.
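
The file placement described above can be scripted. The following is a minimal sketch, assuming that config.zip is a flat archive (no subdirectories) and that the core directory follows the conf/data layout; the function name install_core_config and the placeholder file contents are hypothetical. The demonstration runs in a temporary directory and does not touch a real core.

```python
import tempfile
import zipfile
from pathlib import Path


def install_core_config(config_zip, core_dir):
    """Unpack config.zip into <core_dir>/conf, but put elevate.xml into <core_dir>/data.

    Returns the sorted relative paths of the files that were written.
    """
    written = []
    with zipfile.ZipFile(config_zip) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            name = Path(member.filename).name
            target_dir = core_dir / ("data" if name == "elevate.xml" else "conf")
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / name
            target.write_bytes(archive.read(member))
            written.append(target.relative_to(core_dir).as_posix())
    return sorted(written)


# Demonstration with a stand-in archive; file contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive_path = tmp_path / "config.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        for file_name in ("schema.xml", "solrconfig.xml", "elevate.xml"):
            archive.writestr(file_name, "<placeholder/>")
    written = install_core_config(archive_path, tmp_path / "portal")
    print(written)
```

On a real server the second argument would be the core directory (e.g. /var/solr/data/portal), the files must be readable by the Solr user, and Solr still has to be restarted afterwards.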

Add index and facets

After Drupal has recognized the core correctly, go to Configuration > Search and metadata > Search API and click on Add Index.

Create Index: Add Index

How does Solr access the data in WissKI?   

Each index is stored in its own Solr core. The WissKI entity is used as the data source for the index. Since the Solr Search module is a Drupal module, the index is generated via the storage structures of Drupal bundles and fields. Each group or path in the pathbuilder has a bundle-ID or field-ID, respectively. The mapping from path to field or group to bundle can be checked under Configure Field at the respective path or group.

OiN: In OiN only the bundle Sammlungsobjekt is indexed. This means that all paths from the group Sammlungsobjekt and all bundles associated with it (entity reference paths or groups within the group Sammlungsobjekt) can be selected and indexed. The index runs across all fields of Sammlungsobjekte that were selected or added under Configuration > Search and metadata > Search API > [Index name] > Fields. This means that the index stores the number and state of the fields as they were at the time of indexing.

Steps to Create an Index for Sammlungsportal:

  • Specify an index name (e.g. Index Portal)
  • Select as Datasource WissKI Entity
  • Under CONFIGURE THE WISSKI ENTITY DATASOURCE select all bundles, which should be indexed and click on the radio button Only those selected 
  • Select an index order under CONFIGURE THE DEFAULT TRACKER
  • Select the server (e.g. oin solr server)
  • Under INDEX OPTIONS the cron batch size for the indexing can be influenced. → OiN: we reduced the cron batch size to 1, because of performance issues. 
  • The tracking size cannot be changed in the Interface of the module. For more information about the tracking size: Run grep -R tracking_page_size on the server. 
  • After creating an index for the first time, the tracking process is started automatically. 

 

 

Add Fields and Aggregated Fields

Fields are individual fields or paths. Aggregated fields are fields that can combine several paths (e.g. a facet that groups all locations with the operator union: place of manufacture, place of discovery, former location).

OiN: The following example is from the pathbuilder gemeinsame_maske of the project Objekte im Netz. The path Herstellungsort from the group Sammlungsobjekt should be included in the index.  

The group Sammlungsobjekt (Main-Bundle) contains the group Herstellung (Sub-Bundle). This subbundle contains the path Herstellungsort:


To index the data of the path Herstellungsort and later use it as a facet, this path must be added via its field-ID. There are two ways to do this:

  • In the index under Fields, use Add Field to add the facets: 
    • Under Configuration > Search and metadata > Search API > Index Portal > Fields, individual fields can be added via Add Fields. All fields and bundles of the selected bundles (CONFIGURE THE WISSKI ENTITY DATASOURCE) are displayed. If several bundles were selected, the fields and bundles are sorted alphabetically.
    • If the field Herstellungsort is to be added, the structure in the pathbuilder must be taken into account. Similar to the concatenation of concepts via properties in the pathbuilder, a field may have to be added as a concatenation of bundle-ID and field-ID, which are linked via :entity:. Here is an example:
      • Path Herstellungsort: samm:S1_Collection_Object -> ecrm:P108i_was_produced_by -> ecrm:E12_Production -> ecrm:P7_took_place_at -> samm:S40_Geographical_Place -> ecrm:P87_is_identified_by -> ecrm:E48_Place_Name
      • The group Herstellung has the bundle-ID b1f3693d52df44af738511cb918baaba. The path Herstellungsort lies within the group Herstellung and has the field-ID f9d6d3a152d0f92c753a8c4c733a8d56.
    • Under Add Fields to index portal you can find the bundle Herstellung (b1f3693d52df44af738511cb918baaba)  
    • Click on + before Herstellung and on the + before WissKI entity to refer from the entity Herstellung to the next WissKI entity.
    • Then search for the field Herstellungsort (f9d6d3a152d0f92c753a8c4c733a8d56) and click the Add button at the end of the line.
    • Click on Done and check if the field is listed in the index. The field Herstellungsort is uniquely identified by the bundle field ID concatenation: b1f3693d52df44af738511cb918baaba:entity:f9d6d3a152d0f92c753a8c4c733a8d56 
    • Notes on aggregated fields (e.g. a facet containing all personal data records: inventor, artist, previous owner, author, collector):
      • Currently, the Drupal Solr Search module can only add fields to an aggregated field that are directly connected to the bundle. Fields that are connected to the bundle via an entity reference cannot be selected.
      • For the above example: Herstellungsort cannot be added to an aggregated field via the interface, since it lies within the group Herstellung and is linked to the main bundle Sammlungsobjekt only via the sub-bundle Herstellung. Strictly speaking, a group is an entity reference.
      • To add entity reference fields to an aggregated field, you can either model out the path or use the alternative option below.
      • This procedure takes quite a long time, especially when the paths have to be modeled out to create aggregated fields.
  • Alternative: Insert the fields directly into the configuration file of the Search Index
    • As described above, add any field and any aggregated field to the index. They do not have to make sense and can be deleted later. Only the structure in which these fields are stored in the configuration file is needed for the further procedure.
    • The individual configuration files can be viewed under the menu item Configuration > Development > Synchronize > Export > Single Item.
    • If you are unsure or doing this for the first time, download a dump of the complete configuration (config.zip) under Full archive as a precaution.
    • The configuration file of the index can be called up via Single Item.
    • Select the contents of the configuration file with Ctrl+a and copy them with Ctrl+c.
    • Then go to Import > Single Item, select Search Index as Configuration type and paste the copied contents with Ctrl+v.
    • Search the file for the added aggregated field.
    • Copy this structure and paste it into the file under field_settings. (Beware: this file is a YAML file, so the indentation must be preserved.)
    • Change the first line (= unique ID, e.g. aggregated_field_2) and the label (= any name, e.g. Orte).
    • If the data type (e.g. type: string) or the aggregation type (e.g. union) changes, insert it at the appropriate place.
    • All fields that belong to the aggregated field are listed under fields. The bundle-field-ID concatenations of the fields must now be added there.
    • Open the pathbuilder in a new tab, in which the field occurs and scroll to the end of the pathbuilder. There click Show Solr Paths. Then the bundle-field-ID-concatenation appears in the solr column.
    • Copy this concatenation for the field and paste the concatenation under fields into the configuration file.
    • Finally, the file must be imported.
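
The two building blocks of this procedure, the bundle-field-ID concatenation and the aggregated field entry under field_settings, can be sketched as follows. The key names in the generated fragment (label, property_path, type, configuration, fields) are an assumption based on typical Search API index exports and should be compared with the structure copied from your own export; the function names are hypothetical. The entries under fields should be pasted exactly as they appear in the solr column of the pathbuilder; the bare concatenation is used here only as a stand-in.

```python
def solr_path(bundle_id, field_id):
    """Link a bundle-ID and a field-ID via ':entity:'."""
    return f"{bundle_id}:entity:{field_id}"


def aggregated_field_yaml(field_key, label, paths, data_type="string", aggregation="union"):
    """Render one aggregated field entry for field_settings as YAML text.

    The key names are assumed from Search API exports, not taken from the source.
    """
    lines = [
        f"  {field_key}:",
        f"    label: '{label}'",
        "    property_path: aggregated_field",
        f"    type: {data_type}",
        "    configuration:",
        f"      type: {aggregation}",
        "      fields:",
    ]
    lines += [f"        - '{path}'" for path in paths]
    return "\n".join(lines)


# IDs of the group Herstellung and the path Herstellungsort from the example above.
herstellungsort = solr_path(
    "b1f3693d52df44af738511cb918baaba",
    "f9d6d3a152d0f92c753a8c4c733a8d56",
)
fragment = aggregated_field_yaml("aggregated_field_2", "Orte", [herstellungsort])
print(fragment)
```

Generating the fragment instead of typing it by hand mainly helps to keep the YAML indentation consistent when many paths have to be listed.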

 

Add Rendered HTML output for Fulltext search

OiN use case: Rendered HTML output corresponds to a complete view of the WissKI Entity and is the source for the full text search. In the Sammlungsportal only this field is currently used for a full text search in combination with a Search View, View Mode and Facets. The following instructions are optional, if no full text search is required:

  • Add Rendered HTML output via Add Field
  • Via Edit you can specify a user role as well as a view mode for the selected bundles (e.g. Sammlungsobjekt). 
  • For the Rendered HTML output Field a new View Mode Objekte Vorschau was created under Structure > Create View Mode.
  • Under Structure > WissKI Bundles and Object Classes select the bundle to be indexed, e.g. Sammlungsobjekt. There go to Edit Bundle > Manage display and clone the layout of the bundle for the newly created View Mode Objekte Vorschau. Then leave only the fields in the display that should be searchable. 
  • In the Rendered HTML output Field change the View Mode to Objekte Vorschau.
  • Note: Pay attention to Drupal permissions! The user role selected in the Rendered HTML output field needs matching permissions for the view. For example, if Anonymous user is chosen and the permissions of the view are set so that this role has no access, these bundles are not added to the indexed data.
  • Under Configuration > Search and metadata > Search API > Index Portal > Processors, configure highlighting and the ignoring of characters (Enabled: Highlight).


Start index

After all fields and settings have been configured, indexing can be started. Under Configuration > Search and metadata > Search API > Index Portal > View click Index now. If no cron job has been set up, indexing only runs as long as the browser tab in which it was triggered stays open; the process is aborted automatically when the tab is closed.
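
To see whether indexing has actually reached Solr, the number of documents in the core can be queried directly. This is a minimal sketch using Solr's standard select handler with q=*:* and rows=0; the base URL and the sample response are assumptions, and on a core shared by several sites the count covers all of them.

```python
from urllib.parse import urlencode


def count_query_url(base_url, core):
    """Build a select request that matches all documents but returns none of them."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{query}"


def num_found(select_response):
    """Read the total number of matching documents from a select response."""
    return select_response["response"]["numFound"]


# Hand-written sample response (the count is made up).
sample_response = {"response": {"numFound": 1234, "start": 0, "docs": []}}

print(count_query_url("http://localhost:8983/solr", "portal"))
print(num_found(sample_response))
```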


Setting Up a Search View with Full-Text Search

Go to Structure > Views > Add View and create a view (e.g. search). The Drupal documentation describes how to create a search view (https://www.drupal.org/docs/8/modules/search-api/getting-started/search-forms-and-results-pages/searching-with-views-0). When creating the view, the index must be selected under View settings.

OiN: For a full text search like on http://objekte-im-netz.fau.de/portal/suche the following steps are required:

  • Under Fields select Rendered HTML output and activate Use highlighted field data.
  • Select Search: Full Text search as filter. 
  • Change the filter to Expose this filter to visitors, to allow them to change it. 
  • Click on Required if user input is required before the indexed fields are displayed. If Required is not active, note that highlighting only takes effect after a search term has been entered, and the layout of the view mode is bound to the highlighting. For this reason, the default values "Object type inventory number title" are entered in OiN and OR is selected as the default operator.
  • Sort Criteria can influence, for example, whether all entries with images should be displayed first (Image desc).
  • Important: Under Advanced > Query options, Skip item access checks must be active, otherwise displaying the search results will be slowed down unnecessarily.
  • Add the filters and fields you need.

Set up facets


Facets can be generated from all indexed fields under Configuration > Search and metadata > Facets. These facets are attached to the search view and are displayed above it. To be able to create a facet, the index must first be configured and a view created. For each facet created, a facets block is automatically generated, which can then be added under Structure > Block layout > Place Block.


  • The Search View must be selected as Facet source.
  • All fields selected in the index are displayed under Field. Select the field to display the facet and change the name of the facet if necessary.
  • After saving, Facet settings will open for further configuration of the facet:
    • Select a widget to define how the facet should be displayed.
    • For example for List of links the following settings can be made: 
      • Activate Show the amount of results to display the number of hits after each entry.
      • Since there are fields with many entries, you can set a soft limit for displaying the entries. If there are more entries than this limit, the remaining entries are not displayed until the user clicks on Show more.
      • Activate Show title of facet if you want the title of the facet to be displayed instead of the block title.
      • Under Operator you can specify whether the facet should be searched as OR or AND.
      • Under Facet sorting you define how the displayed entries of a facet should be sorted. For a purely alphabetical sorting it is sufficient to activate Sort by display value.
  • Further settings are not discussed here, since their function is described quite understandably in the interface. Test it out!
  • Finally, a block must be added for each facet under Structure > Block layout > Place Block:
  • Block configuration: If the facets are only to be displayed on the Search View, the path to the view must be added under Pages, e.g. "/search", and Show for the listed pages must be clicked.
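
What the settings Sort by display value and soft limit do to a list of facet entries can be illustrated with a small sketch. It assumes Solr's default JSON facet format, a flat list of alternating values and counts; it is not the implementation of the Facets module, and the place names and counts are made up.

```python
def pair_facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def visible_entries(pairs, soft_limit):
    """Sort by display value and split into entries shown at once and the hidden rest."""
    ordered = sorted(pairs, key=lambda pair: pair[0].casefold())
    return ordered[:soft_limit], ordered[soft_limit:]


# Made-up facet counts for a place facet.
flat_counts = ["Nürnberg", 12, "Erlangen", 30, "Bamberg", 4, "Fürth", 7]
shown, hidden = visible_entries(pair_facet_counts(flat_counts), soft_limit=3)

for value, count in shown:
    print(f"{value} ({count})")
print(f"Show more: {len(hidden)} further entries")
```

All entries are still delivered with the page; the soft limit only controls how many are visible before the user expands the list.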

 

Optional: Module Glossary Search API

OiN use case: Since collection objects from over 20 collections are to be displayed in the collection portal at some point, some facets will have an enormous number of entries. These facets must be reloaded each time the view is called. Depending on the browser and Internet connection, loading these facets can take a very long time to display all entries. Since the project has a collection with more than 20000 collection objects with corporate or personal data, the performance of this collection has been tested and found not to be user-friendly due to the long loading time. 

The module Glossary Search API was tested as a solution to this problem. The module allows the display of facet entries depending on their initial letter. In addition to the facet with many entries (e.g. corporations), a glossary facet (e.g. glossary of corporations) is created. In the facet settings, the facet with many entries is then made dependent on its glossary facet, so that only the entries for the selected initial letter are loaded:

  • Install the module via composer require and activate it under Extend: Glossary Search API
  • Under Configuration > Search and metadata > Search API > Index > Fields, add the field for the facet with many entries (as described above) if this has not already been done.
  • Under Configuration > Search and metadata > Search API > Index > Processors, enable the Glossary processor and select the field whose entries are to be grouped.
  • Then (re)index.
  • Under Configuration > Search and metadata > Facets create a new facet via Add facet (glossary of corporations).
  • Then configure the glossary facet (Glossary of corporations):
    • Select Glossary Widget as widget.
    • The order of the processors must be changed in the settings under Advanced Settings > Build Stage: URL Handler by Glossary Processor
    • Add the glossary facet in the block layout.
  • Finally, configure the facet corporations:
    • Activate Dependent Facet with the Enable Condition equal to the glossary facet.
    • As Condition mode select Check whether the facet is selected/not empty.
    • Deactivate Show facet title, and deactivate Display Block title under Block layout. When an initial letter is selected in the glossary facet, the entries of corporations are then displayed only for that letter, which minimizes the loading time.
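
The idea behind the glossary facet, grouping entries by their initial letter so that only one group has to be displayed at a time, can be illustrated as follows. This is a conceptual sketch and not the implementation of the Glossary Search API module; the function name, the "#" group for entries that do not start with a letter, and the sample names are assumptions.

```python
from collections import defaultdict


def group_by_initial(entries):
    """Group entries by their upper-cased first letter; other entries go to '#'."""
    groups = defaultdict(list)
    for entry in entries:
        first = entry.strip()[:1].upper()
        key = first if first.isalpha() else "#"
        groups[key].append(entry)
    return {key: sorted(values) for key, values in sorted(groups.items())}


# Made-up corporation names.
corporations = ["Akademie A", "Bibliothek B", "Archiv C", "1. Verein"]
glossary = group_by_initial(corporations)

print(list(glossary))   # the letters offered by the glossary facet
print(glossary["A"])    # the entries shown after selecting the letter A
```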