
Version 5 (modified by pferreir, 3 years ago)

Search Push Model

Objectives

  • Leaving OAI to Invenio;
    • No one on the admin mailing list seems to be using it for anything other than interfacing with Invenio;
  • Using a scheduler-based "push model" that uses Invenio's WebUpload (or other equivalent technology) to send web results to the search engine;
    • A container of things to index is kept.
  • Using the existing XML generating mechanisms (or something new) in order to send the information to the search engine;
  • Allowing other services to be plugged in, by making the module extensible enough for other application handlers to register themselves as "receivers" of this data;

Requirements

  • An extensible mechanism that allows the creation of "agents" that are notified periodically about changes occurring in Indico records (conferences, contributions and sub-contributions), and that process the information as they see fit;
    • The information doesn't need to be "live", meaning that the notification cycle can be on the order of minutes (so as not to affect the response time significantly);
  • A particular "agent" that, using Invenio's WebUpload, periodically uploads records that need to be updated;
    • Invenio includes the code for a small "client library" that can be used for this task;
  • An indico.modules.scheduler-based job that controls the synchronization between "agents" and remote services;
  • Means of logging and error control, as well as mechanisms that guarantee that no data is lost;
  • A mechanism for the manual export of data (in case of failure and in order to index data that already exists in Indico);
  • Detailed documentation, explaining how to develop such an "agent", and the different phases of the update process;
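
The "container of things to index" and the "no data is lost" requirement can be illustrated with a small sketch. This is not Indico code: the `ChangeQueue` class, its method names and the record identifiers are all hypothetical, and a real implementation would need persistent storage rather than an in-memory dictionary. The idea shown is that a change is only dropped from the container once an export of that exact change has been acknowledged.

```python
import itertools


class ChangeQueue:
    """Hypothetical in-memory container of pending record changes."""

    def __init__(self):
        # record_id -> (sequence number, action); one entry per record
        self._pending = {}
        self._counter = itertools.count(1)

    def record_change(self, record_id, action):
        # A newer change to the same record replaces the older one, so each
        # record is exported at most once per cycle.
        self._pending[record_id] = (next(self._counter), action)

    def snapshot(self):
        # Changes to export in this cycle, oldest first.
        return sorted(
            (seq, record_id, action)
            for record_id, (seq, action) in self._pending.items()
        )

    def acknowledge(self, batch):
        # Drop only the entries that were exported and have not changed
        # since the snapshot was taken; everything else stays queued.
        for seq, record_id, _action in batch:
            current = self._pending.get(record_id)
            if current is not None and current[0] == seq:
                del self._pending[record_id]
```

If an export fails, `acknowledge` is simply not called and the same changes are picked up again in the next cycle. A record modified while an export is in progress keeps its newer entry, because its sequence number no longer matches the one in the exported batch.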

Architecture

Systems

  • Indico
  • Remote System (e.g. Invenio) - a system that consumes information provided by Indico, and to which metadata is periodically sent ("push model");

Design

Agents

An *agent* represents a remote service that consumes data. It defines a set of operations that are performed on each update cycle. This normally consists of processing the information and then sending it to the remote service in question.
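
As a rough illustration of the description above, an agent could be modelled as a class with a "process" step and a "send" step. The names `Agent`, `process`, `send`, `run_cycle` and `CollectingAgent` are assumptions made for this sketch and do not come from Indico or Invenio; the toy agent stores its payload in memory instead of contacting a remote service.

```python
from abc import ABC, abstractmethod


class Agent(ABC):
    """Hypothetical base class for a consumer of Indico record changes."""

    @abstractmethod
    def process(self, changes):
        """Turn a list of (record_id, action) pairs into a payload."""

    @abstractmethod
    def send(self, payload):
        """Deliver the payload to the remote service; return True on success."""

    def run_cycle(self, changes):
        # One update cycle: process the changes first, then send the result.
        return self.send(self.process(changes))


class CollectingAgent(Agent):
    """Toy agent that keeps payloads in memory (stand-in for a remote service)."""

    def __init__(self):
        self.sent = []

    def process(self, changes):
        return ["%s:%s" % (action, record_id) for record_id, action in changes]

    def send(self, payload):
        self.sent.extend(payload)
        return True
```

An agent targeting a search engine would follow the same shape, with `process` producing whatever format the remote service expects and `send` performing the upload.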

Agents vs. Agent Manager

Agents register themselves with an "Agent Manager" that is responsible for invoking them whenever an update needs to be performed.
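
The registration relationship might look like the following sketch. Everything here is hypothetical (`AgentManager`, `register`, `run_update`, the logger name), and agents are reduced to plain callables so that the example stays self-contained. It assumes that a failure in one agent should be logged and should not prevent the other agents from being invoked.

```python
import logging

logger = logging.getLogger("search_push_sketch")  # hypothetical logger name


class AgentManager:
    """Hypothetical registry that invokes every registered agent per update."""

    def __init__(self):
        self._agents = {}

    def register(self, name, agent):
        # 'agent' is any callable that accepts the list of changes.
        if name in self._agents:
            raise ValueError("agent %r is already registered" % name)
        self._agents[name] = agent

    def run_update(self, changes):
        # Invoke each agent in registration order and report per-agent
        # success, so that a caller can decide what to retry.
        results = {}
        for name, agent in self._agents.items():
            try:
                agent(changes)
                results[name] = True
            except Exception:
                logger.exception("agent %s failed during update", name)
                results[name] = False
        return results
```

A scheduler job would then only need to call `run_update` once per cycle and inspect the returned dictionary to see which agents need another attempt.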

Process

Interaction between layers

"Agent task" (job)

Components

Testing

Future development

  • Build it as a plugin (or at least a module that can be deactivated)? Maybe other people don't want to use search engines - why should we store this information then?
