GLS³

Tuesday, 11 Jul 2006
english • sw • it • soho • linux • projects • opensource

Imagine an Integrated Desktop Search Engine that re-indexes every piece of work you write, every file you create, every bit of “content”… as you create it… and does so very discreetly. Every developer will think that's a very difficult “mission”, won't they? ;)

OK, but take a look at this:

The GNU/Linux Semantic Storage System (GLS³) is a solution designed to make it easier to manage and retrieve your data. It shifts your thinking from Where you store your data to What your data is. With GLS³, you can organize and retrieve your data based on its semantics, on What it means to you, and not on its hierarchical location. GLS³ is an open source semantic storage solution for GNU/Linux that indexes your data, extracts metadata and other relevant information from it, and lets you organize it using queries and tags. It also offers an API that lets developers integrate searching and organization capabilities into their applications, an extensible plugin-based Type System, schemas shared between applications through an API, a pseudo file system for backward compatibility, a web interface, As-You-Type searching and more.
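
To make the "What, not Where" idea a bit more concrete, here is a minimal C++ sketch of tag-based retrieval: documents carry tags, and a query returns every document that has all of the requested tags, wherever it happens to be stored. The `TagIndex` class and its `tag()` and `query()` methods are hypothetical illustrations, not part of the GLS³ API; a real semantic store would also combine full-text indexing and extracted metadata.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch, not the GLS³ API: documents are looked up by
// the tags attached to them rather than by their directory.
class TagIndex {
public:
    // Attach a tag to a document; the path is only used as an identifier.
    void tag(const std::string& doc, const std::string& t) {
        tags_[doc].insert(t);
    }

    // Return every document that carries all of the requested tags.
    // Both sets are sorted, so std::includes does a linear subset check.
    std::vector<std::string> query(const std::set<std::string>& wanted) const {
        std::vector<std::string> result;
        for (const auto& entry : tags_) {
            const std::set<std::string>& have = entry.second;
            if (std::includes(have.begin(), have.end(),
                              wanted.begin(), wanted.end())) {
                result.push_back(entry.first);
            }
        }
        return result;  // in path order, because std::map is ordered
    }

private:
    std::map<std::string, std::set<std::string>> tags_;
};
```

For example, tagging a report in your home directory and a note in `/tmp` with `work` and then querying `{"work"}` returns both files, even though they live in unrelated places in the directory tree.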

I also suggest taking a look at the published demo videos: they are very interesting and… pregnant with meaning!

For Developers and Expert Users:

GLS³ is implemented in C++, relies heavily on the Standard Template Library (STL), and uses ZThread for multithreading support. Apache's Lucene is used for Information Retrieval (IR). Lucene is a widely recognized IR library used to implement both Internet search engines and local, single-site search. After evaluating several IR libraries, the developers selected Lucene for its leading performance, its detailed documentation and its wide adoption (and, thus, active development). PostgreSQL was used as the Database Management System (DBMS) for storing metadata about Documents, Types and Stores; the Design Documentation describes why PostgreSQL was chosen. However, we are evaluating the possibility of migrating to a lighter DBMS, specifically SQLite.

The core of GLS³ is a user-level daemon process that communicates with client processes through either an API or an Internet Socket Interface. Information is sent through the Socket Interface as XML, and requests are received through it as XML as well, hence the need to parse XML documents in the GLS³ daemon. Additionally, objects passed between the GLS³ daemon and the API are serialized to XML. libxml was used for parsing XML documents.

GLS³ includes a pseudo file system that provides a backward-compatibility layer for non-GLS³-aware applications. It is implemented as a client module for FUSE. This file system, named glscubefs, lets users browse and search the stored information through the traditional file system interface.

Importers may bring in additional dependencies; for example, the PDF Importer depends on xpdf. GLS³ does not depend on any libraries specific to a desktop environment, the only exception being the Browser: the prototype browser was implemented in HTML, CSS and JavaScript, along with a small KDE container application that uses KHTML for rendering.
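
Since the daemon and its clients exchange objects as XML, a small sketch of the serialization side may help. The following C++ assumes a hypothetical `DocumentInfo` record (its fields are illustrative, not the actual GLS³ schema) and builds the XML by hand, with proper escaping, only to stay self-contained; GLS³ itself relies on libxml for parsing, and a real implementation would likely use a library for writing XML as well.

```cpp
#include <sstream>
#include <string>

// Hypothetical metadata record; field names are illustrative, not GLS³'s.
struct DocumentInfo {
    long id;
    std::string path;
    std::string mimeType;
};

// Escape the five characters that are special in XML text and attributes.
std::string xmlEscape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Serialize a record to a small XML fragment that a daemon could write
// to a socket and a client could parse back (e.g. with libxml).
std::string toXml(const DocumentInfo& d) {
    std::ostringstream os;
    os << "<document id=\"" << d.id << "\">"
       << "<path>" << xmlEscape(d.path) << "</path>"
       << "<type>" << xmlEscape(d.mimeType) << "</type>"
       << "</document>";
    return os.str();
}
```

Escaping matters here because file paths can legally contain characters such as `&` and `<`, which would otherwise produce malformed XML on the wire.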

Better than Beagle or other similar projects? Test it and answer, please! ;)

Source: OSSBlog.


Ivan De Marino
