Lingo3G Document Clustering Engine
Commercial or Open Source?

The very first version of the Lingo clustering engine we created is available free of charge as part of the Open Source Carrot2 Framework. While the commercial edition of Lingo builds on the experience we gained from the Open Source engine, we decided to completely rewrite the code in order to achieve superior clustering quality and performance.

Feature | Open Source edition | Commercial edition
Clustering time*, 100 results | 0.34 s | 0.06 s
Clustering time*, 200 results | 0.52 s | 0.10 s
Clustering time*, 400 results | 0.84 s | 0.17 s
Clustering time*, 5000 results | ---** | 1.70 s
Hierarchical clustering | no | yes
Customizable stop word list | yes | yes
Label filtering (suppressing specific words or phrases in the output cluster labels) | no | yes
Label boosting (promoting specific words or phrases in the output cluster labels) | no | yes
Synonyms (defining groups of words or phrases to be treated as synonymous) | no | yes
Document-to-cluster misassignment (ratio of documents in a cluster that are irrelevant to the cluster label) | medium | low
Number of tunable parameters | 2 | 55***
Further development | Only critical bugfixes | New features planned

*) Clustering speed was measured on 100, 200 and 400 snippets downloaded from Yahoo! for the query 'london', using the Lingo3G Tuning Browser application. Benchmark environment: Pentium M 1.3 GHz, 768 MB RAM, Windows XP. Java Virtual Machine: Sun JDK 1.4.2, JVM switches: -Xmx512m -Xms128m -XX:NewRatio=1 -server. Each time in the table is an average of 75 runs; for each algorithm, the timed runs were preceded by 25 untimed warm-up runs.

**) The Open Source edition is not scalable enough to reliably cluster very large numbers of documents.

***) The parameters can be used to tune, among other things: the preferred number of clusters and hierarchy depth, the preferred length of cluster labels, the desired number of unclustered documents, document-to-cluster assignment precision and the maximum cluster size.
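
The benchmark procedure from footnote *) (25 untimed warm-up runs and an average over 75 timed runs) can be sketched roughly as below. This is only an illustration, not the code used for the measurements above: it assumes the warm-up runs come first, as is usual on a JIT-compiling JVM, the class and method names (WarmupBenchmark, averageSeconds, countDistinctWords) are hypothetical, and the workload is a simple word-counting stand-in, since the Lingo3G and Carrot2 APIs are not shown here. It also uses modern Java features that were not available in JDK 1.4.2.

```java
import java.util.HashMap;
import java.util.Map;

public class WarmupBenchmark {

    // Runs the task warmupRuns times untimed, then returns the mean
    // wall-clock time in seconds over timedRuns timed executions.
    static double averageSeconds(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // untimed: gives the JIT a chance to compile hot code
        }
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / timedRuns;
    }

    // Hypothetical stand-in workload (NOT a clustering algorithm):
    // counts distinct words across a number of synthetic snippets.
    static int countDistinctWords(int snippets) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < snippets; i++) {
            String snippet = "london city guide hotel " + (i % 50) + " tour " + (i % 7);
            for (String word : snippet.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts.size();
    }

    public static void main(String[] args) {
        // Same input sizes and run counts as in footnote *).
        for (int size : new int[] {100, 200, 400}) {
            double avg = averageSeconds(() -> countDistinctWords(size), 25, 75);
            System.out.printf("%d snippets: %.6f s on average%n", size, avg);
        }
    }
}
```

To approximate the original setup, such a harness could be started with the same switches, for example java -Xmx512m -Xms128m -XX:NewRatio=1 -server WarmupBenchmark, with the stand-in workload replaced by a call to the clustering engine under test.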
