Category: Open Source

Locality-Sensitive Hashing on Spark with Clojure/Flambo

Locality-Sensitive Hashing on Spark with Clojure/Flambo

Record Linkage is a process of finding similar entities in a Dataset. Using this technique one can implement systems like: Plagiarism Detectors – which are able identify fraudulent scientific papers or articles, Document Similarity – finding similar articles on the internet, Fingerprint Matching, etc. The possibilities are endless. But the topic which we are focusing …

+ Read More

Nolan Scheduler

Nolan Scheduler

How often have you come across requirements that demand tasks to be performed repetitively at a defined interval? Yes, I am talking about a scheduler but a simple, yet powerful one that justifies its name- Just schedules. That is what Nolan Scheduler is all about. Kuldeep, a champion clojurist, wrote the library and it is …

+ Read More

GDF Graph Loader for TinkerPop 2.x

GDF Graph Loader for TinkerPop 2.x

Recently, we came across .gdf files that are a CSV like format for Graphs primarily used by GUESS. Although GDF file format is supported by Gephi, it was still missing from TinkerPop, one of the widely used graph computing framework. Today, we are happy to release gdfpop, an open source implementation of GDF File Reader …

+ Read More