Tag Archives: Batch Processing

Locality-Sensitive Hashing on Spark with Clojure/Flambo

Record linkage is the process of finding similar entities in a dataset. Using this technique, one can implement systems such as plagiarism detectors, which are able to identify fraudulent scientific papers or articles, and document similarity, finding similar articles on the …

Posted in Analysis, FORMCEPT, Open Source, Research
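To make the record-linkage idea in the Locality-Sensitive Hashing post concrete, here is a minimal single-machine sketch of MinHash with banding, one common LSH scheme for Jaccard similarity. It is written in plain Python rather than Clojure/Flambo and is not the post's Spark implementation. The function names and the parameters (shingle size `k`, `num_hashes`, `bands`) are hypothetical choices for illustration. Documents whose signatures agree on every row of at least one band become candidate pairs, which can then be checked with an exact Jaccard comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-shingles of a lowercased text."""
    t = text.lower()
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def minhash_signature(shingle_set, num_hashes=20):
    """One minimum per seeded hash function; the seed is simply the index."""
    if not shingle_set:
        raise ValueError("text is shorter than the shingle size")
    sig = []
    for seed in range(num_hashes):
        # 64-bit hash derived from MD5 of "seed:shingle" (deterministic across runs)
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set))
    return sig


def lsh_candidate_pairs(docs, num_hashes=20, bands=10):
    """Bucket each document by band; documents sharing a bucket are candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs
```

On a Spark cluster, the per-document signature step would roughly correspond to a `map` over an RDD of documents, and the bucketing to grouping by the `(band, rows)` key. The banding trade-off is the usual one. More bands with fewer rows each catch more near-duplicates but also produce more false candidates to verify.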

Data Analysis should be your Compass

Imagine that you are going from a well-known location, Point A, to an unknown location, Point B. Along the way, you refer to a GPS-based navigation system and decide how to proceed in a particular direction. In this scenario, …

Posted in FORMCEPT