Tag: Batch Processing

Locality-Sensitive Hashing on Spark with Clojure/Flambo

Record linkage is the process of finding similar entities in a dataset. Using this technique, one can implement systems such as plagiarism detectors, which are able to identify fraudulent scientific papers or articles; document similarity tools, which find similar articles on the internet; fingerprint matching; and so on. The possibilities are endless. But the topic which we are focusing …

Data Analysis should be your Compass

Imagine that you are going from a well-known location, Point A, to an unknown location, Point B. Along your journey, you are referring to a GPS-based navigation system and deciding how to proceed in a particular direction. In this scenario, there can be two possibilities: You might know how to reach Point C optimally …