Transforming Indian Insurance Sector with Data Analytics Edge

Insurance Industry – A Late Adopter to Analytics

India’s insurable population is projected to reach 750 mn[1] in 2020, yet insurance penetration still hovers below the world average (3.9% against a global average of 6.3% in 2013[2]). However, the woes of the industry go well beyond that. Lurking behind steep growth are burgeoning customer acquisition costs, high customer churn that leads to lower retention, and cut-throat competition among players vying for a larger share of the customer’s wallet.

Customer Acquisition

Customer acquisition refers to the process of converting prospects and inquiries into new customers by persuading them to buy the company’s products and services. Most Indian insurers employ a salesforce of agents who sell directly to customers and in return are paid fat commissions, which drive up the cost of acquiring new customers. As of 2011, operating expenditures and customer acquisition costs of Indian insurance companies accounted for 25% to 50%[3] of the total annual premiums.

Customer Retention

The challenges to customer retention are multi-faceted. Customer retention covers the activities undertaken to retain the maximum number of customers by securing customer loyalty towards the company / brand. A high rate of attrition among customers translates into too many policies being returned or lapsing well before they generate enough premium income for the insurer to become profitable. The key to customer retention is knowing your customer well enough to facilitate meaningful engagement. To put things into perspective, most Indian insurers collect a large amount of customer data during the initial stages of customer onboarding, i.e. the application process. As the policy lifecycle moves from filing a new application to actual usage and recurring premium payments, data generation practically ceases. In the absence of targeted customer loyalty programmes, and given that insurance is still not exactly a favourite of the Indian household, customer relationships often wane, leading to high churn rates.

Share of Customer’s Wallet

The fresh scramble among insurance players to secure a higher share of the customer’s wallet is in the light of the fact that increasing a company’s pie in the customer’s wallet is often a cheaper way of bolstering revenue than increasing the company’s share in the market. Share of wallet (SOW) refers to the proportion of the customer’s total spending that a business attains through its products and services. However, in the absence of actionable customer analytics and insights into their wallet spend, this remains a wishful proposition.

Even as Indian insurers struggled to make ends meet, it took them five years (from 2005 to 2010) to finally start selling policies online. The price for late adoption of data analytics has been paid both by insurers and customers – the Indian insurance sector lost a mind-boggling INR 30,401 Cr[4] (~ 9% of the industry worth in that year) to frauds and scams in 2011.

Data Analytics in Indian Insurance Sector

Data analytics in the Indian insurance sector can be perceived as a three-pronged tool – Marketing Analytics, Loyalty Analytics and Risk Analytics.

Marketing Analytics include analytics that drive efforts to maximize fresh influx of customers and attain a higher share of the customer’s wallet, such as promotion campaign analytics, segmentation and targeting analytics, price & premium optimization and marketing mix modelling. Loyalty Analytics are targeted towards optimizing customer retention rates by establishing touch-points for customer engagement on one hand and alleviating customer grievances and doubts on the other. These include analytics on customer satisfaction assessments, customer churn analytics, reduction of claims settlement periods, personalization of customer experience, claims settlement optimization, and customer life-time-value analysis.

Risk Analytics serve the insurer more than the customer, but have substantial spill-over effects on the latter. They cover risk optimization through fraud detection & management, operationalizing claims approval & risk scorecards, actionable analytics on policy renewal & revival, and predictive loss forecasting & modeling.

Customer Analytics for Future Revolution

According to a report by BCG-Google, 75% of insurance policies in India are anticipated to be influenced by digital channels by 2020[5]. This essentially translates into a vast online ecosystem to catalyse growth and profitability for insurers. Using multi-modal data analysis, they can zoom into both structured data (application and policy data, for example) and text-based data (such as reports and experiential data on social media) to design more powerful products, formulate correct pricing and fuel better acquisition followed by improved customer stickiness.

Data analytics has been identified as one of the most pressing issues for insurance companies by a PwC report on ’Top Insurance Industry Issues 2014’. Even though the industry virtually sits on a data tank, it lacks the tools and corporate will-power to gain business advantage from that data. According to global risk solutions provider LexisNexis, the present expenditure on data analytics by insurance companies in India is far too low.[6]

How far the Indian insurers are ready to walk the extra mile to enter into the analytics fold is yet to be seen – however, the fact remains that analytics will be the key to growth and profitability in the years to come. In our forthcoming blog, we will bring to you the various solutions that FORMCEPT can offer to insurers to enable higher customer acquisition, better retention and larger share of the customer’s wallet.

Posted in Analysis, FORMCEPT, Insurance | Comments Off

Locality-Sensitive Hashing on Spark with Clojure/Flambo

Record Linkage is the process of finding similar entities in a dataset. Using this technique one can implement systems like: Plagiarism Detectors – which are able to identify fraudulent scientific papers or articles, Document Similarity – finding similar articles on the internet, Fingerprint Matching, etc. The possibilities are endless. But the topic which we are focusing on in this article is De-Duplication[*], which is the process of finding (and removing if need be) duplicates from a dataset.

Why do we need this? The answer is simple: removing, or at least identifying, redundant data saves Space (memory, disk space, etc.) and/or Time (avoiding unnecessary or repeated computation on duplicate data). One can simply go about doing this by comparing each entity in the dataset with every other entity, computing a similarity score between those entities and doing some more computations on it depending on the application you are building.

But think of it this way: if there are six strings in a dataset, [“aa”, “ab”, “ac”, “xx”, “xy”, “xz”], and I want to find possible duplicates in it, I’d rather compare “aa” with only “ab” and “ac” than with the entire dataset, because clearly [“aa”, “ab”, “ac”] are sort of similar to each other and way different from [“xx”, “xy”, “xz”]. There is no merit in comparing everyone with everyone.

The computation time of this approach is O(n*(n -1)) where ‘n’ is the number of entities in the dataset. Now this approach is all good when the dataset size is very small. But when the data becomes very “Big”, the awful computation time of O(n*(n -1)) just won’t cut it. So we need to find a technique which reduces the number of candidates to be compared with a particular entity i.e. generating “similar” subsets from the main dataset and then running separate de-duplication tasks on the smaller datasets in a distributed manner which will greatly reduce the overall running time of the program.
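To make the saving concrete, here is a minimal sketch in plain Clojure (no Spark) that counts comparisons for the six strings above. Grouping by first character is only an illustrative stand-in for an LSH bucket, and `ordered-pairs` is a hypothetical helper, not part of flambo.

```clojure
(def strings ["aa" "ab" "ac" "xx" "xy" "xz"])

(defn ordered-pairs
  "All ordered pairs of distinct positions in coll: n*(n-1) of them."
  [coll]
  (let [v (vec coll)]
    (for [i (range (count v))
          j (range (count v))
          :when (not= i j)]
      [(v i) (v j)])))

;; Brute force: every entity against every other entity.
(def all-pairs (ordered-pairs strings))

;; Stand-in for an LSH bucket: group by first character (illustrative only).
(def buckets (vals (group-by first strings)))

;; Compare only within each bucket.
(def bucketed-pairs (mapcat ordered-pairs buckets))

(count all-pairs)      ; => 30
(count bucketed-pairs) ; => 12
```

Each bucket can also be processed independently, which is what makes the distributed version possible.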

So how do we achieve this? How do we intuitively create smaller “similar” datasets from the “bigger” main dataset without wasting too much time in pre-processing?

LSH to the rescue! It stands for Locality-Sensitive Hashing, and it is one of the most common and convenient algorithms for Document Similarity (in my opinion, of course). The best part about this algorithm is that when one hashes the entities (documents or just strings) using LSH, all the “similar” entities tend to have similar hashes. From then on, it is just a matter of grouping the entities by their hash values, which gives you smaller datasets, and then finding duplicates in a distributed manner. Enough of the explanation, let’s jump straight into the code. The language of choice is Clojure and we will be writing it for Apache Spark using a Clojure DSL called flambo.

We will be using two external libraries for this, flambo and a hashing library written on top of Google’s Guava, aesahaettr.

You can use this test file (contains restaurant details from two guides, fodors and zagats, and has about 112 duplicates in it) to play around with.

(require '[flambo.api :as f]
         '[flambo.conf :as conf])

(def c (-> (conf/spark-conf)
           (conf/master "local[2]")
           (conf/app-name "De-Duplication")))

(def sc (f/spark-context c))

(def rdd (f/text-file sc "/path/to/file"))

First things first, we need to generate shingles of the rows (strings) of the RDD. The reason for doing this is that two similar strings are far more likely to share hash values among their shingles than to produce the same hash for the entire string.

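(defn k-shingles
  "Returns the shingles of length n found in string s."
  [n s]
  (let [; map of character position -> character
        indexed-str-seq (into {}
                          (map-indexed (fn [idx itm] {idx itm}) (seq s)))
        shingles (->> (map
                        (fn [[idx str-seq]]
                          ; only positions with n characters left start a shingle
                          (if (<= idx (- (dec (count indexed-str-seq)) (dec n)))
                            (reduce str
                              (map #(indexed-str-seq % "") (range idx (+ idx n))))))
                        indexed-str-seq)
                      (filter #(not (nil? %))))]
    shingles))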

The function above requires two arguments, ‘n’ -> shingle size and ‘s’ -> The string.
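As a rough illustration of why shingles help, the sketch below builds shingles with `partition` from clojure.core (a simpler stand-in for k-shingles above, not the same implementation) and compares two strings by the Jaccard similarity of their shingle sets. The names `shingles` and `jaccard` are hypothetical and used only for this example.

```clojure
(require '[clojure.set :as set])

(defn shingles
  "All contiguous substrings of length n in s, in order."
  [n s]
  (map #(apply str %) (partition n 1 s)))

(defn jaccard
  "Size of the intersection over size of the union, treating a and b as sets.
   Assumes at least one of them is non-empty."
  [a b]
  (let [a (set a)
        b (set b)]
    (/ (count (set/intersection a b))
       (count (set/union a b)))))

(shingles 2 "punit")
;; => ("pu" "un" "ni" "it")

(jaccard (shingles 2 "punit") (shingles 2 "puneet"))
;; => 2/7
```

The two names hash to different values as whole strings, but they still share the shingles "pu" and "un", which is the overlap the later steps exploit.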

NOTE: If your strings are very short, e.g. if the rows of the RDD are just first names, you can simply create a list of the individual characters of your string, each as a string:

(map str (k-shingles 1 "punit"))

After this we need to hash each generated shingle of the string ‘X’ amount of times. This is again done to improve the chances of hash values matching.

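(require '[æsahættr :as hasher])

(defn gen-hash-with-seeds
  "Returns one murmur3-32 hash function per seed."
  [seeds]
  (map #(hasher/murmur3-32 %) seeds))

(defn hash-n-times
  "Hashes every shingle with n differently seeded hash functions,
   returning one list of n ints per shingle."
  [n shingles-list]
  (let [hash-fns (gen-hash-with-seeds (range n))]
    (map
      (fn [x]
        (map
          (fn [y] (hasher/hash->int (hasher/hash-string y x)))
          hash-fns))
      shingles-list)))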

Now it is time to generate the MinHash Signature of that string (or document). We do this by taking the lists of hashed values (where all of them have the same size) and finding the minimum hash value at position ‘i’ from every list thereby generating a single list of hash values which is the minhash for that string.

(defn min-hash
  "Element-wise minimum across the lists of hashed values in l."
  [l]
  (reduce (fn [x y] (map (fn [a b] (min a b)) x y)) l))
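A small worked example may help here. The sketch below applies the same element-wise minimum to three made-up lists of hash values (one list per shingle, three hash functions each); the numbers are invented for illustration and `min-hash-signature` is a hypothetical name.

```clojure
(defn min-hash-signature
  "Element-wise minimum across equally sized lists of hash values."
  [hash-lists]
  (reduce (fn [acc hs] (map min acc hs)) hash-lists))

;; One inner vector per shingle, one position per hash function.
(def hashed-shingles [[7 2 9]
                      [4 8 1]
                      [5 3 6]])

(min-hash-signature hashed-shingles)
;; => (4 2 1)
```

Position 0 keeps 4 (the minimum of 7, 4 and 5), position 1 keeps 2 and position 2 keeps 1, so the signature has one value per hash function regardless of how many shingles the string has.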

Now we partition the minhash signature into smaller ‘bands’ and then hash each of them one final time. The purpose of doing this is that candidate (or similar) strings will have at least one matching ‘hashed’ band, so we can group the strings by their ‘hashed’ bands and generate candidate lists.

(defn partition-into-bands
 [band-size min-hashed-list]
 (partition-all band-size min-hashed-list))

(defn band-hash-generator
  [banded-list]
  (let [r (range -1
            (unchecked-negate-int (inc (count banded-list))) -1)
        ; Incrementing "band-size" because we are starting from -1
        hash-fns (gen-hash-with-seeds r)
        hashed-banded-list (map
                             (fn [x y]
                               (hasher/hash->int
                                 (hasher/hash-string x
                                   (clojure.string/join "-" y))))
                             hash-fns banded-list)]
    hashed-banded-list))

; Output of "partition-into-bands" is the input
; for "band-hash-generator"
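To see why banding produces candidates, consider this minimal sketch with two made-up signatures. It skips the final hashing of each band and compares the bands directly, which is enough to show the idea; `bands` and `shared-band-positions` are hypothetical helpers.

```clojure
(defn bands
  "Splits a signature into consecutive bands of band-size values."
  [band-size signature]
  (map vec (partition-all band-size signature)))

(defn shared-band-positions
  "Indices of the bands that are identical in signatures a and b."
  [band-size a b]
  (keep-indexed (fn [i [x y]] (when (= x y) i))
                (map vector (bands band-size a) (bands band-size b))))

(def sig-a [4 2 1 9 7 3])
(def sig-b [4 2 1 8 7 5])

(bands 3 sig-a)
;; => ([4 2 1] [9 7 3])

(shared-band-positions 3 sig-a sig-b)
;; => (0)
```

The two signatures differ overall, but their first band matches, so the two strings would land in the same group for that band and become candidates for a full comparison.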

After this we will have our RDD in the following form

RDD[String, List(Int)]

After this, we write code that maps over the list of hash values and emits a Key-Value pair of [hash value, String] for each. Then we use the “combine-by-key” function of flambo to gather all the Strings (or Docs) with the same hash value. The only minor issue is that when two strings have multiple matching bands, the same candidate list is collected once per matching band, so you have to apply a distinct on the sorted candidate lists. After that, it is only a matter of comparing the strings within each candidate list. The method that we generally use to compare strings is Levenshtein Distance. You can also set a threshold parameter so that two strings are classified as duplicates only if their Levenshtein Distance is at or below it (a smaller distance means more similar strings).
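The grouping and comparison steps can be sketched in plain Clojure on an in-memory collection. This is not the flambo code: `group-by` stands in for combine-by-key, the band hashes are made up, and `levenshtein`, `candidate-lists` and `duplicate-pairs` are hypothetical names. A pair counts as a duplicate here when its distance is at or below the threshold.

```clojure
(defn levenshtein
  "Edit distance between strings a and b, computed row by row."
  [a b]
  (let [b (vec b)]
    (last
      (reduce
        (fn [prev ca]
          (reduce
            (fn [row [j cb]]
              (conj row
                    (min (inc (peek row))           ; insertion
                         (inc (nth prev (inc j)))   ; deletion
                         (+ (nth prev j)            ; substitution or match
                            (if (= ca cb) 0 1)))))
            [(inc (first prev))]
            (map-indexed vector b)))
        (vec (range (inc (count b))))
        a))))

(defn candidate-lists
  "Takes [string band-hashes] pairs and returns the distinct, sorted groups
   of strings that share at least one band hash."
  [signed]
  (->> (for [[s hashes] signed, h hashes] [h s])
       (group-by first)
       vals
       (map #(sort (distinct (map second %))))
       (filter #(> (count %) 1))
       distinct))

(defn duplicate-pairs
  "Pairs within each candidate list whose distance is at or below threshold."
  [threshold candidates]
  (for [group candidates
        [i a] (map-indexed vector group)
        b (drop (inc i) group)
        :when (<= (levenshtein a b) threshold)]
    [a b]))

;; Made-up band hashes, standing in for RDD[String, List(Int)].
(def signed [["aa" [1 2]] ["ab" [1 2]] ["xx" [9 8]] ["xy" [9 7]]])

(def candidates (candidate-lists signed))
(def dups (duplicate-pairs 1 candidates))
```

Here "aa" and "ab" share both band hashes, so the same candidate list is produced twice and the final `distinct` collapses it to one, which is the minor issue described above.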

[*] De-Duplication is a subset of a larger topic, Document Similarity, which is mentioned above.



Unlock Insights to Boost User Experience Online

FORMCEPT has achieved a pioneering position in extracting in-depth insights from piles of data. An interesting testimony to our influence in the data analytics solutions space is our recent collaboration with ESPN Cricinfo to deliver a data analytics solution on our patent-pending platform, timed to coincide with Cricket World Cup 2015. The solution, built for one of the leading providers of high value cricket analysis, news and trends, fits well with the site’s standing as a trusted authority on the game.

The visually pleasing data points are arranged in tabular and chart form for easy readability. The ability to extract insights from the site helps deliver a superior level of engagement with web visitors, irrespective of how closely they follow cricket as a game.

Player Profile Analysis

Records – Team-wise Wins

Some of the ways in which it provides an engrossing and interesting data analytics experience to different categories of website visitors are listed below-

1. A Team Manager

a. Single View – A team manager can easily correlate past performance and gain an idea of a player’s form over a period stretching back over the last 10-15 years.

b. Decision Making – It helps him make an insightful decision on whether to select, retain or drop a player based on his strengths and performance.

c. All-inclusive – The holistic data platform from FORMCEPT allows a well-thought view of how a player performs in different formats of the game. So if you are a team manager and want to see if a player is fit for IPL, you can check out his T20 record.

2. A Player

a. Visually Attractive – A player can gain invaluable performance insights in a highly interactive tabular and graphical format.

b. Performance Analysis – A batsman can conduct a meticulous self-analysis with the help of a detailed breakdown. This helps him carry out an all-inclusive, data-backed SWOT analysis and thus improve his game based on the website’s insights.

c. Up-to-date – The platform collates data in near real time. It allows a player to see the latest figures in an easy-to-read manner and do a competitive analysis.

d. Takeaways – Over a long career, it becomes difficult to track a player’s strong points, weak points, or which team or player he is susceptible to. The impactful visualizations help him glean useful data on recent performance trends and take corrective action for the future.

3. End Users (fans)

a. Engrossing – As a fan, you quickly get interesting facts about your favorite player or team. For any cricketer, you get insights on various aspects of his performance.

b. Authority – You come across as a knowledgeable authority on the game of cricket when you share visually appealing stats over social media with your friends and acquaintances.

c. Interesting Data Cuts – Fans get practically innumerable ways to look at data filtered by various parameters such as match format, player, opposition, country played, ground played, year, etc. for both batting and bowling.

4. ESPN Cricinfo site

a. User Loyalty – As a result of the large amount of time website visitors spend playing around with its interactive platform, the chances of sales conversion and business revenue are higher than for its competitors.

b. Competitive differentiator – The site gains a competitive upper hand from the platform’s enhanced repeat visit potential and amazingly captivating content.

c. Rewarding User experience – The platform provides a highly appealing and immensely immersive UI/UX experience online for website visitors, thereby giving the site a distinct appeal.

Here is a glimpse of the Insights Interface-

Sachin Tendulkar and Opposition

Sachin Tendulkar – Pace vs Spin

Sachin Tendulkar at World Cup

The ESPN Cricinfo site successfully merges ESPN’s proven cricket expertise with FORMCEPT’s superior technical acumen. The outcome is a visually stunning data analytics and insight platform that increases the value of information consumed by the website visitors.


Nolan Scheduler

How often have you come across requirements that demand tasks to be performed repetitively at a defined interval? Yes, I am talking about a scheduler but a simple, yet powerful one that justifies its name- Just schedules. That is what Nolan Scheduler is all about.

Kuldeep, a champion clojurist, wrote the library and it is now an important part of the FORMCEPT platform. It schedules all the jobs within the platform and keeps users up-to-date with the job status.

Email Scheduler

Let’s take the example of an email scheduler that is required to read emails from an email account, say GMail, periodically. This is the classic use case for a scheduler. So, here is how you can schedule your email reader job-

Step-1: Pick the function to schedule

In this case, we can create a simple function that reads all the unread emails from the specified GMail account. Here is my namespace with the function read-email-

(ns ^{:author "Anuj" :doc "FORMCEPT GMail Reader"}
  fcgmail.core
  (:require [clojure-mail.core :as mcore]
            [clojure-mail.message :as m]
            ; assumption: the log alias refers to clojure.tools.logging
            [clojure.tools.logging :as log]))

; GMail Store Connection
(def ^:private gstore (atom nil))

(defn- read-msg
  "Reads the message and returns the subject and body"
  [msg]
  {:subject (msg :subject)
   :body (-> (filter
               #(and (:content-type %)
                     (.startsWith (:content-type %) "TEXT/PLAIN"))
               (msg :body))
             first :body)})

(defn read-email
  "Reads unread emails and marks them as read"
  [uri email pwd]
  (try
    (reset! gstore (mcore/gen-store email pwd))
    (let [msgs (mcore/unread-messages @gstore :inbox)
          fcmsgs (map #(read-msg (m/read-message %)) msgs)]
      (doseq [fcmsg fcmsgs]
        (log/info (str "Retrieved: " fcmsg))
        ; Do whatever you want with the message
        ))
    (catch Exception e (log/error (str "Failed: " (.getMessage e))))
    (finally
      (do (mcore/mark-all-read @gstore :inbox)
          (mcore/close-store @gstore)))))

It uses the clojure-mail project to connect to GMail and read the messages. I will keep that explanation for the next blog, but I encourage readers to go ahead and take a look at this project as well.

Step-2: Schedule

Now comes the most interesting part. This is how you can schedule your target function, i.e. read-email for this example-

(ns fcgmail.core
  (:require [nolan.core :as n]))

; Create Scheduler
(defonce sc (n/get-mem-scheduler))

; Schedule
(n/add-schedule sc "R//PT30S" #(read-email uri email pwd))

That is it :-) – Your scheduled function will be called every 30 seconds as per the repeating intervals syntax of ISO 8601. The function add-schedule returns a schedule ID, which can be used later to expire a schedule; expiring stops all further executions and removes the schedule from the schedule store, as shown below-

; scid is the schedule ID returned by add-schedule
(n/expire sc scid)
; Check expiry status
(n/expired? sc scid)
; Should return true
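For readers unfamiliar with the schedule string, the sketch below pulls apart the three-part form used in this post ("R", an empty start, and a duration) using java.time.Duration from the JDK. It is only an illustration of the notation and says nothing about how Nolan parses schedules internally; `parse-repeating-interval` is a hypothetical name.

```clojure
(require '[clojure.string :as str])
(import 'java.time.Duration)

(defn parse-repeating-interval
  "Splits a string of the form R[n]/[start]/duration into its parts.
   Expects exactly three slash-separated parts, as in \"R//PT30S\"."
  [s]
  (let [[rep start dur] (str/split s #"/")]
    {:repetitions (when (> (count rep) 1)       ; nil means repeat without a limit
                    (Long/parseLong (subs rep 1)))
     :start       (when (seq start) start)      ; nil means no explicit start
     :period      (Duration/parse dur)}))

(def every-30s (parse-repeating-interval "R//PT30S"))

(.getSeconds (:period every-30s))
;; => 30
```

So "R//PT30S" reads as: repeat with no fixed count, no explicit start time, and a period (PT30S) of 30 seconds.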

By default, the library comes with a built-in in-memory scheduler, but you can extend the ScheduleStore protocol to the store of your choice. Please give it a try.


GDF Graph Loader for TinkerPop 2.x

Recently, we came across .gdf files, a CSV-like format for graphs primarily used by GUESS. Although the GDF file format is supported by Gephi, it was still missing from TinkerPop, one of the most widely used graph computing frameworks.

Today, we are happy to release gdfpop, an open source implementation of a GDF File Reader for TinkerPop 2.x under Apache License, Version 2.0. It allows you to directly import .gdf files into FORMCEPT’s FactorDB storage engine, which is compliant with the TinkerPop 2.x blueprint APIs.

gdfpop APIs

gdfpop provides a method GDFReader.inputGraph that takes in an existing com.tinkerpop.blueprints.Graph instance and an input stream to the GDF file. There are three optional parameters-

  1. buf: Buffer size for BatchGraph. See BatchGraph for more details.
  2. quote: You can specify the quote character that is being used for the values. Default is double quotes.
  3. eidp: Edge property to be used as an ID

The implementation handles all the missing values, datatypes, default values and quotes gracefully. Here is a sample .gdf file that can be loaded via gdfpop-

nodedef>name VARCHAR,label VARCHAR2,class INT, visible BOOLEAN default false,color VARCHAR,width FLOAT,height DOUBLE
a,'Hello "world" !',1,true,'114,116,177',10.10,20.24567
b,'Well, this is',2, ,'219,116,251',10.98,10.986123
c,'A correct 'GDF' file',,,, ,
edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN,color VARCHAR, weight LONG default 100
a, b,true,' 114,116,177',
b,c ,false,'219,116,251 ',300
c, a  , ,,
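To make the header syntax of the sample concrete, here is a minimal Clojure sketch that reads a nodedef or edgedef line into column names, types and defaults. It is not gdfpop's implementation (gdfpop is a Java library); it assumes defaults are single unquoted tokens without commas, and `parse-gdf-header` is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(defn parse-gdf-header
  "Parses a GDF definition line such as \"nodedef>name VARCHAR,age INT\"
   into its kind and a seq of column maps."
  [line]
  (let [[kind cols] (str/split line #">" 2)]
    {:kind kind
     :columns (for [col (str/split cols #",")
                    ;; tokens look like: name TYPE [default value]
                    :let [[col-name col-type _ default]
                          (str/split (str/trim col) #"\s+")]]
                {:name col-name :type col-type :default default})}))

(def header
  (parse-gdf-header
    "nodedef>name VARCHAR,label VARCHAR2,class INT, visible BOOLEAN default false,color VARCHAR,width FLOAT,height DOUBLE"))

(:kind header)
;; => "nodedef"

(nth (:columns header) 3)
;; => {:name "visible", :type "BOOLEAN", :default "false"}
```

Data rows need more care than this, since quoted values such as '114,116,177' contain commas, which is why the quote character is a parameter of the reader.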


For example, consider the following graph taken from default TinkerPop implementation-


It has 6 vertices and 6 edges, with each vertex having two properties, label and age, and each edge having a weight. The only change that we have made to convert it into a GDF file is that the property name has been renamed to label, because name is used as the node/vertex ID in GDF. See GDF File Format for all the possible properties for a vertex. The gdf file corresponding to the above graph is shown below-

nodedef>name VARCHAR,label VARCHAR,age INT,lang VARCHAR
edgedef>node1 VARCHAR,node2 VARCHAR,name VARCHAR,label VARCHAR,weight FLOAT

Although the GDF specification does not talk about an ID for edges, you can ask gdfpop to use a specific edge property as the edge ID using the eidp parameter.

Using gdfpop

Suppose an example.gdf file with the above vertices and edges is provided as input and you wish to use all the awesomeness of the TinkerPop 2.x stack on it. To do so, follow these steps-

Step-1: Build gdfpop

Currently, gdfpop is not available on Maven Central, so you will have to pick the latest release or build from source using the following command-

mvn clean compile install

Once Maven builds gdfpop, it will be available within your local Maven repository and ready to be integrated with your existing code base using the following Maven dependency-


Step-2: Load GDF files

Now, you can use the org.formcept.gdfpop.GDFReader functions to process and load the above example.gdf file as shown below-

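import java.io.File;
import java.io.FileInputStream;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
import com.tinkerpop.blueprints.util.io.graphson.GraphSONWriter;
import org.formcept.gdfpop.GDFReader;
// initialize
Graph graph = new TinkerGraph();
// load the gdf file
GDFReader.inputGraph(graph, new FileInputStream(new File("example.gdf")), "\"", "name");
// write it out as GraphSON
GraphSONWriter.outputGraph(graph, System.out);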

The above code snippet will create a TinkerGraph, load it with all the vertices and edges defined in the example.gdf file and dump the loaded graph in GraphSON format, which we can easily verify. For example, here is a JSON dump from a sample run of the above code-

{
    "mode": "NORMAL",
    "vertices": [{
        "name": "3",
        "label": "lop",
        "lang": "java",
        "_id": "3",
        "_type": "vertex"
    }, {
        "age": 27,
        "name": "2",
        "label": "vadas",
        "_id": "2",
        "_type": "vertex"
    }, {
        "age": 29,
        "name": "1",
        "label": "marko",
        "_id": "1",
        "_type": "vertex"
    }, {
        "age": 35,
        "name": "6",
        "label": "peter",
        "_id": "6",
        "_type": "vertex"
    }, {
        "name": "5",
        "label": "ripple",
        "lang": "java",
        "_id": "5",
        "_type": "vertex"
    }, {
        "age": 32,
        "name": "4",
        "label": "josh",
        "_id": "4",
        "_type": "vertex"
    }],
    "edges": [{
        "weight": 1.0,
        "node1": "4",
        "name": "10",
        "node2": "5",
        "_id": "10",
        "_type": "edge",
        "_outV": "4",
        "_inV": "5",
        "_label": "created"
    }, {
        "weight": 0.5,
        "node1": "1",
        "name": "7",
        "node2": "2",
        "_id": "7",
        "_type": "edge",
        "_outV": "1",
        "_inV": "2",
        "_label": "knows"
    }, {
        "weight": 0.4,
        "node1": "1",
        "name": "9",
        "node2": "3",
        "_id": "9",
        "_type": "edge",
        "_outV": "1",
        "_inV": "3",
        "_label": "created"
    }, {
        "weight": 1.0,
        "node1": "1",
        "name": "8",
        "node2": "4",
        "_id": "8",
        "_type": "edge",
        "_outV": "1",
        "_inV": "4",
        "_label": "knows"
    }, {
        "weight": 0.4,
        "node1": "4",
        "name": "11",
        "node2": "3",
        "_id": "11",
        "_type": "edge",
        "_outV": "4",
        "_inV": "3",
        "_label": "created"
    }, {
        "weight": 0.2,
        "node1": "6",
        "name": "12",
        "node2": "3",
        "_id": "12",
        "_type": "edge",
        "_outV": "6",
        "_inV": "3",
        "_label": "created"
    }]
}

Notice that it has the 6 vertices and 6 edges that were defined in the example.gdf file earlier.

Currently, gdfpop is compatible only with the TinkerPop 2.x implementation. Going forward, we may look into providing a plug-in for TinkerPop 3.x as well, based on the interest of the community. Feel free to give us a shout at gdfpop.



Gen-next of resumes : From standard text to visual infographics

Earlier this week, veteran HR executive Lee E. Miller noted in his column how visual resumes will dominate the next big wave in the recruitment industry. With recruiters starting to see more visual resumes, candidates are considering that path and trying to catch the attention of recruiters by turning to infographics over textual resumes.

Recruiters who have welcomed the idea of visual resumes believe that acceptance is going to increase across the recruitment industry, as the innovation and creativity involved reduce the effort at the recruiter’s end to quite an extent. Stunning visuals are often used to impress HR managers and stand out from the crowd.

 Resume Intent @FORMCEPT

“It is easier to absorb visual content as vision rate of humans is very high and over 90% of visual information that is captured gets stored in the brain.”

Images are easily captured by the human brain and are retained for longer periods of time. To leave a lasting impression on HR managers, FORMCEPT offers infographics that illustrate candidates’ profiles as a visual summary of skills, experience, achievements, education and interests. This is what a resume infographic looks like-

Visual Resume Infographics

Over and above this, FORMCEPT provides advanced analytics options for recruiters to query and explore multiple resumes and also compare them side by side. For more details, please contact us.

Data Analysis should be your Compass

Imagine that you are going from a well-known location, Point A, to an unknown location, Point B. Along your journey, you are referring to a GPS based navigation system and deciding how to proceed in a particular direction. In this scenario, there can be two possibilities:

GPS Scenario

  1. You might know how to reach Point C optimally (even though the GPS may be suggesting a longer route via Point-X) and then rely on the GPS system to reach the destination, i.e. Point-B.
  2. You might blindly follow the GPS based navigation system to take you to the destination (Point-B) through Point-X that it thinks at that time might be the best possible route for you.

While you are on your way, you might change your course in-between due to traffic jams, or road blocks. In that case, the online navigation system will re-calculate the route to pick up where you are and start guiding you.

In fact, navigation systems have become intelligent enough to find out whether there is a traffic jam at certain places and provide alternate efficient routes, all in real time. In addition to that, they are non-intrusive and they provide the driver with complete freedom to follow the navigation system or change the course- “Navigation system adapts to the change”.

The navigation system provides you with insights on the traffic data and route, and you as the decision maker take that input and act on it.

So, how is this relevant in the context of business? Consider a typical organization where the CXOs know the current state of the business (Point-A) and are eager to accomplish business goals (Point-B) faster. They have enough data collected inherently (knowledge) and are progressing towards the goal (Point-B). In the context of business, Point-B might be any of these, depending on the CXO level within the company-

  1. Increase the revenue by x%
  2. Increase product features as per the market demand
  3. Save cost by y%
  4. Increase customer base by z%
  5. Save inventory cost etc.

Company Compass

What is missing is a data driven analysis platform (GPS Navigation System) that can guide them to reach the desired destination faster and with the existing resources.

They need a platform rather than an application because no single application is a silver bullet for all the requirements. An organization needs more than one application, custom built for the business using the available data and resources, each solving a particular business problem. The data driven analysis platform should inherently support that. The platform should be agile so that it can support multiple applications and adapt to the business requirement by doing all the heavy lifting of the repetitive and common tasks related to data analysis. In other words, it should quickly re-calculate the optimal path to the destination as and when there is a deviation from the earlier suggested path.

Can the current traditional Business Intelligence systems do that? It is challenging because the traditional BI systems are designed to work on structured data and are monolithic by nature. Moreover, the rate at which the data is being generated these days is much higher and mostly unstructured. The platform that can capture, store and analyze such data should

  • Treat the unstructured data with the same rigour as the structured data
  • Provide quick insights as and when they are required (on-demand/real-time)
  • Understand context, i.e. put forth the possible strategies to reach the desired destination and, based on the choice taken by the decision maker, assist them optimally

The FORMCEPT Big Data platform is designed for just that. It enables enterprises:

  • To gain business insights faster by leveraging the available data
  • To respond faster to ever-changing Business Intelligence requirements
  • To make “Dark Data” extinct by leveraging the historical data of the organization

FORMCEPT uses proprietary Data Folding℠ techniques to discover the relations and patterns that exist across datasets and to generate fact-based unified views. For the business, this means that different business units can create their own virtual data in the form of a unified data view and write their own cognitive, data-driven applications for their business problem.

For example, an e-commerce company’s marketing department can build its own Influencer Application, which understands customers holistically based not only on transactional data but also on public data such as social media and blogs. Using this application, the department can target a product promotion campaign effectively, thereby increasing revenue and the customer base.

To learn more about FORMCEPT and how it can solve your business problem, please

Posted in FORMCEPT

Big Data Tech Conclave 2013 – Part-2

In the previous blog, we discussed how FORMCEPT addresses the “Data Infrastructure Issues” using its MECBOT platform. In this blog, we will take you through two real customer use cases and show how enterprises can leverage MECBOT to solve their business problems.

Use Case 1: Loyalty Analysis and Targeted Promotional Campaign

Data Sources:

  • Bank statements and bills (PDF documents)
  • Public data sources, like geolocation, region, country, etc.

Goal: To segment the customers based on loyalty and target a promotional campaign on a specific set of products

The basic requirements for this use case are:

  • Deploy a scalable data analysis platform for storage and analysis of documents *
  • Extract facts, like account numbers, transactions, etc., from these documents
  • Enrich the content using the location data
  • Identify transaction patterns from the data and come up with a loyalty model
  • Validate the model
  • Represent the results such that key stakeholders can explore the results and initiate a targeted promotional campaign

* One of the key factors for the underlying Data Infrastructure
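
The extract, enrich and segment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MECBOT's API or the loyalty model used for the customer: it assumes the PDF text has already been extracted, that each statement line looks like `<account> <YYYY-MM-DD> <amount>` (a hypothetical format), and it swaps in a simple recency/frequency rule for the loyalty model. All function names, thresholds and sample data are made up for the example.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical statement line format: "<account> <YYYY-MM-DD> <amount>"
LINE_RE = re.compile(
    r"^(?P<account>\d{6,})\s+(?P<day>\d{4}-\d{2}-\d{2})\s+(?P<amount>\d+(?:\.\d{2})?)$"
)


def extract_transactions(text):
    """Pull (account, date, amount) facts out of already-extracted statement text."""
    facts = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # headers and other non-transaction lines are skipped
            facts.append(
                (match["account"], date.fromisoformat(match["day"]), float(match["amount"]))
            )
    return facts


def enrich(facts, account_region):
    """Attach a region to every fact; the lookup table stands in for public location data."""
    return [(acc, day, amt, account_region.get(acc, "unknown")) for acc, day, amt in facts]


def loyalty_segments(enriched, today, min_visits=3, max_idle_days=30):
    """Label each account with a simple recency/frequency rule (illustrative thresholds)."""
    by_account = defaultdict(list)
    for account, day, amount, region in enriched:
        by_account[account].append((day, amount, region))

    segments = {}
    for account, rows in by_account.items():
        frequency = len(rows)
        idle_days = (today - max(day for day, _, _ in rows)).days
        if frequency >= min_visits and idle_days <= max_idle_days:
            label = "loyal"
        elif idle_days > max_idle_days:
            label = "lapsing"
        else:
            label = "occasional"
        segments[account] = {
            "segment": label,
            "region": rows[0][2],
            "total_spend": sum(amount for _, amount, _ in rows),
        }
    return segments


# Made-up sample data, for illustration only
STATEMENT = """\
Account   Date        Amount
10000001 2013-11-02 120.00
10000001 2013-11-15 80.50
10000001 2013-12-01 45.00
10000002 2013-08-20 300.00
10000003 2013-11-28 60.00
"""
REGIONS = {"10000001": "Bangalore", "10000002": "Mumbai"}

segments = loyalty_segments(
    enrich(extract_transactions(STATEMENT), REGIONS), today=date(2013, 12, 7)
)
for account, info in sorted(segments.items()):
    print(account, info["segment"], info["region"], info["total_spend"])
```

Validating the model, the next requirement in the list, would mean comparing such labels against known customer behaviour; that step is left out of the sketch.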


Posted in FORMCEPT, Infographics, Retail

Big Data Tech Conclave 2013 – Part-1

Leaders from around the world gathered at the “Big Data Tech Conclave 2013 Winter Edition”, held on the 6th and 7th of December 2013 in Bangalore, and made the event a success. FORMCEPT was associated with the global conclave as an endorsing partner.

The two-day event hosted back-to-back inspiring sessions around the deluge called Big Data. It was well attended, and eminent personalities from the industry shared their knowledge and experience with the audience.

In this blog, FORMCEPT would like to share the key takeaways from the event.

On the first day of the event, one theme was common across all the talks: “Data Infrastructure Issues”. It is a broad term for the issues related to data capture (structured and/or unstructured), storage, analysis, delivery and visualization.

Most of the talks forced us to think: do enterprises need to worry about the “Data Infrastructure Issues”, and all of them at that, or do they just need to worry about solving their business problem? When we buy a fridge or an AC, do we ever ask about the compressor or the electronics inside? If not, why can’t we ease the pain of Data Infrastructure issues for enterprises in a similar way?

In the current data infrastructure landscape, traditional technologies are slowly being replaced by emerging ones, and the gap between available human expertise and the technology is widening rapidly. Without a robust Data Infrastructure, this jeopardizes the adoption of data analysis, and Big Data analysis in particular, in most enterprises. On the other hand, CXOs definitely want to adopt it, as they are aware that competitors are taking advantage of emerging data analysis techniques.

FORMCEPT addresses this with MECBOT, a unified analytics platform built on top of state-of-the-art open source technologies like Hadoop, HBase, Storm and Spark. Enterprises are now taking advantage of MECBOT, which does all the heavy lifting around data and makes it available on demand as well as in real time. Enterprises can focus on their business problem rather than worrying about the “Data Infrastructure Issues”. MECBOT also allows enterprises to develop data-driven applications faster and scale them on demand using their existing skill set.

To know more about FORMCEPT and MECBOT, please

Posted in FORMCEPT

FORMCEPT featured at TechCrunch Bangalore

For the first time ever, a TechCrunch International City event arrived in India. It was held in the tech hub of Bangalore over two days (November 14 – 15, 2013).

FORMCEPT was featured among 50 startups selected from hundreds of entries for Pitch Presentations. The event showcased these startups launching their products before a live and online audience, including a panel of 50 investors and expert judges.

We are proud to be among the chosen few to demonstrate our product MECBOT on the TechCrunch platform.

TechCrunch is a leading technology media property, dedicated to obsessively profiling startups, reviewing new Internet products, and breaking tech news. TechCrunch Bangalore focused on encouraging upcoming Indian startups to have a ground-breaking impact on the global stage.

Posted in FORMCEPT