Big Data Tech Conclave 2013 – Part-2

In the previous blog, we discussed how FORMCEPT addresses the “Data Infrastructure Issues” using its MECBOT platform. In this blog, we will walk you through two real customer use cases and show how enterprises can leverage MECBOT to solve their business problems.

Use Case 1: Loyalty Analysis and Targeted Promotional Campaign

Data Sources:
  • Bank statements and bills (PDF documents)
  • Public data sources, like geolocation, region, country, etc.

Goal: To segment the customers based on loyalty and target a promotional campaign on a specific set of products

The following are the basic requirements for this use case:

  • Deploy a scalable data analysis platform for storage and analysis of documents *
  • Extract facts, such as account numbers and transactions, from these documents
  • Enrich the content using the location data
  • Identify transaction patterns from the data and come up with a loyalty model
  • Validate the model
  • Represent the results such that key stakeholders can explore the results and initiate a targeted promotional campaign

* One of the key factors for the underlying Data Infrastructure

MECBOT provides the following capabilities out of the box to address the data-infrastructure requirements above:

  • Data Source Configuration: In this case, the input is configured to be a file that can be streamed to MECBOT or uploaded from the file system
  • Data Plan: MECBOT provides a visual tool in which a data steward, admin, or analyst can specify processors, extraction rules, etc., and harmonize the data fetched from internal and external data sources. The harmonized data can then be stored for analysis.

Here is a sample data plan created for the above use case:

Fig 1: Data Plan

Once the Data Plan is finalized and scheduled for execution, MECBOT performs the following steps:

  1. Fetches the content through data sources: In this case, content is collected from the file system in batch mode, or from the stream for real-time analysis
  2. Processes it as per the data plan and harmonizes the data: In this case, the transactions in the bills are extracted along with the account numbers
  3. Enriches the content: In this case, the existing location details are augmented with geo-locations. The system also marks the currencies and transaction amounts in the data.
  4. Creates various views using Data Folding(SM) techniques: MECBOT semantically links relevant data, creates linkages within the data, and aligns the data around the business domain
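Steps 2 and 3 above (fact extraction and enrichment) can be illustrated with a small, stand-alone Python sketch. It is not MECBOT's implementation: it assumes the PDF bills have already been converted to plain text, and the line format, the regular expressions, the `CITY_COORDS` lookup table, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical statement layout (already converted from PDF to text):
#   Account No: 1234567890
#   2013-06-14  STARBUCKS BANGALORE  INR 250.00
ACCOUNT_RE = re.compile(r"Account\s+No\.?\s*:?\s*(?P<account>\d{8,16})")
TXN_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<merchant>.+?)\s+"
    r"(?P<currency>[A-Z]{3})\s+(?P<amount>\d+(?:\.\d{2})?)$"
)

# Toy stand-in for a public geolocation source: city name -> (lat, lon).
CITY_COORDS = {"BANGALORE": (12.97, 77.59), "MUMBAI": (19.08, 72.88)}


@dataclass
class Transaction:
    account: str
    date: str
    merchant: str
    currency: str
    amount: float
    geo: Optional[Tuple[float, float]]  # None if no known city was found


def extract_transactions(text: str) -> List[Transaction]:
    """Extract the account number and transaction lines, then enrich with coordinates."""
    account_match = ACCOUNT_RE.search(text)
    account = account_match.group("account") if account_match else "UNKNOWN"
    transactions = []
    for line in text.splitlines():
        m = TXN_RE.match(line.strip())
        if m is None:
            continue  # not a transaction line
        merchant = m.group("merchant")
        # Enrichment: attach coordinates of the first known city named in the merchant text.
        geo = next(
            (coords for city, coords in CITY_COORDS.items() if city in merchant.upper()),
            None,
        )
        transactions.append(
            Transaction(account, m.group("date"), merchant,
                        m.group("currency"), float(m.group("amount")), geo)
        )
    return transactions
```

For a statement containing the two sample lines in the comment, `extract_transactions` returns one `Transaction` with currency `INR`, amount `250.0`, and Bangalore's coordinates. In practice, the extraction rules would come from the data plan rather than being hard-coded.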

In short, MECBOT creates several “unified topology” views across the data. These views can be combined to generate richer views and reveal noticeable patterns. All generated views are stored back in the system for on-demand retrieval and future analysis.

By taking care of these four steps, which are primarily data pre-processing and account for 80% of the effort, MECBOT lets Data Scientists/Analysts focus only on building the loyalty model. It also provides a seamless query interface that analysts can use for exploratory data analysis and for choosing the right dataset to build the Loyalty Model.

Once the models are developed, the results of the analysis and the dataset can be presented as charts and visualizations. In this case, a bubble chart is used to represent the loyal customers, as shown in Fig. 2, and clusters of customers, as shown in Fig. 3. The model being visualized is the Recency-Frequency-Monetary (RFM) model, and the colors indicate the loyalty segments.

Fig 2: Customer Loyalty RFM Analysis
Fig 3: Cluster Analysis
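The RFM model scores each customer on how recently they transacted (Recency), how often (Frequency), and how much they spent (Monetary). Below is a minimal Python sketch, assuming transactions arrive as `(customer_id, date, amount)` tuples. The three-level rank-based scoring, the segment names, and the thresholds are illustrative choices, not the ones behind the figures above, and the function names are hypothetical.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """transactions: iterable of (customer_id, txn_date, amount).
    Returns {customer_id: (recency_days, frequency, monetary)}."""
    stats = {}
    for cust, txn_date, amount in transactions:
        last, freq, total = stats.get(cust, (txn_date, 0, 0.0))
        stats[cust] = (max(last, txn_date), freq + 1, total + amount)
    return {c: ((as_of - last).days, freq, total) for c, (last, freq, total) in stats.items()}


def rank_score(values, higher_is_better=True, bins=3):
    """Map each customer to a score 1..bins by rank (bins=3 gives low/mid/high)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cust: 1 + (i * bins) // n for i, cust in enumerate(ordered)}


def loyalty_segments(transactions, as_of):
    table = rfm_table(transactions, as_of)
    r = rank_score({c: v[0] for c, v in table.items()}, higher_is_better=False)  # fewer days is better
    f = rank_score({c: v[1] for c, v in table.items()})
    m = rank_score({c: v[2] for c, v in table.items()})
    segments = {}
    for c in table:
        total = r[c] + f[c] + m[c]  # ranges from 3 to 9
        # Illustrative thresholds, not taken from the original analysis.
        segments[c] = "Loyal" if total >= 8 else "Potential" if total >= 5 else "At risk"
    return segments
```

Rank-based scoring keeps the sketch free of external libraries; quantile cut-offs computed with a data-frame library are a common alternative on larger datasets.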

The CMO or Head of Marketing can now interact with the visualization (Fig. 2 and Fig. 3) and determine the primary location for the campaign. They can also choose multiple locations to compare loyalty side by side, as shown in Fig. 4.

Fig 4: Comparative Analysis of Loyalty across Locations
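Comparing loyalty across locations amounts to grouping segment labels by location and normalizing the counts. Here is a short sketch under assumed inputs: `segments` maps customer IDs to segment names (for example, the output of an RFM step) and `locations` maps customer IDs to a location. Both names and the `"Unknown"` fallback are hypothetical.

```python
from collections import Counter, defaultdict


def segment_mix_by_location(segments, locations):
    """Return {location: {segment: share_of_customers}} for side-by-side comparison."""
    counts = defaultdict(Counter)
    for cust, seg in segments.items():
        # Customers without a known location are grouped under "Unknown".
        counts[locations.get(cust, "Unknown")][seg] += 1
    return {
        loc: {seg: n / sum(c.values()) for seg, n in c.items()}
        for loc, c in counts.items()
    }
```

Using shares rather than raw counts keeps locations with very different customer volumes comparable.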

The above use case shows how MECBOT helps with loyalty analysis. Now, let's look at an entirely different use case: Call Data Record (CDR) analysis.

Use Case 2: Call Data Record Analysis for Influencers

Data Sources:
  • Call Data Records (CSV)
  • Public data sources, like YELP, etc.

Goal: To identify a set of Influencers who can be targeted for promotional campaigns

The following are the basic requirements for this use case:

  • Store the CDR Data
  • Analyze the CDR data to:
    • Determine the strength of connection for each individual
    • Determine which individuals are connected within a network, based on the call records
    • Determine the kinds of services used by a group and how often they are used
    • Determine usage patterns, such as the time of day when usage peaks
  • Identify Influencers and target them for a marketing campaign
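Two of the analytics requirements above, peak usage time and service usage by a group, reduce to simple counting once the CDR file is parsed. The column names (`caller`, `callee`, `service`, `start_time`, `duration_sec`), the sample rows, and the function names below are assumptions for illustration; real CDR layouts vary by operator.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical CDR layout; real operator exports differ.
SAMPLE_CDR = """caller,callee,service,start_time,duration_sec
A,B,call,2013-06-01T09:15:00,120
A,C,call,2013-06-01T09:40:00,60
B,C,data,2013-06-01T21:05:00,0
C,A,call,2013-06-01T09:55:00,30
"""


def load_cdr(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def peak_hour(records):
    """Return (hour_of_day, record_count) for the busiest hour."""
    hours = Counter(datetime.fromisoformat(r["start_time"]).hour for r in records)
    return hours.most_common(1)[0]


def service_usage(records, group):
    """Count how often each service type is used by callers in `group`."""
    return Counter(r["service"] for r in records if r["caller"] in group)
```

On the sample rows, `peak_hour` returns `(9, 3)` because three of the four records start between 09:00 and 09:59.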

MECBOT provides the following capabilities out of the box to address the data-infrastructure requirements above:

  • Data Source Configuration: In this case, the input is CDR data, which can be provided as CSV files. If customer-related information is held in a database, that database can be configured as a data source as well.
  • Data Plan: The visual tool guides you in designing the appropriate model by defining the rules that govern it

Once the data plan is finalized and executed, MECBOT fetches the dataset and stores it for further analysis. In this case, enrichment such as geotagging is applied to the data before it is stored. Since the CDR dataset has explicit relations between caller and callee, MECBOT uses its distributed graph engine to prepare a graph view of the CDR data behind the scenes. Data Scientists can then explore the connected graphs and build models on top of them to determine the Influencers in the network.
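MECBOT's distributed graph engine is not shown here, but the underlying idea can be sketched on a single machine. Each CDR row becomes an edge between caller and callee, the edge weight counts the calls between the pair (one possible measure of connection strength, assumed here), and a breadth-first search finds the groups of connected subscribers. The function names are hypothetical.

```python
from collections import defaultdict, deque


def build_call_graph(edges):
    """edges: iterable of (caller, callee) pairs, one per call record.
    Returns an undirected weighted graph {subscriber: {neighbour: call_count}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in edges:
        if caller == callee:
            continue  # ignore self-calls
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def connected_components(graph):
    """Group subscribers that are linked directly or indirectly (breadth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components
```

Treating the graph as undirected ignores who called whom; a directed variant would keep separate in- and out-weights if call direction matters for the influence model.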

Fig. 5 shows the network created using CDR data. Here, the colors represent the type of service being used (call/data) and the size of the node represents the strength of the individual subscriber within the network.

Fig 5: Influencer Network Graph

This graph can be explored further and drilled down into by selecting a subscriber node and viewing its top 10 influenced subscribers, as shown in Fig. 6.

Fig 6: Influencers in a Consumer Network
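One simple way to size nodes by subscriber strength, and to list the top 10 subscribers connected to a selected node, is to use weighted degree and edge weight respectively. This is an assumed scoring for illustration; the post does not say which measure produced Fig. 5 and Fig. 6. The graph is an adjacency dict of the form `{subscriber: {neighbour: call_count}}`, and the function names are hypothetical.

```python
import heapq


def subscriber_strength(graph):
    """Weighted degree: the total number of calls each subscriber is involved in."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def top_influenced(graph, subscriber, k=10):
    """The k neighbours with the heaviest connection to `subscriber`, strongest first."""
    return heapq.nlargest(k, graph.get(subscriber, {}).items(), key=lambda kv: kv[1])
```

Weighted degree only looks at direct contacts; centrality measures such as PageRank or betweenness also account for indirect reach and are common alternatives for influencer ranking.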

The above graph shows that by focusing on just 4 subscribers, we can reach everyone in the selected Influencer network.
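Finding a few subscribers through whom everyone in a network can be reached directly is the dominating-set problem. Finding the smallest such set is NP-hard, so the sketch below uses the standard greedy approximation: repeatedly pick the subscriber who covers the most not-yet-reached subscribers (itself plus its direct neighbours). This is an illustrative technique, not necessarily how the 4 subscribers above were chosen; the function name is hypothetical.

```python
def greedy_seed_set(graph):
    """Greedy dominating-set approximation over {subscriber: {neighbour: weight}}.
    Every subscriber ends up either chosen or adjacent to a chosen one."""
    uncovered = set(graph)
    seeds = []
    while uncovered:
        # Pick the subscriber whose closed neighbourhood covers the most uncovered nodes.
        best = max(graph, key=lambda n: len(({n} | set(graph[n])) & uncovered))
        seeds.append(best)
        uncovered -= {best} | set(graph[best])
    return seeds
```

The loop always terminates, because any uncovered subscriber covers at least itself. The greedy choice is not guaranteed to find the smallest possible set, but it is simple and scales to large call graphs.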

Data Scientists can explore the graph further and augment it with the YELP dataset to understand aspects such as demographics and travel patterns. They can then assign scores to the subscribers based on their loyalty and strength in the network. These scores can be exported back to a traditional RDBMS, so that existing CRM systems can use them to run focused campaigns.
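Exporting the scores to a relational database can be as simple as an upsert into a scores table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the CRM's RDBMS. The table name `subscriber_scores`, its columns, and the function name are hypothetical.

```python
import sqlite3


def export_scores(conn, scores):
    """Upsert {subscriber_id: {"loyalty": float, "influence": float}} into a scores table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subscriber_scores ("
        "subscriber_id TEXT PRIMARY KEY, loyalty REAL, influence REAL)"
    )
    # INSERT OR REPLACE overwrites an existing row with the same primary key (SQLite syntax).
    conn.executemany(
        "INSERT OR REPLACE INTO subscriber_scores (subscriber_id, loyalty, influence) "
        "VALUES (?, ?, ?)",
        [(sid, s["loyalty"], s["influence"]) for sid, s in scores.items()],
    )
    conn.commit()
```

For example, `export_scores(sqlite3.connect(":memory:"), {"A": {"loyalty": 0.8, "influence": 6.0}})` creates the table and writes one row. Upsert syntax differs between database products, so the statement would need adapting for the CRM's actual RDBMS.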

Summary

With the help of MECBOT, enterprises can save the large amount of time that every data scientist/analyst spends just pre-processing data to make it ready for consumption. In this blog, we discussed two use cases from different domains and showed how MECBOT can capture, store, and analyze the data. Both use cases demonstrate how Data Scientists can focus on the actual business problem instead of on Data Infrastructure issues.

To learn more about FORMCEPT and MECBOT, please contact us at contactus@formcept.com.