Retail - Edging ahead with Big Data

“We all have been hearing about Big Data for over three years. We're at the point where the industry really needs some tangible examples and show what it means.”
                                                  -  Curt Hecht, Global Chief Revenue Officer, The Weather Company

The retail industry around the world has evolved significantly over the years. From corner stores to department stores, and now supermarkets and hypermarkets, it has seen drastic transitions. One of the major game changers was the advent of the Internet, which revolutionized retail through e-commerce.

With the launch of Amazon's e-commerce service in 1995, many companies began using the Internet aggressively for commercial transactions, and e-commerce has seen a meteoric rise since then. Online retail sales in the US are forecast to reach $327 billion by 2016, with 56% of the population shopping online.

The rise of e-commerce has led to a large number of online portals that leverage multiple channels, such as social media networks, to generate awareness about their products. In the process, the e-commerce industry has started contributing significantly to the unstructured data being generated across the web, in addition to the traditional transactional data it has always produced. This phenomenon is commonly associated with the term "Big Data". It was the Internet at the end of the 20th century, and it is Big Data today, that is turning heads and giving companies a cutting-edge advantage over their competitors.

With rising competition, several attempts are being made in the retail industry to make shopping more efficient and convenient for consumers and retailers alike. Most of these attempts adopt a data-driven approach that involves capturing and processing immense amounts of useful social media content in addition to in-store transactional data. This blog post presents one such case study, implemented on top of FORMCEPT's MECBOT platform.

Case Background:
Research reveals that approximately 97% of the people who visit an e-commerce site leave without buying anything. While the top-performing sites convert almost 17% of their visitors, others lag far behind. Amazon, Wal-Mart and Target are three of the top players in online retail today; each has its own distinct style and enjoys a significant customer base. FORMCEPT analyzed data related to the product “Graco Car Chair” retrieved from these three portals.

Problem Domain:
The study targeted the following questions:

  1. Content Quality:
    • Is the content attracting new customers in addition to building a regular customer base?
    • Is the product description easily indexed by search engines?
    • Is it engaging enough to strengthen customer relationships?
  2. Product Assortment:
    • Which products do my competitors have?
    • Which variants of the product do my competitors have?
    • Should I list the products or variants that I don't have?

Data Challenge:
The data at hand was not in a readily analyzable format. The captured content was mostly plain text, with irregular patterns that differed across the three portals. The existence of different variants of the same product posed another challenge in the analysis.

MECBOT was used to identify identical products across the retailers and then compare them on various metrics. The outcome helped determine whether a product was worth listing on the portal and helped manage inventory better. The following key components of MECBOT were used:

  1. Data Extraction (Grabby):
    A set of files containing product details from the e-commerce portals was sent to MECBOT through Grabby. Grabby is responsible for receiving and pulling data from external sources, and it can be easily configured to suit the requirements.
  2. Data Storage (The storage engine):
    Grabby stored the data in MECBOT through the storage engine, which then made the data available to the C3 engine for the desired analytics. The storage engine stores data according to its structure, its size and the required retrieval method. With elastic storage, MECBOT scales horizontally and is fault tolerant.
  3. Data Analysis (C3 engine):
    The C3 (Classify, Compare and Correlate) engine is made up of classification, clustering and other machine learning algorithms. It was used to identify keywords and buzzwords in user reviews and classify them accordingly. The following components of the C3 engine were used in the process:

    1. Natural Language Processing Component (NLP):
      It used natural language processing to extract the keywords that best described the product.
    2. Clustering Component:
      The compare feature of the C3 engine, which is part of the clustering component, played a crucial role in tracing products across retailers and clustering them into distinct categories. In addition, if a variant of the same product existed on a competitor's website, it recognized the variant and made it available for analysis. Overall, it fast-tracked the identification of similar products across the portals and made them easy to compare.
    3. Sentiment Analyzer Component:
      Together with the NLP component, this feature made it very simple to differentiate positive comments from negative ones. It processed the keywords and identified the related sentiments, which captured the general perception of the product at a glance.
    4. Data Visualization Component (Intent channel):
      After the analytics, the results were visualized graphically to give a comprehensive view of the data and analysis results. The visualizations used were bar, spiral and node-link graphs, delivered interactively over the Intent Channel.
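
MECBOT's internal NLP is not described in this post, so the following is only a minimal Python sketch of one common way to pull candidate keywords from reviews: lowercase the text, drop stop words, and rank the remaining terms by frequency. The stop-word list, the `extract_keywords` function and the sample reviews are hypothetical illustrations, not FORMCEPT's implementation or data.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "it", "to", "of", "for", "in",
              "on", "with", "this", "that", "my", "was", "very", "are", "but"}


def extract_keywords(reviews, top_n=5):
    """Return the top_n most frequent non-stop-words (3+ letters) across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


# Made-up reviews for illustration only.
reviews = [
    "Easy to install and very safe for my baby.",
    "Safe, sturdy and easy to clean.",
    "The straps are hard to adjust but the seat is safe.",
]
print(extract_keywords(reviews, top_n=2))  # ['safe', 'easy']
```

Frequency ranking is the simplest baseline; weighting terms by how distinctive they are for one product (for example TF-IDF across products) would usually surface more descriptive keywords.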

To remain competitive, retailers need to keep the right product assortment for their customers. A well-chosen assortment lets them maintain product variety while managing inventory efficiently. It is therefore highly important for an online retailer to manage the assortment of goods it sells.

Data analysis can go a long way in helping retailers find the right product mix. With the recent influx of Big Data, better decisions can be made based on the results of analysis. Let’s take a look at some of the results from our analysis:

Content Analysis:

The infographic on the right shows the data received for analysis and how the three retailers compared in terms of the number of products, images and reviews.

Retailers used different terms to indicate availability, such as “In stock” and “Available”. In fact, both “In Stock” and “in stock” were used to show availability on the same website, in this case Amazon. Amazon also specified a number to indicate limited stock.
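
Label variants like these have to be reconciled before availability can be compared across portals. Here is a minimal sketch, assuming availability arrives as free-text labels: case-fold the label, check for out-of-stock wording before the broader "available" match (since "unavailable" contains "available"), and pull out any number as a limited-stock quantity. The function name, the return values and the "Only 3 left in stock." label are hypothetical.

```python
import re


def normalize_availability(raw):
    """Map a free-text stock label to (status, quantity).

    status is "in_stock", "out_of_stock" or "unknown"; quantity is the first
    number found in the label (a limited-stock count), otherwise None.
    """
    text = raw.strip().lower()  # "In Stock" and "in stock" become identical
    match = re.search(r"\d+", text)
    quantity = int(match.group()) if match else None
    # Check negative wording first: "unavailable" also contains "available".
    if "out of stock" in text or "unavailable" in text:
        return ("out_of_stock", 0)
    if "in stock" in text or "available" in text:
        return ("in_stock", quantity)
    return ("unknown", quantity)


for label in ["In Stock", "in stock", "Available", "Only 3 left in stock."]:
    print(label, "->", normalize_availability(label))
```

Keeping an explicit "unknown" status, rather than guessing, makes unrecognized wording from a new portal easy to spot and add to the rules.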

To enhance the reputation of the products on its portal, Amazon used its own on-site likes, whereas Wal-Mart leveraged Facebook likes. On Amazon, product ratings carried more weight than the number of likes, whereas the Wal-Mart data showed that Facebook likes had a greater impact than product ratings.

The content analysis results also showed the importance of titles and of the number of keywords used in the product description. Although the margin was thin, Amazon appeared slightly ahead in the content it provided on the web, with more images, more reviews and more words in its descriptions.

Price discounting is a common practice for attracting more customers. Interestingly, although one retailer was the only one to offer discounts on its portal, its discounted price was not the lowest on offer. Another retailer refrained from giving discounts and instead tried to influence buyers psychologically through “odd pricing”, pricing its products $0.01 below the competitors.
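
The arithmetic behind both observations is simple. The sketch below uses made-up prices (they are not figures from the study) to show how a discounted list price can still end up above a rival's undiscounted price, and how a one-cent undercut is computed. `Decimal` is used so that cent values stay exact.

```python
from decimal import Decimal

# Hypothetical prices and discount rates for one product; not data from the study.
offers = {
    "Retailer A": {"list": Decimal("189.99"), "discount": Decimal("0.10")},
    "Retailer B": {"list": Decimal("165.00"), "discount": Decimal("0")},
}


def final_price(offer):
    """Price after applying the fractional discount, rounded to cents."""
    return (offer["list"] * (1 - offer["discount"])).quantize(Decimal("0.01"))


def odd_price(competitor_prices):
    """Undercut the cheapest competitor price by one cent."""
    return min(competitor_prices) - Decimal("0.01")


a = final_price(offers["Retailer A"])  # 189.99 less 10% -> 170.99
b = final_price(offers["Retailer B"])  # no discount -> 165.00
print(a, b, odd_price([a, b]))  # 170.99 165.00 164.99
```

The only retailer offering a discount still ends up more expensive here, because the comparison that matters to a shopper is the final price, not the size of the discount.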

Product Assortment:

The C3 engine comes with a clustering component that was used to group similar products across the three portals to narrow down the product range required for comparison.

The node-link graph proved extremely handy in identifying variants of the same product on other portals. It linked all the products on these websites based on the common features they shared. End users can easily interact with the graph and explore the nearest variant of a particular product on their own.
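
The matching criterion MECBOT used is not specified. As an illustration only, here is a minimal sketch of the node-link idea, assuming each product is represented by the set of words in its title: two products are linked when the Jaccard similarity of their word sets reaches a threshold, and each connected component of the resulting graph is treated as one group of variants. The titles, product IDs and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def title_words(title):
    """Lower-case word set of a product title, ignoring commas."""
    return set(title.lower().replace(",", " ").split())


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def variant_groups(products, threshold=0.5):
    """Link products whose title word sets have Jaccard similarity >= threshold,
    then return the connected components of that graph as sorted ID lists."""
    words = {pid: title_words(title) for pid, title in products.items()}
    edges = {pid: set() for pid in products}
    for p, q in combinations(products, 2):
        if jaccard(words[p], words[q]) >= threshold:
            edges[p].add(q)
            edges[q].add(p)
    seen, groups = set(), []
    for start in products:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over linked products
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(edges[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


products = {
    "amazon-1": "Graco Model A Infant Car Seat, Red",
    "walmart-7": "Graco Model A Infant Car Seat, Blue",
    "target-3": "Graco Model B 3-in-1 Car Seat",
}
print(variant_groups(products))  # [['amazon-1', 'walmart-7'], ['target-3']]
```

The two Model A colour variants share 6 of 8 distinct title words (similarity 0.75) and are linked, while Model B shares only 4 of 9 (about 0.44) and stays in its own group. In practice, structured features such as brand, model number and specifications would give more reliable links than title words alone.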

The drill-down analysis highlighted the relevant words used in product descriptions and the supporting images. Amazon clearly scored over the other two in this regard: it used the most words to describe a product and also benefited from including the highest number of relevant keywords in the description.

With its semantic capabilities, MECBOT easily associated sentiments with the comments and reviews posted by users. It represented the sentiments on a colour scale, with green representing a highly positive sentiment and red an outright negative one.
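
A colour scale like this needs two steps: a numeric sentiment score per comment and a mapping from score to colour. The sketch below uses a tiny hand-written word lexicon, which is a deliberately simple stand-in and not MECBOT's semantic analysis; the word lists and function names are hypothetical.

```python
# Hypothetical mini-lexicons; real sentiment analysis needs far richer models.
POSITIVE = {"safe", "easy", "sturdy", "love", "great", "comfortable"}
NEGATIVE = {"hard", "broke", "flimsy", "hate", "poor", "uncomfortable"}


def sentiment_score(text):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no lexicon words occur."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def score_to_rgb(score):
    """Blend linearly from red (score -1) to green (score +1)."""
    t = (score + 1) / 2
    return (round(255 * (1 - t)), round(255 * t), 0)


for comment in ["Safe and easy to install!", "Flimsy straps, poor design."]:
    s = sentiment_score(comment)
    print(comment, s, score_to_rgb(s))
```

A word-counting lexicon ignores negation ("not safe") and context, which is exactly where a semantic component earns its keep; the colour mapping, however, works unchanged with any score in [-1, 1].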

A graphical representation of the sentiments gave an at-a-glance overview of all the products. It captured every sentiment in a single graph, segmented by product. One can easily spot the most popular product, the one with the most positive comments, and also isolate a product with fraudulent reviews, which in this case meant a product with a large number of reviews all from the same user.
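
The single-reviewer pattern described above can be checked with simple counting. Here is a minimal sketch, assuming reviews are available as (product, reviewer) pairs; the `min_reviews` and `max_share` thresholds are arbitrary illustrative values, and a flag only marks a product for closer inspection rather than proving fraud.

```python
from collections import Counter, defaultdict


def flag_suspicious_products(reviews, min_reviews=5, max_share=0.5):
    """reviews: iterable of (product_id, user_id) pairs.

    Flag products with at least `min_reviews` reviews where a single user
    wrote more than `max_share` of them. Returns sorted product IDs.
    """
    by_product = defaultdict(Counter)
    for product_id, user_id in reviews:
        by_product[product_id][user_id] += 1
    flagged = []
    for product_id, counts in by_product.items():
        total = sum(counts.values())
        _, top = counts.most_common(1)[0]  # review count of the most active user
        if total >= min_reviews and top / total > max_share:
            flagged.append(product_id)
    return sorted(flagged)


# Made-up data: P1 has 6 of its 7 reviews from one user; P2 has 6 distinct reviewers.
reviews = [("P1", "user1")] * 6 + [("P1", "user2")] + [("P2", f"user{i}") for i in range(6)]
print(flag_suspicious_products(reviews))  # ['P1']
```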

In this case study, FORMCEPT analyzed the three competitors in the online retail market and came up with valuable, actionable insights.

FORMCEPT has developed a seamless analysis platform called MECBOT, whose mission is to harness the underlying potential of Big Data and help you derive true business value from it. By leveraging Big Data, FORMCEPT focuses on maximizing revenue and achieving operational excellence, thereby boosting ROI. For more details, please contact us.