The Story of MECBot's Evolution

 

FORMCEPT Completes 9 Years of Solving Data Analytics Challenges with Product Innovation.

Introduction to MECBot.

FORMCEPT was founded in 2011, and during our early days between 2012 and 2014, our mission was primarily to democratize access to data and analytics across organizations without requiring them to invest in any special skills, technology, or infrastructure.

We released the initial version of MECBot (1.0) in 2012. It was adopted by two Fortune 1000 companies and by a few well-known enterprises in India and the U.K. MECBot 1.0 enabled on-demand storage, analysis, and retrieval of data at the right time and on the right device. Users could also configure interests such as what data to ingest, where to source it from, and the device on which the insights would be delivered.

Backed by the classification of unstructured text data using FORMCEPT's proprietary Natural Language Processing algorithms and a Knowledge Graph that's built on the concepts of Linked Data, MECBot compared the content by topic, timeline, sentiment, etc. Thus, MECBot could easily automate tasks like opinion mining, trend analysis, and pattern detection at scale.

It could also take care of all data infrastructure issues like data capture, storage, analysis, and visualization. MECBot simplified data harmonization and enrichment. For example, we developed the Healthcare Engine, which can be used to annotate medical records by augmenting the Enhancement Structure of Apache Stanbol. The data analytics solutions on the market at that time provided several of these features, but not in a unified and integrated manner.

By accessing a unified view of the data, users were now able to tie unstructured data to mainstream analytics on MECBot with just a few clicks. ESPN Cricinfo and Monster India were some of our earliest customers. Apart from receiving the prestigious Frost and Sullivan GIL award during this time, we also filed multiple patents, of which four were approved within a year.

2015-2016: Shift to Graph Storage

The next revolution in MECBot happened in the storage layer.

We had already sown the seeds of graph-based storage back in 2013, when we opted for a storage structure that could directly query the specific cell where multiple data values are related. We then extended MECBot's Fact Store to make the entire data storage layer graph-based. Our multi-modal data store, FactorDB, was positioned as the smart, unified storage for all kinds of data. Graph technology works across domains and also supports the upcoming use cases of IoT. It enabled MECBot to interface with machines (IoT integration), monitor assets intelligently, automatically recognize hidden patterns in data, and empower users with Smart Data Discovery. This laid the groundwork for MECBot 2.0.

2017-18: MECBot 2.0 Goes Live.

By abstracting all the layers, we ensured that MECBot 2.0 supported all the technologies underneath each layer without the user having to worry about hopping from one technology to another.

One of the most important features that sets us apart from the crowd is the seamless connection of data across multiple sources using our patented Data Folding™ techniques (now referred to as MECSense). It discovers patterns automatically, without depending on the user's query, and is not limited by the underlying datasets, schema, or algorithms.

This helps to intelligently identify patterns in real-time by coupling unsupervised machine learning with cutting-edge artificial intelligence. It also forms the basis of MECBot's ability to perform ad-hoc queries.

Then, in 2017, we received Series A funding from GVFL, were awarded four U.S. patents, and filed nine patents. It was a precursor to positioning MECBot as an Augmented Analytics product. Augmented Analytics automates the entire process of data ingestion, unification, pre-processing, cleaning, transformation, and contextualization, and converts the data into insights with little or no supervision by a Data Scientist.

2019: Adapting to Changes in DevOps, DataOps & MLOps.

We started 2019 on a great note as we were recognized as the Best AI Company of the year at the Global AI Summit and Awards, organized by AICRA.

Our goal now was to augment MECBot's ability to deliver the holy trinity of Data Analytics in real-time. These are:

  • Continuous integration,
  • Continuous delivery, and
  • Continuous deployment,

without the need for coding by the user.

To make this happen, we introduced a new component in MECBot 3.0 - the Blueprint. Conceptually, this layer was always there, but it was scattered across multiple components.

At present, Blueprint is the first component of MECBot with which users interact. It is essentially the data ingestion module, whose primary function is to take in all enterprise data irrespective of its source, type, or format. However, data ingestion with MECBot’s Blueprint is much more than just that.

At FORMCEPT, we understand that in analytics, garbage in = garbage out. We recognize that data scientists spend over 60% of their precious time on cleaning and massaging the data and preparing it for data analysis and AI-ML models.

MECBot’s Blueprint component addresses this problem using an intelligent and dynamic pipeline engine. It comes loaded with the following layers.

  • Data Collection Layer: We have built our own plugins to ingest file-based data, mainly text, images, and videos (currently in alpha). We have also integrated with the third-party tool StreamSets to ingest data from devices or databases. Our plugins are containerized (using Docker and orchestrated by Kubernetes), and hence can auto-scale based on the incoming data.
  • Data Streaming: In each phase of the data movement, the components and subcomponents process their respective tasks and publish the results back to Apache Kafka.
  • Data Transformation: The data which comes in is then processed for various transformations through Spark. After each transformation, the transformed data is again published on Kafka for downstream users to do further actions on the data.
  • Data Enrichment: Real-world data is richly interconnected and needs to be augmented by the right context. For example, MECBot maps the IP addresses to geo-locations so that various downstream analytics can be more nuanced. MECBot uses powerful Knowledge Bases to bring in the semantics of unstructured content. The Knowledge Bases use graph storage internally and are indexed using Elasticsearch.
  • Data Management: MECBot uses FactorDB, our proprietary meta-database for storage. Enterprises have wide varieties of data, and each needs different NoSQL technologies to store. Hence, FactorDB is built on top of NoSQL (it uses Cassandra, Memcached, HDFS, and Elasticsearch).
  • Data Analytics: All the models are based on Spark MLlib. Each model is built as a plug-in that runs on Spark. We have also validated the models with FaaS (Function as a Service). Each model can independently scale up and down as it is dockerized and orchestrated using Kubernetes.
  • MECBot Monitoring: Provides complete monitoring of MECBot. Internally, we use Prometheus to collect metrics and Kibana to visualize them.
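
To make the flow between these layers concrete, here is a minimal sketch of stages that read from one topic, process each message, and publish the result to the next topic. It is an illustration under stated assumptions, not MECBot's implementation: `InMemoryBroker` is a hypothetical in-process stand-in for Apache Kafka, the stage functions are plain Python rather than Spark jobs, and the `GEO_BY_PREFIX` table is invented sample data in place of a real geo-IP knowledge base.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a Kafka cluster: topic name -> list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        # Return a copy so callers cannot mutate the topic while iterating.
        return list(self.topics[topic])


# Invented sample data; a real deployment would query a geo-IP knowledge base.
GEO_BY_PREFIX = {"203.0.113.": "Bengaluru, IN", "198.51.100.": "London, UK"}


def transform(record):
    """Transformation stage: normalise field names and trim string values."""
    return {key.strip().lower(): value.strip() for key, value in record.items()}


def enrich(record):
    """Enrichment stage: attach a geo-location derived from the IP address."""
    ip = record.get("ip", "")
    location = next(
        (loc for prefix, loc in GEO_BY_PREFIX.items() if ip.startswith(prefix)),
        "unknown",
    )
    return {**record, "geo": location}


def run_stage(broker, source, sink, step):
    """Read every message from `source`, apply `step`, publish to `sink`."""
    for message in broker.consume(source):
        broker.publish(sink, step(message))


def run_pipeline(raw_records):
    """Collect -> transform -> enrich, publishing to a topic after each phase."""
    broker = InMemoryBroker()
    for record in raw_records:
        broker.publish("raw", record)
    run_stage(broker, "raw", "transformed", transform)
    run_stage(broker, "transformed", "enriched", enrich)
    return broker.consume("enriched")
```

Because every stage only knows its source and sink topics, a downstream consumer can subscribe to the intermediate `transformed` topic without the stages themselves changing.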

MECBot is developed entirely in JVM languages (Clojure, Scala, and Java). The above layers of MECBot Blueprint automate the following functions:

  • Ingest data from a variety of internal and external sources.
  • Clean, pre-process and massage the data.
  • Define data relationships with Domain Data Model.
  • Augment data by connecting with its domain of origin.
  • Convert all ingested data instantly into graph format without any coding by the user.
  • Enable Data Transformation by users.
  • Generate flattened views of data for discovery & visualization.
  • Preserve lineage, carry out smart cataloging & keep all data and views hydrated with new inputs.
  • Carry out Master data management, metadata management, versioning and audit.

MECBot’s Blueprint is a web-based tool where all the transformations can be executed at scale in a well-orchestrated flow, without writing much code. All the transformations run on Spark, which can be hosted either on on-premises clusters or in the cloud, for example on Databricks. Users can define a transformation pipeline that runs at the row level, cell level, or entity level. These transformations also bring a standard representation to poly-structured and unstructured data (CSV/TSV/PDFs/PPTs/DOC/XLS/JSON/XML) so that unified storage of data across data sources is possible. MECBot Blueprint Transformer brings trust and reliability, as it shows which transformations have happened and at what time.
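
The idea of cell-level and row-level transformations that leave a visible record can be sketched as follows. This is a simplified illustration on in-memory rows rather than Spark, and every name in it (`TransformationLog`, `apply_cell_level`, `apply_row_level`, `add_total`) is hypothetical; entity-level transformations are left out for brevity.

```python
from datetime import datetime, timezone


class TransformationLog:
    """Records which transformation ran, at what level, and when."""

    def __init__(self):
        self.entries = []

    def record(self, name, level):
        self.entries.append({
            "transformation": name,
            "level": level,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def apply_cell_level(rows, column, func, name, log):
    """Apply `func` to a single column's value in every row."""
    log.record(name, "cell")
    return [{**row, column: func(row[column])} for row in rows]


def apply_row_level(rows, func, name, log):
    """Apply `func` to each whole row, so it can combine several columns."""
    log.record(name, "row")
    return [func(row) for row in rows]


def add_total(row):
    # Row-level logic: derives a new field from two existing ones.
    return {**row, "total": row["quantity"] * row["unit_price"]}


log = TransformationLog()
rows = [{"sku": " a-1 ", "quantity": 2, "unit_price": 5.0}]
rows = apply_cell_level(rows, "sku", lambda v: v.strip().upper(), "normalise sku", log)
rows = apply_row_level(rows, add_total, "compute total", log)
```

After these two steps, `log.entries` lists both transformations in order with their timestamps, which is the kind of record that lets a user see what happened to the data and when.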

In the data-to-insights lifecycle, once pre-processed data is available, the second important task is “Feature Engineering”. Identifying features from the raw data may require various transformations which are taken care of by MECBot Blueprint. Once the features are identified, then the following steps need to be addressed:

  1. Training the model.
  2. Tuning the model.
  3. Deploying the model.

This is where MECBot's Wisdom layer comes in. Wisdom supports all kinds of Supervised and Unsupervised models like Clustering, Correlation, Pattern Mining, and Recommendation. Instead of the user having to write tedious code, ML pipelines can be created in a few minutes with Wisdom's drag-and-drop feature. Users can configure the models through an API without worrying about the infrastructure, and scale the models up and down without worrying about the size of the dataset, as all the supported models are dockerized and orchestrated by Kubernetes.

Each model in Wisdom has two plugins:

  1. the Model Training plugin, and
  2. the Prediction plugin for running the models.
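
The split between a training plugin and a prediction plugin can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden example, not Wisdom's plugin interface: the class names, the `run` method, and the one-dimensional nearest-centroid classifier are all chosen only for illustration, whereas the real models run on Spark MLlib.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CentroidModel:
    """Trained artefact handed from the training plugin to the prediction plugin."""
    centroids: dict  # label -> mean feature value


class TrainingPlugin:
    """Hypothetical training plugin: learns one centroid per label."""

    def run(self, samples):
        # samples is a list of (feature, label) pairs.
        grouped = {}
        for feature, label in samples:
            grouped.setdefault(label, []).append(feature)
        return CentroidModel({label: mean(values) for label, values in grouped.items()})


class PredictionPlugin:
    """Hypothetical prediction plugin: assigns each input to the nearest centroid."""

    def __init__(self, model):
        self.model = model

    def run(self, features):
        centroids = self.model.centroids
        return [
            min(centroids, key=lambda label: abs(centroids[label] - x))
            for x in features
        ]


model = TrainingPlugin().run([(1.0, "low"), (2.0, "low"), (9.0, "high"), (11.0, "high")])
predictions = PredictionPlugin(model).run([0.5, 8.0, 5.0])
```

Keeping the two plugins separate means the trained artefact can be stored once and the prediction side scaled independently of training.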

MECBot also provides observability (monitoring) support wherein the user can understand how much memory, disk, CPU, etc. are being consumed by the model. In short, MECBot enables Data Scientists, Citizen Data Scientists, Data Analysts, and Domain Experts to run multiple models on their data and choose the best one that fits their data, without worrying about the scalability or the configuration of various tools.

2020: Launch of MECBot 3.0 & Breaking Ground with Trusted Dataset.

Armed with the above arsenal of features, MECBot 3.0 was launched in the first quarter of 2020. But MECBot's super-powers have continued to grow further. We are now looking to solve the challenge of enabling access to 'trusted datasets' at scale.

A trusted dataset has three key attributes:

  • Data Security: Integrity of data in motion, at rest, or in transit.
  • Data Privacy: Role-based access to data & complete transparency.
  • Speed and Scale: Managing large volumes of data at the scale of data generation by efficiently using hardware resources.

The urgency to transform data into data-governed insights in real-time is growing at an equally rapid pace. However, enterprises are struggling to attain trusted datasets with the attributes described above.

In MECBot, data security and data governance are of prime importance. This is made possible in the latest version of MECBot, where data lineage shows how the data is getting transformed, who changed it, when it was changed, and so on. Further, admin users have the right to restore each of the previous versions of the data. Data Security in MECBot is inspired by Apache Shiro. It is built on JWT for role-based access, with sharing based on explicit user definitions, combined with lineage, versioning, and an audit trail.
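
The combination of lineage, versioning, and admin-only restore can be sketched with a small in-memory class. This is a hypothetical illustration, not MECBot's data model: `VersionedDataset` and its methods are invented names, and the `role` argument is passed in directly as a stand-in for a role claim that a real system would read from a verified JWT.

```python
from datetime import datetime, timezone


class VersionedDataset:
    """Keeps every version of the rows plus a lineage entry per change."""

    def __init__(self, initial_rows, created_by):
        self._versions = []
        self.lineage = []
        self._commit(initial_rows, created_by, "create")

    def _commit(self, rows, user, action):
        # Store a copy so later edits cannot alter an earlier version.
        self._versions.append([dict(row) for row in rows])
        self.lineage.append({
            "version": len(self._versions),
            "user": user,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    @property
    def current(self):
        return [dict(row) for row in self._versions[-1]]

    def transform(self, user, action, func):
        """Apply `func` to every row and record who did what, and when."""
        self._commit([func(row) for row in self.current], user, action)

    def restore(self, user, role, version):
        """Re-commit an earlier version; only the assumed 'admin' role may do so."""
        if role != "admin":
            raise PermissionError(f"{user} with role {role!r} cannot restore versions")
        self._commit(self._versions[version - 1], user, f"restore v{version}")


dataset = VersionedDataset([{"price": 10}], created_by="alice")
dataset.transform("bob", "double price", lambda row: {**row, "price": row["price"] * 2})
dataset.restore("carol", role="admin", version=1)
```

A restore is recorded as a new version rather than by deleting history, so the audit trail stays complete even after a rollback.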

Repeatability ensures that once a model is stored in MECBot, the entire pipeline is scheduled to run automatically without any human intervention.

With MECBot, one can easily map the Domain Data Model by defining the entities and attributes without worrying about the underlying data structure and storage. This puts tremendous power in the hands of Data Scientists & Decision Makers by reducing their dependency on the IT team, and it also brings immense efficiency and time savings to the IT team.

MECBot contextualizes the data and creates an Enterprise Knowledge Grid: it uses Deep Text Analysis to automatically extract entities and the facts around them, and contextualizes them using powerful, extensible Knowledge Bases. Since MECBot stores the data along with its relationships, "Natural Language Query" is possible for self-discovery of data. MECBot's FactorDB inherently records the time of each Fact that it stores. ML Pipelines can be created without worrying about the infrastructure, configurations, etc., and the model that best fits the data can be selected.
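
As a rough illustration of storing facts together with their relationships and the time they were recorded, consider the sketch below. It is not FactorDB's API: `Fact`, `FactStore`, and the subject/predicate/object layout are assumptions made for this example, and the store is a plain in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    recorded_at: datetime


class FactStore:
    """Hypothetical fact store: every fact carries the time it was recorded."""

    def __init__(self):
        self._facts = []

    def add(self, subject, predicate, obj, recorded_at=None):
        fact = Fact(subject, predicate, obj, recorded_at or datetime.now(timezone.utc))
        self._facts.append(fact)
        return fact

    def match(self, subject=None, predicate=None, obj=None, as_of=None):
        """Return matching facts; None is a wildcard, `as_of` limits by record time."""
        return [
            f for f in self._facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
            and (as_of is None or f.recorded_at <= as_of)
        ]


store = FactStore()
store.add("patient-17", "diagnosed_with", "asthma", datetime(2019, 3, 1, tzinfo=timezone.utc))
store.add("patient-17", "prescribed", "salbutamol", datetime(2019, 3, 2, tzinfo=timezone.utc))
store.add("patient-42", "diagnosed_with", "asthma", datetime(2020, 1, 5, tzinfo=timezone.utc))

asthma_2019 = store.match(
    predicate="diagnosed_with",
    obj="asthma",
    as_of=datetime(2019, 12, 31, tzinfo=timezone.utc),
)
```

Because relationships are stored explicitly, a question such as "who was diagnosed with asthma by the end of 2019?" reduces to a pattern match with a time bound, which is the kind of lookup a natural-language query layer could translate into.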

For customers who want to collaborate with external teams, MECBot also provides ways to run the models on a cloud of their choice. MECBot comes pre-configured with one-click installation of complex technologies. It also offers complete observability, where one can not only see how MECBot uses the hardware resources, but also monitor individual model-level performance.

The creation of a Trusted Dataset is the biggest challenge today due to the traditional “Data First Approach”, where unifying all data, including unstructured data, is an unsolved puzzle. Solving this puzzle has taken us closer than ever to fulfilling the vision we started with, i.e., democratizing access to high-quality data and meaningful analytics for enterprises of all sizes and types.