Behavioral Analytics

Clustering Concepts and Consumers


It has become increasingly urgent for enterprises to leverage the massive amount of unstructured data that they accumulate around the clock.  This is text information in the form of documents, notes, email, chat, web forms, voicemail, news stories, regulatory filings, repair records, field representative notes, invoices, blogs, etc.  Although traditional behavioral analytics can find what is going on, the whys and wherefores remain buried in unstructured content, which can hold the hidden story behind the data.  Text analytics generally includes such tasks as the categorization of content into taxonomies, the clustering of concepts, entity and information extraction, sentiment analysis and summarization.
 
Why is text analytics important to enterprises and marketers?  Because enterprises need to know the reason a product is failing so that they can fix the cause, not just respond to the impact of a recall.  Because marketers need to know which messages or ads are producing the highest levels of responses and sales.  Because enterprises increasingly accumulate nearly 80% of their data in unstructured formats.  And because a deep analysis of patents, blogs, reports, emails, surveys, orders and other documents takes too long to be performed manually.  Text analytics provides an automated way to organize the key concepts in this unstructured content.  This involves analyzing large volumes of documents, email, chat and other text to discover hidden relationships and patterns that would otherwise be extremely difficult, if not impossible, to discover manually.
 
Text analytics can use information retrieval (IR) and information extraction (IE) as well as natural language processing (NLP) techniques to organize and prioritize documents about any subject.  These techniques can be used individually or in combination by enterprises to gain new insight into the unstructured content within their legacy and operational systems.  In addition, text analytics tools can parse unstructured content into a structured format which is amenable to behavioral analytics.  For example, all of the emails that an enterprise accumulates each day can be organized into several groupings, such as those from customers seeking service assistance and those complaining about specific products or services.  Aside from organizing key concepts, text analytics can also cluster tribes of consumers into key segments along service and product lines.
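
As a rough illustration of sorting incoming email into groupings such as service requests and complaints, the sketch below trains a small multinomial Naive Bayes classifier. This is only one of many possible approaches, not the method of any product named on this page. The class name NaiveBayesRouter, the two labels and the training emails are all hypothetical and were invented for the example; a real system would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of emails
        self.vocabulary = set()

    def train(self, labelled_emails):
        for text, label in labelled_emails:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training emails, invented for illustration only.
TRAINING = [
    ("Please help me reset my account password", "service"),
    ("How do I install the replacement filter", "service"),
    ("I need assistance setting up my new router", "service"),
    ("The blender broke after two days, very disappointed", "complaint"),
    ("Terrible quality, the handle broke and I want a refund", "complaint"),
    ("This product stopped working and support ignored me", "complaint"),
]

router = NaiveBayesRouter()
router.train(TRAINING)
print(router.classify("My toaster broke and I want a refund"))
print(router.classify("Can you help me install the router"))
```

Each pile can then be routed to the appropriate department, or counted per product to see where complaints concentrate.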
 
Information retrieval (IR) systems identify the text in a large collection of documents or web pages which matches a user’s query; it is the most basic level of text analytics. The most popular IR systems are today’s search engines such as Google and Yahoo, which identify ‘keywords’ within pages and documents on the Web that are relevant to a set of given words.  IR systems are often used in public and corporate libraries, where the documents are typically not the books themselves but digital records containing information about the books.  IR text analytics systems allow humans to narrow down the set of documents that are relevant to a particular problem.
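
The keyword matching described above can be sketched with an inverted index, a structure that maps each term to the documents containing it. This is a minimal illustration under simple assumptions (boolean AND matching, no ranking), not a description of how any particular search engine works. The document ids and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, invented for illustration only.
DOCUMENTS = {
    "doc1": "Repair record: battery overheating reported by field technician",
    "doc2": "Customer email asking about battery replacement pricing",
    "doc3": "Regulatory filing for the third quarter",
}

index = build_index(DOCUMENTS)
print(sorted(search(index, "battery")))
print(sorted(search(index, "battery replacement")))
```

Adding a second query term narrows the result set, which is the sense in which IR reduces the number of documents a person has to read.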
 
Text analytics involves applying computationally intensive algorithms to large collections of text, and can speed up analysis considerably by reducing the number of documents on which human researchers need to focus.  However, the most useful text analytics systems for enterprises and marketers are the natural language processing (NLP) and information extraction (IE) technologies and tools. They provide the means by which behavioral analytics can be performed on the large share of information that enterprises and marketers normally toss out.
 
NLP is one of the oldest fields of artificial intelligence and one of the most difficult to execute.  NLP is the analysis of human language so that computers can understand natural languages as humans do; it is a subset of computational linguistics.  NLP software converts human language into a more formal representation on which behavioral analytics can more easily be performed.  NLP can perform some types of analysis with a high degree of success, such as tagging, which classifies words into categories such as noun, verb or adjective.
 
NLP can perform disambiguation, which identifies the meaning of a word, given its usage, from among the multiple meanings that the word may have.  NLP can also perform parsing, the grammatical analysis of a sentence. Shallow parsers identify only the main grammatical elements in a sentence, such as noun phrases and verb phrases, whereas deep parsers generate a complete representation of the grammatical structure of a sentence.  Companies that provide this type of NLP software include Crossminder and Basis Technology.
 
Information extraction (IE) is the process of automatically obtaining structured data from unstructured content. Often this involves parsing the unstructured content into one or more structured templates, which are then used to guide the extraction process.  IE systems rely heavily on the data generated by NLP systems. Tasks that IE systems can perform include term analysis, entity recognition and fact extraction. IE software can parse structured tables from documents, emails, etc., so that subsequent analyses can be performed.  Companies that provide this type of IE software include Attensity and Megaputer.
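
To make the idea of filling a structured template from free text concrete, here is a minimal pattern-based sketch. It is much cruder than the NLP-driven extraction that commercial IE tools perform. The model-number format, the date format, the list of issue terms and the sample note are all assumptions made for the example.

```python
import re

# Hypothetical formats: these patterns are assumptions, not a standard.
MODEL_PATTERN = re.compile(r"\b[A-Z]{2}-\d{3,4}\b")      # e.g. WX-200
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")      # e.g. 2009-03-14
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
ISSUE_TERMS = ("leak", "overheat", "crack", "noise")


def extract_record(note):
    """Fill a simple template (a dict) from one free-text service note."""
    lowered = note.lower()
    return {
        "models": MODEL_PATTERN.findall(note),
        "dates": DATE_PATTERN.findall(note),
        "emails": EMAIL_PATTERN.findall(note),
        # Substring match, so "overheat" also catches "overheating".
        "issues": [term for term in ISSUE_TERMS if term in lowered],
    }


# Hypothetical technician note, invented for illustration only.
NOTE = (
    "Visited customer on 2009-03-14. Unit WX-200 was overheating and "
    "had a small leak. Follow up with jane.doe@example.com."
)

record = extract_record(NOTE)
for field, values in record.items():
    print(field, values)
```

Each resulting record has the same fields, so a batch of notes can be loaded into a table and analyzed alongside other structured data.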
 
By using NLP and IE software, enterprises and marketers can convert the vast silos of unstructured content they collect from all channels into a structured format (a database or tables) on which behavioral analytics can be performed to discover previously unknown knowledge about visitors, consumers and customers and their relation to products and services. Text analytics can thus turn unstructured content into input for predictive models for digital targeting.

H I S T O R Y:

   
Computational linguistics as a field predates artificial intelligence, a field under which it is often grouped. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be only a short matter of time before the technical details could be taken care of that would allow them the same remarkable capacity to process language. 
 
When machine translation (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data. When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages. 
 
In order to translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). In order to understand syntax, one had to also understand the semantics and the lexicon (or 'vocabulary'), and even to understand something of the pragmatics of language use. Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.
 __________________________________________________________________________ 
 
Text analytics tools enable enterprises to build early warning systems that incorporate text-based information to better assess situations and trigger organizational responses.  Text mining analyses can identify trends, patterns, and complex inter-document relationships within large numbers of documents and customer communications.  Enterprises have spent millions of dollars capturing, storing and maintaining customer data.  This includes a wide array of information, from what customers have bought and how they use services to feedback captured through surveys and emails.  This data is used to determine many things, including what products a company should sell, what to cross-sell to an existing customer, who might be at risk of attrition and what kinds of programs to implement to engender customer loyalty.
  
Much of this data contains important customer information and feedback that can guide enterprises toward better business decisions, focus marketing programs and drive product development.  Unstructured data from customers and prospects comes in feedback surveys, emails, web forms, comment boxes, customer service or technician notes, claim forms, blogs and other text not found in databases or tables.
 
Failure to leverage unstructured content has consequences for enterprises and marketers.  It can lead to customer dissatisfaction, and ultimately attrition, when feedback is not heard or acted upon.  It can leave serious product issues unaddressed, including liability or safety issues that are not identified or rectified.  And it can lead to a tarnished brand when negative opinion and word of mouth go undetected.

Figure (webassets/4-1.JPG): How Attensity can capture customer feedback to reduce attrition
 
Text analytics can lead to a better understanding of customer sentiment; it allows enterprises to understand how customers feel about their company, products, services, content and more.  Text analytics tools can generate actionable satisfaction scores from unstructured data to identify and score customers who are at risk.  Conversely, these tools can also find those customers who champion products or services via social networking. Unstructured content from surveys, emails, online forms and blogs can be analyzed with these tools to identify customers who are at risk of attrition (churn) and who are acting as detractors.  Information extraction tools can assign churn scores based on unstructured data and feed them into predictive models constructed via behavioral analytics. Text analytics tools can also be used to improve the quality of products in order to reduce returns.  They can assist enterprises in identifying what is wrong with their products and services.
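
A very simple way to picture scoring customers from their own words is a lexicon-based score, sketched below. This is an assumption-laden toy, not how Attensity or any other vendor computes satisfaction or churn scores. The word lists, the thresholds and the function names churn_score and label are hypothetical.

```python
import re

# Hypothetical word lists; a real lexicon would be far larger and tuned.
NEGATIVE = {"cancel", "disappointed", "broken", "refund", "terrible", "worst"}
POSITIVE = {"love", "great", "recommend", "excellent", "happy", "thanks"}


def churn_score(text):
    """Return a score in [0, 1]; higher means more negative language."""
    tokens = re.findall(r"[a-z]+", text.lower())
    negative = sum(token in NEGATIVE for token in tokens)
    positive = sum(token in POSITIVE for token in tokens)
    if negative + positive == 0:
        return 0.5  # no evidence either way
    return negative / (negative + positive)


def label(score, at_risk=0.7, champion=0.3):
    """Turn a score into a coarse segment using assumed thresholds."""
    if score >= at_risk:
        return "at risk"
    if score <= champion:
        return "champion"
    return "neutral"


# Hypothetical customer messages, invented for illustration only.
MESSAGES = {
    "c1": "I am disappointed, the unit arrived broken and I want a refund",
    "c2": "I love this service and recommend it to friends",
    "c3": "Order arrived on Tuesday",
}

for customer, text in MESSAGES.items():
    score = churn_score(text)
    print(customer, round(score, 2), label(score))
```

The numeric score, rather than the label, is what would typically be added as one more column in a predictive model built from structured data.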
 
The National Centre for Text Mining (NaCTeM) offers text mining services to researchers that enable semantic searching of text and the discovery of new knowledge.  NaCTeM has links and articles on text mining, including demonstrations of text analytics tools such as TerMine.  TerMine is a quick way for a reader to pick out articles of potential interest from a large body of text.  The software decomposes documents, reducing the ambiguity which may cause irrelevant information to be retrieved (low precision) and relevant information to be overlooked (low recall). TerMine has also been used to build controlled vocabularies and ontologies, that is, collections of words and phrases common to a subject area, by extracting candidate terms from a body of text.
  
Text analytics incorporates different algorithms and models including the following: 
 
Bayesian Models – They consider all possible parameter values and include a penalty for too much model structure, which guards against overfitting.
  
Concept Decomposition – A procedure for text retrieval based on a sparsified matrix, which can enhance the accuracy compared with the technique based on latent semantic indexing with singular value decomposition. 
 
Orthogonal Decomposition – A technique used in exploratory data analysis and for making predictive models; it involves the decomposition of a data matrix, usually after centering the data for each attribute.
  
Probabilistic Models – Embrace the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities, which can be used to predict link structure.
   
Vector Space Models – An algebraic model for representing text documents as vectors of identifiers, such as index terms; they are used in information filtering, information retrieval, indexing and relevancy rankings.
   
Latent Semantic Indexing – A technique in natural language processing, in particular in vectorial semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. 

     

Graph-based Models – Are usually described by means of nodes and edges, roughly corresponding to places and their spatial relations.

     
Text Streaming Models – An automated unsupervised learning of latent topics from text documents for document organization, retrieval and filtering of information.

These algorithms and models are used for the following text analytics applications:

     

Clustering – Technique for unsupervised document organization, automatic topic extraction and fast information retrieval or filtering.

   

Factor Analysis – A data reduction method.

   

Visualization – Graphical depictions of relationships between sets of concepts, permitting the end user to identify previously unrecognized or unknown relationships.

 

Metadata Generation – The use of the self-organizing map (SOM) algorithm to cluster training web pages in order to discover semantic descriptions of those pages.

 

Information Extraction – Technique for locating specific pieces of data in natural-language documents.

   

Text Classification – There are two sorts: supervised document classification where human feedback provides information on the correct classification for documents, and unsupervised document classification, where the classification is without reference to external information.

 

Text Segmentation – The process of dividing written text into meaningful units, such as sentences or topics.

   

Text Summarization – Technique based on statistical, linguistic and heuristic methods that calculates how often certain key words appear in a document, which sentences they are present in, and where these sentences are in the text.
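
The frequency-based summarization described in the last entry can be sketched as follows. This is a minimal extractive version under stated assumptions: sentences are scored by the average corpus frequency of their non-stopword terms, sentence position is used only to keep the selected sentences in their original order, and the stopword list and sample text are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "was", "is",
             "it", "during"}


def content_words(text):
    """Return the lowercase words of the text, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Pick the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    frequencies = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        # Average frequency, so long sentences are not favored by length.
        return sum(frequencies[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in chosen]


# Hypothetical text, invented for illustration only.
TEXT = ("The battery drains quickly. Customers report the battery overheats "
        "during charging. Shipping was on time.")

print(summarize(TEXT, max_sentences=1))
print(summarize(TEXT, max_sentences=2))
```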

Figure (webassets/4-2.JPG): The self-organized clustering of words related to smoking
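
As a rough illustration of the vector space model listed among the algorithms above, the sketch below represents documents as TF-IDF weighted term vectors and ranks them against a query by cosine similarity. It makes no claim about how the figure was produced. The smoothed IDF formula used here is one common variant chosen for the example, and the documents and query are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(tokenized_docs):
    """Return a function giving a smoothed inverse document frequency."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(t for tokens in tokenized_docs for t in set(tokens))
    return lambda term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0


def vectorize(tokens, idf):
    """Represent a token list as a sparse {term: tf * idf} vector."""
    return {term: count * idf(term) for term, count in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return (similarity, document) pairs, most similar first."""
    tokenized = [tokenize(doc) for doc in documents]
    idf = build_idf(tokenized)
    query_vector = vectorize(tokenize(query), idf)
    scored = [(cosine(query_vector, vectorize(tokens, idf)), doc)
              for tokens, doc in zip(tokenized, documents)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical documents, invented for illustration only.
DOCS = [
    "smoking cessation programs and nicotine patches",
    "nicotine gum helps smokers quit smoking",
    "quarterly sales report for garden furniture",
]

ranked = rank("quit smoking nicotine", DOCS)
for similarity, doc in ranked:
    print(round(similarity, 3), doc)
```

The same cosine measure can be applied between pairs of documents rather than between a query and a document, which is a common starting point for clustering.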
 
Some text mining tools can create links with structured databases for a global view and behavioral analytics model.  For more information on text analytics technologies go to textanalysis.info or tlab.it.  The following is a list of text analysis, text mining, and information extraction and retrieval software for the enterprise and marketer:     
 
ActivePoint: offers natural language processing for contextual search

   

AeroText: enterprise scalable data extraction tool suite

   

Arrowsmith: software for discovery from complementary literatures

   

Attensity: extract "who", "what", "where", "when" and "why" facts

   

Aubice: proprietary algorithms find relationships between keywords

   

Basis Technology: natural language analysis of multilingual text

   

ClearForest: tools for analysis and visualization of documents

   

Compare Suite: compares and highlights texts by keywords

   

Connexor: discovers grammatical and semantic information

   

Copernic Summarizer: summarizes text in many languages from various applications

   

Corpora: a natural language processing company

   

Crossminder: natural language processing and text analytics

   

Cypher: generates the RDF graph and SeRQL query from natural language input

   

DolphinSearch: text-reading robot 

   

dtSearch: for indexing, searching, and retrieving free-form text files

   

Eaagle: analyze large volumes of unstructured text and create reports

   

Enkata: providing a range of enterprise-level solutions for text analysis

   

Entrieva: categorizes and organizes unstructured text from virtually any source

   

Expert System: proprietary COGITO platform for the semantic comprehension

   

Files Search Assistant: quick and efficient search within text documents

   

Intellexer: document comparison and summarization software

   

ISYS: searches across multiple sources; on-the-fly HTML conversion

   

Leximancer: automatic concept maps of text data collections

   

Lextek: classifying, routing, and filtering text according to user defined profiles

   

Linguamatics: natural language processing search engine

   

Monarch: transform any report into a live database

   

NewsFeed Researcher: multi-document summarization tool, with RSS news feeds

   

Nstein: guides users to the most relevant information

   

Power Text Solutions: extensive capabilities for "free text" analysis

   

Readability Studio: offers tools for determining text readability levels

   

Recommind MindServer: uses probabilistic latent semantic analysis

   

SAS Text Miner: a suite of text processing and analysis tools

   

SPSS: extract key concepts, sentiments, and relationships from unstructured data

   

TEMIS: an information discovery solution for enterprises

   

TeSSI®: semantic indexing, semantic searching, coding and information extraction

   

Textalyser: online text analysis tool, providing detailed text statistics

   

TextOre: providing B2B analytic software and services

   

TextPipe Pro: text conversion, extraction and manipulation workbench

   

TextQuest: text analysis software

   

Tibo: for mining text, images, and numerical data

   

Readware: models queries, messages and expressions

   

Quenza: automatically extracts entities, cross references and builds databases

   

VantagePoint: graphical views to discover knowledge from text databases

   

VisualText™: a GUI development kit for building accurate text analyzers

   

Wordstat: for analysis of questions, interviews, surveys, etc.

   

Many commercial packages also offer free or limited trial versions.    
 
FREE:
   

GATE: a free open source framework and graphical development environment

   

LingPipe: a suite of Java libraries

   

Open Calais: an open-source toolkit for blogs, websites or applications

   

S-EM (Spy-EM): a text classification system that learns

   

The Semantic Indexing Project: a standalone indexer/search application

   

Vivisimo/Clusty: web search and text clustering engine

   
C H E C K L I S T:
 
Typical applications of text analytics for both marketers and enterprises include the following:
 
1.      Analyzing open-ended responses from market research surveys of a product or service. The idea is to permit respondents to express their "views" or opinions without constraining them to particular dimensions or a particular response format. This may yield insights into customers' views and opinions that might otherwise not be discovered.  A marketer or enterprise may discover that a certain cluster of words or terms is commonly used in association with a product or service.
 
2.      Automatic processing of instant messages, emails, blogs, etc.  Another common application for text analytics is to automate the classification of emails.  The automatic classification of email can be useful in applications where messages need to be routed to a specific department or agency.  This can be part of the overall web analytics strategy for enterprises and marketers.   
   
3.      The analysis of warranty or insurance claims, diagnostic interviews, trouble tickets, Q&A emails, web response forms and surveys, etc.  In some business domains, the majority of information is collected in open-ended, textual form. For example, customer interviews can be summarized in brief narratives, and in the servicing of automobiles or electronic products service personnel typically write notes about recurring problems. Increasingly, those notes are collected electronically, so these narratives are readily available for text analytics. This information can then be used to identify common complaints about certain products or services, which can lead to their improvement.
   
4.      The competitive intelligence analysis of rivals' web sites, by mapping all their web pages and links. The "crawling" of a competitor’s site could uncover links and derive a list of terms and documents to quickly determine competitive intelligence about the competitor's activities, intents, focus and strategies.
   
5.      A final and very important application is the clustering of customer profiles for enterprises and marketers via the analysis of all their text communications.
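
The clustering of customers by their text communications, the last application above, can be sketched with a greedy single-pass grouping based on Jaccard similarity between the sets of words each customer uses. This is a deliberately simple stand-in for the clustering algorithms found in commercial tools. The similarity threshold, the stopword list and the customer messages are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "on", "for", "that", "these", "have", "my", "of",
             "and", "to"}


def terms(text):
    """Return the set of lowercase content words in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(messages, threshold=0.2):
    """Greedy single pass: join the most similar cluster or start a new one."""
    clusters = []  # each entry: {"terms": set of words, "members": [ids]}
    for customer, text in messages.items():
        customer_terms = terms(text)
        best, best_similarity = None, 0.0
        for candidate in clusters:
            similarity = jaccard(customer_terms, candidate["terms"])
            if similarity > best_similarity:
                best, best_similarity = candidate, similarity
        if best is not None and best_similarity >= threshold:
            best["members"].append(customer)
            best["terms"] |= customer_terms
        else:
            clusters.append({"terms": set(customer_terms),
                             "members": [customer]})
    return [c["members"] for c in clusters]


# Hypothetical customer messages, invented for illustration only.
MESSAGES = {
    "ann": "Love the trail running shoes, great grip on muddy trails",
    "bob": "Need a stroller that folds small for travel",
    "cat": "These running shoes have great cushioning for trail runs",
    "dan": "Looking for a lightweight travel stroller for my toddler",
}

tribes = cluster(MESSAGES)
print(tribes)
```

Because the pass is greedy, the result depends on the order in which customers are processed and on the threshold, so the groupings should be treated as a starting point for review rather than a final segmentation.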
  
The grouping of consumers along the lines of similar interests has been available in a variety of forms for years. For TV, there are channels that only show programming on a certain topic; ESPN and CNN are prime examples.  Radio stations focus on and play only a specific genre of music. Today, when so much happens online, there are groups, forums, blogs, social networks, and more.  Using text analytics, enterprises and marketers are able to create and study clusters of consumer tribes.
    

Tribes enable an enterprise to build customer loyalty by providing customers a place where they belong, which leads to strategically narrow and focused channels of communication.  Tribes can promote word-of-mouth marketing: if certain segments of consumers like a product or service, they will share it with their friends and associates. Tribes can improve advertising, memberships, cross-selling, related products and additional revenue streams from these segmented consumer groupings.

   
Tribes allow consumers' behaviors and communications to define their own clusters through text and behavioral analytics. Tribes can review products and services, which can lead to improvements and revenue growth. Tribes enable the grouping of consumers along product and service lines. Tribes enable enterprises to differentiate themselves from competitors. Tribes of consumers can take ownership of themselves.  Finally, enterprises and marketers need to provide the functionality that allows consumers to create their own cluster tribes through analysis of their own words and behaviors.  They need to proactively use and leverage text and behavioral analytics to cluster concepts and consumer tribes.  Tribes do not consume a product or a service without engaging with it at an intimate level.
  
Tribes create cultures which absorb, change and improve products and services.  Tribes are held together among their networks of colleagues and friends by shared passions, not demographics; the binding glue is emotional, which is covered in detail in the Engagement Marketing section of this site.  Tribes can become micro marketers through their recommendations, suggestions and communications on social networking sites, which are also covered in detail in the Tell Your Friends section of this site.