|
It has become increasingly urgent for enterprises to leverage the massive amount of unstructured information that they accumulate around the clock. This is text information in the form of documents, notes, email, chat, web forms, voicemail, news stories, regulatory filings, repair records, field representative notes, invoices, blogs, etc. Traditional behavioral analytics can find what is going on, but the whys and wherefores remain buried in unstructured content, which can reveal the hidden story behind the data.
Text analytics generally includes such tasks as the categorization of taxonomies, the clustering of concepts, entity
and information extraction, sentiment analysis and summarization.
Why is text analytics important to enterprises and marketers? Because enterprises need to know the reason a product is failing so that they can fix the cause, not just respond to the impact of a recall. Because marketers need to know which messages or ads are producing the highest levels of response and sales. Because enterprises increasingly accumulate nearly 80% of their data in unstructured formats. And because a deep analysis of patents, blogs, reports, emails, surveys, orders and other documents takes too long to perform manually. Text analytics provides an automated way to organize the key concepts in this unstructured content. This involves analyzing large volumes of documents, email, chat and other text to discover previously unknown patterns and concepts. The information might contain hidden relationships or patterns that would otherwise be extremely difficult, if not impossible, to discover manually.
Text analytics can use information retrieval (IR) and information extraction
(IE) as well as natural language processing (NLP) techniques to organize and prioritize documents about any subject.
These text analytics techniques can be used individually or in combination by enterprises to gain new insight into the unstructured content held in their legacy and operational systems. In addition, text analytics tools can parse unstructured content into a structured format that is amenable to behavioral analytics.
For example, all of the emails that an enterprise accumulates each day can be organized into several groups, such as customers seeking service assistance or customers complaining about specific products or services. Aside from organizing key concepts, text analytics can also cluster tribes of consumers into key segments along service and product lines.
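The email grouping described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any commercial tool works: the category names and seed terms in `SEED_TERMS` are hypothetical, and a real system would learn such terms from data rather than hard-code them.

```python
import re
from collections import defaultdict

# Hypothetical seed terms for each group; chosen by hand for illustration only.
SEED_TERMS = {
    "service_request": {"help", "install", "reset", "setup", "password", "assist"},
    "product_complaint": {"broken", "defective", "refund", "complaint", "faulty"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def route(message):
    """Assign a message to the group whose seed terms it overlaps most."""
    tokens = set(tokenize(message))
    scores = {label: len(tokens & terms) for label, terms in SEED_TERMS.items()}
    best = max(scores, key=scores.get)
    # Messages that match no seed term at all fall into a catch-all group.
    return best if scores[best] > 0 else "other"

def group(messages):
    """Organize a batch of messages into groups keyed by label."""
    piles = defaultdict(list)
    for message in messages:
        piles[route(message)].append(message)
    return dict(piles)

emails = [
    "Please help me reset my password.",
    "The blender arrived broken and I want a refund.",
    "Thanks for the newsletter!",
]
piles = group(emails)
for label, members in piles.items():
    print(label, members)
```

Once each message carries a group label, it is structured data and can be counted, trended or joined to customer records like any other column.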
Information retrieval (IR) systems identify the text in a large collection of documents or web pages that matches a user’s query; this is the most basic level of text analytics. The most popular IR systems are today’s search engines such as Google and Yahoo, which identify keywords within pages and documents on the Web that are relevant to a given set of words. IR systems are often used in public and corporate libraries, where the documents are typically not the books themselves but digital records containing information about the books. IR systems allow humans to narrow down the set of documents that are relevant to a particular problem. Text analytics involves applying computationally intensive algorithms to large collections of text, and IR can speed up the analysis considerably by reducing the number of documents for human researchers to focus on.
However, the most useful text analytics systems for enterprises and marketers are the natural language processing (NLP) and information extraction (IE) technologies and tools. They provide the method by which behavioral analytics can be performed on the large share of information normally tossed out by enterprises or marketers. NLP is one of the oldest fields of artificial intelligence and one of the most difficult to execute. NLP is the analysis of human language so that computers can understand natural languages as humans do; it is a subset of computational linguistics. NLP software can convert unstructured human language into a more formal representation on which behavioral analytics is easier to perform. NLP can perform some types of analysis with a high degree of success, such as tagging, which classifies words into categories such as noun, verb or adjective.
NLP can perform disambiguation, which identifies the intended meaning of a word, given its usage, from the multiple meanings that the word may have. NLP can also perform parsing, the grammatical analysis of a sentence. Shallow parsers identify only the main grammatical elements in a sentence, such as noun phrases and verb phrases, whereas deep parsers generate a complete representation of the grammatical structure of a sentence. Companies that provide this type of NLP software include Crossminder and Basis Technology.
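Tagging and shallow parsing can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the tiny `LEXICON` stands in for the statistical models that real taggers use, and the noun phrase rule is far simpler than a production shallow parser.

```python
# Toy lexicon mapping words to part-of-speech tags (hypothetical, hand-built).
LEXICON = {
    "the": "DET", "a": "DET",
    "new": "ADJ", "faulty": "ADJ",
    "customer": "NOUN", "battery": "NOUN", "phone": "NOUN",
    "returned": "VERB", "drains": "VERB",
}

def tag(sentence):
    """Tag each word by dictionary lookup; unknown words get the tag 'UNK'."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]

def noun_phrases(tagged):
    """Shallow parse: keep runs of DET/ADJ/NOUN tokens that end in a NOUN."""
    phrases, current = [], []
    for word, pos in tagged:
        if pos in {"DET", "ADJ", "NOUN"}:
            current.append((word, pos))
            continue
        # Any other tag closes the current run.
        if current and current[-1][1] == "NOUN":
            phrases.append(" ".join(w for w, _ in current))
        current = []
    if current and current[-1][1] == "NOUN":
        phrases.append(" ".join(w for w, _ in current))
    return phrases

tagged = tag("The customer returned the faulty phone.")
chunks = noun_phrases(tagged)
print(tagged)
print(chunks)
```

A deep parser would go further and relate the two noun phrases to the verb as subject and object; the shallow version only finds the phrases themselves.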
Information extraction (IE) is the process of automatically obtaining structured data from unstructured content. Often this involves parsing the unstructured content into one or more structured templates, which are then used to guide the extraction process. IE systems rely heavily on the data generated by NLP systems. Tasks that IE systems can perform include term analysis, entity recognition and fact extraction. IE software can parse structured tables from documents, emails, etc., so that subsequent analyses can be performed.
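As a rough sketch of template-driven extraction, the Python example below fills a fixed set of fields from a free-text note using regular expressions. The field names, the patterns and the sample note are all hypothetical; commercial IE tools rely on NLP output rather than hand-written patterns like these.

```python
import re

# Hypothetical template: each field name maps to a pattern that locates it.
PATTERNS = {
    "order_id": re.compile(r"\border\s+#?(\d+)", re.IGNORECASE),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

def extract(note):
    """Fill the template from one note; fields that are not found stay None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match is None:
            record[field] = None
        else:
            # Use the capture group when the pattern has one, else the whole match.
            record[field] = match.group(1) if pattern.groups else match.group(0)
    return record

note = "On 2009-03-14 jane.doe@example.com reported that order #48213 arrived damaged."
record = extract(note)
print(record)
```

Each record has the same keys, so a batch of notes becomes rows in a table that can be loaded into a database.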
Companies that provide this type of IE software include Attensity and Megaputer. By using NLP and IE software, enterprises and marketers can convert the vast silos of unstructured content they collect from all channels into a structured format (database or tables) on which behavioral analytics can be performed to discover previously unknown knowledge about visitors, consumers and customers and their relation to products and services. Text analytics can be used to convert unstructured content into predictive models for digital targeting.

H I S T O R Y: Computational linguistics as a field predates artificial intelligence, a
field under which it is often grouped. Computational linguistics originated with efforts in the United States in the 1950s
to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English.
Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be
only a short matter of time before the technical details could be taken care of that would allow them the same remarkable
capacity to process language. When
machine translation (also known as mechanical translation) failed to yield accurate translations right away, automated processing
of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born
as the name of the new field of study devoted to developing algorithms and software for intelligently processing language
data. When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division
of artificial intelligence dealing with human-level comprehension and production of natural languages. In order to translate one language into another, it was observed that one had to
understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of
sentence structure). In order to understand syntax, one had to also understand the semantics and the lexicon (or 'vocabulary'),
and even to understand something of the pragmatics of language use. Thus, what started as an effort to translate between languages
evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.
__________________________________________________________________________
Text analytics tools enable enterprises to build early warning systems that include text-based information to better assess situations and trigger organizational responses. Text mining analyses can identify trends, patterns and complex inter-document relationships within large numbers of documents and customer communications. Enterprises have spent millions of dollars capturing, storing and maintaining customer data. This includes a wide array of information, from what customers have bought and how they use services to feedback captured through surveys and emails. This data is used to determine many things, including what products a company should sell, what to cross-sell to an existing customer, who might be at risk of attrition and what kinds of programs to implement to engender customer loyalty.
Much of this data contains important customer information and feedback that can guide enterprises toward better business decisions, focus marketing programs and drive product development. Unstructured data from customers and prospects comes in feedback surveys, emails, web forms, comment boxes, customer service or technician notes, claim forms, blogs and other text not found in databases or tables. Failure to leverage unstructured content has consequences for enterprises and marketers. It can lead to customer dissatisfaction, and then attrition, when feedback is not heard or acted upon. It can leave serious product issues unaddressed, which can lead to liability or safety issues that are not identified or rectified. It can also tarnish a brand when negative opinion and word of mouth go undetected.

[Figure: How Attensity can capture customer feedback to reduce attrition]
Text analytics can lead to a better understanding of customer sentiment; it allows enterprises to understand how customers feel about their company, products, services, content and more. Text analytics tools can generate actionable satisfaction scores from unstructured data to identify and score customers who are at risk. Conversely, these tools can also find those customers who are champions of products or services on social networks.
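One simple way to picture a satisfaction score is a word-list count, sketched below in Python. The word lists, the scoring formula and the 0.5 threshold are all assumptions chosen for illustration; they are not how Attensity or any other vendor computes its scores.

```python
import re

# Hypothetical sentiment lexicons; real tools use much larger, tuned resources.
POSITIVE = {"love", "great", "excellent", "recommend", "happy"}
NEGATIVE = {"cancel", "terrible", "broken", "refund", "disappointed"}

def satisfaction_score(text):
    """Return (pos - neg) / (pos + neg), or 0.0 when no lexicon word appears."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def label(score, threshold=0.5):
    """Map a score in [-1, 1] to a coarse customer segment."""
    if score <= -threshold:
        return "at_risk"
    if score >= threshold:
        return "champion"
    return "neutral"

feedback = [
    "I love this phone and would recommend it, great battery.",
    "Terrible service, I want a refund and will cancel.",
    "The package arrived on Tuesday.",
]
for text in feedback:
    score = satisfaction_score(text)
    print(label(score), score, text)
```

The numeric score is what makes the feedback usable downstream: it can sit next to purchase history in a churn model as just another input variable.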
Unstructured content from surveys, emails, online forms and blogs can be analyzed with these text analytics tools to identify customers who are at risk of attrition (churn) and who are acting as detractors. Information extraction tools can assign churn scores based on unstructured data and populate predictive models constructed via behavioral analytics. Text analytics tools can also be used to improve the quality of products in order to reduce returns. They can assist enterprises in identifying what is wrong with their products and services.
The National Centre for Text Mining (NaCTeM) offers text mining services to researchers that enable semantic searching of text and the discovery of new knowledge. NaCTeM has links and articles on text mining, including demonstrations of text analytics tools such as TerMine. TerMine is a quick way for a reader to pick out articles of potential interest from a large body of text. The software decomposes documents, reducing the ambiguity that may cause irrelevant information to be retrieved (low precision) and relevant information to be overlooked (low recall). TerMine has also been used to build controlled vocabularies and ontologies, that is, collections
of words and phrases common to a subject area, by extracting candidate terms from a body of text.
Text analytics incorporates different algorithms and models, including the following:
Bayesian Models – Consider all possible parameter values and include a penalty for too much model structure, which guards against overfitting.
Concept Decomposition – A procedure for text retrieval based on a sparsified matrix, which can enhance accuracy compared with the technique based on latent semantic indexing with singular value decomposition.
Orthogonal Decomposition – A technique used in exploratory data analysis and for making predictive models; it involves decomposing a data matrix after centering the data for each attribute.
Probabilistic Models – Embrace the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities, which can be used to predict link structure.
Vector Space Models – An algebraic model for representing text documents as vectors of identifiers, such as index terms; used in information filtering, information retrieval, indexing and relevancy rankings.
Latent Semantic Indexing – A technique in natural language processing, in particular in vectorial semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
Graph-based Models – Usually described by means of nodes and edges, roughly corresponding to places and their spatial relations.
Text Streaming Models – Automated unsupervised learning of latent topics from text documents for document organization, retrieval and filtering of information.
These models support the following text analytics applications:
Clustering – Technique for unsupervised document organization, automatic topic extraction and fast information retrieval or filtering.
Factor Analysis – A data reduction method.
Visualization – Graphical depictions of relationships between sets of concepts, permitting the end user to identify previously unrecognized or unknown relationships.
Metadata Generation – The use of a self-organizing map (SOM) algorithm to cluster training web pages in order to discover semantic descriptions of those pages.
Information Extraction – Technique for locating specific pieces of data in natural-language documents.
Text Classification – There are two sorts: supervised document classification, where human feedback provides information on the correct classification for documents, and unsupervised document classification, where the classification is made without reference to external information.
Text Segmentation – The process of dividing written text into meaningful units, such as sentences or topics.
Text Summarization – Technique based on statistical, linguistic and heuristic methods that calculates how often certain key words appear in a document, which sentences they appear in, and where these sentences are in the text.
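The frequency-based idea behind text summarization, the last item above, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the stopword list is a small hand-picked set, sentences are scored by the average document-wide frequency of their words, and sentence position (which the description above also mentions) is ignored.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "was", "on", "for"}

def content_words(sentence):
    """Lowercase word tokens of a sentence, with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]

text = (
    "The battery drains quickly. "
    "Customers report the battery overheats and the battery drains overnight. "
    "Shipping was fast."
)
print(summarize(text))
```

Because "battery" and "drains" recur across the text, the sentences that contain them score highest, which is the behavior the description above relies on.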

[Figure: The self-organized clustering of words related to smoking]
Some text mining tools can create links with structured databases for a global view and a behavioral analytics model. For more information on text analytics technologies go to textanalysis.info or tlab.it.
The following is a list of text analysis, text mining, and information extraction and retrieval software for the enterprise and marketer:
ActivePoint: offers natural language processing for contextual search
AeroText: enterprise scalable data extraction tool suite
Arrowsmith: software for discovery from complementary literatures
Attensity: extract "who", "what", "where", "when" and "why" facts
Aubice: proprietary algorithms find relationships between keywords
Basis Technology: natural language analysis of multilingual text
ClearForest: tools for analysis and visualization of documents
Compare Suite: compares and highlights texts by keywords
Connexor: discovers grammatical and semantic information
Copernic Summarizer: from many languages from various applications
Corpora: a natural language processing company
Crossminder: natural language processing and text analytics
Cypher: generates the RDF graph and SeRQL query from natural language input
DolphinSearch: text-reading robot
dtSearch: for indexing, searching, and retrieving free-form text files
Eaagle: analyze large volumes of unstructured text and create reports
Enkata: providing a range of enterprise-level solutions for text analysis
Entrieva: categorizes and organizes unstructured text from virtually any source
Expert System: proprietary COGITO platform for the semantic comprehension
Files Search Assistant: quick and efficient search within text documents
Intellexer: document comparison and summarization software
ISYS: searches across multiple sources; on-the-fly HTML conversion
Leximancer: automatic concept maps of text data collections
Lextek: classifying, routing, and filtering text according to user defined profiles
Linguamatics: natural language processing search engine
Monarch: transform any report into a live database
NewsFeed Researcher: multi-document summarization tool, with RSS news feeds
Nstein: guides users to the most relevant information
Power Text Solutions: extensive capabilities for "free text" analysis
Readability Studio: offers tools for determining text readability levels
Recommind MindServer: uses probabilistic latent semantic analysis
SAS Text Miner: a suite of text processing and analysis tools
SPSS: extract key concepts, sentiments, and relationships from unstructured data
TEMIS: an information discovery solution for enterprises
TeSSI®: semantic indexing, semantic searching, coding and information extraction
Textalyser: online text analysis tool, providing detailed text statistics
TextOre: providing B2B analytic software and services
TextPipe Pro: text conversion, extraction and manipulation workbench
TextQuest: text analysis software
Tibo: for mining text, images, and numerical data
Readware: models queries, messages and expressions
Quenza: automatically extracts entities, cross references and builds databases
VantagePoint: graphical views to discover knowledge from text databases
VisualText™: a GUI development kit for building accurate text analyzers
Wordstat: for analysis of questions, interviews, surveys, etc.
Many commercial packages also offer free or limited trial versions.
FREE:
GATE: a free open source framework and graphical development environment
LingPipe: is a suite of Java libraries
Open Calais: an open-source toolkit for blogs, websites or applications
S-EM (Spy-EM): a text classification system that learns
The Semantic Indexing Project: a standalone indexer/search application
Vivisimo/Clusty: web search and text clustering engine
C H E C K L I S T: Here are some typical applications of text analytics for both marketers and enterprises:
1. Analyzing open-ended responses from market research surveys about a product or service. The idea is to permit respondents to express their views or opinions without constraining them to particular dimensions or a particular response format. This may yield insights into customers' views and opinions that might otherwise not be discovered. A marketer or enterprise may discover that a certain cluster of words or terms is commonly used in association with a product or service.
2. Automatic processing of instant messages, emails, blogs, etc. Another common application of text analytics is the automatic classification of emails, which is useful where messages need to be routed to a specific department or agency. This can be part of the overall web analytics strategy for enterprises and marketers.
3. The analysis of warranty or insurance claims, diagnostic interviews, trouble tickets, Q&A emails, web response forms, surveys, etc. In some business domains, the majority of information is collected in open-ended, textual form. For example, customer interviews can be summarized in brief narratives, as in the servicing of automobiles or electronic products, where service personnel typically write up notes about recurring problems. Increasingly, those notes are collected electronically, so these narratives are readily available for text analytics. This information can then be used to identify common complaints about certain products or services, which can lead to their improvement.
4. The competitive intelligence analysis of rivals' web sites by mapping all their web pages and links. Crawling a competitor’s site can uncover links and derive a list of terms and documents to quickly build competitive intelligence about the competitor's activities, intents, focus and strategies.
5. A final and very important application is the clustering of customer profiles for enterprises and marketers via the analysis of all their text communications. The grouping of consumers along the lines of similar interests has been available in a variety of forms for years. On TV, there are channels that show programming only on a certain topic; ESPN and CNN are prime examples. Radio stations focus on and play only a specific genre of music. Today, with so much happening online, there are groups, forums, blogs, social networks and more. Using text analytics, enterprises and marketers are able to create and study clusters of consumer tribes. Tribes enable an enterprise to build customer loyalty by giving customers a place where they belong, which leads to strategically narrow and focused channels of communication. Tribes can promote word-of-mouth marketing: if certain segments of consumers like a product or service, they will share it with their friends and associates. Tribes can improve advertising, memberships, cross-selling, related products and additional revenue streams from these segmented consumer groupings. Tribes allow consumers' behaviors and communications to define their own clusters through text and behavioral analytics. Tribes can review products and services, which can lead to improvements and revenue growth. Tribes enable the grouping of consumers along product and service lines. Tribes enable enterprises to differentiate themselves from competitors.
Tribes of consumers can take ownership of their own groups. Finally, enterprises and marketers need to provide the functionality that allows consumers to create their own cluster tribes through analysis of their own words and behaviors. They need to proactively use text and behavioral analytics to cluster concepts and consumer tribes. Tribes do not consume a product or a service without engaging with it at an intimate level. Tribes create cultures which absorb, change and improve products and services. Tribes are held together by shared passions, not demographics, among their networks of colleagues and friends; the binding glue is emotional, which is covered in detail in the Engagement Marketing section of this site. Tribes can become micro-marketers through their recommendations, suggestions and communications on social networking sites, which are also covered in detail in the Tell Your Friends section of this site.
|