Interaction Analytics using Text Mining to optimize customer support operations
Raghav Garg
National University of Singapore
Abstract
The customer support division is the foundation for ensuring a satisfied customer base, and it is one area where data analytics can have a direct, visible and easily measurable impact. Customer support interaction analytics helps a business use unstructured call transcript data and advanced text mining techniques to identify the reasons for interactions between customers and the bank. Results from topic modelling are more reliable than the current manual classification of inbound calls. These results establish the foundation for a customer experience framework and are used for both predictive and prescriptive analytics. Customers with a high propensity to call for specific reasons are predicted, and an optimal customer-centric solution is then provided through a mix of channels comprising agent chat, email and chatbot.

1. Introduction
In today’s global marketplace, banks are investing in customer support center models to assist customers and enhance customer experience, in addition to scaling business through cross-selling and up-selling opportunities. At the same time, supporting customers through different channels is a costly endeavor, with the customer call center being the costliest channel. The key to success is striking the right balance between customer care and resources, which eventually improves overall efficiency. The primary task is to understand how customers interact with the contact center and which kinds of issues can be resolved through digital channels such as mobile applications, internet banking, chatbots and email. This is where call center analytics comes into the picture: it combines the variety of channels that banks can use to serve customers while maintaining performance. Four of the most common advanced analytics approaches that businesses leverage to improve call center performance are (https://www.salesforce.com/hub/service/call-center-analytics):
Speech Analytics can reveal opportunities to streamline processes, eliminate unnecessary steps, identify emotions in real time and proactively suggest solutions to the agent, increasing the agent’s efficiency. It primarily focuses on the pitch and modulation of the customer’s voice to gauge the customer’s emotions.

Text Analytics leverages the ever-increasing text data from sources such as social media, emails, chatbots, live chat with agents, audio call transcripts and discussion forums. It can help identify patterns and insights in text, for example to detect the primary intent and sub-intents within a message, perform root cause analysis by identifying the common problems faced by customers, or design marketing mailers appropriately. This can form the foundation for both predictive and prescriptive analytics.
A predictive analytics engine can help banks plan proactively and take decisions for the future. Based on the historical performance of contact centers, banks can forecast call volume during certain seasons and use this information to staff extra resources. Similarly, based on a customer’s past calling behavior, banks can predict the customer’s propensity to call and identify the likely reasons for contact. This information can then be used to provide a proactive solution and improve the customer’s experience with the bank. Banks can also leverage metrics such as First Call Resolution (FCR), which primarily indicates which customers did not have their queries resolved during their first contact with the organization. This can lead organizations to use analytics to reduce repeat calls at the customer level, and to identify low-performing agents so that they can receive relevant training and resolve queries effectively.

Self-Serve Channels are the focus of every forward-thinking bank. Banks are taking a customer-centric approach, providing all relevant information on digital platforms and building more personalized chatbots. This can drastically improve call center performance and reduce the incoming call load. For example, banks have developed easy-to-use services to block cards, update personal information, request a fee waiver, etc.
Impact of Analytics on Customer Experience
Customers have multiple ways to contact a bank, ranging from email, chat, chatbots and digital services to the most common one, a call to the customer support center. The call center also turns out to be the most expensive mode of customer support for banks to maintain, as it carries infrastructure costs and agent costs, to name a few. In addition, the most common complaints banks receive from customers concern the lack of prompt resolution by customer care agents, long durations on hold, multiple transfers within a call and, most importantly, the lack of a satisfactory resolution. In a world of social media, these experiences may be voiced to friends and others and can have a huge impact on the perception of the bank’s service and support.
This is one of the primary reasons banks are moving towards other channels of communication to resolve customer queries and improve the end-to-end customer support experience. Banks are leveraging advanced analytics to turn these drawbacks and complaints into opportunities to improve customer experience. They closely monitor incoming call volumes during different seasons of the year, as the reasons for call interactions may vary over time. The first and most important focus area for banks is root cause analysis, which helps the business understand the primary reasons for which customers call. Customer care agents label each incoming call with a specific category. Because this involves manual labeling of inbound calls, and agents are always flooded with a huge volume of calls to handle, the accuracy of these labels suffers. The labels may also not be actionable, that is, not something that can be transformed into a digital service for quick and prompt resolution. A number of factors lead to the low reliability of customer care agents’ labels, and hence banks are looking for ways to use machine learning methods to classify calls and identify root cause reasons.

Customers generate abundant data at every point as they look for ways to get the required information from the bank. They go to the website to look for relevant information or a solution to their query, which generates clickstream data capturing all the pages and places customers visit on the bank’s website. This data can be combined into a path trail analysis that captures all the pages a customer visits in a session. It can be used to evaluate the performance of pages by checking whether customers find what they are looking for on certain pages or end up calling the bank. Customers also have the option to chat with agents, who respond to their queries in real time; this is also an expensive mode of support. It generates a large amount of text data, which can be used to understand what issues were discussed in the chat and what solution the agent provided. Using advanced machine learning and deep learning models, data generated from these conversations can be used to suggest quick solutions to agents, saving them the time they would otherwise spend looking up information for customers. Customers also reach out by email, which has a high response time, and hence customers are moving away from this mode of communication. The most commonly used method is calling the bank’s support center. This generates audio data, which can be used to understand the customer’s emotion and sentiment during the call. Emotion and sentiment-based analysis can be used to understand customer behavior, and based on the outcome the agent might prefer to transfer the customer to tier II agents for quicker and better support. In some cases banks also have infrastructure that converts these incoming calls to transcript data, i.e. audio calls are converted into a text conversation between customer and agent.
This huge amount of text data can be used for text analytics and natural language processing, and can provide abundant insights into customers’ reasons for calling and how agents resolve their queries. Banks are still making progress with advances in artificial intelligence towards chatbots that can resolve customer queries without any manual intervention. Of all these modes of communication, operating customer care support centers is the most expensive, and the focus is on reducing incoming calls and migrating customers to other digital platforms.

Figure 1: Customer Support Journey and Role of Analytics (diagram labels: Customer, Bank, Machine Learning, Modes of communication)
Figure 1 shows how a customer can reach a bank through different modes of communication and how machine learning can be applied to the data generated through these modes, with the objective of creating an optimal mix of channels to support customers and reduce operational overhead costs. By analyzing the content and volume of contacts, banks can redirect customers to specific channels and maintain optimal staffing at different periods of time. This can improve customer experience and also help migrate customers to digital platforms.

When data generated from the above-mentioned modes of communication is combined with other metadata, such as customer demographics, transaction history and customer status on different products, interaction analytics can deliver valuable insights that enhance customer experience. This supports multiple use cases, for example improving first call resolution rates, reducing the number of repeat calls by a customer for similar reasons, and reducing average handling time by giving agents quick access to information.

Banks are investing a huge amount of resources to improve call center performance, as managing a call center is a costly affair. With a growing number of digital channels and ways to reach customers, banks are betting on these channels to resolve customers’ concerns and aim to establish them as the primary point of contact.
Interaction analytics can produce a high return on investment by providing insights into the customer journey, improving agent productivity and enabling other customer-centric contact points apart from calls. Each organization may value these benefits differently, but one thing that is consistent across all businesses is the positive impact on customer experience.
2. Problem Statement
Customer support call centers are dynamic in nature and have evolved over time as new trends and business models have become mainstream. Virtual call centers and Interactive Voice Response (IVR) have dramatically changed customer support operations in the last decade. With the continuous evolution of methods to serve and support a growing customer base, banks are finding it hard to assess and quantify the effectiveness of these solutions.

One thing that has remained consistent through all these years is the reliance on customer support teams to serve customer needs and establish strong, positive customer relationships, but these teams also account for a major cost to banks. The most prominent challenge faced by customer support centers is high overhead cost and how to minimize it by improving efficiency and performance. Customers reach out to the bank for a myriad of reasons, ranging from very simple ones such as checking the available balance, checking the due date for a bill payment or requesting a card replacement, to more complex ones where the customer needs to register a security token device and understand how to use it to update personal information or make an overseas fund transfer. In addition, banks have a huge customer base with differing demographic and psychographic characteristics. Customer behavior and reasons for contact can vary across segments, which can be defined by age, income, location, spending behavior, number of products, tenure with the bank, historical contact pattern, etc. The bank can use these characteristics and data sources for predictive analytics in various applications.
With a huge number of inbound calls every year, banks are looking for ways to reduce inbound calls and to serve customers through upcoming digital platforms. Reducing the number of incoming calls and migrating these customers to digital platforms has a direct impact on reducing overhead costs and eventually increasing revenue. The bank plans to use the unstructured data generated from incoming call audio. All incoming customer calls over a period of one year are converted to text transcripts, which are used for natural language processing to perform root cause analysis and understand the reasons for which customers contact the support team. This provides a two-fold solution for the business owners. First, identify the reasons that are predictive in nature and then build a predictive model to predict them. Second, identify the reasons or areas that can be addressed by improving existing services. For example, many customers might go to the FAQs page but cannot find the information they are looking for. If many people are calling for similar reasons, the FAQs page can be redesigned into a customer-centric, personalized FAQs page.

Root Cause Analysis
Call transcript data captures the conversation between agent and customer during the call. Other fields in the data capture information such as agent name, agent id, number of holds during the call, number of transfers during the call, timestamp, call duration, total agent talk time, total customer talk time, etc. The variable to focus on is the text transcript, which captures the customer and agent conversation in the order they speak during the call. Although data quality is a big challenge, the transcript still provides details about the reason the customer called and what actions the agent took to resolve the issue or concern.
In addition, the bank captures the log of the agent’s desktop, which records vital information about the actions the agent took or the screens the agent visited to provide the required information to the customer. In the raw data, every incoming call has multiple rows capturing the agent’s actions, and these need to be pre-processed and transformed into a single path in order to analyze and identify the primary action taken by the agent.
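The transformation of per-action log rows into one path per call can be sketched as follows. This is only an illustration under stated assumptions: the row layout (call id, timestamp, screen name), the sample values and the function name build_action_paths are hypothetical and are not taken from the bank's data.

```python
from collections import defaultdict

# Hypothetical raw log rows: (call_id, timestamp, screen visited by the agent).
raw_log = [
    ("call-1", "10:00:05", "verify_customer"),
    ("call-2", "10:01:10", "verify_customer"),
    ("call-1", "10:00:40", "card_details"),
    ("call-1", "10:01:15", "block_card"),
    ("call-2", "10:02:00", "account_balance"),
]


def build_action_paths(rows):
    """Collapse the per-action rows of each call into one ordered path."""
    by_call = defaultdict(list)
    for call_id, timestamp, action in rows:
        by_call[call_id].append((timestamp, action))

    paths = {}
    for call_id, events in by_call.items():
        events.sort()  # order the actions by timestamp within the call
        paths[call_id] = " > ".join(action for _, action in events)
    return paths


paths = build_action_paths(raw_log)
print(paths["call-1"])
```

Each call then becomes a single string such as a sequence of screens, which can be counted or grouped like any other categorical variable.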

For every incoming call to the customer support center, agents assign a tag or label with the aim of capturing the intent for which the customer called. This label should classify the primary reason for the call and the context in which the customer needed information. Initial exploratory analysis was performed to understand the distribution of call reasons, based on which further strategy can be devised. There are over 100 unique reasons captured in the data. A simple univariate graphical analysis established that the distribution of call reasons is very skewed, with ‘Account Transaction’ accounting for 12% of total call volume and the top 10 reasons accounting for ~55% of calls. Further, reading the transcripts across different call reasons showed that agent-based labels cannot be fully relied upon as a source of truth and that much of the data is mislabeled. Below is a summary of the distribution of call labels assigned by agents.
Figure 2: Agent – Call Label Distribution for categories with more than 1% calls*

*All figures cited are fictional or are blocked for confidentiality reasons. All figures cited are for illustrative purposes only.

Figure 2 captures the agent call label distribution for the top 39 categories. Each of these 39 labels accounts for more than 0.75% of total calls; the remaining labels each account for less than 0.75%. This shows that the agent-assigned labels are highly skewed. This has two major drawbacks for the objective of reducing the number of incoming calls. First, if a predictive model is built to predict the reason for which a customer may contact the bank, in most cases the predicted reason would be ‘Account Transaction’, which is a very generic and broad term in the banking industry. Such reasons can cover multiple sub-intents which are currently not defined, and hence any modelling exercise done on the current raw data would not be useful. Second, in many calls customers talk about more than one reason, but the current format captures only a single intent. So, to provide a holistic solution, the business needs to understand the different reasons for which customers contact the bank.
The next step was to understand the reason behind this distribution and whether it is a true representation. One way to do this was to manually browse through text transcripts for calls with different labels. Initial exploration showed that many calls have been mislabeled. Hence, these labels cannot be fully relied upon to understand why customers are calling the bank, and cannot be the foundation for the exercise of improving the bank’s services. For example, ‘Account Transaction’ is a very broad and generic topic that covers multiple sub-classifications of calls, ranging from customer enquiries about outstanding payable bills and billing cycles to confirming fund transfers, overseas transactions, etc. It is therefore important to be able to identify these sub-topics before developing supervised machine learning models. Otherwise, even an advanced model that predicts customer propensity to call and the most likely reason for contact would predict ‘Account Transaction’ as the most probable reason for the majority of the population base. This gives no actionable insight to customer support agents, whereas more specific predictions would be useful and actionable.
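The kind of univariate check behind this distribution can be sketched in a few lines. The label values and counts below are invented purely for illustration, and the helper names label_shares and top_k_share are hypothetical.

```python
from collections import Counter

# Hypothetical agent labels, one per call (invented counts for illustration).
labels = (
    ["Account Transaction"] * 12
    + ["Card Replacement"] * 5
    + ["Fee Waiver"] * 2
    + ["Address Update"] * 1
)


def label_shares(call_labels):
    """Return (label, share of calls) pairs, most frequent label first."""
    counts = Counter(call_labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


def top_k_share(shares, k):
    """Share of all calls covered by the k most frequent labels."""
    return sum(share for _, share in shares[:k])


shares = label_shares(labels)
for label, share in shares:
    print(f"{label}: {share:.1%}")
print(f"Top 2 labels cover {top_k_share(shares, 2):.1%} of calls")
```

A heavily concentrated top of this list is the skew described above: a model trained on such labels would mostly learn to predict the dominant, generic label.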

The bank wants to leverage data science capability and use natural language processing on call transcript data, along with other metadata, to classify calls into a set of clusters, each signifying one very specific, actionable reason. These predictions are then combined with the agent log activity and actions during the call to come up with actionable solutions for each call reason cluster.

3. State of the Art
With the emergence of Web 2.0 and social media, the amount of unstructured, textual data on the Internet has grown tremendously, especially at the micro level (Gopal, Marsden, & Vanthienen, 2011). As of 2017, Gartner research forecast data volume to grow 800% in the next five years, of which 80% will be unstructured data such as text and images, which users are generating every second. This exponentially increasing public data is creating abundant opportunities and widening horizons to extract insights for both qualitative and quantitative purposes.

Traditionally, researchers used qualitative methods such as manual coding to analyze natural language data. With available data increasing exponentially, this approach is no longer feasible for individual researchers or even teams of researchers. In addition, one cannot deny the bias introduced by the subjective interpretation of different researchers (Indulska, Hovorka, & Recker, 2012). One of the most essential and tedious tasks in working with text data is categorizing text into clusters or groups and assigning it to one or more categories. There are myriad ways to categorize text, but each has its own set of assumptions and limitations. Conventionally, researchers have manually coded text to categorize it (Berg & Lune, 2011). Bottom-up and top-down approaches are the most basic of the coding methodologies developed over time. In the bottom-up approach, the text itself suggests the words and phrases, and the researcher should analyze the data without any preconceived notions or bias towards the subject being analyzed. In the top-down approach, by contrast, researchers use a predefined coding scheme derived from the literature and assign text to these classes (Urquhart, 2012).

Manual coding has many strengths, the most important being the ability to understand the meaning of complex natural language containing negation or even sarcasm. At the same time, manual coding methods are prone to subjectivity and do not scale to new data sets or to similar problems in different industries. To overcome these limitations, researchers have developed algorithms and leveraged abundant computing power to analyze text, ranging from dictionary-based bag-of-words methods to machine learning and deep learning algorithms.

In the dictionary-based approach, researchers compile a list of words and phrases relevant to a specific problem. Large amounts of text data are then analyzed using these dictionaries. It is one way to automate top-down manual coding. This approach is widely used in sentiment analysis, where lists of positive and negative words are used to classify text into positive, neutral or negative categories. Text data (for example, Amazon reviews) is parsed and, based on the lists of positive and negative words in the dictionary, a normalized score is calculated for each review. Popular tools such as SentiStrength (Thelwall, Buckley, Paltoglou, Cai, & Kappas, 2010) use dictionary-based text categorization methods.
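A minimal sketch of dictionary-based scoring is given below. The word lists, the normalization by token count and the zero threshold are assumptions chosen for illustration; this is not the SentiStrength algorithm, and it does not handle negation or sarcasm.

```python
import re

# Tiny illustrative word lists; real dictionaries are far larger.
POSITIVE = {"good", "great", "helpful", "quick"}
NEGATIVE = {"bad", "slow", "rude", "poor"}


def sentiment_score(text):
    """Normalized score in [-1, 1]: (positive hits - negative hits) / tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return (positive_hits - negative_hits) / len(tokens)


def sentiment_label(text, threshold=0.0):
    """Map the normalized score to positive, negative or neutral."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for review in [
    "The agent was helpful and quick",
    "The service was slow and rude",
    "I called the bank",
]:
    print(review, "->", sentiment_label(review))
```

Because the score depends only on dictionary hits, the quality of the result is bounded by the coverage of the word lists for the domain at hand.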

In the machine learning approach, researchers leverage supervised or semi-supervised learning methods. Here researchers have established predefined categories but have not established a relationship between words or phrases and those categories (Quinn et al., 2010). Using a set of manually classified documents as the training set, one can apply supervised modelling algorithms to detect the linear or non-linear relationship between patterns of words and category assignments. The trained model is then used to predict on unseen future data. Email classification is a classic example of supervised machine learning: Google classifies incoming emails into primary, social, promotions and spam categories based on patterns within the text.
There are numerous cases where no predefined categorization exists and, owing to the abundance of data, it is not feasible to manually label the data for supervised learning. In these scenarios unsupervised machine learning methods are used to categorize text and find hidden patterns in the data (Quinn et al., 2010). Unsupervised learning methods such as clustering, principal component analysis and Latent Dirichlet Allocation (LDA) use features within the text data to discover latent categories and assign text documents to them. This approach resembles the manual bottom-up coding method (Berente & Seidel, 2014). Unsupervised methods have been gaining popularity because they require minimal intervention from humans or subject matter experts, can reproduce results on future data sets and are free from human bias. The main downside of unsupervised methods is the difficulty of quantifying the quality of the results: a large amount of post-analysis time is needed to ensure that the results are reliable and make sense.

Researchers have made significant progress in the field of information retrieval. The primary methodology they propose reduces each text document in the corpus to a numerical representation. The popular ‘Term Frequency – Inverse Document Frequency’ (TF-IDF) methodology computes, for every document, a weight built from two components. Term frequency is a count of the occurrences of each word in the document. Inverse document frequency accounts for the number of occurrences of a word in the entire set of documents. Finally, the product of the term frequency and inverse document frequency components is taken, resulting in a TF-IDF matrix for the entire data set. This reduces the text data to a numerical representation of words that considers the occurrences of a word in a single document while also measuring the exclusivity of the word across the entire corpus. Although TF-IDF captures the exclusivity of words and gives the relative importance of words in a document compared with the full corpus, it is not able to capture inter- and intra-document statistical structure or contextual meaning within the text.
To overcome these shortcomings, researchers have proposed different dimensionality reduction techniques. Clustering is a widely used unsupervised technique that groups the set of documents into clusters, with each cluster uniquely identified by a set of attributes extracted from the available data features. There are multiple ways to approach unsupervised learning. One is to represent documents as bags of words and apply standard clustering techniques. Another is based on matrix approximation, i.e. singular value decomposition of the document-word count matrix. This approach is referred to as latent semantic indexing (LSI).
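The TF-IDF computation described above can be sketched as follows. The sketch assumes one common variant, raw term counts multiplied by log(N / df); libraries often add smoothing or normalization. The three toy documents and the function name tf_idf are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = raw term count in the document * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


docs = [
    "card lost block card",
    "card fee waiver",
    "fund transfer overseas",
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

In the first document, 'card' occurs twice but also appears in another document, so its weight ends up lower than that of 'lost', which occurs once but in no other document. This is the exclusivity effect the paragraph describes.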

‘The primary ideas in a statistical topic model are based on probabilistic model for each document in collection. A topic is a multinomial probability distribution over the V unique words in the corpus. Each topic t is a probability vector, $p(w \mid t) = [p(w_1 \mid t), \ldots, p(w_V \mid t)]$, where $\sum_{v} p(w_v \mid t) = 1$ and there are T topics in total, $1 \le t \le T$.

A document is characterized as finite mixture of T topics. Each document d, $1 \le d \le N$, has its own set of mixture coefficients, $[p(t = 1 \mid d), \ldots, p(t = T \mid d)]$, a multinomial probability vector such as $\sum_{t} p(t \mid d) = 1$. Finally, a random word selected from document d will have the conditional distribution $p(w \mid d)$ that is the mixture over topics, where each topic is multinomial over words:
$$p(w \mid d) = \sum_{t=1}^{T} p(w \mid t)\, p(t \mid d)$$
If W words are to be simulated for document d using above mentioned model, following set of operations would be repeated W times. First, a topic t is sampled according to distribution $p(t \mid d)$ and then sample a word w according to distribution $p(w \mid t)$. Based on this generative model for given data, second step is to establish topic-word and document-topic distributions given observed data.’ (David, Chaitanya, Padhraic, Mark, 2006).
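The mixture formula and the two-step simulation in the passage quoted above can be illustrated with a toy model. The vocabulary, the two topic-word distributions and the mixture coefficients below are invented for illustration, and the function names are hypothetical.

```python
import random

# Hypothetical toy model: T = 2 topics over a V = 4 word vocabulary.
vocab = ["card", "block", "transfer", "overseas"]
p_w_given_t = [
    [0.5, 0.4, 0.05, 0.05],  # topic 0: card issues
    [0.05, 0.05, 0.5, 0.4],  # topic 1: fund transfers
]
p_t_given_d = [0.7, 0.3]  # mixture coefficients of one document d


def word_distribution(topic_word, doc_topic):
    """p(w|d) = sum over topics t of p(w|t) * p(t|d)."""
    n_words = len(topic_word[0])
    return [
        sum(topic_word[t][w] * doc_topic[t] for t in range(len(doc_topic)))
        for w in range(n_words)
    ]


def simulate_document(n_words, rng):
    """Repeat n_words times: sample a topic from p(t|d), then a word from p(w|t)."""
    words = []
    for _ in range(n_words):
        t = rng.choices(range(len(p_t_given_d)), weights=p_t_given_d)[0]
        words.append(rng.choices(vocab, weights=p_w_given_t[t])[0])
    return words


p_w_given_d = word_distribution(p_w_given_t, p_t_given_d)
sample = simulate_document(10, random.Random(0))
print(dict(zip(vocab, p_w_given_d)))
print(sample)
```

For this toy document, the probability of 'card' works out to 0.5 * 0.7 + 0.05 * 0.3 = 0.365, and the four word probabilities sum to 1, as a mixture of proper distributions must.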

Hofmann proposed ‘probabilistic LSI’ (pLSI) (Hofmann, 1999), which is based on the EM algorithm. Blei, Ng and Jordan addressed the major drawback of overfitting in pLSI by adopting a general Bayesian setting (Blei, Ng and Jordan, 2003). All the methods described are based on the “bag of words” assumption, which implies that the order in which words appear in a document carries no meaning and is hence neglected. In probability theory, this is referred to as the exchangeability assumption.

Researchers worked on mixture models that capture the exchangeability of both words and documents. This framework is called Latent Dirichlet Allocation (LDA). LDA is a three-level hierarchical Bayesian model.

‘Before delving deep into LDA, we define the following terms:
A word is the most basic and elementary unit of text data defined within set of vocabulary {1, . . . , V}. Words are represented using unit-basis vectors and vth word in the vocabulary is represented by a V-vector w such that $w^v = 1$ and $w^u = 0$ for $u \ne v$.

A document is a sequence of N words represented by $\mathbf{w} = (w_1, \ldots, w_N)$
A corpus is a collection of M documents denoted by $D = \{\mathbf{w}_1, \ldots, \mathbf{w}_M\}$
The primary idea of LDA is to represent documents as combination of topics where each topic is defined by a distribution of words. It assumes below mentioned generative process for every document w in the corpus.

1. Choose $N \sim \mathrm{Poisson}(\xi)$
2. Choose $\theta \sim \mathrm{Dir}(\alpha)$
3. For each of N words $w_n$:
(a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
(b) Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$.’ (David, Andrew and Michael, 2003)
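A k-dimensional Dirichlet random variable $\theta$ can take values in the (k-1)-simplex and has the following probability density function:
$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1} \qquad (1)$$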
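For the parameters $\alpha$ and $\beta$, the joint probability of a topic mixture $\theta$, a set of N topics z and a set of N words w is given by:
$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \qquad (2)$$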
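In the above formulation, $p(z_n \mid \theta)$ is simply $\theta_i$ for the unique i such that $z_n^i = 1$. Integrating over $\theta$ and summing over z gives the marginal distribution of a document:
$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta \qquad (3)$$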

Finally, the probability of the corpus is the product of the marginal probabilities of the individual documents:
$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d \qquad (4)$$
Figure 3 depicts the graphical representation of the three levels of the LDA model architecture. The outer box represents documents and the inner box represents the repeated choice of topics and words within a document. The parameters $\alpha$ and $\beta$ are corpus-level parameters, which are sampled once in the process of generating a corpus. The variables $\theta_d$ are document-level variables, sampled once per document. The variables $z_{dn}$ and $w_{dn}$ are word-level variables, sampled once for each word in each document.

Figure 3: Graphical representation of LDA model
The LDA model differs from a simple Dirichlet-multinomial clustering model. A classical clustering model has two levels: a Dirichlet is sampled once for the corpus, a multinomial clustering variable is selected once for each document in the corpus, and finally a set of words is selected for the document conditional on the cluster variable. LDA involves three levels, and the main distinguishing factor is that the topic node is sampled repeatedly within the document (David, Andrew and Michael, 2003).
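The three-level generative process of LDA described above can be simulated with the Python standard library alone. This is a sketch of the generative story only, not of inference or model fitting. The vocabulary and the values chosen for alpha, beta and xi (the Dirichlet parameter, the topic-word probabilities and the Poisson mean, following the notation of Blei, Ng and Jordan) are invented for illustration.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [draw / total for draw in draws]


def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def generate_document(alpha, beta, vocab, xi, rng):
    """Generate one document: document length, topic mixture, then words."""
    n_words = sample_poisson(xi, rng)      # document-level: length N
    theta = sample_dirichlet(alpha, rng)   # document-level: topic mixture
    topics, words = [], []
    for _ in range(n_words):
        # Word-level: the topic is re-sampled for every word.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        topics.append(z)
        words.append(rng.choices(vocab, weights=beta[z])[0])
    return theta, topics, words


# Hypothetical corpus-level parameters for k = 2 topics.
vocab = ["card", "block", "transfer", "overseas"]
alpha = [0.5, 0.5]
beta = [
    [0.5, 0.4, 0.05, 0.05],
    [0.05, 0.05, 0.5, 0.4],
]

rng = random.Random(42)
theta, topics, words = generate_document(alpha, beta, vocab, xi=8, rng=rng)
print(theta)
print(list(zip(topics, words)))
```

The contrast with the two-level clustering model is visible in the loop: a clustering model would draw one topic per document, whereas here the topic is drawn again for each word.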
With the evolution of social media and information sharing platforms such as Facebook, Twitter and Instagram, online discussions have grown popular and are becoming a reliable source of up-to-date information for users, owing to the growing consumer base and its activity on these platforms. Because of the diversity of topics and issues discussed, and because users originate from different parts of the globe, it is not feasible to model these discussions using traditional supervised machine learning methods. For example, in a discussion about a sci-fi movie, one group of users might be discussing the story plot, another group might be interested in the movie climax or plot holes, and other users might be talking about different aspects. This makes it essential for users to understand the context before joining the discussion.
To identify latent topics among documents, aspect terms are the main features in most topic modelling analyses. Where there are multiple intents, this approach might have trouble modelling the true intent of a discussion. Actions are a kind of activity or functioning of a group of aspects (Lin et al. 2012), which leads to an aspect-action relationship model. The relationships between aspects and actions make it easier to understand the complex relations in a discussion. Every discussion is assumed to be defined by an intent, and an intent is defined by a set of aspect/action topics that describe the key semantics of the discussion (Ghasem and Tat-Seng, 2014).
A huge amount of research is ongoing on modelling unstructured text data in an unsupervised setting. Topic modelling using LDA forms the foundation for most of the research and advances in the field.

4. Solution: Description of Approach
The bank wants to leverage natural language processing to train an unsupervised machine learning model on call transcript data, to understand the primary reasons for which customers reach out to the support center, and eventually to use these results to build a predictive model for customer-centric solutions in the next phase. This will enable the bank to reduce the high overhead cost of customer support teams and at the same time enhance the customer's experience and journey with the bank. Getting a prompt and reliable response to queries is one of the big factors that ensure a positive customer experience.
Unsupervised machine learning is used because there are no true labels in the data. Latent Dirichlet Allocation (LDA) is used for topic modelling because it helps to group the data into a set of separate clusters. Before going deep into the machine learning model, the first step is to understand the call transcript data. Initial exploration and manual reading showed that the call transcript data is very untidy and requires a lot of pre-processing before it can be fed into machine learning models. Below are the primary steps involved in pre-processing and cleaning the call transcript data:
Noise Removal – Text data always contains a lot of noise, and proper care should be taken in dealing with it.

Remove all numeric digits, as in the given context these primarily signify amounts, account numbers or customer verification numbers.

Remove all punctuation and special characters, which are part of the text but do not carry any information. Examples: characters like ‘:’ and ‘-’.
Stopword removal is one of the most crucial steps in pre-processing. It consists of removing the most frequent words, as they do not carry any meaning or information. There are two components: a general stopword list, which captures prepositions and similar function words such as is, in, and, that; and a manual stopword list, which captures domain- and problem-specific keywords. Example: keywords like ‘bank’, ‘name’ and ‘account number’ appear in most calls because the agent asks for the name and account number to verify the account holder.

Convert the text to lowercase, as the algorithm will otherwise treat the same word as two different words when it appears in both lower and upper case.
Bi-grams were extracted from the full data to combine the most commonly co-occurring words. Example: ‘credit card’ and ‘account number’ often occur together, and it is more meaningful to combine each into a single token. Similarly, lists of all countries and of regions in Singapore were gathered from the internet and mapped to the single categories ‘countries’ and ‘Singapore region’ respectively. This was incorporated at a later stage, after multiple iterations of the model, because some clusters contained the names of multiple countries along with a specific set of keywords. The idea was to capture the intent rather than the names of different countries, and the initial results were diluted by the presence of multiple countries. Another pre-processing step was to scrape all the products of the bank and curate them into lists: for example, all credit card names were mapped to a single category, and the same was done for all other products. This helps to better capture customer intent regarding certain categories of product.

Lemmatization – Many words in the English language are inflected with a morphological suffix. Example: the word ‘look’ has forms like ‘looks’, ‘looking’ and ‘looked’. All these words share the same stem but are used in slightly different contexts. Lemmatization is the algorithmic process of identifying the lemma of a given word. Stemming is the other method, used to extract the stem of a given word. The primary difference is that stemming is rule-based and operates on a single word without knowledge of the context. Stemming is easier to implement and runs faster, but has lower accuracy than lemmatization. In this analysis, lemmatization was used to pre-process the data.
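The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pipeline used in the study: the stopword lists, bigram table and category mapping are hypothetical placeholders, and lemmatization is left out because it needs an external library.

```python
import re

# Illustrative stopword lists; the real general and manual lists are not given in the text.
GENERAL_STOPWORDS = {"is", "in", "and", "that", "the", "my", "to", "i", "a"}
DOMAIN_STOPWORDS = {"bank", "name", "account_number"}
# Hypothetical bigrams to merge into single tokens.
BIGRAMS = {("credit", "card"): "credit_card", ("account", "number"): "account_number"}
# Hypothetical mapping of specific entities to a shared category token.
CATEGORY_MAP = {"japan": "countries", "india": "countries"}


def preprocess(text):
    """Clean one transcript and return a list of tokens."""
    text = text.lower()                      # case folding
    text = re.sub(r"\d+", " ", text)         # drop digits (amounts, account numbers)
    text = re.sub(r"[^a-z\s]", " ", text)    # drop punctuation and special characters
    words = text.split()

    # Merge known bigrams into single tokens.
    merged, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in BIGRAMS:
            merged.append(BIGRAMS[pair])
            i += 2
        else:
            merged.append(words[i])
            i += 1

    # Replace specific entities by their category, then remove stopwords.
    tokens = [CATEGORY_MAP.get(w, w) for w in merged]
    stop = GENERAL_STOPWORDS | DOMAIN_STOPWORDS
    return [t for t in tokens if t not in stop]


tokens = preprocess("My account number is 1234-5678 and I lost my Credit Card in Japan.")
print(tokens)
```

Merging bigrams before stopword removal lets a domain stopword such as ‘account number’ be dropped as one token, and mapping country names to one category keeps the intent while discarding the specific country.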

The initial hypothesis was that the top call reason, ‘Account Transactions’, is too generic to explain why a customer called and contains multiple actionable groups. To validate this, a Markov chain, a common model in text processing, is built. In a Markov chain, each choice of word depends only on the previous word. It captures the most commonly occurring words and their neighboring keywords, which defines clusters and shows which words lead to others. This can be easily visualized using R and helps in understanding the patterns within text data (https://www.tidytextmining.com/ngrams.html).
The visualization in Figure 4 below is a network or ‘graph’, not in the sense of a chart but as a combination of connected nodes built from the bigrams of the transcript text. Every graph is primarily composed of three components:
From: the initiating node
To: the concluding node
Weight: numeric value assigned to each pair
Figure 4: Directed graph of common bigrams in transcripts labelled as ‘Account Transactions’
R provides the igraph package, which has many built-in functions to manipulate and analyze networks. Here, weight is used to filter out noisy word pairs that occur very rarely and do not capture the primary, meaningful context of the conversations.
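The study built this graph in R. As a language-neutral illustration of the same idea, the sketch below counts adjacent word pairs and keeps only those above a weight threshold, producing the from/to/weight triples described above. The function name, the toy documents and the threshold are assumptions made for the example.

```python
from collections import Counter


def bigram_edges(documents, min_weight=2):
    """Count adjacent word pairs and return (from, to, weight) edges."""
    counts = Counter()
    for tokens in documents:
        counts.update(zip(tokens, tokens[1:]))   # consecutive word pairs
    # Keep only pairs that occur at least min_weight times.
    edges = [(a, b, n) for (a, b), n in counts.items() if n >= min_weight]
    return sorted(edges, key=lambda e: (-e[2], e[0], e[1]))


docs = [
    ["fund", "transfer", "failed"],
    ["fund", "transfer", "pending"],
    ["outstanding", "balance", "enquiry"],
    ["outstanding", "balance", "due"],
    ["lost", "card"],
]
edges = bigram_edges(docs, min_weight=2)
for source, target, weight in edges:
    print(source, "->", target, weight)
```

The resulting edge list can be handed to any graph library for plotting; pairs seen only once are dropped as noise.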

It is evident from the above visualizations (Figure 3) that, as expected, the calls tagged under ‘Account Transactions’ capture multiple intents and call reasons. Figure 4 shows that customers called for multiple reasons such as outstanding balance, fund transfers, account verification, giro deduction and lost card. Further investigation is therefore required to classify call transcripts into a set of clusters and assign a probability to each cluster.

Topic modelling is one of the most common approaches to unsupervised learning on text, and Latent Dirichlet Allocation (LDA) is the primary algorithm that forms its foundation. It helps to divide a collection of documents, such as transcripts or news articles, into natural groups so that each group contains contextually similar documents.
LDA is a popular method for fitting a topic model. It treats each transcript as a mixture of topics and each topic as a mixture of words. This provides flexibility in classifying transcripts, as it does not divide them into discrete groups but allows them to overlap in content. A single transcript can have multiple sub-topics or intents, and topic modelling can assign multiple topics to a single transcript.

Figure 5: A flowchart for topic modelling

Figure 5 captures the primary steps involved in topic modelling analysis. The first, and one of the most crucial, is to pre-process the data and remove noise. Second, the tidy text is used to generate a corpus object, from which a document-term matrix is created. Finally, the LDA model is tuned with different parameters to reach an optimal solution. LDA is a mathematical model that estimates topics and keywords at the same time: it finds the set of keywords associated with each topic while also determining the mixture of topics that best describes each transcript. Visualization gives an easy, business-friendly view of the different topics created by topic modelling.

In Python, the Gensim module allows LDA model estimation from a training corpus and inference of the topic distribution on new, unseen data. The model can also be updated quickly with new documents as data streams in. The core estimation code is based on the onlineldavb.py script.
(https://radimrehurek.com/gensim/models/ldamodel.html)
The algorithm used to train the LDA model on a corpus is:
Streamed – training documents are incorporated sequentially.

Processed in constant memory with respect to the number of documents (transcripts). An interesting aspect of the algorithm is that its memory footprint is not affected by the size of the training corpus.

Distributed – it can speed up training and model evaluation by making use of a cluster of machines.

Training an LDA model involves a number of parameters that play a significant role in reaching an optimal fit. Below is the list of parameters: (https://radimrehurek.com/gensim/models/ldamodel.html)
corpus – It is the foundation of the model and can be passed as a stream of document vectors or as a sparse matrix whose dimensions are the number of documents and the number of terms.

num_topics – Number of topics to be mined from the set of training data.

id2word – It helps to map word IDs to words and is used to determine the vocabulary size as well as debugging and topic printing.

distributed – It decides whether to use distributed computing or not. Distributed computing can be used to accelerate the model training process.

chunksize – Number of documents to be used in each training chunk.

passes – Number of passes through the corpus during training.

update_every – It defines whether batch learning (a value of 0) or online iterative learning (a positive value) is used.

alpha – The a-priori belief on the topic distribution of documents. It can be set to a 1D array of length equal to the number of topics, or to one of the strings ‘symmetric’ (the default, a fixed symmetric prior), ‘asymmetric’ (a fixed normalized asymmetric prior) or ‘auto’ (an asymmetric prior learned from the training corpus).

eta – The a-priori belief on word probability. It can be:
a scalar, for a symmetric prior over the topic-word distribution;
a vector of length num_words, for a user-defined asymmetric prior for each word;
a matrix of shape (num_topics, num_words), to assign a probability to each word-topic combination;
‘auto’, to learn an asymmetric prior from the data.

decay – A number that weights what percentage of the previous lambda value is forgotten when each new document is examined. Values usually range from 0.5 to 1.

offset – Controls how much the first few iterations are slowed down.

eval_every – Log perplexity is estimated once every this many updates. Setting it to 1 slows training down by roughly a factor of two.

iterations – controls the maximum number of iterations through the corpus when inferring the topic distribution.

gamma_threshold – defines the minimum change in gamma parameters to continue iterating for training on corpus.

minimum_probability – Any topic lower than this probability will be filtered out.

random_state – Seed used to make the results reproducible.

per_word_topics – Used by model to compute list of topics most likely to occur for each word.

Grid search is used to build a set of candidate models for evaluation. Business knowledge and logical interpretation play a major role in judging the relevance of topics for transcripts. The mathematical measures used to evaluate topics are perplexity and coherence. After multiple iterations and discussions with business owners, the topic model with 50 topics was finalized.
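The text does not say which coherence measure was used. As one hedged example, the sketch below computes the UMass coherence of a single topic from document co-occurrence counts: the sum, over ordered pairs of top words, of log((D(w_i, w_j) + 1) / D(w_j)), where D counts the documents containing the given words. The function name and the toy documents are assumptions, and every top word is assumed to occur in at least one document.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic given its top words (most probable first)."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + 1) / doc_freq(top_words[j]))
    return score


docs = [
    ["card", "block", "lost"],
    ["card", "block", "replace"],
    ["fee", "waiver", "late"],
    ["fee", "waiver", "annual"],
]
coherent = umass_coherence(["card", "block"], docs)   # words that co-occur
mixed = umass_coherence(["card", "waiver"], docs)     # words that never co-occur
print(coherent, mixed)
```

In a grid search, a score like this would be averaged over the topics of each candidate model and compared across candidate values of num_topics, alongside perplexity and business review.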

pyLDAvis is a Python library for interactive topic model visualization. It is designed to help users interpret the topics in a topic model that has been trained on a text corpus. The package extracts information from the final LDA model object and uses it to create a web-based interactive visualization. The visualization can be used within a Jupyter notebook or saved as a standalone HTML file and shared easily.

Figure 6: Topics Modelling visualization showing 50 topics and 30 most relevant terms in topic 1 (Bottom 15 keywords are masked)

The visualization in Figure 6 helps to understand and explore the different topics describing the transcript data. The size of each topic is directly proportional to the number of documents (call transcripts) it covers in the full data. As the cursor is moved across topics, the right-hand side shows the top keywords within the selected topic. A slider at the top right corner adjusts the relevance setting used to rank the keywords of each topic; it controls how exclusive the displayed words are to that topic. Keyword importance is measured by saliency and relevance.
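For reference, the relevance score behind the slider can be written as below, following the definition published with the LDAvis method on which pyLDAvis is based. The symbols are not used elsewhere in this text: φ_kw is the probability of word w in topic k, p_w is the overall probability of w in the corpus, and λ between 0 and 1 is the slider value.

```latex
r(w, k \mid \lambda) = \lambda \log \phi_{kw} + (1 - \lambda) \log \frac{\phi_{kw}}{p_w}
```

With λ = 1, words are ranked purely by their probability within the topic; lowering λ gives more weight to words that are rare elsewhere and therefore more exclusive to the topic.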

Table 1: Sample output from 50 topics from Topic Modelling and top 10 keywords within each topic (Other topics are not shown due to confidentiality purpose)
Table 1 shows sample topics from the 50 topics produced by topic modelling with the LDA algorithm. The topics are interpretable from their top keywords. For example, topic 1 has keywords such as card, new card, block and replace, and manual reading of the transcripts confirmed that the primary context of these transcripts is block card. Topic 8 has keywords like waiver, late charges, annual fee and late; it groups the calls in which customers contact the bank about fees on their cards and request a fee waiver or a waiver of late charges. Similarly, topic 24 describes customers calling to request a consolidated hard-copy statement or to change the billing cycle for payments. Topic 32 comprises customers who call to increase their credit limit or spending limit. Topic 34 describes customers who want to activate their card for overseas transactions.
It is evident from the topics and the keywords assigned to each topic that they distinguish the call transcripts quite clearly. The results were validated by the contact center team, who read the transcripts within each topic.

Table 2: Distribution of topics across different customer care agent-based labels

One of the primary reasons for applying natural language processing to transcript data was that the current labelling of calls by customer care agents is not reliable, with most calls being tagged under ‘Account Transactions’ and other similarly generic categories. As a next step, the distribution of topics within these agent-based labels is checked. ‘Account Transactions’ should have its calls distributed among multiple topics, while reasons like block card or fee waivers should align very specifically with individual topics from topic modelling. Table 2 captures the distribution of topics across four of the call labels tagged by customer care agents. As expected, generic call labels such as ‘Account Transactions’ are split across more than two topics with substantial percentages, indicating the presence of sub-topics or multiple customer intents within these call transcripts. The more specific ones, such as ‘Fee Reversal’, ‘Amend Limits’ and ‘Token Status/History’, are each dominated by one topic, which confirms that agents tag these calls correctly most of the time and at the same time serves as a validation of the topic modelling results.
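A cross-tabulation of this kind can be sketched as follows. The input pairs of agent label and dominant topic are invented for illustration and do not reproduce Table 2; the function name is also an assumption.

```python
from collections import Counter, defaultdict


def topic_share_by_label(calls):
    """Return {label: {topic: share of that label's calls}}."""
    counts = defaultdict(Counter)
    for label, topic in calls:
        counts[label][topic] += 1
    shares = {}
    for label, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[label] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Hypothetical (agent label, dominant topic) pairs, one per call.
calls = [
    ("Account Transactions", 1), ("Account Transactions", 8),
    ("Account Transactions", 24), ("Account Transactions", 1),
    ("Fee Reversal", 8), ("Fee Reversal", 8), ("Fee Reversal", 8), ("Fee Reversal", 32),
]
shares = topic_share_by_label(calls)
for label, dist in shares.items():
    print(label, {t: round(s, 2) for t, s in dist.items()})
```

A generic label shows its share spread over several topics, while a specific label shows one dominant topic, which is the pattern the validation looks for.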

The results were validated by sharing sample transcripts, together with the topic label of each call, with the call operations team. They assigned a validation score to each topic based on what the top keywords within the topic imply and how well they align with the customer intent in the transcript. 37 topic clusters, accounting for 78% of calls, had a validation score above 80%.
To drive actionable insights for every topic, the next data source leveraged is the agent’s audit log. It captures the actions the agent takes during the call to resolve or answer the customer’s queries. For every call there are multiple rows in the data, as every action is recorded, beginning with new call, followed by customer verification and specific actions based on the customer’s conversation, and ending with end call. The initial step before using this data for analysis is to identify the general keywords that are present across most calls, or are very generic, and do not capture any specific action. For this exploratory data analysis, the unigram distribution of key actions was taken and validated with the business team. There are 90 unique actions; 17 of them were identified as generic, capturing no specific action, and were removed. After this, a path analysis was performed, which transforms the data from multiple rows per call into a single row and creates a path of the different actions taken during the agent-customer conversation. Before the path analysis, all actions for every call were sorted by time stamp, as this captures the order in which the actions were taken.

Figure 7: Flow of processing agent action data for path analysis
Figure 7 shows the raw data (for illustration purposes) for call C1, which has agent actions A1, A2, A3, A4 and A5, with call start time T1.1 and call end time T1.5. The data is pre-processed and generic agent actions (A1 and A5 in this case) are removed before a path of the remaining actions is created, so that every call is represented by one row of agent actions.
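The transformation in Figure 7 can be sketched as below, using the C1 example with integer time stamps standing in for T1.1 to T1.5. The function name, the row layout and the path separator are assumptions made for the illustration.

```python
from collections import defaultdict

GENERIC_ACTIONS = {"A1", "A5"}   # generic actions to drop, as in the Figure 7 example


def build_paths(rows, generic=GENERIC_ACTIONS, sep=" > "):
    """Collapse (call_id, timestamp, action) rows into one action path per call."""
    by_call = defaultdict(list)
    for call_id, timestamp, action in rows:
        by_call[call_id].append((timestamp, action))
    paths = {}
    for call_id, events in by_call.items():
        events.sort()                                  # order actions by timestamp
        kept = [a for _, a in events if a not in generic]
        paths[call_id] = sep.join(kept)
    return paths


# Rows deliberately out of order to show the effect of sorting.
rows = [
    ("C1", 3, "A3"), ("C1", 1, "A1"), ("C1", 5, "A5"),
    ("C1", 2, "A2"), ("C1", 4, "A4"),
]
paths = build_paths(rows)
print(paths)
```

Sorting before joining matters because the audit log rows are not guaranteed to arrive in the order in which the agent acted.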

There are multiple reasons for performing this analysis. First, there could be a defined set of actions that every agent performs for customer calls with a specific reason. For example, suppose a customer calls to block a lost card. The agent would verify the customer’s details, verify the last transactions on the card, confirm the block card request, ask whether a replacement card is needed, confirm the delivery time of the replacement card and finally submit the request. If such calls occur in substantial volume, the same steps can be built into digital channels or chatbots, where the customer can raise the query and perform these actions to resolve it or submit a request. Second, TF-IDF analysis is performed on the agent action paths to understand the most common actions an agent performs for the call reasons identified by topic modelling. Along with actionable insights, this also serves as one method of validating the topic model, since most of the agent actions should be consistent with the topics identified by topic modelling.
Term Frequency (TF) is one of the most basic and obvious techniques for measuring the relevance of a word by its number of occurrences in a document. The higher the frequency of a word in a document, the more relevant the word is to that document. This has drawbacks when used for a large number of documents: certain keywords occur with high frequency in the majority of documents, and TF is not able to differentiate between them. Inverse Document Frequency (IDF) is another way to measure the relevance of keywords, based on the hypothesis that less frequent words are more informative for differentiating documents.

IDF = log (N/DF)
Where N represents the total number of documents in the data and DF is the number of documents in which a certain keyword is present.
TF and IDF scores are then combined into TF-IDF scores, which help to identify the most relevant keywords in a set of documents. Python’s sklearn library can compute TF-IDF scores easily and automate the whole process. Table 3 shows the top 5 agent audit actions based on TF-IDF scores; these top 5 audit actions account for 25% of total customer calls. They are also in alignment with the initial hypothesis and with the expectations of the call operations team.
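As a worked illustration of the formula IDF = log (N/DF), the sketch below multiplies raw term frequency by that IDF for a few invented action paths. It is a plain-Python sketch rather than the sklearn computation used in the study. Note that sklearn’s TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its scores will differ numerically from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf score} dict per document, using IDF = log(N / DF)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))                 # count each term once per document
    scores = []
    for doc in documents:
        tf = Counter(doc)                   # raw term frequency
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


# Hypothetical action paths, one list of agent actions per call.
paths = [
    ["verify_customer", "block_card", "order_replacement"],
    ["verify_customer", "fee_waiver"],
    ["verify_customer", "block_card"],
]
scores = tf_idf(paths)
print(scores[1])
```

An action that appears in every call, such as customer verification here, gets a score of zero, while an action specific to one call reason gets the highest score.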

Table 3: Top 5 agent audit actions in transcript data

Next, the top 3 audit actions are quantified within each topic identified by topic modelling. Table 4 captures the top 3 agent audit actions for randomly chosen topics 1, 28 and 33. It is evident that all 3 agent actions identified by TF-IDF are relevant and in line with the keywords assigned to each topic by the topic modelling exercise. As topic modelling is an unsupervised learning technique and there is no ground truth against which to validate the results, using agent audit log activity helps to validate that the topics assigned to call transcripts are correct and can be used for future analysis and classification of incoming calls.

Table 4: Top 3 agent audit action for topic 1, 28 and 33 from topic modelling

Combining all the analysis including topic modelling on transcript data and top agent actions during every call gives a holistic understanding about the primary reasons for which customers make a call.
5. Discussion
First and most importantly, the analysis shows that the current labelling of inbound calls is not very informative and, in most cases, too generic to explain customer calling behavior, and that resources need to be devoted to optimizing call reduction programs.

The proposed analytical solution provides actionable insights at two levels, both aimed at reducing the cost of customer support operations and providing quick resolution of customer queries. First, the results of natural language processing were shared with the customer support operations team, who validated them and have incorporated them for better classification of inbound customer calls. These results also reveal the major reasons for which customers call, and they will be leveraged by different business owners to improve services and ensure a better customer experience. Second, the work validates the application of machine learning methods to unstructured data and shows them to be better than manual classification at large scale. It turns out to be a more efficient and scalable approach to classifying large amounts of text into a set of specific categories.
The results can be leveraged on two fronts. First, large clusters of calls were identified, such as those labelled ‘block card’ and ‘fee waiver’, implying that a large number of customers call for these reasons. The business can develop a set of digital services and market them to customers for easy accessibility, so that customers migrate to digital channels from traditional call centers. Second, there are very specific clusters that can be addressed by improving current services, for example by improving the user interface of a few web pages.
All the results combined will provide a better data driven approach to allocate resources to different segments of the business to optimize customer support operations.

6. Conclusions and Further Work
The analysis performed to improve support center operations is based on the latest advancements in the field of machine learning and natural language processing. The results are in alignment with the initial hypothesis from the business.

Topic modelling is the primary machine learning methodology used to perform root cause analysis and understand the primary reasons for which customers contact the support center teams. As the call center remains one of the most expensive ways to provide customer support, the analytical solution developed from transcript data can help the bank to identify the different segments for which customers call. This information provides business stakeholders with actionable insights across different verticals within the organization. These results will form the foundation for allocating resources to optimize contact center operations and migrate customers to digital platforms.
The results from the topic modelling analysis will be used by the data science team to develop predictive machine learning models for different topics. This will help to identify a customer’s propensity to contact or call for different reasons in the future. Different call reasons differ in how predictable they are. For example, it cannot be predicted whether a customer will lose a credit card in the near future. This reason is not predictable and hence needs an easily accessible method on the website and digital platforms, where the customer can submit a request to block the card and order a replacement. At the same time, there are reasons, such as activating a card for overseas usage, that can be predicted from the customer’s transaction history and location data. Using a predictive model, a real-time solution can then be developed that offers an option to activate the card. These propensity scores can also be used to design customized, customer-centric solution strategies, such as personalized Frequently Asked Questions (FAQs) for every customer, or proactive outreach to a set of customers to provide solutions or move them to digital platforms. Similarly, marketing campaigns can be designed to resolve certain queries by promoting digital services and the accessibility of the mobile application.

The customer support call center is one of the ways in which customers contact banks. Another frequently used mode of contact is the chat option with agents. This is also an expensive mode to maintain because of its high overhead costs. The same analysis is planned for the text data generated from customers’ chats with agents. It is a direct application of the methodology and solution developed from call transcript text data. All the code and processes have been converted into a set of functions that is equivalent to an automated tool. Given text data from the chat platform in a pre-defined format, all the pre-processing will be performed with minimal human intervention. For the LDA model, there is a set of default parameters that can be tuned using a grid search built into the program. The results from topic modelling on chat data can provide agents with recommendations during customer chat sessions, so that they can easily provide the information the customer requests.
There are other aspects of natural language processing that can be explored by leveraging deep learning techniques. Analytics enables businesses and organizations to optimize their resources and take data-driven rather than gut-based decisions. It is a transformation and a journey that starts with identifying problems, building a set of hypotheses, identifying relevant data sources, developing machine learning and predictive models, and finally deploying them. After deployment, predictive solutions need to be constantly monitored to ensure that the underlying population and trends remain homogeneous. If there are changes in financial policies or customer demographics, the models might need to be re-calibrated. Most importantly, it requires buy-in across all levels of an organization, from analysts to stakeholders to global leaders. As Carly Fiorina, former CEO of HP, aptly says, ‘The goal is to turn data into information, and information into insights’. This fits the problem and solution discussed here.

Bibliography
http://www.callcentertimes.com/LinkClick.aspx?fileticket=mB27Q%2B746Z0%3D&tabid=83
https://www.informationbuilders.com/sites/default/files/pdf/solutions_brief/call_center_sb_final.pdf
Gopal, R., Marsden, J. R., & Vanthienen, J. (2011). Information mining—reflections on recent advancements and the road ahead in data, text, and media mining. Decision Support Systems, 51(4), 727-731.

Indulska, M., Hovorka, D. S., & Recker, J. (2012). Quantitative approaches to content analysis: Identifying conceptual drift across publication outlets. European Journal of Information Systems, 21(1), 49-69
Berg, B. L., & Lune, H. (2011). Qualitative research methods for the social sciences. Boston: Pearson.

Urquhart, C. (2012). Grounded theory for qualitative research: A practical guide. Thousand Oaks, CA: Sage.

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.

Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54(1), 209- 228.

Berente, N., & Seidel, S. (2014). Big data & inductive theory development: Towards computational grounded Theory? In Proceedings of the 20th Americas Conference on Information Systems (pp. 1–11). Savannah.

Chakrabarti, S: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers (2002).

Deerwester, S.C. , Dumais, S.T. , Landauer, T.K. , Furnas, G.W. , Harshman, R.A.: Indexing by Latent Semantic Analysis. American Society of Information Science, 41(6) (1990) 391–407
David Newman, Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers: Analyzing Entities and Topics in News Articles Using Statistical Topic Models. In: Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg
Hofmann, T.: Probabilistic Latent Semantic Indexing. 22nd Int’l. Conference on Research and Development in Information Retrieval (1999)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research, 3 (2003) 993–1022
Ghasem Heyrani-Nobari, Tat-Seng Chua: User Intent Identification from Online Discussions Using a Joint Aspect-Action Topic Model:AAAI – 2014
https://www.salesforce.com/hub/service/call-center-analytics/
https://www.tidytextmining.com/ngrams.html
https://radimrehurek.com/gensim/models/ldamodel.html