Tryst with t-SNE
I received an article on what Google is up to with AI. It discussed how to visualise high-dimensional data in a two-dimensional space using a dimensionality-reduction technique called t-SNE (t-distributed Stochastic Neighbor Embedding). It was an interesting read, until you ask how you would go about using it for your own data or use case. I found one site that gives a great step-by-step explanation of why you should be wary of this technique (https://distill.pub/2016/misread-tsne/). Great: I hadn't even begun to understand the technique, and now I had to understand its caveats as well. But I ploughed on. I wanted to take a sample dataset of hotel feedback (https://archive.ics.uci.edu/ml/datasets/Eco-hotel) and see how the technique works on unstructured text. I have tried sentiment analysis and topic modelling before, and I wanted to check whether t-SNE would give me a different perspective. I used the R packages tm, rgl and Rtsne to clean up and transform the data and run the technique.
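The clean-up and transformation step turns free-text reviews into numeric vectors that t-SNE can work with. The post used R's tm package for this; what follows is only a minimal Python sketch of the same idea, not the original code, assuming the typical tm-style steps (lower-casing, stripping punctuation and digits, removing stop words, counting terms). The stop-word list and the three review snippets are hypothetical, not taken from the Eco-hotel data. Each term ends up as a row of counts across documents: a high-dimensional vector that t-SNE then compresses into two or three dimensions, assuming the words themselves are what gets embedded, as a word-level plot suggests.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; tm ships a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "at", "of", "to", "in", "very"}


def clean(text):
    """Lower-case, replace non-letters with spaces, split, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes punctuation and digits
    return [w for w in text.split() if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (terms, rows) where rows[i][j] counts term i in document j."""
    counts = [Counter(clean(d)) for d in docs]
    terms = sorted(set().union(*counts))  # every term seen in any document
    rows = [[c[t] for c in counts] for t in terms]
    return terms, rows


if __name__ == "__main__":
    # Hypothetical review snippets, for illustration only.
    reviews = [
        "The hotel spa was lovely.",
        "Great spa, great hotel!",
        "The chef cooked a fantastic dish in the open kitchen.",
    ]
    terms, rows = term_document_matrix(reviews)
    for term, row in zip(terms, rows):
        print(f"{term:10s} {row}")
```

In this toy example "hotel" and "spa" get identical rows ([1, 1, 0]), so an embedding of the term rows would tend to place them close together, which is the kind of grouping described in the results.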
Result of the analysis
I tried a perplexity (yep, it's a parameter) of 40, which sits in the middle-to-upper part of the commonly suggested range of 5 to 50. I plotted the results as a 3D graph, a snapshot of which is attached. I could see a number of words that appear together frequently in the dataset sitting close to each other: "hotel" and "spa", for example. Another group was "kitchen", "dish" and "chef", and "environment" and "fantastic" formed a recurring theme. With a simple graph, we can scan through the 3D cloud, zoom in and out, and twist and turn it to get a sense of which aspects of hospitality customers deem important. While sentiment analysis tells you whether a tweet is happy or not, this gave me a sense of the underlying themes in what customers care about.
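Perplexity controls, loosely, how many neighbours each point takes into account. In t-SNE every point i gets its own Gaussian width sigma_i, chosen by binary search so that the perplexity 2^H(P_i) of its neighbour distribution matches the value you set, where H is the Shannon entropy in bits. The following is a simplified plain-Python sketch of that calibration for a single point, not Rtsne's implementation; the function names and the example distances are hypothetical.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian affinities p_{j|i} of one point to each other point, from squared distances."""
    # Shift by the smallest distance before exponentiating so the nearest
    # neighbour always gets weight 1; the shift cancels when normalising.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H(P), with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def calibrate_sigma(sq_dists, target, tol=1e-5, max_iter=200):
    """Binary-search sigma so the perplexity of p_{j|i} matches target.

    Perplexity grows with sigma: near 1 when only the closest neighbour
    counts, approaching len(sq_dists) when all neighbours weigh the same.
    """
    if not 1.0 < target < len(sq_dists):
        raise ValueError("target perplexity must lie between 1 and the number of neighbours")
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        # Geometric midpoint: sigma can span many orders of magnitude.
        sigma = math.sqrt(lo * hi)
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # neighbourhood too wide, shrink sigma
        else:
            lo = sigma  # neighbourhood too narrow, widen sigma
    return sigma


if __name__ == "__main__":
    # Hypothetical squared distances from one word to 100 others.
    sq = [float(k * k) for k in range(1, 101)]
    sigma = calibrate_sigma(sq, 40.0)
    print(f"sigma = {sigma:.4f}, perplexity = {perplexity(conditional_probs(sq, sigma)):.4f}")
```

A perplexity of 40 therefore stretches each word's neighbourhood until it behaves roughly as if it had 40 equally weighted neighbours, which is why the setting changes how tight or diffuse the groups in the plot look. The target also has to be smaller than the number of other points, which the ValueError guards against.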