So, apparently using MS Excel for text data is a thing, because there are add-ons you can install that create word counts and word clouds and can apparently even perform sentiment analysis. However, I honestly do not know why someone would do that when free and less awkward tools, like Python, exist. I have outlined in a previous article that many people are reluctant to pick up coding because they believe that it is difficult and that in-depth math knowledge is required. If you can write long and awkward functions in Excel, let me reassure you that Python is way easier and more intuitive. Besides, there are various Python libraries for natural language processing (NLP) with a plethora of built-in functions that will do the heavy lifting for you.

In this article, I want to start with the very basics of text analysis in Python. Some Python knowledge is necessary, so I suggest you check out my previous article, in which I give tips on how to get started with Python or R for Data Analysis. This article will be of a similar format.

There are plenty of tutorials and articles on how to get started with NLP in Python, some of which I will link to in this article. However, I want to give a pragmatic example of how to deal with real-world text data that you might encounter in your daily work life. I want to show how to apply some simple, but powerful, text analysis techniques and how to tackle problems you might run into.

I will give some code samples from a notebook that I created for a friend who wanted to get started with text analysis in Python for his job. The whole notebook is available here, if you want to go through it. The dataset contains US railroad incidents from 2019. It is not the most exciting dataset, since I tried to find a public dataset that is similar to the type of data he has to analyse at work. Unlike the nicely cleaned and formatted datasets that you get to analyse in tutorials, real-world data is often messy and noisy.
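To make the word-count idea mentioned at the start a bit more concrete, here is a minimal sketch that uses only the Python standard library. Note the assumptions: the three sample narratives are invented for illustration and are not taken from the railroad dataset, the stop word list is a tiny hand-picked one, and the function name `word_counts` is hypothetical. It is meant as a rough illustration, not as the code from the notebook.

```python
import re
from collections import Counter

# Hypothetical free-text narratives, invented for illustration only;
# they are NOT taken from the railroad incident dataset.
narratives = [
    "Train struck a vehicle at the crossing.",
    "Vehicle stalled on the crossing; train could not stop.",
    "Employee slipped on ice while boarding the train.",
]

# A tiny, hand-picked stop word list (an assumption for this sketch;
# NLP libraries ship much fuller lists).
STOP_WORDS = {"a", "at", "the", "on", "not", "could", "while"}


def word_counts(texts, stop_words=STOP_WORDS):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase first, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


counts = word_counts(narratives)
print(counts.most_common(3))
```

Even in this toy version, two typical cleaning steps show up: lowercasing, so that "Train" and "train" are counted together, and dropping very common words that carry little meaning. With messy real-world text, this kind of cleaning usually needs more care.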