Cleaning Tweets with Python

A tweet can contain many things: plain text, mentions, hashtags, links, punctuation, and more. When you’re working on a data science or machine learning project, you may want to remove these elements before you process the tweets further. I am going to show you the steps needed to be…
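A minimal sketch of this kind of cleanup, assuming simple regular expressions are good enough for your data. The helper name clean_tweet and the exact patterns are illustrative assumptions, not necessarily the steps used in the full post. Links are removed first so that their slashes and dots do not survive as stray text.

```python
import re
import string

# Hypothetical patterns: "@word" mentions, "#word" hashtags, and http(s)/www links.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Strip links, mentions, hashtags and punctuation, then collapse whitespace."""
    text = URL_RE.sub("", text)       # remove links first
    text = MENTION_RE.sub("", text)   # remove @mentions
    text = HASHTAG_RE.sub("", text)   # remove #hashtags, tag word included
    text = text.translate(PUNCT_TABLE)  # drop remaining ASCII punctuation
    return " ".join(text.split())     # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_tweet("Loving #Python! Thanks @pandas_dev https://pandas.pydata.org :)"))
```

Whether hashtags should be dropped entirely or kept as plain words depends on the project; deleting only the "#" character would keep the word.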

Importing an Excel File into a Pandas DataFrame

The Excel format (.xlsx) is one of the most common document formats you’ll meet in data analysis. I generally prefer the CSV format, but sometimes you have to deal with Excel documents too. One of the main differences between the Excel and CSV formats is that you can have several sheets in an Excel…
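As a hedged illustration of the multi-sheet point, the sketch below builds a small two-sheet workbook in memory and reads it back with pandas.read_excel. It assumes the openpyxl package is installed (pandas uses it for .xlsx files); the sheet names and data are made up. Passing sheet_name=None loads every sheet into a dict keyed by sheet name, while a sheet name or index loads just one.

```python
import io

import pandas as pd

# Assumption: openpyxl is installed; pandas relies on it to read and write .xlsx.


def build_demo_workbook() -> io.BytesIO:
    """Create a hypothetical two-sheet .xlsx file in memory."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        pd.DataFrame({"user": ["a", "b"], "tweets": [3, 5]}).to_excel(
            writer, sheet_name="January", index=False
        )
        pd.DataFrame({"user": ["a", "c"], "tweets": [1, 7]}).to_excel(
            writer, sheet_name="February", index=False
        )
    buffer.seek(0)  # rewind so the buffer can be read from the start
    return buffer


def load_all_sheets(source) -> dict:
    """sheet_name=None returns {sheet name: DataFrame} for every sheet."""
    return pd.read_excel(source, sheet_name=None)


def load_one_sheet(source, name: str) -> pd.DataFrame:
    """Read a single sheet by name."""
    return pd.read_excel(source, sheet_name=name)


if __name__ == "__main__":
    for sheet, frame in load_all_sheets(build_demo_workbook()).items():
        print(sheet)
        print(frame)
```

For a file on disk you would pass its path instead of the in-memory buffer, for example a hypothetical "tweets.xlsx".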

Extracting (or Removing) Mentions and Hashtags in Tweets using Python

One of the most common problems when dealing with Twitter data (such as tweets) is knowing how to extract hashtags and mentions or, in some cases, how to remove them from a tweet. If you’re like me and have to deal with data from social media (particularly Twitter) at work a…
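One way to do both jobs on a pandas column is sketched below, assuming a mention is "@" followed by word characters and a hashtag is "#" followed by word characters. These patterns and the sample tweets are assumptions for illustration. With a single capture group, Series.str.findall returns only the captured text, so the "@" and "#" signs are dropped from the extracted lists.

```python
import pandas as pd

# Assumed patterns; the capture group keeps only the name after the sign.
MENTION_PATTERN = r"@(\w+)"
HASHTAG_PATTERN = r"#(\w+)"

# Hypothetical sample tweets.
tweets = pd.DataFrame(
    {
        "text": [
            "Meeting @alice and @bob_99 today #data #python",
            "No tags here",
        ]
    }
)

# Extract: one list of matches per row (an empty list when nothing matches).
tweets["mentions"] = tweets["text"].str.findall(MENTION_PATTERN)
tweets["hashtags"] = tweets["text"].str.findall(HASHTAG_PATTERN)

# Remove: delete whole @word / #word tokens, then tidy up leftover spaces.
tweets["clean_text"] = (
    tweets["text"]
    .str.replace(r"[@#]\w+", "", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(tweets[["mentions", "hashtags", "clean_text"]])
```

These simple patterns also match the part after "@" in an email address, so stricter rules may be needed for messy text.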

Creating Time Range in Python: Date Range and Month Range

We often need to generate a list or series of dates between two given dates. Pandas makes this possible with date_range() and period_range(). First, let’s define the two dates between which we want to generate the other dates. Using date_range(): We are…
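A short sketch of both functions, using two hypothetical dates. pd.date_range returns a DatetimeIndex of timestamps (daily here, both ends included), while pd.period_range with freq="M" returns one monthly Period for every month the range touches. For contrast, date_range with the "MS" (month start) alias only returns first-of-month dates that fall inside the range.

```python
import pandas as pd

# Hypothetical start and end dates; any strings pandas can parse work.
start_date = "2021-01-30"
end_date = "2021-04-02"

# Every calendar day from start to end, both ends included.
days = pd.date_range(start=start_date, end=end_date, freq="D")

# Every month touched by the range, as monthly Period objects.
months = pd.period_range(start=start_date, end=end_date, freq="M")

# Only the month-start dates that lie inside the range.
month_starts = pd.date_range(start=start_date, end=end_date, freq="MS")

print(len(days))
print(months.astype(str).tolist())
print(month_starts.strftime("%Y-%m-%d").tolist())
```

The choice between the two depends on whether you need concrete timestamps (date_range) or labels for whole months (period_range).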

Extracting Datetime Format in Python: Day, Date, Month, Year, etc.

Extracting data in datetime format can be tricky and frustrating. I have summed up a few things I have learned about datetime in Python in the code below. Let’s load the packages we are going to need, such as pandas and datetime, then define a variable to store a value in datetime format…
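A minimal sketch of pulling out the usual components, assuming a hypothetical sample timestamp (the post’s own variable is not shown in this excerpt). A plain datetime object exposes year, month and day as attributes, and weekday() counts Monday as 0. For a whole pandas column, the .dt accessor provides the same fields at once.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample value.
moment = datetime(2021, 3, 14, 15, 9, 26)

# Plain datetime: components are attributes or methods.
parts = {
    "year": moment.year,
    "month": moment.month,
    "day": moment.day,
    "date": moment.date(),           # datetime.date without the time part
    "weekday_number": moment.weekday(),  # Monday == 0 ... Sunday == 6
    "month_name": moment.strftime("%B"),  # locale-dependent name
}

# pandas: the .dt accessor works on a whole datetime column.
df = pd.DataFrame(
    {"created_at": pd.to_datetime(["2021-03-14 15:09:26", "2021-12-25 08:00:00"])}
)
df["year"] = df["created_at"].dt.year
df["month"] = df["created_at"].dt.month
df["day"] = df["created_at"].dt.day
df["day_name"] = df["created_at"].dt.day_name()
df["date"] = df["created_at"].dt.date

print(parts)
print(df)
```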