In this article, we will download a dataset from an online resource and create a working visualization of that data. As we discussed in the previous article, huge amounts of data are added to the internet every second, and an incredible amount and variety of that data can be found online. The ability to analyze data allows you to discover patterns and connections in it.

We will access and visualize data stored in CSV format. We will use Python's csv module to process weather data and analyze high and low temperatures over a period of time in two different locations. Then we will use matplotlib to generate a chart.

By the end of this article, you'll be able to work with different datasets and build complex visualizations. Being able to access and visualize online data is essential, because the internet offers a wide variety of real-world datasets.

The CSV File Format

CSV stands for Comma-Separated Values. As the name suggests, it is a file format in which values are separated by commas.

It is one of the simplest ways to store tabular data as text: each line of the file holds one record, and the values in that record are separated by commas. A file stored this way is called a CSV file.
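To see why it is worth using the csv module instead of simply splitting each line on commas, here is a minimal sketch using only the standard library. The row below is a hypothetical example: when a value itself contains a comma, csv.writer wraps it in double quotes, csv.reader reads it back as a single value, and a naive str.split(',') breaks it apart.

```python
import csv
import io

# Hypothetical row: the second value contains a comma of its own.
row = ['2018-01-01', 'SITKA AIRPORT, AK US', '48']

# Write the row as one CSV line; csv.writer quotes the value that contains a comma.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
print(repr(line))  # '2018-01-01,"SITKA AIRPORT, AK US",48\r\n'

# csv.reader understands the quotes and recovers the original three values...
parsed = next(csv.reader(io.StringIO(line, newline='')))
print(parsed)  # ['2018-01-01', 'SITKA AIRPORT, AK US', '48']

# ...while a plain split on commas breaks the quoted value into two pieces.
naive = line.strip().split(',')
print(len(naive))  # 4
```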

For example, here is a line of the kind of weather data we are going to work with in this article:

2014-1-6,62,43,52,18,7,-1,56,33,9,30.3,30.2,. . ., 195

We will start with a small dataset: CSV-formatted weather data for Sitka, Alaska. You can download weather datasets from https://www.wunderground.com. This is what your CSV data file will look like:

Figure: CSV data format

The first line of the file, which names the data stored in each column, is called the header.

Parsing the CSV File Headers

We can easily parse the values and extract the required information using Python's csv module. Let's start by analyzing the first line of the file, which contains the column headers.

1. Create a Python file named weather_data.py.

2. Write the following statement to import the CSV module:

import csv

3. Download the data file and save it in the same folder as weather_data.py.

4. Open the file using Python’s open function and print the headers:

filename = 'sitka_weather_2018_full.csv'
with open(filename) as f:
    reader = csv.reader(f)  # line 1
    header_row = next(reader)  # line 2
    print(header_row)  # line 3

After importing the csv module, we store the name of the file in the variable filename. We then open the file with the open() function and store the resulting file object in f.

Next, on line 1, we call the reader() function of the csv module and pass it the file object f as an argument. This function returns a reader object associated with that file.

Python's built-in next() function returns the next item from an iterator, and a reader object is an iterator over the lines of its file. So on line 2, next(reader) returns the next line of the file as a list of strings. We only need to call next() once to get the first line of the file, which normally contains the header, and we store it in the variable header_row. Line 3 then prints the header row.

Here is the output of the above code snippet:

['STATION', 'NAME', 'DATE', 'AWND', 'PGTM', 'PRCP', 'SNWD', 'TAVG', 'TMAX', 'TMIN', 'WDF2', 'WDF5', 'WSF2', 'WSF5', 'WT01', 'WT02', 'WT04', 'WT05', 'WT08']

The reader splits the first line at the commas and stores each value as an item in a list. Each header describes one column of data: for example, TMAX denotes the maximum temperature for that day.
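The steps above can be tried without downloading anything by feeding csv.reader an in-memory string through io.StringIO. This is a minimal sketch: the SAMPLE text is made up, follows the same layout as the Sitka file, and keeps only five of its columns. It also shows that after the header, the reader simply continues with the next line, and that every value comes back as a string.

```python
import csv
import io

# Made-up sample in the same layout as the Sitka file, trimmed to five columns.
SAMPLE = (
    'STATION,NAME,DATE,TMAX,TMIN\n'
    'ST001,"SITKA AIRPORT, AK US",2018-01-01,48,38\n'
    'ST001,"SITKA AIRPORT, AK US",2018-01-02,48,41\n'
)

reader = csv.reader(io.StringIO(SAMPLE, newline=''))
header_row = next(reader)      # first call: the header line, split into a list
first_data_row = next(reader)  # the reader continues from where it left off

print(header_row)      # ['STATION', 'NAME', 'DATE', 'TMAX', 'TMIN']
print(first_data_row)  # every value, including '48', is still a string
```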

Printing the Headers and Their Positions

To make the header data easier to understand, it helps to print each header together with its position in the list:

import csv

filename = 'sitka_weather_2018_full.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    for index, column_header in enumerate(header_row):
        print(index, column_header)

Output:

0 STATION
1 NAME
2 DATE
3 AWND
4 PGTM
5 PRCP
6 SNWD
7 TAVG
8 TMAX
9 TMIN
10 WDF2
11 WDF5
12 WSF2
13 WSF5
14 WT01
15 WT02
16 WT04
17 WT05
18 WT08

The enumerate() function gives us both the index and the value of each item as we loop over the list. Note that we removed the print(header_row) line, since the loop now prints a more detailed version of the header.

Here we can see that the dates and their maximum temperatures are stored in columns 2 and 8, respectively. Let's extract these.
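Hard-coding 2 and 8 works for this file, but if the column order ever changes, the script would silently read the wrong column. One alternative (our suggestion, not part of the original code) is to look the positions up by name. This sketch parses the header line from the output above and builds a name-to-index dictionary with the same enumerate() call used in the loop.

```python
import csv
import io

# The header line from the output above, parsed the same way the script does it.
HEADER_LINE = (
    'STATION,NAME,DATE,AWND,PGTM,PRCP,SNWD,TAVG,TMAX,TMIN,'
    'WDF2,WDF5,WSF2,WSF5,WT01,WT02,WT04,WT05,WT08\n'
)
header_row = next(csv.reader(io.StringIO(HEADER_LINE, newline='')))

# Build a name -> position lookup with enumerate(), as in the loop above.
column_index = {name: index for index, name in enumerate(header_row)}

date_index = column_index['DATE']
high_index = column_index['TMAX']
print(date_index, high_index)  # 2 8

# list.index() gives the same answer for a single lookup.
print(header_row.index('TMIN'))  # 9
```

In the real script, header_row already comes from next(reader), so only the column_index line and the lookups would be needed.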

Extracting and Reading Data

Now that we know which columns of data we need, let’s read in some of that data. First, we’ll read in the high temperature for each day:

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        highs.append(row[8])  # Appending high temperatures (column 8, TMAX)
    print(highs)

We create an empty list named highs. Then we loop over the remaining rows in the reader and append the high temperature at index 8 of each row to the list. The reader object picks up where it left off in the CSV file, so the loop starts with the first line of data after the header. Finally, we print the list, which looks like this:

['48', '48', '46', '42', '46', '44', '39', '36', '34', '28', '34', '41', '53', '63', '60', '54', '47', '46', '42', '45', '43', '41', '41', '40', … , ]

As we can see, the values in the list are strings. Next, we convert these strings to numbers using int() so that matplotlib can plot them as numeric values.

highs = []
for row in reader:
    if row[8] == '':
        continue  # Skip empty strings, which can't be converted to int
    high = int(row[8])  # Convert the string to an integer
    highs.append(high)  # Append the high temperature

Now our data is ready for plotting. Printing highs shows a list of integers:

[48, 48, 46, 42, 46, 44, 39, 36, 34, 28, 34, 41, 53, 63, 60, 54, 47, 46, 42, 45, 43, 41, 41, 40, 40, 41, 39, 40, 40, 39, 36, 35, 35, 34, 42, 41, 39, 42, 39, 37, 37, 40, 43, 41, 40, 38, 36, 37, 39, 39, 38, 41, 42, 41, 39, 38, 42, 39, 40,. . ., ]
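The skip-and-convert loop above can also be wrapped in a small reusable function. This is a minimal sketch: the helper name read_int_column and the sample data are hypothetical, for illustration only, but the logic is the same as in the loop: skip the header, skip empty cells, and convert the rest with int().

```python
import csv
import io

def read_int_column(csv_file, index):
    """Hypothetical helper: return the integers in column `index`, skipping blanks."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    values = []
    for row in reader:
        if row[index] == '':
            continue  # an empty cell can't be converted with int()
        values.append(int(row[index]))
    return values

# Made-up sample with a missing TMAX value on the second day.
SAMPLE = (
    'DATE,TMAX,TMIN\n'
    '2018-01-01,48,38\n'
    '2018-01-02,,41\n'
    '2018-01-03,46,40\n'
)
highs = read_int_column(io.StringIO(SAMPLE, newline=''), 1)
print(highs)  # [48, 46]
```

With the real file, the equivalent call would be read_int_column(f, 8) inside the with block, where f is the open file object.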

Plotting Data in a Temperature Chart Using Matplotlib

To visualize the temperature data, we will first create a plot of daily high temperatures using matplotlib.

Here's the complete code:

import csv

from matplotlib import pyplot as plt

filename = 'sitka_weather_2018_full.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)  # Skip the header row
    highs = []
    for row in reader:
        if row[8] == '':
            continue  # Skip days with no recorded high
        high = int(row[8])
        highs.append(high)  # Appending high temperatures

# Plot data
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(highs, c='red')

# Format plot
plt.title("Daily High Temperatures, 2018", fontsize=24)
plt.xlabel('', fontsize=16)
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()

Running the above code produces the following chart:

Figure: Data visualization from CSV format to chart
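The chart above uses list positions on the x-axis. Since we identified the date in column 2 earlier, one possible extension is to plot the highs against real dates. This is a sketch under stated assumptions: the function names are hypothetical, the DATE values are assumed to be written as year-month-day (for example 2018-01-01), and datetime.strptime() and matplotlib's fig.autofmt_xdate() are swapped in here; neither is part of the original code.

```python
import csv
from datetime import datetime

def read_dates_and_highs(lines, date_index=2, high_index=8):
    """Return (dates, highs) from CSV lines, skipping rows with no high value.

    `lines` can be an open file or any iterable of CSV text lines.
    Assumes dates are written as year-month-day, e.g. 2018-01-01.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs = [], []
    for row in reader:
        if row[high_index] == '':
            continue  # no recorded high for this day
        dates.append(datetime.strptime(row[date_index], '%Y-%m-%d'))
        highs.append(int(row[high_index]))
    return dates, highs

def plot_highs(dates, highs):
    """Plot daily highs against their dates (matplotlib imported only here)."""
    from matplotlib import pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 6), dpi=128)
    ax.plot(dates, highs, c='red')
    ax.set_title("Daily High Temperatures, 2018", fontsize=24)
    ax.set_ylabel("Temperature (F)", fontsize=16)
    fig.autofmt_xdate()  # slant the date labels so they don't overlap
    plt.show()
```

With the downloaded file, one way to use it is to open sitka_weather_2018_full.csv with open(filename, newline=''), call dates, highs = read_dates_and_highs(f) inside the with block, and then call plot_highs(dates, highs). The default indexes 2 and 8 match the DATE and TMAX positions listed earlier.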

Thank you for reading. I hope this article is helpful. If you run into any difficulty, leave a comment and I'll be happy to help.
