Read CSV files with Python

How do I read a CSV file with Python? It’s a common question: at the time of this writing there are 22,335 questions related to “Python” and “CSV” on Stack Overflow. In this article I will show you how to read CSV files with Python using a few different methods.

So what exactly is a CSV file? Well, as per the abbreviation it’s a comma separated values file. In simple terms we can think of CSV files as structured data that is written into a file with some type of delimiter to separate the values.

Python offers a few built-in tools to read CSV files, and we’ll explore some of these options in more detail.

Using the Python string split method

One of the most basic ways we can read data from a CSV file is simply to use the split method of the string data type.

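csv_data = []
with open("data.csv") as f:
    # Iterate over the file object so that every line is read exactly once
    for line in f:
        # Split the line into its fields on the comma separator
        csv_data.append(line.split(','))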

In the code above we open a csv file (data.csv) and read the entire file line by line into a Python list. The split method of the string data type allows us to split each line by the comma separator.

While this method works, its biggest drawback is that the last field of each line we read keeps the newline character at the end, and we have to deal with that ourselves.

$ python readcsv.py
['1', 'Eleanor', 'Torres', '6/15/1966', '53', 'Male', '$3877.89', '36766286', 'nudovug', 'weluwi', 'goccait', 'gizco', 'cegtatan\n']

As you can see in the output above, the last item in the list has a newline character at the end ('cegtatan\n').

It’s fairly trivial to deal with this: we could either strip the newline from each line as we’re processing it or simply handle it when we’re going to use the data. I won’t go into detail on how exactly to handle this, as I believe there are more elegant ways to parse CSV files and this shouldn’t be your go-to method.
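That said, here is a minimal sketch of the first option, stripping the newline while each line is processed. The in-memory sample (an io.StringIO holding two shortened records) is my own stand-in for data.csv so that the snippet runs on its own.

```python
import io

# Assumption: a small in-memory stand-in for open("data.csv"),
# with only three fields per record to keep the example short.
sample = io.StringIO("1,Eleanor,Torres\n2,Leroy,Zimmerman\n")

csv_data = []
for line in sample:
    # Remove the trailing newline first, then split on the comma separator.
    csv_data.append(line.rstrip("\n").split(","))

print(csv_data)
```

With a real file you would replace the sample object with the file object returned by open().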

Using the Python built-in csv module

Python ships with a built-in module for reading CSV files. You simply need to import the csv module and you’re ready to go.

The csv module has a couple of different ways to handle CSV files, and I find the DictReader class the most useful.

import csv

csvfile = 'data.csv'

with open(csvfile, newline='') as fp:
    # DictReader uses the first row of the file as the field names
    csv_data = csv.DictReader(fp)
    header = csv_data.fieldnames
    # Every remaining row becomes a dictionary keyed by field name
    rows = list(csv_data)

As seen in the snippet above, using the built-in module is very simple. Let’s break down what we are doing here in more detail.

The with statement opens the file using a context manager, so the file is closed automatically when the block ends. Once we have a file object, we can pass it to the csv module.

csv.DictReader gives us an iterator that treats the first row as the field names and then yields one OrderedDict per remaining row (on Python 3.8 and later it yields a regular dict). The field names are available through the fieldnames attribute; calling next() on the reader would return the first data record instead.

A bonus of using the csv module is that newline characters are taken care of by the module.

If you haven’t seen an OrderedDict before it may look a little strange, but think of it as a dictionary that remembers the order in which its keys were inserted.

print("Header: %s" % header)
print(rows[0])

Header: ['id', 'first', 'last', 'birthday', 'age', 'gender', 'salary', 'timestamp', 'tag1', 'tag2', 'tag3', 'tag4', 'tag5']
OrderedDict([('id', '1'), ('first', 'Eleanor'), ('last', 'Torres'), ('birthday', '6/15/1966'), ('age', '53'), ('gender', 'Male'), ('salary', '$3877.89'), ('timestamp', '36766286'), ('tag1', 'nudovug'), ('tag2', 'weluwi'), ('tag3', 'goccait'), ('tag4', 'gizco'), ('tag5', 'cegtatan')])

We created a header list which is separate from our data. I like to do this so I can use the field names in loops.
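As a rough illustration of using the field names in a loop, the sketch below prints every field of the first record. The in-memory sample and its three columns are my own shortened stand-in for data.csv.

```python
import csv
import io

# Assumption: a shortened in-memory stand-in for data.csv.
sample = io.StringIO("id,first,last\n1,Eleanor,Torres\n2,Leroy,Zimmerman\n")

reader = csv.DictReader(sample)
header = reader.fieldnames  # list of column names taken from the first row
rows = list(reader)

# Walk over the field names and look each one up in the first record.
for field in header:
    print(f"{field}: {rows[0][field]}")
```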

Our rows variable is a regular list, so to access the data for the first record we use a standard list index. Each record is an OrderedDict, so to access a field we use the field name as the key, just as we would with a dictionary.

For example, rows[100]['gender'] would return the gender of the record at index 100.

One drawback I find with this method is that you can’t easily deal with the data in a columnar fashion. If for example you wanted to get the total salary for the first 10 people, you would need to approach it as follows:

  • Iterate over the first 10 rows
  • Access the data in the salary column and either store all the values in a temporary list and sum them afterwards, or add each value to a running total as you’re iterating over the rows.
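The two steps above could be sketched roughly as follows, using the running total variant. The total_salary helper and the shortened in-memory sample are hypothetical names and data of my own; the sketch also assumes every salary is stored as text with a leading dollar sign, as in the sample output.

```python
import csv
import io
from itertools import islice

# Assumption: a shortened in-memory stand-in for data.csv,
# built from three records shown in this article.
sample = io.StringIO(
    "id,first,last,salary\n"
    "1,Eleanor,Torres,$3877.89\n"
    "2,Leroy,Zimmerman,$1511.04\n"
    "3,Ada,Vaughn,$2783.30\n"
)


def total_salary(rows, limit=10):
    """Sum the salary field of the first `limit` rows (hypothetical helper)."""
    total = 0.0
    for row in islice(rows, limit):
        # Salaries are text such as "$3877.89", so drop the "$" before converting.
        total += float(row["salary"].lstrip("$"))
    return total


total = total_salary(csv.DictReader(sample))
print(round(total, 2))
```

islice stops after the first 10 rows, or earlier if the file is shorter, so the rest of the file is never read.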

Using Pandas to read CSV files

Finally, we get to what I find the most convenient way to read CSV files with Python. We will use the Pandas library to read a file into a DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')
print(df.columns)
print(df.head(10))

It’s as simple as that using Pandas. We now have a DataFrame that we can work with.

The output of our script above will give us the column headers as well as the first 10 rows.

Index(['id', 'first', 'last', 'birthday', 'age', 'gender', 'salary',
       'timestamp', 'tag1', 'tag2', 'tag3', 'tag4', 'tag5'],
      dtype='object')
   id    first       last    birthday  age  gender    salary   timestamp      tag1      tag2     tag3      tag4      tag5
0   1  Eleanor     Torres   6/15/1966   53    Male  $3877.89    36766286   nudovug    weluwi  goccait     gizco  cegtatan
1   2    Leroy  Zimmerman   6/22/1958   44    Male  $1511.04   465854377      ejje       rov       ol     uskir    jomizi
2   3      Ada     Vaughn   8/31/1972   26    Male  $2783.30   663351321       esi     dofij    ucuef       rug        ke
3   4    Belle       Reed    5/6/2001   24    Male  $7546.58   549575210  zovidlun     atepi      owi       gam    molcuh
4   5    Leroy      Hayes    5/5/1982   38    Male  $5391.78   172739447     hitwe     ojivi  vapbiza  gahmibti  uzperpug
5   6    Shawn   Alvarado    6/7/1971   34  Female  $2250.64  1522506759     jaiba   rajbaju  nomliac      leiw  cidhiewi
6   7    Helen     Brewer  10/30/2002   35    Male  $7480.43   167041895       sib  ecejoege      wac     utceb    hosfew
7   8   Rodney  Zimmerman  10/31/1962   23  Female  $4387.22  1337286313    hefwem  zajcefiw       po       gar     daere
8   9    Gavin    Clayton   7/23/1994   56    Male  $7292.20  1194400903      vewi       oda    hovag     libir      supi
9  10    Della     Bowman   4/20/1990   29  Female  $3695.18   344285573   ajbalfe       pem  cedzeon      wapi        ke
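Coming back to the salary total from the previous section, a DataFrame lets us treat the salary values as a column. The snippet below is only a sketch: the in-memory sample is my own shortened stand-in for data.csv, and it assumes every salary is text with a leading dollar sign.

```python
import io

import pandas as pd

# Assumption: a shortened in-memory stand-in for data.csv.
sample = io.StringIO(
    "id,first,last,salary\n"
    "1,Eleanor,Torres,$3877.89\n"
    "2,Leroy,Zimmerman,$1511.04\n"
    "3,Ada,Vaughn,$2783.30\n"
)

df = pd.read_csv(sample)

# Strip the leading "$" from the whole column and convert it to numbers.
salary = df["salary"].str.lstrip("$").astype(float)

# Total salary of the first 10 people (fewer if the file is shorter).
total = salary.head(10).sum()
print(round(total, 2))
```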

There is a lot more you can do with Pandas, but I will leave that for another tutorial.

If you want to expand your Python toolkit further, be sure to check out my web scraping with Python post.