
How do I read a CSV file with Python? It’s a common question: at the time of this writing there are 22,335 questions related to “Python” and “CSV” on Stack Overflow. In this article I will show you how to read CSV files with Python using a few different methods.
So what exactly is a CSV file? Well, as the abbreviation suggests, it’s a comma-separated values file. In simple terms, a CSV file stores structured data as plain text: each line is a record, and the values within a line are separated by a delimiter, usually a comma.
Python offers a few ways to read CSV files, from plain string methods to the built-in csv module and the third-party Pandas library; we’ll explore each of these options in more detail.
Using the Python string split method
One of the most basic ways we can read data from a CSV file is simply to use the split method of the string data type.
csv_data = []
with open("data.csv") as f:
    # Iterate over the file one line at a time
    for line in f:
        # Split each line on commas; the trailing newline stays on the last field
        csv_data.append(line.split(','))
In the code above we open a CSV file (data.csv) and read it line by line, appending each line’s values to a Python list. The split method of the string data type splits each line on the comma separator.
While this method works, its biggest drawback is that each line we read contains a newline character at the end, and we have to deal with that ourselves.
$ python readcsv.py
['1', 'Eleanor', 'Torres', '6/15/1966', '53', 'Male', '$3877.89', '36766286', 'nudovug', 'weluwi', 'goccait', 'gizco', 'cegtatan\n']
As you can see in the output above, the last item in the list has a newline character at the end (‘cegtatan\n’).
It’s fairly trivial to deal with this: we could either strip the newline from each line as we process it, or handle it later when we use the data. I won’t go into detail about exactly how to handle this, as I believe there are more elegant ways to parse CSV files and this shouldn’t be your go-to method.
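For completeness, here is a minimal sketch of the strip-then-split approach, using a hypothetical in-memory sample (via io.StringIO) in place of data.csv. It also shows a second limitation of plain split: a quoted field that contains a comma gets cut in two, while the csv module’s csv.reader keeps it intact.

```python
import csv
import io

# Hypothetical sample standing in for data.csv; the last record has a quoted comma
sample = (
    "id,name,salary\n"
    "1,Eleanor Torres,$3877.89\n"
    '2,"Zimmerman, Leroy",$1511.04\n'
)

naive_rows = []
for line in io.StringIO(sample):
    # rstrip('\n') drops the trailing newline before splitting on commas
    naive_rows.append(line.rstrip('\n').split(','))

print(naive_rows[1])  # ['1', 'Eleanor Torres', '$3877.89']
print(naive_rows[2])  # ['2', '"Zimmerman', ' Leroy"', '$1511.04']

# csv.reader understands quoting, so the embedded comma stays inside one field
csv_rows = list(csv.reader(io.StringIO(sample)))
print(csv_rows[2])  # ['2', 'Zimmerman, Leroy', '$1511.04']
```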
Using the Python built-in CSV module
Python ships with a built-in csv module for reading CSV files; you simply need to import csv and you’re ready to go.
The csv module offers a couple of different ways to read CSV files, such as csv.reader and csv.DictReader; I find DictReader the most useful.
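import csv
csvfile = 'data.csv'
rows = []
with open(csvfile, newline='') as fp:
    csv_data = csv.DictReader(fp)
    # DictReader reads the header row itself and exposes the column names as .fieldnames
    header = csv_data.fieldnames
    # Each remaining line becomes one dictionary keyed by column name
    rows += [row for row in csv_data]
As seen in the snippet above, using the built-in module is very simple. Let’s break down what we are doing here in more detail.
In line 4 we open the file with a with statement (a context manager), which closes the file automatically when the block ends; the csv documentation recommends passing newline='' when opening files for the csv module. Once we have a file object, we can hand it to the csv module.
Using csv.DictReader gives us a reader object; iterating over it yields one row at a time as a dictionary that maps each column name from the header row to that row’s value. In Python 3.6 and 3.7 each row is an OrderedDict; from Python 3.8 onward it is a regular dict.
A bonus of using the csv module is that newline characters are taken care of by the module.
If you haven’t seen an OrderedDict before, it may look a little strange: it is a dictionary that remembers the order in which its keys were inserted, and its printed form looks like a list of (key, value) tuples.
print("Header: %s" % header)
print(rows[1])  # rows[1] is the second record (id 2)
Header: ['id', 'first', 'last', 'birthday', 'age', 'gender', 'salary', 'timestamp', 'tag1', 'tag2', 'tag3', 'tag4', 'tag5']
OrderedDict([('id', '2'), ('first', 'Leroy'), ('last', 'Zimmerman'), ('birthday', '6/22/1958'), ('age', '44'), ('gender', 'Male'), ('salary', '$1511.04'), ('timestamp', '465854377'), ('tag1', 'ejje'), ('tag2', 'rov'), ('tag3', 'ol'), ('tag4', 'uskir'), ('tag5', 'jomizi')])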
We keep the header list separate from our data; I like to do this so I can use the field names in loops.
rows is a list with one dictionary per record, so to access a record we use a standard list index (rows[0] is the first record). To access a field within that record, we use the field name as we would with any dictionary.
For example, rows[100]['gender'] returns the gender of the record at index 100.
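To try these ideas without data.csv, here is a minimal sketch that feeds a small hypothetical sample to csv.DictReader through io.StringIO. It shows that the header row is consumed by the reader and exposed as fieldnames rather than returned as data, that the header list can drive a loop over one record’s fields, and that a list index plus a field name picks out a single value.

```python
import csv
import io

# Hypothetical sample standing in for data.csv (a few columns only)
sample = (
    "id,first,gender\n"
    "1,Eleanor,Male\n"
    "2,Leroy,Male\n"
    "6,Shawn,Female\n"
)

with io.StringIO(sample) as fp:
    csv_data = csv.DictReader(fp)
    header = csv_data.fieldnames      # ['id', 'first', 'gender']
    rows = [row for row in csv_data]  # three data rows; the header is not one of them

# Use the header list to walk every field of one record
for field in header:
    print(f"{field}: {rows[2][field]}")
# id: 6
# first: Shawn
# gender: Female

# Index into the list of rows, then look up a field by name
print(rows[2]['gender'])  # Female
```

On Python 3.8 and later each row prints as a plain dict; on 3.6 and 3.7 it is an OrderedDict, but lookups by field name work the same way.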
One drawback I find with this method is that you can’t easily work with the data in a columnar fashion. If, for example, you wanted to get the total salary for the first 10 people, you would need to approach it as follows:
- Iterate over the first 10 rows
- Access the value in the salary column, convert it from a string such as '$3877.89' to a number, and then either collect the values in a temporary list and sum them, or keep a running total as you iterate over the rows.
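The two steps above can be sketched with the csv module as follows. This is a minimal sketch under a few assumptions: a small hypothetical in-memory sample replaces data.csv, every salary looks like '$3877.89', and salary_to_float is a hypothetical helper name. itertools.islice limits the loop to the first 10 rows, and the running-total variant is used.

```python
import csv
import io
from itertools import islice

# Hypothetical sample; salaries are stored as text such as "$3877.89"
sample = (
    "id,first,salary\n"
    "1,Eleanor,$3877.89\n"
    "2,Leroy,$1511.04\n"
    "3,Ada,$2783.30\n"
)


def salary_to_float(value):
    """Hypothetical helper: turn '$3877.89' into 3877.89 (assumes this format)."""
    return float(value.lstrip('$'))


total = 0.0
reader = csv.DictReader(io.StringIO(sample))
# islice stops after the first 10 rows (this sample only has 3)
for row in islice(reader, 10):
    total += salary_to_float(row['salary'])

print(round(total, 2))  # 8172.23
```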
Using Pandas to read CSV files
Finally, we get to what I find the most convenient way to read CSV files with Python: the third-party Pandas library (installed with pip install pandas), which reads a file into a DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.columns)
print(df.head(10))
It’s as simple as that: with Pandas we now have a DataFrame that we can work with.
The output of our script above will give us the column headers as well as the first 10 rows.
Index(['id', 'first', 'last', 'birthday', 'age', 'gender', 'salary', 'timestamp', 'tag1', 'tag2', 'tag3', 'tag4', 'tag5'], dtype='object')
   id    first       last    birthday  age  gender    salary   timestamp      tag1      tag2     tag3      tag4      tag5
0   1  Eleanor     Torres   6/15/1966   53    Male  $3877.89    36766286   nudovug    weluwi  goccait     gizco  cegtatan
1   2    Leroy  Zimmerman   6/22/1958   44    Male  $1511.04   465854377      ejje       rov       ol     uskir    jomizi
2   3      Ada     Vaughn   8/31/1972   26    Male  $2783.30   663351321       esi     dofij    ucuef       rug        ke
3   4    Belle       Reed    5/6/2001   24    Male  $7546.58   549575210  zovidlun     atepi      owi       gam    molcuh
4   5    Leroy      Hayes    5/5/1982   38    Male  $5391.78   172739447     hitwe     ojivi  vapbiza  gahmibti  uzperpug
5   6    Shawn   Alvarado    6/7/1971   34  Female  $2250.64  1522506759     jaiba   rajbaju  nomliac      leiw  cidhiewi
6   7    Helen     Brewer  10/30/2002   35    Male  $7480.43   167041895       sib  ecejoege      wac     utceb    hosfew
7   8   Rodney  Zimmerman  10/31/1962   23  Female  $4387.22  1337286313    hefwem  zajcefiw       po       gar     daere
8   9    Gavin    Clayton   7/23/1994   56    Male  $7292.20  1194400903      vewi       oda    hovag     libir      supi
9  10    Della     Bowman   4/20/1990   29  Female  $3695.18   344285573   ajbalfe       pem  cedzeon      wapi        ke
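Because a DataFrame stores each column as a unit, the salary total that needed a row-by-row loop with the csv module becomes a column operation. A minimal sketch, assuming (as in the output above) that salary values carry a leading '$', so pandas reads the column as text; a small hypothetical sample in io.StringIO stands in for data.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.csv
sample = (
    "id,first,salary\n"
    "1,Eleanor,$3877.89\n"
    "2,Leroy,$1511.04\n"
    "3,Ada,$2783.30\n"
)

df = pd.read_csv(io.StringIO(sample))

# The '$' prefix means salary is parsed as text, so strip it and convert to float
salary = df['salary'].str.lstrip('$').astype(float)

# Total salary for the first 10 people (this sample only has 3 rows)
total = salary.head(10).sum()
print(round(total, 2))
```

The whole column is cleaned and summed in two expressions, without writing an explicit loop over the rows.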
There is a lot more you can do with Pandas, but I will leave that for another tutorial.
If you want to further expand your Python toolkit, be sure to check out my web scraping with Python post.