Buomsoo Kim

Importing local files in Google Colab

|

For Google Colab starters: Start machine learning with Google Colaboratory

As I mentioned in above post for Colab starters, Google Colab is an EASY, FREE, ACCESSIBLE, and SOCIAL way to code Python and implement machine learning algorithms.

In this post, we explore how to import files (csv, txt, or json format) in Colab.

Importing CSV / TXT files

CSV or TXT files are most common formats for sharing data. Importing CSV and TXT files are largely similar.

1. Upload file

To upload file, files module under google.colab should be imported in advance. Then use files.upload() function to upload CSV or TXT file. You could select the file by clicking the grey button and choose the file by clicking.

from google.colab import files
uploaded = files.upload()

Uploaded file is in Python dictionary format, with key as name of uploaded file and corresponding value as the contents of the file.

Note that in this case, each line is separated by \r\n.

2. Decode file

One way is to directly decode the contents using decode() function and separate each sentence using split() function. Result is a list with each element as contents in each line of the dataset.

file_name = "data.txt"
uploaded[file_name].decode("utf-8")
uploaded[file_name].decode("utf-8").split("\r\n")

3. Parse data

We can further separate each features in line using split() function again.

data = uploaded[file_name].decode("utf-8").split("\r\n")

for i in range(len(data)):
  data[i] = data[i].split(",")

print(data)

Using Pandas

Another way is to use pandas and io packages. This is slightly simpler with high-level functions. First convert dataset into StringIO object.

import pandas as pd
import io

io.StringIO(uploaded["data.txt"].decode("utf-8"))

Then, parse the dataset using read_csv() function. Note that result is pandas dataframe, instead of 2-D list like above method.

pd.read_csv(io.StringIO(uploaded["data.txt"].decode("utf-8")))


Importing JSON files

JSON is another common file format to share datasets. When importing JSON files in Python, we fall back on json library.

1. Upload data

import json
from google.colab import files
uploaded = files.upload()

2. Decode file

Decode and create StringIO object.

file_name = "data.json"
io.StringIO[file_name].decode("utf-8")

3. Parse file

JSON file can be easily parsed using json.loads() function. Result is Python dictionary, which is pretty similar data structure to JavaScript Object.

json.loads(uploaded[file_name].decode("utf-8"))


Code

Code in this post can be exhibited by below link. \

And more

In this post, I have shown you ways to upload local files in Google Colab. However, this is not the only way, and not the easiest either. As you know, Colab is one of the applications embedded in Google Drive. By taking advantage of such fact, we can easily import files that are in your Google Drive. In next post, I will cover how to import files from Google Drive.