How to download Pandas Dataframe as Excel or CSV in Django?
Implementation of downloading a file using Django and Pandas with information about HTTP Response.
Introduction
Pandas is an essential tool for Python developers doing data analysis, but analysis has little value if we cannot deliver its insights to the end user. Used together, Django and Pandas let us build applications quickly that also make good use of the data collected from users. I recently came across one such use case: the user applies some filters on the frontend, the backend processes the data, and the aggregated result is returned to the user as an Excel or CSV file. Below, I describe how I went about it.
Prerequisites
To follow along with this article, I assume you already have a Django project that uses the Pandas library.
Requirements
In addition to an existing Django and Pandas project, you will need to install Openpyxl.
Openpyxl
Openpyxl is a library that allows you to read and write Office Open XML formats such as Excel 2010 xlsx/xlsm/xltx/xltm files.
pip install openpyxl
Once installed, you can create Excel files natively from Python with simple code like this.
from openpyxl import Workbook
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "ID"
sheet["B1"] = "Key"
sheet["A2"] = 1
sheet["B2"] = "Test Key"
workbook.save(filename="test.xlsx")
This will create a file called "test.xlsx". Try it out yourself.
HTTP Response
As you know, Django communicates with the client through HTTP requests and responses. Therefore, all data must be converted to a byte string before it is sent to the client.
A typical HTTP response looks something like this.
For more information, check out this article by Mozilla about HTTP Messages.
As you can see in the above picture, every HTTP response has a body section, which is a byte array, and a header called Content-Type that specifies what type of content it is and which character set the browser should use to decode the bytes. In the above case, the server is sending text/html data and the browser needs to use the iso-8859-1 charset.
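To make the role of the charset concrete, here is a small sketch using only the Python standard library. The sample string is an assumption chosen for illustration; it shows that the same text becomes different bytes under different character sets, and that decoding with the wrong one fails or garbles the text.

```python
# The same text produces different bytes under different character sets.
text = "café"

latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Decoding with the charset named in Content-Type recovers the text.
assert latin1_bytes.decode("iso-8859-1") == text

# Decoding with the wrong charset either raises an error...
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# ...or silently yields garbled text.
garbled = utf8_bytes.decode("iso-8859-1")
print(decoded_ok, garbled)
```

This is why the server has to state the charset for text content: the bytes alone do not say how they should be read.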
In our case, we are sending the bytes of an Excel spreadsheet. After some searching, I found that the Content-Type value for that format is the following (source: Mozilla article about Common MIME Types).
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
What are MIME Types?
MIME is short for Multipurpose Internet Mail Extensions which is a standard that indicates the nature and format of a document sent via the Internet.
Important: Browsers use the MIME type, not the file extension, to determine how to process a URL, so web servers must send the correct MIME type in the response's Content-Type header. If this is not correctly configured, browsers are likely to misinterpret the contents of files and sites will not work correctly, and downloaded files may be mishandled.
For more information, check out MDN Article about MIME Types
The structure of a MIME Type according to IANA (Internet Assigned Numbers Authority) is like this.
type/subtype
Therefore, in the MIME type for Excel files, the type is "application" and everything after the slash is the subtype. The "application" type covers any kind of binary data that cannot be directly decoded into a human-readable form (text, HTML, etc.). Files of this type, such as PDFs, Zip files, and Excel files, require some sort of external or third-party software to decode and read them.
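As a rough illustration of the type/subtype structure, the sketch below splits a Content-Type value with the standard library's email.message.Message class. The helper name parse_content_type is hypothetical and only exists for this example.

```python
from email.message import Message
from typing import Optional, Tuple

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def parse_content_type(value: str) -> Tuple[str, str, Optional[str]]:
    """Split a Content-Type value into (type, subtype, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_maintype(),
        msg.get_content_subtype(),
        msg.get_content_charset(),  # None when no charset parameter is given
    )


print(parse_content_type("text/html; charset=iso-8859-1"))
print(parse_content_type(XLSX_MIME))
```

The Excel MIME type carries no charset parameter because its body is binary data, not text to be decoded.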
Download Excel Files using Django and Pandas
Finally, we will see how we can send a Pandas dataframe to the client as an Excel file.
I assume you already have your own Pandas code, so I will abstract the dataframes behind variables (data1 and data2 in the snippets, data in the final view).
First, we will import BytesIO from the built-in io module so that we can write the Excel file as a byte array.
BytesIO is a class that lets us write data to a buffer in system memory instead of writing it to a file on disk.
from io import BytesIO
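Before involving Pandas, a minimal sketch of how BytesIO behaves may help. The sample bytes are placeholders; the point is that the buffer acts like a file, and that getvalue() has to be called while the buffer is still open.

```python
from io import BytesIO

with BytesIO() as buffer:
    # Write to the buffer exactly as you would write to a binary file.
    buffer.write(b"ID,Key\n")
    buffer.write(b"1,Test Key\n")
    # Copy the contents out while the buffer is still open.
    payload = buffer.getvalue()

# The copied bytes survive, but the closed buffer can no longer be read.
try:
    buffer.getvalue()
    closed_error = False
except ValueError:
    closed_error = True

print(payload, closed_error)
```

This is the reason the response in the view has to be built inside the with block, not after it.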
Then we will use a Python context manager to open a byte buffer into which we can write the Excel file. Context managers let you acquire and release a resource, here the memory buffer, precisely when you want to.
with BytesIO() as b:
    with pd.ExcelWriter(b) as writer:
        # You can add multiple dataframes to one Excel file,
        # each on its own sheet, using the sheet_name parameter
        data1.to_excel(writer, sheet_name="DATA 1", index=False)
        data2.to_excel(writer, sheet_name="DATA 2", index=False)
    # The writer is closed here, so the workbook has been written into b
    filename = "analytics_data.xlsx"
    # HttpResponse is imported from django.http
    res = HttpResponse(
        b.getvalue(),  # Gives the byte string held in the buffer
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
    )
    res['Content-Disposition'] = f'attachment; filename={filename}'
    return res
We use context managers to open and close files because they release the resource automatically, even when an error occurs, which is more reliable than closing it manually after use. If you already know how to export a set of dataframes as an Excel file, that code should look like this.
with pd.ExcelWriter('data.xlsx') as writer:
    data1.to_excel(writer, sheet_name="DATA 1", index=False)
    data2.to_excel(writer, sheet_name="DATA 2", index=False)
We will simply take this code and wrap it in a BytesIO context manager, which gives us a memory buffer b that behaves like a file. We pass that buffer to pd.ExcelWriter() in place of the file path, and the rest of the code inside that context manager stays the same.
with BytesIO() as b:
    with pd.ExcelWriter(b) as writer:
        data1.to_excel(writer, sheet_name="DATA 1", index=False)
        data2.to_excel(writer, sheet_name="DATA 2", index=False)
Now that we have our Excel data written to a memory buffer, we can get it as a byte string using the method b.getvalue(), which we pass to Django's HttpResponse along with the appropriate content_type that we discussed earlier.
filename = "analytics_data.xlsx"
# HttpResponse is imported from django.http
res = HttpResponse(
    b.getvalue(),  # Gives the byte string held in the buffer
    content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
)
In addition to the Content-Type header, we also need to provide another header called Content-Disposition, which tells the browser whether to display the data inline in the browser or to download it and save it as a local file (an attachment). We can also pass a filename with this header.
res['Content-Disposition'] = f'attachment; filename={filename}'
return res
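The f-string form works for a simple ASCII name such as analytics_data.xlsx. If a filename may contain spaces or non-ASCII characters, one possible approach is to quote the name and add the percent-encoded filename* parameter described in RFC 6266. The helper below is a hypothetical sketch, not part of Django, and uses only the standard library.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment Content-Disposition value (hypothetical helper)."""
    # ASCII fallback: drop non-ASCII characters, double quotes, and backslashes.
    fallback = filename.encode("ascii", "ignore").decode("ascii")
    fallback = fallback.replace('"', "").replace("\\", "")
    # Percent-encoded UTF-8 form for clients that understand filename*.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("analytics data.xlsx"))
# attachment; filename="analytics data.xlsx"; filename*=UTF-8''analytics%20data.xlsx
```

The result would then be assigned to the header in the same way, for example res['Content-Disposition'] = content_disposition(filename).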
Here is the complete code snippet for the Django view that downloads the Excel file.
from io import BytesIO

import pandas as pd
from django.http import HttpResponse


def download_analytics_endpoint(request):
    with BytesIO() as b:
        data = get_analytics_data()  # your own function that returns a dataframe
        with pd.ExcelWriter(b) as writer:
            data.to_excel(writer, sheet_name="Data", index=False)
        # Read the buffer after the writer has closed but before the buffer does
        filename = "analytics_data.xlsx"
        res = HttpResponse(
            b.getvalue(),
            content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        )
        res['Content-Disposition'] = f'attachment; filename={filename}'
        return res
That is all. Now you just have to create a Django URL for this view. The same approach works for any other file you want to offer as a download: convert the file to a byte string and send it with the correct Content-Type header.