The Tabula way
What is Tabula and how does it work?
Tabula is offline software, available under the MIT open-source license for Windows, Mac and Linux, that lets you upload a PDF file and extract a selection of rows and columns from any table it contains.
Getting Tabula
Tabula is available for the three major operating systems. Download it for Windows, macOS or Linux. It runs in a Java environment, so you will need to install the Java Runtime Environment if you don't already have it.
Note: Tabula for Mac OS X comes with Java.
Tips for installing
Once the program is downloaded, you are halfway toward your first table extraction. Follow these steps to get Tabula set up and ready to go.
- The downloaded file is a zip archive, so extract the folder inside it
- Go into the extracted folder and run the Tabula program in it
- It should open automatically in your browser (Chrome, Firefox and Safari are all confirmed to work)
- If it does not launch in your browser, open this URL yourself: http://localhost:8080
You should now see the user interface of Tabula.
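If the page does not load, it can help to check whether anything is listening at that address at all. The snippet below is a minimal sketch using only the Python standard library; the function name `port_is_open` is made up for this illustration, and a positive result only means that some program accepts connections on that port, not necessarily Tabula.

```python
import socket
from urllib.parse import urlsplit


def port_is_open(url="http://localhost:8080", timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port or 80  # fall back to the default HTTP port
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

Calling `port_is_open()` with no arguments tests the default address given in the steps above. If it returns False, Tabula is probably not running yet.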
Extracting your table
Tabula is a pretty easy application to use once installed. These steps should see you through the process:
- Upload your PDF file: Run the application file in your extracted folder. Tabula should launch and show the interface in Figure 1 below. Click on the Browse button, highlighted in the image, to select the PDF you want to extract from. Here is an example PDF that you could use. The uploaded file should appear on the right-hand side, as shown in Figure 1.

- Viewing the PDF document for extraction: From the same screen seen in Figure 1, click on your uploaded file and you should get a view like Figure 2 below. Select the section of the table you want to extract, or select all if you are extracting the full table. Note: you can always adjust your selection.

- Exporting the data: As soon as you have made your selection, your data should appear in a screen similar to Figure 3 below. You can copy it to the clipboard and paste it wherever you like, or download a CSV file, which can be opened in any spreadsheet application (Microsoft Excel, LibreOffice Calc, Google Sheets…). Simple and easy!
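The downloaded CSV can also be read by a script instead of a spreadsheet. The following is a minimal sketch using Python's standard `csv` module. The function name `load_tabula_csv` and the file name in the usage note are hypothetical, and the sketch assumes the file is UTF-8 encoded.

```python
import csv


def load_tabula_csv(path):
    """Read a CSV file exported from Tabula into a list of rows.

    Each row is a list of strings. Rows in which every cell is
    empty are dropped.
    """
    # Assumption: the export is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return [row for row in rows if any(cell.strip() for cell in row)]
```

For example, `header, *records = load_tabula_csv("my-table.csv")` separates the first row from the rest, assuming the first row of your selection was the table header.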

The limits of Tabula
As great as Tabula is, it has some shortcomings.
- It does not handle multi-line rows or merged cells well.
- Tabula cannot extract data from a scanned PDF document. It only works on text-based PDFs.
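For the multi-line row problem, some clean-up after export is sometimes possible. The sketch below is not a Tabula feature. It is a hypothetical post-processing step that assumes a wrapped row shows up in the exported CSV as an extra row whose first cell is empty, and it folds such rows into the row above. Check your own export before relying on that assumption.

```python
def merge_continuation_rows(rows):
    """Fold rows whose first cell is empty into the row above them.

    Assumption: an empty first cell marks the continuation of a
    multi-line row. This is a heuristic, not Tabula behaviour.
    """
    merged = []
    for row in rows:
        is_continuation = bool(merged) and bool(row) and not row[0].strip()
        if not is_continuation:
            merged.append(list(row))
            continue
        previous = merged[-1]
        # Pad the previous row if the continuation row is wider.
        previous.extend([""] * (len(row) - len(previous)))
        for index, cell in enumerate(row):
            text = cell.strip()
            if text:
                # Join the wrapped text onto the cell above it.
                previous[index] = f"{previous[index]} {text}".strip()
    return merged


rows = [
    ["Name", "Notes"],
    ["Alice", "first line"],
    ["", "second line"],
    ["Bob", "ok"],
]
print(merge_continuation_rows(rows))
# [['Name', 'Notes'], ['Alice', 'first line second line'], ['Bob', 'ok']]
```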
Pick one of your own PDF files and see how the extraction goes. For more information, see the references below.
Tabula and command line
If you are comfortable with the command line and would like to run Tabula on a batch of similar documents, you can use the tabula-extractor library directly. Full details are available here: https://github.com/tabulapdf/tabula-extractor/wiki/Using-the-command-line-tabula-extractor-tool
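A batch run could be driven from a short script like the sketch below. It assumes that the command-line tool is installed as an executable named `tabula` and that it accepts an `--outfile` option followed by the PDF path. Both are assumptions to verify against the wiki page linked above, and the function names are made up for this illustration.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_dir, out_dir, executable="tabula"):
    """Build one command per PDF in pdf_dir, writing a CSV into out_dir.

    Assumption: the tool is invoked as
    `tabula --outfile <csv> <pdf>`; adjust to match the wiki.
    """
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_csv = Path(out_dir) / (pdf.stem + ".csv")
        commands.append([executable, "--outfile", str(out_csv), str(pdf)])
    return commands


def run_batch(pdf_dir, out_dir):
    """Run the extraction for every PDF, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(pdf_dir, out_dir):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it easy to print the commands and inspect them before anything is actually run.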
