Extracting Data from PDFs using Tabula

PDFs come in all shapes and sizes. If you’re facing a nicely formatted PDF that is not a scanned image, give Tabula a shot to extract the tables it contains. How? Read the short walkthrough below.

You’ll need Tabula installed on your computer and a web browser.

Walkthrough: Extracting data from PDF tables

  1. Download the PDF from: http://www.unhabitat.org/pmss/getElectronicVersion.aspx?nr=3387&alt=1

  2. Start Tabula (usually by double-clicking the Tabula icon).

  3. Point your browser to the local address where Tabula is running.

  4. Choose the PDF you downloaded in step 1 and click Submit.

  5. Wait until the PDF has fully loaded.

  6. Scroll down to page 167; we’ll extract the table on that page.

  7. Click and drag to draw a selection box around the table.

  8. A window will pop up showing a preview of the data Tabula extracted from your selection.

  9. Now download the data as CSV.

  10. Fantastic, you liberated the table from the PDF. Quick and easy, wasn’t it?
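If you want to keep working with the table in a script once you have the CSV, a little tidying can help, since extracted tables may contain stray spaces or blank rows. Below is a minimal Python sketch using only the standard library `csv` module, assuming the export is plain comma-separated text. The function name `load_tabula_csv` and the filename `tabula-page167.csv` are hypothetical, chosen for illustration only; this is not part of Tabula itself.

```python
import csv
import io


def load_tabula_csv(text):
    """Parse CSV text exported from Tabula into a list of rows.

    Hypothetical helper (an assumption, not a Tabula feature):
    strips leading/trailing whitespace from every cell and drops
    rows in which every cell is empty.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        cleaned = [cell.strip() for cell in row]
        if any(cleaned):  # keep only rows with at least one non-empty cell
            rows.append(cleaned)
    return rows


# Usage with the file saved in step 9 (filename is hypothetical):
# with open("tabula-page167.csv", newline="", encoding="utf-8") as f:
#     rows = load_tabula_csv(f.read())
```

Reading the whole file into a string keeps the helper easy to test; for very large exports you could pass the open file object straight to `csv.reader` instead.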

Any questions? Got stuck? Ask School of Data!

Last updated on Sep 02, 2013.