Extracting Data from PDFs using Tabula
PDFs come in all forms and shapes. If you're facing a nicely formatted PDF that is not scanned, give Tabula a shot to extract the information. How? Read the short walkthrough below:
You’ll need:
- Tabula
- a PDF: e.g.
Walkthrough: Extracting data from PDF tables
- Download the PDF at:
- Start Tabula (most likely by double-clicking on the Tabula icon)
- Point your browser to:
- Choose the file you want to upload and click Submit
- Wait until the PDF is fully loaded
- Scroll down to page 167; we'll extract the table on that page.
- Click and drag a selection box over the table.
- A window will pop up to show how Tabula would extract the data.
- Now download the data as CSV.
- Fantastic, you liberated the table from the PDF. Quick and easy, wasn't it?
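Before working with the downloaded CSV, it can be worth a quick sanity check that every row has the same number of columns, since a row that came out shorter or longer than the rest is a hint to compare it against the table in the PDF. The sketch below is not part of Tabula; it only uses Python's standard `csv` module, and the function names `inspect_tabula_csv` and `find_ragged_rows` are hypothetical helpers chosen for illustration. It assumes the CSV is UTF-8 encoded.

```python
import csv
from collections import Counter


def inspect_tabula_csv(path):
    """Load a CSV exported from Tabula and summarize its shape.

    Returns (rows, width_counts): rows is a list of lists of strings,
    width_counts maps each row length to how many rows have that length.
    """
    # Assumption: the export is UTF-8; "utf-8-sig" also strips a BOM if present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = [row for row in csv.reader(f)]
    width_counts = Counter(len(row) for row in rows)
    return rows, width_counts


def find_ragged_rows(rows):
    """Return the indices of rows whose length differs from the most common length."""
    if not rows:
        return []
    # The most common row length is treated as the table's "real" width.
    expected, _ = Counter(len(row) for row in rows).most_common(1)[0]
    return [i for i, row in enumerate(rows) if len(row) != expected]
```

For example, `rows, widths = inspect_tabula_csv("page-167.csv")` (use whatever name your download has) followed by `find_ragged_rows(rows)` lists the row numbers worth checking by hand; an empty list means every row has the same width.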