How to Extract Tables from PDF Files
PDF files are an easy and secure way to share documents. When you want to preserve the formatting and details of a table, converting the document to a PDF file is a good idea. However, if you’re on the receiving end of a PDF file, extracting tables from it can be quite a task.
Of course, you would want to save time rather than recreate the table from scratch. So, how do you extract tables from a PDF without relying on copy and paste? Here are some of the ways you can achieve that.
Using Online Converters
The quickest way is to use an online file converter to change the PDF file into another document format. Several free online tools convert PDF to Excel to extract tabular data. Popular examples are Cometdocs and Smallpdf. However, most of these tools offer only basic extraction capabilities.
Disclaimer: We strongly advise you to avoid using online converters when dealing with PDF files that contain confidential or sensitive information.
Here’s how to extract tables from PDF files using Smallpdf:
Step 1: Visit the Smallpdf website.
Step 2: Select the conversion to be done.
Step 3: Drag the PDF file to be converted into the PDF converter.
Step 4: Once the file is done uploading, click on Convert to Excel.
Step 5: Click on Download to save the converted Excel file to your device.
Step 6: Open the downloaded Excel file to confirm that your table has been converted accurately.
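The check in Step 6 can be partly automated. The following is a minimal Python sketch, assuming you have re-saved the converted sheet as a CSV file from Excel. The function name find_suspect_rows and the sample data are hypothetical and exist only for illustration. It flags rows that have a blank cell or a different number of cells than the header row, both of which often point to a table that was split or shifted during conversion.

```python
import csv
import io


def find_suspect_rows(rows):
    """Return 1-based numbers of data rows that have a blank cell
    or a cell count different from the header row's."""
    if not rows:
        return []
    expected = len(rows[0])
    return [
        number
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected or any(not cell.strip() for cell in row)
    ]


# Hypothetical sample standing in for a sheet re-saved as CSV.
# With a real file, use: rows = list(csv.reader(open(path, newline="")))
sample = "Item,Qty,Price\nPen,2,1.50\nNotebook,,3.00\nStapler,1\n"
rows = list(csv.reader(io.StringIO(sample)))

print(find_suspect_rows(rows))  # [3, 4]
```

A flagged row is not necessarily wrong, since the original table may contain empty cells on purpose. Treat the result as a list of rows to compare against the PDF by eye.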
Extract Tables Using Microsoft Power BI
Microsoft Power BI is another useful tool for extracting table data from PDF files. In particular, the Power Query feature in Power BI makes it easy to import PDF files and extract the table data inside the document.
However, this works only for those with an Office 365 subscription. Otherwise, you will have to purchase a Power BI package separately.
The Power Query feature is also available under the Power BI free trial. Here’s how to use it to extract tabular data:
Step 1: Download, install, and launch Microsoft Power BI.
Step 2: Select Get Data on the Home tab of the app.
Step 3: Click on File, then select PDF.
Step 4: Click on Connect to browse the location of the PDF file on your computer.
Step 5: Select the location of the PDF file to import it into Power BI.
Step 6: Once the file has been imported into Power BI, you should see a Navigator with table numbers and page numbers. Select the table number to be loaded.
Step 7: Select Load to create the table in Power BI.
Using Microsoft Excel
Microsoft Excel, like Power BI, has the Power Query feature, which can be used to load PDF files and extract tabular data. However, this feature is only available in Excel 2016 or newer versions. Here’s how it works:
Step 1: Launch Microsoft Excel.
Step 2: Select Data on the Ribbon.
Step 3: Select Get Data to launch the dropdown.
Step 4: Select From File, then From PDF.
Step 5: Select the location of the PDF file to import it into Excel.
Step 6: Once the file has been imported into Excel, you should see a Navigator with table numbers, page numbers, or a preview of the data within the PDF. Select the table number to be loaded.
Step 7: Select Load to create the table in Excel.
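If you need to repeat this import for many files, the same idea can be scripted. The sketch below is not part of Excel or Power Query. It assumes the third-party Python library pdfplumber (installed separately with pip), which this article does not otherwise cover. It labels each detected table with a table number and page number, much as the Navigator does, though the exact label format is this sketch's own choice, and the function names are hypothetical.

```python
import csv


def table_label(page_number, table_index):
    """Build a label from a table number and page number (format is illustrative)."""
    return f"Table{table_index:03d} (Page {page_number})"


def clean_rows(rows):
    """Replace empty cells (returned as None) with empty strings."""
    return [["" if cell is None else cell for cell in row] for row in rows]


def extract_tables(pdf_path):
    """Return {label: rows} for every table pdfplumber detects in the PDF."""
    import pdfplumber  # third-party; assumed installed via: pip install pdfplumber

    found = {}
    counter = 0
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for rows in page.extract_tables():
                counter += 1
                found[table_label(page.page_number, counter)] = clean_rows(rows)
    return found


def save_table(rows, csv_path):
    """Write one extracted table to a CSV file that Excel can open."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Example usage with a hypothetical file name:
# tables = extract_tables("report.pdf")
# for label, rows in tables.items():
#     print(label, len(rows), "rows")
```

As with the Navigator preview, detection depends on how the table was drawn in the PDF, so it is worth checking the extracted rows against the original before relying on them.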
If you’re using Microsoft Office 2016 or a newer package and your Data tab does not contain the Power Query feature, here’s how to install it:
Step 1: Visit the Microsoft website.
Step 2: Select the language of the add-in.
Step 3: Select Download.
Step 4: Select the version of the add-in you want to download.
Step 5: Select Next to start the download.
Step 6: Once the file has finished downloading, run it to install the add-in.
Using Adobe Acrobat Pro DC
Adobe Acrobat Pro DC is another tool that can be used to extract tabular data from PDF files or convert PDF files into other file types, such as Excel, for data analysis. The tool has a 7-day trial version and a professional version, and it can be used online or downloaded to a device.
Adobe Acrobat Web Version
Here’s how to use the web version to extract tabular data:
Step 1: Visit the Adobe Acrobat Pro DC website.
Step 2: At the top of the page, select the Convert menu.
Step 3: Scroll down the website and locate PDF to Excel.
Step 4: On the new pop-up menu, drag and drop the PDF file you want to convert.
Step 5: Once the file is done uploading, select Export to XLSX.
Step 6: Select the Download icon at the top of the page to save the converted file to your device.
Adobe Acrobat on PC
Here’s how to use the PC version to extract tabular data:
Step 1: Launch the Adobe Acrobat Pro DC app.
Step 2: On the app, select Open a File.
Step 3: Using the file library, select the PDF file to be imported into the app.
Step 4: Once the file opens, use the cursor to highlight the table data.
Step 5: Right-click on the highlighted area and select Export Selection As.
Step 6: In the file library, enter the file name and select the file type.
Comparing Two PDF Files
Extracting tables from PDF files is not an impossible task. With Power Query in Excel and Power BI, you can also import file types other than PDF. With Adobe Acrobat Pro DC, on the other hand, you can also compare two PDF files side by side.