Working With PDF Files and Extracting Their Contents
In this tutorial, we will learn how to work with PDF files and how to extract their contents using Python.
What is PDF?
PDF stands for Portable Document Format, and its file extension is .pdf. The format is widely used for sharing documents because a PDF keeps the same layout and formatting on every device. PDFs are meant to be read rather than edited; changing one usually requires dedicated software.
Installing PyPDF2:
First, we need to install the PyPDF2 module to extract PDF content with Python. Open your terminal and run the install command:
PS D:\New folder> pip install PyPDF2
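To confirm that the installation worked before running the examples, a quick check like the one below may help. It is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for illustration and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("PyPDF2"):
        print("PyPDF2 is installed")
    else:
        print("PyPDF2 is missing; run: pip install PyPDF2")
```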
How to Extract PDF Data Using Python:
The PyPDF2 module offers a simple way to read information from a PDF file. Start by reading the document information:
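import PyPDF2
# Open the PDF file. PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0.
reader = PyPDF2.PdfReader("f.pdf")
# Get the document information (author, creator, dates) from the metadata property.
print(reader.metadata)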
Output:
{'/Author': 'john', '/Creator': 'Microsoft® Word 2019', '/CreationDate': "D:20220101203121+05'30'", '/ModDate': "D:20220101203121+05'30'", '/Producer': 'Microsoft® Word 2019'}
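The `/CreationDate` and `/ModDate` values above use the PDF date format (`D:YYYYMMDDHHmmSS` followed by a UTC offset). If a regular `datetime` is more convenient, the conversion could look roughly like this. The function name `parse_pdf_date` is hypothetical, and the sketch assumes a full-length date; the PDF format also allows shortened dates, which are not handled here.

```python
from datetime import datetime


def parse_pdf_date(value):
    """Convert a PDF date string like D:YYYYMMDDHHmmSS+HH'mm' to a datetime."""
    text = value.strip()
    if text.startswith("D:"):
        text = text[2:]
    # Drop the apostrophes so the offset reads +0530, which %z understands.
    text = text.replace("'", "")
    if text.endswith("Z"):
        text = text[:-1] + "+0000"
    # The offset is optional; without it the result is a naive datetime.
    fmt = "%Y%m%d%H%M%S%z" if len(text) > 14 else "%Y%m%d%H%M%S"
    return datetime.strptime(text, fmt)


created = parse_pdf_date("D:20220101203121+05'30'")
print(created.isoformat())  # 2022-01-01T20:31:21+05:30
```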
import PyPDF2
reader = PyPDF2.PdfReader("f.pdf")
# Get the number of pages in the PDF file.
print(len(reader.pages))
Output:
5
import PyPDF2
# Read the PDF file.
reader = PyPDF2.PdfReader("f.pdf")
# Get the document information.
print(reader.metadata)
# Get the number of pages.
print(len(reader.pages))
# Extract text from the page at index 4. Indexes start at 0, so this is the fifth page.
print(reader.pages[4].extract_text())
Output:
EXPERIMENT 2
Aim :
-
To study Data Definition language Statements.
DDL (Da... This term is
also known as data description...
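Page indexes in PyPDF2 start at 0, so index 4 refers to the fifth page of the document. To work with the page numbers a reader sees (starting at 1), a small helper along these lines could do the translation and catch out-of-range requests. The name `page_index` is an illustrative assumption, not a PyPDF2 function.

```python
def page_index(page_number, num_pages):
    """Translate a 1-based page number into a 0-based index, with a range check."""
    if not 1 <= page_number <= num_pages:
        raise ValueError(f"page {page_number} is outside 1..{num_pages}")
    return page_number - 1


# For a 5-page file, page 5 is index 4 and page 4 is index 3.
print(page_index(5, 5))  # 4
print(page_index(4, 5))  # 3
```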
How to Write Extracted Data Into a .txt File:
The text extracted from a PDF page can be written to a .txt file in a few lines.
import PyPDF2
# Read the PDF file.
reader = PyPDF2.PdfReader("f.pdf")
# Extract the text from the page at index 4 (the fifth page).
text = reader.pages[4].extract_text()
# Write the extracted text to a .txt file.
with open("new.txt", "w", encoding="utf-8") as f:
    f.write(text)
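The same idea extends to the whole document. The sketch below writes every page to one text file. It assumes page objects that expose an `extract_text()` method, as the pages of a PyPDF2 2.x or later `PdfReader` do; the function name `save_pages_as_text` and the `--- page N ---` separator are choices made for this example only.

```python
def save_pages_as_text(pages, out_path):
    """Write the text of every page to out_path and return the page count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for number, page in enumerate(pages, start=1):
            # Fall back to an empty string when a page yields no text.
            text = page.extract_text() or ""
            out.write(f"--- page {number} ---\n{text}\n")
            count += 1
    return count


# Hypothetical usage, assuming PyPDF2 2.x or later:
#   import PyPDF2
#   reader = PyPDF2.PdfReader("f.pdf")
#   save_pages_as_text(reader.pages, "all_pages.txt")
```

Passing the pages in, rather than a file name, keeps the helper independent of how the PDF was opened.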