Environment
- python 3.8
- PyPDF2 2.1.0
Introduction
PyPDF2 is a free, open-source library written in pure Python. It is mainly used to process PDF files and covers common operations such as splitting, merging, cropping, conversion, encryption, and decryption.
Install
Install with pip by running the command:
    pip install PyPDF2
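Since the examples in this article were written against PyPDF2 2.1.0, it can be useful to check which version pip actually installed. The following is a minimal sketch, not part of PyPDF2 itself: the helper name installed_version is made up for illustration, and it relies on the standard-library importlib.metadata module, which is available from Python 3.8.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("PyPDF2")
    print(version if version else "PyPDF2 is not installed")
```

If the printed version differs from 2.1.0, some class or parameter names used in the examples may behave differently.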
Usage examples
Let's take a look at some common PDF file operations.
Get basic information
This mainly uses PdfReader:
    from PyPDF2 import PdfReader

    reader = PdfReader("test.pdf")

    # Total number of pages
    number_of_pages = len(reader.pages)

    # First page
    page = reader.pages[0]
    text = page.extract_text()
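The same two calls, reader.pages and page.extract_text(), can be looped to collect the text of every page rather than only the first. This is a sketch under assumptions: extract_all_text is a hypothetical helper name, and it only assumes that the object passed in behaves like a PdfReader (a pages sequence whose items have extract_text()).

```python
import os


def extract_all_text(reader):
    """Return a list with the extracted text of each page, in page order.

    `reader` is expected to behave like PyPDF2's PdfReader: it exposes
    `.pages`, and each page has an `extract_text()` method.
    """
    texts = []
    for page in reader.pages:
        # Treat a missing result as empty text so the list stays uniform.
        texts.append(page.extract_text() or "")
    return texts


if __name__ == "__main__":
    if os.path.exists("test.pdf"):
        # Imported here so the helper above can be reused without PyPDF2.
        from PyPDF2 import PdfReader

        pages = extract_all_text(PdfReader("test.pdf"))
        for number, text in enumerate(pages, start=1):
            print(f"--- page {number} ---")
            print(text)
```

Text extraction quality depends on how the PDF was produced, so scanned or image-only pages will typically come back empty.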
Merge
Put the PDF files to be merged in the folder pdfs. If the merge order matters, name the source files so that they sort in the desired order, for example 1.pdf, 2.pdf.
    import os
    from PyPDF2 import PdfFileMerger

    src_path = 'pdfs'

    # Collect the PDF files to merge as a list of paths under src_path.
    # os.listdir() returns names in arbitrary order, so sort them explicitly.
    pdf_list = sorted(f for f in os.listdir(src_path) if f.endswith('.pdf'))
    pdf_list = [os.path.join(src_path, filename) for filename in pdf_list]

    pdf_file_merger = PdfFileMerger()
    for pdf in pdf_list:
        pdf_file_merger.append(pdf, import_bookmarks=False)

    pdf_file_merger.write("merged.pdf")
    pdf_file_merger.close()
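One thing to watch with numbered file names: when names are ordered as plain strings, 10.pdf comes before 2.pdf. If the folder may hold ten or more files, a numeric-aware sort key avoids that. The sketch below is plain standard-library Python and independent of PyPDF2; natural_key is a hypothetical helper name.

```python
import re


def natural_key(filename):
    """Split a name into text and integer chunks so numbers compare by value."""
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", filename)]


if __name__ == "__main__":
    names = ["10.pdf", "2.pdf", "1.pdf"]
    print(sorted(names))                   # plain string ordering
    print(sorted(names, key=natural_key))  # numeric-aware ordering
```

Passing key=natural_key to sorted() when building the file list keeps 1.pdf, 2.pdf, ..., 10.pdf in counting order.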
Encrypt a file
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader("test.pdf")
    writer = PdfWriter()

    # Copy every page into the writer
    for page in reader.pages:
        writer.add_page(page)

    # Set a password on the new PDF
    writer.encrypt("secret-password")

    # Save as a new PDF
    with open("encrypted.pdf", "wb") as f:
        writer.write(f)
After running the code, opening the generated encrypted.pdf prompts for a password before the content can be viewed.
Decrypt a file
    from PyPDF2 import PdfReader, PdfWriter

    # Read the PDF encrypted in the previous example
    reader = PdfReader("encrypted.pdf")
    writer = PdfWriter()

    # Decrypt
    if reader.is_encrypted:
        reader.decrypt("secret-password")

    # Add every page to the writer
    for page in reader.pages:
        writer.add_page(page)

    # Save the decrypted PDF
    with open("decrypted.pdf", "wb") as f:
        writer.write(f)
After the code runs, the newly generated decrypted.pdf opens without asking for a password.
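The decryption example ignores the return value of decrypt(), so a wrong password only shows up later as an error when the pages are read. A small guard can make the failure explicit. This is a sketch under a stated assumption: that decrypt() returns a falsy value (0) when the password does not match, which is worth verifying against the installed PyPDF2 version. The helper name decrypt_or_raise is hypothetical.

```python
import os


def decrypt_or_raise(reader, password):
    """Decrypt `reader` in place if needed; raise ValueError on a wrong password.

    `reader` is expected to behave like PyPDF2's PdfReader: it has an
    `is_encrypted` attribute and a `decrypt()` method.
    """
    if reader.is_encrypted:
        # Assumption: decrypt() returns a falsy value (0) when the password fails.
        if not reader.decrypt(password):
            raise ValueError("could not decrypt: wrong password")
    return reader


if __name__ == "__main__":
    if os.path.exists("encrypted.pdf"):
        # Imported here so the helper above can be reused without PyPDF2.
        from PyPDF2 import PdfReader

        reader = decrypt_or_raise(PdfReader("encrypted.pdf"), "secret-password")
        print(len(reader.pages), "pages readable")
```

Unencrypted files pass through untouched, so the helper can be applied to any input without checking first.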
Add watermark
    from PyPDF2 import PdfWriter, PdfReader

    # Read the PDF that serves as the watermark
    watermark = PdfReader("watermark.pdf")

    # The PDF to be watermarked
    reader = PdfReader("test.pdf")
    page = reader.pages[0]

    # Use the first page of watermark.pdf as the watermark
    page.merge_page(watermark.pages[0])

    writer = PdfWriter()
    writer.add_page(page)

    # Save as a new PDF
    with open("output.pdf", "wb") as fp:
        writer.write(fp)
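The example above watermarks and saves only the first page of test.pdf. To stamp every page, the same merge_page() and add_page() calls can be run in a loop. The following is a sketch, not the original author's code: add_watermark_to_all is a hypothetical helper name, and it only assumes objects that behave like PdfReader, a PDF page, and PdfWriter.

```python
import os


def add_watermark_to_all(reader, watermark_page, writer):
    """Merge `watermark_page` onto every page of `reader` and add it to `writer`."""
    for page in reader.pages:
        # merge_page() modifies the page in place by drawing the watermark on it.
        page.merge_page(watermark_page)
        writer.add_page(page)
    return writer


if __name__ == "__main__":
    if os.path.exists("test.pdf") and os.path.exists("watermark.pdf"):
        # Imported here so the helper above can be reused without PyPDF2.
        from PyPDF2 import PdfReader, PdfWriter

        watermark_page = PdfReader("watermark.pdf").pages[0]
        writer = add_watermark_to_all(PdfReader("test.pdf"), watermark_page, PdfWriter())
        with open("output.pdf", "wb") as fp:
            writer.write(fp)
```

The watermark page should have the same page size as the target pages, otherwise it may appear offset or clipped.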
Finally, for more detailed documentation, see the official docs at https://pypdf2.readthedocs.io/en/latest/
Python practical modules series
For more useful Python modules, see https://xugaoxiang.com/category/python/modules/
This article is reprinted from https://xugaoxiang.com/2022/06/11/python-module-31-pypdf2/
This site is for inclusion only, and the copyright belongs to the original author.