Python utility module (31) PyPDF2

Environment

Foreword

PyPDF2 is a free, open-source library written in pure Python for processing PDF files. It covers common operations such as splitting, merging, cropping, transforming, encrypting, and decrypting.

Install

Install it with pip by running the command

 pip install PyPDF2

Usage examples

Let's take a look at some common PDF file operations.

Get basic information

This mainly uses PdfReader

 from PyPDF2 import PdfReader

 reader = PdfReader("test.pdf")

 # Total number of pages
 number_of_pages = len(reader.pages)

 # First page
 page = reader.pages[0]

 # Text content of the first page
 text = page.extract_text()
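The example above reads only the first page. As a minimal sketch, assuming the same PdfReader API, the text of every page could be collected as follows. The helper names join_texts and extract_all_text are hypothetical and not part of PyPDF2.

```python
def join_texts(texts, separator="\n"):
    """Join per-page text into one string, skipping empty or missing pages."""
    kept = [t for t in texts if t and t.strip()]
    return separator.join(kept)


def extract_all_text(path):
    """Hypothetical helper: return the text of every page in the PDF at `path`."""
    # Imported lazily so join_texts can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_texts(page.extract_text() for page in reader.pages)
```

Calling extract_all_text("test.pdf") would then return the text of the whole document as one string. Scanned pages that contain only images typically yield no text and are skipped here.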

Merge files

Put the PDF files to be merged in a folder named pdfs. If the merge order matters, name the original files in that order, such as 1.pdf, 2.pdf

 import os
 from PyPDF2 import PdfFileMerger

 src_path = 'pdfs'

 # Collect the paths of the PDF files to merge. os.listdir() returns names
 # in arbitrary order, so sort them to get a predictable merge order
 pdf_list = sorted(f for f in os.listdir(src_path) if f.endswith('.pdf'))
 pdf_list = [os.path.join(src_path, filename) for filename in pdf_list]

 # Note: newer PyPDF2 releases rename PdfFileMerger to PdfMerger and
 # import_bookmarks to import_outline
 pdf_file_merger = PdfFileMerger()
 for pdf in pdf_list:
     pdf_file_merger.append(pdf, import_bookmarks=False)

 pdf_file_merger.write("merged.pdf")
 pdf_file_merger.close()
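If the folder holds ten or more numbered files, ordering the names as plain strings would place 10.pdf before 2.pdf. The following is a small sketch of a natural sort that compares the numeric parts as integers. The function names natural_key and sort_pdf_names are made up for this illustration.

```python
import re


def natural_key(filename):
    """Split a name into text and integer chunks so 2.pdf sorts before 10.pdf."""
    parts = re.split(r"(\d+)", filename)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def sort_pdf_names(names):
    """Keep only the .pdf names and return them in natural order."""
    pdf_names = [n for n in names if n.lower().endswith(".pdf")]
    return sorted(pdf_names, key=natural_key)
```

The result of sort_pdf_names(os.listdir(src_path)) could then be joined with src_path and passed to the merger in place of the unsorted list.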

File encryption

 from PyPDF2 import PdfReader, PdfWriter

 reader = PdfReader("test.pdf")
 writer = PdfWriter()

 # Copy every page into the writer
 for page in reader.pages:
     writer.add_page(page)

 # Set a password on the new PDF
 writer.encrypt("secret-password")

 # Save as a new PDF
 with open("encrypted.pdf", "wb") as f:
     writer.write(f)

After running the code, opening the generated encrypted.pdf prompts for a password before the content can be viewed

File decryption

 from PyPDF2 import PdfReader, PdfWriter

 # Read the PDF that was encrypted above
 reader = PdfReader("encrypted.pdf")
 writer = PdfWriter()

 # Decrypt
 if reader.is_encrypted:
     reader.decrypt("secret-password")

 # Add every page to the writer object
 for page in reader.pages:
     writer.add_page(page)

 # Save the decrypted PDF
 with open("decrypted.pdf", "wb") as f:
     writer.write(f)

After the code runs, the newly generated decrypted.pdf opens without asking for a password
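The decryption example assumes the password is correct. A hedged sketch of a check is shown below. It assumes that decrypt() returns a falsy value (0) when the password does not match, which is how the PyPDF2 documentation describes it. The helpers unlock and open_pdf are hypothetical names, not PyPDF2 functions.

```python
def unlock(reader, password):
    """Decrypt `reader` with `password`, raising ValueError if it does not match.

    `reader` is expected to behave like PyPDF2's PdfReader: it exposes
    is_encrypted and a decrypt() method (assumed to return 0 on failure).
    """
    if not reader.is_encrypted:
        return reader
    if not reader.decrypt(password):
        raise ValueError("wrong password, the PDF could not be decrypted")
    return reader


def open_pdf(path, password=""):
    """Hypothetical helper: open the PDF at `path` and unlock it if needed."""
    # Imported lazily so unlock() can be exercised without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return unlock(PdfReader(path), password)
```

With this in place, open_pdf("encrypted.pdf", "secret-password") would return a reader whose pages can be copied, while a wrong password fails early with a clear error instead of failing later when the pages are accessed.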

Add watermark

 from PyPDF2 import PdfWriter, PdfReader

 # Read the PDF that serves as the watermark
 watermark = PdfReader("watermark.pdf")

 # The PDF to be watermarked
 reader = PdfReader("test.pdf")
 page = reader.pages[0]

 # Use the first page of watermark.pdf as the watermark
 page.merge_page(watermark.pages[0])

 writer = PdfWriter()
 writer.add_page(page)

 # Save as a new PDF
 with open("output.pdf", "wb") as fp:
     writer.write(fp)
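The example above stamps only the first page and writes only that page to output.pdf. As a sketch under the same API assumptions (merge_page and add_page), the watermark could be applied to every page. The names stamp_pages and add_watermark are hypothetical.

```python
def stamp_pages(pages, watermark_page):
    """Merge `watermark_page` onto each page in turn and yield the stamped page."""
    for page in pages:
        page.merge_page(watermark_page)
        yield page


def add_watermark(src_path, watermark_path, dst_path):
    """Hypothetical helper: watermark every page of src_path into dst_path."""
    # Imported lazily so stamp_pages() can be exercised without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    watermark_page = PdfReader(watermark_path).pages[0]
    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in stamp_pages(reader.pages, watermark_page):
        writer.add_page(page)
    with open(dst_path, "wb") as fp:
        writer.write(fp)
```

For instance, add_watermark("test.pdf", "watermark.pdf", "output.pdf") would produce a copy of test.pdf with the first page of watermark.pdf merged onto each page.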

Finally, for more detailed documentation, see the official site at https://pypdf2.readthedocs.io/en/latest/

Python utility modules series

For more useful Python modules, see

https://xugaoxiang.com/category/python/modules/

This article is reprinted from https://xugaoxiang.com/2022/06/11/python-module-31-pypdf2/
This site is for inclusion only, and the copyright belongs to the original author.
