Implementing RSS-to-Newsletter in Python (you are welcome to subscribe to this blog's updates by email)

Original link: https://www.skyue.com/23042217.html

Background

I want to add a newsletter to the blog, ideally one that supports RSS-to-Newsletter, so that whenever the blog is updated the newsletter goes out automatically.

There are quite a few such services, but most of them are paid and not cheap, such as the well-known Mailchimp.

There are also free ones, such as Mailbrew. I used it for a while, but the product is no longer maintained. Recently I couldn't even log in, so I had to give it up.

As the saying goes, do it yourself and you will want for nothing. So I started tinkering.

Approach

To implement RSS to Email, only two components are needed:

  1. A form tool that collects readers' email addresses and supports unsubscribing
  2. A mail-sending service that parses the RSS content and sends it by email

The second point only needs a mail-sending service plus a Python script, which is easy to solve. I chose Amazon's AWS SES email service, which charges by volume: $0.10 per 1,000 emails. Given my blog's update frequency and subscriber count, the cost is negligible.

As for point 1, I don't want to (in fact, I don't know how to) write front-end code, let alone do server-side development or run a database. The natural choice is a third-party database plus form tool that can be operated through an API. The reader mailing list is relatively important data, so it needs a reliable and easy-to-manage database. Notion's Database and Google Sheets are both good choices. The difference between the two:

  • Google Sheets has its own form tool, but it cannot be accessed in China
  • Notion is accessible in China, but you need to find a third-party form tool

I chose Notion together with NotionForms, which provides form pages/components and saves the collected data to Notion's Database. The free version does not limit the number of submissions, which is more than enough.

OK, time for a plug: you are welcome to subscribe through the following form. For now it mainly pushes the weekly issue, which is updated every Monday morning.

(The whole process worked in my own tests, but the first real push will be next Monday. If you receive anything weird, please go easy on me.)

The one shortcoming of the free version of NotionForms is that it does not support updating data, which means it cannot support unsubscribing directly.

The temporary workaround: provide a separate form and a corresponding Notion Database for unsubscribing, and manually delete the address from the subscriber list after receiving an unsubscribe request.

If unsubscribe requests become too frequent, I could also write a script based on the Notion API. I will consider that later.
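Such a script might look roughly like the following. This is a minimal sketch, not something from the original setup: the function names `email_of`, `pages_to_archive` and `archive_page` are hypothetical, it assumes both databases have already been queried and that both use the "Email" property, and it treats addresses as case-insensitive (an assumption). Archiving a page through `PATCH /v1/pages/{page_id}` with `archived: true` is how the Notion API removes a row.

```python
NOTION_API_TOKEN = ''  # placeholder: the integration token

HEADERS = {
    'Authorization': 'Bearer {token}'.format(token=NOTION_API_TOKEN),
    'Notion-Version': '2022-06-28',
    'content-type': 'application/json',
}


def email_of(page):
    """Return the normalized address in a page's "Email" property, or None."""
    email = page['properties']['Email']['email']
    # Assumption: compare addresses case-insensitively and ignore stray spaces.
    return email.strip().lower() if email else None


def pages_to_archive(subscriber_pages, unsubscribe_pages):
    """Return ids of subscriber pages whose address is in the unsubscribe list."""
    unsubscribed = {email_of(p) for p in unsubscribe_pages} - {None}
    return [p['id'] for p in subscriber_pages if email_of(p) in unsubscribed]


def archive_page(page_id):
    """Archive (soft-delete) one Notion page. This performs a network call."""
    import requests  # third-party; imported here so the helpers above stay pure

    url = 'https://api.notion.com/v1/pages/{page_id}'.format(page_id=page_id)
    response = requests.patch(url, json={'archived': True}, headers=HEADERS)
    response.raise_for_status()
```

The matching step is kept separate from the network call, so the list of pages to archive can be reviewed (or logged) before anything is actually removed.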

Technical implementation

Based on the ideas above, a Python script is all that is needed to do the work, in just three steps:

  1. Read the email list from Notion's Database
  2. Fetch the RSS feed to get blog updates
  3. Call AWS SES to send the email

Finally, configure a crontab scheduled task so that the script runs automatically.
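Step 1 in the list above depends on the Notion API returning query results one page at a time, with a `next_cursor` value pointing at the next page. A minimal sketch of that loop is shown here as a generator. The name `iter_pages` and the injected `fetch` callable are hypothetical, introduced only so the paging logic can be read (and tested) without a network connection.

```python
def iter_pages(fetch, page_size=100):
    """Yield every result from a cursor-paginated Notion database query.

    `fetch` is any callable that takes the request payload (a dict) and
    returns the decoded JSON response (a dict with 'results' and 'next_cursor').
    """
    payload = {'page_size': page_size}
    while True:
        data = fetch(payload)
        for result in data['results']:
            yield result
        cursor = data.get('next_cursor')
        if not cursor:
            # No cursor means this was the last page.
            break
        payload = {'page_size': page_size, 'start_cursor': cursor}
```

With `requests`, `fetch` could be something like `lambda payload: requests.post(url, json=payload, headers=headers).json()`, where `url` and `headers` are the query endpoint and the authorization headers for the database.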

Preparation

Of course, some preparation is required, roughly as follows. I didn't take step-by-step screenshots while tinkering, so please make do with this summary:

  1. Prepare the Databases: create two Databases in Notion, namely the subscription list and the unsubscribe list. Both must have a property named “Email” whose type is “Email”.
  2. Prepare the forms: connect NotionForms to Notion and create forms based on the two Databases above. For sharing, NotionForms supports standalone form page URLs and also provides iframe embedding.
  3. Prepare the Notion token and permissions: go to the Notion developer page, create an integration, select Internal as the type, and give it a name, such as “NewsletterAPP”.

    1. A token will be generated. Save it; it is used later in the Python script.
    2. Go back to the “Subscription List” Database in Notion, click “Add connections” in the Database menu, find the “NewsletterAPP” integration you just created, and authorize it.
  4. Prepare AWS SES and permissions. I won't cover creating an account. A few points to note:

    1. A newly created account is in the sandbox and cannot send mail to outside recipients. You need to submit a support ticket to request removal from the sandbox, stating the purpose, the expected sending volume, how bounced mail is handled, and so on.
    2. You need to create an IAM identity for the account, obtain the corresponding aws_access_key_id and aws_secret_access_key, and save them to ~/.aws/credentials. (reference)
  5. Third-party Python libraries that are needed

    1. requests: needs no introduction
    2. feedparser: an RSS parsing library, very easy to use
    3. boto3: the official AWS SDK for Python, which makes sending email through SES very convenient
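For reference, the credentials file mentioned in item 4 is a small INI-style file. The sketch below shows its usual shape and parses it with the standard library to check that both keys are present. `SAMPLE_CREDENTIALS` and `missing_keys` are hypothetical names used for illustration, the key values are placeholders, and the `default` profile is assumed because it is the one boto3 picks up when no profile is specified.

```python
import configparser

# Typical layout of ~/.aws/credentials (values are placeholders).
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
"""

REQUIRED_KEYS = ('aws_access_key_id', 'aws_secret_access_key')


def missing_keys(text, profile='default'):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(profile, key)]
```

Passing the contents of the real file to `missing_keys` before the first run is a quick way to catch a misspelled key name without making any AWS call.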

Full code

1. To run the code, you need to put the AWS SES credentials in the ~/.aws/credentials file and fill in the remaining parameters in the code.

2. Complete code

    #!/usr/local/bin/python3
    # -*- coding: UTF-8 -*-

    import datetime
    import re
    from time import mktime

    import boto3
    import feedparser
    import requests
    from botocore.exceptions import ClientError

    # ++++++++++
    # Parameters
    # ++++++++++

    # Notion parameters
    EMAIL_DATABASE_ID = ''  # id of the subscription-list database, found in the database URL
    NOTION_API_TOKEN = ''   # shown in the integration on Notion's developer page
    EMAIL_DATABASE_URL = 'https://api.notion.com/v1/databases/{database_id}/query'.format(database_id=EMAIL_DATABASE_ID)
    HEADERS = {
        'accept': 'application/json',
        'Authorization': 'Bearer {token}'.format(token=NOTION_API_TOKEN),
        'Notion-Version': '2022-06-28',
        'content-type': 'application/json'
    }
    PAGE_SIZE = 100

    # AWS SES parameters
    REGION_NAME = 'ap-northeast-2'   # AWS SES region, shown in the AWS account
    SOURCE = 'Name <email_address>'  # sender already verified in the AWS SES account
    client = boto3.client('ses', region_name=REGION_NAME)

    # RSS address and personal email (the email receives run notifications)
    NOTI_MYSELF_EMAIL = 'email_address'  # notified by email after each Newsletter run
    BLOG_URL = 'https://www.skyue.com/category/weekly/'      # blog URL, used at the bottom of the Newsletter body, like "view more"
    RSS_URL = 'https://www.skyue.com/feed/category/weekly/'  # RSS feed to convert into the Newsletter
    AUTHOR_URL = 'https://www.skyue.com'                     # author's home page, used in the byline
    # URL of the NotionForms unsubscribe form, used at the bottom of the Newsletter body
    UNSCRIBE_URL = ''

    # ++++++++++
    # Implementation
    # ++++++++++

    # Fetch the mailing list through the Notion API.
    # In the Notion database, addresses are stored in the "Email" column, whose type is Email.
    # The database must contain at least one address; the empty case is not handled.
    # Returns a list of email addresses.
    def get_emails():
        payload = {"page_size": PAGE_SIZE}
        emails = []
        response = requests.post(EMAIL_DATABASE_URL, json=payload, headers=HEADERS)
        for result in response.json()['results']:
            if result['properties']['Email']['email']:
                emails.append(result['properties']['Email']['email'])
        next_cursor = response.json()['next_cursor']
        # Keep requesting pages until Notion stops returning a cursor
        while next_cursor:
            payload = {"page_size": PAGE_SIZE, 'start_cursor': next_cursor}
            response = requests.post(EMAIL_DATABASE_URL, json=payload, headers=HEADERS)
            for result in response.json()['results']:
                if result['properties']['Email']['email']:
                    emails.append(result['properties']['Email']['email'])
            next_cursor = response.json()['next_cursor']
        # Validate the address format and remove duplicates
        emails = list(set(filter_email(emails)))
        return emails

    # Check the format of each address and drop the invalid ones
    def filter_email(emails):
        pattern = r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)'
        result = []
        for e in emails:
            if re.match(pattern, e.strip()):
                result.append(e.strip())
        return result

    # Fetch the article through RSS (with feedparser) and build the HTML email body.
    # Only the latest article is fetched. Returns a dict with the title, link,
    # publication time and HTML body.
    def get_article():
        rss_weekly = feedparser.parse(RSS_URL)
        entry = rss_weekly['entries'][0]
        title = entry.title
        link = entry.link
        content = entry.content[0].value.replace('<a href="', '<a style="color:#3354AA" href="')
        # published_parsed is in UTC; shift it by 8 hours to get UTC+8 wall-clock time
        published = datetime.datetime.fromtimestamp(mktime(entry.published_parsed)) + datetime.timedelta(hours=8)
        # Build the email body from the article fields
        content_html = '''
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <title> {title} </title>
      <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    </head>
    <body style="margin: 0; padding: 0; font-size: 1.2em; color:#111; text-decoration:none; ">
      <table cellpadding="0" cellspacing="0" width="100%">
        <tr>
          <td>
            <table align="center" border="0" cellpadding="5" cellspacing="0" width="600" style="border-collapse: collapse; border-color:lightgray; ">
              <tr>
                <td>
                  <h1> <a style="color:black;text-decoration:none;" href="{link}">{title}</a> </h1>
                  <p> <i>by <a style="color:black" href="{author_url}">拾月</a></i> </p>
                  <p> {content} </p>
                </td>
              </tr>
              <tr> <td>-- EOF --</td> </tr>
              <tr>
                <td>
                  <p>链接:<a style="color:black;" href="{blog_url}">往期周刊</a> | <a style="color:black;" href="{unsubscribe_url}">取消订阅</a> </p>
                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
    </body>
    </html>
    '''.format(
            title=title,
            link=link,
            content=content,
            blog_url=BLOG_URL,
            unsubscribe_url=UNSCRIBE_URL,
            author_url=AUTHOR_URL
        )
        return {
            'title': title,
            'link': link,
            'published': published,
            'content_html': content_html
        }

    # Append success or error information to the log file
    def write_log(text):
        with open('email_send_log.txt', 'a') as f:
            f.write(text)

    # Send one email. Parameters:
    #   email_type: notification to myself or article to a reader (recorded in the log)
    #   to_email:   recipient
    #   title:      subject
    #   body:       HTML body
    # Returns the sending status, True or False.
    def send_email(email_type, to_email, title, body):
        try:
            # Try to send the email through AWS SES
            response = client.send_email(
                Destination={
                    'ToAddresses': [
                        to_email,
                    ],
                },
                Message={
                    'Body': {
                        'Html': {
                            'Charset': "UTF-8",
                            'Data': body
                        },
                    },
                    'Subject': {
                        'Charset': "UTF-8",
                        'Data': title,
                    },
                },
                Source=SOURCE,
            )
        # Log both successes and errors, and return the sending status
        except ClientError as e:
            log = '{dt} {email_type} to {email} error info: {error}\n'.format(
                dt=datetime.datetime.now(),
                error=e.response['Error']['Message'],
                email=to_email,
                email_type=email_type
            )
            write_log(log)
            return False
        else:
            log = "{dt} {email_type} to {email} success message_id: {messageid}\n".format(
                dt=datetime.datetime.now(),
                messageid=response['MessageId'],
                email=to_email,
                email_type=email_type
            )
            write_log(log)
            return True

    # Send the Newsletter to every subscriber; returns the success and failure counts
    def send_newsletter():
        success_cnt = 0
        failure_cnt = 0
        emails = get_emails()
        article = get_article()
        have_new_post = 0
        # Only send if the article was published today
        if article['published'].strftime('%Y-%m-%d') == datetime.datetime.now().strftime('%Y-%m-%d'):
            have_new_post = 1
            for email_addr in emails:
                status = send_email('send_newsletter_email', email_addr, article['title'], article['content_html'])
                if status:
                    success_cnt = success_cnt + 1
                else:
                    failure_cnt = failure_cnt + 1
        return {
            'success_cnt': success_cnt,
            'failure_cnt': failure_cnt,
            'have_new_post': have_new_post
        }

    if __name__ == '__main__':
        try:
            result = send_newsletter()
            # Record the run status of the script
            write_log('{dt} run_script success: success_cnt={success_cnt}, failure_cnt={failure_cnt}, new_post={have_new_post}\n'.format(
                dt=datetime.datetime.now(),
                success_cnt=result['success_cnt'],
                failure_cnt=result['failure_cnt'],
                have_new_post=result['have_new_post']
            ))
            # Notify myself by email about the run status
            send_email('send_noti_myself', NOTI_MYSELF_EMAIL, 'Newsletter脚本运行通知',
                '<html>发送成功:{success_cnt},发送失败:{failure_cnt}, 更新数量:{have_new_post}</html>'.format(
                    success_cnt=result['success_cnt'],
                    failure_cnt=result['failure_cnt'],
                    have_new_post=result['have_new_post']
                ))
        except Exception as e:
            write_log('{dt} run_script error: {e}\n'.format(dt=datetime.datetime.now(), e=e))
            # Notify myself by email that the script failed
            send_email('send_noti_myself', NOTI_MYSELF_EMAIL, 'Newsletter脚本运行通知',
                '<html>脚本运行报错:{e}</html>'.format(e=e))
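One detail in the script worth spelling out is the "published today" check. feedparser returns `published_parsed` in UTC, while `mktime` and `datetime.now()` work in the server's local time, so the comparison in the script is only reliable when the server clock is set to UTC+8. The following is a minimal sketch of a variant that makes the time zone explicit. The names `CST`, `published_in_tz` and `is_published_today` are hypothetical, and UTC+8 is an assumption based on the 8-hour offset used in the script.

```python
import calendar
import datetime

# Assumption: publication dates should be interpreted in UTC+8.
CST = datetime.timezone(datetime.timedelta(hours=8))


def published_in_tz(published_parsed, tz=CST):
    """Convert a UTC time tuple (as returned by feedparser) to an aware datetime in tz."""
    timestamp = calendar.timegm(published_parsed)  # interprets the tuple as UTC
    return datetime.datetime.fromtimestamp(timestamp, tz=tz)


def is_published_today(published_parsed, now=None, tz=CST):
    """Return True if the entry's publication date equals today's date in tz."""
    now = now or datetime.datetime.now(tz)
    return published_in_tz(published_parsed, tz).date() == now.astimezone(tz).date()
```

Because both sides of the comparison are converted to the same zone, the result no longer depends on how the server's clock is configured.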

3. Set up the scheduled task

For example, I save the code at /sky/job/newsletter/newsletter.py and then set the following crontab task, which runs at 8:00 every Monday morning.

 0 8 * * 1 python3 /sky/job/newsletter/newsletter.py >> /sky/job/newsletter/crontab.log 2>&1
