Wei's blog, R&D engineer

Batch-downloading Sogou music albums with Python: the 2010-11-16 version, with a few issues still open

2011-03-10

Original article. If you repost it, please credit the source: http://zwssd1980.blog.163.com/blog/static/3029649201121010185785/ http://zwssdzwssd.appspot.com/log/79.html

Issue 1: the names of the music files saved to disk were garbled (fixed).
Issue 2: the URL has to be changed by hand.
Issue 3: the program is not modular enough yet.

That's all for now; I'll improve it later. If it needs a version number, call it 0.01.

2010-11-23: fixed the file-extension problem.
2010-11-24: changed the regex so it also works with signature-song lists.
2010-11-25: added a regex so it also works for downloading by singer list, and made the script work on Windows.

```python
#!/usr/bin/python
# coding=GBK
# Last Change: 2010-11-16 16:14:17
"""
downsogoump3.py
Batch-download music from Sogou (mp3.sogou.com).
"""
import re
import urllib
import urllib2

if __name__ == "__main__":
    # URL of the mp3.sogou.com album or song list page
    mp3url = raw_input("Enter the Sogou music list URL:\n")
    # Example of an input URL:
    # mp3url = 'http://music.sogou.com/singer/f4/detailSinger_%C1%F9%D5%DC.html?w=02310300'

    # Directory the songs are saved to
    # songdir = "/home/david/音乐/sogoump3/"
    songdir = "d:/音乐/sogoump3/"

    html = urllib2.urlopen(mp3url).read()

    # Song links of the form down.so?s=...&w=02410600
    urlstart = 'http://mp3.sogou.com/down.so?s='
    urlend = '&w=02410600'
    urls = re.findall(r'http://mp3\.sogou\.com/down\.so\?s=(.+?)&w=02410600', html)

    if not urls:
        # Fallback: links of the form down.so?t=...&w=02420600
        urlstart = 'http://mp3.sogou.com/down.so?t='
        urlend = '&w=02420600'
        urls = re.findall(r'http://mp3\.sogou\.com/down\.so\?t=(.+?)&w=02420600', html)

    if not urls:
        # Fallback: relative links of the form /down.so?gid=...&ac=0&c
        urlstart = 'http://mp3.sogou.com/down.so?gid='
        urlend = '&ac=0&c'
        urls = re.findall(r'/down\.so\?gid=(.+?)&ac=0&c', html)

    for url in urls:
        # Open the song's download page
        myurl = urlstart + url + urlend
        html2 = urllib2.urlopen(myurl).read()

        # Audio file link(s) inside <div class="dl">.
        # NOTE: this regex was mangled when the post was published; only the
        # prefix below survives. It needs a capture group for the file URL.
        pattern2 = 'div class="dl"><img src='
        urls2 = re.findall(pattern2, html2)

        # Song title. NOTE: the HTML around (.+?) was also lost on publishing;
        # by itself this pattern matches single characters.
        info1 = '(.+?)'
        title = ''
        for match in re.findall(info1, html2):
            title = match.replace('_', ' ')   # keeps the last match

        # No decoding is needed on Windows
        # title = title.decode('GBK')
        # title = title.encode('utf-8')
        # songdir = songdir.decode('utf-8')
        # songdir = songdir.encode('GBK')

        for url2 in urls2:
            url2ext = url2[-4:]   # file extension, e.g. ".mp3"
            # Build a separate target path for each downloaded file
            filename = songdir + title + url2ext
            print filename
            urllib.urlretrieve(url2, filename)
```
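Issue 3 says the script is not modular yet. One possible step, sketched here in Python 3 rather than the script's Python 2, is to turn the three chained `if` fallbacks into a table of (regex, prefix, suffix) entries tried in order. The names `LINK_FORMS` and `song_page_urls` and the sample HTML are hypothetical; the three link forms are the ones the script already uses.

```python
import re

# Hypothetical table: (regex, url_prefix, url_suffix), tried in order,
# mirroring the s= / t= / gid= fallbacks in the script.
LINK_FORMS = [
    (r"http://mp3\.sogou\.com/down\.so\?s=(.+?)&w=02410600",
     "http://mp3.sogou.com/down.so?s=", "&w=02410600"),
    (r"http://mp3\.sogou\.com/down\.so\?t=(.+?)&w=02420600",
     "http://mp3.sogou.com/down.so?t=", "&w=02420600"),
    (r"/down\.so\?gid=(.+?)&ac=0&c",
     "http://mp3.sogou.com/down.so?gid=", "&ac=0&c"),
]


def song_page_urls(html):
    """Return absolute song-page URLs built from the first link form that matches."""
    for pattern, prefix, suffix in LINK_FORMS:
        ids = re.findall(pattern, html)
        if ids:
            return [prefix + song_id + suffix for song_id in ids]
    return []  # no known link form on this page


# Hypothetical list-page fragment using the relative gid= form
sample = '<a href="/down.so?gid=123&ac=0&c=1">song</a>'
print(song_page_urls(sample))  # ['http://mp3.sogou.com/down.so?gid=123&ac=0&c']
```

Supporting another page layout then means adding one table row instead of another nested `if`.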

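The 2010-11-23 fix concerned file extensions, and the script takes the extension as `url2[-4:]`, which assumes a three-letter extension and no query string. A sketch of a more tolerant alternative in Python 3 follows; `extension_from_url`, `local_filename` and the `.mp3` default are assumptions for illustration, not part of the original script.

```python
import os
from urllib.parse import urlsplit


def extension_from_url(url, default=".mp3"):
    """Return the extension of the URL's path, e.g. '.mp3'.

    The query string is ignored and extensions of any length are kept.
    The '.mp3' fallback is an assumed default.
    """
    ext = os.path.splitext(urlsplit(url).path)[1]
    return ext if ext else default


def local_filename(songdir, title, url):
    """Hypothetical helper: underscores in the title become spaces, as in the script."""
    return os.path.join(songdir, title.replace("_", " ") + extension_from_url(url))


print(extension_from_url("http://example.com/a/song.mp3?key=1"))  # .mp3
```

For the URL in the usage line, `url[-4:]` would have returned `ey=1`.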

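Issue 1 (garbled file names) and the commented-out `decode('GBK')` / `encode('utf-8')` lines come down to which codec the title bytes are in. Below is a minimal Python 3 sketch of the same GBK-to-UTF-8 re-encoding, assuming the page text arrives as GBK bytes; `gbk_to_utf8` is a hypothetical name.

```python
def gbk_to_utf8(raw):
    """Re-encode GBK bytes as UTF-8 bytes.

    Python 3 counterpart of title.decode('GBK').encode('utf-8')
    in the script's commented-out lines.
    """
    return raw.decode("gbk").encode("utf-8")


title_gbk = "音乐".encode("gbk")  # assumed: bytes as served by a GBK page

# Wrong codec: reading GBK bytes as UTF-8 does not give back the title
print(title_gbk.decode("utf-8", errors="replace"))

# Right codec: decode as GBK first, then encode for a UTF-8 file system
print(gbk_to_utf8(title_gbk).decode("utf-8"))  # 音乐
```

On Windows the script can skip this step, as its comment says, because the GBK bytes are used directly for the file name there.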