黄杰, 2012-03-30
root[a]linuxsand.info
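I am very fond of The Little Prince and wanted to download the audio files from 123诗社, but found that the addresses are encrypted. The browser cache or a sniffing tool would probably have done the job, but I wanted to try working it out myself.
Save the web page as a text file, webpage.txt, extract all the encrypted addresses, remove duplicates, and write the resulting encrypted addresses to encrypted_urls.txt.
Code: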
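# coding: utf-8
import re

# Read the saved page
with open('webpage.txt') as f:
    content = f.read()
# Each match starts with 'soundFile:' and runs to the next whitespace
url_list = re.findall(r'soundFile:\S+', content)
urls = list(set(url_list))  # remove duplicates
# Drop the 11 leading and 17 trailing characters around each encrypted address
real_urls = [i[11:-17] + '\n' for i in urls]
with open('encrypted_urls.txt', 'w') as f:
    f.writelines(real_urls)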
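One caveat about list(set(...)): a set does not keep the order in which the addresses appear on the page. If page order matters, for example to keep the chapters in sequence, a minimal sketch of an order-preserving alternative could look like this. The helper name dedupe_keep_order and the sample strings are made up for illustration.

```python
def dedupe_keep_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order
    return list(dict.fromkeys(items))


# Made-up sample values standing in for the matched addresses
sample = ['bbb', 'aaa', 'bbb', 'ccc', 'aaa']
print(dedupe_keep_order(sample))  # ['bbb', 'aaa', 'ccc']
```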
Contents:
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIwODAyeGlhb3dhbmd6aTE0Lm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIwODAyeGlhb3dhbmd6aTEzLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNjMubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIwODAxeGlhb3dhbmd6aWRhb2R1MC0yLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyNzAxeGlhb3dhbmd6aTkubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyMTAzeGlhb3dhbmd6aTcubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDExMzAyLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDExNTAxeGlhb3dhbmd6aTQubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNjEubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDMwNHhpYW93YW5nemkxOC5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIyMzAxeGlhb3dhbmd6aTE1Lm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDYyMHhpYW93YW5nemkyMy5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNy5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIwMzAxeGlhb3dhbmd6aTExLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyMTAyeGlhb3dhbmd6aTYubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDExNTAxeGlhb3dhbmd6aTMubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDQxNHhpYW93YW5nemkyMS0xLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDQxNHhpYW93YW5nemkyMS0yLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNjIubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDQxNHhpYW93YW5nemkyMS0zLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNC5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDMwNHhpYW93YW5nemkxNy5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDMwNHhpYW93YW5nemkxOS5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyMzAxeGlhb3dhbmd6aTgubXAzA
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDYyMHhpYW93YW5nemkyMi5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyNzAyeGlhb3dhbmd6aTEwLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDcxMXhpYW93YW5nemkyNS5tcDM
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIwMzAyeGlhb3dhbmd6aTEyLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDIyMzAyeGlhb3dhbmd6aTE2Lm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDQxNHhpYW93YW5nemkyMC00Lm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDExMzAxLm1wMw
aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDEyMTAxeGlhb3dhbmd6aTUubXAzA
These audio files are played with wp audio player; I found the encryption code at the end of audio-player.php.
function encodeSource($string) {
    $source = utf8_decode($string);
    $ntexto = "";
    $codekey = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
    for ($i = 0; $i < strlen($string); $i++) {
        $ntexto .= substr("0000".base_convert(ord($string{$i}), 10, 2), -8);
    }
    $ntexto .= substr("00000", 0, 6-strlen($ntexto)%6);
    $string = "";
    for ($i = 0; $i < strlen($ntexto)-1; $i = $i + 6) {
        $string .= $codekey{intval(substr($ntexto, $i, 6), 2)};
    }
    return $string;
}
I stared at it for a long while (I know almost no PHP syntax), consulted the online manual, and more or less understood it.
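For readers who do not read PHP, the function can be sketched in Python roughly as follows. This is only one reading of the PHP above, not code from the plugin: the name encode_source is made up, and the sketch assumes plain ASCII input, which holds for the URLs here.

```python
CODEKEY = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'


def encode_source(text):
    """Sketch of the PHP encodeSource logic; assumes ASCII-only input."""
    # Each character becomes its 8-bit binary representation
    bits = ''.join(format(ord(ch), '08b') for ch in text)
    # Zero-pad as the PHP does; substr("00000", 0, n) can yield at most 5 zeros
    bits += '0' * min(5, 6 - len(bits) % 6)
    # Map each group of 6 bits to one codekey character; like the PHP loop,
    # stop one position early and let int() read a possibly short final group
    out = []
    for i in range(0, len(bits) - 1, 6):
        out.append(CODEKEY[int(bits[i:i + 6], 2)])
    return ''.join(out)


print(encode_source('http://'))  # aHR0cDovLw
```

The output matches the first characters of every encrypted address listed earlier, which all begin with aHR0cDovL.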
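Let's look at what step 5, the zero-padding, is for. Suppose the binary string produced by step 4 is 488 bits long. If step 5 were skipped and the string were cut into 6-bit groups directly: 488 is not divisible by 6, so the take-6 loop would complete 81 times (488 // 6 = 81) and the last 2 bits of the 488 would not be encrypted (488 - 81 * 6 = 2). The original string would lose data, decryption could not recover the correct URL, and the player could not play the requested audio.
Step 5 appends four 0s to the 488 bits (strings of other lengths may get a different number of 0s), giving 492 bits, a multiple of 6, so nothing is lost in the take-6 loop.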
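The arithmetic can be checked with a few lines of Python. This is a sketch based on the PHP padding line and loop condition quoted earlier; the helper name padded_layout is made up.

```python
def padded_layout(n_bytes):
    """Return (bits, zeros appended, encoded characters) for n ASCII bytes."""
    bits = 8 * n_bytes
    # The PHP pad string "00000" holds only 5 zeros, hence the cap
    zeros = min(5, 6 - bits % 6)
    # The PHP loop visits i = 0, 6, 12, ... while i < total_bits - 1
    chars = len(range(0, bits + zeros - 1, 6))
    return bits, zeros, chars


print(padded_layout(61))  # (488, 4, 82): the 488-bit case discussed above
print(padded_layout(60))  # (480, 5, 81)
```

The second case would explain why some of the encrypted addresses end in an extra A: when the bit count is already a multiple of 6, the function still appends five zeros, and the loop turns them into one more character with index 0.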
So decryption goes like this:
# coding: utf-8
def decrypt(encrypted_url):
    url_part = []
    codekey = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'
    for i in encrypted_url.strip():  # strip() drops the trailing newline
        index_num = codekey.index(i)  # index of the character in codekey, e.g. a -> 26
        bin_str = index2bin(index_num)  # convert the index (int) to a 6-bit binary string
        url_part.append(bin_str)
    # Join all the 6-bit groups, such as 011010, into one string; say its length is 492
    url = ''.join(url_part)
    # Drop the last 4 bits, because 492 is not divisible by 8; 488 bits remain
    url = url[:(len(url) // 8 * 8)]
    url_part2 = []
    # Split the 488-bit string into 8-bit pieces
    # and convert each 8-bit binary string to its ASCII character
    for k in range(0, len(url), 8):
        url_part2.append(bin2ascii(url[k:k + 8]))
    return ''.join(url_part2)


def index2bin(num):
    '''Convert an index (int) to a 6-bit binary string'''
    x = bin(num)[2:]  # to binary, 26 -> 0b11010 -> 11010
    return x.zfill(6)  # left-pad to 6 bits, e.g. 11010 -> 011010


def bin2ascii(bin_str):
    '''Convert an 8-bit binary string to its ASCII character'''
    ten = int(bin_str, 2)
    ascii_str = chr(ten)
    return ascii_str


urls = []
with open('encrypted_urls.txt') as f:
    for line in f:
        urls.append(decrypt(line) + '\n')
with open('urls.txt', 'w') as f:
    f.writelines(urls)
Contents:
http://p.paowang.net/file/poem/xinran10020802xiaowangzi14.mp3
http://p.paowang.net/file/poem/xinran10020802xiaowangzi13.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi263.mp3
http://p.paowang.net/file/poem/xinran10020801xiaowangzidaodu0-2.mp3
http://p.paowang.net/file/poem/xinran10012701xiaowangzi9.mp3
http://p.paowang.net/file/poem/xinran10012103xiaowangzi7.mp3
http://p.paowang.net/file/poem/xinran10011302.mp3
http://p.paowang.net/file/poem/xinran10011501xiaowangzi4.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi261.mp3
http://p.paowang.net/file/poem/xinran100304xiaowangzi18.mp3
http://p.paowang.net/file/poem/xinran10022301xiaowangzi15.mp3
http://p.paowang.net/file/poem/xinran100620xiaowangzi23.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi27.mp3
http://p.paowang.net/file/poem/xinran10020301xiaowangzi11.mp3
http://p.paowang.net/file/poem/xinran10012102xiaowangzi6.mp3
http://p.paowang.net/file/poem/xinran10011501xiaowangzi3.mp3
http://p.paowang.net/file/poem/xinran100414xiaowangzi21-1.mp3
http://p.paowang.net/file/poem/xinran100414xiaowangzi21-2.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi262.mp3
http://p.paowang.net/file/poem/xinran100414xiaowangzi21-3.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi24.mp3
http://p.paowang.net/file/poem/xinran100304xiaowangzi17.mp3
http://p.paowang.net/file/poem/xinran100304xiaowangzi19.mp3
http://p.paowang.net/file/poem/xinran10012301xiaowangzi8.mp3
http://p.paowang.net/file/poem/xinran100620xiaowangzi22.mp3
http://p.paowang.net/file/poem/xinran10012702xiaowangzi10.mp3
http://p.paowang.net/file/poem/xinran100711xiaowangzi25.mp3
http://p.paowang.net/file/poem/xinran10020302xiaowangzi12.mp3
http://p.paowang.net/file/poem/xinran10022302xiaowangzi16.mp3
http://p.paowang.net/file/poem/xinran100414xiaowangzi20-4.mp3
http://p.paowang.net/file/poem/xinran10011301.mp3
http://p.paowang.net/file/poem/xinran10012101xiaowangzi5.mp3
After that, the files can be downloaded with wget -i urls.txt.
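As a cross-check, the scheme appears to be Base64 with _ and - standing in for + and /, and without the = padding, so Python's standard base64 module should produce the same URLs. A minimal sketch under that assumption follows; the name decode_with_base64 is made up, and the handling of the extra trailing character is inferred from the PHP code rather than documented anywhere.

```python
import base64


def decode_with_base64(encoded):
    """Decode a wp audio player style string with the standard base64 module."""
    encoded = encoded.strip()
    # A length of 4k+1 cannot occur in Base64; here it means the encoder
    # emitted one extra character made only of padding zeros, so drop it
    if len(encoded) % 4 == 1:
        encoded = encoded[:-1]
    # Restore the '=' padding that the encoder never writes
    encoded += '=' * (-len(encoded) % 4)
    # altchars: '_' is index 62 (normally '+'), '-' is index 63 (normally '/')
    raw = base64.b64decode(encoded, altchars=b'_-')
    return raw.decode('latin-1')


print(decode_with_base64(
    'aHR0cDovL3AucGFvd2FuZy5uZXQvZmlsZS9wb2VtL3hpbnJhbjEwMDExMzAyLm1wMw'))
# http://p.paowang.net/file/poem/xinran10011302.mp3
```

Note that the alphabet order differs from the URL-safe Base64 variant, where - comes before _, so base64.urlsafe_b64decode would not be a drop-in replacement if either character showed up in an address.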