Recently I realized that Amazon offers high-resolution photos for some products.
I have found a way to extract these hi-res images.
Note!: This method may stop working in the future, because Amazon may not like this kind of hack or may change its implementation :P
Basic Strategy
For example, take this page: http://www.amazon.com/Silver-Violin-Nicola-Benedetti/dp/B008CYV046/
If you look at the HTML source of this page, you will find the following JSON string:
var colorImages = {"initial":[{"large":"http://ecx.images-amazon.com/images/I/XXXXXXX.jpg",....
It holds the URLs of the images in various sizes. If you want the high-resolution image, pick the URL defined in the hiRes property.
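To make the strategy concrete, here is a minimal standalone sketch that runs the same regular expression as AmazonImageService against a hypothetical page fragment and then lists the hiRes URLs. The sample HTML and the names ColorImagesDemo, extractJson and hiResUrls are made up for illustration. Instead of Jackson, it pulls out the hiRes values with a second, deliberately naive regex, which would break on escaped quotes and only serves to show the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorImagesDemo {
    // Same pattern as AmazonImageService: captures the object literal assigned to colorImages.
    static final Pattern COLOR_IMAGES =
            Pattern.compile("var\\s+colorImages\\s*=\\s*(\\{.+\\});");

    // Naive stand-in for Jackson (assumption, for illustration only):
    // grabs every "hiRes":"..." value; no handling of escaped quotes or null values.
    static final Pattern HI_RES = Pattern.compile("\"hiRes\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the JSON text, or "{}" when the page has no colorImages variable.
    static String extractJson(String html) {
        Matcher m = COLOR_IMAGES.matcher(html);
        return m.find() ? m.group(1) : "{}";
    }

    // Collects all hiRes URLs found in the JSON text, in order of appearance.
    static List<String> hiResUrls(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = HI_RES.matcher(json);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; real Amazon markup may differ.
        String html = "<script>\n"
                + "var colorImages = {\"initial\":[{\"large\":\"http://ecx.images-amazon.com/images/I/XXXXXXX.jpg\","
                + "\"hiRes\":\"http://ecx.images-amazon.com/images/I/XXXXXXX._SL1500_.jpg\"}]};\n"
                + "</script>";
        String json = extractJson(html);
        System.out.println(json);
        System.out.println(hiResUrls(json));
    }
}
```

Because `.` does not match line terminators by default, this pattern only works when the whole colorImages literal sits on one line, as it does in the sample.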
Source Code
As you know, I still like Java, so here is a simple Java implementation. I use Jackson to parse the JSON string.
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class AmazonImageService {

    // Pattern for finding the JSON string that holds the hi-res image URLs.
    private static final Pattern AMAZON_HIRES_IMAGE_JSON_PATTERN
        = Pattern.compile("var\\s+colorImages\\s*=\\s*(\\{.+\\});");

    // Cleanup pattern: strips size modifiers such as "._SL1500_" from the file name.
    // (\w already includes digits, so [\w0-9] is not needed.)
    private static final Pattern PATTERN_REMOVE_AMAZON_SIZE = Pattern.compile("\\._\\w{1,6}_");

    // Buffer size used when reading the HTTP response.
    private static final int _1K_BYTES = 1024;

    private final ObjectMapper mapper;

    public AmazonImageService()
    {
        this(new ObjectMapper());
    }

    public AmazonImageService(ObjectMapper mapper)
    {
        this.mapper = mapper;
        // The important part: to keep the POJO minimal, ignore JSON properties it does not declare.
        this.mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    }

    // Jackson's JsonParseException and JsonMappingException are subclasses of IOException.
    public List<AmazonImageUrl> getImages(String asin) throws IOException {
        String url = toAmazonUrl(asin);
        String content = getStringContentsFromURL(url, "utf-8");
        String jsonStr = extractJSon(content);
        Map<String, List<AmazonImageUrl>> map
            = mapper.readValue(jsonStr, new TypeReference<Map<String, List<AmazonImageUrl>>>() {});
        List<AmazonImageUrl> resultUrls = extractImageUrls(map);
        cleanup(resultUrls);
        return resultUrls;
    }

    // Removes the size modifier from every hiRes URL.
    private static void cleanup(List<AmazonImageUrl> resultUrls) {
        for (AmazonImageUrl resultUrl : resultUrls)
        {
            if (resultUrl.getHiRes() != null)
            {
                resultUrl.setHiRes(PATTERN_REMOVE_AMAZON_SIZE.matcher(resultUrl.getHiRes()).replaceAll("").trim());
            }
        }
    }

    // Flattens all lists in the map ("initial", ...) and keeps entries with at least one URL.
    private static List<AmazonImageUrl> extractImageUrls(
            Map<String, List<AmazonImageUrl>> map) {
        List<AmazonImageUrl> resultUrls = new ArrayList<AmazonImageUrl>();
        for (List<AmazonImageUrl> imageDataList : map.values())
        {
            for (AmazonImageUrl imageData : imageDataList)
            {
                if (imageData.getHiRes() != null || imageData.getLarge() != null)
                {
                    resultUrls.add(imageData);
                }
            }
        }
        return resultUrls;
    }

    // Returns the colorImages object literal, or "{}" if the page does not contain it.
    private static String extractJSon(String content) {
        Matcher matcher = AMAZON_HIRES_IMAGE_JSON_PATTERN.matcher(content);
        String jsonStr = "{}";
        if (matcher.find())
        {
            jsonStr = matcher.group(1);
        }
        return jsonStr;
    }

    private static String toAmazonUrl(String asin) {
        return "http://www.amazon.com/product/dp/" + asin + "/";
    }

    // Java POJO mapped from the JSON object.
    public static class AmazonImageUrl {
        private String hiRes;
        private String large;

        public String getHiRes() {
            return hiRes;
        }

        public void setHiRes(String hiRes) {
            this.hiRes = hiRes;
        }

        public String getLarge() {
            return large;
        }

        public void setLarge(String large) {
            this.large = large;
        }

        @Override
        public String toString() {
            return hiRes;
        }
    }

    // Trivial utility methods....
    // You can use a library such as Apache Commons IO instead.
    private static String getStringContentsFromURL(String u, String charset) throws IOException {
        URL url = new URL(u);
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) url.openConnection();
            return toString(connection.getInputStream(), charset);
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private static String toString(InputStream is, String charsetName) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        copy(is, baos, _1K_BYTES);
        return baos.toString(charsetName);
    }

    private static void copy(InputStream is, OutputStream os, int bufsize) throws IOException {
        copy(is, os, new byte[bufsize]);
    }

    // Copies the whole input stream to the output stream, then closes the input stream.
    private static void copy(InputStream is, OutputStream os, byte[] buffer) throws IOException {
        try {
            int bytes;
            while ((bytes = is.read(buffer)) != -1)
            {
                os.write(buffer, 0, bytes);
            }
            os.flush();
        } finally {
            closeQuietly(is);
        }
    }

    // Closes the resource and ignores any IOException thrown by close().
    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if close() fails.
        }
    }
}
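The cleanup step is the least obvious part of the class, so here is a small standalone sketch of it. It uses the pattern `\._\w{1,6}_`, which matches the same text as the `[\w0-9]{1,6}` variant because `\w` already covers digits. The sample URL is hypothetical (the image ID is a placeholder), and the class and method names are made up. Presumably the point of dropping the size modifier is to get a URL for the unscaled image, but that is an inference from the pattern name rather than something Amazon documents.

```java
import java.util.regex.Pattern;

public class SizeTokenCleanupDemo {
    // A dot, an underscore, 1 to 6 word characters, and a closing underscore (e.g. "._SL1500_").
    static final Pattern SIZE_TOKEN = Pattern.compile("\\._\\w{1,6}_");

    // Removes every size token from the URL, mirroring AmazonImageService.cleanup().
    static String stripSizeToken(String url) {
        return SIZE_TOKEN.matcher(url).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Hypothetical hiRes URL; the image ID is a placeholder.
        String hiRes = "http://ecx.images-amazon.com/images/I/XXXXXXX._SL1500_.jpg";
        // prints http://ecx.images-amazon.com/images/I/XXXXXXX.jpg
        System.out.println(stripSizeToken(hiRes));
    }
}
```

URLs without a token pass through unchanged. Tokens made of several parts, such as a hypothetical `._SX300_QL70_`, are only partly removed (leaving `QL70_` in the file name), so the pattern fits the single-part case shown here.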
Using the AmazonImageService class is quite simple:
String asin = "XXXXXXXX";
List<AmazonImageService.AmazonImageUrl> images = new AmazonImageService().getImages(asin);
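getImages expects an ASIN rather than a product URL. In the example page above, the ASIN is the segment after /dp/, B008CYV046. Here is a minimal sketch for pulling it out of such a URL, assuming an ASIN is always 10 upper-case letters or digits directly after "/dp/". That holds for this example, but I have not checked it against every Amazon URL format. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsinFromUrl {
    // Assumption: "/dp/" followed by a 10-character ASIN, then "/", "?" or the end of the string.
    private static final Pattern DP_ASIN = Pattern.compile("/dp/([A-Z0-9]{10})(?:[/?]|$)");

    // Returns the ASIN if the URL contains a /dp/ segment, otherwise Optional.empty().
    static Optional<String> asinFromUrl(String productUrl) {
        Matcher m = DP_ASIN.matcher(productUrl);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String url = "http://www.amazon.com/Silver-Violin-Nicola-Benedetti/dp/B008CYV046/";
        // prints B008CYV046
        System.out.println(asinFromUrl(url).orElse("not found"));
        // The value could then be passed on: new AmazonImageService().getImages(asin)
    }
}
```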