Recently I realized that Amazon offers high-resolution photos for some products.
I have found a way to extract these hi-res images.
Note: This method might stop working in the future, because Amazon may not like this kind of hack or may change the implementation :P
Basic Strategy
For example, take this page: http://www.amazon.com/Silver-Violin-Nicola-Benedetti/dp/B008CYV046/
If you look at the HTML source of this page, you can find the following JSON string.
var colorImages = {"initial":[{"large":"http://ecx.images-amazon.com/images/I/XXXXXXX.jpg",....
It holds the URLs of the images in various sizes.
If you would like the high-resolution image, pick the URL defined in the hiRes property.
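To make the strategy concrete, here is a minimal sketch that uses only the standard library. It assumes the page source contains a single-line colorImages assignment like the one above. The class name ColorImagesSketch, the HI_RES shortcut pattern, and the example.com URLs are made up for illustration. Pulling hiRes values out with a regex is a simplification; the full code in the next section uses a real JSON parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ColorImagesSketch {
    // Captures the object literal assigned to colorImages (single line only).
    private static final Pattern COLOR_IMAGES =
            Pattern.compile("var\\s+colorImages\\s*=\\s*(\\{.+\\});");
    // Illustrative shortcut: matches "hiRes":"<url>" pairs; a null hiRes is skipped.
    private static final Pattern HI_RES =
            Pattern.compile("\"hiRes\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the JSON text, or "{}" when the page has no colorImages variable.
    static String extractJson(String html) {
        Matcher m = COLOR_IMAGES.matcher(html);
        return m.find() ? m.group(1) : "{}";
    }

    // Collects every non-null hiRes URL found in the JSON text.
    static List<String> hiResUrls(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = HI_RES.matcher(json);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; the URLs are placeholders, not real images.
        String html = "<script>var colorImages = {\"initial\":[{"
                + "\"hiRes\":\"http://example.com/images/I/AAA._SL1500_.jpg\","
                + "\"large\":\"http://example.com/images/I/AAA.jpg\"}]};</script>";
        System.out.println(hiResUrls(extractJson(html)));
    }
}
```

Entries whose hiRes is null simply produce no match, so only products that really have a high-resolution image end up in the list.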
Source Code
As you know, I still like Java, so I will show you some simple Java code. I use Jackson for parsing the JSON string.
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class AmazonImageService {

    // Pattern for finding the JSON string that holds the hi-res image URLs.
    private static final Pattern AMAZON_HIRES_IMAGE_JSON_PATTERN =
            Pattern.compile("var\\s+colorImages\\s*=\\s*(\\{.+\\});");

    // Cleanup pattern: removes size markers of the form "._XXXX_" from a URL.
    private static final Pattern PATTERN_REMOVE_AMAZON_SIZE =
            Pattern.compile("\\._[\\w0-9]{1,6}_");

    // Buffer size used when reading the HTTP response.
    private static final int _1K_BYTES = 1024;

    private final ObjectMapper mapper;

    public AmazonImageService() {
        this(new ObjectMapper());
    }

    public AmazonImageService(ObjectMapper mapper) {
        this.mapper = mapper;
        // Important: to keep the POJO minimal, ignore JSON properties it does not declare.
        this.mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
    }

    public List<AmazonImageUrl> getImages(String asin)
            throws URISyntaxException, IOException, JsonParseException, JsonMappingException {
        String url = toAmazonUrl(asin);
        String content = getStringContentsFromURL(url, "utf-8");
        String jsonStr = extractJSon(content);
        Map<String, List<AmazonImageUrl>> map = mapper.readValue(
                jsonStr, new TypeReference<Map<String, List<AmazonImageUrl>>>() {});
        List<AmazonImageUrl> resultUrls = extractImageUrls(map);
        cleanup(resultUrls);
        return resultUrls;
    }

    // Strips the size marker from each hiRes URL.
    private static void cleanup(List<AmazonImageUrl> resultUrls) {
        for (AmazonImageUrl resultUrl : resultUrls) {
            if (resultUrl.getHiRes() != null) {
                resultUrl.setHiRes(PATTERN_REMOVE_AMAZON_SIZE
                        .matcher(resultUrl.getHiRes()).replaceAll("").trim());
            }
        }
    }

    // Keeps every entry that has at least a hiRes or a large URL.
    private static List<AmazonImageUrl> extractImageUrls(Map<String, List<AmazonImageUrl>> map) {
        List<AmazonImageUrl> resultUrls = new ArrayList<AmazonImageUrl>();
        for (List<AmazonImageUrl> imageDataList : map.values()) {
            for (AmazonImageUrl imageData : imageDataList) {
                if (imageData.getHiRes() != null || imageData.getLarge() != null) {
                    resultUrls.add(imageData);
                }
            }
        }
        return resultUrls;
    }

    // Returns the colorImages JSON, or "{}" when the page does not contain it.
    private static String extractJSon(String content) {
        Matcher matcher = AMAZON_HIRES_IMAGE_JSON_PATTERN.matcher(content);
        String jsonStr = "{}";
        if (matcher.find()) {
            jsonStr = matcher.group(1);
        }
        return jsonStr;
    }

    private static String toAmazonUrl(String asin) {
        return "http://www.amazon.com/product/dp/" + asin + "/";
    }

    // Java POJO mapped from the JSON object.
    public static class AmazonImageUrl {
        private String hiRes;
        private String large;

        public String getHiRes() { return hiRes; }
        public void setHiRes(String hiRes) { this.hiRes = hiRes; }
        public String getLarge() { return large; }
        public void setLarge(String large) { this.large = large; }

        @Override
        public String toString() { return hiRes; }
    }

    // Trivial utility methods; you could use a commons library instead.
    private static String getStringContentsFromURL(String u, String charset)
            throws URISyntaxException, IOException {
        URL url = new URL(u);
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) url.openConnection();
            return toString(connection.getInputStream(), charset);
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private static String toString(InputStream is, String charsetName) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        copy(is, baos, _1K_BYTES);
        return baos.toString(charsetName);
    }

    private static void copy(InputStream is, OutputStream os, int bufsize) throws IOException {
        copy(is, os, new byte[bufsize]);
    }

    // Copies the whole input stream to the output stream, then closes the input.
    private static void copy(InputStream is, OutputStream os, byte[] buffer) throws IOException {
        try {
            for (int bytes = 0; (bytes = is.read(buffer)) != -1; ) {
                os.write(buffer, 0, bytes);
            }
            os.flush();
        } finally {
            closeQuietly(is);
        }
    }

    // Closes the stream and ignores any failure while closing.
    private static void closeQuietly(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }
}
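The cleanup step is easiest to understand in isolation. The sketch below applies the same cleanup pattern to a placeholder URL. The class name SizeSuffixSketch and the "._SL1500_" marker in the sample URL are assumptions for illustration; the exact markers Amazon uses may differ.

```java
import java.util.regex.Pattern;

final class SizeSuffixSketch {
    // Same cleanup pattern as in AmazonImageService: ".", "_", 1 to 6 word characters, "_".
    private static final Pattern SIZE_SUFFIX = Pattern.compile("\\._[\\w0-9]{1,6}_");

    // Removes every size marker from the URL; URLs without one are returned unchanged.
    static String stripSizeSuffix(String url) {
        return SIZE_SUFFIX.matcher(url).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Placeholder URL in the shape used by the JSON above; not a real image.
        String url = "http://ecx.images-amazon.com/images/I/XXXXXXX._SL1500_.jpg";
        System.out.println(stripSizeSuffix(url));
        // prints http://ecx.images-amazon.com/images/I/XXXXXXX.jpg
    }
}
```

Since \w already covers digits, the 0-9 inside the character class is redundant but harmless.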
Using the AmazonImageService class is quite simple.
String asin = "XXXXXXXX";
List<AmazonImageUrl> images = new AmazonImageService().getImages(asin);