Here is a code snippet that uses a regular expression to extract the src values of <img> tags from HTML.
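import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches <img ... src="..." ...> or src='...', case-insensitively.
// [^>] keeps each match inside a single tag, and \ssrc skips attributes such as data-src.
private static final Pattern IMG_SRC_PATTERN = Pattern.compile(
        "<img\\b[^>]*?\\ssrc\\s*=\\s*(['\"])(.+?)\\1[^>]*>",
        Pattern.CASE_INSENSITIVE);

// Returns every src value found in the given HTML, in document order.
public static List<String> extractImgSrces(final String content) {
    final List<String> list = new ArrayList<>();
    final Matcher matcher = IMG_SRC_PATTERN.matcher(content);
    while (matcher.find()) {
        // group(1) is the opening quote, group(2) is the attribute value
        list.add(matcher.group(2));
    }
    return list;
}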
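To try the extractor without any network access, a self-contained class can run it against an inline HTML string. The class name ImgSrcExtractorDemo and the sample markup are hypothetical. The pattern in this sketch uses [^>]* rather than .* or .+? around the attribute, so a match cannot run past the end of one tag into the next, and a tag written as <img src="a.png"> (with nothing between the closing quote and >) is still found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcExtractorDemo {

    // <img ... src="..." ...> or src='...'; CASE_INSENSITIVE also accepts <IMG SRC=...>.
    private static final Pattern IMG_SRC_PATTERN = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*(['\"])(.+?)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    public static List<String> extractImgSrces(final String content) {
        final List<String> list = new ArrayList<>();
        final Matcher matcher = IMG_SRC_PATTERN.matcher(content);
        while (matcher.find()) {
            list.add(matcher.group(2)); // group 2 = the quoted attribute value
        }
        return list;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: mixed quotes, a self-closing tag,
        // an upper-case tag, and a data-src attribute that should be skipped.
        final String html = "<p><img src=\"a.png\"><img alt='logo' src='b.png' /></p>\n"
                + "<IMG data-src=\"lazy.png\" SRC=\"c.png\">";
        extractImgSrces(html).forEach(System.out::println);
        // prints a.png, b.png and c.png, one per line
    }
}
```

A regular expression is a quick fit for simple pages, but it does not understand HTML comments, scripts, or unquoted attribute values, so those cases are not handled here.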
Example usage is shown below. The only piece you need to supply yourself is HttpUtils.getStringContentsFromURL, a method that fetches the HTML at a given URL and returns it as a string decoded with the given charset.
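import java.io.IOException;
import java.net.URISyntaxException;

// Prints the src of every <img> on the page, one per line.
// HttpUtils.getStringContentsFromURL(url, charset) must be provided by you.
public static void main(String[] args) throws URISyntaxException, IOException {
    extractImgSrces(HttpUtils.getStringContentsFromURL("http://www.google.com/", "utf-8"))
            .forEach(System.out::println);
}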
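The post leaves HttpUtils.getStringContentsFromURL for the reader to write. One minimal sketch is shown here, assuming the two arguments are a URL string and a charset name (as in the call in main) and that a plain URL.openStream() is good enough. The implementation is hypothetical, not the author's: it sets no timeouts, an HTTP error status surfaces as an IOException, and the JDK's URL connection does not follow redirects that switch protocol (for example, http to https). Its checked exceptions line up with the throws URISyntaxException, IOException clause on main.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.Charset;

// Hypothetical helper class; the original post only names the method.
public final class HttpUtils {

    private HttpUtils() {
        // static utility, no instances
    }

    // Reads the whole resource at url and decodes it with the named charset.
    public static String getStringContentsFromURL(final String url, final String charsetName)
            throws URISyntaxException, IOException {
        final URI uri = new URI(url); // throws URISyntaxException on malformed input
        try (InputStream in = uri.toURL().openStream()) {
            // readAllBytes() is available since Java 9
            return new String(in.readAllBytes(), Charset.forName(charsetName));
        }
    }
}
```

Because it goes through URL.openStream(), the same method also accepts file: URLs, which makes it easy to test against a saved HTML page without touching the network.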