Friday, 12 August 2016

Using HtmlAgilityPack to read and manipulate img src attributes

HtmlAgilityPack is a cool library that lets developers download an HTML document from the web and read its content. It also supports LINQ to Objects.
If you are going to use this library, I suggest you install it directly from NuGet.
Let's walk through an example of how you can get all image nodes with Agility Pack:
using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Use a WebClient to download the page as a string
        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString("http://www.christianjvella.com/wordpress/");
        }

        // Load the string into an Agility Pack document
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes("//img") returns every <img /> node, including its attributes,
        // or null when the page contains no images
        var images = document.DocumentNode.SelectNodes("//img");
        if (images != null)
        {
            foreach (var image in images)
            {
                var src = image.GetAttributeValue("src", null);
                var altText = image.GetAttributeValue("alt", null);

                // If you want, you can save the image to a stream or to a file here.
                // You can also manipulate the attributes; in this case I am changing the src
                // (placeholder URI; the scheme keeps it from being treated as a relative path)
                image.SetAttributeValue("src", "http://newimageuri.com/image.jpg");
            }
        }

        // Write the new HTML
        Console.Write(document.DocumentNode.OuterHtml);
        Console.ReadLine();
    }
}
In this C# example we create a WebClient to download the HTML as a string, load that string into an Agility Pack document, and then select the “img” nodes and do something with them.
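Since Agility Pack also works with LINQ to Objects, the same lookup can be written as a query. The following is only a minimal sketch and not part of the original example: the class and method names (ImageSources, GetImageSources) are made up for illustration. It relies on Descendants("img"), which yields an empty sequence rather than null when the page has no images.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageSources
{
    // Hypothetical helper: returns the non-empty src values of all <img> nodes.
    public static List<string> GetImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        return document.DocumentNode
            .Descendants("img")                                          // every <img> node in the document
            .Select(img => img.GetAttributeValue("src", string.Empty))   // empty string when src is missing
            .Where(src => !string.IsNullOrEmpty(src))                    // skip images without a src
            .ToList();
    }
}
```

Calling ImageSources.GetImageSources(html) with the downloaded string gives you a plain list of src values that you can loop over or filter further.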
If you want to download the image as a stream or as a file, call one of the methods below after you get the src. Sometimes the src is a relative URL, in which case you need to add the website URL to it before you can download it.
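// These methods need the following usings at the top of the file:
// using System.IO;
// using System.Net;

// Downloads the image at src and saves it to a fixed local path.
// Every call overwrites the same file, so vary the name to keep several images.
private static void SaveImageToDrive(string src)
{
    const string localFilename = @"c:\file.jpg";
    using (var client = new WebClient())
    {
        client.DownloadFile(src, localFilename);
    }
}

// Opens the image at src as a readable stream; the caller should dispose the stream when done.
public static Stream SaveImageToStream(string src)
{
    var client = new WebClient();
    var stream = client.OpenRead(src);
    return stream;
}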
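As mentioned, the src sometimes needs the website URL added before it can be downloaded. One way to do that, sketched here under the assumption that you still have the page URL at hand, is to let System.Uri resolve the src against it. The helper name ImageUrlHelper.ToAbsoluteUrl is hypothetical and not part of the original code.

```csharp
using System;

public static class ImageUrlHelper
{
    // Hypothetical helper: turns the src read from an <img> node into an
    // absolute URL, using the URL of the page it came from as the base.
    public static string ToAbsoluteUrl(string pageUrl, string src)
    {
        var baseUri = new Uri(pageUrl);

        // Uri(Uri, string) resolves a relative src against the base URL
        // and leaves an already absolute src unchanged.
        var resolved = new Uri(baseUri, src);
        return resolved.AbsoluteUri;
    }
}
```

For example, ImageUrlHelper.ToAbsoluteUrl("http://www.christianjvella.com/wordpress/", "images/photo.jpg") returns "http://www.christianjvella.com/wordpress/images/photo.jpg", which you can then pass to SaveImageToDrive or SaveImageToStream.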