
Remove Chain Of Duplicate Elements With Html Agility Pack

I'm trying to remove duplicate <br> tags, i.e. any run of two or more consecutive <br> tags, from my HTML document. This is what I've come up with so far (really stupid code): HtmlNodeCollection

Solution 1:

I think you overcomplicated your solution, although, as I understand it, the idea seems to be correct.

I suppose it would be easier to find all the <br /> nodes first and then remove each one whose previous sibling is also a <br /> node.

Let's start with the following example:

using System.Linq;       // needed for ToArray() below
using HtmlAgilityPack;

var html = @"<div>the first line<br /><br />the next one<br /></div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

Now find the <br /> nodes and remove the chain of duplicates:

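var nodes = doc.DocumentNode.SelectNodes("//br"); // null if there are no <br> nodes
if (nodes != null)
    // ToArray() copies the nodes, so removing them inside the loop is safe.
    foreach (var node in nodes.ToArray())
        // Drop every <br /> that directly follows another <br />, keeping the first of each run.
        if (node.PreviousSibling != null && node.PreviousSibling.Name == "br")
            node.Remove();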

and get the result:

var output = doc.DocumentNode.OuterHtml;

The output is:

<div>the first line<br>the next one<br></div>
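The PreviousSibling check above only looks at the node immediately before each <br />. When the markup has whitespace between the tags (for example a <br />, a line break, then another <br />), HtmlAgilityPack keeps that whitespace as a text node, so the second <br /> is not removed. Below is a minimal sketch that skips whitespace-only text nodes before comparing, assuming the HtmlAgilityPack NuGet package; the names DuplicateBrRemover and RemoveDuplicateBrs are hypothetical, chosen for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack itself.
public static class DuplicateBrRemover
{
    // True for a text node that contains only whitespace (spaces, tabs, line breaks).
    private static bool IsWhitespaceText(HtmlNode node) =>
        node.NodeType == HtmlNodeType.Text && string.IsNullOrWhiteSpace(node.InnerText);

    // Removes every <br> whose nearest non-whitespace previous sibling is also a <br>,
    // so each run of <br> tags collapses to its first tag.
    public static void RemoveDuplicateBrs(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var brs = doc.DocumentNode.SelectNodes("//br");
        if (brs == null)
            return;

        // Copy to an array so removing nodes does not interfere with the loop.
        foreach (var br in brs.ToArray())
        {
            var prev = br.PreviousSibling;

            // Walk back over whitespace-only text nodes between the tags.
            while (prev != null && IsWhitespaceText(prev))
                prev = prev.PreviousSibling;

            if (prev != null && prev.Name == "br")
                br.Remove();
        }
    }
}
```

Only the duplicate tags are dropped; any whitespace text that sat between them stays in the document.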

Solution 2:

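Maybe you can do this (a single Replace only collapses pairs, so three <br /> in a row would become two; the loop repeats it until no pair is left):

while (htmlSource.Contains("<br /><br />"))
    htmlSource = htmlSource.Replace("<br /><br />", "<br />");

or maybe something like this: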

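string html = "<br><br><br><br><br>";

// Only valid when the string contains nothing but <br> tags and whitespace:
// strip the tags and the whitespace, then append a single <br />.
html = html.Replace("<br>", string.Empty);
html = html.Replace(" ", string.Empty);
html = html.Replace("\t", string.Empty);
html = string.Format("{0}<br />", html);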
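If the input mixes <br> spellings (<br>, <br/>, <br />, any letter case) or has whitespace between the tags, a regular expression can collapse each run in a single pass. This is a swapped-in technique rather than part of the answer above, sketched under the assumption that the tags carry no attributes; BrCollapser and CollapseBrRuns are hypothetical names. Like any regex over HTML, it would also match inside comments or script blocks, so the DOM-based Solution 1 is the safer choice for arbitrary documents.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class BrCollapser
{
    // One <br> tag (<br>, <br/>, <br />), followed by one or more further <br> tags,
    // each optionally preceded by whitespace. Whitespace after the last tag is not consumed.
    private static readonly Regex BrRun = new Regex(
        @"<br\s*/?>(?:\s*<br\s*/?>)+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces every run of two or more consecutive <br> tags with a single "<br />".
    // A lone <br> is left exactly as it was.
    public static string CollapseBrRuns(string html) =>
        BrRun.Replace(html, "<br />");
}
```

For example, CollapseBrRuns("<br><br><br><br><br>") returns "<br />", while "line one <br> line two" comes back unchanged.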
