Saturday 7 June 2008

Remove HTML from strings in C#.NET

If you want to remove HTML or XML tags (basically anything between < and >) from a string, you can use this regular expression to replace them with another value (an empty string in this example):

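using System.Text.RegularExpressions;

// Matches "<", then any run of characters other than "<" or ">", then ">".
string regularExpression = "<[^<>]*>";

string body = "<html>this<br />is<br />a<br />test</html>";

// The pattern has no letters or line anchors, so no RegexOptions are needed.
Regex regex = new Regex(regularExpression);

// Replace every tag with an empty string, then trim surrounding whitespace.
string bodyNoHTML = regex.Replace(body, "").Trim();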

Input:
<html>1<br />2<br />3<br />4</html>

Output:
1234
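
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the names HtmlStripper and StripTags are made up for illustration, and the optional parameter assumes C# 4 or later. It uses the same pattern as above, so it is still a simple text replacement and not a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the .NET Framework) wrapping the regex from the post.
public static class HtmlStripper
{
    // Same pattern as above: "<", any run of characters other than "<" or ">", then ">".
    // Built once and reused for every call.
    private static readonly Regex TagPattern =
        new Regex("<[^<>]*>", RegexOptions.Compiled);

    // Replaces every <...> run in the input with the replacement text
    // (an empty string by default) and trims the result.
    public static string StripTags(string input, string replacement = "")
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return TagPattern.Replace(input, replacement).Trim();
    }
}
```

With this in place, HtmlStripper.StripTags("<html>1<br />2<br />3<br />4</html>") returns "1234", and passing " " as the replacement turns "<html>this<br />is<br />a<br />test</html>" into "this is a test". Keep in mind that the pattern removes anything between < and >, so plain text such as "a < b and c > d" loses the part in the middle, and entities such as &amp; are left as they are.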

1 comment:

Anonymous said...

Great job!!! I just used this very successfully. Thanks