I am a Danish programmer living in Bangkok.
Read more about me @ rasmus.rummel.dk.

String Strip Tags

Usage

  • Example
    • I use StripTags in my RichTextBox WebControl to get the plain text value of the whole HTML input (e.g. for counting characters).
  • Example Code
    • string myHtml = "<table><tr><td>my table</td></tr></table>";
      string myText = Utils.String.StripTags(myHtml);
      

The StripTags function :

// Requires: using System.Text.RegularExpressions;
public static string StripTags(string pTaggedText)
{
	return StripTags(pTaggedText, new string[] { });
}
public static string StripTags(string pTaggedText, string[] pTagsToStrip)
{
	if (pTagsToStrip.Length == 0) //strip all tags
	{
		Regex rx = new Regex("<[^>]+>");
		string resultText = rx.Replace(pTaggedText, "");
 
		return resultText;
	}
	else //strip only specified tags
	{
		string tagsToStrip = "";
		for (int s = 0; s < pTagsToStrip.Length; s++)
		{
			if (s > 0) { tagsToStrip += "|"; }
			tagsToStrip += pTagsToStrip[s];
		}
			// Parentheses must be balanced, otherwise the Regex constructor throws an ArgumentException.
			Regex rx = new Regex("</?(?i:" + tagsToStrip + ")([^>])*>");
		string resultText = rx.Replace(pTaggedText, "");
 
		return resultText;
	}
}
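
A short usage sketch for the second overload may help here. The class name SelectedTagsDemo, the method name StripOnly and the sample HTML are invented for illustration. The method is only a compact restatement of the selected-tags branch above (with balanced parentheses in the pattern), not a replacement for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SelectedTagsDemo
{
    // Hypothetical helper: same idea as the selected-tags branch of StripTags.
    public static string StripOnly(string taggedText, string[] tagsToStrip)
    {
        // Build an alternation such as "script|embed|iframe".
        string alternation = string.Join("|", tagsToStrip);

        // (?i: ... ) makes only the tag names case-insensitive.
        Regex rx = new Regex("</?(?i:" + alternation + ")([^>])*>");
        return rx.Replace(taggedText, "");
    }
}
```

Calling SelectedTagsDemo.StripOnly("<p>Hi <SCRIPT type=\"x\">alert(1)</SCRIPT>there</p>", new[] { "script" }) returns "<p>Hi alert(1)there</p>". Only the tags themselves are removed; the text between them, including script code, stays in the result.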

I have not tested the above function enough to be confident that it is reliable, and I would very much appreciate comments on how to improve it. One way to improve the function is to improve the regular expression used. Currently I have these in mind :

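  • <[^>]+> : the one I am using, because non-greedy matching is logically built in ([^>] cannot run past the first closing bracket).
  • <(.|\n)*?> : explicitly allows tags that span multiple lines and makes the match non-greedy with "*?". Note that [^>] in the first pattern also matches line breaks.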
  • </?(?i:script|embed|iframe)([^>])*> : the principle I am using for removing selected tags.
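
The candidate patterns above can be compared with a small harness. This is only a sketch: the class name PatternDemo and its method names are made up for illustration. The word boundary (\b) and the Regex.Escape call in StripSelected are a possible refinement, not part of the function above. They are there because the plain selected-tags pattern for a short name such as "b" would also remove <br>.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Pattern currently used for stripping all tags.
    public static string StripAll(string html)
    {
        return Regex.Replace(html, "<[^>]+>", "");
    }

    // Alternative: lazy match with explicit newline handling.
    public static string StripAllLazy(string html)
    {
        return Regex.Replace(html, "<(.|\n)*?>", "");
    }

    // Selected tags only. Assumed refinement: tag names are escaped,
    // and \b stops "b" from also matching <br> or <body>.
    public static string StripSelected(string html, params string[] tags)
    {
        string[] escaped = Array.ConvertAll(tags, t => Regex.Escape(t));
        string pattern = "</?(?i:" + string.Join("|", escaped) + ")\\b[^>]*>";
        return Regex.Replace(html, pattern, "");
    }
}
```

For the input "<a\nhref=\"x\">link</a>", both StripAll and StripAllLazy return "link", so the first pattern already handles tags spanning multiple lines. StripSelected("<b>bold</b><br>next", "b") returns "bold<br>next".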
