Regex for Stripping a Cascading Style Sheet File Down to Its Selectors

by Andrew Barber 14. September 2010 13:30

I am going to create a Text Template Transformation Toolkit (T4) template that generates code constants in Visual Studio Web projects for the classes defined in that project's CSS files. While Visual Studio already has IntelliSense support for CSS classes, there are two places where it does not appear: in user controls and in code files. It's a fairly simple T4 task, which I chose as a way to get my feet wet. But this post isn't about that...

Using .NET Regular Expressions to Parse the Selectors

The T4 template will need to parse the CSS files to find the selectors. My first thought was that it would be easy: a single Regex replacement, with this pattern replaced by the empty string (all code here is C#):

@"({.*?})"

That replaces everything between curly braces (including the braces). Note the use of the lazy quantifier *?; without it, the regex would capture and replace everything from the first open curly brace to the last closing one. Not what we want! But then, what about comments?
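To make the difference between the lazy and greedy quantifiers concrete, here is a minimal sketch. The class name, method names, and sample CSS are my own illustrations, not part of the T4 template.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Hypothetical sample input, kept on one line so no RegexOptions are needed.
    public const string Css = ".a { color: red; } .b { margin: 0; }";

    // Lazy: each match ends at the nearest closing brace.
    public static string StripLazy(string css)
    {
        return Regex.Replace(css, @"({.*?})", "");
    }

    // Greedy: one match runs from the first '{' to the last '}'.
    public static string StripGreedy(string css)
    {
        return Regex.Replace(css, @"({.*})", "");
    }

    public static void Main()
    {
        // Brackets make the leftover whitespace visible.
        Console.WriteLine("[" + StripLazy(Css) + "]");   // prints "[.a  .b ]"
        Console.WriteLine("[" + StripGreedy(Css) + "]"); // prints "[.a ]"
    }
}
```

The greedy version swallows the .b selector along with both declaration blocks, which is exactly the failure described above.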

@"(/\*.*?\*/)"

I strip the comments out first, because there could be curly braces inside the comments. That got me thinking... where else could curly braces appear? Two possible places come to mind, and I do replacements for those as well, after the comment replacement but before the main one.
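A small sketch of why the order matters, using a made-up stylesheet fragment in which a comment contains a closing brace. The names here are illustrative assumptions only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentsFirst
{
    // Hypothetical input: the comment inside the first rule contains '}'.
    public const string Css = ".a { color: red; /* } */ margin: 0; } .b { }";

    // Brace replacement alone: the first match stops at the brace inside the comment.
    public static string BracesOnly(string css)
    {
        return Regex.Replace(css, @"({.*?})", "");
    }

    // Comments removed first, then the brace replacement.
    public static string CommentsThenBraces(string css)
    {
        string withoutComments = Regex.Replace(css, @"(/\*.*?\*/)", "");
        return Regex.Replace(withoutComments, @"({.*?})", "");
    }

    public static void Main()
    {
        Console.WriteLine("[" + BracesOnly(Css) + "]");
        Console.WriteLine("[" + CommentsThenBraces(Css) + "]");
    }
}
```

With the brace replacement alone, the tail of the comment and the rest of the first rule are left behind in the output; with comments stripped first, only the two selectors remain.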

  • Within content properties' values, enclosed in quotes: @"("".*?"")"
  • Within the parentheses enclosing URL values: @"(\(.*?\))"

I do each of these as individual Regex replacements on the content of the CSS file. All of them run with the RegexOptions.Singleline option set, so that the dot also matches line breaks and blocks that span several lines are removed as a whole. At this point, what I'm left with is fairly easy to parse to get class selectors only. I'll be sharing my T4 Template when it's complete, so I'll cover that there.
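Put together, the four replacements might look something like the sketch below. The class name, the Strip method, and the sample stylesheet are assumptions for illustration; only the patterns and their order come from the steps above. Note that the enum member is spelled RegexOptions.Singleline, with a lowercase "l".

```csharp
using System;
using System.Text.RegularExpressions;

public static class CssSelectorStripper
{
    // Hypothetical multi-line sample with braces in a comment, a string, and a URL.
    public const string Sample =
        "/* header { */\n" +
        ".note:after {\n" +
        "  content: \"}\";\n" +
        "}\n" +
        ".logo { background: url(img/{x}.png); }\n";

    // The four patterns, in the order they are applied.
    private static readonly string[] Patterns =
    {
        @"(/\*.*?\*/)",  // comments
        @"("".*?"")",    // double-quoted strings
        @"(\(.*?\))",    // parenthesized values such as url(...)
        @"({.*?})"       // declaration blocks
    };

    public static string Strip(string css)
    {
        string result = css;
        foreach (string pattern in Patterns)
        {
            // Singleline lets '.' match '\n', so multi-line blocks are removed whole.
            result = Regex.Replace(result, pattern, "", RegexOptions.Singleline);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Strip(Sample));
    }
}
```

For the sample, what is left is the two selectors, .note:after and .logo, plus some leftover whitespace. The sketch only handles double-quoted strings, because that is the only quote pattern listed above.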

Why Not Combine All This Into a Single Regex?

I'm sure there's a way to write a single Regex that captures the class selectors. But given the almost completely free-form nature of content property values and comments, as well as the somewhat more flexible nature of URL values (curly braces can exist in URLs, though I prefer to avoid them), this struck me as the more straightforward and safer way to do it: first removing what we know cannot be part of the result.

© Copyright 2012 AndrewBarber.com