ASP.NET: How to Debug Regular Expressions

This article gives several tips on how to debug regular expressions in ASP.NET applications.

Debugging long, complex regular expressions can be challenging and time-consuming. These tips should make your debugging more effective and shorten the time it takes.

Chunk Long Regular Expressions into Short Ones

Where is the bug in your regular expression? It can be difficult to locate bugs in long complex regular expressions.

Bad Example

public void Parse(string Input)
{
    Match MatchObj;
    string Pattern;   
    
    // One long pattern: hard to tell which part fails to match.
    Pattern = "January|...|December [0-9]{1,2}, [0-9]{4} at ([0-9]{1,2}):([0-9]{2}) (AM|PM)";
    
    MatchObj = Regex.Match(Input, Pattern);
}

Solution

Chunk long regular expressions into short regular expressions.

Start debugging by including only the first pattern and commenting out the remaining patterns. Once the first pattern is debugged and working, add the next pattern, and repeat until the whole expression matches.

Better Example

public void Parse(string Input)
{
    Match MatchObj;
    string Pattern;
    string PatternAmPm;
    string PatternDay;
    string PatternHours;
    string PatternMinutes;
    string PatternMonth;
    string PatternSeconds;
    string PatternYear;
    
    // Each chunk includes the delimiter that follows it in the input.
    // The month alternation is grouped so it does not swallow the other chunks.
    PatternMonth = "(?:January|...|December) ";
    PatternDay = "[0-9]{1,2}, ";
    PatternYear = "[0-9]{4} at ";
    PatternHours = "([0-9]{1,2}):";
    PatternMinutes = "([0-9]{2}):";
    PatternSeconds = "([0-9]{2}) ";
    PatternAmPm = "(AM|PM)";
    
    Pattern = PatternMonth + PatternDay + PatternYear;
    Pattern += PatternHours + PatternMinutes + PatternSeconds + PatternAmPm;
    
    MatchObj = Regex.Match(Input, Pattern);
}
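
The add-one-chunk-at-a-time procedure can also be automated. The following is a minimal sketch, not part of the original example: the class name ChunkDebugger and the method FirstFailingChunk are hypothetical, and it assumes every prefix of the chunk list is itself a valid regular expression (a chunk must not end in the middle of a group).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChunkDebugger
{
    // Builds the pattern one chunk at a time and returns the index of the
    // first chunk that makes the growing pattern stop matching the input.
    // Returns -1 when every prefix of the chunk list matches.
    public static int FirstFailingChunk(string input, string[] chunks)
    {
        string pattern = "";

        for (int i = 0; i < chunks.Length; i++)
        {
            pattern += chunks[i];

            // Regex.Match throws ArgumentException if the partial pattern
            // is not a valid regular expression.
            if (!Regex.Match(input, pattern).Success)
            {
                return i;
            }
        }

        return -1;
    }
}
```

Passing the chunk strings in the same order they are concatenated in Parse tells you which chunk to look at first when the full pattern fails.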

Dump Match Object

The Regex.Match method returns a Match object containing the parsed elements in the Groups and Captures collections. You need to view the contents of these collections to determine whether the regular expression correctly parsed the input string. However, the Visual Studio (2005) Locals debug window does not display the contents of the Groups and Captures collections.

Solution

Create a Dump method which accepts a Regex Match object as an input parameter, walks through the Groups and Captures collections, and gets the desired data values.

public void Dump(Match MatchObj)
{
    Capture CaptureObj;
    Group GroupObj;
    string Value;
    
    // Groups[0] is the entire match; Groups[1..n] are the capture groups.
    for (int g = 0; g < MatchObj.Groups.Count; g++)
    {
        GroupObj = MatchObj.Groups[g];
        for (int c = 0; c < GroupObj.Captures.Count; c++)
        {
            CaptureObj = GroupObj.Captures[c];
            Value = CaptureObj.Value;        
        } // <-- Set breakpoint here.
    }
}

Call the Dump method and pass the Match object immediately after calling Regex.Match:

public bool Parse(string Input)
{
    Match MatchObj;
    string Pattern = "[0-9]*";
    
    MatchObj = Regex.Match(Input, Pattern);
    Dump(MatchObj);

    return MatchObj.Success;
}

Set a breakpoint in the Dump method immediately after getting the Capture value (as noted by the <-- in the Dump method).

Start your application in debug mode. When the debugger stops at the breakpoint, view Value in the Locals window. Continue stepping through the loop until you've walked through the entire collection.

Additional Suggestions

You may want to update the Dump method to write the match information to your trace output stream.
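
One way to do that is sketched below. It assumes the trace output stream is System.Diagnostics.Trace (ASP.NET page tracing is a separate API and would need a different call), and the names MatchDumper, Describe, and DumpToTrace are hypothetical. Trace.WriteLine only produces output when the TRACE symbol is defined at compile time.

```csharp
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class MatchDumper
{
    // Builds one line of text per capture in every group of the match.
    public static string Describe(Match matchObj)
    {
        StringBuilder text = new StringBuilder();
        text.AppendLine("Success: " + matchObj.Success);

        for (int g = 0; g < matchObj.Groups.Count; g++)
        {
            Group groupObj = matchObj.Groups[g];
            for (int c = 0; c < groupObj.Captures.Count; c++)
            {
                Capture captureObj = groupObj.Captures[c];
                text.AppendLine(string.Format(
                    "Group[{0}].Captures[{1}] = \"{2}\" (index {3})",
                    g, c, captureObj.Value, captureObj.Index));
            }
        }

        return text.ToString();
    }

    // Writes the description to the attached trace listeners.
    public static void DumpToTrace(Match matchObj)
    {
        Trace.WriteLine(Describe(matchObj));
    }
}
```

Keeping the formatting in Describe and the output in DumpToTrace makes it easy to send the same text to a log file or a test assertion instead.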

Capture Non-Relevant Substrings

Often you want only a few pieces of data from a string; the remaining items are not required for your application. The usual approach is to write a regular expression that matches both the relevant and non-relevant substrings but captures only the relevant substrings using the "()" capture expressions. Should Regex.Match fail on the non-relevant substrings, you don't know where the failure occurred.

Bad Example

Suppose we want to parse the time from a string like:
January 15, 2008 at 12:43:04 PM

We could create a regular expression to capture only the time at the end of the string and not capture the date components at the beginning of the string like:

    Match MatchObj;
    string Pattern;
    string Pattern1;
    string Pattern2;
    
    // The month is grouped with (?: ) so it is matched but not captured.
    Pattern1 = "(?:January|...|December) ";
    Pattern2 = "[0-9]{1,2}, [0-9]{4} at ([0-9]{1,2}):([0-9]{2}):([0-9]{2}) (AM|PM)";
    Pattern = Pattern1 + Pattern2;
    
    MatchObj = Regex.Match(Input, Pattern);
    

Solution

The solution is to capture non-relevant substrings. By doing so you are able to view in the debugger what substrings Regex.Match was able to match and where it stopped in the matching process.

Better Example

We've added capture expressions around the month name, day of month, and year sub-elements of the regular expression:

    Match MatchObj;
    string Pattern;
    string Pattern1;
    string Pattern2;
    
    Pattern1 = "(January|...|December) ";
    Pattern2 = "([0-9]{1,2}), ([0-9]{4}) at ([0-9]{1,2}):([0-9]{2}):([0-9]{2}) (AM|PM)";
    Pattern = Pattern1 + Pattern2;
    
    MatchObj = Regex.Match(Input, Pattern);

One could argue that adding capture expressions around the delimiters might be useful too.

Use Regex.Match not Regex.IsMatch

Regex.IsMatch gives you a pass/fail on whether the input string matched the pattern. This general result is good once you have created and tested the regular expression. However, when you are debugging the regular expression you don't know what part of the regular expression failed to match a good input string.

Bad Example

    bool Status;
    
    Status = Regex.IsMatch(Input, Pattern);
    

Solution

Use Regex.Match instead of Regex.IsMatch and use the Dump solution given above to view the captures.

Better Example

    Match MatchObj;
    
    MatchObj = Regex.Match(Input, Pattern);
    Dump(MatchObj);
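
Besides the Groups and Captures collections, the Match object also reports what text was matched and where. The sketch below illustrates this; the class name MatchInspector and the method Summarize are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class MatchInspector
{
    // Returns a one-line summary of what Regex.Match found, using the
    // Value, Index, and Length properties of the Match object.
    public static string Summarize(string input, string pattern)
    {
        Match matchObj = Regex.Match(input, pattern);

        if (!matchObj.Success)
        {
            return "no match";
        }

        return string.Format(
            "matched \"{0}\" at index {1}, length {2}",
            matchObj.Value, matchObj.Index, matchObj.Length);
    }
}
```

Regex.IsMatch would report only true or false for the same call, with no indication of which part of the input was consumed.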

Use Named Indexes For Groups and Captures

As you add or remove subcomponents in your regular expression, the index positions of the captured strings change in the Groups and Captures collections. If you use numerical indices, you'll waste a lot of time updating them.

A second problem: You can't tell from the numerical indices what the associated value is. Is "2" the "Month", or the "Year"?

Bad Example

Suppose we are parsing a date string like "05/23/03":

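public DateTime DateParse(string Input)
{
    string Day;
    Match MatchObj;
    string Month;
    string Year;
    
    // Pattern is assumed to be a field holding the date regular expression.
    MatchObj = Regex.Match(Input, Pattern);
    Month = MatchObj.Groups[1].Captures[0].Value;
    Day = MatchObj.Groups[2].Captures[0].Value;
    Year = MatchObj.Groups[3].Captures[0].Value;

    return new DateTime(Convert.ToInt32(Year), Convert.ToInt32(Month), Convert.ToInt32(Day));
}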

If we update the regular expression to parse additional substrings at the beginning of the input string we'll need to update the Group collection indices.

Solution

The solution is to use named indexes, such as variables or constants, to index into the Groups collection.

Better Example

    // For input like "05/23/03" the month is captured first, then the day.
    private int m_Month = 1;
    private int m_Day = 2;
    private int m_Year = 3;
            
public DateTime DateParse(string Input)
{
    string Day;
    Match MatchObj;
    string Month;
    string Year;
    
    MatchObj = Regex.Match(Input, Pattern);
    Month = MatchObj.Groups[m_Month].Captures[0].Value;
    Day = MatchObj.Groups[m_Day].Captures[0].Value;
    Year = MatchObj.Groups[m_Year].Captures[0].Value;

    return new DateTime(Convert.ToInt32(Year), Convert.ToInt32(Month), Convert.ToInt32(Day));
}

For a short regular expression and a small number of substring captures as shown in our example this tip may not be valuable. This tip becomes very valuable in cases where a long complex regular expression is used.
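
A related .NET feature, which is a different technique from the index variables used above, is the named capturing group: (?<Name>...) lets you look up a group by name, so the lookup does not depend on group positions at all. The sketch below assumes a hypothetical pattern for input such as "05/23/03"; the class and method names are also hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDateParser
{
    // Hypothetical pattern: two-digit month, day, and year separated by "/".
    private const string Pattern =
        @"(?<Month>[0-9]{2})/(?<Day>[0-9]{2})/(?<Year>[0-9]{2})";

    // Extracts the three date parts by group name instead of by position.
    public static bool TryGetParts(
        string input, out string month, out string day, out string year)
    {
        month = null;
        day = null;
        year = null;

        Match matchObj = Regex.Match(input, Pattern);
        if (!matchObj.Success)
        {
            return false;
        }

        month = matchObj.Groups["Month"].Value;
        day = matchObj.Groups["Day"].Value;
        year = matchObj.Groups["Year"].Value;
        return true;
    }
}
```

Adding another group at the beginning of the pattern leaves the three lookups unchanged.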

Check Count or Success

What if the match fails? Do we assume the input string will be correctly formatted and Regex.Match will always succeed?

Bad practice. We should always assume the input string may contain invalid data and Regex.Match can fail.

Bad Example

Here we assume the match succeeded and we grab the expected values from the Groups and Captures collections.

    private int m_Month = 1;
    private int m_Day = 2;
    private int m_Year = 3;
            
public DateTime DateParse(string Input)
{
    string Day;
    Match MatchObj;
    string Month;
    string Year;
    
    MatchObj = Regex.Match(Input, Pattern);
    
    // No check that the match succeeded before reading the captures.
    Month = MatchObj.Groups[m_Month].Captures[0].Value;
    Day = MatchObj.Groups[m_Day].Captures[0].Value;
    Year = MatchObj.Groups[m_Year].Captures[0].Value;

    return new DateTime(Convert.ToInt32(Year), Convert.ToInt32(Month), Convert.ToInt32(Day));
}

Solution

Check the Match.Success or the Groups count.

Match.Success is a Boolean value that tells whether the match passed or failed.

Match.Groups.Count counts one (1) for the entire match plus one for each capturing group in the pattern.

Better Example

    private int m_Month = 1;
    private int m_Day = 2;
    private int m_Year = 3;
            
public bool DateParse(string Input, ref DateTime Out)
{
    string Day;
    int DayAsInt;
    Match MatchObj;
    string Month;
    int MonthAsInt;
    string Year;
    int YearAsInt;
    
    MatchObj = Regex.Match(Input, Pattern);
    if (!MatchObj.Success) return false;
    
    // Or, we can check Groups.Count
    if (MatchObj.Groups.Count != 4) return false;
    
    Month = MatchObj.Groups[m_Month].Captures[0].Value;
    Day = MatchObj.Groups[m_Day].Captures[0].Value;
    Year = MatchObj.Groups[m_Year].Captures[0].Value;
    
    MonthAsInt = Convert.ToInt32(Month);
    DayAsInt = Convert.ToInt32(Day);
    YearAsInt = Convert.ToInt32(Year);
            
    Out = new DateTime(YearAsInt, MonthAsInt, DayAsInt);
    
    return true;           
}
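
A successful match does not guarantee that every group captured something. If a group is optional and did not participate in the match, its Success property is false and its Captures collection is empty, so indexing Captures[0] throws an exception. The sketch below shows the extra check; the pattern, the class name OptionalGroupCheck, and the method GetSeconds are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class OptionalGroupCheck
{
    // Hypothetical pattern: hours and minutes with an optional seconds part.
    private const string Pattern = "([0-9]{1,2}):([0-9]{2})(?::([0-9]{2}))?";

    // Returns the seconds text, or null when the match failed or the
    // seconds group did not participate in the match.
    public static string GetSeconds(string input)
    {
        Match matchObj = Regex.Match(input, Pattern);
        if (!matchObj.Success)
        {
            return null;
        }

        Group seconds = matchObj.Groups[3];
        if (!seconds.Success)
        {
            // Captures.Count is 0 here, so Captures[0] would throw.
            return null;
        }

        return seconds.Captures[0].Value;
    }
}
```

For the input "12:43" the overall match succeeds and Groups.Count is still 4, so only the per-group Success check reveals that the seconds are missing.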

Ignore Case

Most of the time (in my experience) we want to capture the substrings from the input string without regard to the case of the characters.

If you call Regex.Match as shown in the Bad Example below, the default behavior is to match the characters exactly. In this case the input string contains lowercase characters and the pattern specifies uppercase characters, so the match fails.

Note that RegexOptions is not specified in the call to Regex.Match, so the default case-sensitive comparison applies.

Bad Example

Where Input = "abc";

public bool Parse(string Input)
{
    Match MatchObj;
    
    // Case-sensitive by default: "abc" does not match [A-Z]{3}.
    MatchObj = Regex.Match(Input, "[A-Z]{3}");

    return MatchObj.Success;
}

Solution

Use RegexOptions.IgnoreCase or include both lowercase and uppercase characters in your regular expression pattern.

Better Example

public bool Parse(string Input)
{
    Match MatchObj;
    
    MatchObj = Regex.Match(Input, "[A-Z]{3}", RegexOptions.IgnoreCase);
    
    // Or do this:
    MatchObj = Regex.Match(Input, "[a-zA-Z]{3}");

    return MatchObj.Success;
}
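
A third option, not shown in the example above, is the inline (?i) construct, which turns on case-insensitive matching from inside the pattern itself. The sketch below compares the variants for the same pattern; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class IgnoreCaseOptions
{
    // Default behavior: case-sensitive comparison.
    public static bool MatchesCaseSensitive(string input)
    {
        return Regex.Match(input, "[A-Z]{3}").Success;
    }

    // Case-insensitive through the RegexOptions argument.
    public static bool MatchesWithOption(string input)
    {
        return Regex.Match(input, "[A-Z]{3}", RegexOptions.IgnoreCase).Success;
    }

    // Case-insensitive through the inline (?i) construct in the pattern.
    public static bool MatchesWithInlineFlag(string input)
    {
        return Regex.Match(input, "(?i)[A-Z]{3}").Success;
    }
}
```

The inline form is handy when the pattern is stored in a configuration file and the calling code cannot be changed to pass RegexOptions.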

Use a Test Framework

The input string to your parsing method can have potentially hundreds of variations. Can your regular expression parse these variations successfully?

Solution

Use a test framework, such as MbUnit, to test all potential variations.
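
The idea can be sketched without committing to a particular framework: keep a table of input strings and expected results, and run the pattern against every row. The version below uses plain C# rather than MbUnit attributes; the pattern, the test rows, and the name TimePatternChecks are hypothetical, and the same table could be fed to a framework's data-driven tests.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimePatternChecks
{
    // Hypothetical pattern under test; substitute your own.
    private const string Pattern = "^([0-9]{1,2}):([0-9]{2}) (AM|PM)$";

    // Input variations and the result expected for each one.
    private static readonly string[] Inputs =
    {
        "12:43 PM", "1:05 AM", "12:43", "12:43 pm", "123:43 PM", ""
    };

    private static readonly bool[] Expected =
    {
        true, true, false, false, false, false
    };

    // Runs every row and returns the number of rows whose actual
    // result differs from the expected result.
    public static int CountFailures()
    {
        int failures = 0;

        for (int i = 0; i < Inputs.Length; i++)
        {
            bool actual = Regex.Match(Inputs[i], Pattern).Success;
            if (actual != Expected[i])
            {
                failures++;
                Console.WriteLine("FAILED: \"" + Inputs[i] + "\" expected "
                    + Expected[i] + " but got " + actual);
            }
        }

        return failures;
    }
}
```

Each time a new input variation turns up in production, add it to the table so the regular expression is checked against it from then on.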