Regular Expressions in C# (including a new comprehensive email pattern)

June 15, 2010, 10:00 pm by Rhyous

Of course C# supports regular expressions. I happen to have learned regular expressions in my dealings with FreeBSD, shell scripting, PHP, and other open source work, so naturally I want to add them to my skill set as I develop in C#.

What is a Regular Expression?

A regular expression is a way, in code or script, to describe the format or pattern of a string. For example, look at an email address:

someuser@somedomain.tld

It is important to understand that we are not comparing the email string against another string; we are comparing it against a pattern.

Verifying that an email address is in the correct format with String functions alone would take dozens of calls run one after another. With a regular expression, the format can be checked in a single call.

A regular expression is a small language of its own, almost like a scripting language, for defining character patterns.

Most characters match themselves. Some characters, however, have a special meaning, so to match them literally you must escape them with a backslash (for example, \. matches a period). Here is a table of the most common ones.

Expression	Meaning
*	Zero or more of the previous character or group.
+	One or more of the previous character or group.
^	Beginning of the line or string. (As the first character inside [ ], it negates the class instead.)
$	End of the line or string.
?	Zero or one of the previous character or group, which makes it optional.
.	Any single character except a newline.
[ … ]	A character class: matches any one of the characters listed.
( … )	A group of items that can be repeated or alternated as a unit.
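To see a few of these rules in action, here is a small sketch that runs Regex.IsMatch (from System.Text.RegularExpressions) over some sample patterns and inputs. The class name MetacharacterDemo and the sample strings are made up for illustration, and every pattern is anchored with ^ and $ so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: each sample pairs a pattern with an input string.
public static class MetacharacterDemo
{
    public static readonly (string Pattern, string Input)[] Samples =
    {
        ("^ab*c$", "ac"),        // *   zero b's is allowed
        ("^ab+c$", "ac"),        // +   at least one b is required
        ("^colou?r$", "color"),  // ?   the u is optional
        ("^a.c$", "abc"),        // .   exactly one character between a and c
        ("^a.c$", "ac"),         // .   zero characters is not enough
        ("^[0-9]+$", "2010"),    // [ ] one or more characters from the class
        ("^(ab)+$", "ababab"),   // ( ) the + repeats the whole group
        (@"^a\.c$", "abc"),      // \.  an escaped period matches only "."
    };

    // Runs Regex.IsMatch on every sample and returns the results in order.
    public static bool[] Run()
    {
        var results = new bool[Samples.Length];
        for (int i = 0; i < Samples.Length; i++)
        {
            results[i] = Regex.IsMatch(Samples[i].Input, Samples[i].Pattern);
        }
        return results;
    }

    public static void Main()
    {
        bool[] results = Run();
        for (int i = 0; i < Samples.Length; i++)
        {
            Console.WriteLine($"{Samples[i].Pattern,-10} {Samples[i].Input,-8} {results[i]}");
        }
    }
}
```

The results print in order as True, False, True, True, False, True, True, False. In particular, ? makes the previous item optional and . stands for exactly one character.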

You should look up more regular expression rules. I don’t explain them all here. This is just to give you an idea.

Example 1 – Parameter=Value

Here is a quick example of a regular expression that matches String=String. At first you might think this is easy and you can use this expression:

.*=.*

That works, but it is very permissive: .* allows zero characters before and after the equals sign, so even a lone = matches, which should not be allowed.

This next pattern at least requires one character on each side of the =, but it is still very permissive.

.+=.+

What if the first value is limited to only alphanumeric characters?

[a-zA-Z0-9]+=.+
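One easy slip here is typing the range as A-z (lowercase z) instead of A-Z. The range then runs from 'A' (code 65) to 'z' (code 122), so it quietly admits [ \ ] ^ _ and the backtick as well. This sketch compares the two classes; the class name and the anchored test patterns are hypothetical, chosen only for the comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeTypoDemo
{
    // A-z spans 'A' (65) through 'z' (122), which also includes
    // [ \ ] ^ _ and the backtick (codes 91 through 96).
    public const string Typo = "^[a-zA-z0-9]+$";

    // The intended class: ASCII letters and digits only.
    public const string Intended = "^[a-zA-Z0-9]+$";

    // Returns whether the name matches each version of the class.
    public static (bool WithTypo, bool WithFix) Compare(string name) =>
        (Regex.IsMatch(name, Typo), Regex.IsMatch(name, Intended));

    public static void Main()
    {
        foreach (string name in new[] { "abc123", "my_name", "a^b" })
        {
            var (withTypo, withFix) = Compare(name);
            Console.WriteLine($"{name,-8} A-z: {withTypo,-5} A-Z: {withFix}");
        }
    }
}
```

For "my_name" and "a^b" the A-z version reports a match and the A-Z version does not; "abc123" matches both.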

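What if the second value has to look like a Windows file path or URL, so it must not contain the characters < > | ? * " that Windows does not allow in a path? We will also anchor the pattern with ^ and $ so that it must match the whole string, from start to finish.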

^[0-9a-zA-Z]+=[^<>|?*"]+$

See how the more restrictions you put in place, the more complex the expression gets?
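Here is a sketch that runs each step of this example against the same inputs, so you can watch the restrictions take effect. The third pattern is written as [a-zA-Z0-9]+=.+ (uppercase Z, and a + so the name can be longer than one character). The inputs and the class name are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParameterValueDemo
{
    // The four patterns from Example 1, from most to least permissive.
    public static readonly string[] Patterns =
    {
        @".*=.*",
        @".+=.+",
        @"[a-zA-Z0-9]+=.+",
        "^[0-9a-zA-Z]+=[^<>|?*\"]+$",
    };

    public static readonly string[] Inputs =
    {
        "=",                 // nothing on either side
        "name=value",
        "my name=C:\\temp",  // the name contains a space
        "abcd=http://",
        "abcd=a|b",          // | is excluded by the last pattern
    };

    // Returns one IsMatch result per pattern for the given input.
    public static bool[] Row(string input)
    {
        var results = new bool[Patterns.Length];
        for (int i = 0; i < Patterns.Length; i++)
        {
            results[i] = Regex.IsMatch(input, Patterns[i]);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string input in Inputs)
        {
            Console.Write($"{input,-18}");
            foreach (bool isMatch in Row(input))
            {
                Console.Write($" {isMatch,-5}");
            }
            Console.WriteLine();
        }
    }
}
```

A lone = passes only the first pattern. "my name=C:\temp" passes the first three because, without a ^ anchor, the substring "name=C:\temp" is enough to satisfy them; only the anchored final pattern rejects the space.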

Example 2 – The email address

The pattern of an email address is as follows (reference: Wikipedia):

See updates here: C# – Email Regular Expression

It will always have a single @ sign.
The part before the @ sign, called the local-part, is 1 to 64 characters long. It can contain the characters a-z, A-Z, 0-9, ! # $ % & ' * + - / = ? ^ _ ` { | } ~, and the period (.), as long as the period is not the first or last character and two periods do not appear in a row.
The part after the @ sign, called the domain, has this pattern:
1. It will always have a period ".".
2. One or more characters before the period.
3. Two to four characters after the last period.
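The regular expressions that follow focus on which characters are allowed, and none of them caps the local-part at 64 characters. Here is a minimal sketch of checking the single @ rule and the 1 to 64 character rule with plain string methods. The class and method names are hypothetical, and pairing this check with a pattern is my assumption rather than part of the rules above.

```csharp
using System;

public static class LocalPartRules
{
    // True when the string has exactly one @ and 1 to 64 characters before it.
    public static bool HasSingleAtAndValidLocalLength(string email)
    {
        if (email == null) return false;

        int at = email.IndexOf('@');

        // No @ at all, or a second @ later in the string.
        if (at < 0 || at != email.LastIndexOf('@')) return false;

        // The index of the @ equals the length of the local-part.
        return at >= 1 && at <= 64;
    }

    public static void Main()
    {
        Console.WriteLine(HasSingleAtAndValidLocalLength("joe@home.com"));                 // True
        Console.WriteLine(HasSingleAtAndValidLocalLength("@home.com"));                    // False
        Console.WriteLine(HasSingleAtAndValidLocalLength(new string('a', 65) + "@x.com")); // False
    }
}
```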

So some simple email patterns look like these:

This one just makes sure there are characters before and after the @
.+@.+
This one makes sure there are characters before and after the @ as well as a character before and after the . in the domain.
.+@.+\..+
This one makes sure that there is only one @ symbol. The ^ and $ anchors are needed so that [^@] has to cover the whole string.
^[^@]+@[^@]+\.[^@]+$

These are all quick and easy examples. They will not work in every instance, but they are usually accurate enough for casual programs.
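The three quick patterns can be compared side by side. This sketch writes the second pattern as .+@.+\..+ because .NET has no possessive quantifiers, so .*+ is rejected with an ArgumentException (a "nested quantifier" error), and it anchors the one-@ check with ^ and $ so that a second @ elsewhere in the string cannot slip past it. The class name is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleEmailDemo
{
    // Characters before and after the @.
    public const string AnyAroundAt = @".+@.+";

    // Also requires a period with characters on both sides after the @.
    public const string DotInDomain = @".+@.+\..+";

    // Exactly one @: anchored so [^@] has to cover the whole string.
    public const string SingleAt = @"^[^@]+@[^@]+\.[^@]+$";

    // Returns the result of each pattern, in the order above.
    public static bool[] Check(string input) => new[]
    {
        Regex.IsMatch(input, AnyAroundAt),
        Regex.IsMatch(input, DotInDomain),
        Regex.IsMatch(input, SingleAt),
    };

    public static void Main()
    {
        foreach (string input in new[] { "joe@home", "joe@home.com", "joe@bob@home.com" })
        {
            bool[] r = Check(input);
            Console.WriteLine($"{input,-18} {r[0],-5} {r[1],-5} {r[2]}");
        }
    }
}
```

For "joe@bob@home.com" the first two patterns still report a match, while the anchored one-@ pattern does not.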

But a comprehensive example is much more complex.

I wrote one myself that is the shortest and gets the best results of any I have found:

^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$
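Here is a minimal sketch of wrapping this pattern in a helper method. The name IsEmailFormat is hypothetical, and trimming the input first is an assumption I added: because the pattern is anchored with ^ and $, stray spaces at the end (see Evgeny's comment below) are enough to make a valid address fail.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheck
{
    // The comprehensive pattern, split the same way as in the full program.
    public const string Pattern =
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
        + "@"
        + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";

    // Hypothetical helper: trims surrounding whitespace, then checks the format.
    public static bool IsEmailFormat(string input)
    {
        if (input == null) return false;
        return Regex.IsMatch(input.Trim(), Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsEmailFormat("joe.bob@home.com"));   // True
        Console.WriteLine(IsEmailFormat("ivanov@gk-pik.ru  ")); // True, after trimming
        Console.WriteLine(IsEmailFormat("john..doe@bob.com"));  // False
    }
}
```

If you would rather have \w accept only ASCII letters, digits, and the underscore (see Igor's comment below about Cyrillic letters passing), RegexOptions.ECMAScript gives \w the meaning [a-zA-Z_0-9] in .NET; re-run the good and bad email lists if you try it.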

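Here is another complex one I found (reference: http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx):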

^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$

So let me explain the first one, which I wrote and which passes my tests in the code below:

^	Start of the string.
[\w!#$%&'*+\-/=?\^_`{|}~]+	At least one valid local-part character, not including a period.
(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*	Any number (including zero) of groups that start with a single period followed by at least one valid local-part character.
@	The @ character.
(	Start of the domain group.
(	Start of the domain-name alternative.
([\-\w]+\.)+	At least one group of one or more word characters or hyphens, followed by a period.
[a-zA-Z]{2,4}	Two to four letters for the top-level domain.
)	End of the domain-name alternative.
|	OR
(	Start of the IP-address alternative.
([0-9]{1,3}\.){3}[0-9]{1,3}	A simple pattern for an IPv4 address (it does not check that each number is 255 or less).
)	End of the IP-address alternative.
)	End of the domain group.
$	End of the string.
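The same breakdown can live inside the pattern itself. With RegexOptions.IgnorePatternWhitespace, .NET ignores unescaped whitespace and treats an unescaped # as the start of a comment, so the pattern can be spread over several lines with a comment on each piece. This is a sketch, not the code from this post: the class name is hypothetical, and the # inside the character classes is escaped as \# so it stays a literal character. The groups are the same as in the one-line version, so it should accept and reject the same addresses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedEmailPattern
{
    // Must be used with RegexOptions.IgnorePatternWhitespace.
    public const string Pattern = @"
        ^                                   # start of the string
        [\w!\#$%&'*+\-/=?\^_`{|}~]+         # local-part characters, no period
        (\.[\w!\#$%&'*+\-/=?\^_`{|}~]+)*    # zero or more .xxx segments
        @                                   # the @ sign
        (                                   # domain group
          (                                 # domain-name alternative
            ([\-\w]+\.)+                    # one or more labels ending in a period
            [a-zA-Z]{2,4}                   # two to four letter top-level domain
          )
          |                                 # or
          (                                 # IP-address alternative
            ([0-9]{1,3}\.){3}[0-9]{1,3}     # four dot-separated numbers of 1-3 digits
          )
        )
        $                                   # end of the string";

    public static bool IsMatch(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        foreach (string email in new[] { "~joe@bob.com", "a@192.168.0.1", "joe@his.home.place" })
        {
            Console.WriteLine($"{email,-20} {IsMatch(email)}");
        }
    }
}
```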

Code for both examples

Here is code for both examples. My email regular expression is enabled and the one I found online is commented out. To see how they behave differently, comment out mine and uncomment the one I found online.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegularExpressionsTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Example 1 - Parameter=value
            // Match any character before and after the =
            // String thePattern = @"^.+=.+$";

            // Match only upper- and lowercase letters and digits before
            // the = as the parameter name, and after the = match any
            // character that is allowed in a Windows file's full path.
            //
            // ^[0-9a-zA-Z]+    One or more upper- or lowercase letters or
            //                  digits 0 through 9 at the string's beginning.
            //
            // =                Matches the = character exactly.
            //
            // [^<>|?*\"]+$     One or more characters other than < > | ? * "
            //                  up to the end of the string, because those
            //                  characters are not valid in a Windows file path.

            String theNameEqualsValue = @"abcd=http://";

            String theParameterEqualsValuePattern = "^[0-9a-zA-Z]+=[^<>|?*\"]+$";
            bool isParameterEqualsValueMatch = Regex.IsMatch(theNameEqualsValue, theParameterEqualsValuePattern);
            Log(isParameterEqualsValueMatch);

            // Example 2 - Email address formats

            String theEmailPattern = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
                                   + "@"
                                   + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";

            // The string pattern from here does not work in all instances.
            // http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
            //String theEmailPattern = @"^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))"
            //                       + "@"
            //                       + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])"
            //                       + "|"
            //                       + @"(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$";

            Console.WriteLine("Bad emails");
            foreach (String email in GetBadEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }

            Console.WriteLine("Good emails");
            foreach (String email in GetGoodEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }
        }

        private static void Log(bool inValue)
        {
            if (inValue)
            {
                Console.WriteLine("It matches the pattern");
            }
            else
            {
                Console.WriteLine("It doesn't match the pattern");
            }
        }

        private static List<String> GetBadEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe"); // should fail
            emails.Add("joe@home"); // should fail
            emails.Add("a@b.c"); // should fail because .c is only one character but must be 2-4 characters
            emails.Add("joe-bob[at]home.com"); // should fail because [at] is not valid
            emails.Add("joe@his.home.place"); // should fail because place is 5 characters but must be 2-4 characters
            emails.Add("joe.@bob.com"); // should fail because there is a dot at the end of the local-part
            emails.Add(".joe@bob.com"); // should fail because there is a dot at the beginning of the local-part
            emails.Add("john..doe@bob.com"); // should fail because there are two dots in the local-part
            emails.Add("john.doe@bob..com"); // should fail because there are two dots in the domain
            emails.Add("joe<>bob@bob.come"); // should fail because <> are not valid
            emails.Add("joe@his.home.com."); // should fail because it can't end with a period
            emails.Add("a@10.1.100.1a");  // Should fail because of the extra character
            return emails;
        }

        private static List<String> GetGoodEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe@home.org");
            emails.Add("joe@joebob.name");
            emails.Add("joe&bob@bob.com");
            emails.Add("~joe@bob.com");
            emails.Add("joe$@bob.com");
            emails.Add("joe+bob@bob.com");
            emails.Add("o'reilly@there.com");
            emails.Add("joe@home.com");
            emails.Add("joe.bob@home.com");
            emails.Add("joe@his.home.com");
            emails.Add("a@abc.org");
            emails.Add("a@192.168.0.1");
            emails.Add("a@10.1.100.1");
            return emails;
        }
    }
}
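The program above prints "It matches the pattern" without saying which address it was checking, so you have to line up the output with the lists by hand. Here is a sketch of a variant that knows the expected result for each list and prints only the surprises. The class and method names are hypothetical, and the address lists are shortened copies of GetGoodEmails and GetBadEmails.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailPatternCheck
{
    public const string EmailPattern =
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
        + "@"
        + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";

    // Prints every address whose result differs from shouldMatch
    // and returns how many such addresses there were.
    public static int CountUnexpected(IEnumerable<string> emails, bool shouldMatch)
    {
        int unexpected = 0;
        foreach (string email in emails)
        {
            bool isMatch = Regex.IsMatch(email, EmailPattern);
            if (isMatch != shouldMatch)
            {
                unexpected++;
                Console.WriteLine($"UNEXPECTED: {email} (matched: {isMatch})");
            }
        }
        return unexpected;
    }

    public static void Main()
    {
        // Shortened copies of the good and bad lists from the program above.
        var good = new List<string> { "joe@home.org", "joe&bob@bob.com", "o'reilly@there.com", "a@192.168.0.1" };
        var bad = new List<string> { "joe@home", "a@b.c", ".joe@bob.com", "john.doe@bob..com", "a@10.1.100.1a" };

        int unexpected = CountUnexpected(good, true) + CountUnexpected(bad, false);
        Console.WriteLine(unexpected == 0
            ? "All addresses behaved as expected."
            : $"{unexpected} unexpected result(s).");
    }
}
```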

Tags: c#, email, Regex, Regular Expressions
Category: csharp

22 Comments

Hannes says:

March 8, 2018 at 2:32 am

The second regex works like a charm. Thanks!

Evgeny says:

October 11, 2016 at 2:13 am

The e-mail pattern gives negative result for ivanov@gk-pik.ru although it is a valid e-mail address.

- Evgeny says:
  
  October 11, 2016 at 2:39 am
  
  Sorry for the previous comment. The pattern works. I had extra white spaces at the end of the email address and this is why it was giving negative result.
  
Rhyous says:

October 11, 2012 at 4:10 pm

I am going to have to watch for the new Top Level Domains (TLDs) as they may have TLDs with more than four characters coming in 2013.
http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en

NRK says:

May 29, 2012 at 12:28 pm

I am also using MSDN RegEx, it's working fine checking single-character domain.

- Rhyous says:
  
  May 29, 2012 at 3:53 pm
  
  Are you replying to Paul, because if so, he commented on the MSDN site and they updated it shortly after his comment to be more accurate.
  
Igor says:

February 14, 2012 at 7:54 am

How come email like 'aпп@dgh.com' is passing validation?

- Rhyous says:
  
  February 20, 2012 at 4:18 am
  
  I don't understand, that email should work? What are you asking?
  
  - Igor says:
    
    February 20, 2012 at 6:54 am
    
    But it contains characters from cyrillic alphabet (п). Shouldn't regex reject such email addresses as invalid, since email address should only contain characters a-z?
    
    I tried to use your regex and validate this email address, and it passes, though I expected it would fail. Not sure if this is filter problem or regex implementation problem (tried on Windows Phone 7).
    
    - Rhyous says:
      
      March 4, 2012 at 12:21 pm
      
  Ahh... I didn't notice the characters were Cyrillic.
      
    - Rhyous says:
      
      April 12, 2012 at 2:22 pm
      
      RFC 6530 says unicode characters are allowed now.
      http://tools.ietf.org/html/rfc6530
      
Quantbuff says:

February 10, 2012 at 1:39 pm

Domain names does not start with a hyphen so you regex may not invalidate that ?

- Quantbuff says:
  
  February 10, 2012 at 2:29 pm
  
  Also \w contains _ so you don't need to mention underscore again. Also this allows domain names to contain _ in your logic. ( if that is what you wanted )
  
Will says:

December 8, 2011 at 10:20 am

Great Post. But I think there's one mistake in the validation of the top level domain portion of the email. As written, it limits the TLD to 1-3 characters. But, there are TLD's that are more than 3 characters (museum and info just to name two, for a full list see http://data.iana.org/TLD/tlds-alpha-by-domain.txt). As of now, I don't think there are any 1 character TLD's, but I'm not sure if this is limited by specification, or just custom. A safer test might be {2,}.

- Will says:
  
  December 8, 2011 at 10:23 am
  
  Sorry, let me revise that slightly: it's currently {2,4}, not {1,3} as I stated, but the overall comment still stands. I'd still recommend {2,} for the TLD check.
  
MadQ says:

October 28, 2011 at 10:09 am

You can make your regular expression even shorter by using zero-width negative assertions to ensure that the email address does not begin with period, and that the local part does not end with a period:

^(?!\.)[\.\w!#$%&'*+\-/=?\^_`{|}~]{1,64}(?<!\.)@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$

Note that this will also allow two or more consecutive periods in the local part, which your regular expression does not. I'm not sure if consecutive periods in the local part are valid, and I'm too lazy to look it up right now 😉

- Rhyous says:
  
  October 29, 2011 at 9:04 am
  
  Consecutive periods are not allowed. If starting with a period and ending with a period in the localpart are bad, I should definitely add these to the test.
  
  Oops...they are already there.
  
Paul says:

September 10, 2011 at 4:22 pm

Thanks for posting this. I was using the one published by MSDN: (http://msdn.microsoft.com/en-us/library/01escwtf.aspx) which would not allow a single-character subdomain such as foo@a.com. Yours appears to work correctly.

- Rhyous says:
  
  October 29, 2011 at 9:05 am
  
  Glad this worked for you. I tried to write the best regular expression possible.
  
