C# – Email Regular Expression
I wrote an email regex that gets better results than any other I have found online. Along with getting better results, it is shorter too.
Download the C# project with unit tests here: EmailRegEx on GitHub
The pattern of an email is described as follows:
- It will always have a single @ sign
- 1 to 64 characters before the @ sign, called the local-part. It can contain the characters a–z, A–Z, 0-9, ! # $ % & ' * + - / = ? ^ _ ` { | } ~, and the period (.) as long as the period is not the first or last character of the local-part and does not appear twice in a row.
- The part after the @ sign, called the domain, follows this pattern:
  - It will always have a period (.).
  - One or more characters before the period.
  - Two to four characters after the last period.
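To make the rules above concrete, here is a minimal sketch that checks them one at a time without a regular expression. The class EmailRules and its method LooksValid are hypothetical names, not part of the EmailRegEx project. It covers only the rules in the list, plus rejecting two periods in a row in the local-part (as the regex below does), so, for example, it does not reject a doubled period in the domain such as bob..com.

```csharp
// Hypothetical helper (not from the EmailRegEx project): checks the listed
// rules one by one instead of with a single regular expression.
public static class EmailRules
{
    // Special characters allowed in the local-part besides letters, digits and '.'.
    private const string LocalSpecials = "!#$%&'*+-/=?^_`{|}~";

    public static bool LooksValid(string email)
    {
        if (email == null) return false;

        // Rule: exactly one @ sign.
        int at = email.IndexOf('@');
        if (at < 0 || at != email.LastIndexOf('@')) return false;

        string local = email.Substring(0, at);
        string domain = email.Substring(at + 1);

        // Rule: 1 to 64 characters in the local-part.
        if (local.Length < 1 || local.Length > 64) return false;

        // Rule: a period may not be first or last, and may not be doubled.
        if (local.StartsWith(".") || local.EndsWith(".") || local.Contains("..")) return false;

        // Rule: only the allowed characters.
        foreach (char c in local)
        {
            bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                           (c >= '0' && c <= '9') || c == '.' ||
                           LocalSpecials.IndexOf(c) >= 0;
            if (!allowed) return false;
        }

        // Domain rules: a period with at least one character before it,
        // and two to four characters after the last period.
        int dot = domain.LastIndexOf('.');
        if (dot < 1) return false;
        int tldLength = domain.Length - dot - 1;
        return tldLength >= 2 && tldLength <= 4;
    }
}
```

For example, EmailRules.LooksValid("joe@home.com") returns true, while "a@b.c" and ".joe@bob.com" are rejected.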
So simple email patterns look something like these:
- This one just makes sure there are characters before and after the @:
.+@.+
- This one makes sure there are characters before and after the @, as well as a character before and after the . in the domain:
.+@.+\..+
- This one makes sure that there is only one @ symbol:
[^@]+@[^@]+\.
These are all quick and easy examples that will not work in every instance, but they are usually accurate enough for casual programs.
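One thing worth knowing when trying these in C#: Regex.IsMatch succeeds if the pattern matches any substring of the input. As a result, the unanchored "only one @" pattern still matches a@b@c.com, through the substring b@c. The sketch below wraps the three patterns for comparison (with the second one written as .+@.+\..+) and adds an anchored variant, AnchoredSingleAt, which is an assumption added here rather than part of the original article; QuickPatterns and Matches are likewise hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Sketch: the three quick patterns, plus an anchored variant (an assumption,
// not from the original article) for comparison.
public static class QuickPatterns
{
    public const string AtSign = @".+@.+";
    public const string AtSignAndDot = @".+@.+\..+";
    public const string SingleAt = @"[^@]+@[^@]+\.";

    // ^ and $ force the whole input to fit, so a second @ anywhere fails the match.
    public const string AnchoredSingleAt = @"^[^@]+@[^@]+\.[^@]+$";

    public static bool Matches(string email, string pattern)
    {
        // IsMatch succeeds if the pattern matches ANY substring of email.
        return Regex.IsMatch(email, pattern);
    }
}
```

The design point is the anchors: without ^ and $, each quick pattern only proves that some part of the input looks like an address.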
But a comprehensive example is much more complex.
- I wrote one myself that is the shortest and gets the best results of any I have found:
^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z
- Here is another complex one I found online (the source URL is in the code comments below):
^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$
So let me explain the first one, the one I wrote, which passes my unit tests below:
Part | Meaning |
^ | The start of the string. |
[\w!#$%&'*+\-/=?\^_`{|}~]+ | At least one valid local-part character, not including a period. |
(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)* | Any number (including zero) of groups that start with a single period followed by at least one valid local-part character. |
@ | The @ character. |
( | Start group 1. |
( | Start group 2. |
([\-\w]+\.)+ | At least one group of one or more word characters or hyphens followed by a period. The attached project has a more complex hostname regex option too. |
[a-zA-Z]{2,4} | Two to four letters for the top-level domain. |
) | End group 2. |
| | An OR: match either group 2 or group 3. |
( | Start group 3. |
([0-9]{1,3}\.){3}[0-9]{1,3} | A regular expression for an IP address. The attached project has a more complex IP regex example too. |
) | End group 3. |
) | End group 1. |
\z | The absolute end of the string, so nothing may follow, not even a trailing \r or \n. |
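The pattern itself does not enforce the 1 to 64 character limit on the local-part described at the top. As a sketch, assuming a separate length check after the match is acceptable, it can be added like this. EmailCheck and IsValid are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Sketch: the article's pattern plus a check of the 64-character local-part
// limit, which the regex alone does not enforce.
public static class EmailCheck
{
    private static readonly Regex Pattern = new Regex(
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*" +
        "@" +
        @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z");

    public static bool IsValid(string email)
    {
        if (email == null || !Pattern.IsMatch(email)) return false;

        // Neither character class allows '@', so a match contains exactly one;
        // its index equals the length of the local-part.
        int at = email.IndexOf('@');
        return at <= 64;
    }
}
```

Keeping the length check outside the regex avoids wrapping the whole local-part in a lookahead, which would make the pattern harder to read.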
Code for the Email Regular Expression
Here is code for both examples. My email regular expression is enabled and the one I found online is commented out. To see how they behave differently, just comment out mine and uncomment the one I found online.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegularExpressionsTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // My pattern: a local-part, "@", then either a hostname with a 2-4 letter TLD or an IPv4 address.
            String theEmailPattern = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
                + "@"
                + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z";

            // The string pattern from here does not work in all instances.
            // http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
            //String theEmailPattern = @"^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))"
            //    + "@"
            //    + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])"
            //    + "|"
            //    + @"(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$";

            Console.WriteLine("Bad emails");
            foreach (String email in GetBadEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }

            Console.WriteLine("Good emails");
            foreach (String email in GetGoodEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }
        }

        private static void Log(bool inValue)
        {
            if (inValue)
            {
                Console.WriteLine("It matches the pattern");
            }
            else
            {
                Console.WriteLine("It doesn't match the pattern");
            }
        }

        private static List<String> GetBadEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe"); // should fail
            emails.Add("joe@home"); // should fail
            emails.Add("a@b.c"); // should fail because .c is only one character but must be 2-4 characters
            emails.Add("joe-bob[at]home.com"); // should fail because [at] is not valid
            emails.Add("joe@his.home.place"); // should fail because place is 5 characters but must be 2-4 characters
            emails.Add("joe.@bob.com"); // should fail because there is a dot at the end of the local-part
            emails.Add(".joe@bob.com"); // should fail because there is a dot at the beginning of the local-part
            emails.Add("john..doe@bob.com"); // should fail because there are two dots in the local-part
            emails.Add("john.doe@bob..com"); // should fail because there are two dots in the domain
            emails.Add("joe<>bob@bob.com"); // should fail because <> are not valid
            emails.Add("joe@his.home.com."); // should fail because it can't end with a period
            // Note: the short pattern's ([\-\w]+\.)+ part still accepts the next two addresses.
            emails.Add("john.doe@bob-.com"); // should fail because there is a dash at the end of a domain part
            emails.Add("john.doe@-bob.com"); // should fail because there is a dash at the start of a domain part
            emails.Add("a@10.1.100.1a"); // should fail because of the extra character
            emails.Add("joe@bob.com\n"); // should fail because it ends with \n
            emails.Add("joe@bob.com\r"); // should fail because it ends with \r
            return emails;
        }

        private static List<String> GetGoodEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe@home.org");
            emails.Add("joe@joebob.name");
            emails.Add("joe&bob@bob.com");
            emails.Add("~joe@bob.com");
            emails.Add("joe$@bob.com");
            emails.Add("joe+bob@bob.com");
            emails.Add("o'reilly@there.com");
            emails.Add("joe@home.com");
            emails.Add("joe.bob@home.com");
            emails.Add("joe@his.home.com");
            emails.Add("a@abc.org");
            emails.Add("a@abc-xyz.org");
            emails.Add("a@192.168.0.1");
            emails.Add("a@10.1.100.1");
            return emails;
        }
    }
}
Well, now you have the best C# Email Regular Expression out there.
Update: My attached project has an even better and more accurate one now too.
(Reference: Wikipedia)
joe@home is a valid email, not an invalid one. Could you fix it?
See this question on Stack Overflow: https://stackoverflow.com/questions/21810464/why-is-this-angular-email-validation-valid
What is your use case? While, yes, joe@home is RFC-valid, it is a special case that we don't want to let through. Having it fail is desired.
While joe@home may be valid in a highly localized environment, it is invalid 99.99999% of the time on the internet.
On the internet, an email without a TLD is a bad email. Internet and remote environments require domain registration and DNS, which means the domain must have a format like {domain}.{tld}.
You want me to support the 0.00001% instead of the 99.99999%.
However, you are more than welcome to update the regex yourself. You can change this part.
To something like this: (Untested)
In public static string ComplexEmailPattern4 = "..."; I would add "readonly".
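A sketch of that suggestion, assuming the pattern is exposed as a public static field the way ComplexEmailPattern4 is: readonly prevents other code from reassigning the pattern at runtime, and keeping one static Regex instance avoids rebuilding it on every call. EmailPatterns, SimpleEmailPattern, and SimpleEmailRegex are hypothetical names, and the pattern shown is the short one from the article, not the project's ComplexEmailPattern4.

```csharp
using System.Text.RegularExpressions;

// Sketch of the "readonly" suggestion, using the short pattern from the article.
public static class EmailPatterns
{
    // readonly: callers can read the pattern but cannot replace it at runtime.
    public static readonly string SimpleEmailPattern =
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*" +
        "@" +
        @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z";

    // Static fields initialize in textual order, so the pattern above is ready here.
    // Regex instances are immutable and safe to share across threads.
    public static readonly Regex SimpleEmailRegex =
        new Regex(SimpleEmailPattern, RegexOptions.Compiled);
}
```

Because the value is built only from string literals, const string would also compile here. The trade-off is that a const is copied into every calling assembly at compile time, so changing it later requires recompiling those callers, while a readonly field is read at runtime.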
Hi,
It doesn't work when I enter xxx@xxx.xxxxxxxxxxxxxxx.
When the top-level domain exceeds 4 characters, this doesn't work.
Can someone fix this?
It used to be that almost all TLDs were two to four characters long. However, that has changed, and many TLDs are now longer.
Here is a list of valid TLDs
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
Using regex to validate against this list would be a bad idea. Here is a new regex that is the most complete and passes all RFC rules.
Then I would add a follow-up check in your code that the email ends with a TLD contained in this list: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
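The two-step approach can be sketched like this, under a few assumptions: the TLD part of the pattern is relaxed from {2,4} to {2,}, the contents of tlds-alpha-by-domain.txt have already been downloaded (fetching and caching the file is left out), and TldCheck, LoadTlds, and IsValid are hypothetical names. The file lists one TLD per line in upper case, with comment lines starting with #, so the set uses a case-insensitive comparer. Internationalized TLDs appear in the file in their xn-- form, which [a-zA-Z]{2,} does not match, so this sketch does not handle them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: relaxed pattern first, then a lookup of the TLD in the IANA list.
public static class TldCheck
{
    // The article's pattern with {2,} instead of {2,4} for the TLD.
    private static readonly Regex Pattern = new Regex(
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*" +
        "@" +
        @"((([\-\w]+\.)+[a-zA-Z]{2,})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z");

    private static readonly Regex IpAddress = new Regex(@"^([0-9]{1,3}\.){3}[0-9]{1,3}\z");

    // Parses the text of tlds-alpha-by-domain.txt: one TLD per line, '#' lines are comments.
    public static HashSet<string> LoadTlds(string fileContents)
    {
        var tlds = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in fileContents.Split('\n'))
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue;
            tlds.Add(line);
        }
        return tlds;
    }

    public static bool IsValid(string email, HashSet<string> tlds)
    {
        if (email == null || !Pattern.IsMatch(email)) return false;

        string domain = email.Substring(email.IndexOf('@') + 1);

        // An IP-address domain has no TLD to look up.
        if (IpAddress.IsMatch(domain)) return true;

        string tld = domain.Substring(domain.LastIndexOf('.') + 1);
        return tlds.Contains(tld);
    }
}
```

Loading the list into a HashSet once keeps each lookup cheap, and the list can be refreshed without touching the regex.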
Tested. It works fine, with only one issue:
The regular expression does not match test@-test.com or test@test-.com.
Are you sure? Neither test-.com nor -test.com is valid. I have specific unit tests for both of those being invalid, and the unit tests are passing.
From RFC 952:
The last character must not be a minus sign or period.
The first character must be an alpha character.
From RFC 1123:
One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit.
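Put together, those rules say each dot-separated label of a hostname starts and ends with a letter or digit and may contain hyphens in between. Here is a sketch of that check in C#, assuming the 63-character label limit from RFC 1035 (not quoted above); HostnameLabels and IsValidHostname are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Sketch of the RFC 952 / RFC 1123 label rules quoted above.
public static class HostnameLabels
{
    // First and last character: letter or digit (RFC 1123 allows a leading digit).
    // Middle: letters, digits or hyphens. Total length 1 to 63 (RFC 1035).
    private static readonly Regex Label =
        new Regex(@"^[a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\z");

    public static bool IsValidHostname(string host)
    {
        if (string.IsNullOrEmpty(host)) return false;

        // An empty label (as in "bob..com" or a trailing period) fails the regex.
        foreach (string label in host.Split('.'))
        {
            if (!Label.IsMatch(label)) return false;
        }
        return true;
    }
}
```

With this check, both bob-.com and -bob.com are rejected, while 3com.com passes under the RFC 1123 relaxation.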
If a newline character is at the end of the address, it passes as valid even against the most complex regex, when I guess it should not. For example: "test@test.com\n".
Interesting...
\r works
\r\n works
\n doesn't
I am reading about it here:
http://stackoverflow.com/questions/988951/net-regex-and-newline
It looks like replacing the very last $ with \z (lowercase z) fixes it.
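Here is a small sketch that reproduces those observations. In .NET, without RegexOptions.Multiline, $ matches at the end of the input or just before a final \n, while \z matches only at the very end; uppercase \Z behaves like $ in this respect. That is why \n slipped through while \r and \r\n were rejected. AnchorDemo and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Sketch: $ versus \z on input with a trailing line break.
public static class AnchorDemo
{
    // $ also matches just before a final '\n', so "test@test.com\n" passes.
    public static bool EndsWithDollar(string input) =>
        Regex.IsMatch(input, @"^test@test\.com$");

    // \z matches only at the absolute end of the input.
    public static bool EndsWithZ(string input) =>
        Regex.IsMatch(input, @"^test@test\.com\z");
}
```

For example, AnchorDemo.EndsWithDollar("test@test.com\n") is true, while AnchorDemo.EndsWithZ("test@test.com\n") is false.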
Thank you. This is the shortest one I have come across so far that is still accurate.
I added a project with unit tests and fixed a bug or two.
In the project, in the constructor of the EmailValidator object, just set the Pattern to the desired static email regular expression.