Pretty new to regex. This is a great article.
Can someone tell me what's wrong with my expression? I'm using this from C#. I'm getting the response from a blog in a malformed XML format and need to extract the entries from it. In simplified form, the input looks like the following.
string input = @"<entry><id>tag:myblog.com try</entry><entry><id>tag:myblog.com tryagain</entry><entry><id>tag:myblog.com hello </entry>";
I need to identify the number of entries, and then process each of them.
Regex blogsRegEx = new Regex(@"<entry><id>tag:myblog.*</entry>");
MatchCollection blogEntries = blogsRegEx.Matches(input);
I always get just 1 entry. It matches the whole thing instead of matching multiple strings of the form <entry><id>tag:myblog....</entry>.
Can someone tell me what I am missing here? Do I need to use subexpressions?
modified 29 Apr '12.
|
|
|
|
 |
|
 |
Not sure about the regex (it makes my brain hurt), but you can count the entries with a split plus LINQ instead:
string findText = @"<entry><id>tag:myblog";
int entriesCount = blogsText.Split(new[] { findText }, StringSplitOptions.None).Skip(1).Count();
Have not tested it, but it should give you the correct count; just tweak your findText variable.
|
|
|
|
 |
|
 |
By default, regex matching is greedy. That is, a quantifier such as .* will match the longest possible chunk of input, so you need to make your .* non-greedy. You do that by putting a ? after it, so your line becomes:
Regex blogsRegEx = new Regex(@"<entry><id>tag:myblog.*?</entry>");
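To make the difference visible, here is a minimal sketch that counts the matches of both patterns. The class name, the helper, and the sample string (modeled on the question, with every entry closed by </entry>) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Counts how many times the pattern matches in the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        // Simplified sample data (an assumption, modeled on the question).
        string input = "<entry><id>tag:myblog.com try</entry>"
                     + "<entry><id>tag:myblog.com tryagain</entry>"
                     + "<entry><id>tag:myblog.com hello </entry>";

        // Greedy: .* runs on to the last </entry>, giving one big match.
        Console.WriteLine(CountMatches(input, @"<entry><id>tag:myblog.*</entry>"));

        // Lazy: .*? stops at the first </entry> it can reach, one match per entry.
        Console.WriteLine(CountMatches(input, @"<entry><id>tag:myblog.*?</entry>"));
    }
}
```

One thing to watch: if the real response contains line breaks inside an entry, . will not cross them unless you pass RegexOptions.Singleline.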
If you are going to do ANYTHING nontrivial with regexes, get a copy of Expresso. (See our Free Tools forum for details.)
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994.
|
|
|
|
 |
|
 |
I am trying to get the day number format from a date format string. The day number format is, if present, either a single d or a double d. But the whole string can also contain a placeholder for the day name, ddd or dddd. How can I get the d or dd, if present?
I tried [^d](?<pattern>d{1,2})[^d] which works well with the standard German format (dddd, d. MMMM yyyy). But that fails when the day number comes first (say: d. dddd MMMM yyyy; here, ^(?<pattern>d{1,2})[^d] does the trick), and it fails when the day number comes last.
Actually I need something like "either start of the string or not a d", i.e. ^|[^d], but that does not work. How can that be solved?
|
|
|
|
 |
|
 |
Solved it with @"\b(?<pattern>d{1,2})\b"
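A quick sketch to check the word-boundary pattern against the three placements mentioned in the question (day number in the middle, first, and last). The lookaround variant and the test formats are assumptions added for comparison, not something posted in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class DayNumberPattern
{
    // Returns "d" or "dd" if the format string contains a day-number
    // placeholder, otherwise an empty string.
    public static string Find(string format, string pattern)
    {
        Match m = Regex.Match(format, pattern);
        return m.Success ? m.Groups["pattern"].Value : "";
    }

    static void Main()
    {
        // The pattern from the post above.
        string wordBoundary = @"\b(?<pattern>d{1,2})\b";
        // Alternative (an assumption): say literally "not preceded by d
        // and not followed by d" with negative lookarounds.
        string lookaround = @"(?<!d)(?<pattern>d{1,2})(?!d)";

        string[] formats =
        {
            "dddd, d. MMMM yyyy",   // day number in the middle
            "d. dddd MMMM yyyy",    // day number first
            "dddd MMMM dd",         // day number last
            "MMMM yyyy"             // no day number at all
        };

        foreach (string f in formats)
        {
            Console.WriteLine("{0} -> '{1}' / '{2}'",
                f, Find(f, wordBoundary), Find(f, lookaround));
        }
    }
}
```

As for the original ^|[^d] attempt: alternation has the lowest precedence, so it splits the whole pattern in two unless it is grouped, e.g. (?:^|[^d]) in front and (?:[^d]|$) behind.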
|
|
|
|
 |
|
 |
Are you looking for this?
string sql = @"database.schema.table
database.schema.[table]
database.[schema].table
[database].[schema].table
should all be transformed to:
[database].[schema].[table]
";
string pattern = @"(?:\[?(\w+)\]?)?\.\[?(\w+)\]?";
Func<Match, string> replace = m =>
(m.Groups[1].Success ? "[" + m.Groups[1].Value + "]" : "") + ".[" + m.Groups[2].Value + "]";
Console.WriteLine("{0}", Regex.Replace(sql, pattern, m=>replace(m)));
|
|
|
|
 |
|
 |
Does that work for names that contain SPACEs? my database.my schema.my table
And Excel worksheet names that include a dollar sign ($) at the end?
(I realize those were not listed in the original spec.)
|
|
|
|
 |
|
 |
You will run into problems here.
The problem arises with spaces in the name, since the following pattern:
...([\w\s]+)...
matches
delete my database
as well as
delete database
In this case I guess you don't get away without a parser (use the Regex for tokenizing, use the parser to detect all commands and translate the arguments where needed).
Any names without spaces get easily translated, though, e.g.:
string pattern = @"(?:\[?([\w\$]+)\]?)?\.\[?([\w\$]+)\]?";
And if you have optional spaces around "[" and ".", the following slightly more complicated regex will do:
...
string open = @"(?:\[\s*?)?";
string close = @"(?:\s*?\])?";
string ident = @"([\w\$]+)";
string prefix = @"(?:" + open + ident + close + @"\s*?)?";
string suffix = @"(?:" + open + ident + close + @")";
string pattern = prefix + @"\.\s*?" + suffix;
...
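Put together as something you can run, the fragments above might look like the sketch below. The class and method names are hypothetical; the replacement lambda is the one from the earlier post with the two capture groups.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketNames
{
    // Pattern assembled from the same pieces as in the fragments above.
    const string Open = @"(?:\[\s*?)?";
    const string Close = @"(?:\s*?\])?";
    const string Ident = @"([\w\$]+)";
    const string Prefix = @"(?:" + Open + Ident + Close + @"\s*?)?";
    const string Suffix = @"(?:" + Open + Ident + Close + @")";
    const string Pattern = Prefix + @"\.\s*?" + Suffix;

    // Wraps each name part in [...]; group 1 is the optional part before
    // the dot, group 2 the part after it.
    public static string Normalize(string name)
    {
        return Regex.Replace(name, Pattern, m =>
            (m.Groups[1].Success ? "[" + m.Groups[1].Value + "]" : "")
            + ".[" + m.Groups[2].Value + "]");
    }

    static void Main()
    {
        Console.WriteLine(Normalize("database.schema.table$")); // [database].[schema].[table$]
        Console.WriteLine(Normalize("db . [ schema ]"));        // [db].[schema]
    }
}
```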
|
|
|
|
 |
|
 |
Andreas Gieriet wrote:
matches
delete my database
as well as
delete database
I expect the string to contain only the database, schema, and table names.
|
|
|
|
 |
|
 |
The Regex sees a line like
aaa bbb ccc . ddd . eee fff
What part of aaa bbb ccc is the database name? Only ccc or bbb ccc, etc.? You see the problem?
The same for eee fff.
With non-escaped/non-wrapped spaces in names, wrapping the names into [...] is guesswork.
I.e. getting from aaa bbb ccc .... to aaa [bbb ccc] .... is rather difficult, unless you know what aaa means or you are told from outside that bbb ccc is a single name.
Quite a challenge.
Cheers
Andi
|
|
|
|
 |
|
 |
That should result in
[aaa bbb ccc ].[ ddd ].[ eee fff]
|
|
|
|
 |
|
 |
The line
aaa bbb ccc.ddd eee ...
could be
ALTER TABLE dbo.tVersion ADD ...
which in your approach would result in
[ALTER TABLE dbo].[tVersion ADD] ...
Forget about spaces or get as input the individual names (db name, table name, etc.) or make a parser that detects all language constructs and their db, table, etc. positions...
I still think it's not worth the effort with names that contain spaces - too fragile.
Cheers
Andi
|
|
|
|
 |
|
 |
No, the string contains only the database, schema, and table name separated by periods as per the original post.
|
|
|
|
 |
|
 |
I was confused since I understood (say: assumed...) that you have an SQL script that you want to patch... Never assume anything.
In that case your initial regex is probably the simplest solution.
Cheers
Andi
|
|
|
|
 |
|
 |
Wait till Smitha tackles that post!
|
|
|
|
 |
|
|
 |
|
 |
You might consider making it more readable, e.g.:
string b = @"25[0-5]|2[0-4]\d|1?\d?\d";
string n = @"(?:"+b+@")";
string w = @"(?:\*|"+b+@")";
string d = @"\.";
string ip = n+d+w+d+w+d+w;
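For what it's worth, here is a small sketch of how the composed pattern could be used. The class name, the sample addresses, and the ^...$ anchoring are assumptions on my side; the building blocks are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class IpPattern
{
    const string B = @"25[0-5]|2[0-4]\d|1?\d?\d"; // one octet, 0..255
    const string N = "(?:" + B + ")";             // plain octet
    const string W = @"(?:\*|" + B + ")";         // octet or * wildcard
    const string D = @"\.";                       // literal dot
    const string Ip = N + D + W + D + W + D + W;

    // Anchored with ^...$ (an addition here) so that the whole string
    // has to be an address, not just a part of it.
    public static bool IsMatch(string s)
    {
        return Regex.IsMatch(s, "^(?:" + Ip + ")$");
    }

    static void Main()
    {
        string[] samples = { "192.168.*.1", "10.0.0.255", "256.1.1.1", "*.1.1.1" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: {1}", s, IsMatch(s));
        }
    }
}
```

Note that only the last three octets accept the * wildcard, since the first one is built from n rather than w.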
Cheers
Andi
|
|
|
|
 |
|
 |
Is this solved?
If not, and if you are talking about C# Regex, then the following will do:
string p = @"\\begin\{verbatimtab\}[\s\S]*?\\end\{verbatimtab\}";
Console.WriteLine("{0}", Regex.Match(str1, p).Success);
The trick:
- use [\s\S] to match any character including newlines, independent of whether RegexOptions.Singleline is set
- use the lazy match *? to allow multiple such groups to be matched individually
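Both points can be checked with a short sketch like this one. The class name and the sample text are invented for illustration; the first pattern is the one from this post.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimBlocks
{
    // Counts the \begin{verbatimtab}...\end{verbatimtab} blocks,
    // using the given expression for the block body.
    public static int Count(string text, string body)
    {
        string p = @"\\begin\{verbatimtab\}" + body + @"\\end\{verbatimtab\}";
        return Regex.Matches(text, p).Count;
    }

    static void Main()
    {
        // Hypothetical sample with two blocks that each span several lines.
        string text = "\\begin{verbatimtab}\nline 1\n\\end{verbatimtab}\n"
                    + "some text\n"
                    + "\\begin{verbatimtab}\nline 2\n\\end{verbatimtab}";

        // [\s\S]*? crosses line breaks and stops at the nearest \end.
        Console.WriteLine(Count(text, @"[\s\S]*?"));
        // .*? without RegexOptions.Singleline cannot cross the line breaks.
        Console.WriteLine(Count(text, @".*?"));
    }
}
```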
Cheers
Andi
modified 7 Apr '12.
|
|
|
|
 |
|
|
 |
|
 |
See my explanation below (I know, this is a very old topic, but I see it was not solved in this thread, so I added my lengthy explanation below).
The Regex matches spaces where the prefix expression ((?<=...)) matches.
That is far too complicated for cases where one wants to split a string into parts separated by spaces, ignoring spaces within "...".
My preferred solution is to use a positive match criterion (as described in the sentence above):
string pattern = @"\s*(""[^""]*""|\S+)\s*";
var fields = Regex.Matches(input, pattern).Cast<Match>().Select(m=>m.Groups[1].Value);
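A self-contained version to try it out; the wrapper class and the sample input are assumptions, the pattern and the LINQ chain are the ones above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedFields
{
    // Returns the fields of the input: either a "..." string
    // (quotes included) or a run of non-space characters.
    public static string[] Split(string input)
    {
        string pattern = @"\s*(""[^""]*""|\S+)\s*";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    static void Main()
    {
        // Hypothetical sample input.
        foreach (string field in Split("one \"two three\" four"))
        {
            Console.WriteLine("<{0}>", field);
        }
    }
}
```

The quoted field comes back with its quotes; if you don't want them, something like field.Trim('"') afterwards should do.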
Cheers
Andi
|
|
|
|
 |
|
 |
See The 30 Minute Regex Tutorial and search for all occurrences of (?<= in that article. This explains the meaning of (?<=...).
You always have to distinguish between the way you enter a pattern in C# and the pattern the Regex sees:
| C# @"..." pattern: | @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) " |
| effective Regex pattern (here delimited by /.../): | /(?<=^(?:[^"]*"[^"]*")*[^"]*) / |
I'm now only talking in Regex domain (the 2nd row), not how it is entered in the C# string.
Let's start with the inner most part and work outwards:
1. ..."[^"]*"...: "..."
2. ...[^"]*"[^"]*"...: any number of non-"-chars, followed by "..." from 1. above
3. ...(?:[^"]*"[^"]*")*...: any repetition of the group described in 2. above
4. ...^(?:...)*...: 3. above must match from the beginning of the text
5. ...^(?:...)*[^"]*...: 4. above, followed by any number of non-"-chars
6. (?<=...) : match a space that is preceded by the expression from 5. above; the (?<=...) is not part of the match
The Regex searches for the space character and checks if the data before that space matches the prefix expression. If yes, the match is successful, otherwise, the Regex searches for the next space and checks again, etc.
The given Regex and the given data match only on one space, the one after all. The underlined part matches with all: (?<=^(?:[^"]*"[^"]*")*[^"]*) .
I.e. the regex splits the given data by spaces, respecting spaces within "..." strings as non-separators.
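To see that behavior in action, here is a minimal sketch using Regex.Split with the lookbehind pattern. The original data is not part of this post, so the sample input and the class name are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindSplit
{
    // Splits at every space that is preceded by an even number of
    // quote characters, i.e. at spaces outside of "..." strings.
    public static string[] Split(string input)
    {
        string pattern = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";
        return Regex.Split(input, pattern);
    }

    static void Main()
    {
        // Hypothetical sample input.
        foreach (string part in Split("say \"hello world\" to all"))
        {
            Console.WriteLine("<{0}>", part);
        }
    }
}
```

The space inside "hello world" has exactly one quote before it, so the prefix expression cannot consume everything up to that space and no split happens there.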
Very complicated, though. I would do this differently, namely in positive terms (what you want to be part of the fields rather than what splits them):
string pattern = @"\s*(""[^""]*""|\S+)\s*";
string[] split = Regex.Matches(input, pattern).Cast<Match>().Select(m=>m.Groups[1].Value).ToArray();
Cheers
Andi
modified 8 Apr '12.