Non-overlapping text search
5 posts
• Page 1 of 1
Non-overlapping text search
The result of Find function ⍷ can be over overlapped.
Function textSearch implementes :While loop to remove overlapping.
Is there faster way to remove overlapping (except to use regular expressions)?
Function textSearch implementes :While loop to remove overlapping.
- Code: Select all
∇b←srchfor textSearch srchin;size;b1;ix;⎕ML
⎕ML←0
b←srchfor⍷srchin
size←⊃⍴,srchfor
:While ∨/b1←(1↓ix)<size+¯1↓ix←{⍵/⍳⍴⍵}b ⍝ Check overlapping.
b[⊃b1/1↓ix]←0
:EndWhile
∇
- Code: Select all
'aba' textSearchBool'babababababa'
0 1 0 0 0 1 0 0 0 1 0 0
Is there faster way to remove overlapping (except to use regular expressions)?
- mvranic
- Posts: 2
- Joined: Fri Jan 22, 2010 9:37 am
Re: Non-overlapping text search
I just pulled this subfn out of my find/replace function:
I knew it worked but I've never done any speed testing on it. Nor can I remember the sequence of steps I went through to develop it. I guess it's readonly code 'though some insight can probably be gained by breaking it inside all the braces.
cf runs the two fns against the same arguments multiple times up to one second and compares (divides) their times. (fi ≡ textSearch) is a fork that runs both fns and matches their results.
fi←{(⍴,⍺){⍵\⍺{⍵∊⊃⍺{⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵}/⌽(-⍺),⍵}⍵/⍳⍴⍵}⍺⍷⍵}
I knew it worked but I've never done any speed testing on it. Nor can I remember the sequence of steps I went through to develop it. I guess it's readonly code 'though some insight can probably be gained by breaking it inside all the braces.
⍝ (or perhaps not.)
fi←{
(⍴,⍺){
⍵\⍺{
⍵∊⊃⍺{
⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵
}/⌽(-⍺),⍵
}⍵/⍳⍴⍵
}⍺⍷⍵
}
)copy dfns cf
C:\Program Files\Dyalog\Dyalog APL-64 14.0 Uni...
cf runs the two fns against the same arguments multiple times up to one second and compares (divides) their times. (fi ≡ textSearch) is a fork that runs both fns and matches their results.
'aba'(fi ≡ textSearch) 'babababababa'
1
'aba'(fi cf textSearch) 'babababababa'
0.94419134396355
'aba'(fi ≡ textSearch) 1e3 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e3 ⍴ 'babababababa'
0.66764705882353
'aba'(fi ≡ textSearch) 1e4 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e4 ⍴ 'babababababa'
0.37480559875583
'aba'(fi ≡ textSearch) 1e5 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e5 ⍴ 'babababababa'
0.18665105386417
-
Phil Last - Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
Re: Non-overlapping text search
p.s. the structure in the middle:
is a for loop.
{
...
}/⌽(-⍺),⍵
is a for loop.
-
Phil Last - Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
Re: Non-overlapping text search
A general solution to the problem requires solving a transitive closure problem. See the ^: entry in the J dictionary http://www.jsoftware.com/help/dictionary/d202n.htm and section 1.2 of the APL87 paper Some Uses of { and } http://www.jsoftware.com/papers/from.htm . The details are left as an exercise for the reader for now. :-)
- Roger|Dyalog
- Posts: 238
- Joined: Thu Jul 28, 2011 10:53 am
Re: Non-overlapping text search
I forgot that I’d written a J Wiki essay on this topic: http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings. I’ll provide a translation of it into Dyalog APL. Watch this space!
- Roger|Dyalog
- Posts: 238
- Joined: Thu Jul 28, 2011 10:53 am
5 posts
• Page 1 of 1
Who is online
Users browsing this forum: No registered users and 1 guest
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group