Non-overlapping text search

Learning APL or new to Dyalog? Ask "silly" questions here, without fear...

Non-overlapping text search

Postby mvranic on Fri Feb 13, 2015 4:07 pm

The result of Find function ⍷ can be over overlapped.

Function textSearch implementes :While loop to remove overlapping.
Code: Select all
∇b←srchfor textSearch srchin;size;b1;ix;⎕ML
 ⎕ML←0
 b←srchfor⍷srchin

 size←⊃⍴,srchfor
 :While ∨/b1←(1↓ix)<size+¯1↓ix←{⍵/⍳⍴⍵}b    ⍝ Check overlapping.
    b[⊃b1/1↓ix]←0
 :EndWhile

Code: Select all
'aba'  textSearchBool'babababababa'
0 1 0 0 0 1 0 0 0 1 0 0

Is there faster way to remove overlapping (except to use regular expressions)?
mvranic
 
Posts: 2
Joined: Fri Jan 22, 2010 9:37 am

Re: Non-overlapping text search

Postby Phil Last on Fri Feb 13, 2015 5:09 pm

I just pulled this subfn out of my find/replace function:

      fi←{(⍴,⍺){⍵\⍺{⍵∊⊃⍺{⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵}/⌽(-⍺),⍵}⍵/⍳⍴⍵}⍺⍷⍵}

I knew it worked but I've never done any speed testing on it. Nor can I remember the sequence of steps I went through to develop it. I guess it's readonly code 'though some insight can probably be gained by breaking it inside all the braces.

      ⍝ (or perhaps not.)
fi←{
(⍴,⍺){
⍵\⍺{
⍵∊⊃⍺{
⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵
}/⌽(-⍺),⍵
}⍵/⍳⍴⍵
}⍺⍷⍵
}


      )copy dfns cf
C:\Program Files\Dyalog\Dyalog APL-64 14.0 Uni...

cf runs the two fns against the same arguments multiple times up to one second and compares (divides) their times. (fi ≡ textSearch) is a fork that runs both fns and matches their results.

      'aba'(fi ≡ textSearch) 'babababababa'
1
'aba'(fi cf textSearch) 'babababababa'
0.94419134396355
'aba'(fi ≡ textSearch) 1e3 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e3 ⍴ 'babababababa'
0.66764705882353
'aba'(fi ≡ textSearch) 1e4 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e4 ⍴ 'babababababa'
0.37480559875583
'aba'(fi ≡ textSearch) 1e5 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e5 ⍴ 'babababababa'
0.18665105386417
User avatar
Phil Last
 
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Non-overlapping text search

Postby Phil Last on Fri Feb 13, 2015 5:15 pm

p.s. the structure in the middle:
      {
...
}/⌽(-⍺),⍵

is a for loop.
User avatar
Phil Last
 
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Non-overlapping text search

Postby Roger|Dyalog on Sat Feb 14, 2015 7:03 pm

A general solution to the problem requires solving a transitive closure problem. See the ^: entry in the J dictionary http://www.jsoftware.com/help/dictionary/d202n.htm and section 1.2 of the APL87 paper Some Uses of { and } http://www.jsoftware.com/papers/from.htm . The details are left as an exercise for the reader for now. :-)
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Non-overlapping text search

Postby Roger|Dyalog on Sun Feb 15, 2015 5:59 am

I forgot that I’d written a J Wiki essay on this topic: http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings. I’ll provide a translation of it into Dyalog APL. Watch this space!
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am


Return to New to Dyalog?

Who is online

Users browsing this forum: No registered users and 1 guest