The Meaning of Unique
10 posts
The Meaning of Unique
This might be a post more for English language mavens than APL language mavens...
So in V18, we can easily flag the first occurrence of every item in a vector using the "unique mask" function:
a←1 4 1 1 2 4 3
≠a
1 1 0 0 1 0 1
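For readers who don't use APL, the first-occurrence flag (Ken Iverson's "nub sieve") can be sketched in Python as below. This is an illustrative translation, not Dyalog's implementation. `nub_sieve` is a hypothetical name, and the sketch assumes hashable items compared with exact equality (APL compares floating-point data tolerantly).

```python
def nub_sieve(items):
    """Return 1 where an item occurs for the first time, else 0
    (a Python analogue of monadic ≠ applied to a vector)."""
    seen = set()
    mask = []
    for x in items:
        mask.append(0 if x in seen else 1)  # 1 only the first time x is met
        seen.add(x)
    return mask

a = [1, 4, 1, 1, 2, 4, 3]
print(nub_sieve(a))  # [1, 1, 0, 0, 1, 0, 1]
```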
All well and good. However, I also want to flag the unique items of the vector. In the above case, 1 and 4 are not unique, as they obviously occur multiple times. So I can do:
{~⍵∊⍵/⍨~≠⍵}a
0 0 0 0 1 0 1
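The dfn reads as "not a member of the repeated items". A Python sketch that follows the same steps, assuming hashable items; `nub_sieve` and `unique_mask2` are hypothetical names:

```python
def nub_sieve(items):
    # 1 for the first occurrence of each item, 0 for later ones (like ≠)
    seen, mask = set(), []
    for x in items:
        mask.append(0 if x in seen else 1)
        seen.add(x)
    return mask

def unique_mask2(items):
    """1 where an item occurs exactly once, mirroring {~⍵∊⍵/⍨~≠⍵}."""
    # ⍵/⍨~≠⍵ : every occurrence after the first, i.e. the repeated values
    repeats = {x for x, first in zip(items, nub_sieve(items)) if not first}
    # ~⍵∊repeats : flag the items that never occur as a repeat
    return [0 if x in repeats else 1 for x in items]

print(unique_mask2([1, 4, 1, 1, 2, 4, 3]))  # [0, 0, 0, 0, 1, 0, 1]
```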
Let's define these as:
UniqueMask1←≠
UniqueMask2←{~⍵∊⍵/⍨~≠⍵}
(Bonus points: is there a better expression for UniqueMask2? I didn't even attempt the key operator, as I assume it would be longer. There is {1=+/⍵∘.=⍵}, but it is of course slow and prone to WS FULL.)
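The cost difference can be seen in a Python sketch contrasting a single counting pass (a swapped-in technique in the spirit of counting per key, not a translation of any particular ⌸ expression) with a literal analogue of {1=+/⍵∘.=⍵}, which builds an n-by-n equality table. Both function names are hypothetical.

```python
from collections import Counter

def unique_mask_counting(items):
    """1 where an item occurs exactly once: one counting pass,
    linear time, memory proportional to the number of distinct items."""
    counts = Counter(items)
    return [1 if counts[x] == 1 else 0 for x in items]

def unique_mask_outer(items):
    """Literal analogue of {1=+/⍵∘.=⍵}: the n-by-n equality table makes
    time and memory grow quadratically with the length of the input."""
    table = [[1 if x == y else 0 for y in items] for x in items]
    return [1 if sum(row) == 1 else 0 for row in table]

a = [1, 4, 1, 1, 2, 4, 3]
print(unique_mask_counting(a))  # [0, 0, 0, 0, 1, 0, 1]
print(unique_mask_outer(a))     # [0, 0, 0, 0, 1, 0, 1]
```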
The unique (and unique mask) functions in APL do not return the unique values of the array, but rather an array whose values are unique. I guess the mathematicians might say it returns a set from a bag.
I have a reporting requirement where both concepts (and their complements) are required. The term "unique" could be applied to either concept, and then a name or phrase must be coined for the other concept. Or two new names entirely.
The context is a database table. Given one or more key columns, show me the unique rows (or non-unique rows). Show me the rows that occur only once (or show me the rows that occur multiple times).
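The four reports can be sketched in Python, assuming rows are tuples and the key is a tuple of column positions. `split_rows`, the sample rows, and the four result labels are hypothetical placeholders, not a recommendation for the terminology question.

```python
from collections import Counter

def split_rows(rows, key_cols):
    """Partition rows by key in two ways: first vs subsequent occurrence
    (UniqueMask1 and its complement) and occurs-once vs repeated
    (UniqueMask2 and its complement)."""
    keys = [tuple(row[i] for i in key_cols) for row in rows]
    counts = Counter(keys)
    seen = set()
    result = {"first": [], "subsequent": [], "once": [], "repeated": []}
    for row, k in zip(rows, keys):
        result["subsequent" if k in seen else "first"].append(row)
        seen.add(k)
        result["once" if counts[k] == 1 else "repeated"].append(row)
    return result

# Hypothetical table: (customer, product, qty), keyed on the first two columns
rows = [("acme", "bolts", 5), ("acme", "nuts", 2),
        ("acme", "bolts", 7), ("zeta", "nuts", 1)]
for name, part in split_rows(rows, key_cols=(0, 1)).items():
    print(name, part)
```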
My feeling is to reserve the term unique for UniqueMask2:
UniqueMask1: ? (?)
UniqueMask2: Unique Rows (Non-Unique Rows)
(Or "duplicate" for non-unique)
But this leaves the question of what to call UniqueMask1. I don't think the word "distinct" helps, as I think it is synonymous with "unique" in this case. Everything gets a bit wordy. The best I can do is:
UniqueMask1: First Occurrence Rows (Subsequent Occurrence Rows)
Any thoughts on the terminology? The UI will have to have something that explains things in more detail, but the short names for all the results should not be confusing.
- paulmansour
- Posts: 420
- Joined: Fri Oct 03, 2008 4:14 pm
Re: The Meaning of Unique
Hey Paul
When I first encountered your "unique" function I was confused, both because I assumed it was in fact "nub sieve" (Ken's term) for (≠), and because Dyalog had used the word "unique" for "nub" (∪) rather than for your "which items are unique", which is a much more precise and literal concept of "unique". So my suggestion for "UniqueMask1" would be to revert to Ken's term "nubsieve".
Phil
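The relationship between nub (∪) and nub sieve (≠) can be illustrated with a short Python sketch, assuming hashable items: compressing a vector by its nub sieve yields its nub, which is why `∪⍵` and `(≠⍵)/⍵` agree for a vector. `nub` and `nub_sieve` are hypothetical helper names.

```python
def nub_sieve(items):
    # 1 for the first occurrence of each item, 0 for later ones (like ≠)
    seen, mask = set(), []
    for x in items:
        mask.append(0 if x in seen else 1)
        seen.add(x)
    return mask

def nub(items):
    """Distinct items in order of first appearance (like monadic ∪),
    obtained by compressing the items with their nub sieve."""
    return [x for x, keep in zip(items, nub_sieve(items)) if keep]

a = [1, 4, 1, 1, 2, 4, 3]
print(nub(a))  # [1, 4, 2, 3]
```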
- Phil Last
- Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
Re: The Meaning of Unique
And for UniqueMask2:
paul←{~⍵∊⍵/⍨~≠⍵} ⍝ not a member of duplicates
phil←(≠∧⌽∘≠∘⌽) ⍝ first occurrence is the only one
⊢z←?10⍴10
9 9 7 5 5 0 6 9 1 5
(paul,[-.1]phil)z
0 0 1 0 0 1 1 0 1 0
0 0 1 0 0 1 1 0 1 0
]runtime '{~⍵∊⍵/⍨~≠⍵} z' '(≠∧⌽∘≠∘⌽) z' -c -r=1s
{~⍵∊⍵/⍨~≠⍵} z → 6.4E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(≠∧⌽∘≠∘⌽) z → 6.4E¯7 | +1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
No apparent improvement. But providing the name of the already tokenized train makes all the difference:
]runtime 'paul z' 'phil z' -c -r=1s
paul z → 6.7E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil z → 4.1E¯7 | -40% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
A dilemma for ]RUNTIME; not for Paul.
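The train works because an item occurs exactly once precisely when its first occurrence is also its last occurrence. A Python sketch of that idea, assuming hashable items (`nub_sieve` and `occurs_once` are hypothetical names):

```python
def nub_sieve(items):
    # 1 for the first occurrence of each item, 0 for later ones (like ≠)
    seen, mask = set(), []
    for x in items:
        mask.append(0 if x in seen else 1)
        seen.add(x)
    return mask

def occurs_once(items):
    """Mirror of the train (≠∧⌽∘≠∘⌽): first occurrence AND last occurrence."""
    first = nub_sieve(items)
    last = nub_sieve(items[::-1])[::-1]  # ⌽∘≠∘⌽ : reverse, sieve, reverse back
    return [f & l for f, l in zip(first, last)]

z = [9, 9, 7, 5, 5, 0, 6, 9, 1, 5]
print(occurs_once(z))  # [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
```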
- Phil Last
- Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
Re: The Meaning of Unique
Funny, two days ago I played around with the same problem.
And I ended up with a train similar to Paul's:
⍝
vmj←(~⊢∊⊢(/⍨)~⍤≠)
When testing I got
⍝
z←100(?⍴)100
]runtime -c "phil z" "paul z" "vmj z" "phil2 z"
phil z → 1.2E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
paul z → 2.2E¯6 | +82% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
vmj z → 1.9E¯6 | +54% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 1.0E¯6 | -15% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
My problem was to handle nested vectors, so
⍝
z2←(,∘.,⍨⎕a)[(26×26)(?⍴)26×26]
]runtime -c "phil z2" "paul z2" "vmj z2" "phil2 z2"
phil z2 → 9.3E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
paul z2 → 1.2E¯4 | +29% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
vmj z2 → 1.2E¯4 | +24% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z2 → 9.2E¯5 | -2% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
Just a teaser: what is the difference between phil and phil2? :)
-Veli-Matti
- Veli-Matti
- Posts: 93
- Joined: Sat Nov 28, 2009 3:12 pm
Re: The Meaning of Unique
phil2 is the way I would write it sans the railroad?
{(≠⍵)∧⌽≠⌽⍵}
- paulmansour
- Posts: 420
- Joined: Fri Oct 03, 2008 4:14 pm
Re: The Meaning of Unique
No, you stumped me there. Playing around with atop (f⍤g) (or (f g)) instead of compose (f∘g), and adding parentheses to reorder, all made insignificant differences, which until this experience I would have taken more seriously:
]runtime 'phil z2' 'phil z2' 'phil z2' -c
phil z2 → 5.6E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil z2 → 5.8E¯5 | +4% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil z2 → 5.8E¯5 | +3% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
]runtime 'phil z2' 'phil z2' 'phil z2' -c
phil z2 → 5.7E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil z2 → 5.5E¯5 | -5% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil z2 → 5.7E¯5 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
I rather think ]RUNTIME might need some attention, or a warning:
"Please ignore differences less than 10%"
- Phil Last
- Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
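The point about timing the same expression repeatedly can be reproduced outside APL. The Python sketch below uses the standard `timeit` module (not `]RUNTIME`) to time one identical call several times and report each run relative to the first; since the code never changes, any spread is measurement noise. `same_call_spread` and the other names are hypothetical, and the 10% threshold is the rule of thumb quoted above, not something the code enforces.

```python
import random
import timeit

def nub_sieve(items):
    # 1 for the first occurrence of each item, 0 for later ones (like ≠)
    seen, mask = set(), []
    for x in items:
        mask.append(0 if x in seen else 1)
        seen.add(x)
    return mask

def occurs_once(items):
    # first occurrence AND last occurrence, as in (≠∧⌽∘≠∘⌽)
    last = nub_sieve(items[::-1])[::-1]
    return [f & l for f, l in zip(nub_sieve(items), last)]

def same_call_spread(func, data, runs=3, number=200):
    """Time the identical call `runs` times and return each run as a
    whole-number percentage relative to the first run."""
    times = [timeit.timeit(lambda: func(data), number=number) for _ in range(runs)]
    return [round(100 * (t / times[0] - 1)) for t in times]

data = random.choices(range(100), k=100)  # roughly like z←100(?⍴)100
print(same_call_spread(occurs_once, data))  # first entry is always 0; the rest vary
```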
Re: The Meaning of Unique
I somehow missed Paul's post:
paulmansour wrote:
phil2 is the way I would write it sans the railroad?
{(≠⍵)∧⌽≠⌽⍵}
Not on my machine in Dyalog 18.0:
]runtime 'phil z' 'phil2 z' -c
phil z → 6.5E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 7.5E¯7 | +14% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
]runtime 'phil z' 'phil2 z' -c
phil z → 6.4E¯7 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 7.8E¯7 | +22% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
- Phil Last
- Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex
Re: The Meaning of Unique
On my machine:
phil←(≠∧⌽∘≠∘⌽)
phil2←{(≠⍵)∧⌽≠⌽⍵}
z←?100000⍴99000
+/≠z
63028
cmpx 'phil z' 'phil2 z'
phil z → 2.5E¯4 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 2.4E¯4 | -6% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx 'phil z' 'phil2 z'
phil z → 2.6E¯4 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 2.3E¯4 | -10% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil←(≠∧⌽∘≠∘⌽)
]runtime -c 'phil z' 'phil2 z'
phil z → 2.7E¯4 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
phil2 z → 2.4E¯4 | -12% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
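Since `+/≠z` counts the distinct values in `z`, a quick sanity check on the 63028 above is the expected number of distinct values among n independent, uniformly distributed draws from m possible values (assuming that is what `?100000⍴99000` produces):

```latex
\begin{aligned}
\mathbb{E}[\text{distinct}] &= m\left(1-\left(1-\tfrac{1}{m}\right)^{n}\right), \qquad n = 100000,\; m = 99000 \\
\left(1-\tfrac{1}{99000}\right)^{100000} &\approx e^{-100000/99000} \approx e^{-1.0101} \approx 0.36418 \\
\mathbb{E}[\text{distinct}] &\approx 99000 \times (1 - 0.36418) = 99000 \times 0.63582 \approx 62946
\end{aligned}
```

So a count in the neighbourhood of 63000 is what this data would be expected to give.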
- paulmansour
- Posts: 420
- Joined: Fri Oct 03, 2008 4:14 pm
Re: The Meaning of Unique
Forgot to say, thanks for the function, Phil. In addition to being much faster, it's a much nicer expression.
- paulmansour
- Posts: 420
- Joined: Fri Oct 03, 2008 4:14 pm
Re: The Meaning of Unique
I should probably upgrade to 8.1 or 8.2
By-the-way I asked a layperson a few questions:
Q: What would be the unique list from the list 1 4 1 1 2 4 3?
A: That's not a list it's a row.
Q: What would make it a list?
A: Commas between the items.
Q: What would be the unique list from the list 1, 4, 1, 1, 2, 4, 3?
A: 2, 3.
Q: What would be the nub of the list 1, 4, 1, 1, 2, 4, 3?
A: Errr...
Q: Relative to the list 1, 4, 1, 1, 2, 4, 3 what would you call the list 1, 4, 2, 3?
A: The core.
- Phil Last
- Posts: 628
- Joined: Thu Jun 18, 2009 6:29 pm
- Location: Wessex