Tuesday, March 1, 2011

How do you identify duplicate elements in an XPath 2.0 sequence?

I have an XPath expression which provides me a sequence of values like the one below:

1 2 2 3 4 5 5 6 7

It is easy to convert this to a set of unique values, "1 2 3 4 5 6 7", using the distinct-values function. However, what I want to extract is the list of duplicate values, "2 5". I can't think of an easy way to do this. Can anyone help?

From stackoverflow
  • Calculate the difference between your original sequence and the set of distinct values, that is, remove one occurrence of each distinct value. What remains is the values that occur more than once. Note that the values in this result are not necessarily distinct: a value that occurs more than twice in the original sequence is left over more than once, so apply distinct-values again if you need each duplicate listed only once.

  • Yes, but the problem is: how do I calculate the difference between two sequences? You can compare sequences using the union / intersect / except keywords, but none of these will provide the 'difference' between the two sets of values.

    DaveP : http://www.dpawson.co.uk/xsl/sect2/muench.html#d10875e108 shows the set difference techniques.
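
The "difference" the first answer describes is a multiset difference: one occurrence of each distinct value is removed and whatever is left over is a duplicate. XPath 2.0's union / intersect / except operators work on nodes rather than atomic values, which is why they do not give this result directly. As a rough illustration of the idea only, here is a minimal Python sketch; the function names `multiset_difference` and `duplicates` are made up for this example and are not part of XPath.

```python
def multiset_difference(seq):
    """Remove one occurrence of each distinct value; keep the rest in order."""
    seen = set()
    leftover = []
    for item in seq:
        if item in seen:
            leftover.append(item)  # second or later occurrence of this value
        else:
            seen.add(item)         # first occurrence, absorbed by the distinct set
    return leftover


def duplicates(seq):
    """Distinct values that occur more than once, in order of first repeat."""
    # dict.fromkeys drops repeats while preserving insertion order
    return list(dict.fromkeys(multiset_difference(seq)))


values = [1, 2, 2, 3, 4, 5, 5, 5, 6, 7]
print(multiset_difference(values))  # [2, 5, 5]
print(duplicates(values))           # [2, 5]
```

With a value that occurs three times (5 here), the leftover list still contains a repeat, which is the case the answer warns about; the second pass plays the role of the final distinct-values call.
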
  • What about:

    distinct-values(
      for $item in $seq
      return if (count($seq[. eq $item]) > 1)
             then $item
             else ())
    

    This iterates through the items in the sequence, and returns the item if the number of items in the sequence that are equal to that item is greater than one. You then have to use distinct-values() to remove the duplicates from that list.

    Dimitre Novatchev : Hi Jeni, Seems there is a simpler solution :) $vSeq[index-of($vSeq,.)[2]] Cheers, Dimitre
  • What about xsl? Is it applicable to your request?

     <xsl:for-each select="/r/a">
      <xsl:variable name="cur" select="." />
      <xsl:if test="count(./preceding-sibling::a[. = $cur]) > 0 and count(./following-sibling::a[. = $cur]) = 0">
       <xsl:value-of select="." />
      </xsl:if>
     </xsl:for-each>
    
  • Thanks to all who answered. JeniT gave just the kind of solution I was looking for. Thanks!

    Dimitre Novatchev : What about the one-line solution I posted two days ago? Seems you do not log in too frequently. Hint: At least you could accept one of the answers
    Dimitre Novatchev : Hint: You can *accept* one of the proposed solutions.
  • Given the following xml:

    <a>
        <b>1</b>
        <b>2</b>
        <b>2</b>
        <b>3</b>
        <b>4</b>
        <b>5</b>
        <b>5</b>
        <b>5</b>
        <b>6</b>
        <b>7</b>
    </a>
    

    The following XPath will give you a list of repeating values (in this case 2, 5, 5):

    /a/b[.=following-sibling::b]
    

    However if you wanted a distinct list of repeating values (in this case 2, 5) then the following XPath should do the business for you:

    /a/b[.=following-sibling::b][not(.=preceding-sibling::b)]
    
    Wilfred Knievel : How this works: the first predicate selects the nodes that have an equal following sibling (2, 5, 5). It's worth noting that these are the nodes from the original list, not copies of their values. The second predicate then works in the opposite direction along the main list and keeps only the nodes with no equal preceding sibling, so each repeated value is returned once.
    Dimitre Novatchev : This question was asked for a *sequence* of items, not for a node-set. Your solution, on the other hand, works for node-sets only. Also, it is not too efficient. As a first step, using /a/b[.=following-sibling::b][1] may be more efficient. Cheers
  • Use this simple XPath 2.0 expression:

          $vSeq[index-of($vSeq,.)[2]]

    where $vSeq is the sequence of values in which we want to find the duplicates.

    For explanation of how this "works", see:

          http://dnovatchev.spaces.live.com/Blog/cns!44B0A32C2CCF7488!904.entry

    Cheers,

    Dimitre Novatchev

    JeniT : Very nice. I constantly overlook the index-of() function.
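
For readers who cannot reach the linked explanation, the expression relies on two XPath 2.0 behaviours: index-of($vSeq, .) returns the 1-based positions of every item equal to the current one, and a numeric predicate keeps an item only when the number equals that item's own position. Taking [2] of the positions therefore keeps exactly the second occurrence of each repeated value. Below is a small Python model of that evaluation, as a sketch only; `index_of` and `duplicates_by_index_of` are illustrative names, not a real XPath API.

```python
def index_of(seq, target):
    """Model of XPath 2.0 index-of(): 1-based positions of items equal to target."""
    return [pos for pos, item in enumerate(seq, start=1) if item == target]


def duplicates_by_index_of(seq):
    """Model of $vSeq[index-of($vSeq, .)[2]]."""
    result = []
    for pos, item in enumerate(seq, start=1):
        positions = index_of(seq, item)
        # [2] picks the second position, or nothing if the value occurs only once
        second = positions[1] if len(positions) > 1 else None
        # a numeric predicate keeps the item only when it equals the item's position
        if second == pos:
            result.append(item)
    return result


print(duplicates_by_index_of([1, 2, 2, 3, 4, 5, 5, 5, 6, 7]))  # [2, 5]
```

Because only the second occurrence of each value passes the test, the result is already free of repeats and no extra distinct-values() call is needed, even when a value occurs three or more times.
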
