Search

Advertisement #1

University Search


Online universities directory by state, name and specialization.

Advertisement #2

University Search


Online universities directory by state, name and specialization.

Advertisement #3

Sales Courses

http://www.salestraining.co.in

Salestraining offer sales training courses and sales training programs throughout India.

RSS Feed

Sean's Blog

Account Login



Performance of Finding Duplicates in LINQ PDF Print E-mail
Blog - Review

Today I saw about a dozen lines of code in a project that I was working on simply dedicated to determining whether or not there are duplicates within a stringlist (List). This made me start to question whether or not there was a better way of doing this, to which the little light bulb in my head went off and directed me towards LINQ. A quick search on the web for "Finding duplicates with LINQ" landed me at this chunk of code:

myList.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key).ToList()

This works beautifully and returns the non-unique items in myList. This not only reduces the line count in applications needing to do similar operations the harder way, but to me it is much more readable. These days, with the complex applications that are getting built, readability is a major importance in code.

However, even though this is the new .kewl. way for me to identify non-unique elements in lists, I still question whether or not it is the .efficient. way to do it. The first thing I looked into is what the code looks like after it is compiled. This is what I found:

private static List<string> LinqNonUnique(List<string> items) 
{ 
    return items.GroupBy<string, string>(delegate (string i) { 
        return i; 
    }).Where<IGrouping<string, string>>(delegate (IGrouping<string, string> g) { 
        return (g.Count<string>() > 1); 
    }).Select<IGrouping<string, string>, string>(delegate (IGrouping<string, string> g) { 
        return g.Key; 
    }).ToList<string>(); 
}

This makes sense given that what I know about LINQ is that the expressions mainly get turned into inline delegates. So, what does the logic within LINQ.s .GroupBy and .Where look like . back to reflector I go.

// GroupBy() Method 
public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector) 
{ 
    return new GroupedEnumerable<TSource, TKey, TSource>(source, keySelector, IdentityFunction<TSource>.Instance, null); 
} 
// GroupedEnumerable Class Definition 
internal class GroupedEnumerable<TSource, TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>, IEnumerable 
{ 
    // Fields 
    private IEqualityComparer<TKey> comparer; 
    private Func<TSource, TElement> elementSelector; 
    private Func<TSource, TKey> keySelector; 
    private IEnumerable<TSource> source; 
    // Methods 
    public GroupedEnumerable(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector, IEqualityComparer<TKey> comparer); 
    public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator(); 
    IEnumerator IEnumerable.GetEnumerator(); 
} 
// GroupedEnumerable CTOR 
public GroupedEnumerable(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector, IEqualityComparer<TKey> comparer) 
{ 
    if (source == null) 
    { 
        throw Error.ArgumentNull("source"); 
    } 
    if (keySelector == null) 
    { 
        throw Error.ArgumentNull("keySelector"); 
    } 
    if (elementSelector == null) 
    { 
        throw Error.ArgumentNull("elementSelector"); 
    } 
    this.source = source; 
    this.keySelector = keySelector; 
    this.elementSelector = elementSelector; 
    this.comparer = comparer; 
} 
// GroupedEnumerable Enumerator 
public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator() 
{ 
    return Lookup<TKey, TElement>.Create<TSource>(this.source, this.keySelector, this.elementSelector, this.comparer).GetEnumerator(); 
}

So, it seems that the actual groupping operations are getting handed off to the enumerator to do the work with a comparison. This seems pretty good but I.m not really sold yet. I want to know just how much it power it takes to perform the .GroupBy and .Where operations.

private static List<string> LinqNonUnique(List<string> items) 
{ 
    return items.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key).ToList(); 
} 
private static List<string> ForeachNonUnique(List<string> items) 
{ 
    List<string> retItems = new List<string>(); 
    for (int i = 0; i < items.Count; i++) 
    { 
        if (items.LastIndexOf(items[i]) != i) 
        { 
            retItems.Add(items[i]); 
        } 
    } 
    return retItems; 
}

I profiled the above two methods and found that using the foreach is about twice as slow as the linq method. I.m sure there are better ways of performing a uniqueness operation but I am basing this on what I most commonly see in code.

LinqUniqueness

All-in-all I feel more confident in LINQ after investigating this.

Last Updated on Friday, 16 October 2009 08:48
 

Comments 

 
0 #1 SHOX R2 2010-07-30 05:07 67.jours avant la soumission de votre manuscrit, révisée en finale que vous attendez trop SHOX RIVALRY longtemps! sont une LUNETTES CARRERA copie du projet, veuillez ajouter un examen, correct, et nous me rembourserLe 23 janvier nuit peut rendre un tel NIKE SKYLINE jugement, la théorie de la "nouvelle * écrit et l'écriture de Mao Zedong, Zhou, Yang dans le moyen moins joué le rôle de la NIKE TN naissance. Toutefois, dans cet article, les documents importants, Zhou Yang comme les cultures de leadership important Quote
 

Add comment


Security code
Refresh