Google SAS Search

Add to Google

Thursday, November 10, 2005

De-dupe In Excel

A lot of SAS programmers have to deliver data in Excel every once in a while. Sometimes, after you've gotten the data into Excel you find that you need to get rid of duplicates. Here's how to do it in Excel:

1) Be sure your columns are named.

2) Highlight the columns you want to use as your sort key (the ones you would use in your BY statement for PROC SORT).

3) From the drop-down menu go to Data->Filter->Advanced Filter.

4) Excel will automatically select the range you have highlighted. You should see a little checkbox that says "Unique Records Only." Check that.

5) Hit OK.

7 comments:

  1. Thanks for the tip! I had to do an additional step to make it useful for me. After doing the above steps the duplicates were still in the list but they were hidden. To get a list with the duplicates actually removed, I selected the Copy to another location option under the advanced filter dialog.

    ReplyDelete
  2. This is even simpler with Excel 2007. It is a one-click built in option.

    Now, I just need to figure out a way to actually delete both (all records) of my duplicates. Meaning, how would I go about removing all records of a duplicate item, both original and duplicate.

    ReplyDelete
  3. Ok, so I found a way to do this (in Excel 2007)

    Step 1: Run a conditional formatting rule and change all duplicates to a certain cell color.

    Step 2: Sort your list by cell color

    Step 3: Delete all rows with the cell color

    Perfect and easy to do!

    ReplyDelete
  4. sorry if this was already posted, but in the data tab there's an option called "Remove Duplicates". worked for me!

    ReplyDelete
  5. There are many dedupe software available in market to get out of duplicate content. Even though it is done i prefer to use the software which will does it work accurately..

    ReplyDelete
  6. I am doing a DM campaign and have already sent to 100 people. Some of those 100 may be in my 2nd list. How do I identify the dupes and get rid of both of the names. I can put them all in one sheet, but then just the dupe can be removed when I need the dupe and the original to be removed. This way I do not email to people I already emailed. Any ideas?

    ReplyDelete
  7. Just tried the conditional formatting by color and then sorted by color to dedupe. Worked great thanks for the suggestion.

    ReplyDelete