Pages

6/17/2011

Time to compare SAS and R -- Merge Dataset

I have been learning SAS which is quite an interesting language different from most of the languages I used. (Well it is quite similar to SQL syntax).

Today I am thinking whether R can do similar data manipulation as easy as SAS. Of course, the answer is: YES, R can!. I am going to show a simple example to demonstrate this. I will post other stuff while on my way to SAS master!

The example is from "The Little SAS Book":
We want to compare the two datasets and find out which customers did not make a purchase during a certain period. The datasets look like this:


#     ADDR.

101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway

#     Sales

102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30


In SAS, you can easily find out which customers did not make a purchase:


data CustomerData;
input CustomerNumber Address & $30.;
datalines;
101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway
;

data  OrdersData;
input CustomerNumber Total;
datalines;
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
;
proc sort data = OrdersData;
by CustomerNumber;
run;

data MergedData;
merge CustomerData OrdersData (in = ifin);
by CustomerNumber;
if ifin = 0;
run;


ObsCustomerNumberAddressTotal
1103Sports Outfitters 19 Cary Way.
2105Sports Savers 2708 Broadway.

# TODO: Add comment
# 
# Author: evert
###############################################################################


customerData = c(101, 'Murphy\'s Sports 115 Main St.', 
    102, 'Sun N Ski 2106 Newberry Ave.', 
    103, 'Sports Outfitters 19 Cary Way', 
    104, 'Cramer & Johnson 4106 Arlington Blvd.', 
    105, 'Sports Savers 2708 Broadway')
  
customerData = matrix(customerData, ncol = 2, byrow = TRUE)
customerData = data.frame(customerData) 
colnames(customerData) = c("CustomerNumber", "Address")

ordersData = c( 102, 562.01,
    104, 254.98,
    104, 1642.00,
    101, 3497.56,
    102, 385.30)
ordersData = matrix(ordersData, ncol = 2, byrow = TRUE)
ordersData = data.frame(ordersData) 
colnames(ordersData) = c("CustomerNumber", "Sales")
  
mergedData = merge(customerData,ordersData)
is.match = match(customerData$CustomerNumber, ordersData$CustomerNumber, nomatch = 0)

nonmatchData = customerData[which(is.match == 0),]
nonmatchData["Sales"] = NA


Variable "mergedData" gives the merged dataset (as merge....by...statements in SAS). "nonmatchData" shows the customers without purchase, as shown in the SAS output.

No comments:

Post a Comment