Today I am thinking whether R can do similar data manipulation as easy as SAS. Of course, the answer is: YES, R can!. I am going to show a simple example to demonstrate this. I will post other stuff while on my way to SAS master!
The example is from "The Little SAS Book":
We want to compare the two datasets and find out which customers did not make a purchase during a certain period. The datasets look like this:
# ADDR.
101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway
# Sales
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
In SAS, you can easily find out which customers did not make a purchase:
data CustomerData;
input CustomerNumber Address & $30.;
datalines;
101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway
;
data OrdersData;
input CustomerNumber Total;
datalines;
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
;
proc sort data = OrdersData;
by CustomerNumber;
run;
data MergedData;
merge CustomerData OrdersData (in = ifin);
by CustomerNumber;
if ifin = 0;
run;
Obs | CustomerNumber | Address | Total |
---|---|---|---|
1 | 103 | Sports Outfitters 19 Cary Way | . |
2 | 105 | Sports Savers 2708 Broadway | . |
# TODO: Add comment # # Author: evert ############################################################################### customerData = c(101, 'Murphy\'s Sports 115 Main St.', 102, 'Sun N Ski 2106 Newberry Ave.', 103, 'Sports Outfitters 19 Cary Way', 104, 'Cramer & Johnson 4106 Arlington Blvd.', 105, 'Sports Savers 2708 Broadway') customerData = matrix(customerData, ncol = 2, byrow = TRUE) customerData = data.frame(customerData) colnames(customerData) = c("CustomerNumber", "Address") ordersData = c( 102, 562.01, 104, 254.98, 104, 1642.00, 101, 3497.56, 102, 385.30) ordersData = matrix(ordersData, ncol = 2, byrow = TRUE) ordersData = data.frame(ordersData) colnames(ordersData) = c("CustomerNumber", "Sales") mergedData = merge(customerData,ordersData) is.match = match(customerData$CustomerNumber, ordersData$CustomerNumber, nomatch = 0) nonmatchData = customerData[which(is.match == 0),] nonmatchData["Sales"] = NA
Variable "mergedData" gives the merged dataset (as merge....by...statements in SAS). "nonmatchData" shows the customers without purchase, as shown in the SAS output.
No comments:
Post a Comment