Today I am thinking whether R can do similar data manipulation as easy as SAS. Of course, the answer is: YES, R can!. I am going to show a simple example to demonstrate this. I will post other stuff while on my way to SAS master!
The example is from "The Little SAS Book":
We want to compare the two datasets and find out which customers did not make a purchase during a certain period. The datasets look like this:
# ADDR.
101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway
# Sales
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
In SAS, you can easily find out which customers did not make a purchase:
data CustomerData;
input CustomerNumber Address & $30.;
datalines;
101 Murphy's Sports 115 Main St.
102 Sun N Ski 2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson 4106 Arlington Blvd.
105 Sports Savers 2708 Broadway
;
data OrdersData;
input CustomerNumber Total;
datalines;
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
;
proc sort data = OrdersData;
by CustomerNumber;
run;
data MergedData;
merge CustomerData OrdersData (in = ifin);
by CustomerNumber;
if ifin = 0;
run;
| Obs | CustomerNumber | Address | Total |
|---|---|---|---|
| 1 | 103 | Sports Outfitters 19 Cary Way | . |
| 2 | 105 | Sports Savers 2708 Broadway | . |
# TODO: Add comment
#
# Author: evert
###############################################################################
customerData = c(101, 'Murphy\'s Sports 115 Main St.',
102, 'Sun N Ski 2106 Newberry Ave.',
103, 'Sports Outfitters 19 Cary Way',
104, 'Cramer & Johnson 4106 Arlington Blvd.',
105, 'Sports Savers 2708 Broadway')
customerData = matrix(customerData, ncol = 2, byrow = TRUE)
customerData = data.frame(customerData)
colnames(customerData) = c("CustomerNumber", "Address")
ordersData = c( 102, 562.01,
104, 254.98,
104, 1642.00,
101, 3497.56,
102, 385.30)
ordersData = matrix(ordersData, ncol = 2, byrow = TRUE)
ordersData = data.frame(ordersData)
colnames(ordersData) = c("CustomerNumber", "Sales")
mergedData = merge(customerData,ordersData)
is.match = match(customerData$CustomerNumber, ordersData$CustomerNumber, nomatch = 0)
nonmatchData = customerData[which(is.match == 0),]
nonmatchData["Sales"] = NA
Variable "mergedData" gives the merged dataset (as merge....by...statements in SAS). "nonmatchData" shows the customers without purchase, as shown in the SAS output.
No comments:
Post a Comment