Combining data from different sources is a common task in data analysis. In SAS Enterprise Guide, merging columns from two datasets into one new dataset lets you consolidate information for analysis and reporting. This guide, brought to you by conduct.edu.vn, walks through the steps and concepts needed to merge columns in SAS Enterprise Guide and create a unified dataset.
Table of Contents
- Understanding Data Merging in SAS Enterprise Guide
- Prerequisites for Combining Columns
- Step-by-Step Guide: Combining Two Columns in SAS Enterprise Guide
- 3.1. Sorting the Datasets
- 3.2. Using the MERGE Statement
- 3.3. Handling Missing Values
- 3.4. Specifying Options
- Types of Matches in SAS Merging
- 4.1. One-to-One Match
- 4.2. One-to-Many Match
- 4.3. Many-to-Many Match
- Advanced Techniques for Column Combination
- 5.1. Using the IN= Option
- 5.2. Renaming Variables During Merge
- 5.3. Using Multiple BY Variables
- Troubleshooting Common Issues
- 6.1. Datasets Not Sorted
- 6.2. Incorrect BY Variable
- 6.3. Data Type Mismatches
- 6.4. Missing Values Handling
- Best Practices for Data Merging
- Real-World Examples of Combining Columns
- 8.1. Combining Demographic and Survey Data
- 8.2. Merging Sales and Customer Data
- 8.3. Combining Product and Inventory Data
- Enhancing Data Integrity and Quality
- Optimizing Performance
- Additional Resources and Further Learning
- FAQ: Combining Columns in SAS Enterprise Guide
- Conclusion
1. Understanding Data Merging in SAS Enterprise Guide
Data merging is the process of combining two or more datasets into a single dataset based on one or more common variables. In SAS Enterprise Guide, this is primarily achieved using the MERGE statement within a DATA step. The MERGE statement combines observations from two or more SAS datasets into a single observation in a new dataset. This is essential for integrating information from different sources so that it can be analyzed together.
Merging is different from concatenating. Concatenating (using SET) stacks datasets on top of each other, increasing the number of observations, whereas merging combines datasets side-by-side, increasing the number of variables. Understanding this distinction is crucial for choosing the right approach for your data manipulation needs.
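The difference is easiest to see on two tiny datasets. The following is a minimal sketch; the dataset and variable names (scores, grades, ID, Score, Grade) are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sample data, already in ID order */
DATA scores;
INPUT ID Score;
DATALINES;
1 90
2 85
;
RUN;

DATA grades;
INPUT ID Grade $;
DATALINES;
1 A
2 B
;
RUN;

/* Concatenating: SET stacks the rows, so the result has 4 observations */
DATA stacked;
SET scores grades;
RUN;

/* Merging: MERGE with BY places matching rows side by side, so the result has 2 observations */
DATA side_by_side;
MERGE scores grades;
BY ID;
RUN;
```

In the stacked dataset, Grade is missing on the rows that came from scores and Score is missing on the rows that came from grades. In the merged dataset each ID carries both values.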
2. Prerequisites for Combining Columns
Before you can combine columns in SAS Enterprise Guide, ensure the following prerequisites are met:
- SAS Enterprise Guide Installed: You must have SAS Enterprise Guide installed and accessible on your machine.
- Access to Datasets: Ensure you have access to the datasets you intend to merge. These datasets should be accessible within your SAS environment.
- Common Variable(s): Identify one or more common variables (columns) that exist in both datasets. These variables will be used to match and merge the rows.
- Understanding of Data: Have a clear understanding of the structure and content of your datasets, including data types and potential missing values.
- Sorting: Both datasets must be sorted by the common variable(s) before merging. This is a crucial step for ensuring correct merging.
- Data Integrity: Ensure that the data in the common variables is clean and consistent across both datasets. Inconsistencies can lead to incorrect merging.
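Several of these prerequisites can be checked with PROC CONTENTS, which lists each variable with its type and length and reports whether the dataset is recorded as sorted. The sketch below assumes the placeholder names Dataset1 and Dataset2 used throughout this guide.

```sas
/* Dataset1 and Dataset2 are placeholders for your own datasets */
PROC CONTENTS DATA=Dataset1;
RUN;

PROC CONTENTS DATA=Dataset2;
RUN;
```

Compare the two listings to confirm that the common variable has the same name and the same type (numeric or character) in both datasets.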
3. Step-by-Step Guide: Combining Two Columns in SAS Enterprise Guide
3.1. Sorting the Datasets
Sorting is a mandatory step before merging datasets in SAS. SAS requires that the datasets being merged are sorted by the BY variable(s). This ensures that the observations are correctly matched during the merge process.
Steps to Sort Datasets:
- Open SAS Enterprise Guide: Launch SAS Enterprise Guide and open the project containing your datasets.
- Navigate to the Datasets: Locate the two datasets you want to merge in the “Libraries” pane.
- Use PROC SORT: Use the PROC SORT procedure to sort each dataset. Here’s the syntax:
PROC SORT DATA=Dataset1;
BY CommonVariable;
RUN;
PROC SORT DATA=Dataset2;
BY CommonVariable;
RUN;
- Replace Dataset1 and Dataset2 with the actual names of your datasets.
- Replace CommonVariable with the name of the variable that is common to both datasets and will be used for merging. If you have multiple common variables, list them in the BY statement. For example: BY Variable1 Variable2;
- Ensure that the sorting order (ascending or descending) is consistent across both datasets. By default, PROC SORT sorts in ascending order.
PROC SORT DATA=patients;
BY Subject_ID;
RUN;
PROC SORT DATA=initial_appointments;
BY Subject_ID;
RUN;
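If you do sort in descending order, the DESCENDING keyword has to appear in every PROC SORT and again in the BY statement of the merging DATA step. A minimal sketch, assuming the patients and initial_appointments datasets from section 4.1; the output name merged_desc is hypothetical:

```sas
/* Sort both inputs in the same (descending) order */
PROC SORT DATA=patients;
BY DESCENDING Subject_ID;
RUN;

PROC SORT DATA=initial_appointments;
BY DESCENDING Subject_ID;
RUN;

/* The BY statement of the merge must state the same order */
DATA merged_desc;
MERGE patients initial_appointments;
BY DESCENDING Subject_ID;
RUN;
```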
3.2. Using the MERGE Statement
The MERGE statement is used within a DATA step to combine two or more datasets.
Steps to Use the MERGE Statement:
- Create a New DATA Step: Start a new DATA step by specifying the name of the new dataset you want to create.
DATA NewDataset;
- Replace NewDataset with the desired name for your merged dataset.
- Use the MERGE Statement: Use the MERGE statement to specify the datasets you want to merge.
MERGE Dataset1 Dataset2;
- Replace Dataset1 and Dataset2 with the names of your datasets. The order in which you list the datasets matters when both contain a non-BY variable with the same name, because values read from the dataset listed later can overwrite those read from the earlier one.
- Specify the BY Statement: Use the BY statement to specify the common variable(s) used for merging.
BY CommonVariable;
- Replace CommonVariable with the name of the common variable.
- Complete the DATA Step: Add a RUN; statement to execute the DATA step.
RUN;
Complete Code Example:
DATA NewDataset;
MERGE Dataset1 Dataset2;
BY CommonVariable;
RUN;
DATA one_to_one_match;
MERGE patients initial_appointments;
BY Subject_ID;
RUN;
3.3. Handling Missing Values
Missing values can complicate the merging process. When a BY value appears in only one dataset, SAS sets the variables contributed by the other dataset to missing in the merged observation, and missing values already present in the inputs are carried into the output. Understanding this behavior and choosing an appropriate strategy is crucial for maintaining data integrity.
Strategies for Handling Missing Values:
- Identify Missing Values: Before merging, identify the extent and pattern of missing values in your datasets. Use PROC FREQ or PROC MEANS to get a summary of missing values.
PROC FREQ DATA=Dataset1;
TABLES CommonVariable / MISSING;
RUN;
- Use the MISSING option in PROC FREQ: To include missing values in the frequency counts, use the MISSING option in the TABLES statement. This helps you assess the extent of missing data in the common variable.
- Conditional Logic: Use IF-THEN statements to handle missing values based on specific conditions.
DATA NewDataset;
MERGE Dataset1 Dataset2;
BY CommonVariable;
IF Variable1 = . THEN Variable1 = 0; /* Replace missing with 0; the DATA step refers to variables by name only, not as Dataset1.Variable1 */
RUN;
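- Use the COALESCE Function: The COALESCE function returns the first non-missing value from a list of numeric arguments (COALESCEC is the character equivalent). This is useful when both datasets carry the same variable and missing values in one should be filled from the other. Because a DATA step cannot refer to a variable as Dataset1.Variable1, rename each copy with the RENAME= option first.
DATA NewDataset;
MERGE Dataset1(RENAME=(Variable1=Variable1_A)) Dataset2(RENAME=(Variable1=Variable1_B));
BY CommonVariable;
Variable1 = COALESCE(Variable1_A, Variable1_B); /* First non-missing of the two copies */
DROP Variable1_A Variable1_B;
RUN;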
- Using RETAIN Statement: The RETAIN statement can be used to carry forward the last non-missing value. This is helpful when merging longitudinal data where you want to fill missing values with the most recent available data.
DATA NewDataset;
MERGE Dataset1 Dataset2;
BY CommonVariable;
RETAIN LastKnownValue;
IF FIRST.CommonVariable THEN LastKnownValue = .; /* Do not carry values across BY groups */
IF NOT MISSING(Variable1) THEN LastKnownValue = Variable1;
ELSE Variable1 = LastKnownValue;
DROP LastKnownValue;
RUN;
- Use PROC MI for Imputation: For more sophisticated handling of missing data, consider using PROC MI to impute missing values before merging. This creates multiple plausible values for each missing entry. Note that the OUT= dataset stacks all imputed copies and identifies them with the _Imputation_ variable, so account for that variable when merging.
PROC MI DATA=Dataset1 OUT=ImputedDataset NIMPUTE=5;
VAR Variable1 Variable2;
RUN;
- Exclude Observations with Missing Values: If appropriate, exclude observations with missing values in the common variable(s) using a WHERE statement.
DATA NewDataset;
MERGE Dataset1 Dataset2;
BY CommonVariable;
WHERE NOT MISSING(CommonVariable);
RUN;
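For the first strategy above, identifying missing values, a quick per-variable count can be produced as sketched below. Dataset1 is the guide's placeholder name, and the $missfmt format name is hypothetical.

```sas
/* Count non-missing (N) and missing (NMISS) values for every numeric variable */
PROC MEANS DATA=Dataset1 N NMISS;
VAR _NUMERIC_;
RUN;

/* Group character values into "Missing" and "Not missing" for counting */
PROC FORMAT;
VALUE $missfmt ' ' = 'Missing' OTHER = 'Not missing';
RUN;

PROC FREQ DATA=Dataset1;
FORMAT _CHARACTER_ $missfmt.;
TABLES _CHARACTER_ / MISSING NOCUM NOPERCENT;
RUN;
```

Running the same checks on the second dataset before the merge makes it easier to tell which missing values existed beforehand and which were introduced by unmatched observations.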
3.4. Specifying Options
SAS provides several dataset options that can be applied to the datasets listed in the MERGE statement to control how they are read and merged. These options can help handle specific merging scenarios and improve the efficiency of the merge process.
Common Options:
- IN= Option: The IN= option creates a temporary variable that indicates whether a dataset contributed to the current observation. This is useful for identifying unmatched observations.
- RENAME= Option: The RENAME= option allows you to rename variables during the merge process to avoid naming conflicts.
- FIRSTOBS= and OBS= Options: These options allow you to specify a subset of observations to be merged, which can be useful when working with large datasets.
Using the IN= Option:
The IN= option creates a temporary variable that is set to 1 if the dataset contributed to the current observation and 0 otherwise. This can be useful for identifying observations that are only present in one of the datasets.
DATA NewDataset;
MERGE Dataset1(IN=A) Dataset2(IN=B);
BY CommonVariable;
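LENGTH Source $ 8; /* Without this, Source takes length 4 from "Both" and longer labels are truncated */
IF A AND B THEN Source = "Both";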
ELSE IF A THEN Source = "Dataset1";
ELSE Source = "Dataset2";
RUN;
In this example, the variables A and B indicate whether the observation came from Dataset1 and Dataset2, respectively. The Source variable then indicates the source of each observation.
Using the RENAME= Option:
The RENAME= option allows you to rename variables during the merge process. This is useful when both datasets have variables with the same name but different meanings.
DATA NewDataset;
MERGE Dataset1(RENAME=(Variable1=Variable1_1)) Dataset2(RENAME=(Variable1=Variable1_2));
BY CommonVariable;
RUN;
In this example, Variable1 in Dataset1 is renamed to Variable1_1, and Variable1 in Dataset2 is renamed to Variable1_2.
Using the FIRSTOBS= and OBS= Options:
The FIRSTOBS= and OBS= options allow you to specify the first and last observations to be processed during the merge. This can be useful when you are working with large datasets and only need to merge a subset of the data.
DATA NewDataset;
MERGE Dataset1(FIRSTOBS=100 OBS=200) Dataset2(FIRSTOBS=50 OBS=150);
BY CommonVariable;
RUN;
In this example, only observations 100 to 200 from Dataset1 and observations 50 to 150 from Dataset2 are merged.
4. Types of Matches in SAS Merging
SAS merging can handle various types of matches between datasets. The type of match affects how the data is combined and how missing values are handled. The three primary types of matches are one-to-one, one-to-many, and many-to-many.
4.1. One-to-One Match
A one-to-one match occurs when each observation in the first dataset has exactly one matching observation in the second dataset. This is the simplest type of merge and is commonly used when combining data where each record is unique.
Example:
Suppose you have two datasets: one containing patient demographic information and another containing patient appointment data. Each patient has one record in each dataset.
Dataset A: Patient demographic information.
Subject_ID | DOB | Gender |
---|---|---|
1 | 9/20/1980 | Female |
2 | 6/12/1954 | Male |
3 | 4/2/2001 | Male |
4 | 8/29/1978 | Female |
5 | 2/28/1986 | Female |
Dataset B: Patient appointment data.
Subject_ID | Visit_Date | Doctor |
---|---|---|
1 | 1/31/2012 | Walker |
2 | 2/2/2012 | Jones |
3 | 1/15/2012 | Jones |
4 | 3/10/2012 | Smith |
5 | 1/29/2012 | Smith |
Merging dataset A with dataset B yields:
Dataset AB: Combined table of demographics and initial appointments.
Subject_ID | DOB | Gender | Visit_Date | Doctor |
---|---|---|---|---|
1 | 9/20/1980 | Female | 1/31/2012 | Walker |
2 | 6/12/1954 | Male | 2/2/2012 | Jones |
3 | 4/2/2001 | Male | 1/15/2012 | Jones |
4 | 8/29/1978 | Female | 3/10/2012 | Smith |
5 | 2/28/1986 | Female | 1/29/2012 | Smith |
SAS Code:
DATA patients;
INPUT Subject_ID DOB :MMDDYY10. Gender $; /* Colon modifier applies the date informat with list input */
FORMAT DOB MMDDYY10.;
DATALINES;
1 9/20/1980 Female
2 6/12/1954 Male
3 4/2/2001 Male
4 8/29/1978 Female
5 2/28/1986 Female
;
RUN;
DATA initial_appointments;
INPUT Subject_ID Visit_Date :MMDDYY10. Doctor $; /* Colon modifier applies the date informat with list input */
FORMAT Visit_Date MMDDYY10.;
DATALINES;
1 1/31/2012 Walker
2 2/2/2012 Jones
3 1/15/2012 Jones
4 3/10/2012 Smith
5 1/29/2012 Smith
;
RUN;
PROC SORT DATA=patients;
BY Subject_ID;
RUN;
PROC SORT DATA=initial_appointments;
BY Subject_ID;
RUN;
DATA one_to_one_match;
MERGE patients initial_appointments;
BY Subject_ID;
RUN;
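A one-to-one merge assumes that the key is unique in each dataset. One way to check this before merging is to let PROC SORT set aside any observations with a repeated key. This is a sketch only; the output names patients_unique and patients_dups are hypothetical.

```sas
/* NODUPKEY keeps the first observation per Subject_ID;
   DUPOUT= collects the observations that were removed as duplicates */
PROC SORT DATA=patients OUT=patients_unique NODUPKEY DUPOUT=patients_dups;
BY Subject_ID;
RUN;
```

If patients_dups has zero observations, Subject_ID is unique in patients. Repeating the check for the second dataset confirms that the merge really is one-to-one.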
4.2. One-to-Many Match
A one-to-many match occurs when one observation in the first dataset matches multiple observations in the second dataset. This is common when combining data where one entity has multiple related records.
Example:
Suppose you have a dataset containing patient demographic information and another dataset containing appointment records. Each patient has one record in the demographic dataset but may have multiple records in the appointment dataset.
Dataset A: Patient demographic information.
Subject_ID | DOB | Gender |
---|---|---|
1 | 9/20/1980 | Female |
2 | 6/12/1954 | Male |
3 | 4/2/2001 | Male |
4 | 8/29/1978 | Female |
5 | 2/28/1986 | Female |
Dataset B: Appointment records.
Subject_ID | Visit_Date | Doctor |
---|---|---|
1 | 1/31/2012 | Walker |
1 | 5/29/2012 | Walker |
2 | 2/2/2012 | Jones |
3 | 1/15/2012 | Jones |
5 | 1/29/2012 | Smith |
5 | 2/6/2012 | Smith |
Merging dataset A with dataset B yields:
Dataset AB: Match-merged appointment records data with patient demographics included. Patient 4 has no appointment records, so Visit_Date and Doctor are missing for that row.
Subject_ID | DOB | Gender | Visit_Date | Doctor |
---|---|---|---|---|
1 | 9/20/1980 | Female | 1/31/2012 | Walker |
1 | 9/20/1980 | Female | 5/29/2012 | Walker |
2 | 6/12/1954 | Male | 2/2/2012 | Jones |
3 | 4/2/2001 | Male | 1/15/2012 | Jones |
4 | 8/29/1978 | Female | . |  |
5 | 2/28/1986 | Female | 1/29/2012 | Smith |
5 | 2/28/1986 | Female | 2/6/2012 | Smith |
SAS Code:
DATA patients;
INPUT Subject_ID DOB :MMDDYY10. Gender $; /* Colon modifier applies the date informat with list input */
FORMAT DOB MMDDYY10.;
DATALINES;
1 9/20/1980 Female
2 6/12/1954 Male
3 4/2/2001 Male
4 8/29/1978 Female
5 2/28/1986 Female
;
RUN;
DATA appointment_log;
INPUT Subject_ID Visit_Date :MMDDYY10. Doctor $; /* Colon modifier applies the date informat with list input */
FORMAT Visit_Date MMDDYY10.;
DATALINES;
1 1/31/2012 Walker
1 5/29/2012 Walker
2 2/2/2012 Jones
3 1/15/2012 Jones
5 1/29/2012 Smith
5 2/6/2012 Smith
;
RUN;
PROC SORT DATA=patients;
BY Subject_ID;
RUN;
PROC SORT DATA=appointment_log;
BY Subject_ID;
RUN;
DATA one_to_many_match;
MERGE patients appointment_log;
BY Subject_ID;
RUN;
Important Considerations for One-to-Many Matches:
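- By convention, list the dataset with the “one” relationship (patient demographics in this case) first in the MERGE statement. The order does not change which observations match, but it sets the variable order, and a non-BY variable that exists in both datasets can be overwritten by the dataset listed later.
- Ensure that the data is properly sorted by the common variable.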
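A match-merge keeps every BY value found in either dataset, so a patient with no appointment records (Subject_ID 4 in the sample data) appears in the output with missing Visit_Date and Doctor. If only patients with at least one appointment are wanted, the IN= option can filter the result. A minimal sketch, in which the names patients_with_visits, in_patients and in_log are hypothetical:

```sas
DATA patients_with_visits;
MERGE patients(IN=in_patients) appointment_log(IN=in_log);
BY Subject_ID;
/* Subsetting IF: keep only BY values present in both datasets */
IF in_patients AND in_log;
RUN;
```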
4.3. Many-to-Many Match
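A many-to-many match occurs when multiple observations in the first dataset match multiple observations in the second dataset. Producing every combination can create a large number of observations and should be done with caution. Note that a DATA step MERGE does not produce every combination in this situation: within a BY group it pairs observations in order and writes the log note "MERGE statement has more than one data set with repeats of BY values". When every combination is required, use a PROC SQL join.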
Example:
Suppose you have a dataset containing student course enrollments and another dataset containing course schedules. Each student can enroll in multiple courses, and each course can have multiple scheduled times.
Dataset A: Student course enrollments.
Student_ID | Course_ID |
---|---|
1 | 101 |
1 | 102 |
2 | 101 |
3 | 103 |
Dataset B: Course schedules.
Course_ID | Time | Instructor |
---|---|---|
101 | 9:00 AM | Smith |
101 | 10:00 AM | Johnson |
102 | 11:00 AM | Brown |
103 | 2:00 PM | Davis |
Joining every enrollment in dataset A with every schedule in dataset B for the same course yields:
Dataset AB: Combined table of student enrollments and course schedules.
Student_ID | Course_ID | Time | Instructor |
---|---|---|---|
1 | 101 | 9:00 AM | Smith |
1 | 101 | 10:00 AM | Johnson |
1 | 102 | 11:00 AM | Brown |
2 | 101 | 9:00 AM | Smith |
2 | 101 | 10:00 AM | Johnson |
3 | 103 | 2:00 PM | Davis |
SAS Code:
DATA student_enrollments;
INPUT Student_ID Course_ID;
DATALINES;
1 101
1 102
2 101
3 103
;
RUN;
DATA course_schedules;
INPUT Course_ID Time & $8. Instructor $; /* & lets Time hold one embedded blank; two blanks end the value */
DATALINES;
101 9:00 AM  Smith
101 10:00 AM  Johnson
102 11:00 AM  Brown
103 2:00 PM  Davis
;
RUN;
PROC SORT DATA=student_enrollments;
BY Course_ID;
RUN;
PROC SORT DATA=course_schedules;
BY Course_ID;
RUN;
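/* MERGE pairs repeated BY values row by row; PROC SQL returns every combination */
PROC SQL;
CREATE TABLE many_to_many_match AS
SELECT e.Student_ID, e.Course_ID, s.Time, s.Instructor
FROM student_enrollments AS e INNER JOIN course_schedules AS s
ON e.Course_ID = s.Course_ID
ORDER BY e.Student_ID, e.Course_ID;
QUIT;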
Important Considerations for Many-to-Many Matches:
- A DATA step MERGE does not form every combination when both datasets repeat a BY value; it pairs the observations in order and writes a log note that more than one dataset has repeats of BY values. Use a PROC SQL join when every combination is required.
- Be mindful of the potential for a large number of observations in the resulting dataset.
- Consider using additional variables in the BY statement or join condition to refine the match and reduce the number of observations.
5. Advanced Techniques for Column Combination
5.1. Using the IN= Option
The IN= option in the MERGE statement creates a temporary variable that indicates whether the corresponding dataset contributed to the current observation. This is useful for identifying unmatched observations and handling them accordingly.
Example:
Suppose you want to merge two datasets, Customers and Orders, and identify customers who have not placed any orders.
SAS Code:
DATA Customers;
INPUT Customer_ID Name $;
DATALINES;
1 John
2 Alice
3 Bob
;
RUN;
DATA Orders;
INPUT Customer_ID Order_ID;
DATALINES;
1 101
1 102
2 201
;
RUN;
PROC SORT DATA=Customers;
BY Customer_ID;
RUN;
PROC SORT DATA=Orders;
BY Customer_ID;
RUN;
DATA CustomerOrders;
MERGE Customers(IN=A) Orders(IN=B);
BY Customer_ID;
IF A AND NOT B THEN HasOrders = 0; /* Customer without orders */
ELSE HasOrders = 1; /* Customer with orders */
RUN;
In this example, the IN=A option creates a variable A that is 1 if the observation comes from the Customers dataset and 0 otherwise. Similarly, IN=B creates a variable B for the Orders dataset. The IF statement checks if a customer exists in the Customers dataset but not in the Orders dataset, indicating that the customer has not placed any orders.
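If the goal is a list of only the customers without orders, rather than a flag on every row, the same IN= variables can drive a subsetting IF. A minimal sketch reusing the Customers and Orders datasets above; the output name CustomersWithoutOrders is hypothetical.

```sas
DATA CustomersWithoutOrders;
/* KEEP= reads only the key from Orders, since order details are not needed here */
MERGE Customers(IN=A) Orders(IN=B KEEP=Customer_ID);
BY Customer_ID;
/* Keep customers that appear in Customers but not in Orders */
IF A AND NOT B;
RUN;
```

With the sample data above, the result contains a single observation, customer 3 (Bob).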
5.2. Renaming Variables During Merge
The RENAME= option allows you to rename variables during the merge process. This is useful when both datasets have variables with the same name but different meanings, and also when the matching key is stored under different names in the two datasets.
Example:
Suppose you have two datasets, Employees and Salaries. Employees stores the employee identifier in a variable named ID, while Salaries stores it as EmployeeID. Because the BY variable must have the same name in every dataset being merged, you can rename EmployeeID to ID as Salaries is read.
SAS Code:
DATA Employees;
INPUT ID Name $ Department :$10.; /* $10. so that "Marketing" is not truncated to 8 characters */
DATALINES;
1 John Sales
2 Alice Marketing
;
RUN;
DATA Salaries;
INPUT EmployeeID Salary;
DATALINES;
1 50000
2 60000
;
RUN;
PROC SORT DATA=Employees;
BY ID;
RUN;
PROC SORT DATA=Salaries;
BY EmployeeID;
RUN;
DATA EmployeeSalaries;
MERGE Employees Salaries(RENAME=(EmployeeID=ID));
BY ID;
RUN;
In this example, the RENAME=(EmployeeID=ID) option renames the EmployeeID variable in the Salaries dataset to ID as it is read, so both datasets supply the BY variable under the same name.
5.3. Using Multiple BY Variables
You can use multiple variables in the BY statement to create a more precise match between datasets. This is useful when a single variable is not sufficient to uniquely identify matching observations.
Example:
Suppose you have two datasets, Sales and Promotions, and you want to merge them based on both Product_ID and Date.
SAS Code:
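DATA Sales;
INPUT Product_ID Date :YYMMDD10. Sales; /* Read the ISO dates as SAS date values */
FORMAT Date YYMMDD10.;
DATALINES;
101 2023-01-01 100
101 2023-01-02 150
102 2023-01-01 200
;
RUN;
DATA Promotions;
INPUT Product_ID Date :YYMMDD10. Promotion; /* Read the ISO dates as SAS date values */
FORMAT Date YYMMDD10.;
DATALINES;
101 2023-01-01 10
102 2023-01-01 15
;
RUN;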
PROC SORT DATA=Sales;
BY Product_ID Date;
RUN;
PROC SORT DATA=Promotions;
BY Product_ID Date;
RUN;
DATA SalesPromotions;
MERGE Sales Promotions;
BY Product_ID Date;
RUN;
In this example, the BY Product_ID Date; statement matches observations on both Product_ID and Date, so values are combined only when both variables agree. The Sales observation for product 101 on 2023-01-02 has no matching promotion, so it stays in the output with a missing Promotion value.
6. Troubleshooting Common Issues
6.1. Datasets Not Sorted
One of the most common issues in SAS merging is forgetting to sort the datasets by the BY variable(s). SAS requires that the datasets being merged are sorted by the BY variable(s) so that observations are matched correctly; if they are not, the DATA step stops with an error stating that the BY variables are not properly sorted.
Solution:
Always ensure that both datasets are sorted by the BY variable(s) before merging. Use the PROC SORT procedure to sort the datasets.
PROC SORT DATA=Dataset1;
BY CommonVariable;
RUN;
PROC SORT DATA=Dataset2;
BY CommonVariable;
RUN;
6.2. Incorrect BY Variable
Using an incorrect BY variable can lead to incorrect merging results. The BY variable should be a variable that exists in both datasets and uniquely identifies the observations to be merged.
Solution:
Double-check that the BY variable is correct and exists in both datasets. Verify that the values in the BY variable are consistent across both datasets.
6.3. Data Type Mismatches
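Data type mismatches between the BY variable(s) in the two datasets cause the merge to fail. For example, if the BY variable is numeric in one dataset and character in the other, the DATA step stops with an error reporting that the variable has been defined as both character and numeric.
Solution:
Ensure that the data types of the BY variable(s) are the same in both datasets. Use the PROC CONTENTS procedure to check the data types of the variables. If necessary, use the INPUT function to convert a character variable to numeric, or the PUT function to convert a numeric variable to character. Because the type of an existing variable cannot be changed in place, rename the original and create the converted variable under the original name.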
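/* Option A: the key is character in Dataset1 and should be numeric */
DATA Dataset1;
SET Dataset1(RENAME=(CommonVariable=CommonVariable_Char));
CommonVariable = INPUT(CommonVariable_Char, BEST12.); /* Convert to numeric */
DROP CommonVariable_Char;
RUN;
/* Option B: the key is numeric in Dataset2 and should be character */
DATA Dataset2;
SET Dataset2(RENAME=(CommonVariable=CommonVariable_Num));
CommonVariable = STRIP(PUT(CommonVariable_Num, BEST12.)); /* Convert to character */
DROP CommonVariable_Num;
RUN;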
6.4. Missing Values Handling
Missing values in the BY variable(s) can cause unexpected results during merging. SAS sorts missing values before all other values and treats them as an ordinary BY value, so observations with a missing key in one dataset are matched to observations with a missing key in the other.
Solution:
Handle missing values in the BY variable(s) before merging. You can either exclude observations with missing values or impute the missing values using an appropriate method.
DATA NewDataset;
MERGE Dataset1 Dataset2;
BY CommonVariable;
WHERE NOT MISSING(CommonVariable); /* Exclude missing values */
RUN;
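Rather than silently dropping the observations with a missing key, it can be useful to set them aside for review before the merge. The sketch below uses the guide's placeholder names; the output names Dataset1_Clean and Dataset1_MissingKey are hypothetical.

```sas
/* Split Dataset1 into rows with a usable key and rows with a missing key */
DATA Dataset1_Clean Dataset1_MissingKey;
SET Dataset1;
IF MISSING(CommonVariable) THEN OUTPUT Dataset1_MissingKey;
ELSE OUTPUT Dataset1_Clean;
RUN;
```

Dataset1_Clean can then be sorted and merged as usual, while Dataset1_MissingKey remains available for inspection or correction.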
7. Best Practices for Data Merging
To ensure accurate and efficient data merging in SAS Enterprise Guide, follow these best practices:
- Understand Your Data: Before merging, thoroughly understand the structure and content of your datasets. Identify the common variables and potential issues such as missing values and data type mismatches.
- Sort Your Datasets: Always sort your datasets by the BY variable(s) before merging. This is a mandatory step for ensuring correct merging.
- Use Descriptive Variable Names: Use clear and descriptive variable names to make your code more readable and maintainable.
- Handle Missing Values: Implement appropriate strategies for handling missing values in the BY variable(s) and other relevant variables.
- Document Your Code: Add comments to your code to explain the purpose of each step and the logic behind your merging strategy.
- Test Your Code: Thoroughly test your code with sample data to ensure that the merging is producing the expected results.
- Use Options Wisely: Utilize the various options available in the MERGE statement to control how the datasets are merged and handle specific merging scenarios.
- Monitor Performance: Monitor the performance of your merging process, especially when working with large datasets. Use techniques such as subsetting and indexing to improve performance.
- Validate Data Integrity: After merging, validate the integrity of the merged dataset. Check for any unexpected missing values, inconsistencies, or errors.
- Create Backup Copies: Before performing any major data merging operations, create backup copies of your original datasets. This ensures that you can revert to the original data if something goes wrong during the merging process.
- Use Version Control: Implement version control for your SAS code. This allows you to track changes, revert to previous versions, and collaborate with others more effectively.
- Optimize Data Storage: After merging, consider optimizing the storage of your dataset. Remove any unnecessary variables or observations, and compress the dataset if appropriate.
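As one possible way to validate a merged dataset, the row count can be compared with the number of distinct key values. The sketch below assumes the placeholder names NewDataset and CommonVariable used earlier in this guide; the column aliases n_rows and n_keys are hypothetical.

```sas
/* Compare the number of rows with the number of distinct keys */
PROC SQL;
SELECT COUNT(*) AS n_rows,
       COUNT(DISTINCT CommonVariable) AS n_keys
FROM NewDataset;
QUIT;
```

After a one-to-one merge with no missing keys the two numbers should be equal. More rows than keys indicates that at least one key value was repeated in an input dataset.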
8. Real-World Examples of Combining Columns
8.1. Combining Demographic and Survey Data
In market research, combining demographic data with survey responses is a common task. This allows you to analyze how different demographic groups respond to survey questions.
Scenario:
You have two datasets:
- Demographics: Contains demographic information about survey respondents (e.g., age, gender, income).
- SurveyResponses: Contains responses to survey questions.
You want to merge these datasets to analyze how different demographic groups responded to the survey.
SAS Code:
DATA Demographics;
INPUT Respondent_ID Age Gender $ Income;
DATALINES;
1 25 Male 50000
2 35 Female 60000
3 45 Male 70000
;
RUN;
DATA SurveyResponses;
INPUT Respondent_ID Question1 Question2;
DATALINES;
1 4 5
2 5 4
3 3 5
;
RUN;
PROC SORT DATA=Demographics;
BY Respondent_ID;
RUN;
PROC SORT DATA=SurveyResponses;
BY Respondent_ID;
RUN;
DATA MergedData;