MWSUG 2010 Conference Sections & Abstracts
- Analytics, Statistics, Modeling, Forecasting, & Data Mining
- Applications Development
- Foundations & Fundamentals (Beginning Tutorials)
- Hands-On Workshops
- Health Care & Health Insurance
- JMP
- Posters
- Programming Beyond the Basics (Advanced Tutorials)
- Reporting & Information Visualization
- Resources and Support for SAS Users
- SAS with Excel or Database
- Solutions
Analytics, Statistics, Modeling, Forecasting, & Data Mining
Applications Development
Foundations & Fundamentals (Beginning Tutorials)
Hands-On Workshops
Health Care & Health Insurance
JMP
Posters
Programming Beyond the Basics (Advanced Tutorials)
Reporting & Information Visualization
Resources and Support for SAS Users
SAS with Excel or Database
Solutions
Abstracts
Analytics, Statistics, Modeling, Forecasting, & Data Mining
The Applied Use of Population Stability Index (PSI) in SAS® Enterprise Miner™Rex Pruitt, PREMIER Bankcard, LLC
(Monday, 8:00 AM - 8:50 AM, Location: Wright Ballroom B)
In this paper I will describe how to develop the technical components necessary to calculate Population Stability Index (PSI), implement PSI into a SAS® Enterprise Miner™ extension node, and interpret the results of applied PSI analytics as an industry solution “Best Practice”.
Why is this of any value? A profound reality exists in the universe: “Change is absolute”!
PREMIER Bankcard’s use of predictive modeling has resulted in the need for PSI utilization due to CHANGE experienced in the following areas:
1. Changes in business operations due to internal & external influences 2. Detection of data integrity and/or metadata issues caused by programmatic changes 3. Compliance with regulatory review requirements
As companies continue to amass large amounts of data and use it to develop statistical models, the PSI measure helps monitor data and scorecard integrity. This is especially important since statistical models are being used to make strategic decisions worth millions of dollars.
PROC MIXED: Underlying Ideas with Examples
David Dickey, North Carolina State University
(Monday, 9:00 AM - 9:50 AM, Location: Wright Ballroom B)
SAS® PROC MIXED provides a single tool for analyzing a large array of models used in statistics, especially experimental design, through the use of REML estimation. A strategy for identifying mixed models is followed by a description of REML estimation along with a simple example that illustrates its advantages. A comparison of some of the available tests for variance components is given along with several examples, both real and artificial, that illustrate the variety of models handled by PROC MIXED.
Adequacy and Information Values: Two Tools for Ranking Predictors in Generalized Linear Models
Douglas Thompson, Assurant Health
(Monday, 10:00 AM - 10:50 AM, Location: Wright Ballroom B)
Which predictors are most strongly associated with the predicted outcome? In business settings, this is one of the first questions that clients typically ask about predictive models. To answer this question, it is necessary to rank the predictors by some measure of strength of association with the predicted outcome. Adequacy and information values were the two ranking methods recommended in an earlier MWSUG paper (Thompson, 2009). This paper expands on the earlier one by presenting SAS macros that serve three purposes: 1) they rank predictors by adequacy and information values in a variety of generalized linear models (logistic, poisson, etc.); 2) they rank predictors on adequacy, taking into account continuous variables that are non-linearly associated with the outcome; and 3) they take interactions into account in the rankings. The overall goal of this presentation is to provide predictive modelers with tools that they can use to more clearly explain model results to their clients. This paper assumes basic knowledge of PROC GENMOD in SAS/STAT.
An Introduction to Multiple Imputation of Complex Sample Data Using SAS® 9.2
Patricia Berglund, University of Michigan
(Monday, 11:00 AM - 11:20 AM, Location: Wright Ballroom B)
This paper presents practical guidance on the proper use of multiple imputation tools in SAS® 9.2 and the subsequent analysis of multiple imputed data sets from a complex sample survey data set. Use of the MI and MIANALYZE procedures and SAS survey procedures for typical descriptive and inferential analyses is demonstrated. The analytic techniques presented can be used on any operating system and are intended for an intermediate level audience.
Survival Analysis with PHREG: Using MI and MIANALYZE to Accommodate Missing Data
Chris Ake, Kaiser Permanente
Art Carpenter, Data Explorations
(Monday, 11:30 AM - 11:50 AM, Location: Wright Ballroom B)
Survival analyses based on a data collection process which the researcher has little control over are often plagued by problems of missing data. Deleting cases with any missing data will result in information loss and usually results in bias, while many analytic procedures that retain this information in some form underestimate the resulting uncertainty in parameter estimates and other output. SAS Version 8 included two new procedures that allow the researcher to generate "complete" data sets from incomplete data by multiple imputation and to analyze the resulting data in ways which adequately account for the uncertainty involved. This paper presents suggestions for optimal use of PROC MI to perform such multiple imputation and PROC MIANALYZE to conduct various statistical analyses of modeling output, in this case from PROC PHREG, including design of control macros, structure of multiple imputed data sets, generation of binary from non-binary categorical variables, and options for presentation of results.
Evaluation of Methods to Determine Optimal Cut Points for Predicting Mortgage Default
Valentin Todorov, Assurant Specialty Property
Douglas Thompson, Assurant Health
(Monday, 1:00 PM - 1:50 PM, Location: Wright Ballroom B)
One of the lessons from the recent wave of mortgage defaults is that it is very important to know the level of risk associated with mortgages held in a portfolio. In the banking world, lending risk is usually defined in terms of the likelihood that a borrower will default on a mortgage. To estimate the probability of default, banks utilize various risk scoring techniques, which output a score. Depending on a predetermined threshold “cut point” for the score, loans are priced differently. Loans with scores under that cut point are assigned higher interest rates compared to loans with scores above the cutoff point. If there is no experience available to choose an optimal cut point, statistical criteria can be used. One method for determining the cut point, the Kolmogorov-Smirnov goodness-of-fit test, has become the industry standard. Other less commonly used methods are decision trees/rules and a series of chi-square tests. In this paper we evaluate and compare these methods for estimating an optimal cut point. The methods illustrated here are applicable in many fields some of which are biostatistics, finance, and insurance.
Ideas and Examples in Generalized Linear Mixed Models
David Dickey, North Carolina State University
(Monday, 2:00 PM - 2:50 PM, Location: Wright Ballroom B)
SAS® PROC GLIMMIX fits generalized linear mixed models for nonnormal data with random effects, thus combining features of both PROC GENMOD and PROC MIXED. I will review the ideas behind PROC GLIMMIX and offer examples of Poisson and binary data. PROC NLMIXED also has the capacity to fit these kinds of models. After a brief introduction to that procedure, I will show an example of a zero-inflated Poisson model, which a model that is Poisson for counts 1,2,3,…, but has more 0s than is consistent with the Poisson.
The Next Generation: SAS/STAT® 9.22
Maura Stokes, SAS
(Monday, 3:00 PM - 3:50 PM, Location: Wright Ballroom B)
Caught up with SAS/STAT 9.2? Want to add even more statistical tools to your arsenal? Then get ready for the 9.22 release of SAS/STAT software, which works with Base SAS 9.2 to deliver methodological advances in addition to customer-requested features.
New functionality for postprocessing equips the modern linear modeling procedures with comparable capabilities, including the latest techniques for testing complex research hypotheses. The new SURVEYPHREG procedure fits Cox regression models to survey data. The EFFECTPLOT statement uses ODS Graphics to create plots of effects from models produced by the GENMOD, LOGISTIC, and ORTHOREG procedures. Exact Poisson regression is one of several new exact methods for categorical data analysis. Other new features include updated spatial analysis capabilities, classification variable support in the ROBUSTREG procedure, and model averaging in the GLMSELECT procedure. This talk provides an overview of these exciting new enhancements to the statistical software.
Using Base SAS® and SAS Enterprise Miner™ to Develop Customer Retention Modeling
Rex Pruitt, PREMIER Bankcard
(Monday, 4:00 PM - 4:50 PM, Location: Wright Ballroom B)
In this paper I will describe how to develop the components necessary using SAS tools and business analytics to effectively identify a “Good Customer.”
Objective (Target): Develop the components necessary using MIS Analytics to effectively identify a “Good Customer.” This “Good Customer Score” will be used in modeling exercises designed to help improve the cost effectiveness and development of retention efforts at PREMIER.
Estimated Opportunity Value: For example, reduce the attrition of PREMIER’s “Top Good Customers” >= 2 Years on Book = $15+ Million annually.
Recommendation: Add the “Good Customer Score” to the Data Warehouse and begin using it to develop and implement specific targeted retention and other strategies.
Portfolio Scoring & Ranking: The accuracy of the new “Good Customer Score” is supported by the statistical correlation to Behavior Score (3rd party score), as well as other scores, when identifying those customers who will perform in the top 25% of the Portfolio ranked by Good Customer Score (Target). The strength of like scores is noted in the Chi-Square correlation table. Additionally, the statistical soundness of the score comparison exercise performed using modeling in E-Miner is supported by a KS Statistic of 58 and a target prediction accuracy of 85%.
Novel Use of SAS® Software in Industrial and Biomedical Consulting, Research, and Teaching
Timothy O'Brien, Loyola University Chicago
(Tuesday, 8:00 AM - 8:50 AM, Location: Wright Ballroom B)
Basic courses in applied biostatistics, statistical methods, design and regression focus on hypothesis testing and estimation for linear and nonlinear models by providing students and decision-makers the methodology (i.e., test statistics and confidence intervals) to reach conclusions by narrowly focusing on individual t-tests and global F-tests. These courses leave students and managers with an overly simplistic view of how informed statistical decisions are made in practice.
This paper focuses on the more recent pedagogical ideas of exposing students to underlying likelihood methods and treating these specific (t- and F-) tests as special cases embedded in this larger structure. Key to this better decision-making process is powerful statistical software, and our focus is here on the use of the NLMIXED and IML procedures available in SAS® software to provide the means to make some of these important decisions.
This approach enables students and managers to pose and examine more meaningful queries. For example, the techniques discussed here allow practitioners to focus on the estimation of important model parameters in the presence of serially correlated errors rather than on the detection of the exact time-series error structure. Numerous additional practical examples of the applicability of likelihood methods are provided and discussed; specifically, the provided illustrations include novel approaches useful in statistical modeling, drug synergy and relative potency.
Using SAS® to Assess and Model Time-to-Event Data with Non-Proportional Hazards
Michael Wilson, IUSM, Dept of Clinical Pharmacology
(Tuesday, 9:00 AM - 9:50 AM, Location: Wright Ballroom B)
Proportional Hazards Regression using a partial maximum likelihood function to estimate the covariate parameters (Cox, 1972) has become an exceedingly popular procedure for conducting survival analysis. It is a notably robust survival method because it makes no assumptions about the shape of the probability distribution for survival times. On the other hand, strong violations of the proportional hazards assumption can have detrimental effects on the validity and efficiency of the partial likelihood inference (Struthers and Kalbfleisch, 1986; Lin and Wei, 1989). In this paper, graphical and analytical procedures to assess violations and extensions of the model to improve inferential validity and efficiency in the presence of non-proportionality using SAS® are compared and presented.
The syntax and output discussed in this paper can be re-created using SAS® and SAS/STAT®, versions 8.2 and 9.1.3 on any operating system. However, during the presentation features in SAS/STAT® 9.2 will be discussed.
Who should attend? Intermediate-level biostatistical programmers, statistical analysts, epidemiologists, finacial analysts, and social scientists that have a basic understanding of hypothesis testing.
References for the Abstract Only: Cox, D. (1972), Regression Models and Lifetables (with discussion), Journal of the Royal Statisticial Society, Series B, 34, 187-220. Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. J. Am. Statist. Assoc. 84, 1074-8. Struthers, C. A. and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models. Biometrika 73, 363-9.
Introduction to Logistic Regression
Maura Stokes, SAS
(Tuesday, 10:00 AM - 11:50 AM, Location: Wright Ballroom B)
Logistic regression is one of the basic modeling tools for a statistician or data analyst. This tutorial focuses on the basic methodology behind logistic regression and discusses parameterization, testing goodness of fit, and model evaluation using the LOGISTIC procedure. The tutorial concentrates on binary response models, but direction for handling ordinal responses is also provided. This tutorial discusses numerous ODS graphics now available with the LOGISTIC procedure, as well as newer features of SAS 9.2 such as ROC comparisons and odds ratios with interactions. The tutorial includes numerous examples.
Kernel Density Estimation As an Alternative to Binning of Survey Data
David Corliss, University of Toledo
(Tuesday, 1:00 PM - 1:20 PM, Location: Wright Ballroom B)
In binning data, a certain amount of information is often lost: any information at a higher degree of accuracy than needed to place it into a bin is discarded. In this paper, Kernel Density Estimation (KDE) is used to determine a population distribution with full retention of the measured value for each observation in cases where the uncertainties are expected to be Gaussian. The KDE process is generalized to accommodate the point-by-point variation in uncertainty that may be found in experimental data. Because of this, PROC KDE is not used - only Base SAS and a moderately level of SAS experience is required.
Monty Hall, Paul Erdös, and Monte Carlo
Irvin Snider, assurant Health
(Tuesday, 1:30 PM - 1:50 PM, Location: Wright Ballroom B)
You are on a game show. The game show host shows you three doors. Behind one of the doors is a Cadillac. Behind the other two doors are goats. You pick a door. The game show host then opens one of the doors not chosen and reveals a goat. Now the host asks you if you want to stay with your original pick or if you want to switch. What should you do? What difference does it make?
This classic Monty Hall problem confused readers of Ask Marilyn and caused intense debate when first published in 1990. Even the famed mathematician Paul Erdös did not get the answer right and was only grudgingly convinced of the correct answer after seeing the output from a Monte Carlo simulation.
This presentation will give a brief history of the Monty Hall problem and will present several methods of deriving the answer, including a SAS Monte Carlo simulation.
PROC LIFEREG or PROC PHREG
Dachao Liu, Northwestern University
(Tuesday, 2:00 PM - 2:20 PM, Location: Wright Ballroom B)
Besides commonly used PROC LOGISTIC, PROC PROBIT, PROC GENMOD and PROC LIFETEST, SAS has PROC LIFEREG or PROC PHREG in survival analysis. They both contain REG, a reminder of regression analysis, and they both deal with time-to-event data. This sometimes makes us wonder when we should use PROC LIFEREG when we should use PROC PHREG, even for experienced statisticians who are using SAS. This paper will discuss this question by using some examples.
Some Basic Techniques for Data Quality Evaluation Using SAS®
George Hurley, The Hershey Company
(Tuesday, 2:30 PM - 2:50 PM, Location: Wright Ballroom B)
When one gets data for a new project, the first step is always to evaluate the data and establish what has been provided, the format, and in general, the quality of the data. This can and should include formatting, variable types, basic outlier detection, and summary statistics. SAS has a large number of procedures that can help with this. This presentation will attempt to provide the audience with a summary of the basic uses of a number of these procedures, in an attempt to provide a strategy for this type of initial investigative data analysis. This will be of particular use for people who are new to SAS or are new to analytics.
Power to Detect Therapist Effects in Randomized, Controlled Trials of Interventions in Clinical Psychology
Douglas Thompson, Assurant Health
Fary Cachelin, California State University-Los Angeles
Ruth Striegel-Moore, Wesleyan University
Terry Wilson, Rutgers University
(Tuesday, 3:00 PM - 3:50 PM, Location: Wright Ballroom B)
In randomized controlled trials (RCTs) of interventions in clinical psychology, it is typical for multiple therapists to administer a given intervention. A therapist effect is present when treatment outcomes differ between therapists. Often, therapist effects have been viewed as a nuisance or an adjustment factor. However, therapist effects may be of practical importance; for example, they might indicate interventions that require extensive training to administer effectively. How many therapists are required to ensure adequate power to detect clinically meaningful therapist effects? Few studies have examined this issue. The present study examined how power is impacted by the number of therapists and the size of the therapist effect. The study utilized Monte Carlo simulations in SAS. This paper describes how to implement power analyses of therapist effects in SAS, as well as results from an initial set of analyses.
Applications Development
Take a Whirlwind Tour around SAS® 9.2Diane Hatcher, SAS
(Monday, 8:00 AM - 8:50 AM, Location: Wright Ballroom C)
A new bevy of SAS 9.2 capabilities are available in the platform for Business Analytics. Come and join us for a whirlwind tour of what’s new across the suite of client applications for Business Intelligence and Data Integration. We’ll start the trip with SAS Management Console which highlights updates to the metadata architecture that will surface throughout the rest of the tour. Other featured stops include Information Map Studio, Enterprise Guide, Add-in for Microsoft Office, Web Report Studio, and BI Dashboard in the Information Delivery Portal. We’ll even take a jaunt over to Data Integration Studio to see how user productivity can be taken even higher. Finally, we’ll wrap up the tour with some integration with Microsoft SharePoint. Experience the excitement of the new capabilities and features that integrate intelligence across your organization.
Google Maps and SAS/GRAPH®
Darrell Massengill, SAS
(Monday, 9:00 AM - 9:50 AM, Location: Wright Ballroom C)
Google Maps have become one of the most widely recognized means of handling maps and location data. This paper introduces a tool and shows you how to create your own Google Maps from within SAS® and SAS/GRAPH. In addition, we will briefly discuss Google Maps and the legal and technical limitations and implications of using these maps.
Point-and-Click Programming Using SAS® Enterprise Guide®
Mira Shapiro, Capacity Planner, Consultant and Biostatistician
Kirk Paul Lafler, Software Intelligence Corporation
(Monday, 10:00 AM - 10:50 AM, Location: Wright Ballroom C)
SAS® Enterprise Guide® empowers organizations exploiting the power of SAS by offering programmers, business analysts, statisticians and end-users with powerful built-in wizards to perform a multitude of reporting and analytical tasks, access multi-platform enterprise data sources, deliver data and results to a variety of mediums and outlets, perform important data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements quickly and easily. Attendees learn how to use the graphical user interface (GUI) to access tab-delimited and Excel input files; subset, group, and summarize data; join two or more tables together; flexibly export results to HTML, PDF and Excel; and visually manage projects using flowcharts and diagrams.
Better Decision Making with SAS® Enterprise Business Intelligence and Microsoft Outlook
James Tetterton, SAS
(Monday, 11:00 AM - 11:50 AM, Location: Wright Ballroom C)
The SAS Add-In for Microsoft Office has new Microsoft Outlook integration capabilities that enable better decision-making, increase productivity and decrease cost by taking full advantage of SAS Enterprise Business Intelligence through the Microsoft Outlook environment. This integration of SAS BI and Microsoft Outlook increases informed decision making by giving users intuitive access to reports, stored processes and dashboards. These capabilities reduce the user learning curve and improve productivity, thereby increasing the return on investment of business intelligence implementations. Users obtain contextual information from SAS Business Intelligence while performing daily tasks in Microsoft Outlook. The SAS Add-In for Microsoft Office user interface seamlessly integrates with the Microsoft Outlook look and feel. By leveraging the widely used Microsoft Outlook environment, SAS BI Outlook integration can make the rich functionality of the SAS Enterprise Business Intelligence server, reporting and analytics available to all employees within an organization.
It's All about Variation: Improving Your Business Process with Statistical Thinking
Robert Rodriguez, SAS
(Monday, 1:00 PM - 1:50 PM, Location: Wright Ballroom C)
Do you need to detect unusual variation in your data? Have you ever wondered whether control charts might work for business activity monitoring? This paper explains how concepts and methods of statistical process monitoring, traditionally used in manufacturing, are finding fresh and valuable applications in business environments ranging from financial services to health care.
The starting point is a set of problems encountered by statisticians, analysts, and managers in these settings. For each vignette, the key to the solution is statistical understanding of the variation in the business process. Approaches range from control chart methods in SAS/QC software to statistical modeling techniques in SAS/STAT software. Common to all these methods are graphical displays for visualizing the variation, which is critical for explaining results to clients and management, and these displays can be created with ODS Statistical Graphics in SAS 9.2. The paper provides example programs to get you started.
Make it SASsy: Using SAS® to Generate Personalized, Stylized, Automated Email
Lisa Walter, Cardinal Health
(Monday, 2:00 PM - 2:50 PM, Location: Wright Ballroom C)
Email is everywhere! With the continuously growing number of outlets to stay connected with email, some could argue it is the best way to reach an audience. Knowing how to effectively integrate SAS with email messaging is beneficial to the end user, the business, and the programmer. This paper will explain how to use macro variables, the FILENAME statement, and some introductory HTML and CSS to create professional looking, personalized, automated emails. An experienced SAS user with a good understanding of the macro language will be able to integrate the discussed techniques into current reporting and messaging processes. The provided examples are designed using SAS 9.1.3 on a server environment.
SAS® Macro Application Using Microsoft Excel As Both Control File and Process Documentation
Mike Tangedal, self
(Monday, 3:00 PM - 3:50 PM, Location: Wright Ballroom C)
SAS Macros are most useful with repeated processes. Control files are utilized most often when the number of parameters to be processes is excessive or monitoring of each run is beneficial. Microsoft Excel can be used to hold this control file information in a repeated process. Information from this metadata is easily converted into SAS data set variables and then into SAS macro variables. A simple SAS macro can be used to select the requested record from this control file and convert this information into SAS macro variables to replace the parameters in the production code. This process also insulates the production code from regular changes but shifts responsibility to the control file. This shift in responsibility calls for the contents of the SAS log to be written to a file associated with the control file to act as a time stamp and to help in monitoring the process. Finally, additional benefits can be derived from incorporating a quality assurance SAS macro as well to ultimately create a spreadsheet entry containing required metrics for each variable in the source file. For those who are comfortable with Excel and require more quality assurance, this methodology is greatly beneficial.
Setting Up Project-specific SAS® Windows Environments
Philip Wright, The University of Michigan, Institute for Social Research
(Monday, 4:00 PM - 4:20 PM, Location: Wright Ballroom C)
The paper highlights the different SAS windows environment customizations that can be made, especially those that would benefit specific projects. Customizations include, but are not limited to, specifying work directories, config file options to customize window titles, splash screens, session logging, etc.
Streamlining Reports: A Look into Ad Hoc and Standardized Processes
James Jenson, Self
(Monday, 4:30 PM - 4:50 PM, Location: Wright Ballroom C)
This paper provides a conceptual framework for quantitative reporting processes using SAS in conjunction with Microsoft applications, specifically Excel and PowerPoint. A variety of efficiency challenges, ranging from data compilation in SAS to creating the final comprehensive output for distribution, are addressed separately for both ad hoc and standardized reporting processes. The scope of the paper is broad and is primarily concerned with the interaction of reporting tools, in order to choose the optimal path for summarizing information, depending upon the task at hand. The framework presented draws from 3 years of on-the-job reporting experience in the financial/banking industry.
Introduction to Business Intelligence for SAS® Users
Thomas Miron, Systems Seminar Consultants
(Tuesday, 8:00 AM - 8:50 AM, Location: Wright Ballroom C)
Business Intelligence, or BI, refers to the compilation and presentation of data to end users as opposed to the gathering of that data from original sources. If you’re creating reports, graphics, predictive models, or other data presentations that drive business decisions then you are involved with business intelligence. This paper introduces key BI concepts and techniques and shows why and how they are useful for any SAS reporting or presentation project, from ad hoc report to full blown data warehouse.
Application Development with SAS® Stored Processes
John Xu, 1ST Consulting LLC
(Tuesday, 9:00 AM - 9:50 AM, Location: Wright Ballroom C)
In this presentation we will show how to convert a SAS program into a Stored Process using SAS Enterprise Guide. We then will show the user interface developed in Microsoft Excel through SAS Add-in for Microsoft Office and through web interface, without additional development work. Furthermore, we will discuss process control, user access security, job queue, etc. They are necessary for SAS programs that consume large amounts of disk space and CPU time.
Seamless Dynamic Web Reporting with SAS®
DJ Penix, Pinnacle Solutions, Inc.
(Tuesday, 10:00 AM - 10:50 AM, Location: Wright Ballroom C)
The SAS Business Intelligence platform provides a variety of reporting interfaces and capabilities through a suite of components. SAS Enterprise Guide, SAS Web Report Studio, SAS Add-In for Microsoft Office, and SAS Information Delivery Portal all provide a means to help organizations create and deliver sophisticated analysis to their information consumers. However, businesses often struggle with the ability to easily and efficiently create and deploy these reports to the web. If it is done, it is usually at the expense of giving up dynamic ad-hoc reporting capabilities in return for static output or possibly limited parameter driven customization.
Obstacles facing organizations that prevent them from delivering robust ad-hoc reporting capabilities on the web are numerous. More often than not, it is due to the lack of IT resources and/or project budget. Other failures may be attributed to stalled negotiations during the reporting requirements development process. If business unit(s) and developers cannot come to consensus on report layout, critical calculations, or even what specific data points should make up the report, projects often come to a grinding halt.
This paper discusses a solution that enables organizations to quickly and efficiently produce SAS reports on the web. It will also discuss how, by providing self-service functionality to end users, most of the reporting requirement development process can be eliminated, thus accelerating production-ready reports and reducing overall maintenance costs of the application. Finally, this paper also explores how other tools on the SAS Business Intelligence platform can be leveraged (if licensed) within an org
Seamless Web Data Entry for SAS® Applications
DJ Penix, Pinnacle Solutions, Inc.
(Tuesday, 11:00 AM - 11:50 AM, Location: Wright Ballroom C)
For organizations that need to implement a robust data entry solution, options are somewhat limited on the SAS platform. The SAS/FSP module provides integrated tools for data entry, computation, query, editing, validation, display, and retrieval; however, the solution falls short for organizations that need to deploy a web-entry system on a thin-client platform (i.e. Internet Explorer, Mozilla Firefox, etc.) Some companies will tackle the problem using other technologies such as HTML, .NET, or JAVA for example. But each of these approaches often requires expertise beyond the typical SAS programmer to build and maintain. Built upon an open-source platform, a simple, yet elegant, data entry solution can be obtained. The application data can then be integrated into any SAS application by connecting it via the SAS/ACCESS modules.
Applied Revolutionary BI
Charles Kincaid, COMSYS
(Tuesday, 1:00 PM - 1:50 PM, Location: Wright Ballroom C)
Two years ago, Revolutionary BI was presented at the SAS Global Forum describing a way to bring Business Intelligence and Business Analytics Reporting into the 21st Century. By taking advantage of internet usage patterns developed outside of the corporation as learned from Google, Apple, Amazon, Twitter, Facebook and many more, the Business Analytics paradigm inside the corporation can achieve a new level of capability.
In order to take those revolutionary ideas into the corporate cubicles, however, much of the technical and process details have to be worked out. This paper starts the conversation for implementing Revolutionary BI by presenting ideas and suggestions based on the creativity and experience of the COMSYS Analytic Development Practice.
This paper is intended for any SAS user or manager interested in enhancing the analytical reporting capabilities within their organization, regardless of skills or background. In order to suggest directions for implementing these ideas, we will point to a variety of technologies within SAS and outside of SAS, and some that may not even exist yet.
The attendee will come away with ideas and suggestions that can begin the conversation and conversion within their company across the board or in select areas.
SAS® Data and Stored Process for BlackBerry
Sy Truong, MetaXceed, Inc.
DJ Penix, Pinnacle Solutions, Inc.
(Tuesday, 2:00 PM - 2:50 PM, Location: Wright Ballroom C)
In the competitive smart phone environment, RIM BlackBerry® remains the leading smart phone worldwide changing how we use mobile computing. Although both the iPhone and Android have been making the headlines, BlackBerry is quietly growing in the area of mobile applications with its introduction of App World 2.0 and BlackBerry OS 5.0. All major websites such as Google, eBay, and Amazon have their services available as an “App” rather than requiring the user to launch a web browser. These BlackBerry applications add an efficient use of local mobile device resources that enhance the user mobile experience while leveraging traditional client server computing features that are similar to that of a web browser. SAS® has been an analytical business intelligence powerhouse for many years; yet it is a relative late comer to this mobile computing revolution. This paper demonstrates how a SAS data set can be viewed on a BlackBerry. It also takes a sample SAS stored process, or a traditional SAS macro, and its parameters which can be presented through a GUI for user selection on a BlackBerry. It then will execute the program with the output results displayed on the BlackBerry for review. Imagine how you can access the most up to date and dynamic business information delivered directly to you - anywhere where there is a cellular signal. At last, business analytics is no longer placed behind walls guarded by legions of power users, but rather it will be unleashed to users on the go!
Some New and Not So New Batch Processing Techniques
Mike Vanderlinden, COMSYS
Brian Varney, COMSYS
(Tuesday, 3:00 PM - 3:50 PM, Location: Wright Ballroom C)
By using a Windows batch file to submit a SAS program, one can minimize the maintenance and maximize the repeatability of SAS program submissions. This paper intends to discuss strategies as well as some new functionality in SAS 9.2. This paper also addresses ways to structure SAS parameterized programs that will make them easier to maintain thus reducing the chance of human error.
Foundations & Fundamentals (Beginning Tutorials)
Turning Raw Data into Polished ReportsLora Delwiche, University of California, Davis
Susan Slaughter, Avocet Solutions
(Monday, 8:00 AM - 8:50 AM, Location: Mitchell)
This paper shows how to turn raw data into polished looking reports. Starting at the beginning with reading your raw data file, we show how to sort and print your data, format data values, and add summary information to your reports. Using features of the Output Delivery System that are easy to learn, we also show how to create different types of output, and control style, font and color in your reports. With these techniques, you can create reports that are attractive and draw attention to your main points to impress your boss or client.
DATA Step and PROC SQL Programming Techniques
Kirk Paul Lafler, Software Intelligence Corporation
(Monday, 9:00 AM - 9:50 AM, Location: Mitchell)
Not sure whether to use a DATA step or PROC SQL step in your next project? This presentation examines the similarities and differences between DATA step and PROC SQL programming techniques. Topics include conditional logic concepts and scenarios such as IF-THEN-ELSE, SELECT-WHEN, and PROC SQL CASE expressions; row and column summarization techniques; and effective merge and join constructs. Attendees explore examples that contrast DATA step and PROC SQL programming techniques including conditional logic scenarios; conventional merge and join techniques including Cartesian Product joins, one-to-one match-merges and joins; and DATA step hash and PROC SQL hash techniques; and unconventional outer merge and join techniques.
SAS® System Options: The True Heroes of Macro Debugging
Kevin Russell, SAS
(Monday, 10:00 AM - 10:50 AM, Location: Mitchell)
It is not uncommon for the first draft of any macro application to contain errors. However, without debugging options, it is almost impossible to locate and correct all the errors in an application. Therefore, the first step in writing a macro application is to turn on the debugging options. This paper is intended for programmers at all levels who write macro code, and it concentrates primarily on three SAS system options that you can use to debug macros: MLOGIC, MPRINT, and SYMBOLGEN (our superheroes). In addition, this paper includes some less common options (the superhero sidekicks) that also display valuable information in the SAS log. These options can help you determine the source of many errors that occur when you program with the macro language.
Macrotize! A Beginner's Guide to Designing and Writing Macros
Stacey D. Phillips, i3 Statprobe
(Monday, 11:00 AM - 11:50 AM, Location: Mitchell)
Macro language can often be challenging and intimidating to the novice SAS programmer, and sometimes it’s hard to know how to get started. This paper is an introduction to the design aspects of creating macros and focuses less on nitty-gritty macro language details. The paper will demonstrate how to think about macro design using a real-world example where the programmer needed to reduce more than a hundred data sets of regression output down to three table summary tables. Using several easy steps, the reader will learn some great ways to make macro writing simpler as well as see an example of how to reduce a seemingly overwhelming task down to a simple and efficient macro.
ERRORs, WARNINGs, and NOTEs (Oh My!): A Practical Guide to Debugging SAS® Programs
Lora Delwiche, University of California, Davis
Susan Slaughter, Avocet Solutions
(Monday, 1:00 PM - 1:50 PM, Location: Mitchell)
Errors? "I never get errors!" Warnings? "I never bother to read them, my program still runs." Notes? "Who needs them?" This paper is based on the belief that debugging your programs is not only necessary, but also a good way to gain insight into how SAS works. Once you understand why you got an error a warning or a note, you'll be better able to avoid problems in the future. In other words, people who are good debuggers are good programmers. We cover common problems including missing semicolons, invalid data, truncated character values, missing-data-was-generated messages, character to numeric conversions, and the tricky problem of a DATA step that runs without suspicious messages but nonetheless produces the wrong results. For each problem we decipher the message, explain possible causes, and explain how to fix the problem. We also cover some programming strategies to avoid errors in the first place.
The Greatest Hits: ODS Essentials Every User Should Know
Cynthia Zender, SAS
(Monday, 2:00 PM - 2:50 PM, Location: Mitchell)
Just when you think you know every song (feature) in the ODS hit parade, you discover that there’s an option or destination or feature that has you singing its praises because the feature boosted your reports to the next level. Rather than covering every single little thing that ODS can do, this presentation covers some of the essential features and options of ODS that every user needs to know to be productive. This presentation will show you concrete code examples of the ODS “Greatest Hits”. Come to this presentation and learn some of the essential reasons why ODS rocks!
Reporting in the Age of Twitter: Concise Reports and Continuous Real-Time Monitoring
LeRoy Bessler, Bessler Consulting and Research
(Monday, 3:00 PM - 3:20 PM, Location: Mitchell)
Twitter is built on the concept of concise messages, and encourages, or at least facilitates, continuously keeping in touch.
This presentation will provide you two macros to create compact tabular reports. One of them “shows you the most with the least”, by implementing my long advocated design mantra of “Show Them What’s Important” and the concept of “Let Part Stand for the Whole”. The other provides a “twist”, literally and figuratively, on the traditional default wide display of columns, which can entail the annoying, inconvenient, anti-communicative requirement for sideways scrolling on a web page.
This presentation will also show you how to keep in touch with your data. You can build yourself a monitor to continuously check status of critical indicators and then send email alerts to people (including or only yourself if appropriate) with a need to know.
The Mystery of the PROC SORT Options NODUPRECS and NODUPKEY Revealed
Britta KelseyBassett, The Schwan Food Company
(Monday, 3:30 PM - 3:50 PM, Location: Mitchell)
The NODUPRECS (or NODUP) and NODUPKEY options can be useful with the SORT procedure but they can be dangerous if you do not understand them completely. They work similarly in that they both can eliminate unwanted observations, but NODUPRECS compares all the variables in your data set while NODUPKEY compares just the BY variables. Also, you must be aware of how your data set is currently sorted to eliminate the observations that you want because these options compare adjacent observations. In this paper, I will describe this in greater detail and show examples of how to use the NODUPRECS and NODUPKEY options. All examples shown were done in the SAS® system for PCs, version 9.1.3. The intended audience for this paper is beginner level SAS programmers.
Exploring, Analyzing, and Summarizing Your Data: Choosing and Using the Right SAS Tool from a Rich Portfolio
Douglas Thompson, Assurant Health
(Monday, 4:00 PM - 4:50 PM, Location: Mitchell)
This is a high-level survey of Base SAS and SAS/STAT procedures that can be used to see what your data is like, to perform analyses understandable by a non-statistician, and to summarize data. It is intended to help you to make the right choice for your specific task. Suggestions on how to make most efficient or effective use of some procedures will be offered. Examples will be provided of situations in which each procedure is likely to be useful, as well as situations in which the procedures might yield misleading results (for example, when it is better to use REG instead of CORR for looking at associations between two variables). For procedures that create listings only , or for situations in which you want to capture only part of the listing output, we will explain how to get the output with ODS. Some nifty graphical tools in PROCs UNIVARIATE and REG will be illustrated. Procedures discussed include: FREQ, SUMMARY/MEANS, UNIVARIATE, RANK, CORR, REG, CLUSTER, FASTCLUS, and VARCLUS. The intended audience for this tutorial includes all levels of SAS experience.
Using SAS® Output Delivery System (ODS) Markup to Generate Custom Pivot Tables and Pivot Charts
Chevell Parker, SAS
(Tuesday, 8:00 AM - 8:50 AM, Location: Mitchell)
This paper illustrates how to use ODS markup to create pivot tables and pivot charts. You can use these tables and charts in reports that help your organization with business requirements such analyzing expense trends over time or planning for inventory. Users of all experience levels will learn how to specify options in an ODS statement that enable them to perform the following tasks:
• customize pivot-table layouts
• customize cell format
• specify statistics for the analysis
• modify the display of the analysis
• generate a pivot table and a pivot chart for each worksheet
• format tables that you create
• customize worksheets
PROC REPORT: Compute Block Basics
Art Carpenter, California Occidental Consultants
(Tuesday, 9:00 AM - 9:50 AM, Location: Mitchell)
One of the unique features of the REPORT procedure is the Compute Block. Unlike most other SAS procedures, PROC REPORT has the ability to modify values within a column, to insert lines of text into the report, to create columns, and to control the content of a column. Through compute blocks it is possible to use a number of SAS language elements, many of which can otherwise only be used in the DATA step. While powerful, the compute block can also be complex and potentially confusing. This tutorial introduces basic compute block concepts, statements, and usages. It discusses a few of the issues that tend to cause folks consternation when first learning how to use the compute block in PROC REPORT.
The SAS® DATA Step: Where Your Input Matters
Peter Eberhardt, Fernwood Consulting Group Inc.
(Tuesday, 10:00 AM - 10:50 AM, Location: Mitchell)
Before the warehouse is stocked, before the stats are computed and the reports run, before all the fun things we do with SAS® can be done, the data need to be read into SAS. A simple statement, INPUT, and its close cousins FILENAME and INFILE, do a lot. This paper will show you how to define your input file and how to read through it, whether you have a simple flat file or a more complex formatted file.
Set, Match, Merge ... Don’t You Love SAS®?
Peter Eberhardt, Fernwood Consulting Group Inc.
Ying Liu,
(Tuesday, 11:00 AM - 11:50 AM, Location: Mitchell)
One table has accounts, another has new transactions. One table has patient data, another has treatments. No matter what your business or application it is rare that a single data set or table has everything you need to answer the questions you are expected to answer. SAS® has a simple and powerful mechanism to bring your tables together: the DATA Step Merge. But this simple operation can easily run awry. In this paper we will explain the basics of the DATA step merge, and the issues and problems you will encounter if you do not understand its workings. This paper is geared towards beginning SAS programmers, but it can be a useful refresher for anyone.
Choosing the Right Tool from Your SAS® and Microsoft Excel Tool Belt
Jennifer First, Systems Seminar Consultants
Steven First, Systems Seminar Consultants
(Tuesday, 1:00 PM - 1:50 PM, Location: Mitchell)
There are over a dozen ways to bring your data into Microsoft Excel and to push it out to Excel. Some are simple wizards, and others are more complex programming techniques. How do you know which tool is the best for your application? Which will cause you the least number of headaches? Which will produce the quickest, most accurate results? Which will best satisfy the end user? We will give an overview of import and export techniques for Excel and make recommendations for different application types.
An Introduction to SQL in SAS®
Pete Lund, Looking Glass Analytics
(Tuesday, 2:00 PM - 2:50 PM, Location: Mitchell)
SQL is one of the many languages built into the SAS® System. Using PROC SQL, the SAS user has access to a powerful data manipulation and query tool. Topics covered will include selecting, subsetting, sorting and grouping data--all without use of DATA step code or any procedures other than PROC SQL.
33 Tricks with PROC REPORT: A Quick Introduction to the Powerful REPORT Procedure
Ben Cochran, The Bedford Group
(Tuesday, 3:00 PM - 3:50 PM, Location: Mitchell)
When the REPORT Procedure was first introduced by SAS with the advent of Version 6, most SAS users were running on the mainframe. This new procedure brought with it a great deal of power and flexibility that added much strength to SAS’ arsenal of report generating procedures. It had powerful formatting, summarizing, and analysis capabilities that made it easier to create anything from a simple listing to a very complex report. However, some critics have stated that, while it has awesome features, it looks like a mainframe report. That may have been true until SAS released the Output Delivery System (ODS) experimentally in Version 7, and in production mode in Version 8. If the REPORT procedure was a cake, then ODS delivers the icing to generate truly beautiful reports. This paper offers a quick overview of the types of reports that can be generated with PROC REPORT, and how to add some of the ODS features to deliver stunning reports.
Hands-On Workshops
PROC REPORT Basics: Getting Started with the Primary StatementsArt Carpenter, California Occidental Consultants
(Monday, 8:00 AM - 9:50 AM, Location: Regency Ballroom)
The presentation of data is an essential part of virtually every study, and there are a number of tools within SAS® that allow the user to create a large variety of charts, reports, and data summaries. PROC REPORT is a particularly powerful and valuable procedure that can be used in this process. It can be used to both summarize and display data, and is highly customizable and highly flexible.
Unfortunately for many of those just starting to learn PROC REPORT, the terms "customizable" and "flexible" often seem to be euphemisms for "hard to learn". Fortunately PROC REPORT is NOT as hard to learn as it appears. All you really need to have in order to get started is a basic knowledge of a few primary statements.
In this introduction to PROC REPORT you will learn to use the PROC REPORT statement and a few of its key options. Several of the supporting statements, including COLUMN, DEFINE, BREAK, and RBREAK, and their primary options will also be covered.
The SAS® Hash Object: It’s Time To .find() Your Way Around
Peter Eberhardt, Fernwood Consulting Group Inc
(Monday, 10:00 AM - 11:50 AM, Location: Regency Ballroom)
“This is the way I have always done it and it works fine for me.”
Have you heard yourself or others say this when someone suggests a new technique to help solve a problem? Most of us have a set of tricks and techniques from which we draw when starting a new project. Over time we might overlook newer techniques because our old toolkit works just fine. Sometimes we actively avoid new techniques because our initial foray leaves us daunted by the steep learning curve to mastery. For me, the PRX functions and the SAS® hash object fell into this category.
In this workshop, we address possible objections to learning to use the SAS hash object. We start with the fundamentals of the setting up the hash object and work through a variety of practical examples to help you master this powerful technique.
PROC TABULATE: Getting Started
Art Carpenter, California Occidental Consultants
(Monday, 1:00 PM - 2:50 PM, Location: Regency Ballroom)
Although PROC TABULATE has been a part of Base SAS® since early version 6, this powerful analytical and reporting procedure is very under-utilized. TABULATE is different; its step statement structure is unlike any other procedure. Because the programmer who wishes to learn the procedure must essentially learn a new programming language, one with radically different statement structure than elsewhere within SAS, many do not make the effort.
The basic statements will be introduced, and more importantly the introduction will provide a strategy for learning the statement structure. The statement structure relies on building blocks that can be identified and learned individually and in concert with others. Learn how these building blocks form the structure of the statement, how they fit together, and how they are used to design and create the final report.
A Cup of Coffee and PROC FCMP: I Cannot Function Without Them
Peter Eberhardt, Fernwood Consulting Group Inc.
(Monday, 3:00 PM - 4:50 PM, Location: Regency Ballroom)
How much grief have you put yourself through trying to create macro functions to encapsulate business logic? How many times have you uttered "If only I could call this DATA step as a function"?
If any of these statements describe you, then the new features of PROC FCMP are for you. If none of these statements describe you, then you really need the new features of PROC FCMP. This paper will get you started with everything you need to write, test, and distribute your own "data step" functions with the new (SAS® 9.2) PROC FCMP. This paper is intended for beginner to intermediate programmers, although anyone wanting to learn about PROC FCMP can benefit.
Using PROC SGPLOT for Quick, High-Quality Graphs
Lora Delwiche, University of California, Davis
Susan Slaughter, Avocet Solutions
(Tuesday, 8:00 AM - 9:50 AM, Location: Regency Ballroom)
New with SAS® 9.2, ODS Graphics introduces a whole new way of generating high-quality graphs using SAS. With just a few lines of code, you can add sophisticated graphs to the output of existing statistical procedures, or create stand-alone graphs. The SGPLOT procedure produces a variety of graphs including bar charts, scatter plots, and line graphs. This paper shows how to produce several types of graphs using PROC SGPLOT, and how to create paneled graphs by converting PROC SGPLOT to PROC SGPANEL. This paper also shows how to send your graphs to different ODS destinations, how to apply ODS styles to your graphs, and how to specify properties of graphs, such as format, name, height, and width. Last, this paper shows how to use the SAS/GRAPH® ODS Graphics Editor to make one-time changes to graphs.
SAS/GRAPH® Elements You Should Know – Even If You Don’t Use SAS/GRAPH
Art Carpenter, California Occidental Consultants
(Tuesday, 10:00 AM - 11:50 AM, Location: Regency Ballroom)
We no longer live or work in a line printer - green bar paper environment. Indeed many of today’s programmers do not even know what a line printer is or what green bar paper looks like. Our work environment expects reports which utilize various fonts, with control over point size and color, and the inclusion of graphic elements. In general we are expected to produce output that not only conveys the necessary information, but also looks attractive. In the line printer days little could be done with the style and appearance of reports, and consequently little was expected. Now a great deal of control is possible, and we are expected to take advantage of the tools available to us. We can no longer sit back and present a plain vanilla report.
The Output Delivery System gives us a great deal of the kind of control that we must have in order to produce the kinds of reports and tables that are expected of us. Although we will often include graphical elements in our tables, it turns out that a number of options, statements, and techniques that are associated with SAS/GRAPH can be utilized to our benefit even when we are NOT creating graphs. Learn how to take advantage of these graphical elements, even when you are not using SAS/GRAPH.
A Hands-on Tour Inside the World of PROC SQL
Kirk Paul Lafler, Software Intelligence Corporation
(Tuesday, 1:00 PM - 2:20 PM, Location: Regency Ballroom)
Structured Query Language (PROC SQL) is a database language found in the Base SAS® software. It enables access to data stored in SAS data sets or tables using a powerful assortment of statements, clauses, options, functions, and other language features. This hands-on workshop presents core concepts of this powerful language as well as its many applications and is intended for SAS users who desire an overview of the capabilities of this exciting procedure. Attendees explore the construction of simple SQL queries, ordering and grouping data, the application of case logic for data reclassification, the creation and use of views, and the construction of simple inner and outer joins
Output Delivery System (ODS) – Simply the Basics
Kirk Paul Lafler, Software Intelligence Corporation
(Tuesday, 2:30 PM - 3:50 PM, Location: Regency Ballroom)
Are you looking for ways to improve the way your SAS® output appears? Output Delivery System (ODS) can help turn tired-looking output into great looking information with a purpose. Gone are the days when the only available formatting choice is boring output listings containing lifeless monospace fonts. ODS introduces exciting new features for your output. Using built-in format engines, ODS provides SAS users with a powerhouse of exciting capabilities to produce “quality” and publishable output. This hands-on workshop shows users how to select the output of interest using selection (and exclusion) lists; and how ODS is used to send selected data and output to output destinations including RTF, MS Excel spreadsheets, PDF, HTML, and SAS data sets.
Health Care & Health Insurance
A SAS® Primer for Health Care Data AnalystsChristopher Schacherer, Clinical Data Management Systems, LLC
Brent Westra, Mayo Clinic Health Solutions
(Tuesday, 8:00 AM - 8:50 AM, Location: Kilbourn)
As in other fields, analysts in healthcare come to their vocation from a variety of paths—statisticians with formal training in predictive modeling, IT professionals with experience in programming and report writing, billing and claims professionals who possess a wealth of subject-matter expertise, and accounting and finance professionals who understand the methodologies for valid reporting of financial results. Analytics and Business Intelligence groups within healthcare organizations are strengthened by this diversity of expertise, but individuals coming into a business intelligence or healthcare analytics department from these other areas often struggle to acquire the skills that will help them leverage their particular backgrounds against the analytic challenges they face. The current work focuses on helping these analysts acquire the SAS programming skills necessary to turn raw data into analysis-ready data sets. The intention is not to train analysts in the finer points of each PROC and variation on DATA STEP and MACRO programming, but to provide a basic understanding of SAS concepts using specific techniques that will be useful to them in their attempts to create analytic data sets from their source data.
Medical Guidelines: Consensus Panel Versus Healthcare Analytics
Patricia Cerrito, University of Louisville
(Tuesday, 9:00 AM - 9:50 AM, Location: Kilbourn)
Adherence to medical guidelines will become much more important given the recently enacted healthcare bill. However, there is often little evidence from studies using real data to define just what the guidelines should be. As a result, physician consensus panels are often used to define guidelines with the physicians relying primarily upon personal, anecdotal experience or upon general summary statistics. When this occurs, the average population value is used as an upper or lower limit, thereby defining half of the population in need of treatment, which then changes the average so that the guidelines are lowered or raised to reflect this new average. We can use available treatment records to investigate the relationship between the consensus parameters and other diseases using healthcare analytics. In this study, we examine hypertension, cholesterol, and BMI (body mass index). We use techniques in SAS/Stat, SAS Enterprise Miner, and SAS Text Miner. Results show that the relationship between the consensus of "high" levels versus co-morbidities is at best tenuous and at worse detrimental to overall health.
Using HPMIXED and Other SAS® Procedures to Efficiently Analyze Large Dimension Registry Data
Matthew Fenchel, Cincinnati Children's Hospital Medical Center
Gary McPhail, Cincinnati Children's Hospital Medical Center
Rhonda VanDyke, Cincinnati Children's Hospital Medical Center
(Tuesday, 10:00 AM - 10:50 AM, Location: Kilbourn)
HPMIXED is an experimental procedure introduced in SAS/STAT 9.2 software. Using sparse matrix techniques, PROC HPMIXED can process models with a very large number of fixed or random effects much more efficiently than the MIXED or GLIMMIX procedures. However, this initial release lacks some tools often needed in analyses. With Registry data (1994-2007) from the Cystic Fibrosis Foundation, we estimated FEV1% predicted (a standard measure of lung function) slope using a random intercept and slopes mixed model and compared parameter estimates and run-times of PROC HPMIXED with the MIXED and GLIMMIX procedures. We then ran PROC SGPLOT and PROC UNIVARIATE to produce residual, normal and quantile-quantile plots from the PROC HPMIXED output. Using the covariance parameter estimates from PROC HPMIXED, we show how PROC GLIMMIX and PROC MIXED can be used to “re-run” the models (now more quickly) to produce output not yet available in PROC HPMIXED (i.e. residual diagnostics, influence diagnostics, LS-means comparisons, etc.). Using a subset of 26 CF care centers (1,195 patients used as random effects), PROC HPMIXED successfully ran the model in a real-time of 6.89 seconds. PROC MIXED required almost 8 minutes; PROC GLIMMIX almost 60 minutes. PROC HPMIXED is a powerful, consistent tool for fitting large dimension data – especially mixed modeling. Such models might not be possible using the MIXED or GLIMMIX procedures. The applications described in this paper can be used by someone with an intermediate knowledge of mixed models and SAS procedures, using computing resources with at least 3 GB of RAM.
Plotting Summary ROC Curves from Multiple Raters Using the SmryROCPlot Macro
Matthew Karafa, Cleveland Clinic Foundation
Nancy Obuchowski, Cleveland Clinic Foundation
(Tuesday, 11:00 AM - 11:20 AM, Location: Kilbourn)
As an alternative to existing solutions provided by others, the authors created a macro to construct an ROC curve using average measures. The major problem with generating such a curve is for a given false positive rate (FPR, or 1- specificity), a rater may not have an actual measurement. Thus to accurately create the plot, we need to interpolate between the nearest actually observed points for a given reader to generate the Summary ROC plot. SmryROCPlot() can do this interpolation for you with minimal inputs. The user provides the SmryROCPlot() macro with variable names for the true event state, the observed event state, and the rater’s identification. The user then also lists the FPR’s at which the plot should be generated. The macro steps through each rater’s data, and either uses the actual measured specificity if it is available or calculates a simple linear interpolation between the nearest observed FPR’s for each “plot” FPR. The macro then provides an output data set which can be used by PROC GPLOT to make a nice looking Summary ROC curve.
SAS® with CLAS: Assessing Adherence to Federal Standards for Culturally and Linguistically Appropriate Services
Michelle Hopkins, Stratis Health
(Tuesday, 11:30 AM - 11:50 AM, Location: Kilbourn)
This paper shows how Base SAS 9 was used to do a gap analysis on how several health care facilities are doing on adhering to the 14 federal standards for Culturally and Linguistically Appropriate Services (CLAS). In the summer of 2009, these facilities were asked to have staff complete an online assessment in order to evaluate how well they were meeting these standards. SAS was used to analyze responses, and export scores and recommendation for improvement into a MS Excel report. In the summer of 2010, these same facilities, after completing several educational opportunities, are again completing the online assessment in order to evaluate how well they are meeting these standards after implementing various interventions. SAS is again being used to analyze responses, export scores and recommendations for improvement. SAS is also being used to evaluate whether or not there were statistically significant improvements between the pre-CLAS assessment and the post-CLAS assessment. Only basic SAS skill levels are required of the audience.
Comparative Effectiveness Analysis and Statistical Methodology
Patricia Cerrito, University of Louisville
John Cerrito, Kroger Pharmacy
(Tuesday, 1:00 PM - 1:50 PM, Location: Kilbourn)
The purpose of comparative effectiveness analysis is ordinarily defined as a means to compare the benefits of drug A versus drug B. However, particularly in relationship to cancer drugs, there is only drug A. Therefore, comparative effectiveness analysis tends to compare drug A to a quality adjusted threshold value, with a frequent conclusion that the cost of the drug is not worth the additional life. Ordinarily, a societal perspective is used to deny the drugs, since the additional life may be worth the drug cost for the patient. The British organization, the National Institute for Clinical Excellence (NICE) has denied many cancer drugs to their patients. The Centers for Medicaid and Medicare want to initiate a similar process, denying treatments that exceed a quality adjusted price of $50,000. There are similar provisions in the Healthcare Reform Act. With the emphasis upon medications, medical procedures are not subject to this comparative effectiveness scrutiny; procedures can frequently exceed the cost of medication treatments. However, each medication is considered separately; no analysis examines the total contribution of the treatment to the overall cost of healthcare. The Medical Expenditure Panel Survey can be used to find the national contribution of costs resulting from a patient treatment. We will demonstrate using SAS techniques how we can investigate the contribution of a procedure to the total cost of healthcare.
To Investigate the Impact of Medicare Part D on the Cost Effectiveness of Diabetes Medications and Health Outcomes with SAS®
Xiao Wang, University of Louisville
(Tuesday, 2:00 PM - 2:20 PM, Location: Kilbourn)
The purpose of this paper is to estimate the cost effectiveness of diabetes medications and the health outcomes in Medicare in 2005 and 2006 to examine the impact of Medicare, Part D.
In this study, several data sets from the Medical Expenditure Panel Survey are used, providing information on the prescribed drug, physician visits, home health care and inpatients. A period life table is utilized to calculate QALY (quality adjusted life year). The analysis is based on the Medicare drug plan used by patients with diabetes, who are also insured by Medicare in 2005. They are discovered in the data using the SAS SQL procedure. Cost effectiveness is evaluated by Medicare expenditures per QALY gained from the year 2005 to the year 2006. The MEANS procedure in Base SAS is used to compare the utilizations of health care services in these two years. The Decision Tree Model in Enterprise Miner is mainly used to predict the health outcomes.
Results show that from the year 2005 to the year 2006, insulin becomes the most cost-effective treatment and the combination of glyburide and metformin is the most inefficient. Glipizide and insulin users highly decrease the length of hospitalization at the cost of increasing the number of prescriptions filled. Metformin users increase their length of stay in the hospital and the frequency of prescription by 200% and 73% respectively. In 2006, drugs begin to account for a large number of the Medicare expenditures and have a decisive role in patient health status.
Tips for Automating Univariate Outcomes Analysis in Hematopoietic Stem Cell Transplantation
Peigang Li, CIBMTR, Medical College of Wisconsin, Milwaukee, Wisconsin
Xiaochun Zhu, CIBMTR, Medical College of Wisconsin, Milwaukee, Wisconsin
Min Chen, CIBMTR, Medical College of Wisconsin, Milwaukee, Wisconsin
(Tuesday, 2:30 PM - 2:50 PM, Location: Kilbourn)
Hematopoietic stem cell transplantation has been used to treat patients diagnosed with a variety of diseases including leukemia, severe aplastic anemia, Hodgkin’s disease, non-Hodgkin’s lymphoma, multiple myeloma, a number of blood disorders, and some solid tumor cancers. Typical outcomes include treatment-related mortality (TRM), relapse (REL), progression free survival (PFS), and overall survival (OS). Univariate outcomes analysis is characterized as either survival or cumulative incidence rate without adjusting for other covariates, i.e., unadjusted analysis (Klein & Moeschberger 2003; Klein et al. 2001, Part I). It is desirable to automatically generate summary tables containing probability, confidence intervals and p-values. Log-rank or pointwise p-values are used to look at equality over the strata or at a given fixed point in time such as three-year survival. The process was facilitated with a set of SAS macros.
Gastrointestinal Diseases: Diagnoses, Misdiagnoses, and Comorbidities
Pedro Ramos, University of Louisville
(Tuesday, 3:00 PM - 3:20 PM, Location: Kilbourn)
The gastrointestinal track is a complex system of organs that are exposed to external elements on a daily basis. Some of the diseases that affect the digestive system are caused by external agents; yet some of these diseases have been theoretically explained as emotional responses due to the intricate nervous network embedded in the digestive system. This paper shows how various data mining techniques and statistical methods can be employed studying this subject matter. SAS Enterprise Guide 9.2® was used to explore the MarketScan data set for the years 2000-2001, containing information on inpatient and outpatient doctor visits as well as the consumption of prescription drugs used by millions of Americans with health care coverage. The MarketScan database records patient specific health care utilization and expenditure for inpatient and outpatient settings as well as prescription drugs. It includes about 100 private sector payers and over 500 million claim records. Specifically, this data set represents the medical experience of privately insured employees and their dependents for the year 2000 and 2001. SAS Enterprise Miner 6.1® was used to process the data and conduct the data mining and statistical processes. Cluster analysis was used to define subpopulations while link analysis and association analysis were employed to define relationships between diseases and medications. Similarly, predictive models were designed to classify patients and predict how gastrointestinal diseases may affect them. In addition, by implementing categorical data analysis, risk factors were determined for several diseases such as gastric and colon cancer.
Characteristic Analysis of the Hospitalized Children with Lower Extremity Fractures in 2006: A Population-Based Approach
Yubo Gao, University of Iowa
(Tuesday, 3:30 PM - 3:50 PM, Location: Kilbourn)
The study examined the demographic and hospitalization characteristics of children hospitalized with Lower Extremity Fractures (LEF) using 2006 national discharge data from the Healthcare Cost and Utilization Project (HCUP) Kids’ Inpatient Databases (KID). Of the nearly 3.1 million cases in the database, 11,903 (0.38%) were identified through diagnostic coding of LEF. More boys than girls have LEF, and more than one half had private insurance as their primary payer. About one half of the children were between the ages of13 and 20 years, but all ages were represented from age 0 to 20. White children represented 56 percent. 93 percent LEF children stayed in urban hospitals, and 66 percent hospitals had teaching status. All patients had an average length of stay (LOS) 4.04 days, and infant patients had the longest average LOS of 5.46 days. The average number of diagnoses is 3.07, and the average number of procedures is 2.21. The average charge is $35,236, and the oldest patients had the largest average charge of $41,907. The average number of comorbidities increased as patient got older. The odds of child death are 55.6% more in non-teaching hospitals than in teaching hospitals, and the odds of child death in rural hospitals is ten times more than in urban hospitals.
JMP
Comparing JMP® and SAS® for Validating Clinical TrialsSandra Schlotzhauer, Schlotzhauer Consulting LLC
(Monday, 8:00 AM - 8:50 AM, Location: MacArthur)
When validating clinical trial analyses, an independent programmer typically confirms the results. Most companies use SAS as the standard software for performing analyses and generating results to submit to regulatory agencies. Validation requires replicating the content but not the appearance of the results. For example, the validator confirms a p-value, but does not format the results to match the appearance in a table submitted to the FDA. This paper discusses experiences in using both SAS and JMP for validation. The paper discusses the strengths of each software application.
Ad Hoc and Statistical Model Visualization Using JMP®, SAS®, and Excel
Jon Weisz, SAS
(Monday, 9:00 AM - 9:50 AM, Location: MacArthur)
Generally speaking, models are abstractions of real-life systems used to facilitate understanding and to aid in decision making. However, models mean different things in different disciplines. Engineers, financial analysts and statisticians all employ modeling as a core tool, but each discipline would define models differently. Even with the differences in how models are defined and developed, all disciplines need ways to communicate models and perform what-if analyses and simulations. This paper will highlight the use of the Profiler feature in JMP to visualize and perform what-if analysis and Monte Carlo simulation for models defined using SAS/STAT®, Microsoft Excel and engineering tools.
Using JMP® and SAS® Software Together to Prepare a Last-Minute Research Project for Publication, Or Statistical Modeling of Breathing Relief in Sarcoidosis Patients
James Woods, Wayne State University Detroit
(Monday, 10:00 AM - 10:50 AM, Location: MacArthur)
Can SAS and JMP help cure the "publish or perish" problem in academia? Let's see if we can put SAS and JMP to the test, to help get a research topic quickly submitted for publication in a lofty research journal. A research publication deadline looms and you have a bunch of data in Excel. Fortunately you have SAS and JMP software available, and you've heard that "SAS Saves Time" and "JMP is Statistical Discovery Software." Unfortunately you are a relative beginner in the use of SAS and JMP. Fortunately again, though, you have access to a statistics assistant who knows SAS and JMP and can help you get the correct buttons pressed and the reasonable logic applied.
Using the JMP® 9 R Interfaces to Perform Bayesian Analyses: Why, How, and What
Dave LeBlond, Abbott
(Monday, 11:00 AM - 11:50 AM, Location: MacArthur)
Modern computational algorithms such as Markov Chain Monte-Carlo (MCMC) make it possible to find exact solutions to a range of statistical problems that are beyond the reach of traditional statistical methods due to intractable mathematics or the requirement for over-simplifying assumptions. Adoption of these powerful approaches has been inhibited by a lack of appreciation of the Bayesian paradigm, as well as a lack of commercial software. Within the last decade, the advantages of these methods are becoming better known and are now available in commercial software such as SAS. With the inclusion of R interfaces in JMP version 9, a wide selection of statistical tools, including MCMC procedures, are now available to JMP users willing to learn R or WinBUGS language basics and write short JMP language (JSL) scripts. This paper provides simple, motivating examples of Bayesian estimation using random walk Metropolis or Gibbs sampling using JMP R interfaces. The examples include small sample coefficient of variation estimation and variance parameter and tolerance interval estimation for a two level hierarchical process. The examples are based on the author’s experience in medical device and pharmaceutical R&D, but should be of general interest. Background is provided so the reader can appreciate why using Bayesian/ MCMC approaches has value. The steps required to execute this kind of analysis are presented, including loading of the R and WinBUGS packages and required libraries, JSL scripts, and description of the associated theory. Finally, the output will be analyzed using JMP visualization and summary platforms.
The “Forgotten” JMP® Visualizations
Sam Gardner, JMP Division, SAS Institute
(Monday, 1:00 PM - 1:50 PM, Location: MacArthur)
JMP® has a rich set of visual displays that can help you see the information in your data. Many JMP users, however, remain unfamiliar with the graphing options available for multivariate data visualization under the Graph Menu. This talk will expose the features and value of these tools, including: Scatterplot3D, Scatterplot Matrix, Parallel Plot, Cell Plot, Bubble Plot, Variability Charts, and Tree Maps. In addition, some new visualization and charting features available in the upcoming JMP version 9 release will be shown.
Visualization with JMP – The Overlay, Spinning, Surface, and Contour Plots
Mark Anawis, Abbott
(Monday, 2:00 PM - 2:50 PM, Location: MacArthur)
The starting point for analysis should always be a graphical representation of the data. Graphics are a valuable tool for exploring and summarizing data. They often uncover patterns (e.g. clusters and outliers) and provide strategies for further analysis. These strategies may involve segmentation and/or analysis with and without outliers. Several of the plot platforms found in JMP v.8: the Overlay, Spinning, Surface, and Contour Plots are useful in creating an understanding and conveying an understanding of data patterns.
JMP® Simulation-Based Empirical Determination of Robust Scale Estimator and Comparison of Outlier Discrimination Performance
Gennadiy Gorelik, Consultant
Tarun Chandra, EmpiriQA LLC
(Monday, 3:00 PM - 3:50 PM, Location: MacArthur)
A robust scale estimator based on an empirically-derived correction factor (ECF) is proposed. ECF values for simulated samples of varying size (n=2 to 100) drawn from the standard normal distribution are obtained in JMP using the distribution of sample standard deviation (s) to sample median absolute deviation (MAD) ratios. ECF central tendency estimates based on median of s/MAD ratio distributions are compared to corresponding MAD-based theoretical correction factor (TCF) and s-based asymptotic correction factor (ACF=1.4826) estimates. Results indicate TCF > ECF > ACF and ECF (and TCF) asymptotically approach ACF as sample size increases. %Bias in estimation of s using ECF, TCF and ACF was compared using mean, median and root mean square (RMS) values of the normalized s-distribution. Outlier discrimination performance of non-robust scale estimator, s, and robust estimators based on ECF and TCF was also compared using simulations for single outlier-contaminated samples of varying size, n, drawn from a standard normal distribution. Different outlier scenarios were considered by combining (n-1) samples from N[µ = 0, s =1] with one outlier drawn from a statistically distinct normal distribution N[µ > 0, s =1]. For sample sizes n = 40, robust scale estimator ECF*MAD yielded better outlier discrimination when compared to TCF*MAD with “s” being the worst. In practical applications involving small sample sizes and occasional outlier contamination with no assignable cause, an ECF-based scale estimator could yield robust and reliable estimates of scale.
A DOE “Real Life” Example: Solid Phase Optimization of a Diagnostic Immunoassay
Steve Figard, Abbott Laboratories
(Monday, 4:00 PM - 4:50 PM, Location: MacArthur)
Experimental design in the context of immunoassay development in the in vitro diagnostics industry requires a thorough understanding of all aspects of a given product’s composition and manufacturing processes. For immunoassays, a multiplicity of interactions between reagent components, sample composition, and process parameters make characterization of the entire process so complex and time consuming that the advantages of doing so with Design of Experiments, or DOE, become immediately apparent. It is the strength of DOE to use statistics to define and/or optimize processes with fewer resources than standard experimental approaches.
This paper will first present a brief background of immunoassays to provide a context for the example that follows. The specific problem addressed using DOE is one that illustrates to maximum didactic advantage the use of DOE to simultaneously deal with a manufacturing problem that includes optimization of that process. The coating of a microparticle solid phase with antibodies for a specific analyte failed to yield the performance necessary as assessed by three different metrics (responses). Simultaneous optimization with only two DOE experiments corrected the problems and characterized and optimized the manufacturing process to allow the product to eventually go to market.
SPC As Part of an Operations Intelligence System Combining JMP® and SAS®
Michael Haslam, Predictum, Inc.
Wayne Levin, Predictum, Inc.
(Tuesday, 8:00 AM - 8:50 AM, Location: MacArthur)
Statistical Process Control (SPC) systems are setup to provide warnings of unusual or unlikely conditions that might lead to problems. In a typical scenario, a specified rule violation occurs and then a user or set of users may need to act and start an investigation. In addition to Western Electric and Westgard rules, manufacturers are increasingly using multivariate-rules that look for combinations of factor levels that may lead to trouble. All if this is multiplied across the increasing amount of measurement and monitoring found in manufacturing, network monitoring and transactional operations.
A JMP® stand-alone or JMP® and SAS® combination solution in what we refer to as and Operations Intelligence System (OIS) allows technical staff to manage with the increasing volumes of analysis. In addition to higher yields and consistent quality, OIS systems deliver improve personnel utilization and quality of work-life.
Simplifying Drug Discovery with JMP
John Wass, Quantum Cat Consultants
(Tuesday, 9:00 AM - 9:20 AM, Location: MacArthur)
There are many significant uses for JMP in Drug Discovery, and analyses may be performed without resorting to sophisticated (and expensive) high-throughput and data mining software. Although many academicians prefer the use of freely available R-based programs, JMP offers many advantages, not the least of which is its menu-driven simplicity coupled with powerful analytic capabilities. Add to this JMP’s inherent strategy of automatically pairing generated graphics with the numeric output, and the researcher is presented with a fast and easy tool to do preliminary, yet sophisticated, analyses. This presentation will overview some of the areas where JMP may be used in the Discovery process and highlight two areas: pharmacogenomics to define a gene set relative to the use of various drugs and use of sensitivity indices to assess the quality of chemical fragments in defining the probability of a lead actually becoming a drug. Use of descriptive and multivariate statistics as well as the rich graphical environment in JMP will illustrate the processes.
Salesforce Effectiveness at Sears: An Analytic Approach to Goal Setting and Tracking
Carl Schleyer, Sears Holdings Corporation
Jason Chavarry, Sears Holdings Corporation
(Tuesday, 9:30 AM - 9:50 AM, Location: MacArthur)
This paper highlights the organizational and data challenges faced by a small Human Capital Analytics team and how they were able to utilize JMP for multivariate analysis, data modeling, and data visualization to overcome those challenges.
In 2008 Sears Holdings (SHC) implemented a new organizational structure which separated merchant groups into distinct business units. This meant that each new business unit had a new senior leadership team and a new set of expectations. Within the Human Resources organization sat a small four-person team called the Analytic Center of Excellence (ACE).
ACE was asked to evaluate the current goals in place for associates in four separate business units with a workforce of over 30,000 associates. ACE first showed the current levels of compliance with the existing goals. Next, ACE proposed new means of calculating goals. Finally, ACE examined predictors of associate success within each business unit to determine potential constraints to program implementation.
JMP and statistical methodology was used extensively to turn an old-style industry like retail into a data-driven environment. Additionally, since there was an impact on people as opposed to product, proper thresholds had to be developed to ensure that Sears would not be exposed to liability in applying data-driven processes to the performance management sphere.
This is a presentation geared towards beginner to advanced statistical users interested in seeing how analytics operates in both a large corporate environment and in the human resources field.
Posters
Eliminating Redundant Custom Formats (or How to Really Take Advantage of PROC SQL, PROC CATALOG, and the DATA Step)Philip Wright, The University of Michigan-Institute for Social Research
(Monday, 12:30 PM - 12:55 PM, Location: Demo Room)
Custom formats are invaluable to the SAS® programmer. Their functionality provides for much more than simply a mechanism for explicitly labeling values in a data set. There can be, however, a major limitation—the DATA step can accommodate only 4,096 formats at a time. It is unlikely that a SAS programmer would generate this many formats in code, but this is not the only method that generates formats. PROC IMPORT and third-party data conversion programs may well generate a distinct custom format for every variable in a data set, and data sets with more than 4,096 variables are not uncommon. Oftentimes however, these formats can be quite redundant—the same coding scheme was used for many similar variables. Eliminating redundant custom formats may well get you below the 4,096 limit. PROC SQL, PROC CATALOG, and the DATA step are the tools that can be used to eliminate the redundant formats.
Mixing the Use of the Three Types of Comments for Ease of Use of SAS® Enterprise Guide® Code
Mark Menzie, Assurant Health
(Monday, 12:30 PM - 12:55 PM, Location: Demo Room)
A frequent criticism of code is that it is not documented thoroughly enough. It is possible, however, for extensive comments to interfere with use, further development or troubleshooting of the code. This paper suggests thoughtful use of the three classes of comment statements to make use and maintenance of code easier:
1. Statement comment: * [comment text] ;
2. Slash comment: /* [comment text] */
3. Macro comment: %MACRO macroname; [comment text] %MEND macroname;
The advantage of mixing the three styles of comments is in the way that they overlay each other. Using the statement comment style where feasible allows easy overlay with the Slash comment style.
Pulling Together Existing Code for One-Touch Execution by Using %INCLUDE or Macro References
Mark Menzie, Assurant Health
(Monday, 12:30 PM - 12:55 PM, Location: Demo Room)
This paper illustrates the convenience pulling together existing code files into a single stream using %include and macro references. Steps needed to pull in external code by each method will be illustrated and compared.
Programming Beyond the Basics (Advanced Tutorials)
You Created That Report in SAS®!? The Power of the ODS PDF DestinationPete Lund, Looking Glass Analytics
(Monday, 8:00 AM - 8:50 AM, Location: Walker)
The Output Delivery System (ODS) has been around since SAS® version 7 and yet many people still don’t realize that they use it every day just to send results to the output window. They’re still more amazed when they see that publication quality reports in PDF files can be created with SAS and ODS.
This paper explores a number of ODS options in general and, more specifically, their use in creating PDF output. We will cover ODS ESCAPECHAR, which allows for inline formatting of titles, footnotes and other text and new syntax for version 9.2; ODS LAYOUT, which lets you place output wherever you want it on the page - even output from more than one procedure, both text-based output and graphics; inline formatting in PROC REPORT; the new world of DATA _NULL_ reporting using the ODS object and more.
We'll work from real life examples and see how you can produce output that looks like it took hours to create.
Boot Camp for Programmers: Stuff You Need to Know That's Not in the Manual - Best Practices to Help Us Achieve Reproducibility
Elizabeth Axelrod, Abt Associates Inc.
(Monday, 9:00 AM - 9:50 AM, Location: Walker)
Imagine that you completed a research project a year ago, and now your clients are challenging the results. They doubt your numbers, and they want to know how you came to your conclusions. Could you retrace your steps and reproduce your results? Could you describe what you did and why you made those decisions along the way? Could you find the data or the programs that you wrote? With the passage of time, it becomes more and more difficult to recall or recover the process, data, or results of a project.
As programmers, we’re tempted to head straight for the keyboard and start writing code. But first, some basic training is in order. This paper is geared for programmers and programmer wannabes, focusing on specific guidelines, best practices, and techniques that can help us ensure the reproducibility of our results and, in doing so, improve the quality of our work.
Understanding the SAS® DATA Step and the Program Data Vector
Steven First, Systems Seminar Consultants
(Monday, 10:00 AM - 10:50 AM, Location: Walker)
The SAS system is made up of two major components: SAS PROCs and the SAS DATA step. The DATA step provides an excellent, full fledged programming language that allows programs to read and write almost any type of data value, convert and calculate new data, control looping and much, much more. In many ways, the design of the DATA step along with its powerful statements is what makes the SAS language so popular. This paper will address how the DATA step fits with the rest of the SAS System, DATA step assumptions and defaults, internal structures such as buffers, and the Program Data Vector. It will also look at major DATA step features such as compiler and executable statements.
A New Look at An Old Friend: Getting Reacquainted with Arrays
Joe Novotny, dunnhumbyUSA
Paul Kollner, dunnhumbyUSA
(Monday, 11:00 AM - 11:50 AM, Location: Walker)
Though SAS DATA step arrays have been around for a long time, they are one of the more elusive constructs in the SAS System. Even many experienced SAS programmers are less than proficient with these powerful SAS programming tools. This paper springs out of this very scenario and attempts to remedy the situation for one of the authors. We start at the beginning, defining what arrays are. We discuss array characteristics and uses at an introductory level, and then proceed on to more advanced topics. The paper is meant to serve as a solid introduction for those uninitiated in array processing as well as a review and deeper dive into some more intricate topics for more advanced SAS programmers. We review and provide examples of array definition and processing for the following: 1) one and two dimensional arrays, 2) explicit vs. implicit arrays, 3) _temporary_ arrays, 4) iterative processing with do-loops and indexing and 5) comparison of array processing with SAS DATA step hash objects.
Application of DICTIONARY Tables and SASHELP Views
Kirk Paul Lafler, Software Intelligence Corporation
(Monday, 1:00 PM - 1:50 PM, Location: Walker)
DICTIONARY tables and SASHELP views provide useful information about your operating environment, database objects (tables, indexes, views), and SAS session. At any time during a SAS session, information about system options, librefs, table names, column names and attributes, formats, indexes, and more can be accessed. This presentation illustrates the positives and negatives with traditional approaches to capturing metadata, explores the content-filled DICTIONARY tables and SASHELP views, and the application of DICTIONARY tables and SASHELP views for producing system management requirements including variable cross-reference listings, database object listings, table (data set) row (observation) counts, column and index analysis listings
Understanding WHERE Clause Processing and Indexes
Kirk Paul Lafler, Software Intelligence Corporation
(Monday, 2:00 PM - 2:50 PM, Location: Walker)
SAS® users can quickly access selected observations (rows) of data in SAS data sets using WHERE clause processing and indexes. This presentation explores valuable techniques for achieving improved query performance using indexes. Attendees learn what an index is, the purpose of an index, how an index is created, factors that determine when an index is warranted, how WHERE clause processing can take advantage of an index, and techniques for tuning indexes for better performance
Hard-to-find, But Powerful, PROC SQL Features
Kirk Paul Lafler, Software Intelligence Corporation
(Monday, 3:00 PM - 3:50 PM, Location: Walker)
The SQL Procedure contains many powerful and elegant language features for experienced SQL users. This presentation introduces SQL topics that will help users unlock many hidden features, options, and other hard-to-find gems in the SQL universe. Topics include CASE logic for handling conditional logic scenarios; the COALESCE and MONOTONIC functions; the _METHOD, _TREE, and OUTOBS= options; and several important performance tuning techniques
Stuff We All Do: Mistakes We’ve Made That You Won’t Have To
Joe Novotny, dunnhumbyUSA
Michael Bramley, dunnhumbyUSA
(Monday, 4:00 PM - 4:50 PM, Location: Walker)
Ever felt alone? Like you’re the only SAS Programmer up until 3am looking for that missing %end statement in a septuplly-nested macro? Fear not, there are at least two other Programmers out there just like you (and we venture to say thousands more who just won’t admit it). This paper helps to answer some questions you may (or should) have been asking but were too embarrassed to ask. Specifically, we delve into the depths of 1) How to keep DATA step execution under a dozen hours by appropriately setting the BUFSIZE and BUFNO options, 2) How to remain friends with your system administrator by not executing seemingly innocent PROC SQL code which consumes all your system resources, 3) How to keep your sanity by understanding when you need to force a step boundary, 4) How to keep your job by correctly creating summary variables outside vs inside of an iterating macro DO loop, 5) How to speed up your SAS jobs by not trying to download the entire Oracle data table to your local environment. Learning all these (plus additional techniques) will keep you employed and in good graces with the systems gurus at your company.
Ways of Creating a Macro Variable
Kelley Weston, Quintiles
(Tuesday, 8:00 AM - 8:50 AM, Location: Walker)
There are many ways of creating a SAS® macro variable, many of which many programmers are aware, but perhaps some methods may be new. Additionally, the scope and length of the macro variables are things of which to be aware. This will give you many of the ways of creating macro variables in SAS, so you can be aware of when a macro variable is being created, as well as the scope and values, including their length.
This paper only addresses Base SAS, and is operating-system independent. The intended audience would have some intermediate-level SAS experience.
The Ten Most Frequently Asked Macro Questions
Kevin Russell, SAS
(Tuesday, 9:00 AM - 9:50 AM, Location: Walker)
In the SAS® Technical Support Division, it is not uncommon to receive particular questions on a repeated basis. This is certainly true for questions regarding the SAS Macro Facility. This presentation discusses the ten most common questions we receive related to that topic. These questions cover all aspects of the macro language, from quoting functions to how the macro language interfaces with the SAS DATA step. At the conclusion of this presentation, you should have a better understanding not only of common features that the macro language offers, but also of some helpful macro techniques.
Best Practices and Advanced Tips and Techniques for SAS® Macro Programming
Gerry Bonin, We Energies
(Tuesday, 10:00 AM - 10:50 AM, Location: Walker)
Base SAS macros offer a powerful way to extend the SAS language and create reusable SAS code that is a custom fit for you. This paper shows you how to write macros for a production environment and includes some advanced tips and techniques. It includes templates for starting a new macro and for documenting macros, as well as some of the author’s favorite coding practices. Applicable to all levels of SAS programmers.
Using SAS® Macro Functions to Manipulate Data
Ben Cochran, The Bedford Group
(Tuesday, 11:00 AM - 11:50 AM, Location: Walker)
The SAS DATA step has the reputation for being one of the best data manipulators in the IT world. While the author of this paper agrees with this statement, it is possible to go beyond the capabilities of the DATA step by using SAS macro functions. It would be difficult to show the full power of these macro functions in a fifty-minute presentation, so this paper will look at a few commonly used macro functions and compare and contrast them to DATA step functions. These functions can be used not only to manipulate data, but to manipulate entire programs as well.
PROC TABULATE: Doing More
Art Carpenter, California Occidental Consultants
(Tuesday, 1:00 PM - 1:50 PM, Location: Walker)
TABULATE is different; its step statement structure is unlike any other procedure. Consequently the options and supporting statements associated with some of the more advanced and powerful techniques are more complex and less obvious than in most other procedures. These techniques are the emphasis of this workshop.
The material assumes that the user has a good general understanding of the statement structure and organization of the TABULATE step. We will build on the basics in order to cover a number of supporting statements, options, and techniques that allow you to take full advantage of this powerful and flexible procedure. Included are the use of ODS style overrides, traffic lighting, preloaded formats, CLASSDATA, EXCLUSIVE, and the calculation of percentages.
SGPANEL: Telling the Story Better
Charles Kincaid, COMSYS
(Tuesday, 2:00 PM - 2:50 PM, Location: Walker)
SAS® has a new set of graphics procedures called Statistical Graphics. They are built upon the Graphics Template Language (GTL) in order to make the powerful GTL easily available to the user.
SGPANEL, in particular, can be used to produce powerful graphics that used to require a lot of work. This upgrade is similar to the ease-of-use upgrade in output manipulation when ODS was first published. This paper will introduce the reader to PROC SGPANEL and the new capabilities it provides beyond the standard plot. By laying out plots in multi-cell graphs, any user can tell the story better.
SAS® Views - The Best of Both Worlds
David Thul, Blue Cross Blue Shield of MN
(Tuesday, 3:00 PM - 3:20 PM, Location: Walker)
My paper explains SAS views, illustrates how to use them in SAS programs and lists reasons why a programmer would want to use them. It explains in detail the I/O saving advantages of views. While I do not dive deeply into the behind–the-scenes workings of SAS views, my intended audience is seasoned Base SAS programmers who are very familiar with SAS programming concepts like Proc SQL and the DATA step. SAS views are platform independent but this paper assumes the audience is comfortable with the general concepts of their operating systems' file structures, sizing, etc.
A Different View of PROC SORT
Steven First, Systems Seminar Consultants
(Tuesday, 3:30 PM - 3:50 PM, Location: Walker)
The most common task performed by computers is sorting. This presentation will take a brief look at PROC SORT and will discuss what SORT does, some useful but not well-known options, and what impact sorting can have on successful, timely job completion.
Reporting & Information Visualization
Visualizing Key Performance Indicators Using the GKPI ProcedureBrian Varney, COMSYS
(Monday, 8:00 AM - 8:20 AM, Location: Juneau)
The GKPI procedure is new in SAS 9.2 SAS/Graph. This new procedure can be used to create graphical key performance indicator (KPI) charts which include sliders, bullet graphs, dials, speedometers, and traffic lights. This paper is intended to serve as an introduction to the GKPI procedure by discussing the syntax and demonstrating examples. In addition, this paper will discuss how results from the GKPI procedure can be integrated into existing SAS environments.
Multivariate Data Displays for Evaluating Clusters
Robert Moore, Ameriprise Financial
(Monday, 8:30 AM - 8:50 AM, Location: Juneau)
This paper illustrates some SAS graph techniques for displaying multidimensional data to facilitate visual evaluation of clusters. When performing a cluster analysis to identify customer segments, most clustering algorithms leave the number of clusters and the best clustering solution to the judgment of the analyst. Graphical techniques that display summary statistics for a number of variables simultaneously for all the clusters can be a helpful technique for visual evaluation of the clusters. The multivariate graphical displays illustrated in this paper include profile plots, Andrews plots, star charts, and Chernoff faces. The first three types are implemented using the PROC GPLOT and PROC GCHART procedures. Some preliminary data preparation is performed using the PROC MEANS and PROC TRANSPOSE procedures. The intended audience for this paper is assumed to have an intermediate SAS skill level, and some familiarity with customer segmentation.
ODS TAGSETS.RTF: Tips and Tricks
Xiangxiang Meng, University of Cincinnati
(Monday, 9:00 AM - 9:20 AM, Location: Juneau)
In SAS 8.1 and later, the ODS RTF destination delivers and saves tables, listings, and figures in Rich Text Format (RTF) that can be read and edited by Microsoft Word 2002 and other word processing packages. New with SAS 9.2, ODS TAGSETS.RTF is created as a measured markup to fix some problems with the traditional ODS RTF destination, such as memory consumption and page break restriction in the output of extremely large tables. This paper first introduces the common functions of the ODS RTF and ODS TAGSETS.RTF statements, and then illustrates some new features in the new ODS TAGSETS.RTF statement, such as controlling page break, adjusting table layout, and adding table of content to the RTF file.
Using the SGSCATTER Procedure to Create High-Quality Scatter Plots
Xiangxiang Meng, University of Cincinnati
(Monday, 9:30 AM - 9:50 AM, Location: Juneau)
Scatter plot is a useful exploratory tool for multivariate data analysis and is one of the most commonly used statistical graphics. In traditional SAS/GRAPH®, it needs the cooperation of the GPLOT procedure, several SYMBOL statements and GOPTION statements to create a high-quality scatter plot. New with SAS® 9.2, the SGSCATTER procedure can produce a variety of scatter plots and put them into panels with different layouts within just a few lines of code. This paper will introduce how to create different types of scatter plot with PROC SGSCATTER and how to use ODS GRAPHICS and ODS styles to enhance the graph.
Visual Display of Data in the Age of Twitter, Mobility, Web, Excel, and PowerPoint: Concise, Communication-Efficient, and Communication-Effective Graphs
LeRoy Bessler, Bessler Consulting and Research
(Monday, 10:00 AM - 11:50 AM, Location: Juneau)
I have long advocated and delivered simplicity in graphic design. The current popularity of concise wireless text messages, the widespread use of mobile phones and other devices with compact displays, and now the iPad, all reinforce the case for simplicity of information delivery.
This nearly-two-hour tutorial about design and construction covers not only graphs suitable for small screen mobile devices, but also for management reports, web pages, slide presentations, hardcopy, articles in journals, magazines, or newspapers, or for use as companions to spreadsheet tables. It also provides two excellent ways to produce dynamic graphs without requiring the viewing user to have any extra plug-ins installed. By dynamic graphs I mean: (a) web-enabled graphs with pop-up labels (and a forward-and-backward link to/from a spreadsheet of the input data); and (b) animated graphs that can be deployed on a web page or in a PowerPoint slide. (Also covered is how to insert graphs and tables directly to PowerPoint slides.)
For the case of a bar chart, a pie chart, and a line chart, the presentation includes the stepwise process of going from the minimal code for a default graph to the code needed to create a recommended communication-efficient and communication-effective alternative. For other example graphs only the needed final code is provided in the slides, or references are provided if the code is too voluminous.
Tips and Tricks: More SAS/GRAPH® Map Secrets
Darrell Massengill, SAS
(Monday, 1:00 PM - 1:50 PM, Location: Juneau)
“You can’t do that with PROC GMAP!” We hear that all the time. Many users don’t know the full range of capabilities of the SAS/GRAPH® mapping procedures. This presentation will share the secrets to allow you to exploit the power of SAS/GRAPH maps. Get the maps you really want; you can do that with PROC GMAP! We will demonstrate this using both existing functionality and new SAS® 9.2 functionality.
Integrating Multiple SAS/GRAPH® Procedures to Generate a Custom Image
Joshua Horstman, First Phase Consulting, Inc.
(Monday, 2:00 PM - 2:20 PM, Location: Juneau)
SAS/GRAPH offers a rich set of tools that can be used to programmatically generate custom image files. Using the MWSUG website header as a sample project, this paper provides an introduction to several lesser-known SAS/GRAPH procedures and demonstrates how to use them together. PROC GMAP is used to draw a custom map of the states in the MWSUG region. Text and other graphical elements are produced using PROC GSLIDE. Finally, the various pieces are arranged together with PROC GREPLAY. Various GOPTIONS parameters are discussed as they relate to the output of an image file of the desired type and size. No prior SAS/GRAPH experience is assumed.
Two EDF Graphs for Assessing Agreement between Paired Data
David Meek, USDA MWA NLAE
(Monday, 2:30 PM - 2:50 PM, Location: Juneau)
This paper is intended for those familiar with PROC GPLOT from SAS/GRAPH® and PROCs UNIVARIATE, TTEST, and NPAR1WAY in SAS/STAT®. Via macro procedures, two graphs related to the Smirnoff D statistic are developed to aid in the assessment of agreement between paired data. The first is a PROC GPLOT version of the ODS EDF plot in PROC NPAR1WAY; the second is a variant on it. A data set from a SAS manual is used to show the graphs.
More To It Than Meets the Eye: Creating Custom Legends That Really Tell a Story
Pete Lund, Looking Glass Analytics
(Monday, 3:00 PM - 3:50 PM, Location: Juneau)
Merriam-Webster defines a legend as “an explanatory list of the symbols on a map or chart.” We’re used to seeing just that – a table of symbols or a series of colored boxes with a word or two explaining what the symbols or colors mean. In almost every case, this is sufficient. However, what if a legend did more than just explain the colors on the map – what if it had some descriptive or analytic of its own?
This paper discusses techniques to build custom legends using the SAS/Graph Annotate facility. Two real-world examples will be presented. One creates a typical legend and then adds quantitative, descriptive information to it. The other creates a graphic that not only defines the colors on the map, but also stands alone as an analytic display.
Make Your Tables Pop: Embedding Micrographics in PROC REPORT Output
Pete Lund, Looking Glass Analytics
(Monday, 4:00 PM - 4:50 PM, Location: Juneau)
It’s a very common thing to have a table of information included in a report. It’s also very common to have graphics, such as bar charts, in the same report. This paper will look at techniques, available in SAS®, to place the graphics right in the tables.
The SAS/GRAPH Annotate facility will be used to create small, standalone graphics files. The files can then be placed in a “cell” of a table created by PROC REPORT. This technique allows you to create tables that draw the reader’s eyes to the important information much more quickly than scanning through all the text.
Resources and Support for SAS Users
Creating a Successful Support Model for the SAS® Enterprise Intelligence PlatformJerry High, Blue Cross and Blue Shield of Minnesota
(Tuesday, 8:00 AM - 8:50 AM, Location: Juneau)
SAS has a robust set of tools that provide business analysts many capabilities ranging from statistics and data mining to data integration and business intelligence. In the early years, prior to version 9, typical analysts would simply use the SAS programming language and write code to obtain the desired results. Support had two aspects: one from the systems administration team where the SAS software was simply installed and made available to the analysts, the other from the SAS community sharing their insights about business problems and coding techniques. Today under release 9, more specifically with the introduction of the SAS Enterprise Intelligence Platform, the capabilities have increased dramatically. So, too, has the need for a proper support model to ensure that today’s sophisticated business analysts and other general consumers of information can leverage the power of SAS to the fullest. This paper explores many aspects to consider when developing a successful support model for a SAS environment.
Supporting Users, Software, and a BI Server: Using SAS to Support SAS
LeRoy Bessler, Bessler Consulting and Research
(Tuesday, 9:00 AM - 9:50 AM, Location: Juneau)
This is a report on my methods, the resources, and my tools for the support of users, software, and a BI server. All of the tools were built with SAS software itself. Code for the tools is either in the paper itself, or in references cited therein.
Getting from SAS® 9.1.3 to SAS® 9.2: Practicalities of the Automated Tools
Diane Hatcher, SAS
(Tuesday, 10:00 AM - 10:50 AM, Location: Juneau)
As a follow-up to last year’s paper, “Migration from SAS® 9.1.3 to SAS® 9.2 – An Overview”, this paper looks specifically at the automated tools to support your migration of the platform for SAS® Business Analytics from SAS 9.1.3 and SAS 9.2. We will take a closer look at the SAS® Migration Utility and how it’s used with the SAS® Deployment Wizard. By understanding the roles, capabilities, and limitations of these tools, you can plan for a smooth migration initiative that will get your SAS 9.2 environment up and running in the shortest amount of time.
Custom Google Searches, PDF Sticky Notes, and Other Tips for Organizing and Accessing SAS® Help Resources
Roger Muller, First Phase Consulting, Inc.
Joshua Horstman, First Phase Consulting, Inc.
(Tuesday, 11:00 AM - 11:50 AM, Location: Juneau)
Today’s SAS programmer enjoys access to a wide variety of electronic SAS help resources, both from SAS Institute and other sources. These include help files accessible within the SAS interactive environment, SAS online documentation in both HTML and PDF formats, SAS technical notes, proceedings from SAS Global Forum and regional conferences such as MWSUG, and other websites that have become popular within the SAS user community.
In this paper, we describe these resources, discuss how to access them, and recommend ways to organize these resources to maximize their usefulness and convenience. We show how techniques such as site-specific custom Google searches and PDF sticky notes make these resources even handier than those paper SAS manuals sitting on your shelf.
This paper provides a wealth of useful information for the novice SAS programmer, but even experienced professionals will probably find something new.
Technical Resources for Beginning SAS® Programmers
Brian Varney, COMSYS
(Tuesday, 1:00 PM - 1:50 PM, Location: Juneau)
Between SAS Technical Support, support.sas.com and the SAS help tools within a SAS installation, there are many free resources that can help accelerate one’s learning, problem solving and troubleshooting abilities. These are valuable for SAS developers that are new to SAS or new to a particular SAS module, product or solution. Leveraging these free resources will result in more efficient and better SAS development.
Win with In-House Users Groups for SAS®, JMP®, and Special Interests
Charles Edwin Shipp, Shipp Consulting
Kirk Paul Lafler, Software Intelligence Corporation
(Tuesday, 2:00 PM - 2:20 PM, Location: Juneau)
Have you considered an in-house group for SAS, JMP or special interests? These are users groups within your university, company, or government agency. Topics include how to start and maintain an in-house group, benefits and leadership opportunities, peer-to-peer interaction, tutorials, collaboration of users and departments, a focal point for proper requests, getting to know other users and providers, differences in corporate cultures and examples of successful school, company, and government user groups, and beginning and continuing critical success factors.
Connect with SAS® Professionals around the World with LinkedIn and sasCommunity.org
Charles Edwin Shipp, Shipp Consulting
Kirk Paul Lafler, Software Intelligence Corporation
(Tuesday, 2:30 PM - 2:50 PM, Location: Juneau)
Accelerate your career and professional development with LinkedIn and sasCommunity.org. Establish and manage a professional network of trusted contacts, colleagues and experts. These exciting social networking and collaborative online communities enable users to connect with millions of SAS users worldwide, anytime and anywhere.
This presentation explores exciting features found in both virtual communities. Topics include creating a LinkedIn profile and social networking content, developing a professional network of friends and colleagues, joining special-interest groups, accessing a Wiki-based web site where anyone can add or change content on any page on the web site, sharing biographical information between both communities using a built-in widget, exchanging ideas in Bloggers Corner, viewing scheduled and unscheduled events, using a built-in search facility to search for desired wiki-content, collaborating on projects and file sharing, reading and responding to specific forum topics, and much more.
SAS® Programmer’s Challenge: It’s All about the Options
Art Carpenter, California Occidental Consultants
(Tuesday, 3:00 PM - 3:50 PM, Location: Juneau)
Come and join us at the SAS Programmer's Challenge. This lively, informal, and fun event will feature a number of puzzles and questions on various aspects of SAS programming – especially relating to options. The problems and questions will vary from easy code situations, to “How does SAS think in this situation?”, to situational DATA step logic, to the situations we get ourselves into.
This will be a great chance to pick up a few coding tips. You can come to watch. You can come to learn. You can come to compete (mostly against yourself).
SAS with Excel or Database
The Ten Most Frequently Asked Questions about SAS® to ExcelChevell Parker, SAS
(Monday, 8:00 AM - 8:50 AM, Location: Oak)
This presentation covers ten of the most frequently asked questions about exporting SAS output to Excel. Topics include various methods of exporting to Excel, why you would use one method over another, and common problems and pitfalls. Covered in greater detail will be using the Output Delivery System to export to Excel. We will look at the various tagsets such as ExcelXP, MSOffice2k , CSV, and other customized tagsets that address some of the most frequently asked questions. Finally, general tips will be given which allow the user to generate quality presentations and automate common tasks.
SAS and Excel, A Winning Combination, Part 2: DDE (Dynamic Data Exchange), a Popular Solution around the World
LeRoy Bessler, Bessler Consulting and Research
(Monday, 9:00 AM - 9:50 AM, Location: Oak)
To create complex, highly formatted Excel spreadsheets from a SAS program, Dynamic Data Exchange (DDE) is the tool preferred by numerous SAS users, including me. I get requests nearly every week from all over the world for my free DDE toolkit. Though it is probably the oldest SAS-to-Excel solution, DDE nevertheless is compatible with Excel 2007. In fact, I will share a capability tip from Craig Wildeman that is unique to Excel 2007.
DDE supercharges your use of SAS with Excel. SAS DDE programs can load worksheets, and format them maximally without point-and-click. Almost anything doable directly in Excel is doable with DDE commands from your program. You can access data, even to the level of a specific cell, a row or column, part of a row or column, or a range of contiguous rows or columns, or can work with an entire worksheet. Then you can write, read, or format data. Why would you want to run Excel from SAS? You can create a hands-off production job to be automatically scheduled by the computer to access and analyze the data, load the spreadsheet, and format it. You can even attach the report to, or include a link to it in, a SAS-dispatched email message (which does not require DDE). In any case, if you find yourself preparing the same highly formatted Excel report over and over, why not automate it?
Romancing Your Data: The Getting-to-Know-You Phase
Carole Jesse, Ally Financial
(Monday, 10:00 AM - 10:20 AM, Location: Oak)
Have you ever gained access to an Oracle® database only to find yourself asking these questions? “What’s in this database?” “Where do I start?” “What do I actually have access to in the database?” “What uniquely defines a record in this table?” “Is this table Indexed, and if so, what is it indexed on?” “Is this table Partitioned?”
The ability to gather and store large amounts of data has become commonplace, across industries. As these capabilities have grown, so has database complexity. The deluge of corporate acquisitions further complicates things, driving more and more undocumented data sources into the hands of new database users, often without the institutional knowledge to go with them. In these cases, the analyst must typically be his or her own data sleuth, developing the ability to discover the complexity of a database on their own before engaging in serious analytics.
This paper focuses on Oracle databases and use of SAS/ACCESS® Interface to ORACLE to explore the Oracle Data Dictionary Views containing the database metadata. In particular, we focus on metadata summaries of interest to a user new to the database. The metadata is presented via a series of programs utilizing Base SAS®, specifically The SQL Procedure and The Macro Language. The delivery of output will also include a treatment of SAS® Output Delivery System.
The target audience for this paper includes Statistical Modelers and Business Analysts, or anyone else who uses Oracle databases to solve business problems. Some basic understanding of Base SAS PROC SQL, Macro Language, and the SAS/ACCESS Interface to ORACLE product is helpful, but not required.
Macro Variables Make Life Easier: Applications with SAS® to Excel
Misty Johnson, State of Wisconsin Department of Health Services
(Monday, 10:30 AM - 10:50 AM, Location: Oak)
As SAS users, we all find ourselves telling SAS some basic information over and over, like the name of the report, recipient of the report, what is contained in the report and where we will write the output. Macro variables are a useful tool for the intermediate SAS user to minimize this redundancy and, as a result, minimize errors by declaring important criteria only once; saving you time and making life easier. This paper will demonstrate the use of macro variables to tell SAS which data to pull, double-check that the correct data was pulled before writing the output report, and finally define formatting done by Dynamic Data Exchange (DDE) of the output report in Excel.
Linking SAS® Reports to Spreadsheet Data
Ben Cochran, The Bedford Group
(Monday, 11:00 AM - 11:50 AM, Location: Oak)
This paper takes a step by step approach to building a summary report with the REPORT procedure, and then building 'hot links' to the detail data that is found in Excel spreadsheets. This paper starts with the data, either a SAS data set or spreadsheet data, shows the audience how to manipulate the data to get it ready for the report, shows how to use PROC REPORT to get the desired result, and covers the code needed to link to specific spreadsheets.
Using PROC SQL to Build and Incrementally Update That Data Mart
Ben Cochran, The Bedford Group
(Monday, 1:00 PM - 1:50 PM, Location: Oak)
Often SAS users need to access data from non-SAS sources. This is especially true when constructing a SAS data warehouse from other vendors’ databases. While this task is not too difficult, sometimes unforeseen challenges can arise, especially when dealing with date values. This tutorial initially takes a look at several methods for accessing different kinds of data to do the initial load of the data warehouse. Then attention is given to various ways of doing incremental updates and how to overcome some potential problems.
This paper follows the tasks that were involved in a specific retail application and how a certain organization faced and overcame the challenges that accompany building and updating a data warehouse. To accomplish its objectives, this paper is divided into seven sections. The first section looks at the environment and issues surrounding the initial load of the warehouse. The second part looks at the inevitable task of data manipulation, specifically dealing with date values. The third part examines methods for finding out the maximum date value of transactions in the data warehouse. Next, the paper looks at finding the maximum date values in the operational data that feeds the warehouse. The fifth step looks at the method for comparing the maximum date of the warehouse transactions with the maximum date of the operational data. The sixth step looks at doing the actual updating itself. The seventh step looks at accomplishing the above by using the SAS/ACCESS® LIBNAME statement with the SQL procedure.
Building an Extract, Transform, and Load (ETL) Server Using Base SAS®, SAS/SHARE®, SAS/CONNECT®, and SAS/ACCESS®
Christopher Schacherer, Clinical Data Management Systems, LLC
Timothy Steines, Independent Consultant
(Monday, 2:00 PM - 2:50 PM, Location: Oak)
There are a number of methods that can be used to efficiently build dimensional models for business intelligence systems using BASE SAS®, SAS/STAT®, and SAS/ACCESS®. Once developed, these programs must be run on a platform that provides the level of performance required to rebuild data models in ever-shortening build windows. The current paper describes how to build the infrastructure necessary to support resource-intensive processes such as extract, transform, and load (ETL) programs using SAS Server®, Base SAS®, SAS/SHARE®, SAS/CONNECT®, and SAS/ACCESS®. In addition to providing a robust infrastructure for quickly building dimensional models, the programming methods used to take advantage of this infrastructure can also increase the reliability of the ETL process and enhance the quality of the data in the data warehouse.
Using SAS® Software's Metadata to Generate ETL Precursor Files in a Heterogeneous Database System
Alexander Pakalniskis, Cedars-Sinai Medical Center
Peixin Wang, Cedars-Sinai Medical Center
Nilesh Bajania, Cedars-Sinai Medical Center
Alein Chun, Cedars-Sinai Medical Center
(Monday, 3:00 PM - 3:50 PM, Location: Oak)
Created by the Resource and Outcomes Management (ROM) Department of the Cedars-Sinai Medical Center (CSMC), the Extract Transform Load System (ETLS) is an automated tool supporting the controlled movement of data from various input sources into Structured Query Language (SQL) environments. The tool, written in base SAS and used in a Windows environment, ensures ongoing information quality, reduces the time and effort required for data extraction, transformation, and loading, and provides audit report capabilities for project management. The output of this tool is designed to be used by database administrators (DBAs) in a linear or multi-layered heterogeneous database system.
Base SAS® Methods for Building Dimensional Data Models
Christopher Schacherer, Clinical Data Management Systems, LLC
(Monday, 4:00 PM - 4:50 PM, Location: Oak)
As the volume of data available from operational systems continues to grow, dimensional models are becoming an increasingly important analytic tool for enterprise reporting and analysis. Although there are a number of software packages specifically designed for transforming data from operational systems into dimensional models, many smaller organizations find themselves unable to make the significant investment in new technology and personnel necessary to utilize these tools. Many of these same organizations, however, do use Base SAS as an analytic tool. Especially for these organizations (but also for larger organizations with more sophisticated business intelligence systems), the current work provides an example of using PROC SQL, SAS/ACCESS®, and SAS hash objects to extract, transform, and load data from an operational system into a dimensional data model.
Solutions
Mashups Can Be Gravy: Techniques for Bringing the Web to SAS®Jack Fuller, COMSYS
(Monday, 8:00 AM - 8:20 AM, Location: Kilbourn)
A mashup is an agglomeration of disparate content. While popular terminology tends to associate mashups with web-based data, physician John Snow demonstrated an early example of mashing content in 1854 when he identified the source of a London cholera outbreak by mapping cholera patients with their water sources. The guilty culprit turned out to be an infected water pump handle in Soho. Web based mashups have transitioned from metasearch engines like Dogpile® and Webcrawler® to Google™ maps detailing winery tours and public restrooms to Yahoo!® Pipes™ which allows users to define their own mashups.
SAS® has long facilitated moving information from SAS to the Web. This paper will explore mashup techniques for enabling the movement of information in the other direction: from the Web to SAS.
Use of FILENAME Statement to Create Automated Reports with Severe Weather Events Data from the National Weather Service
Valentin Todorov, Assurant Specialty Property
(Monday, 8:30 AM - 8:50 AM, Location: Kilbourn)
Risk managers at insurance companies and banks constantly monitor financial exposure and potential impact from various weather related events – hurricanes, tornadoes, wind and hail. To understand the exposure to such phenomena, managers can utilize the severe weather events data from NOAA’s National Weather Service Center (http://www.spc.noaa.gov/climo/reports/). This paper shows how the FILENAME statement can be employed to easily obtain and analyze this information. The Weather Service publishes daily files containing information about severe weather events on the NOAA’s website. The files are not cumulative, which makes the aggregation of the information very tedious. With the help of the FILENAME statement it is easy to obtain and aggregate files for as many periods as needed. The aggregated data can then be used for various purposes: 1) Find properties in companies’ portfolio close to a tornado touchdown to estimate financial impact; 2) Determine areas with high likelihood to incur financial losses from severe events; 3) Create custom reports and maps. The material from this presentation can be extended to obtain content from other sources (e.g., US Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis).
This SAS® Program Says to Google, "What's Up Doc?"
Scott Davis, COMSYS
(Monday, 9:00 AM - 9:20 AM, Location: Kilbourn)
When you think of the internet, there are few things as ubiquitous as Google. While Google search, Gmail and other Google products are widespread, what may not be quite as well known is Google Docs. Google Docs is a web-based Office-like suite productivity tool. You can create and manage a number of different types of documents and store them for free on the internet.
There is a tremendous advantage for using a tool like Google Docs for everyone, but especially for the consumer and for the small business owner alike. That advantage, of course, is the cost. Google Docs = FREE. Where does SAS come into play? SAS has the tools (with a little finessing) to retrieve the data from one of these documents and turn it into information.
This paper explores using the FILENAME URL statement to retrieve data from a Google Docs spreadsheet. The data can then be manipulated, turned into a report and published back out to the internet for your colleagues.
Summaries by Class and Interval
Nathan Lindquist, U.S. Bank Credit Administration
(Monday, 9:30 AM - 9:50 AM, Location: Kilbourn)
This paper presents a method for summarizing performance results by class over consecutive time intervals. The resulting report is useful for grouping and monitoring changing trends by segment over time. The examples presented are taken from financial services; the technique, however, may be useful in many business areas. The primary SAS concepts used are PROC SQL and SAS macros.
How I Learned to Type 1,388,571 wpm, Or Creating Extensive Shift Tables
Kelley Weston, Quintiles
(Monday, 10:30 AM - 10:50 AM, Location: Kilbourn)
Sometimes in the pharmaceutical industry we need to be able to create something called a shift table, which in essence is just something that can show the progression of a subject's test. If there are very few possible values, and only the first and last visits to consider, then this may be a trivial task. However, if there are several visits, and / or many possible values, then this becomes something that takes some planning.
This paper uses only Base SAS, and should run on any operating system. It is intended for an intermediate to advanced level audience.
The Genealogy of Macros
Kelley Weston, Quintiles
(Monday, 11:00 AM - 11:20 AM, Location: Kilbourn)
When one macro calls another macro, which calls another macro (etc., ad infinitum), how can you find what the "macro genealogy" is? This paper will give an easy method that you can use to find all of the macros involved in your program. This method is SAS-version independent, and will work on any operating system.
Meta-Programming in SAS® with DATA Step
Paulo Tanimoto, KEMA Inc.
(Monday, 11:30 AM - 11:50 AM, Location: Kilbourn)
It is a common pattern in SAS to pass information from a data set to macro variables, perform a number of processing steps, and loop as necessary. Besides inefficient, this method can be prone to error. We present an alternative approach that eliminates the need for macros and macro variables altogether, running directly from a DATA STEP via CALL EXECUTE. We illustrate the technique with common examples that require meta-programming: importing a collection of Excel files, applying labels and formats from a list, and generating tables dynamically.
SAS and Excel, A Winning Combination, Part 1: Easy Solutions for Multi-Sheet Excel Workbooks, Color-Coding, Imbedded Graphs, etc.
LeRoy Bessler, Bessler Consulting and Research
(Monday, 1:00 PM - 1:20 PM, Location: Kilbourn)
SAS is my personal favorite computer tool, but the world’s commonest non-SAS tool for data presentation and working with data is Microsoft Excel. People often want your SAS report formatted as a spreadsheet, so that they and others can post-process or reformat data however they like. This short tutorial on SAS-to-Excel solutions is for anyone who needs to deliver information via Excel from a SAS program. In regard to the popular “traffic lighting” concept, this paper will explain how to color-code data so as to address the commonest form of color blindness. The paper will reach the limits of what the author can easily do with the tools selected. For another solution, the reader is referred to the companion paper on DDE (Dynamic Data Exchange).
The Art of Being Color-Communication-Effective, Not Just Colorful
LeRoy Bessler, Bessler Consulting and Research
(Monday, 1:30 PM - 1:50 PM, Location: Kilbourn)
Though it includes some information and coding specific to SAS, SAS/GRAPH, and ODS, this short tutorial is also a software-independent guide to using color to communicate, rather than decorate. Color does more than merely add visual excitement to your output. When it comes to the pitfalls of, and best practices for, using color, you don’t know what you don’t know. Come and find out.
Happy Birthday! But ... How Old Are You, Really?
Elizabeth Axelrod, Abt Associates Inc.
(Monday, 2:00 PM - 2:20 PM, Location: Kilbourn)
Chances are that at some point in your SAS career you've had to calculate AGE. There are various ways to do this, but one of the most common ones does not always produce a correct value. This paper presents several alternatives to calculate AGE, and illustrates what works - and what doesn't.
Building a Match Code Using SAS®
David Li, Prime Therapeutics
(Monday, 2:30 PM - 2:50 PM, Location: Kilbourn)
In business analytics, there are frequent needs to join data from different sources based on Name and Address. Instead of waiting for IT to run its process, a SAS Name and Address match code can allow the project to flow without interruption. The basic match code concept can be compactly implemented utilizing SAS Macros. Three key topics are highlighted in this paper: Parsing, Looped Sequential Processing, and Modular Design. SAS Macros are called upon to organize the modules, perform a sequence of tasks against one record at a time, and manipulate the parsing and transforming of text strings into the final match code. The techniques to move a specific data element through the various states of transformation and back again are discussed in detail. This paper assembles: how to parse out a sub element from a specific data set variable, the transformation of a variable’s values into macro variables, the storage of macro variable values as records within a data set, and other useful programming skills.
Simulating a Simple Solitaire Card Game Using SAS®
Robert Moore, Ameriprise Financial
(Monday, 3:00 PM - 3:20 PM, Location: Kilbourn)
This paper illustrates using SAS arrays and nested DO loops in a DATA step to simulate a card game. A SAS program is presented that simulates a simple version of solitaire played with a standard deck of cards. Arrays are used to represent the standard deck of playing cards, and a DATA step with nested DO loops is used to repeatedly play the game and capture the outcomes of each game. PROC FREQ and PROC GCHART are used to examine the distribution of possible outcomes of the game generated by simulating a large number of trials. The intended audience for this paper is assumed to have a beginning to intermediate SAS skill level and be familiar with arrays and DO loops.
Using a Few Key Elements of SAS® DATA Step Code and a Couple of Procedures to Optimize the Observation Length of a Data Set
Philip Wright, The University of Michigan-Institute for Social Research
(Monday, 3:30 PM - 3:50 PM, Location: Kilbourn)
The SAS DATA step supports quite a few statements and functions. Many of these are usually accompanied with acceptable default values. Due to the nature of SAS programming, however, the function defaults are sometimes overly generous. This is particularly true of length specifications for both the numeric and character variables. With the default of 8 bytes for numeric variables (the equivalent of double-float notation in other languages) and the default length of character variables sometimes as long as 200 bytes, the potential for shortening variable lengths is quite high. Some standard SAS functions and procedures enable you to shorten variable lengths quite easily. Shorter variable lengths mean shorter observation lengths, shorter observation lengths enable faster I/O processing, and less disk space usage without spending CPU time on compression and decompression.
Tables As Trees: Merging with Wildcards Using Tree Traversal and Pruning
Matthew Nizol, United BioSource Corporation
(Monday, 4:00 PM - 4:50 PM, Location: Kilbourn)
One of the most fundamental operations in SAS is to merge two data sets. When the values of the key variables involved in the merge are fully specified, the code is straightforward. However, if one of the tables contains wildcards in the key variables, standard merge and look-up techniques do not work. This paper will explore an algorithm for merging two SAS data sets in which the key variables of the master data set are fully specified whereas the key variables of the look-up data set may either be fully specified or wildcard (e.g. “don’t care”) values. The algorithm works by interpreting the look-up table as a tree structure: each key variable becomes a level in the tree, and each unique value for a given key variable becomes a node in the tree. As the algorithm visits each level in the tree, the tree is pruned to create a binary tree in which the left child holds the wildcard value and the right child holds the fully specified key value. If any paths from the root to the leaves remain after all pruning is complete, the records in the look-up data set represented by those paths are returned as matches. This paper is intended for intermediate to advanced users of SAS. A basic understanding of tree-based data structures will be helpful but not necessary. The code presented in this paper will work with Base SAS versions 9.1.3 and later.