MWSUG 2016 Paper Presentations
Paper presentations are the heart of a SAS users group meeting. MWSUG 2016 will feature over 100 paper presentations organized into 12 academic sections covering a variety of topics and experience levels.
Note: Content and schedule are subject to change. Last updated 03-Oct-2016.
- BI / Customer Intelligence
- Beyond the Basics SAS
- Career Development
- Data Visualization and Graphics
- Hands-on Workshops
- Pharmaceutical Applications
- Rapid Fire
- SAS 101
- Statistics / Advanced Analytics
- System Architecture and Administration
- Tools of the Trade
- e-Posters
BI / Customer Intelligence
Paper No. | Author(s) | Paper Title (click for abstract) |
BI01 | Kirk Paul Lafler & Josh Horstman | Building a Better Dashboard Using SAS® Base Software |
BI02 | Laurie Bishop | Macro method to use Google Maps™ and SAS® to find the shortest driving and straight line distances between 2 addresses in the United States |
BI03 | Arjun Shrestha | SAS® Automation & SAS® Code Improvement (Making Codes Dynamic) |
BI04 | Nate Derby | Reducing Customer Attrition with Predictive Analytics for Financial Institutions |
Beyond the Basics SAS
Career Development
Paper No. | Author(s) | Paper Title (click for abstract) |
CD01 | Kirk Paul Lafler | What's Hot - Skills for SAS® Professionals |
CD04 | David Corliss | Statistical Volunteering With SAS - Experiences and Opportunities |
CD05 | Mindy Kiss et al. | Recruiting and Retention Strategies |
CD06 | Chad Melson | Mentoring and Oversight of Programmers across Cultures and Time Zones |
CD07 | Nate Derby | How to Use LinkedIn to Effectively Boost Your Career Development |
Data Visualization and Graphics
Paper No. | Author(s) | Paper Title (click for abstract) |
DV01 | Jesse Pratt | Using Animation to Make Statistical Graphics Come to Life |
DV04 | Louise Hadden | SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination |
DV05 | Louise Hadden | Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS(R) |
DV08 | Stephanie Thompson | Four Thousand Reports Three Ways |
DV09 | Stephanie Thompson | Using Big Data to Visualize People Movement Using SAS Basics |
DV10-SAS | Dan Heath | Annotating the ODS Graphics Way! |
Hands-on Workshops
Paper No. | Author(s) | Paper Title (click for abstract) |
HW01 | Dave Foster | HOW - Visual Analytics |
HW02 | Kirk Paul Lafler | A Hands-on Introduction to SAS® DATA Step Hash Programming Techniques |
HW03 | Kent Phelps & Ronda Phelps | The Joinless Join ~ The Impossible Dream Come True; Expand the Power of Base SAS® and SAS® Enterprise Guide® in a New Way |
HW04 | William E Benjamin Jr | Working with the SAS® ODS EXCEL Destination to Send Graphs, and Use Cascading Style Sheets When Writing to EXCEL Workbooks |
HW05 | Chuck Kincaid | Intermediate SAS(r) Macro Programming |
Pharmaceutical Applications
Paper No. | Author(s) | Paper Title (click for abstract) |
PH01 | Abhinav Srivastva | Pre-Data Checks for SDTM Development |
PH03 | Joe Palmer | Surviving septic shock: How SAS helped a critical care nursing staff fulfill its septic shock reporting requirements |
PH04 | Ronald Smith | Establishing Similarity of Modeled and Experimental PK/PD Hysteretic Loops using Pseudo Time Series Analysis and Dynamic Time Warping |
PH05 | Robin High | Fitting Complex Statistical Models with PROCs NLMIXED and MCMC |
PH06 | Kechen Zhao | Frequentist and Bayesian Interim Analysis in Clinical Trials: Group Sequential Testing and Posterior Predictive Probability Monitoring Using SAS |
Rapid Fire
SAS 101
Paper No. | Author(s) | Paper Title (click for abstract) |
SA01 | Kirk Paul Lafler | Top Ten SAS® Performance Tuning Techniques |
SA02 | Kirk Paul Lafler | SAS® Debugging 101 |
SA05 | Arthur Li | Simplifying Effective Data Transformation Via PROC TRANSPOSE |
SA06 | Keith Fredlund & Thinzar Wai | Painless Extraction: Options and Macros with PROC PRESENV |
SA08 | Lakshmi Nirmala Bavirisetty et al. | Hashtag #Efficiency! An Introduction to Hash Tables |
SA09 | Nancy Brucken | Array of Sunshine: Casting Light on Basic Array Processing |
SA10 | Josh Horstman | Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code |
SA11 | Kiran Venna | Accessing Teradata through SAS, common pitfalls, solutions and tips |
Statistics / Advanced Analytics
System Architecture and Administration
Paper No. | Author(s) | Paper Title (click for abstract) |
SY01 | Troy Hughes | Spawning SAS® Sleeper Cells and Calling Them into Action: Implementing Distributed Parallel Processing in the SAS University Edition Using Commodity Computing To Maximize Performance |
SY02 | Venkateswarlu Toluchuri | Key Tips for SAS® Grid Users |
SY03 | David Corliss | Enterprise Architecture for Analytics Using TOGAF |
SY04 | David Ward | Avoiding Code Chaos - Architectural Considerations for Sustainable Code Growth |
SY05-SAS | Scott Parrish | SAS® Grid Administration Made Simple |
Tools of the Trade
e-Posters
Paper No. | Author(s) | Paper Title (click for abstract) |
PO02 | Troy Hughes | Sorting a Bajillion Records: Conquering Scalability in a Big Data World |
PO03 | Jingye Wang | A Predictive Logistic Regression Model for Chronic Kidney Disease |
PO04 | Richann Watson & Karl Miller | When ANY Function Will Just NOT Do |
PO05 | Deanna Schreiber-Gregory | Multicollinearity: What Is It and What Can We Do About It? |
PO06 | Xi Chen & Hunter Moseley | Protein NMR Reference Correction: A statistical approach for an old problem. |
PO07 | Abigail Baldridge et al. | StatTag: A New Tool for Conducting Reproducible Research with SAS |
PO08 | Louise Hadden & Roberta Glass | Document and Enhance Your SAS® Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata |
PO09 | Louise Hadden | What to Expect When You Need to Make a Data Delivery... Helpful Tips and Techniques |
PO13 | Drew Doyle | Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL |
Abstracts
BI / Customer Intelligence
BI01 : Building a Better Dashboard Using SAS® Base Software
Kirk Paul Lafler, Software Intelligence Corporation
Josh Horstman, Nested Loop Consulting
Tuesday, 2:30 PM - 2:50 PM, Location: Regency G
Organizations around the world develop business intelligence dashboards to display the current status of point-in-time metrics and key performance indicators. Effectively designed dashboards often extract real-time data from multiple sources for the purpose of highlighting important information, numbers, tables, statistics, metrics, and other content on a single screen. This presentation introduces basic rules for good dashboard design and the metrics frequently used in dashboards, and then shows how to build a simple drill-down dashboard using the DATA step, PROC FORMAT, PROC PRINT, PROC MEANS, ODS, ODS Statistical Graphics, PROC SGPLOT and PROC SGPANEL in SAS® Base software.
BI02 : Macro method to use Google Maps™ and SAS® to find the shortest driving and straight line distances between 2 addresses in the United States
Laurie Bishop, Cincinnati Children's Hospital Medical Center
Tuesday, 4:00 PM - 4:20 PM, Location: Regency G
Google Maps™ is a very useful tool for finding driving distances between addresses and for defining a geographic representation of an address, which can be used to find straight-line distances between two addresses. Distances between addresses, and the latitudes and longitudes of locations, are often useful in research. Using macro and DATA step SAS® code, the shortest driving distance between two addresses can be found by searching the html code returned from a Google Maps™ driving-directions search between the two addresses. In addition, searching the html code returned from a single-address Google Maps™ search enables one to define the latitude and longitude for a location, which allows use of the SAS® GEODIST function to find the straight-line distance between locations. Partial addresses are taken into consideration with this method, which was developed using SAS® v9.3 for Windows®. References: Going the Distance: Google Maps Capabilities in a Friendly SAS Environment, Anton Bekkerman, Ph.D., Montana State University, Bozeman, MT http://wuss.org/Proceedings13/100_Paper.pdf; Driving Distances and Drive Times using SAS and Google Maps, Mike Zdeb, University at Albany School of Public Health, Rensselaer, NY http://support.sas.com/resources/papers/proceedings10/050-2010.pdf
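As a rough illustration (not code from the paper), once the latitude and longitude of each address have been parsed from the returned html, the straight-line step can be a single GEODIST call; the data set and variable names below are hypothetical.

    data distances;
      set geocoded_pairs;   /* hypothetical data set with lat1/long1 and lat2/long2 in degrees */
      /* 'M' requests the result in miles; 'K' would return kilometers */
      straight_line_miles = geodist(lat1, long1, lat2, long2, 'M');
    run;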
BI03 : SAS® Automation & SAS® Code Improvement (Making Codes Dynamic)
Arjun Shrestha, Centene Corporation
Tuesday, 3:00 PM - 3:50 PM, Location: Regency G
Process automation using dynamic SAS code will save time and money, giving your team more time to innovate rather than doing manual work. SAS automation is achieved by having a good architectural design, database design, robust IT infrastructure, and fluid work environment. With some easy considerations and out-of-the-box thinking, it is possible to build more dynamic SAS code.
BI04 : Reducing Customer Attrition with Predictive Analytics for Financial Institutions
Nate Derby, Stakana Analytics
Tuesday, 1:30 PM - 2:20 PM, Location: Regency G
As community banks and credit unions market themselves to increase their market share against the big banks, they understandably focus on gaining new customers. However, they must also retain (and further engage) their existing customers. Otherwise, the new customers they gain can easily be offset by existing customers who leave. Happily, by using predictive analytics as described in this paper, it can actually be much easier and less expensive to keep (and further cultivate) existing customers than to enlist new ones. This paper provides a step-by-step overview of a relatively simple but comprehensive approach to reduce customer attrition. We first prepare the data for a statistical analysis. With some basic predictive analytics techniques, we can then identify those customers who have the highest chance of leaving and the highest value. For each of these customers, we can also identify why they would leave, thus suggesting the best way to intervene to retain them. We then make suggestions to improve the model for better accuracy. Finally, we provide suggestions to extend this approach to cultivating existing customers and thus increasing their lifetime value. Code snippets will be shown for any version of SAS® but will require the SAS/STAT package. This approach can also be applied to many other organizations and industries.
Beyond the Basics SAS
BB01 : How to Create Sub-Sub Headings in PROC REPORT and Why You Might Want to: Thinking about non-traditional uses of PROC REPORT
Amy Gravely, Center for Chronic Disease Outcomes Research, A VA HSR&D Center of Innovation, Minneapolis VA Medical Center
Barbara Clothier, Center for Chronic Disease Outcomes Research, A VA HSR&D Center of Innovation, Minneapolis VA Medical Center
Monday, 11:30 AM - 11:50 AM, Location: Wolverine AB
Often a statistician or programmer has a report in mind that they would like to generate through PROC REPORT, and they might also like to automate it through SAS macro techniques. They go to the literature and do not find examples close to what they would like to produce, but they do find things they can piece together to create the report they want to see. This paper outlines a specific example of going this route: a non-traditional use of PROC REPORT techniques to achieve sub-sub headings in a report. The literature often shows breaks or headings before or after totals or summary statements, but these techniques can be used along with the $VARYING format to achieve sub-sub headings in any report. This paper walks through the process and the end result.
BB02 : Name that Function: Punny Function Names with Multiple MEANings and Why You Do Not Want to be MISSING Out
Ben Cochran, The Bedford Group
Art Carpenter, CA Occidental Consultants
Tuesday, 8:00 AM - 8:50 AM, Location: Hoosier AB
The SAS® DATA step is one of the best (if not the best) data manipulators in the programming world. One of the areas that gives the DATA step its flexibility and power is the wealth of functions available to it. With over 450 functions in the DATA step it is difficult to learn and remember them all; however, understanding how to take advantage of the power of these functions is key to taking full advantage of the DATA step. This paper takes a PEEK at some of those functions whose names have more than one MEANing. While the subject matter is very serious, the material will be presented in a way that is guaranteed not to BOR the audience. Syntax will be discussed and examples of how these functions can be used to manipulate data will be demonstrated. With so many functions available and with less than an HOUR to present, we obviously will have to TRIM our list so that the presentation will fit within the allotted TIME.
BB03 : I've Got to Hand It to You; Portable Programming Techniques
Art Carpenter, CA Occidental Consultants
Mary Rosenbloom, Alcon, a Novartis Company
Monday, 1:00 PM - 1:50 PM, Location: Hoosier AB
Experienced programmers know the importance of commenting code so that it can be reused and modified later if needed. As technology expands, we also have the need to create programs that can be handed off - to clients, to regulatory agencies, to parent companies, or to other projects - with little or no modification by the recipient. Minimizing modification by the recipient requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it. There are a great many tools available to the SAS programmer, which will allow the program to self-adjust to its own surroundings. These include location-detection routines, batch files based on folder contents, the ability to detect the version and location of SAS, discerning and adjusting to the current operating system, programmatic zipping of files and the corresponding folder structure, the use of automatic and user defined environmental variables, and macro functions that modify system information. Need to create a portable program? We can hand you the tools.
BB04 : Simplifying Your %DO Loop with CALL EXECUTE
Arthur Li, City of Hope
Monday, 11:30 AM - 11:50 AM, Location: Hoosier AB
One often uses an iterative %DO loop to execute a section of a macro repetitively. An alternative method is to utilize the implicit loop in the DATA step with the CALL EXECUTE routine to generate a series of macro calls. One of the advantages of the latter approach is eliminating the need for indirect referencing. To better understand the use of CALL EXECUTE, it is essential for programmers to understand the mechanism and the timing of macro processing to avoid programming errors. These technical issues are discussed in detail in this paper.
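For readers unfamiliar with the pattern, a minimal sketch is shown below; the macro %REPORT_AGE is hypothetical, and SASHELP.CLASS stands in for real data.

    proc sort data=sashelp.class out=class_sorted;
      by age;
    run;

    /* The DATA step's implicit loop generates one macro call per AGE value;
       %NRSTR delays macro execution until after the DATA step finishes */
    data _null_;
      set class_sorted;
      by age;
      if first.age then
        call execute(cats('%nrstr(%report_age)(age=', age, ')'));
    run;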
BB06 : Be Prompt! Do it Now! An Introduction to Prompts in SAS Enterprise Guide.
Ben Cochran, The Bedford Group
Monday, 8:30 AM - 9:20 AM, Location: Wolverine AB
There are so many things you can do with prompts in Enterprise Guide, and this paper looks at many of them. First, this paper discusses the plethora of prompts and what they can do. Next, it shows you, in a step-by-step fashion, how to create prompts easily. Finally, it discusses how to use the prompts that you just created.
BB07 : Advanced Prompting in SAS Enterprise Guide.
Ben Cochran, The Bedford Group
Monday, 1:00 PM - 1:50 PM, Location: Wolverine AB
This paper is a sequel to the Intro to EG Prompting paper. With some simple programming behind the scenes, you can take simple EG prompts that are already powerful and make them very robust. This paper shows many examples of how to do this. One of the big examples is an illustration of a cascading prompt.
BB08 : Don't Let Your Annual Report Be Such a Manual Report: Neat New Ways from SAS to Combine Text, Graph and Tabular Report
Ben Cochran, The Bedford Group
Monday, 10:30 AM - 11:20 AM, Location: Wolverine AB
This paper looks at methods of combining all kinds of output, such as Text, Graphs and Tables, into a single pdf file. The process is accomplished totally within SAS. This paper is the product of having done this for an organization that was preparing its annual report. What originally took a few months to produce, now takes only a few minutes.
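A minimal sketch of the general approach (not the author's actual program), assuming a hypothetical ENROLL data set, might combine text, a graph, and a table in one PDF like this:

    ods pdf file="annual_report.pdf" startpage=no;
    ods text="Section 1: Enrollment Summary";

    proc sgplot data=enroll;            /* hypothetical data set */
      vbar year / response=students;
    run;

    proc report data=enroll;
      column year students;
    run;

    ods pdf close;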
BB11 : A macro to batch read html files and generate standard output for the facility of the BISR from KUMC
Chuanwu Zhang, University of Kansas Medical Center
John Keighley, University of Kansas Medical Center
Byron Gajewski, University of Kansas Medical Center
Monday, 8:00 AM - 8:20 AM, Location: Hoosier AB
The Biostatistics and Informatics Shared Resource (BISR), an operation of the University of Kansas Medical Center (KUMC) Biostatistics department, collaborates with the University of Kansas Cancer Center (KUCC) on new and revised grant applications, providing KUCC statistical services in terms of study design, data management, statistical oversight, and analysis. Clinical trial data is entered and stored in the Velos system using forms for data entry. Exporting forms from Velos and importing them into SAS individually is time-consuming. The data files coming from Velos have the .xls file extension but are actually html files. A macro was developed to automatically read the data from the html files that exist in a folder into SAS datasets and csv files simultaneously. The macro only needs file path information, such as the location of the folder that contains the data from the forms. It also generates a summary report for each export/import cycle with built-in error checks to ensure data quality of the conversion. In addition, a second macro is used to summarize the demographics and adverse events forms. The macros increase productivity by reducing the time required to move the data from storage to analysis and also give basic reports needed for any trial automatically. SAS or JMP products: SAS 9.4 TS Level 1M1. Operating system or SAS version dependency: Windows Version 3.2.9200. Intended audience: academic users with basic SAS knowledge who analyze clinical data and work with data transfer periodically.
BB12 : Time is of Essence: The power of MISS is NOT being missed!
Gowri Madhavan, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
Monday, 8:30 AM - 8:50 AM, Location: Hoosier AB
Faced with managing data from an external database such as REDCap (Research Electronic Data Capture) for customers, one often needs to analyze data and perform quick data manipulation for validation, error checking, and running reports. REDCap provides the ability to export data into SAS datasets using macro-driven code. Time is always of the essence, and as your clock is ticking away, let the NMISS and CMISS twin functions take charge of your data: they display the number of missing values for each variable and the count of missing values for each observation. The NMISS function used with a PROC MEANS statement displays missing values for each variable. This, coupled with the CMISS function, can store the number of missing values for both numeric and character variables for each observation. These functions are powerful when used together to generate, using a simple mathematical comparison, a SAS dataset containing the observations with missing values. This output can be further enhanced with traffic-light color coding in PROC REPORT with ODS to highlight the observations with missing values versus validated ones for your end-use customers. A bonus step is to delete observations from a dataset when all or most of the variables have missing data, by using custom macros to manage the variables and invoking the macro variable in a DATA step. This paper will give you the bandwidth to perform multiple functions.
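As a hedged sketch of the idea (the data set name is hypothetical), NMISS and CMISS can be combined along these lines:

    /* Per-variable counts of missing values for numeric variables */
    proc means data=redcap_export n nmiss;
    run;

    /* Per-observation count of missing values across character and numeric variables */
    data flagged;
      set redcap_export;
      /* _ALL_ includes N_MISSING itself, which is still missing here, hence the -1 */
      n_missing = cmiss(of _all_) - 1;
      if n_missing > 0;
    run;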
BB14 : WAPtWAP, but remember TMTOWTDI
Jack Shoemaker, MDwise
Monday, 9:00 AM - 9:20 AM, Location: Hoosier AB
Writing a program that writes a program (WAPtWAP) is a technique that allows one to solve computing problems in a flexible and dynamic fashion. Of course, there is more than one way to do it (TMTOWTDI). This presentation will explore three WAPtWAP methods available in the SAS system.
BB15 : Anatomy of a Merge Gone Wrong
Josh Horstman, Nested Loop Consulting
Monday, 2:30 PM - 2:50 PM, Location: Hoosier AB
The merge is one of the SAS programmer's most commonly used tools. However, it can be fraught with pitfalls to the unwary user. In this paper, we look under the hood of the data step and examine how the program data vector works. We see what's really happening when datasets are merged and how to avoid subtle problems.
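One classic pitfall of the kind the paper examines, sketched here with hypothetical VISITS and LABS data sets: when both inputs carry a same-named variable, the value read last silently overwrites the earlier one in the program data vector.

    data merged;
      merge visits(in=in_v) labs(in=in_l);   /* both contain WEIGHT */
      by subjid visitnum;
      /* WEIGHT now holds the LABS value wherever both data sets contribute,
         because LABS is listed last on the MERGE statement */
      if in_v and in_l;
    run;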
BB16 : Solving Common PROC SQL Performance Killers when using ODBC
John Schmitz, Luminare Data
Tuesday, 1:30 PM - 2:20 PM, Location: Hoosier AB
The PROC SQL routine is commonly used by many SAS programmers. However, poorly formed queries can create needless performance killers. This paper reviews common SQL coding practices observed in code reviews and provides simple alternatives that can dramatically reduce query run time and required resources. This paper will focus on techniques that should be comfortable to SQL programmers and yet can shave 75 percent or more off of query run times.
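One common remedy in this area is explicit pass-through, so filtering happens in the database rather than after rows are pulled into SAS; the DSN, table, and column names below are hypothetical and not taken from the paper.

    proc sql;
      connect to odbc as db (dsn="warehouse");
      create table work.claims_2016 as
        select * from connection to db
          ( select claim_id, paid_amt
            from claims
            where service_year = 2016 );   /* the WHERE clause runs on the database side */
      disconnect from db;
    quit;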
BB17 : Five Little Known, But Highly Valuable and Widely Usable, PROC SQL Programming Techniques
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 9:30 AM - 10:20 AM, Location: Hoosier AB
The SQL Procedure contains a number of powerful and elegant language features for SQL users. This presentation highlights five little known, but highly valuable and widely usable, topics that will help users harness the power of the SQL procedure. Topics include using PROC SQL to identify FIRST.row, LAST.row and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture information about the processes occurring during query evaluation, including the algorithm selected and used by the optimizer when processing a query, testing and debugging operations, and other processes.
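As one small, hedged example of the value-list technique mentioned above (using SASHELP.CLASS as stand-in data):

    proc sql noprint;
      select quote(strip(name))
        into :name_list separated by ','
        from sashelp.class;
    quit;

    /* &NAME_LIST now holds "Alfred","Alice",... and can be searched or used in an IN () clause */
    %put &=name_list;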
BB18 : SAS Advanced Programming with Efficiency in Mind: A Real Case Study
Lingqun Liu, University of Michigan
Tuesday, 9:30 AM - 10:20 AM, Location: Hoosier AB
This paper uses a real work example to demonstrate the concept and some basic tips of SAS programming efficiency. The first section of the paper introduces the background of a SAS application and its performance metrics. The second section analyzes the structure and features of the SAS application. The third section analyzes the log of the application to identify efficiency issues; a log analysis utility is also introduced in this section. The fourth section provides a re-developed version of the application with improved performance, reducing its runtime by 99.6%. The last section tries to raise awareness of SAS programming efficiency and suggests some basic tips. The application discussed in the paper has been tested with SAS 9.2, 9.3 and 9.4 on Windows machines. The target audience includes SAS programmers from beginner to advanced level.
BB19 : How to Speed Up Your Validation Without Really Trying
Alice Cheng, --
Michael Wise, Experis
Justina Flavin, SimulStat
Monday, 10:30 AM - 11:20 AM, Location: Hoosier AB
To ensure the quality of data in clinical trials as well as other disciplines, double programming is often performed: datasets created independently by two different programmers using the same specifications are compared to ensure that they are identical. The COMPARE procedure in SAS® offers a relatively simple but powerful tool to achieve this goal. In this paper, the authors provide an introduction to the COMPARE procedure. They then describe techniques to speed up the validation process, culminating with a SAS® macro that takes advantage of the full range of the COMPARE procedure's capabilities.
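A bare-bones starting point for this kind of comparison (the data set names are hypothetical; the paper's macro adds much more) might look like:

    proc compare base=prod.adsl compare=qc.adsl
                 listall criterion=1e-8 maxprint=(50,500);
      id usubjid;   /* match observations by key rather than by position */
    run;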
BB20 : Leads and Lags: Static and Dynamic Queues in the SAS® DATA STEP
Mark Keintz, Wharton Research Data Services
Monday, 3:00 PM - 3:50 PM, Location: Hoosier AB
From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. While SAS has long provided a LAG function, it has no analogous lead function - an especially significant problem in the case of large data series. This paper will (1) review the lag function, in particular the powerful, but non-intuitive implications of its queue-oriented basis, (2) demonstrate efficient ways to generate leads with the same flexibility as the lag function, but without the common and expensive recourse to data re-sorting, and (3) show how to dynamically generate leads and lags through use of the hash object.
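For readers new to the problem, a simple (non-hash) way to build a one-step lead, assuming a single hypothetical PRICES series, is to re-read the data set starting at its second observation:

    data with_lead;
      merge prices
            prices(firstobs=2 keep=close rename=(close=close_lead));
      /* the final observation gets a missing CLOSE_LEAD; BY-group boundaries
         need extra handling, which is where the hash approach comes in */
    run;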
BB21 : Finding National Best Bid and Best Offer - Quote by Quote
Mark Keintz, Wharton Research Data Services
Tuesday, 3:00 PM - 3:50 PM, Location: Hoosier AB
U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS) and the Consolidated Quote System (CQS). CQS contains every updated quote from each of these exchanges, covering some 8,500 stock tickers. It provides the basis by which brokers can honor their fiduciary obligation to investors to execute transactions at the best price, i.e. at the NBBO (National Best Bid or Best Offer). With the advent of electronic exchanges and high frequency trading (timestamps are published to the microsecond), data set size (approaching 1 billion quotes requiring 80 gigabytes of storage for a normal trading day) has become a major operational consideration for market behavior researchers recreating NBBO values. This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination to provide the NBBO for each ticker at each time point in the trading day.
BB22 : From Stocks to Flows: Using SAS® HASH objects for FIFO, LIFO, and other FO's
Mark Keintz, Wharton Research Data Services
Tuesday, 10:30 AM - 11:20 AM, Location: Hoosier AB
Tracking gains or losses from the purchase and sale of diverse equity holdings depends in part on whether stocks sold are assumed to be from earliest lots acquired (a FIFO list) or the latest lots acquired (LIFO). Other inventory tracking applications have a similar need for application of either FIFO or LIFO rules. This presentation shows how a collection of simple ordered hash objects, in combination with a hash-of-hashes, is a made-to-order technique for easy data-step implementation of FIFO, LIFO, and other less-likely rules (e.g. HIFO and LOFO).
BB23 : Automating the Process of Generating Publication Quality Regression Tables through SAS/Base Programming
Ji Qi, University of Michigan Health System
Monday, 10:00 AM - 10:20 AM, Location: Wolverine AB
Regression analysis is a widely used method for scientific investigation in biomedical and public health research. Most academic journals have specific requirements on the standard reporting of regression results. The output from major SAS/STAT regression procedures, however, does not usually meet such requirements and is therefore not directly usable for final presentation. While it is acceptable to manually edit the default output, such tasks can easily become arduous and error-prone when a large number of regression models are fitted and reported. This paper describes several SAS/Base programming tools for automating the process of generating publication quality regression tables. We start by reviewing the reporting rules specified in major journals such as the Journal of the American Medical Association and the American Journal of Public Health. Next, we provide a detailed illustration of generating regression tables using the publicly available Behavioral Risk Factor Surveillance System (BRFSS) data. Key steps to be discussed include using PROC FORMAT to easily manipulate the reference group for categorical predictors; applying ODS statements to extract key components of regression output; utilizing macro variables to store additional regression parameters to be used for titles/footnotes; and making use of various string functions (e.g., PUT, STRIP, concatenation) in DATA steps to format regression estimates and other related statistics. Both linear and logistic regression cases will be used to demonstrate the application of such programming techniques in constructing regression tables.
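A hedged sketch of the ODS extraction step described above, using hypothetical BRFSS-style variable names:

    ods output ParameterEstimates=pe OddsRatios=or;
    proc logistic data=brfss;
      class smoker(ref='No') / param=ref;       /* PROC FORMAT can also be used to control the reference level */
      model diabetes(event='Yes') = smoker age;
    run;

    /* PE and OR are now ordinary data sets that DATA step string functions
       (PUT, STRIP, concatenation) can shape into the journal's table layout */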
BB26 : Implementing a Medicaid Claims Processing System using SAS® software A Novel Implementation of BASE SAS® Using Limited Resources In An Effective Manner
Stephen Devoyd, Ohio Department of Developmental Disabilities
Tuesday, 2:30 PM - 2:50 PM, Location: Hoosier AB
When you think of Base SAS® what comes to mind? A processor of huge amounts of data which can be presented in a myriad of ways? Charts and graphs, detailed spreadsheets? Perhaps you think of advanced statistical measures? The Ohio Department of Developmental Disabilities (ODoDD) utilizes Base SAS® in a novel or unconventional way. The purpose of this abstract is to explain how ODoDD utilizes SAS® to support claims processing for over 40,000 individuals served by almost 15,000 Medicaid waiver providers for each of the 4 Medicaid Waivers administered by ODoDD. Each week over 6,000 claims files are collected electronically at the DODD secured website. These files are processed by a series of complex programs which evaluate an average of 600,000 claims weekly using data from SAS datasets which range in size from a few records to over 200,000,000 records. After ODoDD completes initial editing of the data, the claims are sent electronically to the Ohio Department of Medicaid (ODM) for review and adjudication. The resulting adjudication file returned to ODoDD contains approved and denied claims. The claims marked to be paid are then processed by ODoDD's SAS® payment processing programs. These SAS® payment programs create vouchers for Medicaid providers for over 27,000,000 claims submitted annually. This volume represents over $1,500,000,000 in annual reimbursements to the provider community. The process outlined above is accomplished using Base SAS® software, leveraging minimal SAS® programming FTEs to support this dynamic and ever-changing environment. This paper and presentation will include: - Overall system architecture, including SAS instances running in virtual server environments - ODoDD's use of SAS®, SAS® ODS, and SAS® SQL - How ODoDD leverages SAS® to accomplish tasks in an efficient manner - Issues and lessons learned by ODoDD, including dealing with size and complexities and what ODoDD has done to address them. No dependencies on operating system or version; the skill level is anyone. We do not use the Enterprise version of SAS®.
BB27 : A Macro that can Search and Replace String in your SAS Programs
Ting Sa, Cincinnati Children's Hospital Medical Center
Monday, 2:00 PM - 2:20 PM, Location: Hoosier AB
In this paper, a SAS macro is introduced that can search for and replace any string in SAS programs. To use the macro, the user only needs to pass it the folder name and the search string. If the user wants to use the replacement function, the user also needs to pass the replacement string to the macro. The macro will check all the SAS programs in the folder and its subfolders to find out which files contain the search string. The macro generates new SAS files for the replacement so that the old files are not affected. An html report is generated by the macro that includes the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with the search string highlighted in yellow. If you use the replacement string function, the html report will also include the location information for the new SAS files. The location information in the html report is created with hyperlinks, so the user can directly open the files from the report.
BB28 : A Macro That Can Fix Data Length Inconsistency and Detect Data Type Inconsistency
Ting Sa, Cincinnati Children's Hospital Medical Center
Tuesday, 4:00 PM - 4:20 PM, Location: Hoosier AB
Common tasks that we need to perform are merging or appending SAS® data sets. During this process, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve a lot of fields and data sets, we need to spend a lot of time identifying those fields and writing extra SAS code to solve the issues. However, the macro in this paper can help you identify the fields that have inconsistent data type or length issues. It also solves the length issues automatically by finding the maximum field length among the current data sets and assigning that length to the field. An html report is generated after running the macro that includes information about which fields' lengths have been changed and which fields have inconsistent data type issues.
BB29 : A DDE Macro to Put Data Anywhere in Excel
Ting Sa, Cincinnati Children's Hospital Medical Center
Shiran Chen, Cincinnati Children's Hospital Medical Center
Monday, 9:30 AM - 9:50 AM, Location: Wolverine AB
In this paper, the authors introduce a SAS macro that integrates SAS DDE techniques so you can use this macro to put data anywhere in your Excel file. SAS users only need to prepare the input data and call the macro for the data to be automatically inserted into the Excel file. This macro will be very helpful if you have large amounts of data to be inserted into different cells in the Excel file.
BB30 : A Failure to EXIST: Why Testing for Data Set Existence with the EXIST Function Alone Is Inadequate for Serious Software Development in Asynchronous, Multi-User, and Parallel Processing Environments
Troy Hughes, Datmesis Analytics
Tuesday, 9:00 AM - 9:20 AM, Location: Hoosier AB
The Base SAS® EXIST function demonstrates the existence (or lack thereof) of a data set. Conditional logic routines commonly rely on EXIST to validate data set existence or absence before subsequent processes can be dynamically executed, circumvented, or terminated based on business logic. In synchronous software design where data sets cannot be accessed by other processes or users, EXIST is both a sufficient and reliable solution. However, because EXIST captures only a split-second snapshot of the file state, it provides no guarantee of file state persistence. Thus, in asynchronous, multi-user, and parallel processing environments, data set existence can be assessed by one process but instantaneously modified (by creating or deleting the data set) thereafter by a concurrent process, leading to a race condition that causes failure. Due to this vulnerability, most classic implementations of the EXIST function within SAS literature are insufficient for testing data set existence in these complex environments. This text demonstrates more reliable and secure methods to test SAS data set existence and perform subsequent, conditional tasks in asynchronous, multi-user, and parallel processing environments.
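The classic single-check idiom the paper critiques looks roughly like the following (the macro and data set names are hypothetical); the paper's point is that in concurrent environments the file state can change between the test and the use.

    %macro summarize_if_exists(ds);
      %if %sysfunc(exist(&ds)) %then %do;
        proc means data=&ds;
        run;
      %end;
      %else %put NOTE: &ds was not found - skipping.;
    %mend summarize_if_exists;

    %summarize_if_exists(work.monthly_totals)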
BB31 : Stress Testing and Supplanting the SAS® LOCK Statement: Implementing Mutex Semaphores To Provide Reliable File Locking in Multi-User Environments To Enable and Synchronize Parallel Processing
Troy Hughes, No Affiliation
Tuesday, 11:30 AM - 11:50 AM, Location: Hoosier AB
The SAS® LOCK Statement was introduced in SAS version 7 with great pomp and circumstance, as it enabled SAS software to lock data sets exclusively. In a multi-user or networked environment, an exclusive file lock prevents other users or processes from accessing and accidentally corrupting a data set while it is in use. Moreover, because file lock status can be tested programmatically with the LOCK statement return code (&SYSLCKRC), data set accessibility can be validated before attempted access, thus preventing file access collisions and facilitating more reliable, robust software. Notwithstanding the intent of the LOCK statement, stress testing demonstrated in this text illustrates vulnerabilities in the LOCK statement that render its use inadvisable due to its inability to reliably lock data sets, its only purpose. To overcome this limitation and enable reliable data set locking, a methodology is demonstrated that utilizes dichotomous semaphores, or flags, that indicate whether a data set is available or is in use, and mutually exclusive (mutex) semaphores that restrict data set access to a single process at one time. With Base SAS file locking capabilities now restored, this text further demonstrates control table locking to support process synchronization and parallel processing. The SAS macro LOCKITDOWN is included and demonstrates busy-waiting (or spinlock) cycles that repeatedly test data set availability until file access is achieved or a process times out.
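For context, the basic LOCK idiom that the stress testing targets is sketched below (the library, data set, and macro names are hypothetical); the paper's semaphore approach is designed to supplant reliance on this return code.

    %macro append_to_control;
      lock mylib.control;                        /* request an exclusive lock */
      %if &syslckrc = 0 %then %do;               /* 0 indicates the lock was obtained */
        proc append base=mylib.control data=work.new_rows;
        run;
        lock mylib.control clear;                /* release the lock */
      %end;
      %else %put WARNING: mylib.control is in use (SYSLCKRC=&syslckrc).;
    %mend append_to_control;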
BB32 : All Aboard! Next Stop is the Destination Excel
William E Benjamin Jr, Owl Computer Consultancy LLC
Monday, 2:00 PM - 2:50 PM, Location: Wolverine AB
Over the last few years both Microsoft Excel file formats and the SAS® interfaces to those Excel formats have changed. SAS® has worked hard to make the interface between the two systems easier to use. Starting with Comma Separated Values (CSV) files and moving to PROC IMPORT and PROC EXPORT, LIBNAME processing, SQL processing, SAS® Enterprise Guide®, JMP®, and then on to the HTML and XML tagsets like MSOFFICE2K and EXCELXP. Well, there is now a new entry into the processes available for SAS users to send data directly to Excel. This new entry into the ODS arena of data transfer to Excel is the ODS destination called EXCEL. This process is included within SAS ODS and produces native format Excel files for version 2007 of Excel and later. It was first shipped as an experimental version with the first maintenance release of SAS® 9.4. This ODS destination has many features similar to the EXCELXP tagsets.
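A minimal, hedged example of the destination in use (the output file name and titles are illustrative only):

    ods excel file="class_report.xlsx"
              options(sheet_name="Graph" embedded_titles="yes");

    title "Height by Age";
    proc sgplot data=sashelp.class;
      vbox height / category=age;
    run;

    ods excel options(sheet_name="Listing");
    proc print data=sashelp.class noobs;
    run;

    ods excel close;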
BB34-SAS : Data Analysis with User-Written DS2 Packages
Robert Ray, SAS
Monday, 4:00 PM - 4:50 PM, Location: Hoosier AB
The DATA step and DS2 both offer the user a built-in general purpose hash object that has become the go-to tool for many data analysis problems. However, there are occasions where the best solution would require a custom object specifically tailored to the problem space. The DS2 Package syntax allows the user to create custom objects that can form linked structures in memory. With DS2 Packages it is possible to create lists or tree structures that are custom tailored to the problem space. For data that can describe a great many more states than actually exist, dynamic structures can provide an extremely compact way to manipulate and analyze the data. The SAS® In-Database Code Accelerator allows these custom packages to be deployed in parallel on massive data grids.
Career Development
CD01 : What's Hot - Skills for SAS® Professionals
Kirk Paul Lafler, Software Intelligence Corporation
Tuesday, 9:30 AM - 10:20 AM, Location: Wolverine AB
As a new generation of SAS® user emerges, current and prior generations of users have an extensive array of procedures, programming tools, approaches and techniques to choose from. This presentation identifies and explores the areas that are hot in the world of the professional SAS user. Topics include Enterprise Guide, PROC SQL, PROC REPORT, Output Delivery System (ODS), Macro Language, DATA step programming techniques such as arrays and hash objects, SAS University Edition software, technical support at support.sas.com, wiki-content on sasCommunity.org®, published white papers on LexJansen.com, and other venues.
CD04 : Statistical Volunteering With SAS - Experiences and Opportunities
David Corliss, Peace-Work
Tuesday, 8:00 AM - 8:50 AM, Location: Wolverine AB
This presentation brings together experiences from SAS professionals working as volunteers for organizations, charities, and in academic research. Pro bono work, much like that done by physicians, attorneys, and professionals in other areas, is rapidly growing in statistical practice as an important part of a statistical career, offering the opportunity to utilize your skills in places where they are greatly needed but cannot be supported in a for-pay position. Statistical volunteers also gain important learning experiences, mentoring, networking, and other opportunities for professional development. The presenter will share experiences from volunteering for local charities, NGOs, and other organizations and causes, both in the US and around the world. The mission statements and focus of some organizations are presented, including DataKind, Statisticians Without Borders, Peace-Work, and others.
CD05 : Recruiting and Retention Strategies
Mindy Kiss, Experis
Helen Chmiel, Experis
Andrea Moralez, Experis
Tuesday, 9:00 AM - 9:20 AM, Location: Wolverine AB
SAS Programmers in the IT staffing sector continue to be in high demand, with projections of continued growth of 6% in 2016. Overall IT employment jumped more than 41% from 1999 to 2013. The median unemployment rate for computer and mathematical occupations through 3Q14 was 2.9% vs. 5% overall. It is critical to business strategy to understand the challenges in recruiting and hiring which result from this growth and, more importantly, how to mitigate those challenges. An additional pressure on the staffing sector is that organizations are relying more and more on a contingent workforce. The fierce competition to attract the best and the brightest talent among the limited supply of skilled workers to a market that is increasing in demand results in both a difficult recruiting environment and an equally difficult retention environment. This talk will explore some of the techniques that organizations can employ to stay competitive in both the recruitment and retention of staff. Recruiting strategies that will attract the top players, including individualizing, use of social media, and calculating fit to the organization, are explored. Retention strategies which emphasize employee engagement are also explored, emphasizing current research findings for contingent workers and including several unique approaches to employee engagement.
CD06 : Mentoring and Oversight of Programmers across Cultures and Time Zones
Chad Melson, Experis
Tuesday, 10:30 AM - 10:50 AM, Location: Wolverine AB
The career path of a SAS® programmer often develops in one of two ways, either as a 100% full-fledged programmer or as a manager with little to no opportunity to contribute as a programmer. Jobs that provide the opportunity to do both are not always available, especially in the world of consulting. Fortunately, I had the opportunity to take a role as a liaison between clinical SAS® programmers located in Ukraine and a client located on the US West Coast, while also maintaining my own project responsibilities as a programmer. This liaison position offered me a chance to mentor Ukrainian colleagues and apply and develop my technical and managerial skills. This was a new model for the client and there was some apprehension about the time difference and the unknowns of working with a team from a global region they had no prior experience with. Through specific examples, this paper will identify the skills needed to manage any type of client relationship, regardless of the geographical distance between stakeholders, and the value provided by mentoring to both the mentors and mentees. Also provided will be support for the ideas that in our virtual world, time zone differences can be much less of an issue than anticipated and that there were many more similarities than differences between Ukraine- and US-based teams.
CD07 : How to Use LinkedIn to Effectively Boost Your Career Development
Nate Derby, Stakana Analytics
Tuesday, 11:00 AM - 11:50 AM, Location: Wolverine AB
Most of us have a LinkedIn profile and are connected with other professionals. But how can you make your LinkedIn activity really boost your career development? In this paper, from the perspectives of both a SAS programmer looking for jobs and a consultant looking for projects, I first describe your target market. With this target market in mind, I then describe the process of building up your profile, finding appropriate LinkedIn Groups (and posting to them), publishing on LinkedIn Pulse, and developing an effective network of connections. Finally, let's remember that this is a long-term process and that LinkedIn is best used as an accompaniment for face-to-face meetings and not a substitute for them. If you effectively use these strategies (or develop similar ones on your own), you should have considerably more and better career opportunities than you ever had before!
Data Visualization and Graphics
DV01 : Using Animation to Make Statistical Graphics Come to Life
Jesse Pratt, Cincinnati Children's Hospital Medical Center
Monday, 3:00 PM - 3:20 PM, Location: Buckeye A
The Statistical Graphics (SG) procedures and the Graph Template Language (GTL) are capable of generating powerful individual data displays. What if one wanted to visualize how a distribution changes with different parameters, how data evolve over time, or to view multiple aspects of a three dimensional plot? By utilizing macros to generate a graph for each frame, combined with the ODS PRINTER destination, it is possible to create .gif files that serve as effective animated data displays. This paper outlines the syntax and strategy necessary to generate these displays, as well as provides a handful of examples. Intermediate knowledge of PROC SGPLOT, PROC TEMPLATE, and the SAS® MACRO language is assumed.
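A rough outline of the mechanics (not the author's code), assuming a hypothetical RESULTS data set with one distribution of SCORE per YEAR:

    options papersize=('5in','4in') printerpath=gif
            animation=start animduration=0.5 animloop=yes noanimoverlay;
    ods printer file='score_distribution.gif';

    %macro frames;
      %do year = 2010 %to 2016;                       /* one SGPLOT call per animation frame */
        title "Score distribution, &year";
        proc sgplot data=results(where=(year=&year));
          histogram score;
        run;
      %end;
    %mend frames;
    %frames

    options animation=stop;
    ods printer close;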
DV04 : SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination
Louise Hadden, Abt Associates Inc.
Tuesday, 1:30 PM - 1:50 PM, Location: Wolverine AB
SAS® has an amazing arsenal of tools to use and display geographic information that is relatively unknown and underutilized. High quality GfK Geocoding maps have been provided by SAS since SAS 9.3 M2, as sources for inexpensive map data dried up. SAS has been including both GfK and "traditional" SAS map data sets with licenses for SAS/GRAPH for some time, recognizing there will need to be an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come, as the "traditional" SAS map data sets are no longer being updated. If you visit SAS MapsOnline, you will find only GfK maps in current maps. The GfK maps are updated once a year. This presentation will walk through the conversion of a long-standing SAS program to produce multiple US maps for a data compendium to take advantage of GfK maps. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
DV05 : Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS(R)
Louise Hadden, Abt Associates Inc.
Tuesday, 3:00 PM - 3:20 PM, Location: Wolverine AB
The intrepid Mars Rovers have inspired awe and Curiosity - and dreams of mapping Mars using SAS/GRAPH®. This presentation will demonstrate how to import SHP file data (using PROC MAPIMPORT) from sources other than SAS and GfK to produce useful (and sometimes creative) maps. Examples will include mapping neighborhoods, zcta3 areas, and of course, Mars. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
DV08 : Four Thousand Reports Three Ways
Stephanie Thompson, Rochester Inst. of Technology / Datamum
Monday, 3:30 PM - 3:50 PM, Location: Buckeye A
How do you go about generating over four thousand PDF reports in up to three different versions? When a large, southern research university decided to add up to five optional questions per class section and up to five more questions at the prefix level to their core set of fifteen questions on the student evaluation of faculty survey, it seemed like a project that would never be completed. If the additional questions weren't enough, the reports were being revamped at the same time to improve their appearance for delivery on the web. Each report had a tabular section and two customized box and whisker plots. Thanks to ODBC / SAS Access, PROC SQL, macro, DATA Step programming, PROC GPLOT, goptions, and ODS it all came together. This paper summarizes how each SAS® component was used and contributed to the completion of the project.
DV09 : Using Big Data to Visualize People Movement Using SAS Basics
Stephanie Thompson, Rochester Inst. of Technology / Datamum
Tuesday, 2:00 PM - 2:50 PM, Location: Wolverine AB
Visualizing the movement of people over time in an animation can provide insights that tables and static graphs cannot. There are many options, but what if you want to base the visualization on large amounts of data from several sources? SAS is a great tool for this type of project. This paper will summarize how visualizing movement was accomplished using several datasets, large and small, and the various SAS PROCs used to pull it together. The use of a custom shape file will also be highlighted. The end result is a shareable gif that provides insights not available with other methods.
DV10-SAS : Annotating the ODS Graphics Way!
Dan Heath, SAS
Monday, 2:00 PM - 2:50 PM, Location: Buckeye A
For some users, having an annotation facility is an integral part of creating polished graphics for their work. To meet that need, we created a new annotation facility for the ODS Graphics procedures in SAS® 9.3. Now, with SAS® 9.4, the Graph Template Language (GTL) supports annotation as well! In fact, the GTL annotation facility has some unique features not available in the ODS Graphics procedures, such as using multiple sets of annotation in the same graph and the ability to bind annotation to a particular cell in the graph. This presentation covers some basic concepts of annotating that are common to both GTL and the ODS Graphics procedures. I apply those concepts to demonstrate some unique abilities of GTL annotation. Come see annotation in action!
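For orientation, a tiny example of the annotation data set approach common to both facilities (the text and coordinates are made up):

    data anno;
      length function $10 drawspace $15 label $40 textcolor $10;
      function='text'; drawspace='graphpercent';
      x1=60; y1=90; label='Taller students cluster here';
      textcolor='red'; textsize=10;
      output;
    run;

    proc sgplot data=sashelp.class sganno=anno;
      scatter x=height y=weight;
    run;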
Hands-on Workshops
HW01 : HOW - Visual Analytics
Dave Foster, Pinnacle Solutions
Monday, 1:00 PM - 2:50 PM, Location: Regency E
This hands-on workshop will walk through SAS Visual Analytics functionality and allow users to build reports using SAS demo data.
HW02 : A Hands-on Introduction to SAS® DATA Step Hash Programming Techniques
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 3:00 PM - 4:50 PM, Location: Regency E
SAS software supports a DATA step programming technique known as hash that enables faster table lookup, search, merge/join, and sort operations. This hands-on workshop introduces what a hash object is, how it works, and the syntax required. Attendees learn essential programming techniques to define a simple key, sort data, search memory-resident data using a simple key, match-merge (or join) two data sets, handle and resolve collision scenarios where two distinct pieces of data have the same hash value, as well as more complex programming techniques that use a composite key to search for multiple values.
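A compact example of the simple-key lookup pattern taught in the workshop (the PRODUCTS and SALES data sets are hypothetical):

    data matched;
      length product_desc $40;
      if _n_ = 1 then do;
        declare hash h(dataset:'work.products');   /* lookup table loaded into memory once */
        h.defineKey('product_id');
        h.defineData('product_desc');
        h.defineDone();
        call missing(product_desc);
      end;
      set work.sales;
      if h.find() = 0;    /* keep only sales rows whose PRODUCT_ID exists in the lookup */
    run;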
HW03 : The Joinless Join ~ The Impossible Dream Come True; Expand the Power of Base SAS® and SAS® Enterprise Guide® in a New Way
Kent Phelps, The SASketeers
Ronda Phelps, The SASketeers
Tuesday, 9:00 AM - 10:50 AM, Location: Regency E
Base SAS and SAS Enterprise Guide can easily combine data from tables or data sets by using a PROC SQL Join to match on like columns or by using a DATA Step Merge to match on the same variable name. However, what do you do when tables or data sets do not contain like columns or the same variable name and a Join or Merge cannot be used? We invite you to attend our exciting presentation on the Joinless Join where we will teach you how to expand the power of Base SAS and SAS Enterprise Guide in a new way. We will empower you to creatively overcome the limits of a standard Join or Merge. You will learn how to design a Joinless Join based upon dependencies, indirect relationships, or no relationships at all between the tables or data sets. In addition, we will highlight how to use a Joinless Join to prepare unrelated joinless data to be utilized by ODS and PROC REPORT in creating a PDF. Come experience the power and the versatility of the Joinless Join to greatly expand your data transformation and analysis toolkit. We look forward to introducing you to the surprising paradox of the Joinless Join.
HW04 : Working with the SAS® ODS EXCEL Destination to Send Graphs, and Use Cascading Style Sheets When Writing to EXCEL Workbooks
William E Benjamin Jr, Owl Computer Consultancy LLC
Tuesday, 1:30 PM - 3:20 PM, Location: Regency E
This Hands-On-Workshop will explore the new SAS® ODS EXCEL destination and focus on how to write Excel Worksheets with output from SAS Graph procedures and spice it up using Cascading Style Sheet features available on modern computer systems. Note that the ODS EXCEL destination is a BASE SAS product, which makes it available on all platforms. The workshop will be limited to the Windows platform, but it should be simple to port the code to other operating systems. The code will be on the computers and you will get a chance to see how it handles.
HW05 : Intermediate SAS(r) Macro Programming
Chuck Kincaid, Experis Business Analytics
Monday, 9:00 AM - 10:50 AM, Location: Regency E
The SAS Macro Language powerfully enhances a programmer's capabilities by providing an advanced level of flexibility and robustness to otherwise static programs. Programmers can use the Macro language to create dynamic programs that can be re-used in new situations, driven by the data, dependent upon the operating environment, and made to handle repetition without extensive coding. This Hands-On Workshop will explore some of the more advanced capabilities with the SAS Macro Language that may not be familiar to those just starting in the language. Some of the features explored include understanding variable scope, interacting with data steps, evaluating expressions, quoting variables, macro looping, macro debugging, and creating data driven programs. Basic experience with macro programming is assumed.
Pharmaceutical Applications
PH01 : Pre-Data Checks for SDTM Development
Abhinav Srivastva, PaxVax Inc
Tuesday, 10:00 AM - 10:20 AM, Location: Regency G
In clinical trials SDTM development is a critical step in transforming raw data coming from CRFs into a standard format for FDA submission and building the foundation for downstream analysis and reporting (ADaM, TLFs, Define.xml etc). OpenCDISC (or Pinnacle 21) is a common tool used to validate SDTM datasets, but not enough effort is dedicated towards pre-processing the data coming from CRFs. The paper is targeted towards identifying basic data integrity checks that can be done at an early stage to get a full awareness of the data quality before proceeding with SDTM development. These checks are driven by CDISC compliance standards and are meant as pointers for SDTM programmers to initiate the investigation by raising a query with cross-functional teams including Clinical Data Management, vendors, and site representatives. Doing early quality checks on raw data provides an in-depth overview of how the final SDTM datasets will perform against the CDISC compliance tests. Additionally, it provides an extra layer of edit checks beyond those which may already be in place within the Electronic Data Capture (EDC) system used for the clinical study, such as Medidata RAVE or InForm. Technical requirements: Base SAS V9, Windows OS. Audience: Clinical / SDTM / Statistical programmers in the pharmaceutical industry.
PH03 : Surviving septic shock: How SAS helped a critical care nursing staff fulfill its septic shock reporting requirements
Joe Palmer, OhioHealth
Tuesday, 10:30 AM - 10:50 AM, Location: Regency G
Sepsis is the result of a body's response to a bacterial or fungal infection and often results in organ damage and subsequent mortality. Starting in October 2015, The Centers for Medicare & Medicaid Services (CMS) requires hospitals to collect and submit data for patients that have severe sepsis and septic shock. Reporting requirements include pinpointing the time in which 2 indicators of Systemic Inflammatory Response Syndrome (SIRS) and 1 indicator of organ failure occur within a 6 hour period. In total, there are 4 indicators of SIRS and 8 indicators of organ failure. In addition, hospitals must report the time in which treatment was started. Manually looking up diagnostic and treatment times in the electronic medical record (EMR) system of a hospital is a lengthy process, even for an experienced nurse. A SAS program was created to query clinical data in a database fed by an electronic medical record system. The program generates an automated report which contains the required diagnosis and treatment times, thus allowing nurses to assess required data in less than 15 minutes per patient.
PH04 : Establishing Similarity of Modeled and Experimental PK/PD Hysteretic Loops using Pseudo Time Series Analysis and Dynamic Time Warping
Ronald Smith, Scientific Statistical Solutions
Tuesday, 9:30 AM - 9:50 AM, Location: Regency G
An iterative Bezier curve algorithm is used to approximate a hysteretic loop from a pharmacodynamic (PD) dose/response curve. The accuracy of the resulting vector-valued function is tested using graphs produced through PROC SGPLOT. The centroid of each curve is found through PROC IML. Euclidean distances from each centroid to points on the respective polygonal curves (experimental and model) produce several distance vectors for each curve. A sequence of these distance vectors is plotted as a pseudo time series for both the original experimental and the adapted model curves. The graphical similarity of these curves is examined using the Dynamic Time Warping (DTW) algorithm in PROC SIMILARITY. The information produced from the DTW process gives a numerical measure of how well the model follows the experimental curve. The parametric equations from this a posteriori model can form a composition with a time-based function to produce the final PK/PD effect versus time equations.
PH05 : Fitting Complex Statistical Models with PROCs NLMIXED and MCMC
Robin High, University of Nebraska Medical Center
Tuesday, 8:30 AM - 9:20 AM, Location: Regency G
SAS/STAT® software has several procedures that estimate parameters from generalized linear models designed for both continuous and discrete response data, including proportions and counts. Procedures such as PROCs LOGISTIC, GENMOD, COUNTREG, GLIMMIX, and FMM, among others, offer a flexible range of analysis options to work with data from a variety of distributions and also with correlated or clustered data. Recent versions of SAS have procedures that model zero-inflated and truncated data. This paper demonstrates how statements from PROC NLMIXED can be written to match the output results from these procedures, including the LSMeans. Situations may arise where the flexibility of PROC NLMIXED programming statements is needed, such as for zero-inflated or truncated counts, or proportions with random effects. A useful application of these coding techniques is that several programming statements from NLMIXED can be directly transferred into PROC MCMC to perform analyses from a Bayesian perspective with these various types of complex models.
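To illustrate the general idea (a simplified sketch, not the author's code; the dataset and variables y and x are hypothetical), the same programming statements that define a likelihood in PROC NLMIXED can be carried into PROC MCMC once priors are added; here for a simple Poisson regression:

/* Poisson regression expressed with NLMIXED programming statements */
proc nlmixed data=work.counts;
   parms b0=0 b1=0;
   lambda = exp(b0 + b1*x);
   model y ~ poisson(lambda);
run;

/* The same model statements moved to PROC MCMC, with priors added */
proc mcmc data=work.counts nmc=20000 seed=2016;
   parms b0 0 b1 0;
   prior b0 b1 ~ normal(0, var=100);
   lambda = exp(b0 + b1*x);
   model y ~ poisson(lambda);
run;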
PH06 : Frequentist and Bayesian Interim Analysis in Clinical Trials: Group Sequential Testing and Posterior Predictive Probability Monitoring Using SAS
Kechen Zhao, University of Southern California
Tuesday, 11:00 AM - 11:50 AM, Location: Regency G
Phase II/III trials focus on the evaluation of the compound's therapeutic effects and how well the compound performs at the recommended dose determined in Phase I trials. The goal of a typical Phase II oncological trial is to quickly screen compounds primarily based on their short-term efficacious effects. Interim monitoring is an important component of most Phase II/III clinical trials, with the goal of stopping a trial early for efficacy or for futility. Repeated hypothesis tests at a fixed level on accumulating data, however, inflate the overall Type I error rate. To control Type I error, frequentist designs typically employ group sequential methods with alpha- or beta-spending functions, such as the Pocock and the O'Brien-Fleming (OBF) methods, among others. These inflexible study designs, however, can be difficult to follow exactly because the interim data must be evaluated at pre-specified time points or numbers of patients. In contrast, Bayesian methods, such as the Predictive Probability (PP) design, allow for a flexible, continuous monitoring schedule with any number of stages and cohort sizes, which is more suitable in many clinical settings. This paper reviews the theory of group sequential testing and demonstrates its usage in the SEQDESIGN and SEQTEST procedures. The paper also reviews the theory of predictive probability and demonstrates its usage and the calibration of tuning parameters for achieving a targeted Type I error rate/statistical power through extensive simulations in a SAS macro developed by the author.
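A minimal sketch of the frequentist side (the reference values are assumptions, not the author's design): PROC SEQDESIGN can generate an O'Brien-Fleming group sequential boundary, which PROC SEQTEST later compares against the observed statistic at each interim look.

proc seqdesign altref=0.20;
   OBF: design nstages=3
        method=obf
        alt=upper
        stop=reject;
   samplesize model=twosamplefreq(nullprop=0.30 test=prop);
   ods output boundary=work.obf_boundary;   /* boundary information to pass to PROC SEQTEST */
run;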
Rapid Fire
RF01 : Two Shades of Gray: Implementing Gray's Test of Equivalence of CIF in SAS 9.4
Zachary Weber, Grand Valley State University
Tyler Ward, Student
Tuesday, 4:15 PM - 4:25 PM, Location: Buckeye B
Prior to SAS 9.4, the only way to test for equivalence of cumulative incidence functions (CIFs) for competing risks in survival analysis was to use the %CIF macro. With SAS 9.4, this test can be achieved easily using PROC LIFETEST. This paper will highlight, using a dataset of survival data, the two methods of completing this task in SAS, with emphasis on the differing output and how the calculations vary slightly. The goal is for the user to save time by switching from the macro and beginning to use the standardized method through PROC LIFETEST.
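A hedged sketch of the SAS 9.4 approach (the dataset and variable names are hypothetical): the EVENTCODE= option requests cumulative incidence function estimation for competing risks, and the STRATA statement produces Gray's test comparing the CIFs across groups.

proc lifetest data=work.bmt plots=cif;
   time dftime * status(0) / eventcode=1;   /* 0 = censored, 1 = event of interest, 2 = competing risk */
   strata group;                            /* Gray's test of equality of CIFs across groups */
run;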
RF02 : Utilizing PROC CONTENTS with Macro Programming to Summarize and Tabulate Copious Amounts of Data
Kathryn Schurr, Quest Diagnostics
Tuesday, 1:15 PM - 1:25 PM, Location: Buckeye B
As data becomes more abundant, challenges arise when the data needs to be tabulated and summarized but coding these univariate explorations is long and tedious. By using PROC CONTENTS on similarly formatted fields within a data source and implementing macro programming, a user can produce hundreds of frequency tables or means tables in 40 lines of code or less.
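One possible shape for such a program (a sketch under assumed dataset and prefix names, not the author's macro): PROC CONTENTS lists the similarly named fields, and a macro %DO loop runs PROC FREQ over each of them.

%macro freq_all(ds=work.survey, prefix=q);
   %local varlist i var;

   /* collect the names of all variables starting with the prefix */
   proc contents data=&ds out=work._vars(keep=name) noprint;
   run;

   proc sql noprint;
      select name into :varlist separated by ' '
      from work._vars
      where upcase(substr(name, 1, %length(&prefix))) = upcase("&prefix");
   quit;

   %do i = 1 %to %sysfunc(countw(&varlist));
      %let var = %scan(&varlist, &i);
      proc freq data=&ds;
         tables &var / missing;
      run;
   %end;
%mend freq_all;

%freq_all(ds=work.survey, prefix=q)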
RF03 : A Quick Macro to Replace Missing Values with Null for Numeric Fields within a CSV File
John Schmitz, Luminare Data
Tuesday, 3:15 PM - 3:25 PM, Location: Buckeye B
When large file uploads to RDBMS systems are required, CSV file imports can provide a faster alternative to using SAS to insert the records. However, some RDBMS systems will convert null CSV columns to 0, creating a potentially undesirable result. This macro uses the original dataset and resulting CSV file to identify and replace missing values with NULL for all numeric columns in the CSV output.
RF04 : Importing CSV Data to All Character Variables
Art Carpenter, CA Occidental Consultants
Tuesday, 1:45 PM - 1:55 PM, Location: Buckeye B
Have you ever needed to import data from a CSV file and found that some of the variables have been incorrectly assigned to be numeric? When this happens to us we may lose information and our data may be incomplete. When using PROC IMPORT on an EXCEL file we can avoid this problem by specifying the MIXED=YES option to force all the variables to be character. This option is not available when using IMPORT to read a CSV file. Increasing GUESSINGROWS can help, but what if it is not sufficient? It is possible to force IMPORT to only create character variables. Although there is no option to do this, you can create a process that only creates character variables. The process is easily automated so that no intervention is required on the user's part.
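The paper automates this with PROC IMPORT; as a simpler manual illustration of the goal (an assumed five-column file, not the author's automated process), every field can be read as character in a DATA step:

data work.allchar;
   infile 'C:\data\survey.csv' dsd truncover firstobs=2 lrecl=32767;
   length col1-col5 $200;            /* one character variable per CSV column */
   input (col1-col5) (:$200.);
run;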
RF05 : Fitting a Cumulative Logistic Regression Model
Shana Kelly, Spectrum Health
Tuesday, 2:00 PM - 2:10 PM, Location: Buckeye B
Cumulative logistic regression models are used to predict an ordinal response, and have the assumption of proportional odds. Proportional odds means that the coefficients for each predictor category must be consistent, or have parallel slopes, across all levels of the response. The paper uses a sample dataset to demonstrate how to test the proportional odds assumption, and use the UNEQUALSLOPES option when the assumption is violated. A cumulative logistic model is built, and then the performance of the cumulative logistic model on a test set is compared to the performance of a generalized multinomial model. This shows the utility and necessity of the UNEQUALSLOPES option when building a cumulative logistic model. The procedures shown are produced using SAS® Enterprise Guide 7.1.
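A minimal sketch of the two fits the paper compares (the dataset and variable names are hypothetical): the proportional-odds model, whose output includes the score test of that assumption, and a refit with UNEQUALSLOPES for a predictor that violates it.

/* Cumulative logit model; the score test for proportional odds appears in the output */
proc logistic data=work.train;
   class treatment / param=ref;
   model severity = age treatment;
run;

/* Relax the parallel-slopes restriction for a predictor that fails the test */
proc logistic data=work.train;
   class treatment / param=ref;
   model severity = age treatment / unequalslopes=(treatment);
run;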
RF06 : Your Place or Mine: Data-Driven Summary Statistic Precision
Nancy Brucken, InVentiv Health Clinical
Tuesday, 2:15 PM - 2:25 PM, Location: Buckeye B
The number of decimal places required for displaying summary statistics on a specific parameter is generally a function of the precision of the raw values collected for that parameter and the specific summary statistic requested. This logic can easily be hard-coded for tables displaying only a limited number of parameters, but becomes more difficult to maintain for tables displaying many such parameters. This paper presents a data-driven solution to the problem of displaying summary statistics at the correct level of precision for a larger number of parameters, such as are commonly found on summaries of clinical laboratory data.
RF07 : An Application of CALL VNEXT
John King, Ouachita Clinical Data Services, Inc.
Tuesday, 2:30 PM - 2:40 PM, Location: Buckeye B
VNEXT is a SAS® call routine that allows the examination of the program data vector (PDV); each call to the routine alters the values of its arguments to return information about each variable in the PDV. The data returned are the variable name and, optionally, the variable type and length. This Rapid Fire talk presents a short program that uses VNEXT to write a CSV file that includes the name row followed by the data rows, all in one very short DATA step.
RF08 : The Power of Cumulative Distribution Function (CDF) Plot in Assessing Clinical Outcomes
Patricia Kultgen, Cook Research Inc.
Min Chen, Cook Research Inc.
Tuesday, 2:45 PM - 2:55 PM, Location: Buckeye B
Efficacy endpoints (e.g., inflammatory lesions, incontinence episodes, cognitive performance scores) are generally collected quantitatively at baseline and follow-up windows in clinical trials. One popular way of assessing efficacy is to calculate the percentage change from baseline, and then use a prospectively determined criterion, such as at least 50% improvement, to define success. One caveat of this method is that the pre-defined criterion is generally subjective and might not be the best for specific studies. This is especially true for trials at exploratory phases. In addition, the dichotomized outcome does not tell the whole story. The purpose of this paper is to encourage the use of the Cumulative Distribution Function (CDF) plot to assess distributions, display the complete data, and evaluate outcomes at multiple standards. All the graphs and statistics are generated using SAS 9.2 on the Windows operating system, and only basic SAS skills are required to follow the output here.
RF09 : Hedonic House Price Project: Quantifying relative importance of crime rate on House price
Aigul Mukanova, University of Cincinnati
Tuesday, 3:00 PM - 3:10 PM, Location: Buckeye B
As part of an Urban and Regional Economics class at the University of Cincinnati, students were required to complete a small empirical project based on a hedonic house price model. Using SAS, the author attempts to measure the effect of crime on house values in Ohio.
RF10 : I Have a Secret: How Can I Hide Small Numbers from Public View?
Fred Edora, South Carolina Department of Education
Tuesday, 1:30 PM - 1:40 PM, Location: Buckeye B
Due to many federal, state, and local regulations, maintaining privacy is not simply a requirement but also of utmost importance when reporting education data to the public. Special education data is no exception. Because of the highly sensitive nature of data associated with students, we must be vigilant in ensuring that data is transparent while also meeting all necessary privacy regulations. In most cases surrounding publicly reported data, numbers 10 and under must stay hidden and cannot be reverse engineered. How can this be achieved? SAS is no stranger to privacy and has multiple methods for hiding small numbers in publicly reported data. This paper will cover those methods and the pros and cons of each, and is intended for beginning or intermediate SAS users.
RF11 : Tips and Tricks for Producing Time-Series Cohort Data
Nate Derby, Stakana Analytics
Tuesday, 3:30 PM - 3:40 PM, Location: Buckeye B
Time-Series cohort data are needed for many applications where you're tracking performance over time. We show some simple coding techniques for effectively and efficiently producing time-series cohort data with SAS®.
RF12 : Generating Custom Shape Files for Data Visualization
Stephanie Thompson, Rochester Inst. of Technology / Datamum
Tuesday, 3:45 PM - 3:55 PM, Location: Buckeye B
There are times you may want to represent your data graphically but the existing options do not work. Using a custom shape file or even a file with coordinates you create yourself can get you what you want. This quick presentation will show you how.
RF13 : An Animated Guide: Understanding the logic of the ROC Curve
Russ Lavery, Contractor
Tuesday, 4:00 PM - 4:10 PM, Location: Buckeye B
The ROC curve is often taught to students without any explanation. Many professors simply say that, for a random process, the area under the ROC curve (AUC) is .5 and a good model has an AUC greater than .5. This example- and cartoon-rich talk explains the logic of the ROC curve. Understanding the logic of the ROC curve allows useful information to be gleaned from the shape of the curve.
SAS 101
SA01 : Top Ten SAS® Performance Tuning Techniques
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 8:00 AM - 8:50 AM, Location: Regency G
SAS® Base software provides users with many choices for accessing, manipulating, analyzing, and processing data and results. Partly due to the power offered by the SAS software and the size of data sources, many application developers and end-users are in need of guidelines for more efficient use. This presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their applications. Attendees learn DATA and PROC step language statements and options that can help conserve CPU, I/O, data storage, and memory resources while accomplishing tasks involving processing, sorting, grouping, joining (merging), and summarizing data.
SA02 : SAS® Debugging 101
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 11:00 AM - 11:50 AM, Location: Regency G
SAS® users are always surprised to discover their programs contain bugs (or errors). In fact, when asked, users will emphatically stand by their programs and logic by saying they are error free. But the vast number of experiences, along with the realities of writing code, says otherwise. Errors can appear anywhere in program code, often accidentally introduced as the code is written. No matter where an error occurs, the overriding sentiment among most users is that debugging SAS programs can be a daunting, and humbling, task. Attendees learn about the various error types, identification techniques, their symptoms, and how to repair program code to work as intended.
SA05 : Simplifying Effective Data Transformation Via PROC TRANSPOSE
Arthur Li, City of Hope
Monday, 10:30 AM - 10:50 AM, Location: Regency G
You can store data with repeated measures for each subject, either with repeated measures in columns (one observation per subject) or with repeated measures in rows (multiple observations per subject). Transforming data between formats is a common task because different statistical procedures require different data shapes. Experienced programmers often use ARRAY processing to reshape the data, which can be challenging for novice SAS® users. To avoid using complex programming techniques, you can also use the TRANSPOSE procedure to accomplish similar types of tasks. In this talk, PROC TRANSPOSE, along with its many options, will be presented through various simple and easy-to-follow examples.
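A minimal example of the wide-to-long reshaping the talk covers (the dataset and variable names are hypothetical):

/* One row per subject with visit1-visit3 becomes one row per subject per visit */
proc sort data=work.wide; by subject; run;

proc transpose data=work.wide
               out=work.long(rename=(col1=score _name_=visit));
   by subject;
   var visit1-visit3;
run;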
SA06 : Painless Extraction: Options and Macros with PROC PRESENV
Keith Fredlund, Grand Valley State University
Thinzar Wai, Grand Valley State University
Monday, 9:30 AM - 9:50 AM, Location: Regency G
The PRESENV procedure was introduced as part of SAS version 9.4, but little has been written about the capabilities of this procedure. The PRESENV procedure works in tandem with the PRESENV system option, allowing the user to preserve certain global options, macros, macro variables, formats, and temporary data across SAS sessions. The PRESENV option can be turned on and off at any time within a program, and global statements will be collected accordingly. This paper details procedure syntax, provides example code and output, and describes potential uses and limitations. Emphasis within the paper is given to use of the procedure to debug code written by others, macro value evaluation, and the creation of a template for other programs. This procedure would be beneficial for users with an intermediate level of SAS data step programming.
SA08 : Hashtag #Efficiency! An Introduction to Hash Tables
Lakshmi Nirmala Bavirisetty, South Dakota State University
Deanna Schreiber-Gregory, National University
Kaushal Chaudhary, Eli Lilly and Company
Monday, 9:00 AM - 9:20 AM, Location: Regency G
Have you ever had to walk away from your computer during an analysis? Have you wondered if there is a way to increase your efficiency, save time, and be able to answer more questions? Hash tables to the rescue! This paper covers a brief introduction to the use of hash tables, their definition, benefits, concept, and theory. It also includes a review of some more applied approaches to hash table usage through code examples and applications that illustrate how the use of hash tables can help improve performance time and coding efficiency. This paper will wrap up by providing a comparison of performance times between hash tables and traditional lookup and join/merge methods. This paper is intended for any level of SAS user who would like to learn about how hash tables can help improve process efficiency!
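A small sketch of the lookup pattern such papers benchmark (the dataset and key names are assumptions, not the authors' code): a lookup table is loaded into a hash object once, then each transaction row is matched in memory instead of through a sort and merge.

data work.matched;
   if _n_ = 1 then do;
      if 0 then set work.lookup;               /* defines segment in the PDV at compile time */
      declare hash h(dataset:'work.lookup');
      h.defineKey('cust_id');
      h.defineData('segment');
      h.defineDone();
   end;
   set work.transactions;
   if h.find() = 0 then output;                /* keep transactions whose cust_id is in the lookup */
run;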
SA09 : Array of Sunshine: Casting Light on Basic Array Processing
Nancy Brucken, InVentiv Health Clinical
Monday, 10:00 AM - 10:20 AM, Location: Regency G
An array is a powerful construct in DATA step programming, allowing the application of a single process to multiple variables simultaneously, without having to resort to macro variables or repeat code blocks. Arrays are also useful in transposing data sets when PROC TRANSPOSE does not provide a necessary degree of control. However, they can be very confusing to less-experienced programmers. This paper shows how arrays can be used to solve some common programming problems.
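A short example of the kind of problem the paper addresses (hypothetical variables): one loop recodes a sentinel value across many variables without repeating code.

data work.clean;
   set work.raw;
   array scores{*} q1-q20;                      /* all twenty questionnaire items */
   do _i = 1 to dim(scores);
      if scores{_i} = 999 then scores{_i} = .;  /* recode the sentinel value to missing */
   end;
   drop _i;
run;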
SA10 : Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code
Josh Horstman, Nested Loop Consulting
Monday, 1:00 PM - 1:50 PM, Location: Regency G
Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. In this paper, we will explore various ways to construct conditional SAS logic, including some that may provide advantages over the IF statement. Topics will include the SELECT statement, the IFC and IFN functions, the WHICH and CHOOSE function families, as well as some more esoteric methods, and we'll make sure we understand the difference between a regular IF and the %IF statement in the macro language.
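Two of the constructs the paper covers, in a minimal sketch (hypothetical data): a SELECT group in place of an IF-THEN/ELSE chain, and the IFC function for a compact two-way character assignment.

data work.grades;
   set work.scores;
   length grade $1 result $4;
   select;
      when (score >= 90) grade = 'A';
      when (score >= 80) grade = 'B';
      otherwise          grade = 'C';
   end;
   result = ifc(score >= 60, 'Pass', 'Fail');   /* IFC returns one of two character values */
run;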
SA11 : Accessing Teradata through SAS, common pitfalls, solutions and tips
Kiran Venna, Experis Business Analytics Practice
Monday, 2:00 PM - 2:20 PM, Location: Regency G
There are some common pitfalls while accessing Teradata from SAS, and SAS options and SAS macros can handle these pitfalls efficiently. Owing to its unique architecture, the Teradata primary index has to be designed properly for both space and efficiency of accessing data in Teradata. The dataset option dbcreate_table_opts handles creation of the primary index when creating Teradata tables through SAS. Inefficient data types can be created when Teradata tables are created using SAS; the dataset option dbtype helps in creating efficient data types. SAS macros can help automate the dbcreate_table_opts and dbtype dataset options when SAS is used to create Teradata tables. Case specificity of Teradata can cause major concern when explicit SQL pass-through is used; by issuing the appropriate mode in the connect statement, case specificity concerns can be resolved. While creating a SAS dataset with explicit SQL pass-through, the row_number function column defaults to BIGINT in Teradata 15 as opposed to INTEGER in Teradata 14, which can make a query fail, so appropriate casting for row_number() is necessary when a SAS dataset is created. Large datasets in Teradata can be compressed through explicit SQL pass-through by using the query_band argument in the connect statement.
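A hedged sketch of the two dataset options discussed (the server, credentials, and column names are placeholders, not a working configuration):

libname td teradata server='tdprod' user=&td_user password=&td_pwd database=analytics;

data td.customer_scores
     (dbcreate_table_opts='primary index (customer_id)'   /* control the Teradata primary index */
      dbtype=(score='decimal(9,4)'));                     /* override the default numeric type */
   set work.customer_scores;
run;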
Statistics / Advanced Analytics
AA01 : The Practice of Credit Risk Modeling for Alternative Lending
Keith Shields, Magnify Analytic Solutions
Bruce Lund, Magnify Analytic Solutions
Tuesday, 9:30 AM - 10:20 AM, Location: Buckeye A
In recent years, data scientists in the credit risk profession have experienced less freedom to deviate from industry-accepted practices because the biggest users of credit risk models (banks and large lenders) are the very institutions that face increased regulatory pressure since the 2008 crisis. Regulators have thus been less anxious to bless practices that deviate from the scorecard and cutoff score paradigm. But things are changing. The lending industry is demanding innovation and creativity from its data scientists more than ever. The rise of alternative lending, driven mainly by Marketplace lenders (aka Peer-to-Peer lenders), is built on the notion that web-based platforms can integrate customer-facing technology and Big Data to acquire, fund, and underwrite loans in a manner that is more targeted and efficient than banks do today. Big Data techniques and "freeware" have increased in popularity, but using SAS to perform logistic regression and survival analysis on structured data is as good an option for quantifying credit risk as ever. In the new world of lending it is not the model fitting techniques that need to change, as much as it is the treatment of samples, potential predictors, and model refits.
AA02 : Identifying the factors responsible for loan defaults and classification of customers using SAS Enterprise Miner
Prashanth Reddy Musuku, Oklahoma State University
Juhi Bhargava, Oklahoma State University
Tuesday, 9:00 AM - 9:20 AM, Location: Buckeye A
The lending business is crucial to the profitability of a bank or financial institution. Loan defaults and delays in repayment by customers lead to problems in the cash flow position. The last economic crisis in the US was triggered by loan defaults. This study aims to identify the factors contributing to loan defaults and delays in repayment, as well as the characteristics of a borrower who will honor all the obligations of a loan. The results will enable us to determine the relationship between loan and customer characteristics and the probability of default. These results may also be used to appraise and monitor credit risk at the time of loan approval and during the term of the loan. The loan data for December 2015 was extracted from the website of Lending Club, an online credit marketplace. It consists of all loans issued through December 2015 along with the loan status. It contains 111 variables such as the details of the customer's loan account, amount, application type - individual or joint, principal outstanding, amount paid, interest rate, length of employment, annual income, loan status - current, default, in grace or late due, verification status, purpose of loan, and so on. There are 421,095 records. The factors contributing to loan default will be identified and predicted using models such as logistic regression, decision trees, and artificial neural networks. The identified factors will then be implemented using the random forest method to classify whether customers are likely to default, pay late, or pay on time. The classification will enable lending institutions to optimize their policies and strategies to reduce loan defaults and also to make informed decisions about current customers at risk of default. Data Source: https://www.lendingclub.com/info/download-data.action
AA03 : Property & Casualty Insurance Predictive Analytics in SAS®
Mei Najim, Gallagher Bassett Services, Inc.
Tuesday, 1:30 PM - 2:20 PM, Location: Buckeye A
Predictive analytics has been evolving in property & casualty insurance for the past two decades. This paper will first provide an overview of predictive analytics in each of the following core business functions in property & casualty insurance: pricing, reserving, underwriting, claims, marketing, and reinsurance. Then, a common property & casualty insurance predictive modeling process with large data sets will be introduced. The steps of this process include data acquisition, data preparation, variable creation, variable selection, model building (a.k.a. model fitting), model validation, and model testing. Finally, some successful models will be introduced. Base SAS, SAS Enterprise Guide/STAT, and SAS Enterprise Miner are presented as the main tools for this process. This predictive modeling process could be tweaked or directly used in many other industries, as the statistical foundations of predictive analytics have large overlaps across the property & casualty insurance, health care, life insurance, banking, pharmaceutical, and genetics industries. This paper is intended for any level of SAS® user or business people from different industries who are interested in learning about general predictive analytics.
AA04 : Discover the golden paths, unique sequences and marvelous associations out of your big data using Link Analysis in SAS® Enterprise Miner TM
Delali Agbenyegah, Alliance Data Systems
Xingrong Zhang, Alliance Data Retail
Tuesday, 2:30 PM - 3:20 PM, Location: Buckeye A
The need to extract useful information from large amounts of data to positively influence business decisions is on the rise, especially with the hyper-expansion of retail data collection and storage and the advancement in computing capabilities. Many enterprises now have well-established databases to capture omni-channel customer transactional behavior at the product or Stock Keeping Unit (SKU) level. Crafting a robust analytical solution that utilizes these rich transactional data sources to create customized marketing incentives and product recommendations in a timely fashion to meet the expectations of the sophisticated shopper of our current generation can be daunting. Fortunately, the newly added Link Analysis node in SAS® Enterprise Miner™ provides a simple yet powerful analytical tool to extract, analyze, discover, and visualize the relationships or associations (links) and sequences between items in a transactional data setup and to develop item-cluster-induced segmentation of customers as well as next-best-offer recommendations. In this paper, we discuss the basic elements of Link Analysis from a statistical perspective and provide a real-life example that leverages Link Analysis within SAS Enterprise Miner to discover amazing transactional paths, sequences, and links.
AA05 : Hybrid recommendation system to provide suggestions based on user reviews
Ravi Shankar Subramanian, GRADUATE STUDENT
Tuesday, 11:00 AM - 11:20 AM, Location: Buckeye A
If you have ever shopped on Amazon, Pandora, or Netflix, you have probably experienced recommendation systems in action. These systems analyze historical buying behavior and make real-time recommendations while you are shopping. The back end of these systems contains data mining models that make predictions about the products relevant to you. We plan to build a similar hybrid recommendation system to suggest restaurants. We intend to combine content from Yelp reviews, users' profiles, their ratings/reviews for each restaurant visited, restaurant details, and tips provided by the users. To implement our idea, we downloaded 2.2M reviews and 591K tips by 552,000 users from the Yelp website; the dataset covers 77,000 restaurants and includes information such as user profiles. Traditional systems utilize only users' ratings to recommend new restaurants. However, the system we propose will use both users' reviews (content) and ratings to provide recommendations. The content-based system is modeled by identifying the preferences of each user and associating them with key words such as cuisine, inexpensive, cleanliness, and so on by constructing concept links and association rules based on their past reviews. The collaborative-based system is modeled through k-means clustering by aggregating a particular user with other peer users based on the ratings provided for restaurants. The SAS tool used for this paper is SAS Enterprise Miner 14.
AA06 : Analyzing sentiments in tweets for Tesla Model 3
Tejaswi Jha, Student
Tuesday, 10:30 AM - 10:50 AM, Location: Buckeye A
The Tesla Model 3 is making news in the history of automobiles as never seen before. The new electric car already has more than 400,000 reservations and counting. We carried out a descriptive analysis of sales of all Tesla models and found that the number of reservations to date is more than three times the sales of all previous Tesla cars combined. Clearly there is a lot of buzz surrounding this, and such buzz influences consumers' opinions and sentiments, which in turn lead to bookings. This paper aims to summarize findings about people's opinions, reviews, and sentiments about Tesla's new Model 3 using textual analysis of tweets collected since February 2016. For this, we will use the live streaming data from Twitter over time and study its pattern based on the booking timeline. We have been collecting data since February 2016, when the interest of people in this model spiked suddenly. Currently, we are analyzing about 10,000 tweets. We are using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio to evaluate key questions pertaining to the analysis, such as the following: What features do people think about? What are the factors that motivate people to reserve a Tesla? What factors are discouraging them (e.g., the waiting period)? Is the nature of sentiment in comments (positive or negative) changing over time?
AA11 : Assessing the Impact of Communication Channel on Behavior Changes in Energy Efficiency
Angela Wells, Direct Options
Ashlie Ossege, Direct Options
Monday, 11:30 AM - 11:50 AM, Location: Bluegrass AB
With governments and commissions incentivizing electric utilities to get consumers to save energy, the number of energy saving programs has grown. Some, called behavioral programs, are designed to get consumers to change their behavior to save energy. Within behavioral programs, Home Energy Reports achieve savings as well as educate consumers. This paper examines the effect of different Home Energy Report communication channels on energy savings, using SAS® for linear models. For behavioral change, we often hear the questions: 1) Are the people that responded via direct mail solicitation saving at a higher rate than people who responded via an e-mail solicitation? Hypothesis (1): Because e-mail is easy to respond to, the type of customers that enroll through this channel will exert less effort on the behavior changes that require more time and investment toward energy efficiency and thus will save less. 2) Does the mode of the ongoing dialog (mail versus e-mail) impact the amount of consumer savings? Hypothesis (2): E-mail is more likely to be ignored and thus these recipients will save less. As savings is most often calculated by comparing the treatment group to a control group, and by definition you cannot have a dialog with a control group, the answers are not a simple PROC FREQ away. This study used clustering (PROC FASTCLUS) to segment the consumers by channel and append cluster assignments to the respective control group. This study also used Difference-in-Differences and PROC GLM to calculate the statistical savings of these groups. Skill: Low-Medium
AA12 : An Innovative Method of Customer and Program Clustering
Brian Borchers, Direct Options
Monday, 8:30 AM - 8:50 AM, Location: Bluegrass AB
This paper describes an innovative way to identify groupings of customer offerings using SAS software. The authors investigated the customer enrollments in nine different programs offered by a large energy utility. These programs included levelized billing plans, electronic payment options, renewable energy, energy efficiency programs, a home protection plan, and a home energy report for managing usage. Of the 640,788 residential customers, 374,441 had been solicited for a program and had adequate data for analysis. Nearly half of these eligible customers (49.8%) enrolled in some type of program. To examine the commonality among programs based on characteristics of customers who enroll, cluster analysis procedures and correlation matrices are often used. However, the value of these procedures was greatly limited by the binary nature of enrollments (enroll or no enroll), as well as the fact that some programs are mutually exclusive (limiting cross-enrollments for correlation measures). To overcome these limitations, the PROC LOGISTIC procedure was used to generate predicted scores for each customer for a given program. Using the same predictor variables, PROC LOGISTIC was used on each program to generate predictive scores for all customers. This provided a broad range of scores for each program, under the assumption that customers who are likely to join similar programs would have similar predicted scores for these programs. The PROC FASTCLUS procedure was used to build k-means cluster models based on these predicted logistic scores. Two distinct clusters were identified from the nine programs. These clusters not only aligned with the hypothesized model, but were generally supported by correlations (using PROC CORR) among program predicted scores as well as program enrollments.
AA14 : An Analysis of the Repetitiveness of Lyrics in Predicting a Song's Popularity
Drew Doyle, University of Central Florida
Tuesday, 11:30 AM - 11:50 AM, Location: Bluegrass AB
In the interest of understanding whether or not there is a correlation between the repetitiveness of a song's lyrics and its popularity, the top ten songs from the year-end Billboard Hot 100 Songs chart from 2006 to 2015 were collected. These songs then had their lyrics assessed to determine the counts of the top ten words used. These word counts were then used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count is a significant predictor of a song's popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were completed to see if the average word counts have been changing over the years. All analysis was completed in SAS® using various PROCs.
AA15 : Weight-of-Evidence Coding and Binning of Predictors in Logistic Regression
Bruce Lund, Independent Consultant
Monday, 2:00 PM - 2:50 PM, Location: Bluegrass AB
Weight-of-evidence (WOE) coding of a nominal or discrete variable is widely used when preparing predictors for usage in binary logistic regression models. When using WOE coding, an important preliminary step is binning of the levels of the predictor to achieve parsimony without giving up predictive power. Two approaches to binning are discussed: (1) Binning so as to minimize the loss of information value (IV), and (2) Binning to achieve monotonicity in the case of an ordered predictor. Next, these concepts of WOE, binning, and IV are extended to ordinal logistic regression in the case of the cumulative logit model. SAS® code to perform binning is demonstrated. Lastly, guidelines for assignment of degrees of freedom for WOE-coded predictors within a fitted logistic model are discussed. The assignment of degrees of freedom bears on the ranking of logistic models by SBC (Schwarz Bayes). All computations in this talk are performed by using SAS® and SAS/STAT®.
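To make the definitions concrete (a sketch, not the author's code; the binned predictor and the 0/1 target variable are assumptions), WOE and each bin's information value contribution can be computed as:

proc sql;
   create table work.woe as
   select bin,
          sum(target = 0) / (select sum(target = 0) from work.train) as pct_good,
          sum(target = 1) / (select sum(target = 1) from work.train) as pct_bad,
          log(calculated pct_good / calculated pct_bad)              as woe,
          (calculated pct_good - calculated pct_bad)
             * calculated woe                                        as iv_contribution
   from work.train
   group by bin;
quit;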
AA17 : Using SAS to Generate p-Values with Monte Carlo Simulation
Brandy Sinco, University of Michigan
Edith Kieffer, University of Michigan School of Social Work
Michael Spencer, University of Michigan School of Social Work
Michael Woodford, Wilfrid Laurier University, Canada
Gloria Palmisano, CHASS Center
Gretchen Piatt, University of Michigan Medical School
Monday, 4:00 PM - 4:50 PM, Location: Bluegrass AB
Background. When multiple comparisons are made between groups, the type-1 error rate, typically .05, becomes inflated. The statistical literature indicates that Monte Carlo simulation, using bootstrap and permutation tests, is an effective method to address the inflated error risk from multiple comparisons. Objective. Discuss the theory behind bootstrap and permutation tests, and demonstrate how to generate bootstrapped p-values for commonly used analysis methods. Techniques to be covered include comparisons of counts and percentages, t-tests for means, linear regression, logistic regression, and linear mixed models. SAS Procedure Focus. For comparisons between counts and percentages, Proc Freq has user-friendly features to adjust for Monte Carlo simulation. Although Proc TTest does not have multiple comparison adjustment, Proc MULTTEST is an excellent alternative. For logistic regression, Proc Logistic has a Monte Carlo simulation option on both the Estimate and LSMEANS statements. If linear regression is done with Proc GLM, the LSMEANS, but not the Estimate statement, contains a Monte Carlo option. Similarly, Proc Mixed, for linear mixed models, offers Monte Carlo adjustment only on the LSMEANS statement. However, SAS procedures for linear models that do not offer Monte Carlo adjustment on the Estimate statement, can easily have their output routed to Proc PLM for Monte Carlo adjustment of estimates. Examples. Bootstrapped p-values will be demonstrated with SAS code from analyses of data from a study of LGBTQ college students by multiple race and gender categories, and from a diabetes study with three treatment groups across four time points.
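As one small illustration of the PROC MULTTEST approach mentioned above (a hypothetical dataset with a two-level group variable and three outcomes; not the study code):

proc multtest data=work.study bootstrap nsample=10000 seed=20161009;
   class group;                            /* two treatment groups */
   test mean(outcome1 outcome2 outcome3);  /* bootstrap-adjusted p-values for the mean differences */
run;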
AA18 : To be two or not be two, that is a LOGISTIC question
Robert G. Downer, GVSU
Tuesday, 9:00 AM - 9:20 AM, Location: Bluegrass AB
A binary response is very common in logistic regression modeling. The binary outcome could be the only possible construction of the response, but it also could be the result of collapsing additional response categories. Potential advantages of a binary response include easier interpretation of odds ratios and a single fitted model. Some information will be sacrificed through collapsing, but what about other implications? Consequences such as model simplicity and prediction performance are explored through the investigation of data involving an immigration program. Two detailed PROC LOGISTIC examples give relevant syntax and output for a baseline multinomial logit model and a standard binary logistic model. Utilizing standard SAS/STAT procedures for exploratory analysis is shown to be very practical for understanding the modeling. Some familiarity with logistic regression would be helpful for understanding this paper.
AA19 : Is My Model the Best? Methods for Exploring Model Fit
Deanna Schreiber-Gregory, National University
Monday, 10:30 AM - 11:20 AM, Location: Bluegrass AB
In the submission/presentation phase of any research or analytics project, it is reasonable to expect the reception of many types of questions aimed at clarifying the reliability and accuracy of the project's results. One of the most common questions to expect would be: So the model provides a feasible answer to the question, but does it provide the best answer? One way to answer this question with utmost confidence is to provide a variety of model fit analyses designed to support the conclusion of your final model. This paper provides a variety of techniques aimed at model fit exploration including default procedure settings as well as additive options. This paper will review the theory behind each of these fit procedures and the pros and cons of their use. Optional R-square calculations will also be explored. This paper is intended for any level of SAS® user.
AA20 : An Introduction to the HPFOREST Procedure and its Options
Carl Nord, Grand Valley State University
Jacob Keeley, Grand Valley State University
Tuesday, 3:00 PM - 3:20 PM, Location: Bluegrass AB
The acquisition of big data into usable formats can be quite a challenge. There is a large emphasis being placed on the efficiency and scope of data acquisition in many industries. With the increasing amount of data available to analyse, the best methods for creating predictive models from big data banks are becoming a desire of many sectors, in particular sectors where prediction is the sole goal of the model. Decision tree techniques are a common and effective approach for creating optimal predictive models. The HPFOREST procedure creates random forest models in a high performance environment. There is a limited amount of information on this procedure, which makes it a prime candidate for discussion. This procedure allows a predictive model to be created based on decision tree methodology. This method of model averaging has been known to produce predictive models that generalize quite favourably (Breiman, 2001). This paper will include an outline of basic code structure for the procedure as well as options such as specifying maximum trees, outputting fit statistics, etc. Scoring new data with a model file and generating helpful figures will be discussed as well.
AA21 : Use Multi-Stage Model to Target the Most Valuable Customers
Chao Xu, Alliance Data
Jing Ren, Alliance Data
Hongying Yang, Alliance Data
Tuesday, 2:30 PM - 2:50 PM, Location: Bluegrass AB
To predict the likelihood of customers engaging with the business, logistic models are usually employed in the current marketing industry. For example, marketers use logistic regression to predict the probability of customers' response to a campaign, and only those customers with a high response rate are selected for targeting to reduce cost and maximize the return on investment (ROI). However, these models do not consider the value of customers after the response, which is also important to the business. In this paper, we combine logistic regression with a survival model to target the most valuable customers. PROC LOGISTIC, PROC LIFETEST, and PROC PHREG are utilized and explored in the two-stage model. We compared our predicted results with the ones using only a traditional logistic regression model. We found that, after the likelihood of engagement is combined with the value of customers, the ROIs of campaigns can be improved. All programming is executed in the environment of SAS Enterprise Guide® 7.1.
AA22 : Analyzing the effect of weather on Uber Ridership
Snigdha Gutha, Oklahoma State University
Anusha Mamillapalli, Oklahoma State University
Monday, 10:00 AM - 10:20 AM, Location: Bluegrass AB
Uber has changed the face of taxi ridership, making it more convenient and comfortable for riders. But there are times when customers are left unsatisfied because of a shortage of vehicles, which ultimately led to Uber adopting surge pricing. It is a very difficult task to forecast the number of riders at different locations in a city at different points in time, and this gets more complicated with changes in weather. In this paper we attempt to estimate the number of trips per borough on a daily basis in New York City. We add an exogenous factor, weather, to this analysis to see how it impacts the changes in the number of trips. We fetched six months' worth of data (approximately 9.7 million records) on Uber rides in New York City, ranging from January 2015 to June 2015, from GitHub. We also gathered weather data (about 3.5 million records) for New York City for the same period from the National Climatic Data Center. We plan to analyze the Uber data and weather data together to estimate the change in the number of trips per borough due to changing weather conditions. We will build a model to predict the number of trips per day for a one-week-ahead forecast for each borough of New York City.
AA23 : A Demonstration of Various Models Used in a Key Driver Analysis
Steven Lalonde, Rochester Institute of Technology
Monday, 3:00 PM - 3:50 PM, Location: Bluegrass AB
A key driver analysis investigates the relationship between predictor variables and a response, such as customer satisfaction or net promoter score. The response is often measured on a five, seven, or ten-point scale, and collected using a survey. The predictors are generally other scaled questions, asked on the same survey, or demographics. Analyses often use multiple linear regression to fit the response as a function of the predictors, and some function of the regression coefficients as a measure of importance of individual predictors. This approach suffers from two major criticisms. The scaled response, especially if each point on the scale is individually labeled, may not be an interval scale, which would make the linear regression model invalid. Secondly, the predictors are generally correlated with one another, which can lead to counter-intuitive regression coefficients, even coefficients with the wrong sign! The first criticism can be alleviated by fitting an ordinal logistic model to the response, rather than the multiple linear regression. The second criticism is often addressed by fitting a more parsimonious model, using some form of variable selection. In this paper various approaches to the key driver analysis will be demonstrated using SAS/STAT procedures, and the advantages and disadvantages of each approach will be summarized.
AA24 : An Animated Guide: Penalized variable selection techniques in SAS and Quantile Regression
Russ Lavery, Contractor
Tuesday, 8:00 AM - 8:50 AM, Location: Bluegrass AB
This paper discusses four SAS® procedures REG, GLMSELECT, QUANTREG, and QUANTSELECT. Among the features supported by these procedures are several penalized variable selection techniques (Ridge, LARS, LASSO, Elastic Net and others) and some machine learning techniques. The paper explains the theory behind these techniques, gives examples of SAS code to run the procedures, and discusses the reports and output of these procedures.
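A minimal example of one of these techniques (LASSO selection with a cross-validated choice of the selected step) on a SAS-supplied dataset; the options shown are illustrative, not the paper's settings:

proc glmselect data=sashelp.baseball plots=coefficients;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crHits
         / selection=lasso(choose=cv stop=none)
           cvmethod=random(10);            /* 10-fold cross validation picks the step */
run;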
AA25 : An Animated Guide: Deep Neural Networks in SAS E. Miner
Russ Lavery, Contractor
Tuesday, 1:30 PM - 2:20 PM, Location: Bluegrass AB
Deep Neural Networks are a recent machine learning technique that takes advantage of some theoretical breakthroughs to expand the network architecture from the one or two layers that used to be common to networks of 100 layers with hundreds of nodes in each layer. They have greatly expanded the capability of neural networks, and this technique has been added to SAS Enterprise Miner. This cartoon-formatted, example-rich talk will explain the theory of neural networks as well as give examples. It is appropriate for people with no exposure to neural networks as well as users of neural networks who want to learn about deep learning.
AA26 : The State of Human Trafficking in the Cincinnati Metropolitan Area - a Statistical Case Study
David Corliss, Peace-Work
Heather Hill, Peace-Work
Tuesday, 3:30 PM - 4:20 PM, Location: Bluegrass AB
A statistical profile is presented of human trafficking in the Cincinnati area. Working with data and expertise from local agencies combating this problem and data from the National Human Trafficking Resource Center (NHTRC), volunteer statisticians have performed geographic, demographic, and socio-economic analyses. Summary statistics show the scope of the problem in the Cincinnati area, while predictive modeling identifies locations, times, practices, and other key indicators of this crime. These analyses are performed entirely using SAS University Edition, a free version of SAS software available to university researchers, students, and other non-commercial users.
AA27 : Fixed Item Parameter Calibration with MMLE-EM Using a Fixed Prior in SAS® IML
Sung-Hyuck Lee, ACT
Hongwook Suh, ACT
Monday, 1:30 PM - 1:50 PM, Location: Bluegrass AB
Fixed item parameter calibration (FIPC) has been popular in estimating parameters of the pretest (brand new) items administered with a computer adaptive test. In this study, a new FIPC method is proposed. In the new approach, the prior for the EM algorithm is computed only using the parameters of the operational (scored) items and the responses to them. During the EM cycles the prior is not updated but fixed in calibrating the pretest items. The main advantage of this new method is that any potential contamination from poor pretest items (e.g., bad model fit) is eliminated since pretest items are excluded from computing the prior during the EM cycles, which is a major difference of the new approach from the FIPC methods proposed so far. No commercial software is available to implement the new approach so a new SAS macro named SAS®-FIPC is written in SAS® IML to estimate the parameters of the pretest items. The estimation results of the new method will be compared to the ones from the existing methods through a simulation study.
AA29-SAS : Fitting Your Favorite Mixed Models with PROC MCMC
Maura Stokes, SAS
Monday, 9:00 AM - 9:50 AM, Location: Bluegrass AB
The popular MIXED, GLIMMIX, and NLMIXED procedures in SAS/STAT® fit linear, generalized linear, and nonlinear mixed models, respectively. These procedures take the classical approach of maximizing the likelihood function to estimate model parameters, using methods such as maximum likelihood and restricted maximum likelihood. The flexible MCMC procedure in SAS/STAT can fit these same models by taking a Bayesian approach. Instead of maximizing the likelihood function, PROC MCMC draws samples (using a variety of sampling algorithms) to approximate the posterior distributions of model parameters, which is the key to Bayesian analysis. Similar to the mixed modeling procedures, PROC MCMC provides estimation, inference, and prediction. This paper describes how to use the MCMC procedure to fit Bayesian mixed models and compares the Bayesian approach to how the classical models would be fit with the familiar mixed modeling procedures. The paper also discusses unique aspects of the Bayesian approach that are not related to the classical approach. Several examples illustrate the approach in practice.
AA30-SAS : Modeling Longitudinal Categorical Response Data
Maura Stokes, SAS
Tuesday, 9:30 AM - 11:20 AM, Location: Bluegrass AB
Longitudinal data occur for responses that represent binary and multinomial outcomes as well as counts. These data are commonly correlated and often include missing values, so any analysis needs to take both of these factors into consideration. This tutorial focuses on using generalized estimating equations for analyzing longitudinal categorical response data, but it also discusses the generalized linear mixed models approach. Strategies such as weighted generalized estimating equations for managing missing data are also discussed, along with the assumptions for these methods. Techniques are illustrated with real-world applications using SAS procedures such as GENMOD, GLIMMIX, and the GEE procedure. Experience with logistic regression is required for this tutorial.
System Architecture and Administration
SY01 : Spawning SAS® Sleeper Cells and Calling Them into Action: Implementing Distributed Parallel Processing in the SAS University Edition Using Commodity Computing To Maximize Performance
Troy Hughes, No Affiliation
Monday, 10:30 AM - 10:50 AM, Location: Buckeye A
With the 2014 launch of the SAS® University Edition, the reach of SAS was greatly expanded to educators, students, researchers, and non-profits who could for the first time utilize a full version of Base SAS software for free, enabling SAS to better compete with open source solutions such as Python and R. Because the SAS University Edition allows a maximum of two CPUs, however, performance is curtailed sharply from more substantial SAS environments that can benefit from parallel and distributed processing, such as designs that implement SAS Grid Manager, Teradata, and Hadoop solutions. Even when comparing performance of the SAS University Edition against the most straightforward implementation of SAS Display Manager (running on the same computer), SAS Display Manager demonstrates significantly greater performance. With parallel processing and distributed computing (including programmatic and non-programmatic methods) becoming the status quo in SAS production software, the SAS University Edition will unfortunately continue to fall behind its SAS counterparts if it cannot harness parallel processing best practices. To curb this performance disparity, this text introduces groundbreaking programmatic methods that enable commodity hardware to be networked so that multiple instances of the SAS University Edition can communicate and work collectively to divide and conquer complex tasks. With parallel processing enabled, a SAS practitioner can now easily harness an endless number of computers to produce blitzkrieg solutions with SAS University Edition that rival the performance of those produced on costly, complex infrastructure.
SY02 : Key Tips for SAS® Grid Users
Venkateswarlu Toluchuri, Lead Business Analyst
Monday, 9:00 AM - 9:20 AM, Location: Buckeye A
SAS Grid is a cluster node system, and SAS® Grid Manager is the controller for job scheduling and load balancing of resources for optimal processing. It is important to be able to work effectively in a SAS® Grid environment without having to depend on the SAS® Administrator. This paper focuses on tips and common issues for new SAS® Grid users, especially if they are coming from a traditional (non-Grid) environment. I describe a few common instructions and scenarios that will provide a SAS programmer with the basic functions needed to work in a SAS® Grid environment.
SY03 : Enterprise Architecture for Analytics Using TOGAF
David Corliss, Ford Motor Company
Monday, 8:00 AM - 8:50 AM, Location: Buckeye A
Enterprise Architecture is a set of practices for development and implementation of the overall design of a system. EA embraces hardware, software and analytics in a single, enterprise-wide environment. As the leading end-to-end analytic solution for large enterprises, SAS systems can benefit greatly from the use of EA best practices. As a framework for enterprise architecture, TOGAF supports creation of a set of inter-connected building blocks. When applied to analytic systems, TOGAF practices promote greater efficiency, improved cost and ease of use, streamlining the delivery of all desired analytic requirements. This presentation describes the principles of Enterprise Architecture and goes step-by-step through the TOGAF framework as applied to analytic architecture. Practical examples are given of both the process and benefits of the TOGAF framework in a variety of industries and settings.
SY04 : Avoiding Code Chaos - Architectural Considerations for Sustainable Code Growth
David Ward
Monday, 11:00 AM - 11:50 AM, Location: Buckeye A
As projects grow in complexity - both in terms of code and personnel - source code can quickly become very difficult to maintain, extend, and test. While this problem certainly occurs across programming languages, SAS seems uniquely suited for unmanageable complexity. First, SAS programmers are often trained more as analysts than as software engineers, so they tend to think more about organizing data than a complex code base. Second, the procedural nature of the language itself can contribute to code chaos by allowing for extremely long programs without much re-usable code. Third, though the macro language is powerful, it can often be used to create code that is very hard to understand. This paper will present the essential components that must be considered when designing a SAS application that will thrive under growth and make concrete recommendations for their implementation. This outline can be taken back to your programming team to foster discussion and help work towards the adoption of design and coding standards that will make development more efficient, effective, and maintainable, and more importantly - make programming more enjoyable. The following topics will be treated: readability, syntax, clarity, headers, file names, modularization, inputs/outputs, directory structure, entry/exit points, handling log and output files, automation/looping techniques, and version control.
SY05-SAS : SAS® Grid Administration Made Simple
Scott Parrish, SAS
Monday, 9:30 AM - 10:20 AM, Location: Buckeye A
Historically, administration of your SAS® Grid Manager environment has required interaction with a number of disparate applications including Platform RTM for SAS, SAS® Management Console, and command line utilities. With the third maintenance release of SAS® 9.4, you can now use SAS® Environment Manager for all monitoring and management of your SAS Grid. The new SAS Environment Manager interface gives you the ability to configure the Load Sharing Facility (LSF), manage and monitor high-availability applications, monitor overall SAS grid health, define event-based alerts, and much, much more through a single, unified, web-based interface.
Tools of the Trade
TT01 : Downloading, Configuring, and Using the Free SAS® University Edition Software
Kirk Paul Lafler, Software Intelligence Corporation
Ryan Paul Lafler, High School Student
Charlie Shipp, Shipp Consulting
Monday, 1:00 PM - 1:50 PM, Location: Buckeye B
The announcement of SAS Institute's free SAS University Edition is an exciting development for SAS users and learners around the world! The software bundle includes Base SAS, SAS/STAT, SAS/IML, SAS Studio (user interface), and SAS/ACCESS for Windows, with all the popular features found in the licensed SAS versions. This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) SAS for career opportunities and advancement. Capabilities include data manipulation, data management, comprehensive programming language, powerful analytics, high quality graphics, world-renowned statistical analysis capabilities, and many other exciting features. This presentation discusses and illustrates the process of downloading and configuring the SAS University Edition. Additional topics include the process of downloading the required applications, key configuration strategies to run the SAS University Edition on your computer, and the demonstration of a few powerful features found in this exciting software bundle.
TT02 : Removing Duplicates Using SAS®
Kirk Paul Lafler, Software Intelligence Corporation
Tuesday, 8:00 AM - 8:50 AM, Location: Buckeye B
We live in a world of data - small data, big data, and data in every conceivable size between small and big. In today's world data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, using our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and countless other ways. Data is growing and being collected at such astounding rates all in the hopes of being able to better understand the world around us. As SAS professionals, the world of data offers many new and exciting opportunities, but also presents a frightening realization that data sources may very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods to remove duplicate observations (or rows) from data sets (or tables) based on the row's values and/or keys using SAS®.
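For a concrete flavor of the techniques surveyed, here is a minimal sketch (not taken from the paper); the data set and key names below are hypothetical. PROC SORT handles the two most common cases: removing rows that share a key and removing rows that are complete duplicates.

  /* Hypothetical example: keep one row per customer_id (first by sort order) */
  proc sort data=work.orders out=work.orders_nodupkey nodupkey;
    by customer_id;
  run;

  /* Remove rows that are exact duplicates across all variables */
  proc sort data=work.orders out=work.orders_noduprecs noduprecs;
    by _all_;
  run;

The paper also compares DATA step and PROC SQL approaches; the sketch above shows only the simplest route.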
TT03 : Calculating Cardinality Ratio in Two Steps
Ronald Fehd, Stakana Analytics
Monday, 8:30 AM - 8:50 AM, Location: Buckeye B
The cardinality of a set is the number of elements in the set. The cardinality of a SAS® software data set is the number of observations of the data set, n-obs. The cardinality of a variable in a data set is the number of distinct values (levels) of the variable, n-levels. The cardinality ratio of a variable is n-levels / n-obs; the range of this value is from zero to one. Previous algorithms combined output data sets from the frequency and contents procedures in a data step. This algorithm reduces multiple frequency procedure steps to a single call, and uses scl functions to fetch contents information in the second data step. The output data set, a dimension table of the list of data set variable names, has variable cr-type whose values are in (few, many, unique); this variable identifies the three main types of variables in a data set: few is discrete, many is continuous, and unique is a row-identifier. The purpose of this paper is to provide a general-purpose program, ml-namex.sas, which provides enhanced information about the variables in a data set. The author uses this list-processing program in Exploratory Data Analysis (EDA) and in Test-Driven Development (TDD), a discipline of Agile and Extreme Programming. Audience: programmers, data managers, database administrators. Keywords: SAS component language (scl) functions: attrn (nobs, nvar), close, open, varfmt, varinfmt, varlabel, varlength, varnum, vartype; frequency procedure, nlevels option
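As a rough sketch of the general idea (not the author's ml-namex.sas program), PROC FREQ's NLEVELS option supplies n-levels for every variable in a single call, and the SCL functions mentioned in the keywords supply n-obs; SASHELP.CLASS stands in for a real data set here.

  ods select none;
  ods output nlevels=work.levels;            /* one row per variable, with an NLevels column */
  proc freq data=sashelp.class nlevels;
    tables _all_ / noprint;
  run;
  ods select all;

  data work.cardinality;
    set work.levels;
    dsid = open('sashelp.class');            /* SCL functions are available in the DATA step */
    cr   = nlevels / attrn(dsid, 'nobs');    /* cardinality ratio = n-levels / n-obs         */
    rc   = close(dsid);
  run;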
TT04 : A Sysparm Companion, Passing Values to a Program from the Command Line
Ronald Fehd, Stakana Analytics
Tuesday, 9:00 AM - 9:20 AM, Location: Buckeye B
SAS(R) software has sections in its global symbol table for options and macro variables. Sysparm is both an option and a macro variable. As an option, it can be assigned during startup on the command line; in programs, values can be assigned with the options statement; values are stored and referenced as a macro variable. The purpose of this paper is to provide a general-purpose program, parse-sysparm.sas, which provides a method of deconstructing a list of comma-separated values (csv) into separate macro variables. This is a useful method of passing a set of parameter values from one program to another. Audience: programmers. Keywords: macro variables, scan function, startup options, sysparm
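For orientation, a simplified sketch of the mechanism (not the parse-sysparm.sas program itself): the first line below is a hypothetical command-line invocation, and the remaining lines run inside the program, pulling the comma-separated values apart with %SCAN.

  sas myprog.sas -sysparm "2016,OH,final"

  /* inside myprog.sas: split the SYSPARM value into separate macro variables */
  %let study_year = %scan(%superq(sysparm), 1, %str(,));
  %let state      = %scan(%superq(sysparm), 2, %str(,));
  %let run_type   = %scan(%superq(sysparm), 3, %str(,));
  %put NOTE: year=&study_year state=&state run=&run_type;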
TT06 : List Processing Macro Function CallText, Unquoting Items with Special Characters
Ronald Fehd, Stakana Analytics
Monday, 2:00 PM - 2:20 PM, Location: Buckeye B
SAS(R) software provides a simple yet powerful macro processing language. This paper provides an explanation of the general-purpose list processing routine CallText. The parameters of this macro are a data set name and a text string which may contain special characters such as double quotes, parentheses, and ampersands for references to macro variables. It reads each row of the data set and returns a stream of tokens, perhaps less than a statement, which are the expansion of the text with its special characters. The purpose of this paper is to show how to use the macro function %sysfunc with the SCL data set functions that open a data set, read and process each variable in an observation, and then close the data set.
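As a loose sketch of the underlying technique (not the CallText macro itself), %SYSFUNC can drive the SCL data set functions to open a table, walk its rows, and emit each row's text into the token stream; the macro, data set, and variable names below are hypothetical.

  %macro emit_text(dsn, var=text);            /* hypothetical, simplified routine */
    %local dsid varnum rc;
    %let dsid   = %sysfunc(open(&dsn));
    %let varnum = %sysfunc(varnum(&dsid, &var));
    %do %while (%sysfunc(fetch(&dsid)) = 0);  /* 0 = a row was fetched            */
      %sysfunc(getvarc(&dsid, &varnum))       /* emit the row's text as tokens    */
    %end;
    %let rc = %sysfunc(close(&dsid));
  %mend emit_text;

The real CallText routine also handles unquoting of ampersands, quotation marks, and parentheses, which this sketch glosses over.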
TT07 : A SAS Macro to Create a Data Dictionary with Ease
Amy Gravely, Center for Chronic Disease Outcomes Research, A VA HSR&D Center of Innovation, Minneapolis VA Medical Center
Barbara Clothier, Center for Chronic Disease Outcomes Research, A VA HSR&D Center of Innovation, Minneapolis VA Medical Center
Monday, 10:00 AM - 10:20 AM, Location: Buckeye B
Creating a data dictionary offers a huge added value over looking at PROC CONTENTS output to communicate what is contained in a dataset. For example, a data dictionary shows the actual formatted values for each variable and organizes variables into chunks, delineating them with sub-headings. This creates a clear, complete, organized view of what is contained in a dataset. There is no need for the user to look up formatted values or to sift through variable labels to find what they are looking for. A data dictionary can save time and create efficiency when handing off a dataset to someone else on your team or someone outside of your company. Data dictionaries can also help to keep your own projects organized when working with just one dataset or many. However, creating such a data dictionary by hand can be time consuming, with cutting, pasting, and typing. This paper walks through the creation and end product of a data dictionary macro. You can learn the advanced macro techniques used to create the macro, see the final product, and request the macro to use for yourself. There are a couple of steps to take before running the macro and only a handful of parameters to fill in to create a data dictionary in a short amount of time. Changes to the dataset can be quickly accommodated and tracked.
TT08 : Take a SPA Day with the SAS® Performance Assessment (SPA): Baselining Software Performance across Diverse Environments To Elucidate Performance Placement and Performance Drivers
Troy Hughes, No Affiliation
Monday, 3:30 PM - 3:50 PM, Location: Buckeye B
Software performance is often measured through program execution time with higher performing software executing more rapidly than lower performing software. Intrinsic factors affecting software performance can include the use of efficient coding techniques, other software development best practices, and SAS® system options. Factors extrinsic to software that affect performance can include SAS configuration and infrastructure, SAS add-on modules, third-party software, and hardware and network infrastructure. The variability in data processed by SAS software also heavily drives execution time, and these combined and commingled factors make it difficult to compare performance of one SAS environment to another. Moreover, many SAS users may work in only one or a few SAS environments, giving them limited to no insight into how performance of their SAS environment compares to other SAS environments. The SAS Performance Assessment (SPA) project, launched at SAS Global Forum in 2016, examines FULLSTIMER performance metrics from diverse organizations with equally diverse infrastructures. By running standardized software that manipulates standardized data sets, the relative performance of unique environments can for the first time be compared. Moreover, as the number and variability of SPA participants continues to increase, the role that individual extrinsic factors play in software performance will continue to be disentangled and better understood, enabling SAS users not only to identify how their SAS environment compares to other environments, but also to identify specific modifications that could be implemented to increase performance levels.
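For context (this is not part of the SPA materials), the FULLSTIMER metrics that SPA analyzes are switched on with a single system option before the benchmark code runs:

  options fullstimer;   /* log real time, user and system CPU time, memory, and OS-level I/O statistics per step */
  * run the standardized benchmark code here; the log then carries the per-step performance metrics ;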
TT09 : Your Local Fire Engine Has an Apparatus Inventory Sheet and So Should Your Software: Automatically Generating Software Use and Reuse Libraries and Catalogs from Standardized SAS® Code
Troy Hughes, No Affiliation
Monday, 9:00 AM - 9:50 AM, Location: Buckeye B
Fire and rescue services are required to maintain inventory sheets that describe the specific tools, devices, and other equipment located on each emergency vehicle. From the location of fire extinguishers to the make, model, and location of power tools, inventory sheets ensure that firefighters and rescue personnel know exactly where to find equipment during an emergency, when restocking an apparatus, or when auditing an apparatus' inventory. At the department level, inventory sheets can also facilitate immediate identification of equipment in the event of a product recall or the need to upgrade to newer equipment. Software should be similarly monitored within a production environment, first and foremost to describe and organize code modules, typically SAS® macros, so they can be discovered and located when needed. When code is reused throughout an organization, a reuse library and reuse catalog should be established that demonstrate where reuse occurs and ensure that only the most recent, tested, validated versions of code modules are reused. This text introduces Python code that automatically parses a directory structure, parses all SAS program files therein (including SAS programs and SAS Enterprise Guide project files), and automatically builds reuse libraries and reuse catalogs from standardized comments within code. Reuse libraries and reuse catalogs not only encourage code reuse but also facilitate backward compatibility when modules must be modified because all implementations of specific modules are identified and tracked.
TT10 : Performing Pattern Matching by Using Perl Regular Expressions
Arthur Li, City of Hope
Monday, 2:30 PM - 3:20 PM, Location: Buckeye B
SAS® provides many DATA step functions to search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires utilizing many function calls to match a character position. However, using the Perl Regular Expression (PRX) functions or routines in the DATA step will improve pattern matching tasks by reducing the number of function calls and making the program easier to maintain. In this talk, in addition to learning the syntax of Perl Regular Expressions, many real-world applications will be demonstrated.
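A small, self-contained example of the PRX approach (illustrative only; the phone-number data and pattern are hypothetical, not from the talk):

  data work.phones;
    length area_code exchange $4;
    retain re;
    if _n_ = 1 then re = prxparse('/\((\d{3})\) ?(\d{3})-(\d{4})/');  /* compile the pattern once */
    input text $char40.;
    if prxmatch(re, text) then do;
      area_code = prxposn(re, 1, text);     /* first capture group  */
      exchange  = prxposn(re, 2, text);     /* second capture group */
    end;
  datalines;
  Call (626) 555-0134 before noon
  no phone number in this line
  ;

The single compiled pattern replaces what would otherwise be a chain of INDEX, SUBSTR, and SCAN calls.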
TT11 : Base SAS® and SAS® Enterprise Guide® ~ Automate Your SAS World with Dynamic Code; Your Newest BFF (Best Friend Forever) in SAS
Kent Phelps, The SASketeers
Ronda Phelps, The SASketeers
Monday, 10:30 AM - 10:50 AM, Location: Buckeye B
Communication is the basic foundation of all relationships including our SAS relationship with the Server, PC, or Mainframe. To communicate more efficiently ~ and to increasingly automate your SAS World ~ you will want to learn how to transform Static Code into Dynamic Code that automatically recreates the Static Code, and then executes the recreated Static Code automatically. Our presentation highlights the powerful partnership which occurs when Dynamic Code is creatively combined with a Dynamic FILENAME Statement, the INDSNAME SET Option, and the CALL EXECUTE Command within 1 SAS Enterprise Guide Base SAS Program Node. You will have the exciting opportunity to learn how 1,469 time-consuming Manual Steps are amazingly replaced with only 1 time-saving Dynamic Automated Step. We invite you to attend our session where we will detail the UNIX syntax for our project example and introduce you to your newest BFF (Best Friend Forever) in SAS. Please see the Appendices to review starting point information regarding the syntax for Windows and z/OS, and to review the source code that created the data sets for our project example.
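As a tiny sketch of two of the ingredients mentioned above (hypothetical data set names, not the authors' project code), the INDSNAME= option captures each source data set name and CALL EXECUTE stacks up code to run after the step finishes:

  data _null_;
    length prev $41;
    retain prev;
    set work.ds_jan work.ds_feb indsname=src;   /* src holds the originating data set name */
    if src ne prev then do;                     /* emit code once per source data set      */
      call execute('proc print data=' || strip(src) || '(obs=5); run;');
      prev = src;
    end;
  run;

The presentation combines these pieces with a dynamic FILENAME statement to generate and execute far larger blocks of code than this sketch shows.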
TT12 : Defensive Coding by Example: Kick the Tires, Pump the Brakes, Check Your Blind Spots, and Merge Ahead!
Nancy Brucken, InVentiv Health Clinical
Donna Levy, InVentiv Health
Monday, 11:00 AM - 11:50 AM, Location: Buckeye B
As SAS® programmers and statisticians, we rarely write programs that are run only once and then set aside. Instead, we are often asked to develop programs very early in a project, on immature data, following specifications that may be little more than a guess as to what the data is supposed to look like. These programs will then be run repeatedly on periodically updated data through the duration of the project. This paper offers strategies for not only making those programs more flexible, so they can handle some of the more commonly encountered variations in that data, but also for setting traps to identify unexpected data points that require further investigation. We will also touch upon some good programming practices that can benefit both the original programmer and others who might have to touch the code. In this paper, we will provide explicit examples of defensive coding that will aid in kicking the tires, pumping the brakes, checking your blind spots, and merging ahead for quality programming from the beginning.
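One hedged illustration of a defensive trap in this spirit (the variable and data set names are hypothetical, not from the paper): route unexpected records to a review data set instead of letting them silently pass through.

  data work.clean work.suspect;
    set work.raw_visits;
    if missing(visit_date) or visit_date > today() then do;
      put 'NOTE: unexpected visit_date ' visit_date= date9. subject_id=;   /* leave a trail in the log   */
      output work.suspect;                                                 /* park the record for review */
    end;
    else output work.clean;
  run;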
TT13 : An Animated Guide: The Internals of PROC Report
Russ Lavery, Contractor
Tuesday, 10:30 AM - 11:20 AM, Location: Buckeye B
PROC REPORT is a powerful tool that often allows a programmer to produce a complete report in just one PROC. It is especially useful for big files (yes, it is a big data technique). The rules for creating variables, especially variables involving totals or sub-totals, have been difficult to explain. This cartoon-formatted, example-based presentation makes understanding the creation of variables involving running sub-totals easy. Slides for this talk were once included on a CD in the back of Art Carpenter's excellent book on PROC REPORT.
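For readers new to the topic, a minimal compute-block example of the kind of running-total logic the talk unpacks (the data set is SASHELP.CLASS and the layout is illustrative only):

  proc report data=sashelp.class nowd;
    column sex name weight running_wt;
    define sex        / order;
    define name       / display;
    define weight     / analysis sum;
    define running_wt / computed format=8.1 'Running Total';
    compute running_wt;
      total + weight.sum;               /* temporary variable, retained across report rows */
      running_wt = total;
    endcomp;
  run;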
TT14 : the Easiest Paper Ever Written. Ron Fehd, the Macro Maven, answers your macro questions
Ronald Fehd, Stakana Analytics
Monday, 4:00 PM - 4:50 PM, Location: Buckeye B
"We are not making this up!*
TT16-SAS : Recapping Two Winning ODS Layout Talks for SAS® 9.4: ODS Destination for PowerPoint and the ODS PDF Destination
Bari Lawhorn, SAS
Tuesday, 9:30 AM - 10:20 AM, Location: Buckeye B
Get your seats now for an ODS layout doubleheader! Bari Lawhorn, SAS Senior Principal Technical Support Analyst, will present highlights from two very popular SAS Global Forum presentations about ODS layout: Jane Eslinger's 2016 paper, The Dynamic Duo: ODS Layout and the ODS Destination for PowerPoint, and Scott Huntley's 2015 paper, An Insider's Guide to ODS LAYOUT Using SAS® 9.4. For the lead-off, Bari discusses the winning combination of ODS layout and the ODS destination for PowerPoint, which produces native PowerPoint files from your output. Using ODS destination for PowerPoint together with ODS layout enables you to dynamically place your output on each slide. Through code examples, Bari shows you how to create a custom title slide as well as place the desired number of graphs and tables on each slide. Next up in Bari's talk is the topic of the ODS LAYOUT syntax that has been preproduction in the PDF destination for many SAS versions. She will cover insider tips for how to update your SAS® 9.3 code and how to get the best-looking results when you use ODS LAYOUT and the PDF destination. This presentation is sure to be a hit with ODS fans.
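As a small, hedged sketch of the combination described (the file path and graphs are placeholders, not code from either paper):

  ods powerpoint file='results.pptx';        /* hypothetical output path */
  ods layout gridded columns=2;
    ods region;
    proc sgplot data=sashelp.class;
      scatter x=height y=weight;
    run;
    ods region;
    proc print data=sashelp.class(obs=5); run;
  ods layout end;
  ods powerpoint close;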
e-Posters
PO02 : Sorting a Bajillion Records: Conquering Scalability in a Big Data World
Troy Hughes, No Affiliation
"Big data" is often distinguished as encompassing high volume, velocity, or variability of data. While big data can signal big business intelligence and big business value, it also can wreak havoc on systems and software ill-prepared for its profundity. Scalability describes the ability of a system or software to adequately meet the needs of additional users or its ability to utilize additional processors or resources to fulfill those added requirements. Scalability also describes the adequate and efficient response of a system to increased data throughput. Because sorting data is one of the most common as well as resource-intensive operations in any software language, inefficiencies or failures caused by big data often are first observed during sorting routines. Much SAS® literature has been dedicated to optimizing big data sorts for efficiency, including minimizing execution time and, to a lesser extent, minimizing resource usage (i.e., memory and storage consumption). Less attention has been paid, however, to implementing big data sorting that is reliable and robust even when confronted with resource limitations. To that end, this text introduces the SAFESORT macro that facilitates a priori exception handling routines (which detect environmental and data set attributes that could cause process failure) and post hoc exception handling routines (which detect actual failed sorting routines). If exception handling is triggered, SAFESORT automatically reroutes program flow from the default sort routine to a less resource-intensive routine, thus sacrificing execution speed for reliability. However, because SAFESORT does not exhaust system resources like default SAS sorting routines, in some cases it performs more than 200 times faster than default SAS sorting methods. Macro modularity moreover allows developers to select their favorite sorting routine and, for data-driven disciples, to build fuzzy logic routines that dynamically select a sort algorithm based on environmental and data set attributes.
PO03 : A Predictive Logistic Regression Model for Chronic Kidney Disease
Jingye Wang, Washington University in St. Louis
Approximately 26 million adult Americans have chronic kidney disease (CKD). Hypertension and diabetes are risk factors for most cases of CKD, and nearly half of patients with type 2 diabetes mellitus (T2DM) eventually develop CKD. Unfortunately, CKD is often ignored in its early, most treatable stages. The diagnosis of CKD is based on blood, urine, and imaging tests, so many patients are diagnosed only at the end stage of CKD. An ideal model to predict CKD should be easy to implement, accurate, and usable daily. Thus, I used SAS® software 9.3 to build a predictive logistic regression model for diabetic patients to predict their probability of having chronic kidney disease. The dataset contained 400 patients with or without CKD, collected at Apollo Hospital from January 2010 to July 2015. The predictors of this model were age in years, blood glucose in mgs/dl, and pedal edema. The outcome was the development of CKD. This model predicted 74.9% of the CKD patients correctly and 81.0% of the non-CKD patients correctly, for a total of 77.7% correctly predicted. This model has important implications for diabetic patients who want to predict their probability of having CKD at home. Using this model, lower-risk patients could be managed by a primary physician without additional testing or treatment for CKD, while high-risk patients can have routine CKD screening. A strength of this model is that it is easy to use at home.
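A generic sketch of fitting such a model (the data set and variable names are illustrative, not the author's code or data):

  proc logistic data=work.ckd_study;
    class pedal_edema (ref='no') / param=ref;
    model ckd(event='1') = age blood_glucose pedal_edema
          / ctable pprob=0.5;            /* classification table at a 0.5 probability cutoff */
  run;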
PO04 : When ANY Function Will Just NOT Do
Richann Watson, Experis
Karl Miller, InVentiv Health
Have you ever been working on a task and wondered whether there might be a SAS® function that could save you some time, let alone do the work for you? Data review and validation tasks can be time-consuming efforts, so any gain in efficiency is highly beneficial, especially if you can reach a standard level where the data itself drives parts of the process. The 'ANY' and 'NOT' functions can help alleviate some of the manual work in many tasks, from data review of variable values, data compliance, and formats through the derivation or validation of a variable's datatype. The list goes on. In this poster we cover the functions, summarize their details, and show them in use in an example of handling date and time data when mapping to the ISO 8601 date/time format.
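A brief illustrative use of two functions from these families (the variable names and checks are hypothetical, not from the poster):

  data work.checked;
    set work.raw;
    if notdigit(strip(subject_id)) > 0 then flag_id = 1;   /* flag IDs containing any non-digit character */
    first_alpha = anyalpha(visit_code);                     /* position of the first letter, 0 if none     */
  run;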
PO05 : Multicollinearity: What Is It and What Can We Do About It?
Deanna Schreiber-Gregory, National University
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper will review and provide examples of the different ways in which multicollinearity can affect a research project, how to detect multicollinearity, and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper will explore the proposed techniques through utilization of the Behavioral Risk Factor Surveillance System dataset. This paper is intended for any level of SAS® user. This paper is also written for an audience with a background in behavioral science and/or statistics.
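For orientation, a generic example of the standard diagnostics (the variable names are hypothetical; the BRFSS analysis in the paper is more involved):

  proc reg data=work.brfss_subset;
    model poor_health_days = age bmi income_level exercise_days
          / vif tol collin;             /* variance inflation factors, tolerance, and collinearity diagnostics */
  run;
  quit;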
PO06 : Protein NMR Reference Correction: A statistical approach for an old problem.
Xi Chen, University of Kentucky
Hunter Moseley, University of Kentucky
This is a statistical approach poster, not a programming paper. Accurate chemical-shift assignments are a vital requirement for many aspects of biomacromolecular NMR, especially protein structure determination. While 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) is the established reference standard, application is error-prone, especially by non-experts. Therefore, computational methods for correcting referencing are needed. We are developing a statistical-based algorithm to correct referencing by: 1) calculating composition probabilities for the protein under investigation from Cα and Cβ resonance pairs in the NMR data; 2) summing the probabilities across all resonance pairs to give an estimate of amino acid (AA) composition; and 3) employing a grid search method to find a minimum difference (correct referencing value) between predicted and actual protein AA composition. We show that Cα/Cβ resonance covariance (dependence) is a potent statistic and that oxidized/reduced cysteine residues should be treated separately. Our results demonstrate that the overall approach is feasible and will provide the biomolecular NMR field with a unique tool allowing spectral referencing to be corrected and refined at the beginning of protein NMR data analysis without using chemical shift assignments or protein structure as needed by current retrospective referencing correction methods. Thus, our method should improve both the speed and quality of protein resonance assignment and downstream NMR-based analyses, including structure determination.
PO07 : StatTag: A New Tool for Conducting Reproducible Research with SAS
Abigail Baldridge, Northwestern University
Leah Welty, Northwestern University
Luke Rasmussen, Northwestern University
StatTag is a free plug-in for conducting reproducible research and creating dynamic documents using Microsoft Word and SAS (or Stata). StatTag was developed to address a need in the research community: there were no broadly accessible tools to integrate document preparation in Word with statistical code, results, and data. Tools such as StatRep and SASWeave use LaTeX and plain text editors for document preparation. Despite the merits of these programs, Microsoft Word remains the mainstay, and sometimes singular option, for manuscript preparation in many fields. SAS ODS, while user friendly and compatible with Word, is one-directional in its editing capabilities: no downstream changes to the rendered RTF document are reflected in the source code. StatTag allows users to embed statistical output (estimates, tables, and figures) within Word and provides an interface to edit statistical code directly from Word. This output can then be individually or collectively updated in one click with a behind-the-scenes call to SAS. With StatTag, modification of a dataset or analysis no longer entails transcribing or re-copying results into a manuscript or table. StatTag is compatible with Microsoft Word 2010 or higher and SAS v9.4 installed on a computer running Microsoft Windows 7 or higher. This software has the potential to change the way statisticians and analysts collaborate on a daily basis, and will be broadly useful to all.
PO08 : Document and Enhance Your SAS® Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata
Louise Hadden, Abt Associates Inc.
Roberta Glass, Abt Associates Inc.
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how your programs can automatically update your processing log. If you have ever wondered who ran a program that overwrote your data, SAS has the answer! And if you don't want to be tracing back through a year's worth of code to produce a codebook for your client at the end of a contract, SAS has the answer! This presentation will be relevant for SAS users at all levels and relies on tools available in Base SAS on all platforms.
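One small taste of the kind of metadata involved (a generic dictionary-table query, not the authors' macro code):

  proc sql;
    create table work.dataset_inventory as
    select libname, memname, nobs, nvar, crdate, modate
    from dictionary.tables
    where libname = 'WORK';             /* library name is an example; dictionary values are uppercase */
  quit;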
PO09 : What to Expect When You Need to Make a Data Delivery. . . Helpful Tips and Techniques
Louise Hadden, Abt Associates Inc.
Making a data delivery to a client is a complicated endeavor. There are many aspects that must be carefully considered and planned for: de-identification, public use versus restricted access, documentation, ancillary files such as programs and formats, and methods of data transfer, among others. This paper provides a blueprint for planning and executing your data delivery, regardless of SAS® version, platform, and proficiency in SAS.
PO13 : Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL
Drew Doyle, University of Central Florida
This paper will analyze a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and had their chlorine level, temperature, and pH recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were designated as the independent variables collected from each water sample. All data collected was analyzed through various Statistical Analysis System (SAS®) procedures. A partial residual plot was used for each variable to determine possible relationships between the chlorine level and the independent variables. Stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F tests were conducted to determine which of the models appears to be the most useful. There was an analysis of the residual plot, jackknife residuals, leverage values, Cook's D, PRESS statistic, and normal probability plot of the residuals. Possible outliers were investigated and the critical values for flagged observations were stated along with what problems the flagged values indicate.
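A generic sketch of the selection step described (the variable names are placeholders, not the study's actual data):

  proc reg data=work.water_samples;
    model chlorine = temperature ph storage_time dissolved_oxygen
          / selection=stepwise slentry=0.15 slstay=0.15;   /* stepwise selection of candidate predictors */
  run;
  quit;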