MWSUG 2017 Paper Presentations
Paper presentations are the heart of a SAS users group meeting. MWSUG 2017 will feature dozens of paper presentations organized into 13 academic sections covering a variety of topics and experience levels.
Note: Content and schedule are subject to change. Last updated 29-Sep-2017.
- BI / Customer Intelligence
- Banking and Finance
- Beyond the Basics SAS
- Data Visualization and Graphics
- Data for Good
- Hands-on Workshops
- Pharmaceutical Applications
- Rapid Fire
- SAS 101
- Statistics / Advanced Analytics
- System Architecture and Administration
- Tools of the Trade
- e-Posters
BI / Customer Intelligence
Paper No. | Author(s) | Paper Title |
BI02-SAS | Jesse Sookne et al. | If You Build It, Will They Understand? Designing Reports for the General Public in SAS® Visual Analytics |
BI03-SAS | Jesse Sookne et al. | Accessibility and SAS® Visual Analytics Viewers: Which Report Viewer Is Best for Your Users' Needs? |
Banking and Finance
Paper No. | Author(s) | Paper Title |
BF01 | Samuel Berestizhevsky & Tanya Kolosova | The Cox Hazard Model for Claims Data: Bayesian non-parametric approach |
BF02 | Chaoxian Cai | Computing Risk Measures for Loan Facilities with Multiple Lines of Draws |
BF03 | Hairong Gu et al. | Untangle Customer's Incrementality using Uplift Modeling with a Case Study on Direct Marketing Campaign |
Beyond the Basics SAS
Paper No. | Author(s) | Paper Title |
BB015 | Art Carpenter | Advanced Macro: Driving a Variable Parameter System with Metadata |
BB042 | Derek Morgan | Demystifying Intervals |
BB047 | John Schmitz | Extraction and Use of Text Strings with SAS when Source exceeds the 32k String Length Limit |
BB071 | Josh Horstman | Fifteen Functions to Supercharge Your SAS Code |
BB113 | Ben Cochran | You Did That With SAS? Combining Text with Graphics Output to Create Great Looking Reports. |
BB114 | Ben Cochran | Tackling Unique Problems by Using TWO SET Statements in ONE DATA Step |
BB124 | Lynn Mullins & Richann Watson | Exploring HASH Tables vs. SORT/DATA Step vs. PROC SQL |
BB129-SA | Jason Secosky | DATA Step in SAS Viya: Essential New Features |
BB142 | John King | DOSUBL and the Function Style Macro |
Data Visualization and Graphics
Paper No. | Author(s) | Paper Title |
DV01 | Kirk Paul Lafler | An Introduction to ODS Statistical Graphics |
DV02 | Ilya Krivelevich et al. | Waterfall Plots in Oncology Studies in the Case of Multi-Arms Design |
DV04 | Ting Sa | A Macro that can Create U.S State and U.S County KML Files |
DV07 | Mia Lyst et al. | A Big Data Challenge: Visualizing Social Media Trends about Cancer using SAS® Text Miner |
DV08-SAS | Cheryl Coyle | Data Can Be Beautiful: Crafting a Compelling Story with SAS® Visual Analytics |
DV09 | Piyush Singh et al. | Patient Safety with SAS® Visual Analytics |
Data for Good
Paper No. | Author(s) | Paper Title |
DG01 | Brandy Sinco et al. | Correlation and Structural Equation Analysis on the Effects of Anti-Discrimination Policies and Resources on the Well Being of Lesbian, Gay, and Bisexual College Students |
DG02 | Deanna Schreiber-Gregory | Exploring the Relationship Between Substance Abuse and Dependence Disorders and Discharge Status: Results and Implications |
DG03 | David Corliss | Lag Models with Social Response Outcomes |
DG04 | Andrea Frazier | The (Higher) Power of SAS® |
Hands-on Workshops
Paper No. | Author(s) | Paper Title |
HW01 | Kirk Paul Lafler | Hands-on Introduction to SAS® and the ODS Excel® Destination |
HW02 | Ben Cochran | Using a Few SAS Functions to Clean Dirty Data |
HW03 | Kent Phelps & Ronda Phelps | Base SAS® and SAS® Enterprise Guide® ~ Automate Your SAS World with Dynamic Code; Your Newest BFF (Best Friend Forever) in SAS |
HW04 | Ted Conway | A Hands-On Introductory Tour of SAS® ODS Graphics |
HW05 | Chuck Kincaid | Intermediate SAS® ODS Graphics |
Pharmaceutical Applications
Paper No. | Author(s) | Paper Title |
PH01 | Lingling Xie & Xiaoqi Li | Mapping MRI data to SDTM and ADaM |
PH02 | Derek Morgan | ISO 8601 and SAS®: A Practical Approach |
PH04 | Hao Sun et al. | AIR Binder 2.0: A Dynamic Visualization, Data Analysis and Reporting SAS Application for Preclinical and Clinical ADME Assays, Pharmacokinetics, Metabolite Profiling and Identification |
PH05 | Richann Watson & Josh Horstman | Automated Validation of Complex Clinical Trials Made Easy |
PH06 | Nancy Brucken & Karin Lapann | ADQRS: Basic Principles for Building Questionnaire, Rating and Scale Analysis Datasets |
Rapid Fire
Paper No. | Author(s) | Paper Title |
RF01 | Jayanth Iyengar | Ignorance is not bliss - understanding SAS applications and product contents |
RF02 | Art Carpenter | Quotes within Quotes: When Single (') and Double (") Quotes are not Enough |
RF03 | Ming Yan | No News Is Good News: A Smart Way to Impute Missing Clinical Trial Lab Data |
RF05 | Ting Sa | Macro that can Provide More Information for your Character Variables |
RF06 | Aaron Barker | Cleaning Messy Data: SAS Techniques to Homogenize Tax Payment Data |
RF07 | Louise Hadden | PROC DOC III: Self-generating Codebooks Using SAS® |
RF08 | Nancy Brucken | What Are Occurrence Flags Good For Anyway? |
SAS 101
Paper No. | Author(s) | Paper Title |
SA01 | Kirk Paul Lafler | An Introduction to PROC REPORT |
SA02 | Jayanth Iyengar | If you need these OBS and these VARS, then drop IF, and keep WHERE |
SA03 | Derek Morgan | The Essentials of SAS® Dates and Times |
SA04 | Derek Morgan | PROC SORT (then and) NOW |
SA05 | Haiyin Liu & Wei Ai | Working with Datetime Variable from Stata |
SA06 | Josh Horstman | Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets |
SA07 | Josh Horstman | Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code |
SA09 | Andrew Kuligowski | Parsing Useful Data Out of Unusual Formats Using SAS® |
SA10 | Andrew Kuligowski | The Building Blocks of SAS® Datasets - S-M-U (Set, Merge, and Update) |
SA11 | Art Carpenter | Before You Get Started: A Macro Language Preview in Three Parts |
SA12 | Joe Matise | Writing Code With Your Data: Basics of Data-Driven Programming Techniques |
SA13 | Ben Cochran | Make That Report Look Great Using the Versatile PROC TABULATE |
SA14 | Kirk Paul Lafler et al. | The Battle of the Titans (Part II): PROC TABULATE versus PROC REPORT |
SA15 | Art Carpenter | A Walk through Time: Growing Your SAS Career |
Statistics / Advanced Analytics
System Architecture and Administration
Paper No. | Author(s) | Paper Title |
SY01 | Bob Matsey | Using Agile Analytics for Data Discovery |
SY04-SAS | Amy Peters | The Future of the SAS Platform |
Tools of the Trade
Paper No. | Author(s) | Paper Title |
TT01 | Jack Shoemaker | Generating Reliable Population Rates Using SAS® Software |
TT02 | Richann Watson | Check Please: An Automated Approach to Log Checking |
TT03 | Misty Johnson | Arbovirus, Varicella and More: Using SAS® for Reconciliation of Disease Counts |
TT04 | Paul Kaefer | Code Like It Matters: Writing Code That's Readable and Shareable |
TT06 | Laurie Smith | From Device Text Data to a Quality Dataset |
TT07 | Doug Zirbel | Proc Transpose Cookbook |
TT08 | Louise Hadden | Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement, SAS® Macro Language, and Control Tables |
TT09 | Lakhpreet Gill | An Array of Possibilities: Manipulating Longitudinal Survey Data with Arrays |
TT10 | David Oesper | Fully Automated Updating of Arbitrarily Complex Excel Workbooks |
e-Posters
Paper No. | Author(s) | Paper Title |
PO01 | Louise Hadden | Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS® |
PO02 | Louise Hadden | SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination |
PO03 | Lakshmi Nirmala Bavirisetty & Deanna Schreiber-Gregory | Data Quality Control: Using High Performance Binning to Prevent Information Loss |
PO05 | Lynn Mullins & Richann Watson | Let's Get FREQy with our Statistics: Let SAS® Determine the Appropriate Test Statistic Based on Your Data |
Abstracts
BI / Customer Intelligence
BI02-SAS : If You Build It, Will They Understand? Designing Reports for the General Public in SAS® Visual Analytics
Jesse Sookne, SAS
Ed Summers, SAS
Julianna Langston, SAS
Karen Mobley, SAS
Tuesday, 10:30 AM - 10:50 AM, Location: Regency F
Many organizations that use SAS® Visual Analytics must conform with accessibility requirements such as Section 508, the Americans with Disabilities Act, and the Accessibility for Ontarians with Disabilities Act. SAS Visual Analytics provides a number of different ways to view reports, including the SAS® Report Viewer and SAS® Mobile BI native apps for Apple iOS and Google Android. Each of these options has its own strengths and weaknesses when it comes to accessibility - a one-size-fits-all approach is unlikely to work well for the people in your audience who have disabilities. This paper provides a comprehensive assessment of the latest versions of all SAS Visual Analytics report viewers, using the Web Content Accessibility Guidelines (WCAG) version 2.0 as a benchmark to evaluate accessibility. You can use this paper to direct the end users of your reports to the viewer that best meets their individual needs.
BI03-SAS : Accessibility and SAS® Visual Analytics Viewers: Which Report Viewer Is Best for Your Users' Needs?
Jesse Sookne, SAS
Kristin Barker, SAS
Joe Sumpter, SAS
Lavanya Mandavilli, SAS
Tuesday, 11:00 AM - 11:50 AM, Location: Regency F
Many organizations that use SAS Visual Analytics must conform with accessibility requirements such as Section 508, the Americans with Disabilities Act, and the Accessibility for Ontarians with Disabilities Act. Visual Analytics provides a number of different ways to view reports, including the SAS Report Viewer and SAS Mobile BI native apps for iOS and Android. Each of these options has its own strengths and weaknesses when it comes to accessibility -- a one-size-fits-all approach is unlikely to work well for the people in your audience who have disabilities. This paper provides a comprehensive assessment of the latest versions of all Visual Analytics report viewers, using the Web Content Accessibility Guidelines (WCAG) version 2.0 as a benchmark to evaluate accessibility. You can use this paper to direct the end users of your reports to the viewer that best meets their individual needs.
Banking and Finance
BF01 : The Cox Hazard Model for Claims Data: Bayesian non-parametric approach
Samuel Berestizhevsky, Consultant
Tanya Kolosova, Co-author
Tuesday, 2:00 PM - 2:50 PM, Location: Regency F
The central piece of claim management is claims modeling. Two strategies are commonly used by insurers to analyze claims: the two-part approach that decomposes claims cost into frequency and severity components, and the pure premium approach that uses the Tweedie distribution. In this article, we provide a general framework for modeling claims using the Cox Hazard Model. The Cox proportional hazard (PH) model is a standard tool in survival analysis for studying the dependence of a hazard rate on covariates and time. This article is a case study intended to indicate a possible application of the Cox PH model to workers' compensation insurance, particularly the occurrence of claims (disregarding claim size). In our study, the claims data is from workers' compensation insurance in selected industries and states in the United States for the two-year period from November 01, 2014 to October 31, 2016. We present an application of the Bayesian approach to survival analysis (time-to-event analysis) that allows dealing with violations of the assumptions of the Cox PH model. Requirements: SAS 9.2 on any operating system; no particular skill level or statistical or machine-learning background is assumed.
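For orientation, one built-in route to Bayesian Cox modeling in SAS is the BAYES statement of PROC PHREG; the sketch below is a minimal illustration with hypothetical data set and variable names, not a reproduction of the authors' non-parametric method.

    proc phreg data=claims;
       class industry state;                          /* hypothetical covariates */
       model time_to_claim*claim_event(0) = industry state;
       bayes seed=27513 nmc=10000 outpost=posterior;  /* posterior sampling */
    run;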
BF02 : Computing Risk Measures for Loan Facilities with Multiple Lines of Draws
Chaoxian Cai, BMO Harris Bank
Tuesday, 3:00 PM - 3:20 PM, Location: Regency F
In commercial lending, a commitment facility may have multiple lines of draws with hierarchical loan structures. The risk measures for main, limit, and sublimit commitments are usually aggregated and reported at the main obligation level. Thus, finding all hierarchical loan structures in loan and commitment tables is required in order to aggregate the risk measures. In this paper, I will give a brief introduction to commercial loans, from simple standalone loans to revolving and non-revolving commitments with complex loan structures. I will present a SAS macro program that can be used to identify main obligations and loan structures of future commitments from loan and commitment relational tables using Base SAS DATA steps. Risk measures, such as exposure at default (EAD) and credit conversion factor (CCF), are computed for these complicated loans and illustrated by examples.
BF03 : Untangle Customer's Incrementality using Uplift Modeling with a Case Study on Direct Marketing Campaign
Hairong Gu, Alliance Data
Yi Cao, Alliance Data
Chao Xu, Alliance Data
Tuesday, 3:30 PM - 3:50 PM, Location: Regency F
It is well known that uplift modeling, which directly models the incremental impact of a marketing promotion on consumer behavior, helps marketers to identify and target those "persuadable" customers whose propensities of response are driven by promotion. However, in reality, some of those "persuadables" are promotion chasers, meaning that they tend to exploit the offer but not contribute to bottom-line sales. In this paper, we propose a two-fold uplift model, modeling both incremental response rate and incremental sales. Using this model, promotion riders among the "persuadables" can be further identified, and customers who can bring maximal incremental campaign ROI will surface. We demonstrate this two-fold uplift model through a case study on a real-world direct marketing campaign. The goal of this study is to pinpoint a roadmap for building useful uplift models with both technical and business acumen to bolster the success of marketing campaigns.
Beyond the Basics SAS
BB015 : Advanced Macro: Driving a Variable Parameter System with Metadata
Art Carpenter, CA Occidental Consultants
Monday, 9:00 AM - 9:50 AM, Location: Regency A
When faced with generating a series of reports, graphs, and charts, we will often use the macro language to simplify the process. Commonly we will write a series of generalized macros, each with the capability of creating a variety of outputs that depend on the macro parameter inputs. For large projects, potentially with hundreds of outputs, controlling the macro calls can itself become difficult. The use of control files (metadata) to organize and process when a single macro is to be executed multiple times was discussed by Rosenbloom and Carpenter (2015). But those techniques only partially help us when multiple macros, each with its own set of parameters, are to be called. This paper discusses a technique that allows you to control the order of macro calls along with each macro's parameter set, while using a metadata control file.
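A common core of such metadata-driven systems is generating the macro calls from a control data set with CALL EXECUTE; a minimal sketch, where the control table CONTROL and the reporting macro %MAKE_REPORT are hypothetical names, not the paper's:

    data _null_;
       set control;
       /* %NRSTR defers macro execution until after the DATA step finishes */
       call execute(cats('%nrstr(%make_report)(type=', report_type,
                         ', year=', year, ');'));
    run;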
BB042 : Demystifying Intervals
Derek Morgan, PAREXEL
Tuesday, 9:00 AM - 9:50 AM, Location: Regency A
Intervals have been a feature of base SAS for a long time, allowing SAS users to work with commonly (and not-so-commonly) defined periods of time such as years, months, and quarters. With the release of SAS 9, there are more options and capabilities for intervals and their functions. This paper will first discuss the basics of intervals in detail, and then we will discuss several of the enhancements to the interval feature, such as the ability to select how the INTCK() function defines interval boundaries and the ability to create your own custom intervals beyond multipliers and shift operators.
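As a taste of the boundary-counting enhancement the abstract mentions, INTCK() accepts an optional method argument in SAS 9; a small self-contained sketch:

    data _null_;
       start = '31JAN2017'd;
       end   = '01FEB2017'd;
       m_disc = intck('month', start, end);               /* DISCRETE: counts the month boundary crossed = 1 */
       m_cont = intck('month', start, end, 'continuous'); /* CONTINUOUS: full months elapsed = 0 */
       put m_disc= m_cont=;
    run;

Custom intervals beyond multipliers and shift operators are defined in a data set named on the INTERVALDS= system option.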
BB047 : Extraction and Use of Text Strings with SAS when Source exceeds the 32k String Length Limit
John Schmitz, Luminare Data
Monday, 10:00 AM - 10:50 AM, Location: Regency A
Database systems support text fields that can be much larger than those supported by SAS® and SAS/ACCESS® systems. These fields may contain notes, unparsed and unformatted text, or XML data. This paper offers a solution for lengthy text data. Using SQL explicit pass-through, minimal native SQL coding, and some SAS macro logic, SAS developers can easily extract and store these strings as a set of substring elements that SAS can process and store without data loss.
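The shape of the approach, sketched with a hypothetical ODBC source and table: the inner query runs on the database, so the substring syntax would follow that DBMS (ANSI SUBSTRING is shown), and each chunk stays under the 32,767-byte SAS character limit.

    proc sql;
       connect to odbc (dsn=mydb);   /* hypothetical connection */
       create table notes_chunks as
       select * from connection to odbc
          (select id,
                  substring(note_text from 1     for 32000) as chunk1,
                  substring(note_text from 32001 for 32000) as chunk2
             from note_table);
       disconnect from odbc;
    quit;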
BB071 : Fifteen Functions to Supercharge Your SAS Code
Josh Horstman, Nested Loop Consulting
Monday, 11:00 AM - 11:50 AM, Location: Regency A
The number of functions included in SAS software has exploded in recent versions, but many of the most amazing and useful functions remain relatively unknown. This paper will discuss such functions and provide examples of their use. Almost any SAS programmer should find something new to add to their toolbox.
BB113 : You Did That With SAS? Combining Text with Graphics Output to Create Great Looking Reports.
Ben Cochran, The Bedford Group
Monday, 8:00 AM - 8:50 AM, Location: Regency A
Using PROC DOCUMENT and other procedures and methods from the SAS ODS package enables a SAS user to create fantastic reports. And since ODS is a part of Base SAS, the DOCUMENT procedure is, likewise, a part of Base SAS. It can do many things to enhance output. The main thrust of this presentation is to illustrate combining a Text file with SAS Procedure (GPLOT and REPORT) output. This report will then be used to create a PDF file. This paper shows how to go beyond the limitations of PROC DOCUMENT to deliver the complete package.
BB114 : Tackling Unique Problems by Using TWO SET Statements in ONE DATA Step
Ben Cochran, The Bedford Group
Tuesday, 8:00 AM - 8:50 AM, Location: Regency A
This paper illustrates solving many problems by creatively using TWO SET statements in ONE DATA step. Calculating percentages, conditional merging, conditional use of indexes, table lookups, and look-ahead operations are investigated in this paper.
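One of the listed techniques, calculating percentages, in a minimal sketch: the first SET reads a grand total once, and the retained value is then available for every detail row.

    proc means data=sashelp.class noprint;
       var weight;
       output out=total sum=tot_weight;
    run;

    data pct;
       if _n_ = 1 then set total(keep=tot_weight);  /* first SET: grand total, read once */
       set sashelp.class;                           /* second SET: detail rows */
       pct_weight = weight / tot_weight;
    run;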
BB124 : Exploring HASH Tables vs. SORT/DATA Step vs. PROC SQL
Lynn Mullins, PPD
Richann Watson, Experis
Tuesday, 11:30 AM - 11:50 AM, Location: Regency A
There are often times when programmers need to merge multiple SAS® data sets to combine data into one single source data set. Like many other processes, there are various techniques to accomplish this using SAS software. The most efficient method to use under varying assumptions will be explored in this paper. We will describe the differences, advantages, and disadvantages, and display benchmarks of using HASH tables, the SORT procedure with the DATA step, and the SQL procedure.
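For reference, the hash-table variant being benchmarked looks roughly like this; MAIN and LOOKUP (keyed by ID, carrying DESC) are hypothetical data sets.

    data combined;
       if _n_ = 1 then do;
          if 0 then set lookup;            /* adds DESC to the PDV without reading data */
          declare hash h(dataset:'lookup');
          h.defineKey('id');
          h.defineData('desc');
          h.defineDone();
       end;
       set main;
       if h.find() ne 0 then call missing(desc);  /* no match: clear the lookup value */
    run;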
BB129-SA : DATA Step in SAS Viya: Essential New Features
Jason Secosky, SAS
Tuesday, 10:00 AM - 10:20 AM, Location: Regency A
The DATA step is the familiar and powerful data processing language in SAS® and now SAS Viya. The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.
BB142 : DOSUBL and the Function Style Macro
John King, Ouachita Clinical Data Services, Inc.
Tuesday, 10:30 AM - 11:20 AM, Location: Regency A
The introduction of the SAS® function DOSUBL has made it possible to write certain function style macros that were previously impossible or extremely difficult. This Beyond the Basics talk will discuss how to write a function style macro that uses DOSUBL to run SAS code and return an "Expanded Variable List" as a text string. While this talk is directed toward a specific application the techniques can be more generally applied.
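A minimal sketch of the DOSUBL pattern the talk builds on (the %NOBS macro name is ours, not the paper's): the macro runs a side-session DATA step mid-expression and returns its result as text.

    %macro nobs(ds);
       %local rc n;
       /* %STR masks the semicolons while still letting &DS resolve */
       %let rc = %sysfunc(dosubl(%str(
          data _null_;
             if 0 then set &ds nobs=_n;
             call symputx('n', _n);
             stop;
          run;
       )));
       &n
    %mend nobs;

    %put NOTE: sashelp.class has %nobs(sashelp.class) observations.;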
Data Visualization and Graphics
DV01 : An Introduction to ODS Statistical Graphics
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 2:00 PM - 2:50 PM, Location: Sterling 6
Delivering timely and quality-looking reports, graphs, and information to management, end users, and customers is essential. This presentation provides SAS® users with an introduction to ODS Statistical Graphics found in Base SAS software. Attendees learn basic concepts, features, and applications of ODS statistical graphics procedures to create high-quality, production-ready output; an introduction to the statistical graphics SGPLOT, SGPANEL, and SGSCATTER procedures; and an illustration of plots and graphs including histograms, vertical and horizontal bar charts, scatter plots, bubble plots, vector plots, and waterfall charts.
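A first taste of the SGPLOT procedure covered in the talk, run against a data set that ships with SAS:

    ods graphics on;
    proc sgplot data=sashelp.class;
       vbar age / response=weight stat=mean;  /* mean weight per age group */
       yaxis label="Mean Weight (lb)";
    run;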
DV02 : Waterfall Plots in Oncology Studies in the Case of Multi-Arms Design
Ilya Krivelevich, Eisai Inc
Kalgi Mody, Eisai Inc
Simon Lin, Eisai Inc
Monday, 3:00 PM - 3:20 PM, Location: Sterling 6
Clinical data are easier to understand when presented in a visual format. In oncology, in addition to the commonly used survival curves, other types of graphics can be helpful in describing response in a study. These plots are becoming more and more popular due to their easy-to-understand representation of data. Waterfall plots can help to visualize tumor shrinkage or growth; in such plots, each patient in the study is represented by a vertical bar, and each bar represents the maximum change in the measurement of tumors. In studies with two arms, waterfall plots are often used to compare the outcome between arms. An excellent grounding for understanding waterfall plots is provided in the article by Theresa W. Gillespie, PhD, MA, RN: Understanding Waterfall Plots, Journal of the Advanced Practitioner in Oncology, 2012 Mar-Apr. That article claims that "A study using a randomization scheme other than 1:1 will not lend itself as well to a waterfall plot technique. As stated previously, since each vertical plot represents a single patient, waterfall plots limit the ability to portray different randomization schemes, e.g., 2:1 or 3:1". This presentation shows how we can solve this problem with new techniques, using PROC SGPANEL and Graph Template Language.
DV04 : A Macro that can Create U.S State and U.S County KML Files
Ting Sa, Cincinnati Children's Hospital Medical Center
Monday, 3:30 PM - 3:50 PM, Location: Sterling 6
In this paper, a macro is introduced that can generate KML files for U.S. states and counties. The generated KML files can be used directly by Google Maps to add customized state and county layers with user-defined colors and transparencies. When the state and county layers are clicked in Google Maps, customized information is displayed. To use the macro, the user only needs to prepare a simple input SAS data set. The paper includes all the SAS code for the macro and provides examples that show how to use it and how to display the KML files on Google Maps.
DV07 : A Big Data Challenge: Visualizing Social Media Trends about Cancer using SAS® Text Miner
Mia Lyst, Pinnacle Solutions, Inc
Scott Koval, Pinnacle Solutions, Inc
Yijie Li, Pinnacle Solutions, Inc.
Monday, 4:00 PM - 4:20 PM, Location: Sterling 6
Analyzing big data and visualizing trends in social media is a challenge that many companies face as large sources of publicly available data become accessible. While the sheer size of usable data can be staggering, knowing how to find trends in unstructured textual data is just as important an issue. At a Big Data conference, data scientists from several companies were invited to participate in tackling this challenge by identifying trends in cancer using unstructured data from Twitter users and presenting their results. This paper explains how our approach using SAS analytical methods was superior to other Big Data approaches in investigating these trends.
DV08-SAS : Data Can Be Beautiful: Crafting a Compelling Story with SAS® Visual Analytics
Cheryl Coyle, SAS
Monday, 4:30 PM - 4:50 PM, Location: Sterling 6
Do your reports effectively communicate the message you intended? Are your reports aesthetically pleasing? An attractive report does not ensure the accurate delivery of a data story, nor does a logical data story guarantee visual appeal. This paper provides guidance for SAS® Visual Analytics Designer users to facilitate the creation of compelling data stories. The primary goal of a report is to enable readers to quickly and easily get answers to their questions. Achievement of this goal is strongly influenced by choice of visualizations for the data being shown, quantity and arrangement of the information included, and the use, or misuse, of color. This paper describes how to guide readers' movement through a report to support comprehension of the data story; provides tips on how to express quantitative data using the most appropriate graphs; suggests ways to organize content through the use of visual and interaction design techniques; and instructs report designers on color meaning, presenting the notion that even subtle changes in color can evoke different feelings than those intended. A thoughtfully designed report can educate the viewer without compromising visual appeal. Included in this paper are recommendations and examples which, when applied to your own work, will help you create reports that are both informative and beautiful.
DV09 : Patient Safety with SAS® Visual Analytics
Piyush Singh, TCS
Prasoon Sangwan, TCS
Ghiyasudin Khan, TCS
Monday, 11:30 AM - 11:50 AM, Location: Sterling 6
In clinical trials, ensuring patient safety is the supreme priority. Stringent monitoring of safety outcomes is a key regulatory consideration. A trial may reveal unacceptable patient safety within a given indication, and this needs to be captured as early as possible to save time and money. Monitoring is done by generating various reports to identify any adverse event, check its severity, identify its reason, and so on. These reports can be huge and may take a lot of time and effort to put together to reach an outcome. In this paper we explain how SAS® Visual Analytics can play a vital role in the generation of reports to enable easy and in-depth monitoring of patient safety. Its features not only help in quick generation of reports but also help the monitoring committees answer their queries, draw inferences, and get a complete view of the safety outcome of the trial.
Data for Good
DG01 : Correlation and Structural Equation Analysis on the Effects of Anti-Discrimination Policies and Resources on the Well Being of Lesbian, Gay, and Bisexual College Students
Brandy Sinco, University of Michigan
Michael Woodford, Wilfred Laurier University, Ontario, CA
Jun Sung Hong, Wayne State University
Jill Chonody, Indiana University
Monday, 8:00 AM - 8:50 AM, Location: Sterling 6
Methods: Among a convenience sample of cisgender LGBQ college students (n=268), we examined the association between college- and state-level structural factors and students' experiences of campus hostility and microaggressions, psychological distress, and self-acceptance. Relationships between these outcomes were first examined with Spearman correlation coefficients. Structural Equation Modeling (SEM) was used to explore the mediating relationship of college-level structural factors on discrimination, distress, and self-acceptance. SAS PROC CORR was used for the correlation analysis and PROC CALIS was used for the SEM. The EFFPART feature in PROC CALIS was used to test for a mediating effect from an inclusive non-discrimination policy through (hostility and microaggressions) to psychological distress. Results: State-level factors were not correlated with students' experiences or psychological well-being. Both the correlation matrix and SEM results suggested positive benefits from select college policies and resources, particularly non-discrimination policies that include both gender identity and sexual orientation (versus only sexual orientation). Based on the SEM and correlation matrix, a non-discrimination policy that included both sexual orientation and gender identity was significantly associated with lower microaggressions and overt hostility, p<.05. Higher LGBTQ student organization to student body ratios were also significantly associated with reduced microaggressions and hostility, in addition to lower stress and anxiety. The SEM model indices indicated good absolute fit, incremental fit, parsimony, and predictive ability with CFI>.95, along with RMSEA and SRMR<.05. Conclusion: An inclusive non-discrimination policy, one that includes transgender students, also provides a healthier college environment for cisgender students.
DG02 : Exploring the Relationship Between Substance Abuse and Dependence Disorders and Discharge Status: Results and Implications
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Monday, 9:00 AM - 9:50 AM, Location: Sterling 6
The goal of this study was to investigate the association between substance abuse and dependence diagnoses and discharge status for patients admitted to a short-term acute care facility in the United States while controlling for gender, age, marital status, region, admission type, primary form of payment, days of care, and race. A series of univariate and multivariate logistic regression analyses as well as a propensity analysis were conducted via SAS® 9.4 to explore the association of the target variable, a primary diagnosis of substance abuse or dependence, with the treatment variable, discharge status, and identified control variables among patients who were admitted to a short-term acute care facility in the United States. The results revealed a significant relationship between having a primary substance abuse or dependence diagnosis and discharge status while controlling for discharge status propensity and possible confounding variables. Significant and non-significant odds ratio effects are provided and reviewed. Results supported that patients with a primary diagnosis of substance abuse or dependence differ significantly in terms of resulting discharge status from the rest of the patient population. Limitations and strengths of the data set used are discussed, and the effects of these limitations and strengths on the power and results of this model are reviewed. This paper is for any level of SAS user with an interest in the statistical evaluation of mental health care in acute care facilities.
DG03 : Lag Models with Social Response Outcomes
David Corliss, Peace-Work
Monday, 10:00 AM - 10:50 AM, Location: Sterling 6
Lag models are a type of time series analysis where the current value of an outcome variable is modeled based, at least in part, on previous values of predictor variables. This creates new opportunities for the use of social media data, both as the result of previous events and as predictors of future outcomes. This paper demonstrates lag models with social media data to establish a connection between severe solar storms and subsequent hardware failures based on complaints recorded in Twitter. The methodology is then used to investigate the possibility of a statistical link between hate speech and subsequent acts of violence against persons targeted by the speech.
DG04 : The (Higher) Power of SAS®
Andrea Frazier, Presence Health
Monday, 11:00 AM - 11:20 AM, Location: Sterling 6
Are there spiritual benefits to using SAS®? One synagogue in Chicago thinks so--and has its own all-volunteer informatics committee! The flexibility of SAS® software is well-known for business applications, but SAS® can also be used to improve membership data and evaluate programming, leading to a better congregant and community experience.
Hands-on Workshops
HW01 : Hands-on Introduction to SAS® and the ODS Excel® Destination
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 8:00 AM - 10:00 AM, Location: Sterling 9
SAS software is the "gold" standard for robust and reliable data access, manipulation, analytics, analysis, reporting and data discovery. Microsoft Excel is the most widely used software in the world. This hands-on workshop (HOW) demonstrates the various ways to transfer data, output and results between SAS and Excel software by presenting the most popular paths for connecting and sharing data and results between these software products.
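One of the most direct of the paths the workshop surveys is the ODS Excel destination (SAS 9.4); a minimal sketch, with the output file name as a placeholder:

    ods excel file='class.xlsx' options(sheet_name='Class List');
    proc print data=sashelp.class noobs;
    run;
    ods excel close;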
HW02 : Using a Few SAS Functions to Clean Dirty Data
Ben Cochran, The Bedford Group
Monday, 10:00 AM - 12:00 PM, Location: Sterling 9
Manipulating data can be a big part of what SAS programmers do. A big part of data manipulation is cleaning dirty data. SAS has a number of functions that can be used to make this task a little easier, and this HOW looks at many of them. Cleaning data and making data more consistent is the aim of this HOW.
HW03 : Base SAS® and SAS® Enterprise Guide® ~ Automate Your SAS World with Dynamic Code; Your Newest BFF (Best Friend Forever) in SAS
Kent Phelps, Illuminator Coaching, Inc.
Ronda Phelps, Illuminator Coaching, Inc.
Monday, 2:30 PM - 4:30 PM, Location: Sterling 9
Communication is the basic foundation of all relationships including our SAS relationship with the Server, PC, or Mainframe. To communicate more efficiently ~ and to increasingly automate your SAS World ~ you will want to learn how to transform Static Code into Dynamic Code that automatically recreates the Static Code, and then executes the recreated Static Code automatically. Our Hands-On-Workshop/presentation highlights the powerful partnership which occurs when Dynamic Code is creatively combined with a Dynamic FILENAME Statement, Macro Variables, the INDSNAME SET Option, and the CALL EXECUTE Command within 1 SAS Enterprise Guide Base SAS Program Node. You will have the exciting opportunity to learn how 1,469 time-consuming Manual Steps are amazingly replaced with only 1 time-saving Dynamic Automated Step. We invite you to attend our session where we will detail the UNIX syntax for our project example and introduce you to your newest BFF (Best Friend Forever) in SAS. Please see the Appendices to review starting point information regarding the syntax for Windows and z/OS, and to review the source code that created the data sets for our project example.
HW04 : A Hands-On Introductory Tour of SAS® ODS Graphics
Ted Conway, Self
Tuesday, 8:00 AM - 10:00 AM, Location: Sterling 9
You've heard that SAS ODS Graphics provides a powerful and detailed syntax for creating custom graphs, but for whatever reason still haven't added it to your bag of SAS tricks. Let's change that! Workshop participants will quickly gain experience creating a variety of charts by using an Excel-based code "playground" to submit SAS code examples and view the results directly from Excel. More experienced users will also find the code playground useful for compiling SAS ODS Graphics code snippets for themselves and to share with colleagues, as well as for creating Excel-hosted dashboards containing precisely sized and placed SAS graphics. This workshop is intended for all SAS users, and will use Base SAS, SAS Studio, SAS University Edition, and Microsoft Excel (no prior experience with SAS Studio or SAS University Edition is needed).
HW05 : Intermediate SAS® ODS Graphics
Chuck Kincaid, Experis Business Analytics
Tuesday, 10:00 AM - 12:00 PM, Location: Sterling 9
This paper builds on the knowledge gained in an introductory SAS® ODS Graphics workshop. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, we chose a selection of the many wonderful capabilities to present. This paper will look at that selection of both types of capabilities and provide the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We will explore tables, annotation, and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures. Some experience with these procedures is helpful, but not required.
Pharmaceutical Applications
PH01 : Mapping MRI data to SDTM and ADaM
Lingling Xie, Eli Lilly and Company
Xiaoqi Li, Eli Lilly and Company
Tuesday, 9:00 AM - 9:50 AM, Location: Sterling 2
Our studies collect Magnetic Resonance Imaging (MRI) data for the spine and sacroiliac joints at multiple time points. The raw data is huge and contains more than 600 records from each of the two readers and a possible adjudicator at each time point. In SDTM, we map the data to the XP domain. In ADaM, we map it to a BDS structure that contains more than 600 parameters combining information from test location, structure signal, laterality, and slice, with an additional five parameters for total scores derived from five complicated scoring algorithms, including rules for handling missing data and consolidation between different readers. We use programming to make the specification writing and dataset programming more efficient and to prevent manual typing errors. It is a challenging task, and we ultimately work out SDTM and ADaM datasets that suit our analysis needs.
PH02 : ISO 8601 and SAS®: A Practical Approach
Derek Morgan, PAREXEL
Tuesday, 10:30 AM - 10:50 AM, Location: Sterling 2
The ISO 8601 standard for dates and times has long been adopted by regulatory agencies around the world for clinical data. While there are many homemade solutions for working in this standard, SAS has many built-in solutions, from formats and informats that even take care of time zone specification, to the IS8601_CONVERT routine, which painlessly handles durations and intervals. These built-in capabilities, available in SAS 9.2 and above, will streamline your code and improve efficiency and accuracy. This paper also assumes the use of SAS® version 9.2 and above.
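Two of the built-in tools the paper covers, shown in a minimal sketch: the E8601 informats read ISO 8601 text directly, and the matching formats write it back out.

    data _null_;
       dt = input('2017-09-29T10:30:00', e8601dt19.);  /* ISO datetime -> SAS datetime */
       d  = input('2017-09-29', e8601da10.);           /* ISO date -> SAS date */
       put dt= e8601dt19. / d= e8601da10.;
    run;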
PH04 : AIR Binder 2.0: A Dynamic Visualization, Data Analysis and Reporting SAS Application for Preclinical and Clinical ADME Assays, Pharmacokinetics, Metabolite Profiling and Identification
Hao Sun, Covance, Inc.
Kristen Cardinal, Covance, Inc.
Richard Voorman, Covance, Inc.
Tuesday, 11:00 AM - 11:20 AM, Location: Sterling 2
Although regulatory agencies request pharmacometric datasets be submitted as SAS transport files for review, current clinical and preclinical ADME data analysis is handled mostly by non-SAS programs. Previously, we reported a SAS-based application, AIR Binder, for automatic analysis and reporting of a specific cytochrome P450 (CYP) inhibition assay, a key preclinical drug metabolism assay for the prediction of drug-drug interactions (PharmaSUG 2017). Thanks to the significantly improved productivity and efficiency, we expanded the application to a wide range of preclinical and clinical ADME assays for dynamic visualization, data analysis and reporting. Considering the complexity of data structures and presentation styles across these assays, SAS macros were designed and written to be more generalized and object-oriented. Key features include: various styles of ODS panel plots implemented to visualize profiles of metabolites for cross-species comparison and toxicology species selection; enhanced pharmacokinetic parameter analysis and display for metabolites; comprehensive statistical analysis of plasma protein binding data with PROC GLM; customized non-linear fitting for CYP inhibition and induction assays with kinetic parameter calculation and display using PROC NLIN. With the current infrastructure it is convenient to expand the program with the integration of new drug metabolism assay types for data analysis and visualization. Overall, AIR Binder 2.0 dynamically visualized data to efficiently convey information for quick decision making, which enhanced communications within study teams, between CRO and clients, and significantly shortened reporting turnaround time of drug metabolism projects for drug discovery and development.
PH05 : Automated Validation of Complex Clinical Trials Made Easy
Richann Watson, Experis
Josh Horstman, Nested Loop Consulting
Tuesday, 8:00 AM - 8:50 AM, Location: Sterling 2
Validation of analysis datasets and statistical outputs (tables, listings, and figures) for clinical trials is frequently performed by double programming. Part of the validation process involves comparing the results of the two programming efforts. COMPARE procedure output must be carefully reviewed for various problems, some of which can be fairly subtle. In addition, the program logs must be scanned for various errors, warnings, notes, and other information that might render the results suspect. All of this must be performed repeatedly each time the data is refreshed or a specification is changed. In this paper, we describe a complete, end-to-end, automated approach to the entire process that can improve both efficiency and effectiveness.
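The comparison half of the double-programming workflow typically centers on PROC COMPARE; a minimal sketch, with the library and data set names as placeholders (both data sets sorted by the ID variable):

    proc compare base=prod.adsl compare=qc.adsl listall criterion=1e-10;
       id usubjid;   /* match observations by subject rather than by position */
    run;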
PH06 : ADQRS: Basic Principles for Building Questionnaire, Rating and Scale Analysis Datasets
Nancy Brucken, InVentiv Health Clinical
Karin Lapann, Shire
Tuesday, 11:30 AM - 11:50 AM, Location: Sterling 2
Questionnaires, ratings and scales (QRS) are frequently used as primary and secondary analysis endpoints in clinical trials. The Submission Data Standards (SDS) QRS sub-team has compiled a considerable library of SDTM supplements defining standards for the collection and storage of QRS data. The ADaM ADQRS sub-team has been formed to develop addenda to these supplements, which will define standards for corresponding analysis datasets. This paper represents the current thinking of the ADQRS sub-team regarding basic principles for building QRS analysis datasets.
Rapid Fire
RF01 : Ignorance is not bliss - understanding SAS applications and product contents
Jayanth Iyengar, Data Systems Consultants LLC
Tuesday, 8:00 AM - 8:10 AM, Location: Regency F
Have you ever heard 'SAS Display Manager' in a presentation or discussion and wondered to yourself, what exactly is SAS Display Manager? Or have you come across 'SAS/SQL' or 'SAS/Macros' in a job description and thought that SQL or macros are separate package modules of the SAS System? There's a fair amount of confusion and misinformation regarding SAS products and what they're composed of, even amongst experienced SAS users. In this paper, I attempt to provide a proper understanding of SAS components and distinguish between SAS applications and SAS modules. I also show how to determine what products are licensed and installed from the SAS windowing environment.
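The licensing check the paper refers to can be run from any SAS session; both steps below write their results to the log:

    proc setinit;          /* products licensed at your site, with expiration dates */
    run;

    proc product_status;   /* products actually installed in this SAS session */
    run;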
RF02 : Quotes within Quotes: When Single (') and Double (") Quotes are not Enough
Art Carpenter, CA Occidental Consultants
Tuesday, 8:15 AM - 8:25 AM, Location: Regency F
Although it does not happen every day, it is not unusual to need to place a quoted string within another quoted string. Fortunately SAS® recognizes both single and double quote marks, and either can be used within the other. This gives us the ability to have two-deep quoting. There are situations, however, where two kinds of quotes are not enough. Sometimes we need a third layer, or, more commonly, we need to use a macro variable within the layers of quotes. Macro variables can be especially problematic, as they will generally not resolve when they are inside single quotes. However, this is SAS, and that implies that there are several things going on at once and that there are several ways to solve these types of quoting problems. The primary goal of this paper is to assist the programmer with solutions to the quotes-within-quotes problem, with special emphasis on the presence of macro variables. The various techniques are contrasted, as are the likely situations that call for these types of solutions. A secondary goal is to help the reader understand how SAS works with quote marks and how it handles quoted strings. Although we will not go into the gory details, a surface understanding can be useful in a number of situations.
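The starting point of the problem, in a minimal sketch: macro variables resolve inside double quotes but not single quotes, and macro quoting functions can supply literal quote marks when another layer is needed.

    %let name = Fred;

    data _null_;
       put 'Hello, &name';   /* single quotes: prints the literal text &name */
       put "Hello, &name";   /* double quotes: prints Hello, Fred */
    run;

    /* %STR(%') supplies a literal single quote around the resolved value */
    %let quoted = %str(%')&name%str(%');
    %put &quoted;            /* prints 'Fred' */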
RF03 : No News Is Good News: A Smart Way to Impute Missing Clinical Trial Lab Data
Ming Yan, Eli Lilly
Tuesday, 8:45 AM - 8:55 AM, Location: Regency F
In clinical trials, specific lab microscopic UA, RBC morphology, and special WBC subordinate tests are reported to sponsors by a lab ONLY if an abnormality is observed. The normal results, which are not explicitly reported, nevertheless need to be in place in order to compute a percentage of abnormalities for each lab test in the subject population. This macro starts from the SDTM LB domain, which stores only the observed abnormal lab tests, and converts it to ADaM with all the normal test results filled in for all patients at all time points.
RF05 : Macro that can Provide More Information for your Character Variables
Ting Sa, Cincinnati Children's Hospital Medical Center
Tuesday, 9:15 AM - 9:25 AM, Location: Regency F
Sometimes we want to change character variables to numeric variables in batches, but before doing that, we may need to check manually whether those variables contain only values that can be converted to numeric type. Or we may want to make all the dates or datetimes consistent among SAS data sets, but if those variables are saved as character variables and we don't have a data dictionary, we have to check the data sets manually. Sometimes we may also have character variables that contain only missing values, and we want to delete them to save space. Using the macro in this paper, you can get this information for each character variable without checking them one by one. The macro can check all the character variables in a library or in selected data sets. An HTML report and an Excel report are generated after running the macro, covering each checked character variable; users can use the Excel report to filter the information further. Based on this information, the user can decide what to do with each character variable: for example, a character variable containing only numeric values can be converted to a numeric variable, and variables containing only missing values can be deleted.
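One of the checks the macro automates can be sketched in open code: the ?? modifier on INPUT suppresses error messages, so a missing result flags a value that cannot be converted to numeric.

    data check;
       input charvar $12.;
       num = input(charvar, ?? best32.);  /* missing when CHARVAR is not numeric */
       all_numeric = not missing(num);
       datalines;
    123.45
    12AB
    ;
    run;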
RF06 : Cleaning Messy Data: SAS Techniques to Homogenize Tax Payment Data
Aaron Barker, Iowa Department of Revenue
Tuesday, 9:00 AM - 9:10 AM, Location: Regency F
One challenge frequently encountered by analysts aggregating large volumes of data is dealing with inconsistencies among data sources. Fortunately, all versions of SAS have a wide array of tools available that a user of any skill level can employ to remedy such problems. This paper uses example tax payment data to demonstrate tools that have been employed to do a novel analysis of citizens' experience with the tax system in Iowa. The first method highlighted is conditional logic on identifier variables to determine how best to interpret the data in the identifier column. This is shown first using IF-THEN logic and again using SELECT and WHEN statements. The second method is determining when identifiers are of different types (e.g., permit numbers vs. Social Security Numbers) and then bringing in external data sources to recode inconsistent identifiers into consistent ones. This is shown using both PROC SQL and the traditional MERGE statement. The end result of these procedures is a data set that allows the analyst to see how many times a given individual or business interacts with the Department and provides valuable insight into the preferred payment method of taxpayers by tax type, payment type, and frequency of payments.
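The SELECT/WHEN form of that identifier logic, in a minimal sketch; the PAYMENTS data set and the identifier rules are hypothetical stand-ins for the paper's:

    data typed;
       set payments;   /* assumes a character identifier variable ID */
       select;
          when (length(id) = 9 and notdigit(strip(id)) = 0) id_type = 'SSN   ';
          when (substr(id, 1, 2) = 'PM')                    id_type = 'Permit';
          otherwise                                         id_type = 'Other ';
       end;
    run;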
RF07 : PROC DOC III: Self-generating Codebooks Using SAS®
Louise Hadden, Abt Associates Inc.
Tuesday, 8:30 AM - 8:40 AM, Location: Regency F
This paper will demonstrate how to use good documentation practices and SAS® to easily produce attractive, camera-ready data codebooks (and accompanying materials such as label statements, format assignment statements, etc.) Four primary steps in the codebook production process will be explored: use of SAS metadata to produce a master documentation spreadsheet for a file; review and modification of the master documentation spreadsheet; import and manipulation of the metadata in the master documentation spreadsheet to self-generate code to be included to generate a codebook; and use of the documentation metadata to self-generate other helpful code such as label statements. Full code for the example shown (using the SASHELP.HEART data base) will be provided upon request.
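The first and last steps the abstract lists can be sketched with the DICTIONARY tables: pull the metadata, then self-generate a LABEL statement (SASHELP.HEART, as in the paper, carries variable labels).

    proc sql noprint;
       select catx(' ', name, '=', quote(trim(label)))
          into :labels separated by ' '
          from dictionary.columns
          where libname = 'SASHELP' and memname = 'HEART' and label is not null;
    quit;

    data heart2;
       set sashelp.heart;
       label &labels;   /* generated from metadata rather than typed by hand */
    run;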
RF08 : What Are Occurrence Flags Good For Anyway?
Nancy Brucken, InVentiv Health Clinical
Tuesday, 9:30 AM - 9:40 AM, Location: Regency F
The ADaM Structure for Occurrence Data (OCCDS) includes a series of permissible variables known as occurrence flags. These are optional Y/null flags indicating the first occurrence of a particular type of record within a subject. This paper shows how occurrence flags can be used with PROC SQL to easily produce tables summarizing adverse events (AEs) by System Organ Class (SOC) and dictionary preferred term.
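The flavor of that summary, sketched with conventional ADaM OCCDS names (ADAE, AEBODSYS, AEDECOD, AOCCPFL) that are assumptions here: because the flag marks one record per subject per preferred term, a simple row count yields subject counts.

    proc sql;
       create table ae_summary as
       select aebodsys, aedecod, count(*) as n_subjects
          from adae
          where aoccpfl = 'Y'   /* first occurrence of each preferred term per subject */
          group by aebodsys, aedecod
          order by aebodsys, calculated n_subjects desc;
    quit;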
SAS 101
SA01 : An Introduction to PROC REPORT
Kirk Paul Lafler, Software Intelligence Corporation
Tuesday, 9:00 AM - 9:50 AM, Location: Regency B
SAS® users often need to create and deliver quality custom reports and specialized output for management, end users, and customers. The SAS System provides users with the REPORT procedure, a "canned" Base SAS procedure, for producing quick and formatted detail and summary results. This presentation is designed for users who have no formal experience working with the REPORT procedure. Attendees learn the basic PROC REPORT syntax using the COLUMN, DEFINE, and other optional statements, and procedure options to produce quality output; explore basic syntax to produce simple reports; compute subtotals and totals at the end of a report using a COMPUTE block; calculate percentages; produce statistics for analysis variables; apply conditional logic to control summary output rows; and enhance the appearance of output results with basic Output Delivery System (ODS) techniques.
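The COLUMN/DEFINE skeleton described above, in a small runnable example:

    proc report data=sashelp.class nowd;
       column sex age height weight;
       define sex    / group;
       define age    / group;
       define height / analysis mean format=6.1 'Mean Height';
       define weight / analysis mean format=6.1 'Mean Weight';
    run;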
SA02 : If you need these OBS and these VARS, then drop IF, and keep WHERE
Jayanth Iyengar, Data Systems Consultants LLC
Monday, 11:30 AM - 11:50 AM, Location: Regency B
Reading data effectively in the DATA step requires knowing the implications of various methods and of DATA step mechanics: the observation loop and the PDV. The impact is especially pronounced when working with large data sets. Individual techniques for subsetting data have varying levels of efficiency and implications for input/output time. Use of the WHERE statement/option to subset observations consumes fewer resources than the subsetting IF statement. Also, use of DROP and KEEP to select variables to include or exclude can be efficient, depending on how they're used.
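The core contrast, sketched against a data set that ships with SAS: the WHERE= option filters observations before they enter the PDV, while a subsetting IF discards them only after they have been read.

    data fast;
       set sashelp.cars(where=(make = 'Audi') keep=make model msrp);
    run;

    data slow;
       set sashelp.cars(keep=make model msrp);
       if make = 'Audi';   /* every observation is read, then most are discarded */
    run;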
SA03 : The Essentials of SAS® Dates and Times
Derek Morgan, PAREXEL
Monday, 10:00 AM - 10:50 AM, Location: Regency B
The first thing you need to know is that SAS® stores dates and times as numbers. However, this is not the only thing that you need to know, and this presentation will give you a solid base for working with dates and times in SAS. It will also introduce you to functions and features that will enable you to manipulate your dates and times with surprising flexibility. This paper will also show you some of the possible pitfalls with dates (and times and datetimes) in your SAS code, and how to avoid them. We'll show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats, how to use dates and times in TITLE and/or FOOTNOTE statements, and close with a brief discussion of Excel conversions.
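The starting fact of the presentation, in code: dates are counts of days (and datetimes counts of seconds) from January 1, 1960, so they can be built and formatted freely.

    data _null_;
       d  = '29SEP2017'd;         /* date literal: days since 01JAN1960 */
       t  = '10:30't;             /* time literal: seconds since midnight */
       dt = dhms(d, 10, 30, 0);   /* assemble a datetime from parts */
       put d= date9. t= time8. dt= datetime20. / d= best.;  /* last PUT shows the raw number */
    run;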
SA04 : PROC SORT (then and) NOW
Derek Morgan, PAREXEL
Tuesday, 2:00 PM - 2:20 PM, Location: Regency B
With the advent of big data, faster sorting methods have reduced the use of the old staple, PROC SORT. This paper brings some of the useful features added to PROC SORT to light; it's not as much of a dinosaur as you might think.
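One example of the newer options the paper surveys: DUPOUT= captures the observations that NODUPKEY would otherwise silently discard.

    proc sort data=sashelp.cars out=unique_makes nodupkey dupout=dups;
       by make;   /* UNIQUE_MAKES keeps one row per make; DUPS receives the rest */
    run;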
SA05 : Working with Datetime Variable from Stata
Haiyin Liu, University of Michigan
Wei Ai, University of Michigan
Monday, 11:00 AM - 11:20 AM, Location: Regency B
Many SAS® users have to transfer Stata data files into SAS frequently. However, you must be careful when converting Stata datetimes to SAS datetimes. In this paper, we examine how SAS and Stata store datetime variables differently. We propose a correction function to accurately transfer Stata datetimes into SAS.
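For context, a hedged sketch of the usual correction: Stata %tc datetimes count milliseconds from 01JAN1960 while SAS datetimes count seconds from the same epoch, so dividing by 1,000 converts between them (setting aside Stata's leap-second-adjusted %tC type); the data set and variable names are hypothetical.

    data converted;
       set from_stata;           /* STATADT holds a Stata %tc datetime value */
       sasdt = statadt / 1000;   /* milliseconds -> seconds */
       format sasdt datetime20.;
    run;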
SA06 : Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
Josh Horstman, Nested Loop Consulting
Monday, 3:00 PM - 3:50 PM, Location: Regency B
Although merging is one of the most frequently performed operations when manipulating SAS datasets, there are many problems which can occur, some of which can be rather subtle. This paper illustrates common merge issues using examples. We examine what went wrong by walking step-by-step through the execution of each example. We look under the hood at the internal workings of the DATA step and the program data vector (PDV) to understand exactly what is going wrong and how to fix it. Finally, we discuss best coding practices to avoid these problems in the first place.
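A representative pitfall the paper dissects, sketched with hypothetical DEMOG and VITALS data sets: if both contain a same-named non-BY variable, the value from the right-hand data set silently overwrites the left-hand one, and IN= flags are needed to control which observations survive.

    data merged;
       merge demog(in=in_dem) vitals(in=in_vit);
       by id;        /* both inputs must be sorted by ID */
       if in_dem;    /* keep only IDs present in DEMOG */
    run;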
SA07 : Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code
Josh Horstman, Nested Loop Consulting
Monday, 8:00 AM - 8:50 AM, Location: Regency B
Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF-THEN/ELSE syntax. In this paper, we will explore various ways to construct conditional SAS logic, including some that may provide advantages over the IF statement. Topics will include the SELECT statement, the IFC and IFN functions, the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We'll also make sure we understand the difference between a regular IF and the %IF macro statement.
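Two of the alternatives named above, in a minimal sketch: IFC returns one of two character values inline, and SELECT replaces a chain of IF-THEN/ELSE statements.

    data sized;
       set sashelp.class;
       size = ifc(weight > 100, 'heavy', 'light');  /* inline condition */
       select (sex);
          when ('F') sexlbl = 'Female';
          when ('M') sexlbl = 'Male';
          otherwise  sexlbl = 'Unknown';
       end;
    run;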
SA09 : Parsing Useful Data Out of Unusual Formats Using SAS®
Andrew Kuligowski, HSN
Monday, 9:00 AM - 9:50 AM, Location: Regency B
Most "Introduction to Programming" courses will include a section on reading external data; the first assumption they make will be that the data are stored in some sort of documented and consistent format. Fortunately, in the "real world", a lot of the data we deal with has the same basic assumption of occurring in a documented, consistent format - a lot of it, but not all of it. This presentation will address some techniques that can be used when we are not dealing with cleanly formatted data, when the data we want is in a less-than-ideal format, perhaps intermingled or seemingly buried with unnecessary clutter. It will discuss the principles of using SAS® to parse a file to extract useful data from a normally unusable source. This will be accomplished by citing examples of unusual data sources and the SAS Code used to parse it
SA10 : The Building Blocks of SAS® Datasets - S-M-U (Set, Merge, and Update)
Andrew Kuligowski, HSN
Monday, 2:00 PM - 2:50 PM, Location: Regency B
S-M-U. Some people will see these three letters and immediately think of the abbreviation for a private university and associated football team in Texas. Others might treat them as a three-letter word, and recall a whimsical cartoon character created by Al Capp many years ago. However, in the world of the SAS® user, these three letters represent the building blocks for processing SAS datasets through the SAS DATA step. S, M, and U are first letters in the words SET, MERGE, and UPDATE - the 3 commands used to introduce SAS data into a DATA step. This presentation will discuss the syntax for the SET, MERGE, and UPDATE commands. It will compare and contrast these 3 commands. Finally, it will provide appropriate uses for each command, along with basic examples that will illustrate the main points of the presentation.
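A compact side-by-side of the three statements, using hypothetical MASTER and TRANS data sets keyed (and sorted) by ID:

    data stacked;          /* SET: concatenates one after the other */
       set master trans;
    run;

    data joined;           /* MERGE: match-merges observations by key */
       merge master trans;
       by id;
    run;

    data refreshed;        /* UPDATE: applies non-missing transaction values to the master */
       update master trans;
       by id;
    run;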
SA11 : Before You Get Started: A Macro Language Preview in Three Parts
Art Carpenter, CA Occidental Consultants
Monday, 4:00 PM - 4:50 PM, Location: Regency B
Using the macro language adds a layer of complexity to SAS® programming that many programmers are reluctant to tackle. The macro language is not intuitive, and some of its syntax and logic run counter to similar operations in the DATA step. This makes the transfer of DATA step and PROC step knowledge difficult when first learning the macro language. So why should one make the effort to learn a complex, counterintuitive language? Before you start to learn the syntax - where to put the semicolon, and how to use the ampersand and percent sign - you need to have a basic understanding of why you want to learn the language in the first place. It will also help if you know a bit about how the language thinks. This overview provides the background that will enable you to understand the way that the macro language operates. This will allow you to avoid some of the common mistakes made by novice macro language programmers. First things first - before you get started with the learning process, you should understand these basic concepts.
SA12 : Writing Code With Your Data: Basics of Data-Driven Programming Techniques
Joe Matise, NORC
Tuesday, 8:00 AM - 8:50 AM, Location: Regency B
In this paper, aimed at SAS® programmers who have limited experience with DATA step programming, we discuss the basics of data-driven programming, first by defining it, and then by showing several easy-to-learn techniques that get a novice or intermediate programmer started using data-driven programming in their own work. We discuss using PROC SQL SELECT INTO to push information into macro variables; using PROC CONTENTS and the dictionary tables to query metadata; using an external file to drive logic; and generating and applying formats and labels automatically. Prior to reading this paper, programmers should be familiar with the basics of the DATA step, be able to import data from external files, have a basic understanding of formats and variable labels, and be aware of both what a macro variable is and what a macro is. Knowledge of macro programming is not a prerequisite for understanding this paper's concepts.
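A minimal sketch of the SELECT INTO technique the paper covers (table and column names are hypothetical):

   proc sql noprint;
      select distinct region
         into :regionlist separated by ' '
         from sales;
   quit;
   %put NOTE: code can now be generated for each of: &regionlist;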
SA13 : Make That Report Look Great Using the Versatile PROC TABULATE
Ben Cochran, The Bedford Group
Tuesday, 10:00 AM - 10:50 AM, Location: Regency B
Several years ago, one of my clients was in the business of selling reports to hospitals. He used PROC TABULATE to generate part of these reports. He loved the way this procedure 'crunched the numbers', but not the way the final reports looked. He said he would go broke if he had to sell naked PROC TABULATE output. So, he wrote his own routine to take TABULATE output and render it through Crystal Reports. That was before SAS came out with the Output Delivery System (ODS). Once he got his hands on SAS ODS, he kissed his Crystal Reports license good-bye. This paper is all about using PROC TABULATE along with ODS to generate fantastic reports. If you want to generate BIG money reports with PROC TABULATE, this presentation is for you.
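A hedged sketch of the basic TABULATE-plus-ODS combination, with hypothetical data and an arbitrary destination and style:

   ods html file='report.html' style=journal;
   proc tabulate data=revenue format=dollar12.;
      class hospital year;
      var charges;
      table hospital all, year*charges*(sum mean);
   run;
   ods html close;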
SA14 : The Battle of the Titans (Part II): PROC TABULATE versus PROC REPORT
Kirk Paul Lafler, Software Intelligence Corporation
Ben Cochran, The Bedford Group
Ray Pass, Retired - and loving it!
Tuesday, 11:00 AM - 11:50 AM, Location: Regency B
Should I use PROC REPORT or PROC TABULATE to produce that report? Which one will give me the control and flexibility to produce the report exactly the way I want it to look? Which one is easier to use? Which one is more powerful? WHICH ONE IS BETTER? If you have these and other questions about the pros and cons of the REPORT and TABULATE procedures, this presentation is for you. We will discuss, using real-life report scenarios, the strengths (and even a few weaknesses) of the two most powerful reporting procedures in SAS® (as we see it). We will provide you with the wisdom you need to make that sometimes difficult decision about which procedure to use to get the report you really want and need.
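For flavor, the same simple summary written both ways, using the SASHELP.CLASS sample data:

   proc tabulate data=sashelp.class;
      class sex;
      var height;
      table sex all, height*(n mean);
   run;

   proc report data=sashelp.class nowd;
      column sex height,(n mean);
      define sex    / group;
      define height / analysis;
   run;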
SA15 : A Walk through Time: Growing Your SAS Career
Art Carpenter, CA Occidental Consultants
Tuesday, 2:30 PM - 2:50 PM, Location: Regency B
The measurement of time is an integral component of most of our data sets. We note the date of the patient's visit and both the date and the time of the administration of a test. The accurate calculation of time intervals, such as the time between drug delivery and onset of an adverse event, becomes critically important. Fortunately, SAS ships with a diverse set of tools for working with dates and times. For SAS programmers and users of SAS software, time measurement is important not only in the successful analysis of our data, but also as a gauge of the growth of our careers. To be successful we must be able to measure and work with time values, but time is also a measure of the progression of our career. This paper introduces a number of measures of time and a variety of analytic techniques applied to date and time values, interwoven with a discussion of the Grand Canyon, the Great Wall of China, and the continued growth of our professional knowledge.
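A minimal sketch of the interval functions involved (the dates are placeholders):

   data _null_;
      dose  = '15MAR2017'd;
      onset = '02MAY2017'd;
      days   = onset - dose;                     /* simple day arithmetic */
      months = intck('month', dose, onset);      /* interval boundaries crossed */
      window = intnx('month', dose, 6, 'same');  /* six months after dosing */
      put days= months= window= date9.;
   run;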
Statistics / Advanced Analytics
AA01 : Using SAS to Compare Two Estimation Methods on the Same Outcome: Example from First Trimester Pregnancy Weights
Brandy Sinco, University of Michigan
Edith Kieffer, University of Michigan
Kathleen Welch, University of Michigan
Diana Welmerink Bolton, University of Michigan
Monday, 10:00 AM - 10:50 AM, Location: Sterling 2
Background. Accurate pre-pregnancy weight is important for weight gain recommendations during pregnancy, and weight gain is linear during the first trimester. Objective. Use a linear mixed model (LMM), with random slope and intercept, to predict weight at the end of the first trimester (week 13) for 276 women from a study in which many pre-pregnancy weights were self-reported and likely inaccurate, and compare the predicted weights at week 13 from the LMM to weights computed by adding a constant per week for 13 weeks. Methods. For a sub-sample in which the weights at week 13 were known, error variances between predicted and self-reported weights were compared with a PROC MIXED random effects model and then by using PROC IML to conduct a likelihood ratio test for a variance comparison. PROC SGPLOT produced box plots and histograms to display the variances between the two methods. Next, indicators were created for weight categories (under-weight, normal, over-weight, obese) and excessive weight gain. Accuracy of categories can be compared with PROC FREQ by comparing the 95% confidence intervals for Cohen's kappas, and error rates can be compared with PROC FREQ by using the McNemar test on the categories projected by the two prediction methods. Further, PROC LOGISTIC can be used to evaluate accuracy by comparing the areas under the ROC curves between models using predicted and self-reported pre-pregnancy weights. Results. The likelihood ratio test, kappa confidence interval, and McNemar's test indicated that weight prediction from the LMM had lower variance and error rates.
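A hedged sketch of the random slope-and-intercept model described (dataset and variable names are assumptions, not the authors' code):

   proc mixed data=pregwt;
      class id;
      model weight = week / solution outp=predicted;  /* OUTP holds subject-level predictions */
      random intercept week / subject=id type=un;
   run;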
AA02 : Logistic Model Selection with SAS® PROCs LOGISTIC, HPLOGISTIC, and HPGENSELECT
Bruce Lund, Independent Consultant
Tuesday, 2:30 PM - 3:20 PM, Location: Sterling 6
In marketing or credit risk, a model with a binary target is often fit with logistic regression. In this setting the sample size is very large, while the number of predictors may be around 100, many of them classification (CLASS) variables. This paper discusses the variable selection procedures offered by PROC LOGISTIC, PROC HPLOGISTIC, and PROC HPGENSELECT. These include the best-subsets approach of PROC LOGISTIC, selection by best SBC in PROC HPLOGISTIC, and selection by LASSO in PROC HPGENSELECT. The use of classification variables in connection with these selection procedures is discussed. Simulations are run to compare the methods on predictive accuracy and the handling of extreme multicollinearity.
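Hedged sketches of two of the selection approaches discussed (data and variable names are hypothetical):

   proc hplogistic data=train;
      class region;
      model bad(event='1') = region x1-x50;
      selection method=forward(select=sbc choose=sbc);
   run;

   proc hpgenselect data=train;
      model bad(event='1') = x1-x50 / distribution=binary;
      selection method=lasso(choose=sbc);
   run;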
AA04 : Claim Analytics
Mei Najim, Gallagher Bassett Services, Inc.
Monday, 10:00 AM - 10:50 AM, Location: Sterling 8
Claim analytics has been evolving in the insurance industry for the past two decades. This paper is organized in four parts: 1. An overview of claim analytics. 2. An introduction to a common high-level claim analytics technical process for large data sets, whose steps include data acquisition, data preparation, variable creation, variable selection, model building (a.k.a. model fitting), model validation, and model testing. 3. A case study: over the past couple of decades in the property & casualty insurance industry, around 20% of closed claims have settled with litigation, representing 70-80% of total dollars paid; litigation is thus one of the main drivers of claim severity. The case study introduces the Worker's Compensation (WC) Litigation Propensity Predictive Model at Gallagher Bassett, which scores open WC claims to predict their future litigation propensity. Data spanning a WC book of business covering a few thousand clients were explored and used to build the model. Multiple statistical and machine learning techniques (logistic regression, decision trees, neural networks, gradient boosting, random forests, etc.), combined with WC business knowledge, were used to discover and derive complex trends and patterns across the WC book of business data. 4. Conclusion.
AA05 : Unconventional Statistical Models with the NLMIXED Procedure
Robin High, University of Nebraska Medical Center
Monday, 8:00 AM - 8:50 AM, Location: Sterling 8
SAS/STAT® and SAS/ETS® software have several procedures that estimate parameters of generalized linear models for a variety of continuous and discrete distributions. The GENMOD, COUNTREG, GLIMMIX, LIFEREG, and FMM procedures, among others, offer a flexible range of unconventional data analysis options, including zero-inflated, truncated, and censored response data. The COUNTREG procedure also includes the Conway-Maxwell Poisson distribution and the negative binomial with a choice of two variance functions. The FMM procedure includes the generalized Poisson distribution as well as the ability to work with several truncated and zero-inflated distributions for both discrete and continuous data. This paper demonstrates how the NLMIXED procedure can be used to duplicate their results, first to gain insight into the complex computational details; the capability to enter programming statements into NLMIXED can then be expanded to handle even more unconventional data analysis situations.
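As one illustration of the idea, a Poisson regression ordinarily fit with PROC GENMOD can be reproduced in NLMIXED by coding the mean function as a programming statement (names hypothetical):

   proc nlmixed data=counts;
      parms b0=0 b1=0;             /* starting values */
      lambda = exp(b0 + b1*x);     /* log link written out explicitly */
      model y ~ poisson(lambda);
   run;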
AA06 : Multiple Imputation of Family Income Data in the 2015 Behavioral Risk Factor Surveillance System
Jia Li, NIOSH
Aaron Sussell, NIOSH
Monday, 11:30 AM - 11:50 AM, Location: Sterling 2
Multiple imputation methods are increasingly used to handle missing data in statistical analyses of observational studies to reduce bias and improve precision. SAS/STAT® PROC MI can be used to impute continuous or categorical variables with a monotone or arbitrary missing pattern. This study used the fully conditional specification (FCS) method to impute the family income variable in the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data. BRFSS is a health survey that collects state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. In this paper, the study population was restricted to currently employed respondents (age>=18) from the 25 states that collected industry and occupation information. Of the total 87,483 respondents, 11% were missing income information. To impute the missing income data, all variables in the survey that are correlated with either income or missingness of income (N=28) were selected as covariates. BRFSS sample design variables that represent stratification and unequal sampling probabilities were also included in the imputation model to improve validity. The FCS method was chosen due to an arbitrary missing pattern and mixed data types among income and all covariates. Logistic regression and discriminant function options were used for imputing binary and ordinal/nominal variables respectively. Results show a significantly different distribution in imputed income values compared to the observed values, suggesting that using the traditional complete case analysis approach to analyze BRFSS income data may lead to biased results.
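A hedged sketch of the FCS setup described, with placeholder variable names rather than the authors' 28 covariates:

   proc mi data=brfss nimpute=5 seed=20150101 out=imputed;
      class sex employ;
      fcs logistic(sex) discrim(employ);   /* logistic for binary, discriminant for nominal */
      var income age sex employ;
   run;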
AA07 : GREMOVE, Reassign, and let's GMAP! A SAS Trick for Generating Contiguous Map Boundaries for Market-Level Research
Chad Cogan, Arbor Research Collaborative for Health
Jeffrey Pearson, Arbor Research Collaborative for Health
Purna Mukhopadhyay, Arbor Research Collaborative for Health
Charles Gaber, Arbor Research Collaborative for Health
Marc Turenne, Arbor Research Collaborative for Health
Tuesday, 10:00 AM - 10:20 AM, Location: Sterling 6
In health services research, accurate health care market definitions are crucial for assessing the potential market-level consequences of policy changes. Political units of geography (e.g. counties) are generally not sufficient for capturing the service area of a provider. Alternatively, researchers may generate customized boundaries using data-driven approaches based on patient flow only to find that their newly defined areas are not contiguous. We use a novel approach to correct for the lack of contiguity using the information produced by the GREMOVE procedure. GREMOVE is often used along with the GMAP procedure when there is a need to generate customized boundaries on a map by removing the internal boundaries of smaller units of geography. However, SAS users may not be aware of the logic used by PROC GREMOVE to assign segment values and the underlying data that goes into the maps. We first examine the logic used by PROC GREMOVE, and the map output dataset it produces. We identify some potential limitations of GREMOVE along with some alternative uses, which we demonstrate using basic polygons. We then look at customized map boundaries produced using a data-driven approach to combine zip code tabulation areas (ZCTAs) based on patient flow and show how GREMOVE identifies non-contiguous segments in a newly defined area. We then use a SAS trick to modify the GREMOVE logic for segment assignment, and generate new contiguous boundaries.
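A minimal sketch of the GREMOVE-then-GMAP pattern, assuming the new area variable has already been attached to a copy of the traditional county map data (area and analysis variable names are hypothetical):

   /* countymap = maps.counties with a MARKETID variable attached */
   proc gremove data=countymap out=markets;
      by marketid;        /* input must be sorted by the new area variable */
      id state county;    /* internal boundaries to dissolve */
   run;

   proc gmap data=stats map=markets;
      id marketid;
      choro admissions;
   run;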
AA08 : Correcting for Selection Bias in a Clinical Trial
Shana Kelly, Spectrum Health
Monday, 11:00 AM - 11:20 AM, Location: Sterling 2
Selection bias occurs when the data do not represent the intended population and randomization fails to balance all potential confounding factors. Selection bias can produce misleading results in statistical analysis and should be corrected for. This paper explores a few alternative techniques to correct for a disparity between the comparison groups in a clinical trial. Food Prescription is a small clinical trial conducted by Spectrum Health to encourage impoverished individuals in the Grand Rapids community with a chronic disease, such as diabetes, to consume more fresh fruits and vegetables. Health outcomes are compared between the treatment and control groups after taking all covariates into account. The procedures shown are produced using SAS® Enterprise Guide 7.1.
AA11 : Dimensionality Reduction using Hadamard, Discrete Cosine and Discrete Fourier Transforms in SAS
Mohsen Asghari, Computer Engineering and Computer Science Department, University of Louisville
Aliasghar Shahrjooihaghighi, Computer Engineering and Computer Science Department, University of Louisville
Ahmad Desoky, Computer Engineering and Computer Science Department, University of Louisville
Tuesday, 9:00 AM - 9:50 AM, Location: Sterling 6
Dimensionality reduction encompasses techniques that transform data into a more compact and efficient representation, allowing information to be modeled, analyzed, and predicted with negligible error. Principal component analysis (PCA) reduces dimensionality by decreasing the number of variables and selecting a smaller subset of uncorrelated transformed variables called principal components. PCA is data dependent and requires computing the correlation matrix of the input data as well as the singular value decomposition (SVD) of that matrix. The Hadamard, Discrete Cosine Transform (DCT), and Discrete Fourier Transform (DFT) are orthogonal transformations that are not data dependent and reduce dimensionality by decreasing the correlation of the transform components. In this paper, we implement Hadamard, DCT, and DFT in SAS on a standard dataset and compare the results of these transformations with the PCA technique.
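The paper's transform implementations are not reproduced here; for reference, the PCA baseline they compare against can be fit with PROC PRINCOMP (names hypothetical):

   proc princomp data=features out=scores n=10;
      var x1-x64;     /* x1-x64 are the inputs; N=10 keeps ten components */
   run;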
AA12 : Nothing to SNF At: Evaluating an intervention to reduce skilled nursing home (SNF) length of stay
Andrea Frazier, Presence Health
Monday, 9:30 AM - 9:50 AM, Location: Sterling 2
Length of stay (LOS) in skilled nursing facilities (SNF, pronounced "sniff") is a driver of high health care costs, particularly for Medicare patients. This study used survival analysis techniques to examine the effect of a simple intervention (educating providers and case managers on expected LOS for each patient) on SNF LOS. We'll also discuss techniques used to abate particular data collection challenges in this study.
AA13 : How Can an NBA Player Be Clutch?: A Logistic Regression Analysis
Logan Edmonds, Oklahoma State University
Tuesday, 10:30 AM - 10:50 AM, Location: Sterling 6
Many NBA players are known as clutch shooters who put fear in the opposing team as the clock winds down. Michael Jordan is remembered as much for his game-ending shots as for the high-flying dunks. We know that these players can make the shot, but are there certain situations that contribute to the likelihood that the game winner will go in? Using PROC LOGISTIC, this paper determines the key components of made shots during the crucial last two minutes of an NBA game. Using shot log data from the 2014-2015 NBA season, over 120,000 shots were filtered to those occurring in the last two minutes of regulation or overtime. Many things can affect whether a player will make these high-pressure shots. Specifically, the effects of home court, back-to-back game fatigue, shot distance, and dribble time before the shot are considered as possible predictors of clutch shooting. Assessment of only final game shots includes discussion of how teams could potentially use these variables in designing end-of-game plays. Finally, this analysis examines quantitatively whether players considered by pundits to be 'clutch' shooters live up to their billing.
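A hedged sketch of the kind of model described (predictor names are stand-ins for the shot-log fields):

   proc logistic data=clutch;
      class home(ref='0') b2b(ref='0') / param=ref;
      model made(event='1') = home b2b shot_dist dribbles;
   run;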
AA14 : Multicollinearity: What Is It and What Can We Do About It?
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Monday, 11:00 AM - 11:50 AM, Location: Sterling 8
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. Its presence can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, how to detect it, and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, the paper explores the proposed techniques using the Behavioral Risk Factor Surveillance System dataset. This paper is intended for any level of SAS® user, and is written for an audience with a background in behavioral science and/or statistics.
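The standard diagnostics the paper covers can be requested as PROC REG model options (dataset and variable names are hypothetical):

   proc reg data=brfss;
      model outcome = x1-x8 / vif tol collin;  /* variance inflation, tolerance, eigenvalue diagnostics */
   run;
   quit;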
AA15 : Oscars 2017 - Text Mining & Sentimental Analysis
Karthik Sripathi, Oklahoma State University
Tuesday, 11:00 AM - 11:20 AM, Location: Sterling 6
It has always been fascinating to watch the magnitude of award shows increase year after year; it is the enormously positive response of audiences that keeps these shows going. We know that the sentiments of people play a crucial role in deciding the prospects of a particular event. This paper summarizes the sentiments of individuals towards one of the most popular award shows, the Oscars. It provides crucial insights on how people's sentiments can determine the success or failure of a show. The paper involves text mining of people's reactions to the 2017 Oscars in general and a sentiment analysis regarding the Best Picture mix-up, using SAS® Sentiment Analysis Studio. Social media has evolved into a platform where we can directly evaluate people's liking or disliking of an event. Understanding their opinions on a social media platform opens up an unbiased environment: there are no filters on the way people react to an event, and the information we can tap into from such a platform gives us different perspectives. By analyzing this information, improvements can be suggested for future events. We also get a sense of how people react to an unexpected event at large award show productions. This paper aims to determine the success of an awards show based on individual sentiments before, during, and after the show. This information gives a better picture of how to handle unwanted circumstances during the event. We can conclude from the 2017 Oscars that the sentiments of the people were mostly positive or neutral, indicating that excitement about the show overshadows any unwanted events. This analysis can be extended to build a predictive text model, with the scope of predicting sentiments towards unwanted events, which would help set the stage better and prepare for potential problems.
AA16 : Text and Sentiment Analysis of customer tweets of Nokia using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio
Vaibhav Vanamala, Oklahoma State University
Tuesday, 11:30 AM - 11:50 AM, Location: Sterling 6
The launch of new Nokia phones has produced significant and trending news throughout the globe. There was a lot of hype and buzz around the release of these phones at Mobile World Congress 2017, and as a result there has been a significant social media response since the launch. Social media provides a platform for millions of people to share or express their unbiased opinions. In this paper, my aim is to analyze the overall sentiment prevailing in social media posts related to the release of the Nokia phones. To achieve this, I extracted real-time data from Twitter using a Twitter API from 26 February 2017 to 26 March 2017, which yielded about 38,000 tweets and retweets. I used SAS Enterprise Miner and SAS Sentiment Analysis Studio to evaluate key questions regarding the launch, such as understanding the needs and expectations of customers, the perception of people about the launch, and how Nokia could increase revenue by meeting customer expectations and through targeted marketing. This analysis can help Nokia improve the quality of its phones according to the expectations and needs of customers.
AA17 : Tornado Inflicted Damages Pattern
Vasudev Sharma, Oklahoma State University
Tuesday, 2:00 PM - 2:20 PM, Location: Sterling 6
On average, about a thousand tornadoes hit the United States every year; three out of every four tornadoes in the world occur in the United States. They damage life and property in their path, and they often hit with very little, sometimes no, warning. Tornadoes cause approximately 70 fatalities and 1,500 injuries in the US every year. Once a tornado destroyed an entire motel in Oklahoma, and the motel's sign was later recovered in Arkansas. Tornadoes most frequently hit "Tornado Alley," which is mainly made up of Nebraska, South Dakota, Oklahoma, Texas, and Kansas. A tornado extends from a thunderstorm to the ground and appears as a funnel-shaped cloud rotating with winds that can reach 300 miles per hour and can exceed a one-mile radius. Tornadoes can travel very long distances, making them very devastating. Since the ability to detect the intensity and direction of tornadoes prior to formation is limited, accurately predicting the likelihood that a tornado will form can save many lives, as well as property. The purpose of this study is to find patterns in the fatalities, injuries, and property loss caused by tornadoes. The tools used are Base SAS, SAS Enterprise Miner, R, and Tableau; the results include statistical, descriptive, and predictive analysis and visualizations from these tools.
AA18 : Agricultural Trip Generation - Linking Spatial Data and Travel Demand Modeling using SAS
Alan Dybing, North Dakota State University - Upper Great Plains Transportation Institute
Tuesday, 8:00 AM - 8:50 AM, Location: Sterling 6
Software Used: SAS 9.4 TS Level 1M2, X64_8PRO platform on Windows 10 Pro. Audience Level: The SAS techniques described in the paper can be replicated by users with general mastery of DATA step techniques; the data linkages from GIS and Cube Voyager require advanced knowledge of those software packages. The four-step travel demand modeling (TDM) procedure is commonly used to estimate and forecast traffic volumes for use in transportation planning. The trip generation step of the four-step model seeks to estimate the trip attractions and productions representing individual trips originating or terminating within a geographic boundary. Freight trip generation in rural areas within the Great Plains primarily results from agricultural production and marketing. This paper outlines a procedure that utilizes satellite imagery to estimate agricultural truck trip production at the township level, linking ArcGIS and SAS to generate trip generation tables for TDM purposes. The National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL) is a data source produced using a combination of satellite imagery, ground-truth surveys, and data mining techniques, resulting in a digital raster map of land use type at 30-meter (0.25 acre) resolution. At the township level, the raster data were converted to polygons, and acreages by crop type were calculated using ArcMap and output to a shapefile. The database file was imported into SAS, and using a combination of NASS county-level crop yield and fertilizer usage rate estimates, the total truck trips resulting from agricultural production activities were estimated by township. SAS was then used to convert these estimates to the specific file format utilized by Cube Voyager for development of a truck TDM.
AA19-SAS : Getting Started with Multilevel Modeling
Mike Patetta, SAS
Monday, 2:00 PM - 2:50 PM, Location: Regency A
In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and generalize easily to 2, 3, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate analyses in SAS.
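A minimal sketch of a two-level random intercept-and-slope model in GLIMMIX (dataset and variable names are hypothetical):

   proc glimmix data=grades;
      class school;
      model score = ses hours / solution;
      random intercept ses / subject=school;   /* students nested within schools */
   run;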
AA20-SAS : Power and Sample Size Computations
John Castelloe, SAS
Monday, 3:00 PM - 5:00 PM, Location: Regency A
Sample size determination and power computations are an important aspect of study planning; they help produce studies with useful results for minimum resources. Application areas are diverse, including clinical trials, marketing, and manufacturing. This tutorial presents numerous examples using the POWER and GLMPOWER procedures in SAS/STAT® software to illustrate the components of a successful power and sample size analysis. The practitioner must specify the design and planned data analysis and choose among strategies for postulating effects and variability. The examples cover proportion tests, t tests, confidence intervals, equivalence and noninferiority, survival analyses, logistic regression, and repeated measures. The logistic regression example demonstrates the new CUSTOM statement in the POWER procedure that supports extensions of power analyses involving the chi-square, F, t, normal, and correlation coefficient distributions. Attendees will learn how to compute power and sample size, perform sensitivity analyses for factors such as variability and Type I error rate, and produce customized tables and graphs using the POWER and GLMPOWER procedures and the %POWTABLE macro in SAS/STAT software.
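A minimal solve-for-sample-size example with PROC POWER (the numbers are placeholders):

   proc power;
      twosamplemeans test=diff
         meandiff  = 5
         stddev    = 12
         power     = 0.9
         npergroup = .;    /* the missing value is what PROC POWER solves for */
   run;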
AA99 : Tips and Best Practices Using SAS® in the Analytical Data Life Cycle
Tho Nguyen, Teradata
Paul Segal, Teradata
Monday, 9:00 AM - 9:50 AM, Location: Sterling 8
Come learn some tips and best practices with SAS/ACCESS, SAS formats, data quality, DS2, model development, model scoring, Hadoop and Visual Analytics - all integrated with the data warehouse.
System Architecture and Administration
SY01 : Using Agile Analytics for Data Discovery
Bob Matsey, Teradata
Monday, 2:00 PM - 2:50 PM, Location: Sterling 8
Companies are looking for Agile/Self-Service solutions that let them run their SAS Analytics in a massively parallel Teradata database environment without delays from IT. They are looking for a seamless and open architecture that allows business users to use whatever tools they like, while being able to manage and load all their various types of data for discovery from many different environments. They need the ability to quickly explore, prototype, and test new theories, succeeding or failing fast, all in a self-serve environment that does not always depend on IT. This session is intended for all skill levels and backgrounds.
SY04-SAS : The Future of the SAS Platform
Amy Peters, SAS
Monday, 3:00 PM - 3:50 PM, Location: Sterling 8
SAS has delivered integrated capabilities that organizations use to access, explore, transform, analyze, and govern data, delivering trusted insights on time and at scale. Over the last few years, recent trends in business and technology such as cloud, open APIs, microservices and new and emerging use cases have driven a need for the SAS platform to undergo an evolutionary transformation. This presentation will describe that transformation, how it came about and where the SAS platform is headed in the future.
Tools of the Trade
TT01 : Generating Reliable Population Rates Using SAS® Software
Jack Shoemaker, MDwise
Tuesday, 8:30 AM - 9:20 AM, Location: Regency E
The business of health insurance has always been to manage medical costs so that they don't exceed premium revenue. Monitoring and knowing about these patient populations will mean the difference between success and financial ruin. At the core of this monitoring are population rates like per member per month costs and utilization per thousand. This paper describes techniques using SAS® software that will generate these population rates for an arbitrary set of population dimensions. Keeping the denominators in sync with the numerators is key for implementing trustworthy drill-down applications involving population rates.
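One way to keep numerators and denominators in sync is to compute both in the same grouped query; a hedged sketch with hypothetical tables and columns:

   proc sql;
      create table rates as
      select region,
             sum(paid) / sum(member_months)               as pmpm format=dollar10.2,
             sum(admits) / (sum(member_months)/12) * 1000 as admits_per_1000
      from claims_summary
      group by region;
   quit;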
TT02 : Check Please: An Automated Approach to Log Checking
Richann Watson, Experis
Tuesday, 9:30 AM - 9:50 AM, Location: Regency E
In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as ERROR, WARNING, uninitialized, have been converted to, and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single one of the programs and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
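Not the author's macro, but a sketch of the core scan, assuming a host that accepts wildcards on INFILE (the path is hypothetical):

   data log_issues;
      length logfile $260;
      infile "C:\project\logs\*.log" filename=fname truncover;
      logfile = fname;                /* which log the line came from */
      input line $char300.;
      if index(line, 'ERROR') or index(line, 'WARNING') or
         index(line, 'uninitialized') or index(line, 'converted to');
   run;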
TT03 : Arbovirus, Varicella and More: Using SAS® for Reconciliation of Disease Counts
Misty Johnson, State of WI-DHS
Tuesday, 10:00 AM - 10:50 AM, Location: Regency E
Communicable disease surveillance by the Centers for Disease Control and Prevention (CDC) depends on reporting by public health jurisdictions in the United States. Communicable disease reporting in the State of Wisconsin is facilitated by an electronic surveillance system that communicates directly with the CDC. Disease reports are often updated with additional test results and information; each addition of information to a disease case results in a new message generated by the surveillance system. The State of Wisconsin annually reconciles disease reporting between cases acknowledged by the program epidemiologist and disease reports sent to the CDC by the surveillance system. A SAS 9.4 program utilizing simple DATA steps with BY-group processing is used to easily determine which reports were processed and counted by the CDC. PROC TABULATE, used in conjunction with the Output Delivery System Portable Document Format (ODS PDF) destination, makes it easy to produce line lists and simple case counts by disease that the epidemiologist uses to verify that their records and counts are complete and in agreement with the CDC. This paper is meant for all levels of SAS programmers and demonstrates basic coding techniques to perform simple data cleaning and validation, followed by removal of redundant reports, ending with the creation of five different pairs of output reports. Intermediate coding techniques include the use of macro variables to assign input and output file names and paths, SAS/ACCESS to PC Files to import an Excel® file, and the ODS PDF destination to print output to a file.
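A minimal sketch of the TABULATE-to-PDF pattern (dataset and variable names are hypothetical):

   ods pdf file='disease_counts.pdf';
   proc tabulate data=cases;
      class disease source;
      table disease, source*n / misstext='0';   /* print zero rather than blank */
   run;
   ods pdf close;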
TT04 : Code Like It Matters: Writing Code That's Readable and Shareable
Paul Kaefer, UnitedHealthcare
Tuesday, 11:00 AM - 11:50 AM, Location: Regency E
Coming from a background in computer programming to the world of SAS yields interesting insights and revelations. There are many SAS programmers who are consultants or work individually, sometimes as the sole maintainer of their code. Since SAS code is designed for tasks like data processing and analytics, SAS developers working on teams may use different strategies for collaboration than those used in traditional software engineering. Whether a programmer works individually, on a team, or on a project basis (delivering code and moving on to the next project), there are a number of best practices that can be leveraged to improve their SAS code. These practices make it easier to read, maintain, and understand/remember why the code is written the way it is. This paper presents a number of best practices, with examples and suggestions for usage. The reader is encouraged not to apply all the suggestions at once, but to consider them and how they may improve their work or the dynamic of their team.
TT06 : From Device Text Data to a Quality Dataset
Laurie Smith, Cincinnati Children's Hospital Medical Center
Tuesday, 3:00 PM - 3:20 PM, Location: Regency E
Data quality in research is important, and it may be necessary for data from a device to be used in a research project. Often such data are read from an external text file and entered onto a CRF, and then read from the CRF and entered into a database. This process introduces many opportunities for data quality to be compromised. The quality of device data used in a study can be greatly improved if the data are read directly from the device's output file into a dataset. If the device outputs results into a text file that can be saved electronically, SAS® can read the needed data from the results and save them directly into a dataset. In addition to improving data quality, taking advantage of these electronic files rather than recapturing the data on a CRF can also reduce data collection and monitoring time.
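A hedged sketch of reading a delimited device export straight into a dataset (the file layout is an assumption):

   data device;
      infile 'device_results.txt' firstobs=4 dlm='|' dsd truncover;
      input subject :$10. test_date :mmddyy10. result;
      format test_date date9.;
   run;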
TT07 : Proc Transpose Cookbook
Doug Zirbel, Wells Fargo and Co.
Tuesday, 3:30 PM - 3:50 PM, Location: Regency E
Proc TRANSPOSE rearranges columns and rows of SAS datasets, but its documentation and behavior can be difficult to comprehend. For common input situations, this paper will show a variety of "what-you-have" and "what-you-want", plus code and an easy reference card.
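One common "what-you-have / what-you-want" pair - long to wide by subject (names hypothetical):

   proc sort data=long;
      by subject;
   run;

   proc transpose data=long out=wide(drop=_name_) prefix=visit;
      by subject;
      id visitnum;      /* the visit number becomes part of each new column name */
      var score;
   run;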
TT08 : Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement, SAS® Macro Language, and Control Tables
Louise Hadden, Abt Associates Inc.
Tuesday, 2:30 PM - 2:50 PM, Location: Regency E
An organized directory structure is an essential cornerstone of data analytic development. Those programmers who are involved in repetitive processing of any sort control their software and data quality with directory structures that can be easily replicated for different time periods, different drug trials, etc. Practitioners (including the author) often use folder and subfolder templates or shells to create identical complex folder structures for new date spans of data or projects, or use manual processing or external code submitted from within a SAS® process to run a series of MKDIR and CHDIR commands from a command prompt to create logical folders. Desired changes have to be made manually, offering opportunities for human error. Since the advent of the DLCREATEDIR system option in SAS version 9.3, practitioners can create single folders if they do not exist from within a SAS process. Troy Hughes describes a process using SAS macro language, the DLCREATEDIR option, and control tables to facilitate and document the logical folder creation process. This paper describes a technique wrapping another layer of macro processing which isolates and expands the recursive logical folder assignment process to create a complex, hierarchical folder structure used by the author for a project requiring monthly data intake, processing, quality control and delivery of thousands of files. Analysis of the prior month's folder structure to inform development of control tables is discussed.
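The core building block, before any macro wrapping (paths are hypothetical; DLCREATEDIR creates one directory level per LIBNAME, so parents come first):

   options dlcreatedir;
   %let yyyymm = 201709;
   libname lvl1 "/project/&yyyymm";
   libname lvl2 "/project/&yyyymm/qc";
   libname _all_ clear;   /* the librefs were needed only to create the folders */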
TT09 : An Array of Possibilities: Manipulating Longitudinal Survey Data with Arrays
Lakhpreet Gill, Mathematica Policy Research
Tuesday, 8:00 AM - 8:20 AM, Location: Regency E
SAS arrays are extremely well suited for handling longitudinal survey data, which are data collected at multiple time points for a sample population. Oftentimes, the study time period for observing individuals in the sample population varies based on when respondents entered the study; the research question itself can also contribute to differences in the study time period. This paper relies on the Health and Retirement Study to identify differing baseline and follow-up measurements conditioned on age and response status. It demonstrates how to dynamically create an array index based on respondents' entry into and exit from the study, and how to use the index in a variety of ways to extract needed information from the survey data. Specifically, the programming topics covered are how to:
- Create an index based on respondent-specific baseline and analysis waves
- Manipulate the index to select information for later waves
- Troubleshoot "out of bounds" cases
- Integrate the subscript with the index to populate time series variables
- Merge wide and long data to "look ahead" and "look across"
This topic was presented as a PowerPoint presentation to the Michigan SAS Users Group and has since been used within Mathematica Policy Research for training research assistants who have been working for at least one year. The primary audience is therefore intermediate SAS users, as it assumes some basic knowledge of arrays. SAS 9.4 in Enterprise Guide 7.1 was used.
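A minimal sketch of a respondent-specific array index (variable names are stand-ins for the HRS fields):

   data followup;
      set hrs;
      array wt{12} weight1-weight12;
      baseline = wt{entry_wave};             /* the index varies by respondent */
      if entry_wave + 2 <= dim(wt) then      /* guard against out-of-bounds subscripts */
         change2 = wt{entry_wave + 2} - baseline;
   run;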
TT10 : Fully Automated Updating of Arbitrarily Complex Excel Workbooks
David Oesper, Lands' End
Tuesday, 2:00 PM - 2:20 PM, Location: Regency E
You can generate some very sophisticated Excel workbooks using ODS EXCEL and ODS TAGSETS.EXCELXP, but sometimes you'll want to create your Excel workbook in Microsoft Excel, or someone else will provide it to you. I'll show you how you can use SAS to dynamically update (and distribute) any existing Excel workbook with no manual intervention required. You'll need only Base SAS 9.4 TS1M2 or later and SAS/ACCESS to PC Files to use this approach. Examples in both the Linux and Windows operating environments will be presented.
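The paper's fully automated approach is not reproduced here; as one hedged illustration of the building blocks available with SAS/ACCESS to PC Files, the XLSX LIBNAME engine can write a sheet into an existing workbook (in-place behavior varies by release, so treat this as a sketch):

   libname wb xlsx 'C:\reports\template.xlsx';
   data wb.NewData;           /* writes a NewData sheet into the workbook */
      set monthly_summary;
   run;
   libname wb clear;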
e-Posters
PO01 : Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS®
Louise Hadden, Abt Associates Inc.
Monday, 10:00 AM - 10:20 AM, Location: Regency C
The intrepid Mars Rovers have inspired awe and Curiosity - and dreams of mapping Mars using SAS/GRAPH®. This presentation will demonstrate how to import SHP file data (using PROC MAPIMPORT) from sources other than SAS and GfK to produce useful (and sometimes creative) maps. Examples will include mapping neighborhoods, ZCTA5 areas, postal codes, and, of course, Mars. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
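A minimal sketch of the import-then-map pattern (the file path and ID variable are hypothetical):

   proc mapimport datafile='C:\shapefiles\zcta5.shp' out=work.zcta;
   run;

   proc gmap data=rates map=work.zcta;
      id zcta5;
      choro rate;
   run;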
PO02 : SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination
Louise Hadden, Abt Associates Inc.
Monday, 11:00 AM - 11:20 AM, Location: Regency C
SAS® has an amazing arsenal of tools to use and display geographic information that is relatively unknown and underutilized. High-quality GfK Geocoding maps have been provided by SAS since SAS 9.3 M2, as sources of inexpensive map data dried up. SAS has been including both GfK and "traditional" SAS map data sets with SAS/GRAPH licenses for some time, recognizing the need for an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come: the "traditional" SAS map data sets are no longer being updated, and if you visit SAS MapsOnline, you will find only GfK maps among the current maps. The GfK maps are updated once a year. This presentation will walk through the conversion of a long-standing SAS program that produces multiple US maps for a data compendium to take advantage of GfK maps. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
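Assuming state-level data keyed by postal code, the switch can be as small as pointing GMAP at the GfK library (the ID variable is the piece most likely to need changing, and is an assumption here):

   proc gmap data=staterates map=mapsgfk.us;
      id statecode;   /* GfK map data sets key on different variables than maps.us */
      choro rate;
   run;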
PO03 : Data Quality Control: Using High Performance Binning to Prevent Information Loss
Lakshmi Nirmala Bavirisetty, Independent SAS User
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Monday, 2:00 PM - 2:20 PM, Location: Regency C
It is a well-known fact that the structure of real-world data is rarely complete and straightforward. Keeping this in mind, we must also note that the quality, assumptions, and base state of the data we are working with have a very strong influence on the selection and structure of the statistical model chosen for analysis and/or data maintenance. If the structure and assumptions of the raw data are altered too much, the integrity of the results as a whole is grossly compromised. The purpose of this paper is to provide programmers with a simple technique that allows the aggregation of data without losing information, while also checking the quality of the binned categories in order to improve the performance of statistical modeling techniques. The SAS® high-performance analytics procedure HPBIN gives us a basic idea of the syntax as well as various methods, tips, and details on how to bin variables into comprehensible categories. We will also learn how to check whether these categories are reliable and realistic by reviewing the Weight of Evidence (WOE) and Information Value (IV) for the binned variables. This paper is intended for any level of SAS user interested in quality control and/or SAS high-performance analytics procedures.
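A minimal sketch of the binning step (the WOE/IV checks the paper describes use additional HPBIN options not shown here; names hypothetical):

   proc hpbin data=raw output=binned numbin=10 pseudo_quantile;
      input income age;    /* each variable is split into 10 bins */
   run;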
PO05 : Let's Get FREQy with our Statistics: Let SAS® Determine the Appropriate Test Statistic Based on Your Data
Lynn Mullins, PPD
Richann Watson, Experis
Tuesday, 10:30 AM - 10:50 AM, Location: Regency C
As programmers, we are often asked to program statistical analysis procedures to run against the data. Sometimes the specifications we are given by the statisticians will outline which statistical procedures to run, but other times the choice of procedure needs to be data dependent. Running procedures based on the results of previous procedures' output requires a little more preplanning and programming. We will present a macro that dynamically determines which statistical procedure to run based on previous procedure output, while also allowing the user to input parameters (for example, fshchi, plttwo, catrnd, bimain, and bicomp); the macro returns counts, percentages, and the appropriate p-value (chi-square versus Fisher's exact), as well as p-values for trend tests and binomial confidence intervals, where applicable.
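Not the authors' macro, but a sketch of the underlying idea - inspect expected cell counts, then generate the matching test (dataset and variable names are hypothetical):

   proc freq data=adsl noprint;
      tables trt*response / expected outexpect out=cells;
   run;

   data _null_;
      set cells end=eof;
      ncell  + 1;
      nsmall + (expected < 5);            /* count sparse cells */
      if eof then call symputx('use_fisher', nsmall/ncell > 0.2);
   run;

   proc freq data=adsl;
      tables trt*response / %sysfunc(ifc(&use_fisher, fisher, chisq));
   run;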